Diverse queries and feature type selection for plagiarism discovery: Notebook for PAN at CLEF 2013

Authors

SUCHOMEL Šimon, KASPRZAK Jan, BRANDEJS Michal

Year of publication 2013
Type Article in Proceedings
Conference 2013 Cross Language Evaluation Forum Conference, CLEF 2013, CEUR Workshop Proceedings Volume 1179
MU Faculty or unit

Faculty of Informatics

Web http://ceur-ws.org/Vol-1179/
Field Informatics
Keywords suspicious document; plagiarism detection; search engine; source retrieval; stop word; text alignment; contextual n-gram; word n-gram; representative sentence; overlapping detection; snippet similarity; global postprocessing
Description This paper describes the approaches used for the Plagiarism Detection task in the PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present a modified three-way search methodology for the Source Retrieval subtask and analyse the performance of snippet similarity. The results show that the presented approach is adaptable to real-world plagiarism situations. For the Detailed Comparison subtask, we discuss feature type selection and global postprocessing. The resulting performance is significantly better with the described modifications, and further improvement is still possible.
