Určování autorství anonymních textů na základě automaticky nalezených charakteristických znaků

Title in English Determining Authorship of Anonymous Texts Based on Automatically Discovered Characteristic Features
Authors

RYGL Jan

Year of publication 2011
MU Faculty or unit

Faculty of Informatics

Description Master's thesis. The work builds on the most successful methods for determining the authorship of anonymous documents. We combine, optimize, and revise these methods and create new techniques for three main tasks: automatic attribution of authorship given a set of documents, verification of whether a document was written by a selected author, and clustering of documents by author. The implemented algorithms are tested on Czech documents, but the system is modular: once its language-dependent components are removed or replaced, it can process documents written in any language. Everything is written in Python. The system includes tools for preprocessing Czech data and for managing documents stored in a PostgreSQL database. The thesis also reports empirical measurements of how the most popular authorship attribution methods perform on Czech documents. Most earlier measurements were performed on English texts (books, newspaper articles, rarely e-mails), and statistics for Czech data have been missing until now.
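To make the three tasks concrete, here is a minimal Python sketch. It is not the thesis's implementation: the character trigram features, the cosine similarity, the thresholds, and all function names (`profile`, `attribute`, `verify`, `cluster`) are assumptions chosen only to illustrate how attribution, verification, and clustering differ.

```python
from collections import Counter
from math import sqrt


def profile(text, n=3):
    """Relative frequencies of character n-grams (an assumed stand-in feature set)."""
    text = " ".join(text.lower().split())  # lowercase and normalize whitespace
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams) or 1
    return {g: c / total for g, c in Counter(grams).items()}


def cosine(p, q):
    """Cosine similarity of two sparse frequency profiles."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(unknown, known):
    """Attribution: return the candidate author whose known text is most similar."""
    u = profile(unknown)
    return max(known, key=lambda author: cosine(u, profile(known[author])))


def verify(unknown, author_text, threshold=0.5):
    """Verification: accept the author if similarity reaches an assumed threshold."""
    return cosine(profile(unknown), profile(author_text)) >= threshold


def cluster(docs, threshold=0.5):
    """Clustering: greedy single pass, joining the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed profile, member indices)
    for i, doc in enumerate(docs):
        p = profile(doc)
        for seed, members in clusters:
            if cosine(p, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((p, [i]))
    return [members for _, members in clusters]
```

Attribution chooses among a closed set of candidate authors, verification answers yes or no for a single author, and clustering groups documents without any known authors. A real system would use richer, partly language-dependent features and thresholds tuned on held-out data.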