Classification of Errors in Text


Authors	JAKUBÍČEK Miloš; BUŠTA Jan; HLAVÁČKOVÁ Dana; PALA Karel
Year of publication	2009
Type	Article in Proceedings
Conference	RASLAN 2009 : Recent Advances in Slavonic Natural Language Processing
MU Faculty or unit	Faculty of Informatics
Web	http://nlp.fi.muni.cz/raslan/2009/
Field	Linguistics
Keywords	errors in text; classification of errors
Description	This paper presents two classifications of errors in Czech texts. The basic resource is the Chyby (Errors) corpus, which has been under continuous development since 1999-2000 [1]. The corpus texts contain various kinds of errors: spelling, typographical, grammatical, semantic, lexical, and stylistic. The errors have been corrected manually and annotated according to an error classification (annotation scheme) developed for this purpose. For the annotation we implemented a tool named WinCorr. We outline the first annotation scheme and discuss the second, which was designed recently to describe the errors occurring in texts more adequately. We also discuss the principles on which both classifications are based.
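Related projects:	Centrum komputační lingvistiky (Center for Computational Linguistics); Prostředky tvorby komplexní báze znalostí pro komunikaci se sémantickým webem v přirozeném jazyce (Means of building a complex knowledge base for communication with the semantic web in natural language)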