Building Evaluation Dataset for Textual Entailment in Czech

Authors	NEVĚŘILOVÁ Zuzana
Year of publication	2012
Type	Article in Proceedings
Conference	Sixth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2012
MU Faculty or unit	Faculty of Informatics
web	https://nlp.fi.muni.cz/raslan/2012/paper03.pdf
Field	Informatics
Keywords	textual entailment; evaluation data set; Czech language; paraphrasing
Description	Recognizing textual entailment (RTE) is a subfield of natural language processing (NLP). Several RTE systems currently exist; some of their subtasks are language independent, but others are not. Moreover, large evaluation datasets have been prepared almost exclusively for English. In this paper we describe methods for obtaining a test dataset for RTE in Czech. We extracted facts from texts using corpus templates as well as a syntactic parser. In addition, we used reading comprehension tests for children and students. The main contribution of this article is the classification of “difficulty levels” for particular RTE questions.
Related projects:	Temporal Aspects of Knowledge and Information; LINDAT-Clarin Project - Building and Operating the Czech Node of the Pan-European Research Infrastructure