Between Comparable and Parallel: English-Czech Corpus from Wikipedia

Authors

ŠTROMAJEROVÁ Adéla, BAISA Vít, BLAHUŠ Marek

Year of publication 2016
Type Article in Proceedings
Conference RASLAN 2016 Recent Advances in Slavonic Natural Language Processing
MU Faculty or unit

Faculty of Informatics

Citation
web https://nlp.fi.muni.cz/raslan/2016/paper03-Stromajerova_Baisa_Blahus.pdf
Field Informatics
Keywords parallel corpora; comparable corpora; Wikipedia
Description We describe the process of creating a parallel corpus from the Czech and English Wikipedias using language-independent methods. The corpus consists of Czech and English Wikipedia articles, where the Czech articles are translations of the English ones. It is aligned at the sentence level and is accessible in the Sketch Engine corpus manager.
