Between Comparable and Parallel: English-Czech Corpus from Wikipedia
Authors | |
---|---|
Year of publication | 2016 |
Type | Article in Proceedings |
Conference | RASLAN 2016: Recent Advances in Slavonic Natural Language Processing |
MU Faculty or unit | |
Citation | |
Web | https://nlp.fi.muni.cz/raslan/2016/paper03-Stromajerova_Baisa_Blahus.pdf |
Field | Informatics |
Keywords | parallel corpora; comparable corpora; Wikipedia |
Description | We describe the process of creating a parallel corpus from the Czech and English Wikipedias using language-independent methods. The corpus consists of Czech and English Wikipedia articles, where the Czech articles are translations of the English ones. It is aligned at the sentence level and is accessible in the Sketch Engine corpus manager. |