SkELL Corpora as a Part of the Language Portal Sonaveeb: Problems and Perspectives

Authors

KOPPEL Kristina, KALLAS Jelena, KHOKHLOVÁ Maria, SUCHOMEL Vít, BAISA Vít, MICHELFEIT Jan

Year of publication 2019
Type Article in Proceedings
Conference Proceedings of the 6th Biennial Conference on Electronic Lexicography
MU Faculty or unit

Faculty of Informatics

Web Conference proceedings
Keywords GDEX; SkELL; learner corpus; Estonian; Russian
Description The paper analyses the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel, 2014), using the example of Sonaveeb (Wordweb), a new language portal being developed at the Institute of the Estonian Language. Sonaveeb currently contains a total of 150,000 Estonian headwords, about 70,000 of which have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only usage examples available on the portal. We describe the parameters of the Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian and Russian that were used to compile the etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of an evaluation of the GDEX configuration for Estonian, and outline the requirements for a user-friendly presentation of SkELL corpora as part of the language portal.