Effective Corpus Virtualization

Authors

JAKUBÍČEK Miloš, RYCHLÝ Pavel, KILGARRIFF Adam

Year of publication 2014
Type Article in Proceedings
Conference Challenges in the Management of Large Corpora (CMLC-2)
MU Faculty or unit

Faculty of Informatics

Web http://corpora.ids-mannheim.de/cmlc.html
Field Informatics
Keywords corpus; corpus linguistics; virtualization; indexing; database
Description In this paper we describe an implementation of corpus virtualization within the Manatee corpus management system. By corpus virtualization we mean the logical manipulation of corpora or their parts, grouping them into new (virtual) corpora. We discuss the motivation for such a setup in detail and show the space and time efficiency of this approach, evaluated on an 11-billion-word corpus of Spanish.