Fast syntactic searching in very large corpora for many languages

Authors: JAKUBÍČEK Miloš, RYCHLÝ Pavel, KILGARRIFF Adam, MCCARTHY Diana

Year of publication: 2010
Type: Article in Proceedings
Conference: PACLIC 24: Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation
MU Faculty or unit: Faculty of Informatics

Field: Informatics
Keywords: corpus search; large corpora; CQL; syntactic search
Description: For many linguistic investigations, the first step is to find examples. In the 21st century, all such examples should be found, not invented. Linguists therefore need flexible tools for finding even quite rare phenomena, and to serve linguists well, these tools must be fast even when corpora are very large and queries are complex. We present extensions to CQL (Corpus Query Language) that make it intuitive to write syntactically rich queries, and demonstrate that such queries can be evaluated quickly within our tool, even on multi-billion-word corpora.