A Lexicographer-Friendly Association Score

Authors

RYCHLÝ Pavel

Year of publication 2008
Type Article in Proceedings
Conference RASLAN 2008
MU Faculty or unit

Faculty of Informatics

Web https://nlp.fi.muni.cz/raslan/2008/papers/13.pdf
Field Linguistics
Keywords corpus linguistics tools; grammatical relations in the Sketch Engine; the logDice score
Description Finding collocation candidates is one of the most important and widely used features of corpus linguistics tools. Many statistical association measures are used to identify good collocations. Most of these measures define a formula for an association score, which indicates the amount of statistical association between two words. The score is computed for all possible word pairs, and the pairs with the highest scores are presented as collocation candidates. The same scores are used in many other algorithms in corpus linguistics. The score values are usually meaningless and corpus-specific: they cannot be used to compare words (or word pairs) from different corpora. End users, however, want an interpretation of such scores, and they want the scores to be stable. This paper presents a modification of a well-known association score that has a reasonable interpretation and other useful properties.
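The procedure summarized above (score every word pair, then present the top-scoring pairs as collocation candidates) can be sketched in Python. This is a minimal sketch, assuming the logDice formula as it is commonly stated for this score, logDice = 14 + log2(2·f_xy / (f_x + f_y)), i.e. a rescaled logarithm of the Dice coefficient. The function names `log_dice` and `rank_collocations` and the toy frequency counts are hypothetical illustrations, not code or data from the paper.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """Return 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of the pair (x, y)
    f_x, f_y: total frequencies of the words x and y
    The maximum value, 14, is reached when f_xy == f_x == f_y.
    """
    if f_xy <= 0:
        # Pairs that never co-occur get no finite score.
        return float("-inf")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocations(pair_counts, word_counts, top_n=10):
    """Score every observed pair and return the top_n highest-scoring ones."""
    scored = [
        (pair, log_dice(f_xy, word_counts[pair[0]], word_counts[pair[1]]))
        for pair, f_xy in pair_counts.items()
    ]
    # Highest association first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy counts, for illustration only.
word_counts = {"strong": 100, "tea": 50, "powerful": 80, "computer": 60}
pair_counts = {
    ("strong", "tea"): 20,
    ("powerful", "computer"): 15,
    ("strong", "computer"): 1,
}

for (x, y), score in rank_collocations(pair_counts, word_counts):
    print(f"{x} {y}: {score:.2f}")
```

Because this formula uses only the pair and word frequencies and no total corpus size, multiplying all counts by the same factor leaves the score unchanged, and its fixed maximum of 14 is reached only when the two words always occur together.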