Automatic Identification of Legal Terms in Czech Law Texts

Pala,  Karel; Rychlý,  Pavel; Šmerk,  Pavel

Title in Czech	Automatická identifikace právních termínů v českých právních textech
Authors	PALA Karel; RYCHLÝ Pavel; ŠMERK Pavel
Year of publication	2010
Type	Article in proceedings
Conference	Semantic Processing of Legal Texts
Faculty / MU workplace	Faculty of Informatics
Citation	PALA, Karel, Pavel RYCHLÝ and Pavel ŠMERK. Automatic Identification of Legal Terms in Czech Law Texts. In Semantic Processing of Legal Texts. Berlin: Springer, 2010, pp. 83-94. ISBN 978-3-642-12836-3. Available from: https://dx.doi.org/10.1007/978-3-642-12837-0_5.
DOI	http://dx.doi.org/10.1007/978-3-642-12837-0_5
Field	Linguistics
Keywords	terminology extraction; natural language processing; legal language
Description	Law texts, including the constitution, acts, public notices and court judgements, form a huge database of texts. As with many texts from small domains, the sublanguage used is partially restricted and differs from the general language (Czech). As a starting collection of data, the legal database Lexis, containing approx. 50,000 Czech law documents, has been chosen. Our attention is concentrated mostly on noun groups, which are the main candidates for law terms. We were able to recognize 3992 distinct noun groups of this kind in the selected text samples. The paper also presents the results of morphological analysis, lemmatization, tagging, disambiguation, and basic syntactic analysis of Czech law texts, as these tasks are crucial for any further sophisticated natural language processing. The verbs in legal texts have also been explored in a preliminary way. In this respect, we are trying to explore how linguistic analysis can help in identifying the semantic nature of law terms.
Related projects:	Centrum komputační lingvistiky; Právní e-slovník - PES