Unsupervised Disambiguation of Abstract

Oscar Kalldal ; Maximilian Ludvigsson
Göteborg : Chalmers tekniska högskola, 2018. 50 pp.
[Degree project, advanced level]

Disambiguating natural text is the task of choosing the correct meaning among several possible interpretations. This thesis focuses on disambiguating parse trees created by Grammatical Framework, a formal language that represents the meaning of natural language sentences as abstract syntax trees for use in machine translation. Since each tree represents one meaning, a sentence can have several interpretations, of which the most probable one should be chosen.

In order to achieve this, a language model on trees is defined. This is then used to compare possible trees and choose the one with the highest probability. In order to estimate the parameters of the model, the probability of the different meanings behind a word needs to be estimated. This is done using the Expectation Maximization algorithm.
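
The estimation step can be illustrated with a minimal sketch. It assumes a much simpler model than the one in the thesis: a unigram distribution over word senses instead of a language model on trees. The function names `em_sense_probs` and `best_interpretation`, the sense labels, and the toy data are all hypothetical and serve only to show how Expectation Maximization can estimate sense probabilities from ambiguous observations, and how those probabilities can then rank competing interpretations.

```python
from collections import defaultdict


def em_sense_probs(observations, iterations=20):
    """Estimate a unigram distribution over senses with EM.

    observations: one list of candidate senses per ambiguous token.
    Assumption: a flat unigram model, not the tree model of the thesis.
    """
    observations = [c for c in observations if c]  # ignore empty candidate lists
    senses = {s for cands in observations for s in cands}
    probs = {s: 1.0 / len(senses) for s in senses}  # uniform start

    for _ in range(iterations):
        # E-step: split each token's unit count over its candidate senses,
        # in proportion to the current sense probabilities.
        counts = defaultdict(float)
        for cands in observations:
            z = sum(probs[s] for s in cands)
            for s in cands:
                counts[s] += probs[s] / z
        # M-step: renormalize the expected counts into probabilities.
        total = sum(counts.values())
        probs = {s: counts[s] / total for s in senses}
    return probs


def best_interpretation(candidates, probs):
    """Return the candidate (a tuple of senses) with the highest probability."""
    def score(senses):
        p = 1.0
        for s in senses:
            p *= probs.get(s, 0.0)
        return p
    return max(candidates, key=score)


# Toy data: each inner list holds the senses one observed token could have.
observations = [
    ["bank_river", "bank_money"],
    ["bank_money"],
    ["bank_money", "bank_seat"],
    ["bank_river", "bank_money"],
]
probs = em_sense_probs(observations)
choice = best_interpretation([("bank_river",), ("bank_money",)], probs)
```

In this toy setting the unambiguous observation pulls probability mass toward one sense, and each iteration reinforces that preference. A model on trees would replace the unigram score with a probability defined over the tree structure.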

Experiments are run on seven languages to show that the method generalizes. Several smoothing techniques and dictionaries are evaluated, and a novel merged Wordnet is constructed to reduce sparseness.

The method is evaluated by doing word sense disambiguation (a subtask of tree disambiguation) on standard data sets. The model is shown to be comparable to other unsupervised methods in SemEval 2015.

Keywords: natural language processing, grammatical framework, language models, expectation maximization

The publication was registered 2018-06-27. It was last modified 2018-06-27.

CPL ID: 255307

This is a service from Chalmers bibliotek