Publication

Four Methods for Supervised Word Sense Disambiguation

Kinga Schumacher

In: Springer (ed.). Proceedings of the 12th International Conference on Applications of Natural Language to Information Systems (NLDB), June 27-29, 2007, CNAM, Paris, France. Springer, LNCS 4592, pages 317-328, 6/2007.

Abstract

Word sense disambiguation, the task of identifying the intended meaning of an ambiguous word in a given context, is one of the central problems in natural language processing. This paper describes four novel supervised disambiguation methods that adapt familiar algorithms. They build on the Vector Space Model, using an automatically generated stop list and two different statistical methods for finding index terms. These procedures allow fully automated and language-independent disambiguation. The first method is based on Latent Semantic Analysis, an automatic indexing method employed for text retrieval. The second disambiguates via co-occurrence vectors of the target word. The third uses the Naive Bayes classifier, and the fourth relies on SenseClusters, an unsupervised word sense discrimination technique. The methods were implemented and evaluated to assess their performance, to compare the different approaches, and to draw conclusions about the main characteristics of supervised disambiguation. The results show that the classification approach using Naive Bayes is the most efficient, scalable and successful method.
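To illustrate the kind of classification the abstract refers to, the following is a minimal sketch of Naive Bayes sense classification over bag-of-words contexts. It is not the paper's implementation: the function names `train` and `disambiguate`, the add-one smoothing, and the toy training data for the ambiguous word "bank" are all assumptions made for this example, and the paper's stop list and index-term selection steps are left out.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect counts from (context_words, sense) training pairs.

    Hypothetical helper for this sketch; not taken from the paper.
    """
    sense_counts = Counter()            # number of training contexts per sense
    word_counts = defaultdict(Counter)  # word frequencies per sense
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab


def disambiguate(model, context_words):
    """Return the sense with the highest log posterior for the context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior P(sense)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context_words:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing (an assumption of this sketch).
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy, invented contexts for the ambiguous word "bank".
TOY_DATA = [
    ("deposit money account loan".split(), "finance"),
    ("interest account credit money".split(), "finance"),
    ("river water shore fishing".split(), "river"),
    ("muddy river grass water".split(), "river"),
]

model = train(TOY_DATA)
print(disambiguate(model, ["water", "fishing"]))
print(disambiguate(model, ["loan", "money"]))
```

Working in log space avoids numeric underflow when many context words are multiplied together, and smoothing keeps a single unseen word from zeroing out a sense.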

Projects

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence