Wikipedia Mining


Computational Linguistics
Master Programme
Winter Semester 2009



The submission deadline for the written report is 1st April 2010. You may submit an electronic or a printed copy of your report.


General Information

Moderator: Günter Neumann


In this seminar, we will study state-of-the-art methods and technology for mining meaning from Wikipedia. After a short introduction to the topic of text-based information management, we will review different natural language problems in which Wikipedia has been used (e.g., semantic relatedness, word sense disambiguation, coreference resolution). In the second part of the seminar, we will study application-oriented approaches, e.g., Wikipedia-oriented information extraction and question answering, and Wikipedia-based construction of ontologies.

Seminar Language: English

Available Certificate Modalities:

Placement in Study Programme:



Session Number
Organisational meeting
Günter Neumann
Günter Neumann
Günter Neumann
Xuchen Yao
Daniel Müller
Dominikus Wetzel
Andreas Kirkedal
Gerald Schoch
no topic
Happy Christmas!
Joo-Eon Feit
Maria Sukhareva
Kang Ji
Annemarie Friedrich
Miriam Käshammer
TzuYi Kuo
Mihai Grigore

*We have to move this session to Friday because I am out of the office on Monday.

Please click on the session number to jump to the corresponding references. If available, the topics of the presentations will be linked to the slides of the presentations.



Overview and foundations

Olena Medelyan, David Milne, Catherine Legg, Ian H. Witten (2008), Mining Meaning From Wikipedia, Working Paper: 11/2008, September 2008, University of Waikato, New Zealand

Semantic relatedness

Strube, Michael; Ponzetto, Simone Paolo (2006). WikiRelate! Computing Semantic Relatedness Using Wikipedia. In: AAAI '06, pp. 1419-1424.

Simone Paolo Ponzetto and Michael Strube (2007). Knowledge derived from Wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research, 30:181-212.

Explicit Semantic Analysis

Evgeniy Gabrilovich and Shaul Markovitch. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. In: IJCAI '07, pp. 1606-1611.

Evgeniy Gabrilovich. Feature Generation for Textual Information Retrieval Using World Knowledge. Ph.D. Thesis, Department of Computer Science, Technion Israel Institute of Technology, Haifa, Israel, 2006.

Learning to link in Wikipedia

Milne, D. and Witten, I.H. (2008) Learning to link with Wikipedia. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM'2008), Napa Valley, California.

Milne, D. and Witten, I.H. (2008) An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the first AAAI Workshop on Wikipedia and Artificial Intelligence (WIKIAI'08), Chicago, IL.

Mihalcea and Csomai (2007) Wikify! Linking Documents to Encyclopedic Knowledge, CIKM’07, November 6–8, 2007, Lisbon, Portugal.

Disambiguation of named entities

Bunescu and Pasca (2006) Using Encyclopedic Knowledge for Named Entity Disambiguation, EACL.

Cucerzan (2007) Large-Scale Named Entity Disambiguation Based on Wikipedia Data, EMNLP.

Anthony Fader, Stephen Soderland, and Oren Etzioni (2009) Scaling Wikipedia-based Named Entity Disambiguation to Arbitrary Web Text, WIKIAI.

Automatising the Learning of Lexical Patterns

Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells (200) From Wikipedia to Semantic Relationships: a Semi-automated Annotation Approach, CEUR-WS, Vol. 26

Maria Ruiz-Casado, Enrique Alfonseca and Pablo Castells (2007) Automatising the Learning of Lexical Patterns: an Application to the Enrichment of WordNet by Extracting Semantic Relationships from Wikipedia, Data Knowl. Eng. 61(3): 484-499

Query expansion with Wikipedia

Milne, D., Witten, I.H. and Nichols, D.M. (2007). A Knowledge-Based Search Engine Powered by Wikipedia. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM'2007), Lisbon, Portugal.

Potthast, M., Stein, B., and Anderka, M. (2008). A Wikipedia-Based Multilingual Retrieval Model. In Proceedings of the 30th European Conference on IR Research, ECIR'08, Glasgow.

Wiktionary and MT

Oren Etzioni, Kobi Reiter, Stephen Soderland, and Marcus Sammer (2007). Lexical Translation with Application to Image Search on the Web. Machine Translation Summit XI, Copenhagen, Denmark, Europe.

Stephen Soderland, Christopher Lim, Mausam, Bo Qin, Oren Etzioni, and Jonathan Pool (2009). Lemmatic Machine Translation. Machine Translation Summit XII. Ottawa, Ontario, Canada.

Wikipedia and Wiktionary for CLIR

Christof Müller and Iryna Gurevych (2009). Using Wikipedia and Wiktionary in Domain-Specific Information Retrieval. Evaluating Systems for Multilingual and Multimodal Information Access, LNCS Series, Springer.

Named Entity Recognition

J. Nothman, T. Murphy, and J. R. Curran (2009). Analysing Wikipedia and Gold Standard Corpora for NER Training. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics.

Joel Nothman (2008). Learning Named Entity Recognition from Wikipedia. Honours Thesis, School of Information Technologies, University of Sydney, 2008.

Relation Extraction

Yulan Yan, Naoaki Okazaki, Yutaka Matsuo, Zhenglu Yang and Mitsuru Ishizuka (2009). Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web. ACL, 2009.

Yulan Yan, Yutaka Matsuo, and Mitsuru Ishizuka (2009). An Integrated Approach for Relation Extraction from Wikipedia Texts, CAW, 2009, Madrid, Spain, Europe.

Gang Wang, Yong Yu and Haiping Zhu (2007). PORE: Positive-Only Relation Extraction from Wikipedia Text. The 6th International Semantic Web Conference (ISWC 2007).

Ontology extraction

Yago Home page.

Fabian M. Suchanek, Gjergji Kasneci and Gerhard Weikum (2007). Yago - A Core of Semantic Knowledge. 16th international World Wide Web conference (WWW 2007).

Gerard de Melo, Fabian M. Suchanek, Adam Pease (2008). Integrating YAGO into the Suggested Upper Merged Ontology. 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2008).

The investigation of Wikipedia Revision History

Rani Nelken and Elif Yamangil (200). Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms. WikiAI08

Elif Yamangil and Rani Nelken (2008). Mining Wikipedia Revision Histories for Improving Sentence Compression. ACL-HLT 2008


Written Report

Students enrolled in the Master's programme can choose to submit a written report (see available certificate modalities). The length of the written report is restricted to eight pages, excluding the bibliography. For this purpose, the linked conference-style template should be used (available for LaTeX and MS Word). The submission deadline is 1st April 2010. The written report should have the style of conference proceedings. We expect you to digest the material related to your topic and perform further research. In your report, you should add value to the available information by comparing and criticizing the approaches and by highlighting their strengths. We want to encourage you to think and develop your own opinion, and we will disapprove of copy-pasting. If you have questions about the written report, we will be happy to help you.

You can turn in your report in electronic or print form. Electronic copies should be submitted via e-mail to the following addresses: