Information Extraction and Question-Answering Systems: Foundations and methods


Note: still not complete!


Doug Appelt's "Introduction to Information Extraction Technology". This is a very good introduction to information extraction technology. In my course I will follow Doug's structure for presenting IE.

Günter Neumann "Informationsextraktion." This is a short overview of current IE technology, written in German. It is published in Klabunde et al. (eds.): Computerlinguistik und Sprachtechnologie - Eine Einführung. Spektrum Akademischer Verlag, Heidelberg, 2001, which I also use as course material.

Ricardo Baeza-Yates & Berthier Ribeiro-Neto: Modern Information Retrieval, Addison Wesley Longman Publishing Co. Inc., 1999.

Maria Pazienza (Ed.) "Information Extraction: Towards Scalable, Adaptable Systems", Lecture Notes in Artificial Intelligence, 1714, 1999.

Eugene Charniak "Statistical Language Learning", MIT Press, 1993.
A compact introduction to major aspects of statistical methods used for NLP.

Christopher Manning & Hinrich Schütze "Foundations of Statistical Natural Language Processing", MIT Press, 1999.
An extensive introduction to major aspects of statistical methods used for NLP.


Information extraction: Message Understanding Conference Proceedings, MUC-7

Answer extraction: TREC Conference series, in particular TREC-9

Part of Speech Tagging

Christopher Manning & Hinrich Schütze "Foundations of Statistical Natural Language Processing", MIT Press, 1999, chapter 10.

Eugene Charniak "Statistical Language Learning", MIT Press, 1993, chapters 3 and 4.

Eric Brill "Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging", Computational Linguistics, Volume 21, Number 4, 1995.

Adwait Ratnaparkhi "A Maximum Entropy Part-Of-Speech Tagger", In Proceedings of the Empirical Methods in Natural Language
Processing Conference, May 17-18, 1996, University of Pennsylvania.

Named Entity Recognition

Bikel, Miller, Schwartz and Weischedel, Nymble: a High-Performance Learning Name-finder, In Proceedings
of ANLP-1997, Washington, DC, pages 195-201.

A. Borthwick, A Maximum Entropy Approach to Named Entity Recognition, Ph.D. thesis, New York
University, Department of Computer Science, Courant Institute, 1999.

Valentin Tablan, Cristian Ursu, Hamish Cunningham et al., Software Architecture for Language Engineering.
Slides. They present a nice view of NE, which I will also make use of in my course.

Robust parsing of unrestricted German text

G. Neumann: The SMES system: Robust parsing of unrestricted German text. Homepage & free Software

Scientific papers about SMES (all papers are downloadable from my publication list)

T. Declerck and G. Neumann: A Cascaded Shallow Approach to Reference Resolution.
In Proceedings of EuroConference on Recent Advances in NLP, RANLP-2001, Tzigov Chark, Bulgaria, 5-7 September 2001.

G. Neumann, C. Braun and J. Piskorski: A Divide-and-Conquer Strategy for Shallow Parsing of German Free Texts.
In Proceedings of ANLP-2000, Seattle, Washington, pages 239-246.

G. Neumann and G. Mazzini: Domain adaptive information extraction. Technical Report, 1999.

G. Neumann, R. Backofen, J. Baur, M. Becker, C. Braun: An Information Extraction Core System for Real World German Text Processing. In Proceedings of 5th ANLP, Washington, March, 1997.

G. Neumann: Methoden zur intelligenten Informationsextraktion im Internet. In Proceedings of 20th European Congress Fair for Technical Communication, ONLINE '97, Hamburg, 1997.

Information extraction learning

Ion Muslea, "Extraction Patterns for Information Extraction Tasks: A Survey", AAAI-99.

M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam and S. Slattery. "Learning to Extract Symbolic Knowledge from the WWW", AAAI-98. (check also: CMU World Wide Knowledge Base (Web->KB) project).

Finkelstein-Landau and Morin, Extracting Semantic Relationships between Terms: Supervised vs. Unsupervised Methods. In Actes, International Workshop on Ontological Engineering on the Global Information Infrastructure, pages 71-80, Dagstuhl Castle, Germany, 1999.

M. Califf and R. Mooney, "Relational Learning of Pattern-Match Rules for Information Extraction", Proceedings of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 1998.

S. Soderland, "Learning Text Analysis Rules for Domain Specific Natural Language Processing", PhD thesis,
University of Massachusetts Amherst, 1997.

Answer extraction

NIST Special Publication 500-249: The Ninth Text REtrieval Conference (TREC 9).
Here you can find papers about all answer extraction systems that participated in TREC-9,
along with an overview paper written by Ellen Voorhees.