ESSLLI 2004 - The 16th European Summer School in
Logic, Language and Information

Homepage of the Course on

Intelligent Information Extraction

Günter Neumann and Feiyu Xu

Language Technology Lab
DFKI, Saarbrücken

Course Description

We will present the state of the art in intelligent information extraction (IE). The course is divided into four major topics: introduction, core technologies, machine learning (ML) methods, and applications. We start with a historical overview and explain the different IE tasks and evaluation methods (e.g., template filling, domain ontologies). We then summarize core IE functionality by contrasting rule-based and corpus-based system design; this also covers advanced NLP aspects such as the integration of shallow and deep processing. Next, participants will be introduced to the major IE challenges with respect to domain adaptivity (e.g., portability) and multilinguality. Building on this, we focus on advanced ML methods for the different IE tasks along several dimensions (supervised, unsupervised, multilingual). Finally, we present applications that embed IE as a major component, viz. open-domain question answering, text summarization, text data mining, and Semantic Web services.

Course Overview

Part 1:   Introduction
Part 2:   Core Functionality
Part 3:   Machine Learning Approaches
               Subpart 3.1: Machine Learning for Named Entity Recognition
               Subpart 3.2: Machine Learning for Template Filling
Part 4:   Advanced Topics and Applications
               Subpart 4.1: Cross-lingual IR and Semantics for IE
               Subpart 4.2: Question Answering and the Semantic Web

Course Material

  1. Introduction
    1. D. Appelt and D. Israel. Introduction to Information Extraction Technology. IJCAI-99 Tutorial.
    2. R. Grishman and B. Sundheim. Message Understanding Conference-6: A Brief History. In Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen, June 1996.
    3. D. Harman and N. Chinchor. Introduction to Information Extraction. NIST homepage, last update 15-Mar-01.

  2. Core Technology (slides)
    1. FASTUS
    2. GATE
    3. Information Extraction Technologies for German Texts
    4. SPROUT
    5. Combining Shallow and Deep NLP for Information Extraction
  3. Machine Learning for Named Entity Recognition
    1. D. Bikel, S. Miller, R. Schwartz, and R. Weischedel. Nymble: A High-Performance Learning Name-Finder. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 194-201, Washington, D.C., 1997.
    2. A. Borthwick, J. Sterling, E. Agichtein, and R. Grishman. NYU: Description of the MENE Named Entity System as Used in MUC-7. In Proceedings of the Seventh Message Understanding Conference (MUC-7), 1998.
    3. M. Collins and Y. Singer. Unsupervised Models for Named Entity Classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999.
    4. R. Yangarber, W. Lin, and R. Grishman. Unsupervised Learning of Generalized Names. In Proceedings of the 19th International Conference on Computational Linguistics (COLING-2002), Taipei, Taiwan, 2002.
  4. Automatic Acquisition of Rules for Template Filling
    1. N. Kushmerick. Wrapper Induction: Efficiency and Expressiveness. Artificial Intelligence, 2000.
    2. I. Muslea. Extraction Patterns for Information Extraction. In AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
    3. E. Riloff and R. Jones. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pages 474-479, 1999.
    4. R. Yangarber, R. Grishman, P. Tapanainen, and S. Huttunen. Automatic Acquisition of Domain Knowledge for Information Extraction. In Proceedings of the 18th International Conference on Computational Linguistics (COLING-2000), Saarbrücken, 2000.
  5. Advanced Topics, e.g.,
    1. Language Technology and the Semantic Web
    2. Ontology-driven Information Extraction


Last modified: 15 June 2004 by GN