Current Trends in Information Extraction

Advanced seminar (Hauptseminar): Computational Linguistics, second stage of studies

Instructor: Günter Neumann
Location: Building C7 1, meeting room U.15
Time: Thursdays, 11:00 - 13:00
Start: 2 November 2006
Suitable for: B.Sc./M.Sc.

Abstract

In this course we will consider new methods and strategies for Information Extraction (IE), with a focus on machine learning and evaluation. We will start with a brief overview of IE. We will then present recent machine learning approaches to named entity recognition, relation extraction, and event extraction, including their use in specific application areas such as bioinformatics. Besides presenting these IE methods, we will also cover major trends in the evaluation of IE systems, e.g., the Automatic Content Extraction (ACE) program and the BioCreative challenge.
Seminar language: English

Certificates
Talk, or talk + paper, plus participation (at least 8 sessions).

Position in the curriculum
As an advanced seminar (Hauptseminar) in the B.Sc.: standard period of study, 6th semester;
As a seminar in the M.Sc. program: standard time slot, 2nd semester. Specialization area: LT

Credit points:
M.Sc.: talk only 4 LP (5 LP); talk + paper 7 LP (9 LP); B.Sc.: talk + paper 7 LP (9 LP)

Schedule (Topics)

| Date | Session | Topic | Speaker(s) | Presentation(s) |
|---|---|---|---|---|
| 02.11.2006 | Initial | Topic discussion | | |
| 09.11.2006 | Session 1 | IE overview | Günter Neumann | IE-overview.pdf |
| 16.11.2006 | No session | No topic | | |
| 23.11.2006 | Session 2 | The Automatic Content Extraction (ACE) Program | Akira Kakinohana | ACE.pdf |
| 30.11.2006 | Session 3 | BioCreative Challenge - Assessment for IE in biology | Jana Besser, Caroline Maginot | BioCreative-1.pdf, BioCreative2.pdf |
| 06.12.2006 | Session 4 | Shallow Parsing for biomedical text (postponed to 15.02.2007) | Philip Kellner, Stefan Kazalski | ClosedClassParser.pdf |
| 14.12.2006 | No session | No topic | | |
| 21.12.2006 | Session 5 | Named Entity recognition for biomedical text | Achmad Yani, Yu Fu | BioNER.pdf, HMMBioNER.pdf |
| 11.01.2007 | Session 6 | Complex Relation Extraction with Applications to Biomedical IE | Michaela Regneri, Niko Felger | FindingClieques.pdf, ExtractComplexRelations.pdf |
| 18.01.2007 | Session 7 | Preemptive IE | Andrea Heyl | OnDemand&PreemptiveIE.pdf |
| 25.01.2007 | Session 8 | Expressing Implicit Semantic Relations without Supervision | Yu Chen, Lu Zhang | SemanticSimilarity.pdf, UnsupervisedRelSim.pdf |
| 01.02.2007 | Session 9 | Paraphrase Extraction | Teresa Herrmann, Anna Mündelein | ParaExtrRTE.pdf, UnsupervisedParaExtr.pdf |
| 08.02.2007 | Session 10 | Machine Learning for Question Answering | Olga Cander, Michael Wirth | AnswerPrediction.pdf |
| 15.02.2007 | Session 11 | Shallow Parsing for biomedical text; final meeting and discussion | Philip Kellner; all | |


Paper Report

For the report, we will use a standard conference style, which is available for LaTeX and MS Word. The ZIP file contains the corresponding versions of the format instructions. Important:

References

Session 1: IE overview

Session 2: The Automatic Content Extraction (ACE) Program

Main links of ACE:
ACE annotation tools:

Session 3: BioCreative Challenge - Assessment for IE in biology

Main link to BioCreative:
Summary of BioCreative 1 challenge:

Session 4: Shallow Parsing for biomedical text

Main papers:
  1. A Shallow Parser Based on Closed-Class Words to Capture Relations in Biomedical Text. Gondy Leroy, Hsinchun Chen, Jesse D. Martinez (2003)
    [PDF]
  2. Extensible Shallow Parsing for Semantic Nets. J. Connell (2001)
    [PDF]
  3. A language-independent shallow-parser compiler. A. Kinyon (2001)
    [PDF]

Session 5: Named Entity recognition for biomedical text

Main links:

Session 6: Complex Relation Extraction with Applications to Biomedical IE

Main paper and additional resources:
  1. Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE. R. McDonald, F. Pereira, S. Kulick, S. Winters, Y. Jin and P. White
    [PDF]
  2. MALLET - Advanced Machine Learning for Language - a Java based ML Tool box
  3. Algorithm 457: finding all cliques of an undirected graph. C. Bron and J. Kerbosch (1973)
    [PDF]

Session 7: Preemptive IE

Main papers:

Session 8: Expressing Implicit Semantic Relations without Supervision

Main papers:
  1. Expressing implicit semantic relations without supervision. Turney, P.D. (2006)
    [PDF]

  2. Similarity of semantic relations. Turney, P.D. (2006)
    [PDF]

Session 9: Paraphrase Extraction

Main papers:

Session 10: Machine Learning for Question Answering

Main papers:

Links

My ESSLLI 2004 course in Information Extraction

ACE - Automatic Content Extraction

BioCreative challenge evaluation

Evaluating Machine Learning for Information Extraction