Current Trends in Information Extraction
Advanced seminar (Hauptseminar): Computational Linguistics, second stage of studies
Instructor: Günter Neumann
Location: Building C7 1, meeting room U.15
Time: Thursdays, 11:00-13:00
Start: 2nd November
Suitable for: B.Sc./M.Sc.
Abstract
In this course, we will consider new methods and strategies for Information Extraction (IE), with a focus on Machine Learning and evaluation. We will start with a brief overview of IE. We will then present recent Machine Learning approaches for named entity recognition, relation extraction, and event extraction, including their use in specific applications such as bioinformatics. Besides these IE methods, we will also present major trends in the evaluation of IE systems, e.g., the Automatic Content Extraction (ACE) program and the BioCreative challenge.
Seminar language: English
Certificates
Talk, or talk + paper, plus participation in at least 8 sessions.
Position in the curriculum
As Hauptseminar in the B.Sc.: standard period of study, 6th semester;
as seminar in the M.Sc. program: standard time slot, 2nd semester.
Specialization area: LT
Credit points (LP):
M.Sc.: talk only 4 LP (5 LP); talk + paper 7 LP (9 LP). B.Sc.: talk + paper 7 LP (9 LP).
Schedule (Topics)
Paper Report
For the written report, we will use a standard conference style, available for LaTeX and MS Word. The ZIP file contains the corresponding versions of the formatting instructions. Important:
- The paper length is restricted to eight (8) pages.
- The submission deadline will be announced.
- Submit your paper electronically to my email address!
References
Session 1: IE overview
Session 2: The Automatic Content Extraction (ACE) Program
Main links for ACE:
ACE annotation tools:
Session 3: BioCreative Challenge - Assessment for IE in biology
Main link to BioCreative:
Summary of BioCreative 1 challenge:
- Overview of BioCreAtIvE: critical assessment of information extraction for biology.
Lynette Hirschman, Alexander Yeh, Christian Blaschke, Alfonso Valencia
BMC Bioinformatics 2005, 6(Suppl 1):S1 (24 May 2005)
[PDF]
- BioCreAtIvE Task 1A: gene mention finding evaluation.
Alexander Yeh, Alexander Morgan, Marc Colosimo, Lynette Hirschman
BMC Bioinformatics 2005, 6(Suppl 1):S2 (24 May 2005)
[PDF]
- Evaluation of BioCreAtIvE assessment of task 2.
Christian Blaschke, Eduardo Andres Leon, Martin Krallinger, Alfonso Valencia
BMC Bioinformatics 2005, 6(Suppl 1):S16 (24 May 2005)
[PDF]
Session 4: Shallow Parsing for biomedical text
Main papers:
- A Shallow Parser Based on Closed-Class Words to Capture Relations in Biomedical Text.
Gondy Leroy, Hsinchun Chen, Jesse D. Martinez (2003)
[PDF]
- Extensible Shallow Parsing for Semantic Nets. J. Connell (2001)
[PDF]
- A language-independent shallow-parser compiler. A. Kinyon (2001)
[PDF]
Session 5: Named Entity recognition for biomedical text
Main links:
Session 6: Complex Relation Extraction with Applications to Biomedical IE
Main paper and additional resources:
- Simple Algorithms for Complex Relation Extraction with Applications to Biomedical IE. R. McDonald, F. Pereira, S. Kulick, S. Winters, Y. Jin and P. White
[PDF]
- MALLET - Advanced Machine Learning for Language: a Java-based ML toolbox
- Algorithm 457: finding all cliques of an undirected graph. C. Bron and J. Kerbosch (1973)
[PDF]
Session 7: Preemptive IE
Main papers:
- Preemptive Information Extraction using Unrestricted Relation Discovery.
Yusuke Shinyama, Satoshi Sekine (2006)
[PDF]
- On-Demand Information Extraction. Satoshi Sekine (2006)
[PDF]
Session 8: Expressing Implicit Semantic Relations without Supervision
Main papers:
- Expressing implicit semantic relations without supervision. P. D. Turney (2006)
[PDF]
- Similarity of semantic relations. P. D. Turney (2006)
[PDF]
Session 9: Paraphrase Extraction
Main papers:
- Unsupervised Paraphrase Acquisition via Relation Discovery. Takaaki Hasegawa (2005)
[PDF]
- Paraphrase Substitution for Recognizing Textual Entailment. Wauter Bosma and Chris Callison-Burch (2006)
[PDF]
Session 10: Machine Learning for Question Answering
Main papers:
- Language Independent Answer Prediction from the Web. A. Figueroa and G. Neumann (2006)
[PDF]
- Instance-Based Question Answering: A Data Driven Approach. L. Lita and J. Carbonell (2004)
[PDF]
Links
My ESSLLI 2004 course in Information Extraction
ACE - Automatic Content Extraction
BioCreative challenge evaluation
Evaluating Machine Learning for Information Extraction