Information Extraction in the Biomedical Domain

Hauptseminar
Computational Linguistics
Advanced Bachelor Programme / Master Programme
Summer Semester 2008


News

The submission deadline for the written report is October 1st, 2008, 23:59. You may submit either an electronic or a printed copy of your report.

General Information

Moderator: Günter Neumann
Tutor: Alejandro Pironti

Abstract:

Biomedical information extraction (IE) uses terminologies, statistics and natural language processing to transform unstructured text into a tabular form that can be processed further. The seminar begins with an introduction that refreshes essential biological knowledge and gives an overview of IE. It then reviews the IT resources available to support IE in the biomedical domain (terminologies derived from databases and ontologies, and corpora), as well as different strategies for performing the individual steps that lead to IE. Finally, we will discuss state-of-the-art IE solutions and how they are evaluated.

Seminar Language: English

Available Certificate Modalities:

Placement in Study Programme:

Schedule

Session | Date | Topic | Speaker
1 | 21.04.2008 | Organisational meeting | -
2 | 28.04.2008 | Biological Foundations | Alejandro Pironti
3 | 05.05.2008 | Biological Foundations / IT Resources | Alejandro Pironti
4 | 19.05.2008 | IT Resources / IE Overview | Alejandro Pironti / Günter Neumann
5 | 26.05.2008 | Gene Name Identification / Mentioning at BioCreative Challenge 2 | Qian Sai
6 | 09.06.2008 | Gene Name Normalization at BioCreative Challenge 2 | Stefan Fischer
7 | 16.06.2008 | Relationship Extraction: PPI-IPS at BioCreative Challenge 2 | Danielle Ben-Gera
8 | Cancelled | Text Classification at BioCreative Challenge 2: IAS and TextTiling | Max Jakob
9 | 07.07.2008 | Question Answering | Joo-Eun Feit
10 | 14.07.2008 | Literature-based discovery | Stefan Kazalski

Please click on a session number to jump to the corresponding references. Where available, the presentation topics are linked to the slides.

References

Session 5: Gene Name Identification / Mentioning at BioCreative Challenge 2

R. K. Ando (2007), BioCreative II Gene Mention Tagging System at IBM Watson. Proceedings of the Second BioCreative Challenge Evaluation Workshop.

Y.M. Chang, C.J. Kuo, H.S. Huang, Y.S. Lin, and C.N. Hsu (2007), Analysis and Enhancement of Conditional Random Fields Gene Mention Taggers in BioCreative II Challenge Evaluation. Short Paper Proceedings of the Second International Symposium on Languages in Biology and Medicine.

Further mandatory references are available in print form; please contact the seminar organisers.

Session 6: Gene Name Normalization at BioCreative Challenge 2

J. Hakenberg, L. Royer, C. Plake, H. Strobelt, and M. Schroeder (2007), Me and my friends: gene mention normalization with background knowledge. Proceedings of the Second BioCreative Challenge Evaluation Workshop, 141-144.

D. Hanisch, K. Fundel, H.T. Mevissen, R. Zimmer, and J. Fluck (2005), ProMiner: rule-based protein and gene entity recognition. BMC Bioinformatics, 6(Suppl 1): S14.

Further mandatory references are available in print form; please contact the seminar organisers.

Session 7: Relationship Extraction: PPI-IPS at BioCreative Challenge 2

R. Sætre, K. Yoshida, A. Yakushiji, Y. Miyao, Y. Matsubayashi, and T. Ohta, AKANE System: Protein-Protein Interaction Pairs in the BioCreAtIvE2 Challenge, PPI-IPS subtask.

T. Ninomiya, Y. Tsuruoka, Y. Miyao, K. Taura, and J. Tsujii (2005), Fast and Scalable HPSG Parsing.

Further mandatory references are available in print form; please contact the seminar organisers.

Session 8: Text Classification at BioCreative Challenge 2: IAS and TextTiling

M. Lan, C. L. Tan, and J. Su, A Term Investigation and Majority Voting for Protein Interaction Article Sub-task 1 (IAS). Proceedings of the Second BioCreative Challenge Evaluation Workshop, 183-185.

R. T.-H. Tsai, H.C. Hung, H.-J. Dai, Y.W. Lin, and W.-L. Hsu (2008), Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles. BMC Bioinformatics, 9(Suppl 1): S3.

R. T.-H. Tsai, H.C. Hung, H.-J. Dai, Y.W. Lin, and W.-L. Hsu (2007), Protein-protein interaction abstract identification with contextual bag of words. Short Paper Proceedings of the Second International Symposium on Languages in Biology and Medicine.

A. Figueroa and G. Neumann (2007), Identifying Protein-Protein Interactions in Biomedical Publications. Proceedings of the Second BioCreative Challenge Evaluation Workshop, 217-225.

TextTiling

Further mandatory references are available in print form; please contact the seminar organisers.

Session 9: Question Answering

P. Zweigenbaum (2003), Question answering in biomedicine. Proceedings of Natural Language Processing for Question Answering.

J. Lin and D. Demner-Fushman (2007), Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Computational Linguistics, 33(1):63-103.

H. Yu and D. Kaufman (2007), A cognitive evaluation of four online search engines for answering definitional questions posed by physicians. Pacific Symposium on Biocomputing 12: 328-339.

Session 10: Literature-Based Discovery

Arrowsmith 3.0

BITOLA

LitLinker

Written Report

Students enrolled in the Master's programme can choose to submit a written report (see available certificate modalities). For B.Sc. students, submitting a report is mandatory. The written report is limited to eight pages, excluding the bibliography, and should follow the style of conference proceedings; please use the linked conference-style template (available for LaTeX and MS Word). The submission deadline is October 1st, 2008, 23:59.

We expect you to digest the material related to your topic and to carry out further research of your own. In your report, you should add value to the available information by comparing approaches, criticizing them, and highlighting their strengths. We want to encourage you to think critically and develop your own opinion; copy-pasting will not be accepted. If you have questions about the written report, we will be happy to help.

You can submit your report in electronic or printed form. Electronic copies should be sent by e-mail to the following addresses: neumann@dfki.de and s9alpiro@stud.uni-saarland.de.

Links