Language Technology Lab
About me
I am a researcher and IT professional with broad, long-term experience in research, software development and consulting for industry and the European institutions. My current research focus lies in natural language processing: text mining, multilingual information extraction, NLP for scientific publication analysis, NLP software architectures, question answering, semantic search in digital libraries, knowledge representation, knowledge management, semantic web, deep and hybrid parsing. I like Java, Python, XML, Linux and open source technologies.
News
Research Projects
- 2009 - 2011: TAKE - Technologies for Advanced Knowledge Extraction for Aggregating Semantic Models of Human Life and Thought
- 2008 - : DFG Cluster of Excellence Multimodal Computing and Interaction (M2CI) - Robust, Efficient and Intelligent Processing of Text, Speech, Visual Data and High Dimensional Representations - Open Science Web
- 2007 - 2008: LangGrid (NICT Grant) - Integrating Language Grid and Heart of Gold as Hybrid Language Processing Service
- 2006 - 2008: HyLaP - Hybrid language processing technologies for a personal associative information access and management application
- 2003 - 2005: QUETAL - Multilingual Hybrid Question Answering
- 2002 - 2004: DeepThought - Hybrid Deep and Shallow Methods for Knowledge-intensive Information Extraction
- 2000 - 2002: Whiteboard - Multilevel Annotation for Dynamic Free Text Processing
Systems
- ACL Anthology Searchbench - Semantic, Fulltext and Bibliographic Search in more than 25,000 papers of the ACL Anthology (try the system here!)
The Searchbench is a web application combining sentence-semantic search (statements search) with fulltext, terminology and bibliographic search in scholarly papers. The system works domain-independently: even domain terminology ('topics') is extracted fully automatically from collections and individual papers.
The statements search helps users find information more precisely. Users can specify the subject, predicate and object of sentences expected to be found in the papers, even with only parts of the sentence structure given. The Searchbench can find similar statements, abstract from passive voice, and optionally exclude negated statements or synonym predicates. Search results are displayed in the original PDF and may be emailed or bookmarked.
The semantic statements search can also be used as an open online domain term glossary: sample query for dependency parsing.
A graphical citation browser helps users explore related work by displaying citation sentence information in the graph. Clicking on nodes or edges shows the original citation context in the PDF.
Altogether, the system is meant to be used as a workbench for search, hence the name Searchbench.
- Graphical Citation Browser for the ACL Anthology. Try the system here (sample link)!
- Scientist's Workbench - Semantic Search and Typed Citation-based Visual Navigation in Scientific Papers
- SProUT - Shallow Processing with Unification and Typed Feature Structures
- Heart of Gold - Middleware for Combining Deep and Shallow Natural Language Processing Components
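The statements search described for the Searchbench above can be illustrated with a minimal sketch. It is not the Searchbench implementation, which works on the output of deep parsing; the `Statement` class, the `search` function, the substring matching and the sample data are all hypothetical and only show the idea of querying by subject, predicate and object with parts left unspecified, and of optionally excluding negated statements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """Hypothetical record for one extracted sentence-level statement."""
    subject: str
    predicate: str
    obj: str
    negated: bool = False


def matches(field: str, wanted: Optional[str]) -> bool:
    # None means the user left this part of the statement unspecified.
    return wanted is None or wanted.lower() in field.lower()


def search(statements: List[Statement],
           subject: Optional[str] = None,
           predicate: Optional[str] = None,
           obj: Optional[str] = None,
           exclude_negated: bool = False) -> List[Statement]:
    """Return statements matching every part of the query that was given."""
    return [
        s for s in statements
        if matches(s.subject, subject)
        and matches(s.predicate, predicate)
        and matches(s.obj, obj)
        and not (exclude_negated and s.negated)
    ]


# Invented sample data, for illustration only.
STATEMENTS = [
    Statement("dependency parsing", "improve", "translation quality"),
    Statement("the parser", "use", "a maximum entropy model"),
    Statement("the parser", "use", "hand-written rules", negated=True),
]

# Only subject and predicate are given; the object is left open.
hits = search(STATEMENTS, subject="parser", predicate="use", exclude_negated=True)
for hit in hits:
    print(hit.subject, "|", hit.predicate, "|", hit.obj)
```

A real system would match on normalized semantic structures rather than substrings, which is what allows abstraction from passive voice and matching of synonym predicates.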
Publications
mostly openly accessible
[BibTeX file of all publications]
[DFKI's] [LT Lab's list of publications] [DBLP]
[ACM] [CSB] [CiteseerX] [ACL Anthology Searchbench] [Google Scholar Citations] [DBLP Visualization]
2013
- M. Minev, C. Schommer, U. Schäfer, and T. Grammatikos. Dimensionality Reduction of News Texts Using Composite Features. In Bridging between Information Retrieval and Databases, PROMISE WS, Bressanone, Italy, February 2013. (Best Poster Award)
2012
- Ulrich Schäfer, Bernd Kiefer, Christian Spurk, Jörg Steffen, Rui Wang, Benjamin Weitz, Magdalena Wolska: The Searchbench - Combining Sentence-semantic, Full-text and Bibliographic Search in Digital Libraries, LIBER quarterly Journal, Vol. 22, no. 4 (2012) 285-309, ISSN: 1435-5205, e-ISSN: 2213-056X.
- Melanie Reiplinger, Ulrich Schäfer, Magdalena Wolska: Extracting Glossary Sentences from Scholarly Articles: A Comparative Evaluation of Pattern Bootstrapping and Deep Analysis.
Proceedings of the ACL-2012 Main Conference Workshop on Rediscovering 50 Years of Discoveries, pages 55-65. Jeju Island, Republic of Korea, 2012. bibtex.
- Ulrich Schäfer, Jonathon Read, Stephan Oepen: Towards an ACL Anthology Corpus with Logical Document Structure. An Overview of the ACL 2012 Contributed Task.
Proceedings of the ACL-2012 Main Conference Workshop on Rediscovering 50 Years of Discoveries, pages 88-97. Jeju Island, Republic of Korea, 2012. bibtex. ACL Anthology Corpus.
- Ulrich Schäfer, Benjamin Weitz: Combining OCR Outputs for Logical Document Structure Markup. Technical Background to the ACL 2012 Contributed Task.
Proceedings of the ACL-2012 Main Conference Workshop on Rediscovering 50 Years of Discoveries, pages 104-109. Jeju Island, Republic of Korea, 2012. bibtex.
- Benjamin Weitz, Ulrich Schäfer: A Graphical Citation Browser for the ACL Anthology. Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012), pages 1718-1722, ISBN 978-2-9517408-7-7, ELRA, Istanbul, Turkey, 2012. bibtex. Try the system here (sample link)!
- Ulrich Schäfer: Satzsemantische Suche - präziser Finden mit der TAKE Searchbench.
DOK.magazin - Technologien, Strategien & Services für das digitale Dokument, volume 2/2012, pages 28-31, ISSN 1864-8398. Dasing, Germany, 2012.
- Ulrich Schäfer, Magdalena Wolska: Automatische Terminologie-, Taxonomie- und Glossarextraktion.
DOK.magazin - Technologien, Strategien & Services für das digitale Dokument, volume 6/2012, pages 62-65, ISSN 1864-8398. Dasing, Germany, 2012.
- Ulrich Schäfer, Christian Spurk, Jörg Steffen: A fully Coreference-annotated Corpus of Scholarly Papers from the ACL Anthology. Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1059-1070, Mumbai, India, 2012. bibtex. The annotated data is available at take.dfki.de.
2011
- Ulrich Schäfer, Bernd Kiefer, Christian Spurk, Jörg Steffen, Rui Wang: The ACL Anthology Searchbench. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011), System Demonstrations, pages 7-13, 2011. ISBN 978-1-932432-90-9. Portland, OR, USA. bibtex. Try the system here.
- Ulrich Schäfer, Bernd Kiefer: Advances in Deep Parsing of Scholarly Paper Content. Book Chapter in: Raffaella Bernardi, Sally Chambers, Björn Gottfried, Frédérique Segond, Ilya Zaihrayeu (eds.): Advanced Language Technologies for Digital Libraries.
Springer LNCS Theoretical Computer Science Series, LNCS 6699, ISBN 978-3-642-23159-9, pages 135-153, 2011.
- Arif Bramantoro, Ulrich Schäfer, Toru Ishida: Pipelining Software and Services for Language Processing. Book Chapter 16 in: Toru Ishida and Donghui Lin (eds.): The Language Grid: Service-Oriented Collective Intelligence for Language Resource Interoperability. ISBN 978-3-642-21177-5, Springer LNCS Cognitive Technologies Series, pages 247-262, 2011.
- Peter Adolphs, Martin Theobald, Ulrich Schäfer, Hans Uszkoreit, Gerhard Weikum: YAGO-QA: Answering Questions by Structured Knowledge Queries. Proceedings of the Fifth IEEE International Conference on Semantic Computing (ICSC-2011), pages 158-161, IEEE Computer Society, ISBN 978-0-7695-4492-2, 2011. Los Alamitos, CA, USA.
- Cailing Dong, Ulrich Schäfer: Ensemble-style Self-training on Citation Classification. Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP2011), pages 623-631, 2011. ISBN 978-974-466-564-5. Chiang Mai, Thailand. bibtex.
- Magdalena Wolska, Ulrich Schäfer, The Nghia Pham: Bootstrapping a Domain-specific Terminological Taxonomy from Scientific Text.
9th International Conference on Terminology and Artificial Intelligence (TIA), pages 17-23, Paris, France, 2011.
2010
- Ulrich Schäfer, Christian Spurk: TAKE Scientist's Workbench: Semantic Search and Citation-based Visual Navigation in Scholar Papers. Proceedings of the Fourth IEEE International Conference on Semantic Computing (ICSC-2010), pages 317-324, ISBN 978-0-7695-4154-9, September 2010, IEEE Computer Society, Los Alamitos, CA, USA.
- Ulrich Schäfer, Uwe Kasterka: Scientific Authoring Support: A Tool to Navigate in Typed Citation Graphs. Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2010) Workshop on Computational Linguistics and Writing: Writing Processes and Authoring Aids (CL&W-2010), pages 7-14, June 2010, Los Angeles, CA. bibtex.
- Hans-Ulrich Krieger, Ulrich Schäfer: DL Meet FL: A Bidirectional Mapping between Ontologies and Linguistic Knowledge. Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 588-596, August 2010, Beijing, China. bibtex.
- Arif Bramantoro, Ulrich Schäfer, Toru Ishida: Towards an Integrated Architecture for Composite Language Services
and Multiple Linguistic Processing Components. Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC-2010), pages 3506-3511, ISBN 2-9517408-6-7, May 2010, Valletta, Malta. bibtex.
2008
- Ulrich Schäfer, Hans Uszkoreit, Christian Federmann, Torsten Marek, Yajing Zhang: Extracting and Querying Relations in Scientific Papers. In Proceedings of the 31st Annual German Conference on Artificial Intelligence, KI 2008, Springer LNCS 5243, pages 127-134, September 2008, Kaiserslautern, Germany.
- Ulrich Schäfer: Shallow, Deep and Hybrid Processing with UIMA and Heart of Gold. In Proceedings of the LREC-2008 Workshop Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP, 6th International Conference on Language Resources and Evaluation (LREC-2008), pages 43-50. May 31, 2008, Marrakesh, Morocco.
- Ulrich Schäfer, Hans Uszkoreit, Christian Federmann, Torsten Marek, Yajing Zhang: Extracting and Querying Relations in Scientific Papers on Language Technology. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC-2008), pages 3040-3046, May 2008, Marrakesh, Morocco. bibtex.
- Ulrich Schäfer: Integrating
Natural Language Processing Components with XML and XSLT. VDM Verlag
Dr. Müller, ISBN 978-3-836490-27-6, April 2008. Saarbrücken, Germany.
- Arif Bramantoro, Masahiro Tanaka, Yohei Murakami, Ulrich Schäfer, Toru Ishida: A Hybrid Integrated Architecture for Language Service Composition. In Proceedings of the IEEE International Conference on Web Services (ICWS-2008), pages 345-352, ISBN 978-0-7695-3310-0. September 2008, Beijing, China.
2007
- Ulrich Schäfer: Integrating Deep and Shallow Natural Language Processing
Components - Representations and Hybrid Architectures. PhD
dissertation. Faculty of Mathematics and Computer Science, Saarland University,
Saarbrücken, Germany. June 2007. Also as Volume 22 of the Saarbrücken
dissertation series in Computational Linguistics and Language Technology, 356
pages, ISBN 978-3-933218-21-6. Saarbrücken, Germany, 2007.
- Anette Frank, Hans-Ulrich Krieger, Feiyu Xu, Hans Uszkoreit, Berthold
Crysmann, Brigitte Jörg, Ulrich Schäfer: Question Answering from Structured
Knowledge Sources. In: Journal of Applied Logic. Vol. 5, No. 1. Elsevier Science, pages
20-48. doi:10.1016/j.jal.2005.12.006, 2007.
2006
- Ulrich Schäfer, Daniel Beck: Automatic Testing and Evaluation of
Multilingual Language Technology Resources and Components. Proceedings of
the 5th International Conference on Language Resources and Evaluation
LREC-2006, pages 173-178, May 2006, Genoa, Italy. bibtex.
- Ulrich Schäfer: Middleware for Creating and Combining Multi-dimensional
NLP Markup. Proceedings of the 5th Workshop
on NLP and XML (NLPXML-2006): Multi-dimensional
Markup in Natural Language Processing, 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2006), pages 81-84. April 2006, Trento,
Italy. bibtex.
- Ulrich Schäfer: OntoNERdIE - Mapping and Linking Ontologies to Named
Entity Recognition and Information Extraction Resources. Proceedings of the
5th International Conference on Language Resources and Evaluation LREC-2006,
pages 1756-1761. May 2006, Genoa, Italy. bibtex.
- Ben Waldron, Ann Copestake, Ulrich Schäfer, Bernd Kiefer: Preprocessing
and Tokenisation Standards in DELPH-IN Tools. Proceedings of the 5th
International Conference on Language Resources and Evaluation LREC-2006, pages
2263-2268. May 2006, Genoa, Italy. bibtex.
- Christian Bering, Ulrich Schäfer: JTaCo & SProUTomat - Automatic
Evaluation and Testing of Multilingual Language Technology Resources and
Components. In Proceedings of the LREC-2006 Workshop on Quality Assurance
and Quality Measurement for Language and Speech Resources, pages 42-47, May
2006, Genoa, Italy.
2005
- Witold Drozdzynski, Hans-Ulrich Krieger, Jakub Piskorski, Ulrich Schäfer:
SProUT - a General-Purpose NLP Framework Integrating Finite-State and
Unification-based Grammar Formalisms. In Proceedings of the 5th
International Workshop on Finite-State Methods and Natural Language Processing,
Springer LNCS 4002. September 2005, Helsinki,
Finland.
- Anette Frank, Hans-Ulrich Krieger, Feiyu Xu, Hans Uszkoreit, Berthold Crysmann,
Brigitte Jörg, Ulrich Schäfer: Querying Structured Knowledge Sources. Proceedings of the AAAI-05 Workshop on Question Answering in Restricted
Domains, pages 10-19, July 2005, Pittsburgh, Pennsylvania.
2004
- Witold Drozdzynski, Hans-Ulrich Krieger, Jakub Piskorski, Ulrich Schäfer: A
Multilingual Content Production Tool for the Semantic Web. European
Conference on Knowledge Engineering and Knowledge Management (EKAW-04), October
2004, Northamptonshire, UK.
- Hans-Ulrich Krieger, Witold Drozdzynski, Jakub Piskorski, Ulrich Schäfer,
Feiyu Xu: A Bag of Useful Techniques for Unification-Based Finite-State
Transducers. In Proceedings of 7th KONVENS, pages 105-112. September 2004,
Vienna, Austria.
- Hans Uszkoreit, Ulrich Callmeier, Andreas Eisele, Ulrich Schäfer, Melanie
Siegel, Jakob Uszkoreit: Hybrid Robust Deep and Shallow Semantic Processing
for Creativity Support in Document Production. In Proceedings of 7th
KONVENS, pages 209-216. September 2004, Vienna, Austria.
- Ulrich Schäfer: Using XSLT for the Integration of Deep and Shallow
Natural Language Processing Components. In: Kiril Simov, Erhard Hinrichs:
Proceedings of the ESSLLI 2004 Workshop on Combining Shallow and Deep Processing
for NLP, pages 31-40. August 2004, Nancy, France.
- Anette Frank, Kathrin Spreyer, Witold Drozdzynski, Hans-Ulrich Krieger,
Ulrich Schäfer: Constraint-Based RMRS Construction from Shallow
Grammars. Proceedings of the HPSG04 Conference Workshop on Semantics in
Grammar Engineering, Center for Computational Linguistics, Katholieke
Universiteit Leuven, Belgium, pages 393-413. August 2004. CSLI Publications,
Stanford, CA.
- Ulrich Callmeier, Andreas Eisele, Ulrich Schäfer, Melanie Siegel: The
DeepThought Core Architecture Framework. In Proceedings of the 4th
International Conference on Language Resources and Evaluation (LREC) 2004,
pages 1205-1208, European Language Resources Association, May 2004. Lisbon,
Portugal. bibtex.
- Kiyong Lee, Lou Burnard, Laurent Romary, Eric de la Clergerie, Ulrich
Schäfer, Thierry Declerck, Syd Bauman, Harry Bunt, Lionel Clément, Tomaz
Erjavec, Azim Roussanaly, Claude Roux: Towards an international standard on
feature structure representation (2). In Proceedings of the LREC-2004
workshop on A Registry of Linguistic Data Categories within an Integrated
Language Resources Repository Area, pages 63-70, May 2004. Lisbon,
Portugal.
- Witold Drozdzynski, Hans-Ulrich Krieger, Jakub Piskorski, Ulrich Schäfer,
Feiyu Xu: Shallow Processing with Unification and Typed Feature Structures -
Foundations and Applications. In: Künstliche Intelligenz, Volume 1,
pages 17-23. 2004.
2003
- Anette Frank, Markus Becker, Berthold Crysmann, Bernd Kiefer, Ulrich
Schäfer: Integrated Shallow and Deep Parsing: TopP meets
HPSG. Proceedings of ACL-2003, pages 104-111, July 2003, Sapporo,
Japan. bibtex.
- Stephan Busemann, Witold Drozdzynski, Hans-Ulrich Krieger, Jakub Piskorski,
Ulrich Schäfer, Hans Uszkoreit, Feiyu Xu: Integrating Information Extraction
and Automatic Hyperlinking. Proceedings of the Interactive
Posters/Demonstration at ACL-2003, pages 117-120. July 2003, Sapporo,
Japan. bibtex.
- Ulrich Schäfer: WHAT: An XSLT-based Infrastructure for the Integration
of Natural Language Processing Components. Proceedings of the Workshop on
the Software Engineering and Architecture of Language Technology Systems
(SEALTS), HLT-NAACL03, pages 9-16. May 2003. Edmonton, Canada. bibtex.
- Christian Bering, Witold Drozdzynski, Gregor Erbach, Clara Guasch, Petr
Homola, Sabine Lehmann, Hong Li, Hans-Ulrich Krieger, Jakub Piskorski, Ulrich
Schäfer, Atsuko Shimada, Melanie Siegel, Feiyu Xu, Dorothee Ziegler-Eisele:
Corpora and evaluation tools for multilingual named entity grammar
development. Proceedings of Multilingual Corpora Workshop at Corpus
Linguistics 2003, pages 42-52, March 2003, Lancaster, UK.
2002 and earlier
- Markus Becker, Witold Drozdzynski, Hans-Ulrich Krieger, Jakub Piskorski,
Ulrich Schäfer, Feiyu Xu: SProUT - Shallow Processing with Typed Feature
Structures and Unification. Proceedings of the International Conference on
Natural Language Processing (ICON 2002). December 2002, Mumbai, India.
- Berthold Crysmann, Anette Frank, Bernd Kiefer, Hans-Ulrich Krieger, Stefan
Müller, Günter Neumann, Jakub Piskorski, Ulrich Schäfer, Melanie Siegel, Hans
Uszkoreit, Feiyu Xu: An Integrated Architecture for Shallow and Deep
Processing. Proceedings of ACL-2002, Association for Computational
Linguistics 40th Anniversary Meeting, pages 441-448. July 2002, Philadelphia,
PA. bibtex.
- Günter Neumann, Ulrich Schäfer: Whiteboard - Eine XML-basierte
Architektur für die Analyse natürlichsprachlicher Texte. Proceedings of
Online-2002, 25th European Congress Fair for Technical Communication, volume C,
pages 635.01-635.12, January 2002, Düsseldorf, Germany.
- Ulrich Schäfer: Parameterized Type Expansion in the Feature Structure
Formalism TDL. Diplomarbeit, Fachbereich Informatik, Universität des
Saarlandes, November 1995. Saarbrücken, Germany.
- Hans-Ulrich Krieger, Ulrich Schäfer: Efficient Parameterizable Type
Expansion for Typed Feature Formalisms. In Proceedings of the 14th
International Joint Conference on Artificial Intelligence (IJCAI), volume 2,
pages 1428-1434. August 1995. Montreal, Canada.
- Hans-Ulrich Krieger, Ulrich Schäfer: TDL - A Type Description Language
for HPSG, Part 2: User Manual. DFKI Document No. D-94-14. December
1994. Saarbrücken, Germany.
- Hans-Ulrich Krieger, Ulrich Schäfer: TDL - A Type Description Language
for HPSG, Part 1: Overview. DFKI Research Report No. RR-94-37. November
1994. Saarbrücken, Germany.
- Hans-Ulrich Krieger, Ulrich Schäfer: TDL - A Type Description Language
for Constraint-Based Grammars. In Proceedings of the 15th International
Conference on Computational Linguistics, COLING-94, pages 893-899, August
1994. Kyoto, Japan. bibtex.
- Hans-Ulrich Krieger, Ulrich Schäfer: TDL ExtraLight User's
Guide. DFKI Doc. No. D-93-09. December 1993. Saarbrücken, Germany.
- Hans-Ulrich Krieger, Ulrich Schäfer: TDL - A Type Description Language
for Unification-Based Grammars. In Proceedings Neuere Entwicklungen der
deklarativen KI-Programmierung, KI-93 Workshop, pages
67-82. Humboldt-Universität, September 1993. Berlin, Germany.
Collaborations
- DELPH-IN (Deep Linguistic Processing with HPSG) -
Research sites world-wide have joined forces in a collaborative effort aimed at deep linguistic processing of human language. The goal is the combination of linguistic and statistical processing methods for getting at the meaning of texts and utterances.
- DELPH-IN wiki
Lectures & Seminars
Courses
Program Committees
- ACL 2012 Main Conference Workshop Rediscovering 50 Years of Discoveries and Contributed Task, 2012, Jeju, Korea
- Workshop on Partial Parsing: Between Chunking and Deep Parsing (proceedings) at LREC 2008, Jun 1, 2008, Marrakech, Morocco
- PACLING-2007 - Conference of the Pacific Association for Computational Linguistics (proceedings), Sep 19-21, 2007, Melbourne, Australia
Supervised Theses
- Øyvind Raddum Berg: High precision text extraction from PDF documents. 1-year Master's Thesis, University of Oslo, Department of Informatics, 2011. (my rôle: external reviewer)
- Pham The Nghia: NLP-based Extraction of Ontology Information from Scientific Papers on Language Technology. Master's Thesis, Faculty of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, 2011.
- Benjamin Weitz: Ein grafischer Zitationsbrowser für die ACL Anthology. Bachelor Thesis, Faculty of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, 2011.
- Melanie Reiplinger: Automatic Glossary Extraction from Scientific Papers. Bachelor Thesis, Faculty of Computational Linguistics and Phonetics, Saarland University, Saarbrücken, 2012.
- Uwe Kasterka: Ein generisches Framework für die Visualisierung von und Navigation in getypten Zitationsgraphen. Bachelor Thesis, Faculty of Mathematics and Computer Science, Saarland University, Saarbrücken, 2010.
- Daniel Beck: Hierarchical Classification Using NLP Techniques. Master's Thesis, Faculty of Mathematics and Computer Science, Saarland University, Saarbrücken, September 2007.
- Daniel Beck: OnTarget - A Framework for Editing Text Assisted by Visualization and Navigation of Ontology Data. Bachelor Thesis, Faculty of Mathematics and Computer Science, Saarland University, Saarbrücken, 2006.
Misc
- SProUTomat - a tool for automatically compiling, testing and evaluating SProUT software and lingware on a daily basis (anno 2005).
- FS2LaTeX - a tool for typesetting SProUT grammars, typed feature structures, RMRS structures (anno 2004).
- TDL mode for Emacs (anno 1993).
last modification 2013-02-21