Bibliographic Meta-Data Extraction Using Probabilistic Finite State Transducers

Martin Krämer; Hagen Kaprykowsky; Daniel Keysers; Thomas Breuel

In: Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR-2007), September 23-26, Curitiba, Brazil. Pages 609-613, Vol. 2, IEEE, 9/2007.


We present the application of probabilistic finite state transducers to the task of bibliographic meta-data extraction from scientific references. By using the transducer approach, which is often applied successfully in computational linguistics, we obtain a trainable and modular framework. This results in simplicity, flexibility, and easy adaptability to changing requirements. An evaluation on the Cora dataset, a common benchmark for accuracy measurements, yields a word accuracy of 88.5%, a field accuracy of 82.6%, and an instance accuracy of 42.7%. Based on a comparison to other published results, we conclude that our system ranks second on this dataset while using a conceptually simple approach and implementation.
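
The three accuracy figures measure correctness at different granularities. The abstract does not define them, so the following Python sketch assumes the definitions commonly used for reference-parsing benchmarks: word accuracy is the fraction of tokens with the correct label, field accuracy is the fraction of reference fields (contiguous runs of one label) recovered with exact boundaries, and instance accuracy is the fraction of references labeled without any error. The function names and the toy label sequences are illustrative and are not taken from the paper.

```python
from itertools import groupby


def fields(labels):
    """Split a label sequence into contiguous runs: (label, start, end)."""
    runs, pos = [], 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, pos, pos + length))
        pos += length
    return runs


def accuracies(gold, predicted):
    """Return (word, field, instance) accuracy.

    gold and predicted are lists of per-reference label sequences;
    each predicted sequence must be as long as its gold counterpart.
    """
    words = words_ok = flds = flds_ok = inst_ok = 0
    for g, p in zip(gold, predicted):
        # Word level: one decision per token.
        words += len(g)
        words_ok += sum(a == b for a, b in zip(g, p))
        # Field level: a field counts only if label and both boundaries match.
        predicted_fields = set(fields(p))
        gold_fields = fields(g)
        flds += len(gold_fields)
        flds_ok += sum(f in predicted_fields for f in gold_fields)
        # Instance level: the whole reference must be labeled correctly.
        inst_ok += g == p
    return words_ok / words, flds_ok / flds, inst_ok / len(gold)


# Toy data (hypothetical): the second reference has one mislabeled token.
gold = [["author", "author", "title", "title", "year"],
        ["author", "title", "title"]]
predicted = [["author", "author", "title", "title", "year"],
             ["author", "author", "title"]]
word_acc, field_acc, instance_acc = accuracies(gold, predicted)
```

Under these assumed definitions a single mislabeled token can invalidate two adjacent fields and the entire reference, which is consistent with the instance accuracy reported above being much lower than the word accuracy.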

BibMetaDataExtMkHkDkTmb.pdf (pdf, 153 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence