On Compact Storage Models for Gazetteers
Jakub Piskorski
In: Proceedings of the 5th International Workshop on Finite-State Methods and Natural Language Processing. International Workshop on Finite-State Methods and Natural Language Processing, Helsinki, Finland, Springer, Lecture Notes in Artificial Intelligence, 9/2005.
This paper describes a compact architecture for storing gazetteers using state-of-the-art finite-state technology. In particular, we compare the standard method, which is based on numbered indexing automata coupled with an auxiliary storage device, against a pure finite-state representation; the latter is superior in terms of space and time complexity when applied to real-world test data. Further, we pinpoint some pros and cons of both approaches and report the results of empirical experiments, which serve as practical guidelines for selecting a suitable data structure for implementing a gazetteer.
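To make the first of the two compared approaches concrete, the following is a minimal sketch of a numbered indexing structure paired with an auxiliary table. It is an illustration only, not the implementation from the paper: a plain trie stands in for a minimized automaton, and the names `Node`, `build`, `index_of`, and the sample entries and annotations are all hypothetical. Each node stores how many entries are accepted below it, so a lookup can compute an entry's rank in sorted order and use that rank as an index into the separately stored annotations.

```python
class Node:
    """Trie node; a stand-in for a state of a numbered automaton."""
    def __init__(self):
        self.children = {}   # character -> Node
        self.final = False   # True if an entry ends at this node
        self.count = 0       # number of entries accepted from this node


def _number(node):
    # Annotate every node with the size of its right language.
    node.count = (1 if node.final else 0) + sum(
        _number(child) for child in node.children.values()
    )
    return node.count


def build(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True
    _number(root)
    return root


def index_of(root, word):
    """Return the 0-based rank of `word` among all stored entries
    (in sorted order), or None if it is not stored."""
    node, idx = root, 0
    for ch in word:
        if ch not in node.children:
            return None
        if node.final:
            idx += 1  # a shorter entry ending here precedes `word`
        for label, child in node.children.items():
            if label < ch:
                idx += child.count  # skip entries in smaller branches
        node = node.children[ch]
    return idx if node.final else None


# Hypothetical gazetteer: the auxiliary table is ordered like sorted(entries).
entries = ["Berlin", "Bern", "Helsinki", "Paris"]
annotations = ["city:DE", "city:CH", "city:FI", "city:FR"]
root = build(entries)

i = index_of(root, "Helsinki")
print(i, annotations[i])
```

In a pure finite-state representation, by contrast, the annotations would be encoded in the automaton itself rather than in a separate table, so a lookup needs no second access to auxiliary storage.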