Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence
ESSLI 2005, Heriot-Watt University, Edinburgh, Scotland 8/2005.
Still, there exist a number of excellent books and tutorials addressing the NLP applications of finite-state machines. However, most of them are closely tied to a specific formalism, such as the XFST toolkit (Karttunen and Beesley 2003) or the INTEX system (Silberztein 1999a). A very good synopsis of finite-state algorithms is provided by Roche and Schabes (1997), but it is limited to the 65 pages of the introductory section. As a result, the relevant knowledge is spread across a vast spectrum of textbooks, journal articles and conference papers: often excellent, but using different notational conventions and requiring different levels of mathematical or computer science background from the reader. All this makes us conclude that there is a need for a textbook explaining the basic finite-state concepts from an NLP-oriented point of view. The primary aim of this book is to convey the basic intuitions behind finite-state machines and show the readers a number of generic use cases for finite-state methods. Starting from a simple pattern recognition task at the beginning of chapter 1, we introduce different variants of finite-state machines and algorithms for manipulating them. Chapter 2 introduces regular expressions as a convenient formal notation equivalent to finite-state machines. A number of generic formalization techniques are explained in chapter 3. Finally, chapter 4 discusses implementation issues. We have attempted to keep the mathematical side of this book as simple as possible. Everyone with an undergraduate-level knowledge of set theory and function notation should be able to understand the mathematical underpinning of the concepts and algorithms presented in our book. No previous knowledge of automata theory is required, although some acquaintance with it will certainly be helpful, as will familiarity with basic NLP and Unix/Perl regular expressions. Of course, it is difficult to cover all finite-state NLP applications in a single book. Conscious of this restriction, we have decided to limit the scope of the present edition, published as a reader for a course at the 2005 European Summer School on Logic, Language and Information (ESSLLI), to finite-state automata and their applications, thus leaving out such important concepts as weighted automata and transducers. We envisage extending our book to these topics at a later time. This book could not have been possible without the support of several people and institutions. We would like to thank Matthew Aylett and Paul Taylor for their valuable comments. Also, we acknowledge the support we received from Rhetorical Systems Ltd in Edinburgh, the German Research Centre for Artificial Intelligence in Saarbr¨ucken and the Gda´nsk University of Technology. The organizing committee of the ESSLLI held in August 2005 at the Heriot-Watt University in Edinburgh provided for a publication of our book as a course reader. Special thanks go to Sue Fitt, who kindly agreed to do the proof-reading. All remaining mistakes are, of course, ours.