DFKI-LT - A DevOps Manifesto for Speech Corpus Management

Ingmar Steiner
A DevOps Manifesto for Speech Corpus Management
in: Jürgen Trouvain, Ingmar Steiner, Bernd Möbius (eds.):
28th Conference on Electronic Speech Signal Processing (ESSV), Pages 160-166, Saarbrücken, Germany, TUD Press, Dresden, 3/2017
In this paper, we introduce certain concepts from the DevOps philosophy, and more generally from the software development lifecycle. We argue that the separation between source code and how it is built and released for distribution can be applied to speech corpora as well. We draw a distinction between the developers and maintainers of a speech corpus on the one hand, and the researchers who use it on the other. We propose conventions for efficiently managing corpus metadata in the same way as source code, and speech data in the same way as static assets that can be retrieved automatically. Finally, we mention several use cases which illustrate the merits of these conventions.
Files: BibTeX, Steiner_2.pdf