Linguistic Data Science deals with the empirical aspects of linguistics, e.g., corpus annotation or the statistical evaluation of speech and language data with the help of linguistic methods.
An important topic in this research area, and a focus of the Speech and Language Technology Lab, is the question of how to characterize the quality of text.
While text quality assessment has a long tradition in linguistics, the topic has still not entered the mainstream of language technology research. One reason might be that existing methods typically rely on human assessment, which is difficult to operationalize in technology development cycles. Another challenge is that text quality (like many other notions of quality) is not an intrinsic, once-and-for-all property of the text at hand; rather, it emerges from the text being used by humans in a particular context, e.g., a communication goal, a situation of perception, or a focus of investigation.
Yet the growing maturity of text-generating AI systems, together with the demand for successful human-machine communication in many areas of life, calls for making text quality assessment, understood in this broader sense, a first-class citizen in NLP research.
Our own research in this area addresses all types of machine-generated text, as well as human-generated text processed by machines. It aims to cover the many potential influencing factors mentioned above by developing different types of evaluation methods, such as: