Automated Text Readability Assessment for German Language: A Quality of Experience Approach

Babak Naderi, Salar Mohtaj, Karan Karan, Sebastian Möller

In: 11th International Conference on Quality of Multimedia Experience (QoMEX 2019), June 5-7, 2019, Berlin, Germany. IEEE, 2019.


Data-driven approaches to readability assessment, which combine automated linguistic analysis with machine learning, offer a viable path toward ranking texts by readability. This paper describes the development of an automated readability estimator for German, trained with supervised learning algorithms on German text corpora. Natural language processing tools are used to extract 73 linguistic features, grouped into traditional, lexical, and morphological features. Feature engineering is then applied to select the most informative features. Several supervised learning models are trained on the top-ranked features. The results show that the Random Forest Regressor performs best, with an RMSE of 0.847.
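The pipeline summarized above (extract features, rank them, keep the top-ranked ones, train a regressor, evaluate with RMSE) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The two surface features (`avg_sentence_len`, `avg_word_len`) are hypothetical stand-ins for the paper's 73 features. Ranking by absolute Pearson correlation is a swapped-in selection rule that the paper does not necessarily use. The helper names `surface_features`, `pearson`, `top_k_features` and `rmse` are likewise illustrative. The regression model is omitted. A Random Forest could come, for example, from scikit-learn's `sklearn.ensemble.RandomForestRegressor`.

```python
import math
import re
from typing import Dict, List, Sequence


# Hypothetical features: two simple surface measures standing in for the
# paper's 73 traditional, lexical and morphological features.
def surface_features(text: str) -> Dict[str, float]:
    """Average sentence length (in words) and average word length (in characters)."""
    # Naive sentence split on '.', '!' and '?'; a real pipeline would use an NLP tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w is Unicode-aware for str patterns, so umlauts and ß count as word characters.
    words = re.findall(r"\w+", text)
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no linear signal
    return cov / (sx * sy)


def top_k_features(rows: List[Dict[str, float]], targets: Sequence[float], k: int) -> List[str]:
    """Assumed selection rule: keep the k features with the largest |Pearson r| to the target."""
    scores = {name: abs(pearson([r[name] for r in rows], targets)) for name in rows[0]}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between reference ratings and model predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

To use the sketch, compute `surface_features` for each rated text and choose columns with `top_k_features`. A regressor is then trained on those columns, and `rmse` reports its error on held-out texts, where lower values are better. RMSE is expressed in the units of the rating scale, so the 0.847 reported above should be read relative to the range of the readability ratings used in the study.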

German Research Center for Artificial Intelligence (Deutsches Forschungszentrum für Künstliche Intelligenz, DFKI)