

Perceptual Ratings of Voice Likability Collected through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing

Laura Fernández Gallardo; Rafael Zequeira Jiménez; Sebastian Möller
In: Proc. 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), August 20–24, Stockholm, Sweden, pp. 2233–2237, Vol. 1, ISBN 978-1-5108-4876-4, International Speech Communication Association, 2017.


Human perceptions of speaker characteristics, which are needed to train automatic predictors operating on speech features, have generally been collected through demanding in-lab listening tests under controlled conditions. Meanwhile, crowdsourcing has emerged as a valuable approach for running user studies via surveys or quantitative ratings. Micro-task crowdsourcing markets enable the completion of small tasks (typically lasting minutes or seconds) and reward users with micro-payments. This paradigm permits effortless collection of user input from a large and diverse pool of participants at low cost. This paper presents different auditory tests for collecting perceptual voice likability ratings, all employing a common set of 30 male and female voices. The tests are based on direct scaling and on paired comparisons, and were conducted both in the laboratory and via crowdsourced micro-tasks. Design considerations are proposed for adapting the laboratory listening tests to a mobile-based crowdsourcing platform so as to obtain trustworthy listener answers. The likability scores obtained by the different test approaches are highly correlated. This outcome motivates the use of crowdsourcing for future listening tests investigating, e.g., speaker characterization, reducing the effort involved in recruiting participants and administering tests on-site.
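The comparison underlying the abstract's main finding can be sketched as follows. This is an illustrative example, not the paper's actual data or analysis pipeline: it assumes (hypothetically) that paired-comparison outcomes are turned into per-voice scores via win proportions, which are then correlated with direct-scaling mean ratings using Pearson's r.

```python
# Illustrative sketch (hypothetical data, not from the paper): derive
# likability scores from paired-comparison outcomes via win proportions,
# then correlate them with direct-scaling ratings using Pearson's r.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical paired-comparison results: wins per voice.
wins = {"v1": 8, "v2": 3, "v3": 6, "v4": 1}
comparisons_per_voice = 9  # assumed: each voice heard in 9 pairings
pc_scores = [wins[v] / comparisons_per_voice for v in sorted(wins)]

# Hypothetical direct-scaling mean ratings for the same voices (1-7 scale).
ds_scores = [5.9, 3.2, 5.1, 2.0]

r = pearson_r(pc_scores, ds_scores)
print(f"Pearson r between test methods: {r:.3f}")
```

A high r between the two score sets would indicate, as the abstract reports for the real data, that the cheaper crowdsourced test recovers essentially the same likability ordering as the in-lab test.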