Computational Approaches to Subjective Interpretation of Multimedia Messages
PhD thesis, TU Kaiserslautern, KLUEDO, 2/2020.
Nowadays, a large part of communication takes place on social media platforms such as Twitter, Facebook, Instagram, or YouTube, where messages often include multimedia content (e.g., images, GIFs, or videos). Since such messages are in digital form, computers can in principle process them to make our lives more convenient and to help us overcome issues as they arise. However, these goals require the ability to capture what these messages mean to us, that is, how we interpret them from our own subjective points of view. Thus, the main goal of this dissertation is to advance a machine's ability to interpret social media content in a more natural, subjective way. To this end, three research questions are addressed. The first question is "How to model human interpretation for machine learning?" We describe a way of modeling interpretation that allows single or multiple ways of interpretation, of both humans and computer models, to be analyzed within the same theoretical framework. In a comprehensive survey, we collect various possibilities for such a computational analysis. Particularly interesting are machine learning approaches in which a single neural network learns multiple ways of interpretation. For example, a neural network can be trained to predict user-specific movie ratings from movie features and a user ID, and can then be analyzed to understand how users rate movies. This is a promising direction, as neural networks are capable of learning complex patterns. However, how the analysis results depend on the network architecture is largely unexplored. For the example of movie ratings, we show that the way of combining information for prediction can affect both prediction performance and what the network learns about the various ways of interpretation (corresponding to users).
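To make the last point concrete, the following is a minimal sketch, not the architecture used in the dissertation. It assumes two hypothetical ways of combining a user vector with movie features (concatenation versus element-wise product), each followed by a single untrained linear layer with random weights. All names, sizes, and weights are illustrative assumptions. With concatenation and one linear layer, the user contributes only an additive offset, so every user prefers the same movies; with the product, each user can weight the movie features differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and random (untrained) parameters, for illustration only.
n_users, n_movies, dim = 3, 4, 5
user_emb = rng.normal(size=(n_users, dim))     # one vector per user ID
movie_feat = rng.normal(size=(n_movies, dim))  # one feature vector per movie
w_concat = rng.normal(size=2 * dim)            # linear layer for the concat variant
w_product = rng.normal(size=dim)               # linear layer for the product variant


def predict_concat(u, m):
    """Concatenate user and movie vectors, then apply one linear layer."""
    return np.concatenate([user_emb[u], movie_feat[m]]) @ w_concat


def predict_product(u, m):
    """Multiply user and movie vectors element-wise, then apply one linear layer."""
    return (user_emb[u] * movie_feat[m]) @ w_product


def rating_table(predict):
    """Predicted rating for every (user, movie) pair, shape (n_users, n_movies)."""
    return np.array(
        [[predict(u, m) for m in range(n_movies)] for u in range(n_users)]
    )


concat_ratings = rating_table(predict_concat)
product_ratings = rating_table(predict_product)

# Per-user rating differences relative to the first movie.
concat_gaps = concat_ratings - concat_ratings[:, :1]
product_gaps = product_ratings - product_ratings[:, :1]

# Concatenation + linear layer: the gaps are identical for all users.
same_for_all_users_concat = np.allclose(concat_gaps, concat_gaps[0])
# Element-wise product: the gaps can differ from user to user.
same_for_all_users_product = np.allclose(product_gaps, product_gaps[0])
```

In this toy setting the concatenation variant cannot express user-specific preferences between movies at all, whereas the product variant can. Deeper networks blur this distinction, so the sketch only illustrates why the choice of combination can matter.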
Since some application-specific details of dealing with human interpretation only become visible when going deeper into particular use cases, the other two research questions of this dissertation concern two selected application domains: subjective visual interpretation and gang violence prevention. The first application study deals with subjectivity that comes from personal attitudes and aims at answering "How can we predict subjective image interpretation one would expect from the general public on photo-sharing platforms such as Flickr?" The predictions in this case take the form of subjective concepts or phrases. Our study on gang violence prevention is more community-centered and considers the question "How can we automatically detect tweets of gang members which could potentially lead to violence?" There, the psychosocial codes aggression, loss, and substance use serve as proxies for estimating the subjective implications of online messages. In these two distinct application domains, we develop novel machine learning models for predicting subjective interpretations of images and of tweets with images, respectively. In the process of building these detection tools, we also create three datasets, which we share with the research community. Furthermore, we see that some domains, such as Chicago gangs, require special care due to the high vulnerability of the users involved. This motivated us to establish and describe an in-depth collaboration between social work researchers and computer scientists. As machine learning incorporates more and more subjective components and gains societal impact, we have good reason to believe that similar collaborations between the humanities and computer science will become increasingly necessary to advance the field in an ethical way.