Open MSc thesis topics

Doing an MSc thesis in Human-Robot Interaction

Robots are fun. When things work, you can see what is going on. But robots are also very complicated. Robot perception is noisy and inaccurate. Robot software architectures are distributed systems, running on different machines, and include modules covering diverse capabilities such as dialogue processing, planning, vision, and robot control. For doing an MSc thesis in human-robot interaction, this means one thing above all: teamwork, and an openness and willingness to work with people from different scientific backgrounds, possibly from all over the world. If you believe you have what it takes, we can offer you a stimulating environment with the support, supervision, and every opportunity you need to make the most of your MSc thesis!

Contact

If you are interested in pursuing an MSc thesis in our group, please contact Geert-Jan Kruijff at gj@dfki.de. Ideally, indicate which topic(s) you are interested in and your motivation for doing research in this area.

Download the recent flyer with MSc thesis topics [ PDF ]

Download the recent poster for doing an MSc in HRI [ PDF ]



Context-sensitive understanding of vague scalar predicates
in visually situated dialogue

When is an object big, or small? When is it high, or low? Big, low, slow: these are all "vague scalars", whose interpretation depends heavily on the context in which they are used. The goal of the thesis is to build on work by DeVault & Stone to develop an approach for interpreting vague scalar predicates applied to visual objects, as used in situated human-robot dialogue for visual object learning and object manipulation. See also [ www ]
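As a toy illustration of why context matters (this is not DeVault & Stone's model), one can treat the visible objects as a comparison class and say a scalar predicate holds when an object's measurement lies clearly above, or below, the class average. The threshold rule, the parameter `k`, and the function name `holds` are assumptions made for this sketch.

```python
from statistics import mean, pstdev

# Hypothetical rule: a vague scalar holds if the value lies more than
# k population standard deviations above (or below) the comparison-class mean.
UPWARD = {"big", "high"}
DOWNWARD = {"small", "low"}

def holds(predicate: str, value: float, context: list[float], k: float = 0.5) -> bool:
    """Interpret a vague scalar predicate relative to the measurements in context."""
    mu, sigma = mean(context), pstdev(context)
    if predicate in UPWARD:
        return value > mu + k * sigma
    if predicate in DOWNWARD:
        return value < mu - k * sigma
    raise ValueError(f"unknown scalar predicate: {predicate!r}")

# The same 10 cm object in two different scenes (all sizes in cm).
print(holds("big", 10, [2, 3, 4, 10]))     # True: among small blocks
print(holds("big", 10, [10, 30, 40, 50]))  # False: among large boxes
```

The same object flips from "big" to "not big" purely because the comparison class changes, which is the kind of context dependence the thesis would model more carefully.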

Id: MSc.2007.1 Status: Open Supervisor: Kruijff Posted: March 2007


Producing deictic gestures to accompany spatial referring expressions
in human-robot interaction

Imagine a robot in a cluttered room. It has just passed between a box and a wastebin, and is wondering whether it has just gone through a door or not. So, at some point, it asks: "is there a door here?" [ video ], without making clear where "here" is supposed to be (the position of the robot? this room? somewhere near the robot?). The human user has no way of knowing what the robot is asking about, and may thus fail to give a proper answer. The goal here is to develop methods for planning and realizing deictic gestures to accompany spatial referring expressions in human-robot interaction. The intended deictic gestures combine looking (pan-tilt unit), pointing (arm), and turning and positioning (robot pose, movement to a position). The methods are to be integrated into a dialogue system for human-robot interaction and evaluated with users on how comprehensible the robot's questions and assertions are with and without deictic gestures. Possible dialogue domains are visual learning, object manipulation, or human-augmented mapping.

Id: MSc.2007.2 Status: Open Supervisor: Kruijff Posted: March 2007


Adaptive interleaving of dialogue- and action planning
for manipulation plan execution

If a robot is given a possibly complex plan for manipulating objects in a scene, it should provide feedback to indicate that it has understood what to do. However, such "dialogue grounding" need not be done purely verbally: performing initial manipulation actions while producing verbal feedback can also function as non-verbal feedback, while already executing the plan at the same time. The goal is (a) to investigate appropriate mixtures of verbal and non-verbal feedback in manipulation tasks, using a corpus of human-human collaborative dialogue (Baufix corpus), and (b) to develop methods for interleaving dialogue and action planning, so that planned manipulation actions are integrated as non-verbal feedback in the production of dialogue grounding moves. Because the robot may be interrupted during action execution (e.g. if it does the wrong thing, or if the situation changes), the approach needs to integrate with a continuous planning model being developed within the project. See also [ PDF ]

Id: MSc.2007.3 Status: Open Supervisor: Kruijff Posted: March 2007


Producing verbal and non-verbal communication engagement acts
in human-robot interaction

How should a robot indicate that it wants to talk to you? Or that it is handing the turn over to you? Or that it would like to move on, and say goodbye? These are "meta-level" acts of engagement that guide how a dialogue proceeds. The goal here is to build on work by Sidner et al. to develop methods for planning and realizing verbal and non-verbal communication in human-robot interaction to indicate opening and closing an engagement, and maintaining or handing over turns. The approach should be evaluated with expert and naive users, who rate the naturalness of the dialogue (on a Likert scale) with and without varying forms of non-verbal communication (gaze, posture, deictic gestures) in simple office-assistant scenarios. See also [ www ].
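One way to make the engagement acts concrete is a small state machine in which opening, maintaining, handing over, taking back, and closing are transitions between hypothetical states. The state and act names are assumptions, and the verbal and non-verbal realizations are placeholders: choosing them well is exactly what the thesis would investigate.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # no engagement
    ROBOT_TURN = auto()  # engaged, robot holds the turn
    HUMAN_TURN = auto()  # engaged, human holds the turn

# Allowed engagement acts per state (hypothetical act labels).
TRANSITIONS = {
    (State.IDLE, "open"): State.ROBOT_TURN,
    (State.ROBOT_TURN, "maintain"): State.ROBOT_TURN,
    (State.ROBOT_TURN, "hand_over"): State.HUMAN_TURN,
    (State.HUMAN_TURN, "take_turn"): State.ROBOT_TURN,
    (State.ROBOT_TURN, "close"): State.IDLE,
}

# Placeholder (verbal, non-verbal) realizations for each act.
REALIZATIONS = {
    "open": ("Hello, can I help you?", "turn toward user, establish gaze"),
    "maintain": ("Let me see...", "avert gaze"),
    "hand_over": ("What do you think?", "look at user, pause"),
    "take_turn": ("Okay.", "nod, look at user"),
    "close": ("Goodbye!", "nod, turn away"),
}

def perform(state: State, act: str) -> tuple[State, tuple[str, str]]:
    """Return the next state and the (verbal, non-verbal) realization of act."""
    if (state, act) not in TRANSITIONS:
        raise ValueError(f"act {act!r} not allowed in state {state.name}")
    return TRANSITIONS[(state, act)], REALIZATIONS[act]

state = State.IDLE
for act in ["open", "hand_over", "take_turn", "close"]:
    state, (utterance, behavior) = perform(state, act)
    print(f"{act:10} -> {state.name:11} {utterance!r} + {behavior}")
```

Keeping the transition table separate from the realizations lets the evaluation swap non-verbal behaviors in and out (the with/without conditions) without touching the dialogue logic.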

Id: MSc.2007.4 Status: Open Supervisor: Kruijff, Staudte Posted: March 2007


Comprehending perspectivization in spatial references
to locations relative to landmarks

When referring to an aspect of spatial organization, people do not always adopt an ego-centric perspective. Very often, they adopt an "allo-centric" perspective, centered not on the perceiver but on something in the environment, e.g. a salient landmark. The goal is to develop methods for resolving spatial references to locations relative to landmarks, using the robot's knowledge about the spatial organization of the environment. The first step is to resolve references to locations relative to landmarks that the robot knows about, i.e. landmarks marked in a map the robot has built up. The second step is to resolve references to locations in known areas, but relative to landmarks that the robot does not know about. This requires the robot to construct a (continuous) plan for traveling to the known area, and then to perform simple visual search to identify the referred-to location. The final step is to handle references to locations in unknown areas. This requires the robot to construct a continuous plan for traveling to the unknown area (assuming that area can be identified relative to known areas, e.g. "the kitchen [unknown] is at the end of the corridor [known] to your left"), and then to perform visual search to identify the referred-to location. The approach is to be evaluated at each stage using a constructed test bed of simple office-assistant tasks ("take the X to the Y and put it R-RELATIVE to the Z"), measuring success in terms of recall (#resolutions construed) and precision (#correct resolutions).
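The test-bed measures can be computed as below. The text gives them as counts; normalizing construed resolutions by the number of test references (recall) and correct resolutions by the number construed (precision) is one plausible reading, assumed here. The `Trial` record and the location labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    gold: str                # location the instruction actually refers to
    resolved: Optional[str]  # location the robot resolved it to; None if no resolution

def evaluate(trials: list[Trial]) -> tuple[float, float]:
    """Return (recall, precision) as ratios over a test bed of trials."""
    construed = [t for t in trials if t.resolved is not None]
    correct = [t for t in construed if t.resolved == t.gold]
    recall = len(construed) / len(trials) if trials else 0.0
    precision = len(correct) / len(construed) if construed else 0.0
    return recall, precision

trials = [
    Trial("left-of:printer", "left-of:printer"),
    Trial("behind:desk", "front-of:desk"),
    Trial("end-of:corridor", "end-of:corridor"),
    Trial("next-to:kitchen-door", None),
]
print(evaluate(trials))  # recall 0.75, precision about 0.67
```

Running the same evaluation separately for each of the three steps (known landmarks, unknown landmarks in known areas, unknown areas) would show where resolution starts to break down.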

Id: MSc.2007.5 Status: Open Supervisor: Kruijff Posted: March 2007


Stressing what is contextually salient when communicating
assertions about spatial organization in a visual scene

The goal is to build on recent work by Cassell et al. [ www ] and Mutlu et al. [ www ] to develop methods for (a) determining a shallow form of information structure for an assertion that the robot needs to communicate, based on a combination of discourse salience and what is salient in the current visuo-spatial scene, and (b) using that shallow information structure to determine what intonation to use, and where to look, when realizing the assertion. For synthesizing speech with varying intonation, the Mary system is to be used [ www ]. To evaluate the approach, the work is to focus on producing assertions that describe visual scenes, i.e. collections of objects and their spatial relations, measuring how well expert and naive users can identify which objects the robot is talking about. Of particular interest here are the trade-offs between using non-verbal (gaze) and paralinguistic (intonation) means to stress salience, and the complexity of spatial referring expressions [ PDF ]. See also [ PDF ]
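A minimal sketch of steps (a) and (b), under simplifying assumptions: each constituent receives a weighted mix of discourse and visual salience; referents below a threshold are treated as new information (rheme), marked for a pitch accent, and looked at, while salient ones stay unaccented. The weighting `w`, the `threshold`, and the salience values are hypothetical, and upper case merely stands in for whatever prosodic markup the synthesizer would need.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Constituent:
    words: str
    referent: Optional[str]    # scene object id, or None for relational words
    discourse_salience: float  # 0..1, e.g. from recency of mention (hypothetical)
    visual_salience: float     # 0..1, e.g. from size or centrality in view (hypothetical)

def plan_assertion(constituents, w=0.5, threshold=0.5):
    """Split referents into given (theme) and new (rheme) by combined salience.

    Rheme constituents are marked for a pitch accent (upper case here) and
    their referents are returned as gaze targets.
    """
    words, gaze_targets = [], []
    for c in constituents:
        salience = w * c.discourse_salience + (1 - w) * c.visual_salience
        if c.referent is not None and salience < threshold:
            words.append(c.words.upper())
            gaze_targets.append(c.referent)
        else:
            words.append(c.words)
    return " ".join(words), gaze_targets

text, gaze = plan_assertion([
    Constituent("the red ball", "ball1", 0.9, 0.8),
    Constituent("is to the left of", None, 0.0, 0.0),
    Constituent("the green box", "box2", 0.1, 0.3),
])
print(text, gaze)  # the red ball is to the left of THE GREEN BOX ['box2']
```

Varying `w` is one simple way to probe the trade-off the topic mentions: it shifts how much the accent and gaze decisions follow the discourse versus the visual scene.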

Id: MSc.2007.6 Status: Open Supervisor: Kruijff Posted: March 2007


Using contrastive intonation in clarification questions
for resolving situational ambiguity

In a clarification question, a robot needs to contrast what it does understand about a situation with what it does not. The goal here is to develop methods for planning and realizing contrastive intonation in clarification questions raised when the robot is uncertain about how to interpret something, for example "is there a door here?", "is this a box or a ball?", "is the box red or orange?". For synthesizing speech with varying intonation, the Mary system is to be used [ www ]. The resulting methods should be evaluated with expert and naive users on naturalness and comprehensibility (what is the question about?), possibly using offline web experiments.
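For the alternative-question pattern ("is the box red or orange?"), a first approximation is to place a contrastive accent on each alternative the robot is uncertain between. This sketch emits markup using the SSML `<emphasis>` element; whether the Mary system is best driven with SSML, its own markup, or something else is left open, and the function name is hypothetical.

```python
from xml.sax.saxutils import escape

def alternative_question(subject: str, alternatives: list[str]) -> str:
    """Return an SSML-style alternative question with contrastive emphasis.

    Only the <emphasis> fragments are produced; the surrounding <speak>
    envelope a synthesizer may require is omitted.
    """
    if len(alternatives) < 2:
        raise ValueError("need at least two alternatives to contrast")
    marked = [f'<emphasis level="strong">{escape(a)}</emphasis>' for a in alternatives]
    choices = ", ".join(marked[:-1]) + " or " + marked[-1]
    return f"Is {escape(subject)} {choices}?"

print(alternative_question("the box", ["red", "orange"]))
```

Polar questions such as "is there a door here?" have no explicit alternatives, so they would need a different rule, e.g. accenting the uncertain element ("door" or "here").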

Id: MSc.2007.7 Status: Closed (Raveesh Meena) Supervisor: Kruijff Posted: March 2007


Layered short-term scene understanding

Imagine a robot capable of picking up objects and taking them from one place to another. When the robot wants to pick up an object on a table, it needs to focus on the table, construct a model (A) of the spatial organization of what is on the table, and pick up the object. Next, when moving around the room, it should maintain a model (B) of the current organization of the room; e.g. the trash bin may have moved since it was last seen. Assume the robot needs to drop the object into the trash bin: it needs to go there, focus on the trash bin, create yet another model (C) of how the trash bin is positioned relative to the robot, and then drop the object into the bin. Task completed, mission accomplished. The question we can raise here is how models A, B, and C are related. They clearly have different levels of granularity in spatial organization: a table, a room, the area around a trash bin. Yet they are related: humans do not forget what they saw on a table, or where the trash bin was when they last saw it. The goal here is to investigate how, in visuo-spatial working memory, the robot can construct and maintain models of visuo-spatial organization at different levels of granularity. To vary granularity, the focus is on models that can be anchored on landmarks or identifiable areas. Issues such as how these models relate to one another, and how they can be grounded in e.g. a map of the environment, need to be addressed.
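One possible data structure for relating models A, B, and C: each model stores object positions relative to its own anchor and records where that anchor sits in a coarser parent model, so detail from the table model can be expressed in room or map coordinates. Translation-only frames (no rotation) and the class and field names are simplifying assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpatialModel:
    """A local visuo-spatial model anchored on a landmark or identifiable area."""
    name: str
    anchor: str                               # landmark or area the model is anchored on
    offset: tuple[float, float] = (0.0, 0.0)  # anchor position in the parent's frame (meters)
    # Positions of objects relative to this model's anchor.
    objects: dict[str, tuple[float, float]] = field(default_factory=dict)
    children: list["SpatialModel"] = field(default_factory=list, repr=False)
    # Excluded from repr/eq so the parent-child cycle is never followed.
    parent: Optional["SpatialModel"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "SpatialModel") -> "SpatialModel":
        """Nest a finer-grained model (e.g. a table) inside this one (e.g. a room)."""
        child.parent = self
        self.children.append(child)
        return child

    def to_global(self, pos: tuple[float, float]) -> tuple[float, float]:
        """Express a position given in this model's frame in the root (e.g. map) frame."""
        x, y = pos
        model = self
        while model is not None:
            x, y = x + model.offset[0], y + model.offset[1]
            model = model.parent
        return (x, y)

room = SpatialModel("B", anchor="office-1")                                           # model B
table = room.add_child(SpatialModel("A", anchor="table", offset=(2.0, 1.0)))          # model A
bin_area = room.add_child(SpatialModel("C", anchor="trash-bin", offset=(4.0, -1.0)))  # model C
table.objects["mug"] = (0.5, 0.25)
print(table.to_global(table.objects["mug"]))  # (2.5, 1.25)
```

Because the table model is kept after the robot moves on, what it saw there remains retrievable; if the trash bin moves, only model C's `offset` needs updating.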

Id: MSc.2007.8 Status: Open Supervisor: Kruijff Posted: March 2007


Semantic integration for robust spoken dialogue comprehension

Automatic speech recognition (ASR) is never perfect: it may recognize words wrongly, "mishearing" what was said, and possibly yield ungrammatical strings as output. As a result, (incremental) parsing of such strings may only yield a collection of partial semantic analyses. The goal here is to develop a method for integrating these partial analyses, using salient structures in the discourse context, categorical knowledge about how meanings can prototypically be combined, and knowledge about salient objects and spatio-temporal organization in the situated context. The resulting approach will be evaluated on constructed input spoken aloud (a test bed) and on corpus sentences (Home Tour corpus). See also [ PDF ].

Id: MSc.2007.9 Status: Closed (Pierre Lison) Supervisor: Kruijff Posted: March 2007


Using situational awareness to improve spoken dialogue recognition

Automatic speech recognition (ASR) is never perfect: it may recognize words wrongly, "mishearing" what was said, and possibly yield ungrammatical strings as output. The idea here is to build on work by Purver [ www ] and by Roy [ www ] to use the robot's knowledge of the (visual) situation to resolve competing ASR interpretations to the one that is best supported by the situation. For example, if the situation contains a ball but no box, and the top-ranked strings are "this is a ball" and "this is a box", then the idea is to choose "this is a ball" even when "this is a box" may have a (marginally) higher confidence score. The goal of the thesis is to develop a post-processing module that takes a word lattice from the ASR engine and outputs a string that both has high confidence and is supported by the situational context. To elaborate the notion of "situational support", the thesis should explore how information from different modalities can be exploited: visuo-spatial working memory, but also discourse context, and information about where the human is looking in a scene. See also [ PDF ] and [ PDF ].
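A simplified sketch of the post-processing step, assuming an n-best list of (string, confidence) pairs instead of a full word lattice, and using only visual scene content as situational support. The object vocabulary, the linear weighting `alpha`, and the neutral score of 0.5 for hypotheses that mention no object are hypothetical choices; discourse context and gaze would enter as further terms.

```python
# Hypothetical set of object words the robot can ground in its scene model.
OBJECT_WORDS = {"ball", "box", "mug", "table"}

def situational_support(hypothesis: str, scene: set[str]) -> float:
    """Fraction of object words in the hypothesis that name an object in the scene."""
    mentioned = [w for w in hypothesis.lower().split() if w in OBJECT_WORDS]
    if not mentioned:
        return 0.5  # neutral: nothing to check against the scene
    return sum(w in scene for w in mentioned) / len(mentioned)

def rerank(nbest: list[tuple[str, float]], scene: set[str], alpha: float = 0.5) -> str:
    """Pick the hypothesis with the best mix of ASR confidence and situational support."""
    def score(item: tuple[str, float]) -> float:
        text, confidence = item
        return alpha * confidence + (1 - alpha) * situational_support(text, scene)
    return max(nbest, key=score)[0]

nbest = [("this is a box", 0.52), ("this is a ball", 0.48)]
print(rerank(nbest, scene={"ball", "table"}))  # this is a ball
```

This reproduces the example in the text: the marginally more confident "box" reading loses because no box is in view.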

Id: MSc.2007.10 Status: Open Supervisor: Kruijff Posted: March 2007


Resolving clarification ellipsis in situated dialogue

Clarification questions are often answered with short phrases that simply omit much of the contextually salient content, e.g. "What color is the box?" "Orange." Empirically, the phenomenon can be observed in corpora of human-human dialogue (e.g. Brown corpus; Baufix corpus) and human-robot dialogue (Home Tour corpus). The goal is to build on the work by Purver [ www ] and Ginzburg [ www ] to resolve clarification ellipsis in a situated manner, drawing not only on the discourse context but also on what is salient in the situation. The resulting approach is to be implemented and evaluated in situated human-robot dialogues for visual object learning or human-augmented mapping. See also [ PDF ] and [ PS ].
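A minimal sketch of situated resolution for the wh-question case, assuming the question has already been analyzed into a queried attribute and, optionally, a referent. The fragment's meaning is checked against the open slot, and when the question referred deictically ("What color is this?") the referent is taken from situational salience. The lexicon, salience values, and flat tuple representation are hypothetical; polar and alternative questions would need further cases.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WhQuestion:
    attribute: str           # the queried attribute, e.g. "color"
    referent: Optional[str]  # object id if the question named it ("the box"), else None

# Hypothetical lexicon: which attribute each answer word can fill.
ATTRIBUTE_OF = {"orange": "color", "red": "color", "green": "color",
                "big": "size", "small": "size"}

def resolve_fragment(question: WhQuestion, fragment: str,
                     salience: dict[str, float]) -> tuple[str, str, str]:
    """Resolve a fragment answer to a full (attribute, object, value) proposition."""
    value = fragment.strip().rstrip(".!").lower()
    # Discourse: the fragment must fill the slot the question left open.
    if ATTRIBUTE_OF.get(value) != question.attribute:
        raise ValueError(f"{fragment!r} does not answer a {question.attribute} question")
    # Situation: fall back on the most salient object if none was named.
    referent = question.referent or max(salience, key=salience.get)
    return (question.attribute, referent, value)

scene = {"box1": 0.3, "ball2": 0.8}  # e.g. the human is pointing at the ball
print(resolve_fragment(WhQuestion("color", "box1"), "Orange.", scene))  # ('color', 'box1', 'orange')
print(resolve_fragment(WhQuestion("color", None), "red", scene))        # ('color', 'ball2', 'red')
```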

Id: MSc.2007.11 Status: Open Supervisor: Kruijff Posted: March 2007