Knut Hartmann, Bernhard Preim and Thomas Strothotte

Describing Abstraction in Rendered Images through Figure Captions

[Previous Version]
[send contribution]
[debate procedure]

Overview of interactions

No. | Comment(s) | Answer(s) | Continued discussion
1 | 7.5.99 Ehud Reiter | 28.5.99 Knut Hartmann, Bernhard Preim and Thomas Strothotte |

C1. Ehud Reiter (University of Aberdeen) (7.5.99):

I think this is a very interesting topic, but I must admit that at times I found the paper a bit frustrating because it left out many details.

In particular, from a Natural-Language Generation perspective, I think the most interesting part of this work is content determination, that is, deciding what information to include in captions. But there isn't much detail given about this. I for one would very much like to know:

  1. What kind of graphical manipulations (abstractions) can be performed by the underlying ZOOM ILLUSTRATOR system? This is an important part of defining the kind of information that the caption-generator may wish to communicate. In general, it would be nice to have more of a description of ZOOM ILLUSTRATOR in this paper.
  2. What is the relationship between the person choosing graphical manipulations in ZOOM ILLUSTRATOR; the person deciding what kind of caption should be generated (e.g., filling out the GUI in Figure 3); and the person reading the caption? This is an important aspect of determining what the best content of a caption should be. For example, if these are all the same person, then perhaps it would be better for the caption simply to recap the manipulations the user has performed? If they are different people, then the expected use of the hypertext mechanism (where users click on elements of a caption) is a bit mysterious, and should be explained.
  3. The actual content-determination mechanism is also somewhat mysterious. Section 7.2 gives the impression that it's done by rules (production rules?) triggered by patterns in the data and the user model; but Section 7.3 describes the process more as top-down expansion of a schema (macrostructure) with conditional elements. This really should be made clear. A complete worked example would be useful here.
  4. I also wondered where the content rules or schemas came from. Did the developers derive them from a corpus analysis? From discussions with domain experts? From user trials with different rules or schemas? From some other source?

With regard to the discussion of templates and linguistically motivated generation, there are now a number of packages which support "mixed" systems, and which the authors may wish to investigate, for example CoGenTex's Exemplars system and DFKI's TG/2 system.

A2. Knut Hartmann, Bernhard Preim and Thomas Strothotte (28.5.99):

Ehud Reiter asked:

1. What kind of graphical manipulations (abstractions) can be performed by the underlying ZOOM ILLUSTRATOR system?

The ZOOM ILLUSTRATOR presents one or two views onto an object, together with labels which contain annotations.

There may be several annotations of an object at different levels of detail. Whenever the user selects an object or its annotation, a more detailed object description is presented within the accompanying label, whereas the amount of text in other labels, or the number of labels, has to be decreased. The varying amount of text presented in one annotation thus forces a rearrangement of all labels. In order to determine the maximal bounding box of all labels, the ZOOM algorithm [3] is applied. This algorithm preserves the topological relations between all objects in 2D or in 3D whenever the size of objects is changed. One effect of this algorithm is to preserve contextual information while changing the focus of an illustration.

Another consequence of a request for additional information about an object is the application of several techniques of graphical emphasis. Emphasized objects should be clearly visible and recognizable. This can be achieved by altering presentation variables of the object itself, or those of other objects. Frequently, objects occluding the object to be emphasized are drawn semi-transparently, while the saturation of the object to be emphasized is increased.
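The emphasis technique just described can be sketched in a few lines of Python. This is an illustration only, not the ZOOM ILLUSTRATOR code: the names Appearance and emphasize, the two presentation variables, and the default values are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Appearance:
    """Hypothetical presentation variables of one object."""
    saturation: float  # 0.0 (grey) to 1.0 (fully saturated)
    opacity: float     # 0.0 (invisible) to 1.0 (opaque)


def emphasize(scene, target, occluders, boost=0.3, occluder_opacity=0.4):
    """Return a new scene in which `target` is emphasized.

    The target's saturation is raised (capped at 1.0) and every
    occluding object is drawn semi-transparently. The input scene
    is left unchanged.
    """
    result = dict(scene)
    t = scene[target]
    result[target] = replace(t, saturation=min(1.0, t.saturation + boost))
    for name in occluders:
        o = scene[name]
        result[name] = replace(o, opacity=min(o.opacity, occluder_opacity))
    return result


scene = {
    "muscle": Appearance(saturation=0.5, opacity=1.0),
    "skin": Appearance(saturation=0.5, opacity=1.0),
}
emphasized = emphasize(scene, "muscle", occluders=["skin"])
```

Both kinds of change, to the object itself and to other objects, are exactly the sort of modification a figure caption may need to report.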

Furthermore, geometric manipulations, such as the 3D-Zoom, may be applied to highlight objects. Finally, the system may determine another viewing position using some heuristics, for instance to increase the visible portion of the object to be highlighted.

An online description of the ZOOM ILLUSTRATOR system is available, as well as a gallery of screenshots, which may be useful to illustrate the ideas presented above. Furthermore, several papers ([1], [2], [3]) discuss various aspects of the ZOOM ILLUSTRATOR system in more detail.

2. What is the relationship between the person choosing graphical manipulations in ZOOM ILLUSTRATOR; the person deciding what kind of caption should be generated (e.g., filling out the GUI in Fig. 3); and the person reading the caption?


3. [unclear description of the] content-determination mechanism
[production rules or schema expansion?]
[bottom-up or top-down?]

In the scenario described in the paper, all these actions are performed by the same person. The configuration of the figure caption's content (see Fig. 3) was inspired by the configuration of menus and tool-bars in common graphical user interfaces. Describing all manipulations, whether initiated by the user or by the system, would lead to long figure captions. So our idea is to sum up the most important manipulations in the figure caption.

To illustrate the generation process, we will frequently refer to the system architecture presented in Fig. 4 and the configuration dialog in Fig. 3. Both visualization components, the graphic and the text display, inform the context expert about changes to the visualization. In technical terms, both modules send events containing information about both the modification and its initiator; these events are recorded in the context expert.
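A minimal sketch of such event recording, in Python, may make this concrete. It is not taken from the system: the class and field names (ModificationEvent, ContextExpert, notify, events_by) are assumptions, and only the idea that each event carries the modification and its initiator comes from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Initiator(Enum):
    USER = "user"
    SYSTEM = "system"


@dataclass(frozen=True)
class ModificationEvent:
    source: str           # sending module: "graphic" or "text"
    kind: str             # kind of modification, e.g. "rotation"
    target: str           # object that was modified
    initiator: Initiator  # who caused the modification


@dataclass
class ContextExpert:
    """Records every event sent by the visualization components."""
    history: list = field(default_factory=list)

    def notify(self, event):
        self.history.append(event)

    def events_by(self, initiator):
        return [e for e in self.history if e.initiator is initiator]


expert = ContextExpert()
expert.notify(ModificationEvent("graphic", "rotation", "model", Initiator.USER))
expert.notify(ModificationEvent("graphic", "transparency", "skin", Initiator.SYSTEM))
expert.notify(ModificationEvent("text", "expand_label", "muscle", Initiator.USER))
user_events = expert.events_by(Initiator.USER)
```

Keeping the initiator in each record is what lets a caption distinguish user actions from system-initiated ones later on.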

The status of the Update Frequency option in Fig. 3 determines when an update, i.e. a regeneration, of the figure caption is triggered. The options correspond to an update after every modification, an update whenever a threshold is reached, or an update at the user's explicit request.
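The three update options could be modelled roughly as follows. This Python sketch rests on assumptions: the names UpdateFrequency and should_regenerate are hypothetical, and the threshold is taken here to be a count of modifications not yet reflected in the caption, which the description above does not specify.

```python
from enum import Enum


class UpdateFrequency(Enum):
    EVERY_MODIFICATION = 1
    THRESHOLD = 2
    ON_REQUEST = 3


def should_regenerate(policy, pending_events, threshold=5, user_requested=False):
    """Decide whether the figure caption should be regenerated.

    `pending_events` is the number of modifications recorded since
    the last regeneration (an assumed interpretation of the threshold).
    """
    if policy is UpdateFrequency.EVERY_MODIFICATION:
        return pending_events > 0
    if policy is UpdateFrequency.THRESHOLD:
        return pending_events >= threshold
    # ON_REQUEST: regenerate only when the user explicitly asks.
    return user_requested
```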

The options Information Selection and Select Tracing Item control the event filter. Finally, the value of the Information Level option controls which structural elements are activated in the macrostructure presented in Fig. 5.
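One possible reading of these options, sketched in Python. Everything specific here is an assumption: that Information Selection chooses kinds of modification, that Select Tracing Item restricts reporting to one object, and that each structural element has a minimum information level. The element names are invented placeholders and do not reproduce Fig. 5.

```python
# Hypothetical event records; the field names are assumptions.
events = [
    {"kind": "rotation", "target": "model", "initiator": "user"},
    {"kind": "transparency", "target": "skin", "initiator": "system"},
    {"kind": "emphasis", "target": "muscle", "initiator": "user"},
]


def filter_events(events, selected_kinds, traced_item=None):
    """Keep events of the selected kinds, optionally for one traced object."""
    kept = [e for e in events if e["kind"] in selected_kinds]
    if traced_item is not None:
        kept = [e for e in kept if e["target"] == traced_item]
    return kept


# Placeholder macrostructure: (structural element, minimum information level),
# listed in the linear order in which the elements would appear in a caption.
MACROSTRUCTURE = [("view", 1), ("emphasis", 1), ("transparency", 2), ("zoom", 3)]


def active_elements(information_level):
    """Return the structural elements activated at the given level."""
    return [name for name, min_level in MACROSTRUCTURE
            if min_level <= information_level]


reported = filter_events(events, {"rotation", "emphasis"})
elements = active_elements(2)
```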

The macrostructure defines the linear order of the structural elements and triggers their generation using classes of conditional templates. In this process, a template whose conditions hold is selected first. Then, values for the template variables are estimated. Finally, within the lexical mapping, phrases describing (numerical) values are determined.
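The three steps (template selection, variable estimation, lexical mapping) can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the generator described in the paper: the templates, the thresholds in lexicalize_angle, and the context keys "rotation" and "emphasized" are all invented for the example.

```python
def lexicalize_angle(degrees):
    """Lexical mapping: turn a numerical value into a phrase.

    The thresholds are hypothetical.
    """
    if degrees < 15:
        return "slightly"
    if degrees < 60:
        return "moderately"
    return "strongly"


# For each structural element, a class of (condition, template) pairs.
# The first template whose condition holds is used.
TEMPLATES = {
    "view": [
        (lambda ctx: ctx.get("rotation", 0) > 0,
         "The model has been {amount} rotated."),
        (lambda ctx: True,
         "The model is shown from the default view."),
    ],
    "emphasis": [
        (lambda ctx: bool(ctx.get("emphasized")),
         "The {emphasized} is emphasized."),
    ],
}


def generate_caption(macrostructure, ctx):
    """Walk the macrostructure in order and realize one sentence per element."""
    sentences = []
    for element in macrostructure:
        for condition, template in TEMPLATES.get(element, []):
            if condition(ctx):
                # Template variables: raw context values plus lexicalized ones.
                values = dict(ctx, amount=lexicalize_angle(ctx.get("rotation", 0)))
                sentences.append(template.format(**values))
                break  # only the first matching template is realized
    return " ".join(sentences)


caption = generate_caption(
    ["view", "emphasis"],
    {"rotation": 30, "emphasized": "left ventricle"},
)
```

An element with no matching template simply contributes nothing, which is how conditional elements drop out of a caption in this sketch.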

To sum up, the options in the configuration dialogue together with the predefined macrostructure control content determination, whereas the templates and the lexical mapping of template variables control the content realization.

We hope that this overview of the generation process makes the discussion in Section 7.2 and Section 7.3 somewhat clearer. Nevertheless, we will rewrite Section 7 in the final version.

4. I also wondered where the content rules or schemas came from. Did the developers derive them from a corpus analysis? From discussions with domain experts? From user trials with different rules or schemas? From some other source?

At the very beginning of this work, we analyzed the structure of figure captions in several anatomy textbooks, such as the Sobotta atlas (see [20] and [21] in the references of our paper). Moreover, the structure of other textbooks was studied in order to generalize the results from our analysis of anatomic figure captions. Furthermore, we asked the domain experts, i.e. undergraduates studying anatomy, what they expect to find in figure captions.

Recently, a first evaluation of the interaction techniques used by the ZOOM ILLUSTRATOR for the explanation of spatial phenomena was carried out ([4]). One goal was to evaluate whether undergraduates studying medicine find figure captions useful for a correct interpretation of anatomic illustrations generated interactively by the ZOOM ILLUSTRATOR.

On a scale from 0 (redundant) to 10 (indispensable), all 9 participants rated the usefulness between 9 and 10 (9.8 on average). No other feature of the ZOOM ILLUSTRATOR reached such ratings. Despite the small number of participants, this is a very impressive result, which clearly emphasizes the importance of figure captions, also in interactive environments.


[1] B. Preim, A. Ritter, Th. Strothotte, D. R. Forsey and L. Bartram. Consistency of Rendered Images and Their Textual Label. In: Proc. of CompuGraphics, pp. 201-210, Alvor, Portugal, December 1995.
[2] B. Preim, A. Ritter and Th. Strothotte. Illustrating Anatomic Models - A Semi-Interactive Approach. In: Visualization in Biomedical Computing, pp. 23-32, Hamburg, 22.-25. September 1996.
[3] Bernhard Preim, Andreas Raab and Thomas Strothotte. Coherent Zooming of Illustrations with 3D-Graphics and Textual Labels. In: Proc. of Graphics Interface, pp. 105-113, 19.-23. May 1997.
[4] Ian Pitt, Bernhard Preim and Stefan Schlechtweg. Evaluating Interaction Techniques for the Explanation of Spatial Phenomena. In: U. Arend, E. Eberleh and K. Pitschke (eds.), Software-Ergonomie '99 - Design von Informationswelten, pp. 275-286, Teubner Stuttgart Leipzig, Walldorf, 8.-11. March 1999.

Additional questions and answers will be added here.
To contribute, please click [send contribution] above and send your question or comment as an E-mail message.
For additional details, please click [debate procedure] above.
This debate is moderated by Elisabeth André.