A Template-Based Approach to Summarize XML Collections

Gudrun Fischer, Igor Jacy Lino Campista

Abstract

Existing summarization approaches for XML concentrate on extracting common structure and compressing the data in order to optimize storage and speed up queries. Neither compression nor structure extraction suffices for advanced, content-based summarization tasks. We present a set of tools for semi-automatic summarization of XML collections: the user specifies the semantically relevant features of a collection in a template and defines rules for summarization. The system assists the user in generating one or more such templates, selects the templates applicable to a given collection, and applies them for automatic summarization. In experiments on the INEX collection, among others, we investigate the merits and limitations of our approach.
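
To make the workflow described above concrete, the following is a minimal illustrative sketch, not the authors' implementation. The template layout (`root_tag`, `features`, `rule`), the rule names `count` and `top_values`, the applicability check, and the sample documents are all hypothetical assumptions introduced for illustration; only the general idea of a template that names relevant features and attaches summarization rules to them comes from the abstract.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical template: each feature pairs an ElementTree path
# with a summarization rule. This layout is an assumption, not the paper's format.
TEMPLATE = {
    "root_tag": "article",
    "features": [
        {"name": "authors", "path": ".//author", "rule": "top_values"},
        {"name": "sections", "path": ".//sec", "rule": "count"},
    ],
}

def template_applies(template, docs):
    """Assumed applicability test: every document has the expected root tag."""
    return all(doc.tag == template["root_tag"] for doc in docs)

def summarize(template, docs, top_n=2):
    """Apply each feature's rule across the whole collection."""
    summary = {}
    for feature in template["features"]:
        elements = [el for doc in docs for el in doc.findall(feature["path"])]
        if feature["rule"] == "count":
            # Total number of matching elements in the collection.
            summary[feature["name"]] = len(elements)
        elif feature["rule"] == "top_values":
            # Most frequent text values among matching elements.
            texts = Counter((el.text or "").strip() for el in elements)
            summary[feature["name"]] = texts.most_common(top_n)
        else:
            raise ValueError(f"unknown rule: {feature['rule']}")
    return summary

# Made-up two-document collection, for illustration only.
COLLECTION = [
    "<article><author>Smith</author><sec/><sec/></article>",
    "<article><author>Smith</author><author>Lee</author><sec/></article>",
]
docs = [ET.fromstring(xml) for xml in COLLECTION]

if template_applies(TEMPLATE, docs):
    print(summarize(TEMPLATE, docs))
```

In this sketch the summary is a plain dictionary keyed by feature name; a real system would presumably offer richer rules and a more careful template selection step than a root-tag comparison.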
