The semantics of frequent subgraphs: Mining and navigation pattern analysis

Bettina Berendt

Abstract

The search for frequent subgraphs is a useful extension of common approaches in Web mining. For example, it allows the study of revisitation patterns in Web usage and the discovery of richer navigation structures such as "landmarks" or "hubs" that serve to organize a user's conceptual map of a site or a part of the Web. Any use of graph structures in Web usage mining, however, should also take into account that it is essential to integrate background knowledge into the analysis, and that behaviour must be studied at different levels of abstraction. To capture these needs, we propose to use taxonomies in mining and to extend the standard notion of interestingness (frequency/support) by the notion of context-induced interestingness. The AP-IP mining problem then consists of finding all frequent abstract patterns and the individual patterns that constitute them and are therefore interesting in this context (even though they may be infrequent). The paper presents the AP-IP algorithm, which uses a taxonomy to search for the abstract and individual patterns. We also show that the search for label-abstracted but isomorphic subgraphs does not always give an accurate image of navigation strategies, and we develop a procedure for mining at the concept level to solve this problem. A case study of a real-life Web site shows the advantages of the proposed solutions.