Preprocessing of Object-Oriented Source Code for Code Retrieval

Jörg Rech

Abstract

Object-oriented source code is written in diverse programming languages and comes with documentation that follows miscellaneous standards, comments in individual styles, or associated test cases, all of which are hard to exploit through information retrieval or knowledge discovery techniques. Typically, the information about the object-oriented source code of a software system is distributed across several different sources, which makes processing complex. In this paper we describe the morphology of object-oriented source code and how we preprocess it to improve the retrieval of source code for further reuse. Results from two studies showed that the preprocessed index increases the precision of the search by at least 13% for queries encompassing a whole class and 33% for queries consisting of the class name.
