Swedish Institute of Computer Science
User-adaptive systems rely on modelling aspects of the user [Kobsa and Wahlster 1989]. There exist many approaches to user modelling, both knowledge-based and statistical. A common problem for all user modelling approaches is that they require bootstrapping. When the system is designed, the designers will not know how users will react to it and behave within it, and consequently, they cannot construct the appropriate ways for systems to adapt beforehand. In knowledge-based approaches to user modelling, the bootstrap problem appears as a serious design problem. The rules for adaptations cannot be formed or weighted unless the system has been put to test with users, and once in place, they will affect user behaviour so that they may no longer apply. This is what we have chosen to call the 'hen and egg problem' of bootstrapping system adaptivity.
The problem is to some extent overcome in purely statistically based systems such as collaborative filtering systems [Resnick and Varian 1997]. In a collaborative filtering system, there are no a priori rules for adaptations: instead, the system collects information about user behaviour for similar users, and aggregates this information in presenting recommendations to individuals. This turns the user modelling bootstrapping problem into one of selecting an aggregation algorithm with desirable machine learning properties [Breese et al. 1998]. But it does not entirely eliminate the hen and egg problem - it is instead turned into a run-time problem. New recommendation systems give poor recommendations, which in turn causes users to stop trusting them, or even to stop using them.
In this paper, we report an early study of a user-adaptive information filtering system that combines aspects of knowledge-based and statistically based user modelling in a way that allows for both design-time and run-time bootstrapping. This study is used as background for a discussion of appropriate methods for bootstrapping filter adaptivity.
In the EdInfo project, we have developed an approach to recommendation systems that aims at combining a knowledge-based approach to filtering with a collaborative/statistical approach [Waern et al. 1998, 1999]. The idea is to combine human expertise (an editor or information broker) with machine intelligence in order to achieve a high quality of the filtered information provided to the end user. The combination is achieved by using a keyword model for representing both users and filtered documents. Each user sets up a profile as a set of interesting terms, and documents are in the same way indexed by the editor with a set of interesting terms. A user receives all documents for which at least one keyword fits.
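The matching rule described above can be illustrated with a minimal sketch. It assumes that profiles and annotations are plain sets of already normalised keyword strings; the function names and the example identifiers are hypothetical and are not taken from the EdInfo implementation.

```python
def matches(profile, annotations):
    """Return True if the profile and the document share at least one keyword."""
    return not profile.isdisjoint(annotations)


def deliver(profile, documents):
    """Return the ids of all documents that the profile lets through.

    `documents` maps a document id to the set of keywords chosen by the editor.
    """
    return [doc_id for doc_id, keywords in documents.items()
            if matches(profile, keywords)]


# Hypothetical example data.
profile = {"user modelling", "agents"}
documents = {
    "cfp1": {"agents", "robotics"},
    "cfp2": {"databases"},
}
delivered = deliver(profile, documents)
```

Because a single shared keyword is enough, a filter of this kind becomes more inclusive with every term added to the profile.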
Human expertise is used to select the set of keywords. This can be done while initializing the system, but will also be done during execution of the system. The set of keywords used in profiles and annotations is not fixed, but open to alteration by both end readers and editors. Machine intelligence is used to suggest changes to user profiles. The system maintains a candidate profile in terms of a vector associating weights to terms, which is used to provide suggestions to the user about how to improve on the current filter. (Details on the adaptation model and the underlying algorithms can be found in [Waern et al. 1998].) In the next version of the system, machine intelligence will also be used to suggest keyword annotations to new documents to editors.
Since both the human and the machine intelligence are deployed at run time, it is potentially possible to tune the adaptation model (the way the candidate profile is maintained) either at design time or at run time. Run-time testing of a system leads to other problems, however, in particular in providing all experiment subjects with identical conditions for the testing. Here, we report on an experiment where an evaluative study of the system was combined with gathering data for bootstrapping the adaptivity of the system. In such a setting, it is highly desirable that the experiment subjects are faced with conditions that are as close to identical as possible.

Figure 1. The window used to inspect and modify the user's individual profile.
The ConCall [Waern et al. 1998, 1999] system is a call-for-papers/participation (CFP) filtering service, built on the EdInfo approach to filtering systems. In this service, users are presented with selections of calls for papers and participation for conferences, based on their individual profiles of interests. ConCall provides the users with a candidate profile, which is a list of suggestions for keywords that the system thinks should suit the user. The candidate profile is based on implicit feedback from the user's behaviour in dealing with presented CFPs. If the user saves or sets up a reminder on a particular call, this call is judged as interesting to the user. If he or she instead deletes it, it is judged as uninteresting. A screen dump from the tested version of ConCall is shown in figure 1. This view shows the window in which users can inspect their filter and review system suggestions for changes to the filter.
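As an illustration only, the following sketch shows one way implicit feedback could drive a weighted candidate profile. It is not the adaptation model of ConCall, which is described in [Waern et al. 1998]: the feedback values, the additive update and all names below are assumptions made for the example.

```python
from collections import defaultdict

# Assumed feedback values. The text only states that saving or setting a
# reminder marks a call as interesting and deleting marks it as uninteresting.
FEEDBACK = {"save": 1.0, "remind": 1.0, "delete": -1.0}


def update_candidate(weights, action, call_keywords):
    """Shift the weight of every keyword on the call by the action's value.

    Actions without an assumed value (for example just viewing a call)
    leave the weights unchanged.
    """
    delta = FEEDBACK.get(action, 0.0)
    for keyword in call_keywords:
        weights[keyword] += delta
    return weights


def suggest_additions(weights, profile, limit=3):
    """Suggest positively weighted keywords that are not yet in the profile."""
    candidates = [(kw, w) for kw, w in weights.items()
                  if w > 0 and kw not in profile]
    # Highest weight first; alphabetical order breaks ties deterministically.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in candidates[:limit]]


# Hypothetical session.
weights = defaultdict(float)
update_candidate(weights, "save", {"agents", "hci"})
update_candidate(weights, "remind", {"agents"})
update_candidate(weights, "delete", {"databases"})
suggestions = suggest_additions(weights, profile={"hci"})
```

In this toy session only "agents" is suggested: "hci" is already in the profile, and "databases" has a negative weight.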
The study performed had several objectives. Since this was the first study performed on the implemented system, we aimed at evaluating the general functionality and interface as part of the study. The evaluation has been reported in [Averman forthcoming, Waern et al. forthcoming] and is not further discussed here. In addition to this evaluation, we also aimed at collecting data, in the form of logs and user profiles, that would later be used to tune the adaptation algorithms for user profiles, as well as to provide future editors with a suitable collection of keywords to start annotating with.
There were 11 subjects in the experiment. They all had extensive experience with computers and most of the subjects (eight out of eleven) read and handled CFPs on a regular basis.
The test was conducted in three steps. The first step was a 10-minute part where the test participant was asked two interview questions and then asked to fill out a questionnaire. Then ensued the actual running of and interaction with the system, taking approximately 20 to 30 minutes. The test run was followed by a part where the participant got a paper stack of conference calls, containing all the conference calls within the database. The participant was asked to sort through the calls and indicate which, if any, calls they would have liked to have seen, i.e. any of interest regardless of whether they had been presented to them during the hands-on testing of the system. This information was later used to estimate how well the system performed in terms of precision (how large a part of the retrieved calls were relevant to the user) and recall (how large a part of the relevant calls were retrieved) [Van Rijsbergen 1979]. The last part of the study was a questionnaire in which 10 questions, evaluating post-reactions and possible extensions to the system, were put to the user.
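The two measures can be computed directly from the set of calls the system presented and the set of calls the participant marked in the paper stack. The sketch below assumes both are sets of call identifiers; returning 0.0 when a set is empty is a convention chosen for the example, not something the study specifies.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of call identifiers.

    precision = |retrieved and relevant| / |retrieved|
    recall    = |retrieved and relevant| / |relevant|
    An empty denominator yields 0.0 (an assumed convention).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four calls were presented, three were marked as relevant.
retrieved = {"a", "b", "c", "d"}
relevant = {"a", "b", "e"}
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four presented calls are relevant, and two of the three relevant calls were presented.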
In addition to the information collected by the editor, the system logged all relevant user actions (save, delete, remind, look at full call), as well as changes to user profiles. The intention is that this information will later be used in tuning adaptive functionality as well as comparing different algorithms for user adaptation.
From user comments and questionnaire answers, we could conclude that there was a true need for the envisioned service. Users expressed a wish for support in three different tasks: one, a reminder service, two, a filtering service, and three, to be able to sort CFPs into categories. The tested prototype supports the first two of these functions.
The candidate profile was received with mixed responses. The suggestions presented in the candidate profile sometimes agreed with the user’s area of interest, but it also came back with suggestions that were of no relevance, or once or twice even with something contradicting the user's interests. This effect can be attributed to the fact that the adaptivity was not tuned, both in terms of what features were used, and in terms of how the user modelling algorithms updated the weights for features. Despite this behaviour, the suggestions were mostly welcomed by the participants. The use of the candidate profile suggestions was quite low, but this can be attributed to the experimental situation: subjects were preoccupied with examining their results from the profile filtering, and few went back and changed their profile at all.

Figure 2. Precision, recall and number of filter items per user in the ConCall study.
The effects of the experimental situation were most clear when reviewing filter performance in terms of precision and recall. On average both precision and recall were very low in this study. However, the results from the study show that it was possible to achieve good performance with this type of filter. It is useful to compare the results for users number seven and four (see figure 2). User number seven achieved fairly good results in both precision and recall, whereas the results were poor for user number four. These users were similar in many other respects: both entered a fairly low number of terms into their profiles (14 and 10 terms respectively), and both changed their profiles more than the other participants. The difference lay entirely in their motivation and simulation of a real usage situation: user number seven was highly motivated, and simulated as near a real-life situation as this test could make possible. User number four mostly flicked through the system and did not bother about being as sincere in using it. Another difference was that user number seven had more experience in his/her field when it came to handling calls for papers.
The low figures in terms of precision and recall show that the profiles obtained during the study actually were bad profiles. This poses a serious problem for using this data in bootstrapping adaptivity, as the profiles that the system should generate are still largely unknown. However, it is possible to instead generate optimal profiles 'backwards', from the ratings of documents by users. Instead of using the profiles that users themselves constructed, we can construct a profile for the user that best fits the calls that they judged as relevant, and use these profiles as the optimal target for adaptive behaviour of the system.
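How such a 'backwards' profile is to be constructed is not fixed here. One possible approach, sketched below purely as an assumption, is a greedy search that keeps adding the keyword that most improves the F1 score (the harmonic mean of precision and recall) of the resulting filter against the user's relevance judgements. The choice of F1 as the target and all names in the sketch are hypothetical.

```python
def f1(retrieved, relevant):
    """Harmonic mean of precision and recall; 0.0 when nothing relevant is retrieved."""
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def retrieve(profile, documents):
    """Ids of the documents sharing at least one keyword with the profile."""
    return {doc_id for doc_id, keywords in documents.items()
            if not profile.isdisjoint(keywords)}


def fit_profile(documents, relevant):
    """Greedily build the keyword profile that best fits the relevance ratings."""
    vocabulary = sorted(set().union(*documents.values()))
    profile, best = set(), 0.0
    while True:
        scored = [(f1(retrieve(profile | {kw}, documents), relevant), kw)
                  for kw in vocabulary if kw not in profile]
        if not scored:
            break
        # max() returns the first maximal entry, so ties go to the
        # alphabetically first keyword.
        score, keyword = max(scored, key=lambda item: item[0])
        if score <= best:
            break  # no remaining keyword improves the fit
        profile.add(keyword)
        best = score
    return profile


# Hypothetical annotated calls and one user's relevance judgements.
documents = {
    "c1": {"agents", "hci"},
    "c2": {"agents"},
    "c3": {"databases"},
    "c4": {"hci", "databases"},
}
target = fit_profile(documents, relevant={"c1", "c2"})
```

In this toy example the single keyword "agents" already retrieves exactly the relevant calls, so the search stops after one step. A greedy search is not guaranteed to find the best possible profile in general.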
Since the adaptive functionality of ConCall is merely advisory (users set up their actual filters manually), it is possible to tune ConCall entirely at run time by data mining logs from actual usage. An attractive feature of moving the bootstrap problem into a run-time setting is that the actual usage of the system will determine its adaptivity, rather than deducing adaptivity rules from a more or less faked experimental situation. As was seen in the ConCall study, the experimental situation itself led to undesirable effects. In particular, the experiment subjects were not prone to set up their filters in a way that was consistent with their actual preferences - instead, they played around with the system, experimenting with its capabilities and functionality.
The possible drawback of run-time bootstrapping is that the system will give very bad advice initially, making users less prone to trust the system's advice later, when it is actually making sense. This is illustrated well by one participant of the study, who became extremely frustrated by the system's suggestions. In a commercial system, it is advisable to do this initial testing with a smaller group of test users rather than with the full user community. It should be noted that such tuning can very well be combined with an evaluative study of the system in the same way as was done in the ConCall study. For example, a subset of users can be selected for the experiment, and given an evaluative questionnaire as well as a full set of documents to rate, to allow a precision and recall evaluation of the running system.
One problem that we encountered in this study was how to achieve annotations for documents when the system was not in use. This is again a version of the bootstrapping problem, but this time from a human rather than a machine intelligence perspective. We used six different experts to collect the CFPs used in the experiment. These experts selected calls for the database and provided annotations for them. Many of them expressed difficulties in selecting annotations for CFPs. When reviewing these calls, it was clear that the experts used very different strategies for annotations, and that there was some variation in how the same annotations were worded. In the experiment, we did some harmonization of the wording of annotations, but otherwise the annotations were left as they were stated by the experts. In the end, we obtained a large range of keywords that were sparsely used, and this caused problems both for the machine intelligence (in maintaining the candidate user profiles) and for end users in setting up their filters. The problem can partially be overcome by using a single editor for all call annotations, but it can only be fully overcome over time, by synchronously bootstrapping editor annotations and user profiles.
For comments or questions contact: http://www.sics.se/~annika/
Charlotte Averman. Using "Human-in-the-loop" in an Adaptive System: An Evaluation Study of the ConCall System. Master thesis, Dept. of Informatics, Gothenburg University, forthcoming, March 1999.
John S. Breese, David Heckerman, and Carl Kadie. Empirical Analysis of Predictive Algorithms for Collaborative Filtering. May 1998 (revised October 1998). In: Proceedings of the Fourteenth Conference on Uncertainty in AI, Madison, WI, July 1998. Morgan Kaufmann Publisher.
Alfred Kobsa and Wolfgang Wahlster, eds. User Models in Dialog Systems. Springer Verlag, 1989.
Paul Resnick and Hal R. Varian. Recommender Systems. Introduction to special issue, Communications of the ACM, Vol. 40, No. 3, March 1997.
C.J. Van Rijsbergen. Information Retrieval, Second Ed., ISBN 0-408-70929-4, Butterworths, (1979).
Waern, Annika, Tierney, Mark, Rudström, Åsa and Laaksolahti, Jarmo. ConCall: An information service for researchers based on EdInfo. SICS technical report T98:04, October 1998.
Waern, Annika, Tierney, Mark, Rudström, Åsa, Laaksolahti, Jarmo and Mård, Torben. ConCall: Edited and Adaptive Information Filtering. Proc. of the Intelligent User Interfaces Conference, Los Angeles, Cal. January 1999.
Annika Waern, Charlotte Averman, Mark Tierney and Åsa Rudström. Information Services Based on User Profile Communication. In: Proceedings of the Seventh International Conference on User Modelling, Banff, Canada, forthcoming June 1999.