Bias-free Hypothesis Evaluation in Multirelational Domains

Christine Körner, Stefan Wrobel

Abstract

In propositional domains, estimating error on a separate test set obtained via random sampling or cross-validation is generally considered to yield an unbiased estimate of true error. In multirelational domains, previous work has already noted that linkage of objects may cause these procedures to be biased, and has proposed corrected sampling procedures. However, as we show in this paper, the existing procedures address only one particular case of bias introduced by linkage. We recall that, in the propositional case, cross-validation measures off-training-set (OTS) error rather than true error, and we illustrate the difference with a small experiment. In the multirelational case, we show that the distinction between training and test set must be carefully extended, based on a graph of potentially linked objects and on their assumed probabilities of reoccurrence. We demonstrate that the bias due to linkage to known objects varies with the chosen proportion of the training/test split, and we present an algorithm, generalized subgraph sampling, that is guaranteed to avoid bias in the test set in these more general cases.
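
To make the linkage issue concrete, the following is a minimal illustrative sketch, not the paper's generalized subgraph sampling algorithm itself, of a linkage-aware split: whole connected components of the object-linkage graph are assigned entirely to either training or test, so that no test object is linked to a known training object. All names here (`subgraph_split`, `connected_components`, the `objects`/`links` inputs, `test_fraction`) are hypothetical choices for this sketch.

```python
import random
from collections import defaultdict

def connected_components(objects, links):
    """Group objects into connected components of the linkage graph
    using a simple union-find structure."""
    parent = {o: o for o in objects}

    def find(x):
        # Path-halving lookup of the component representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(list)
    for o in objects:
        groups[find(o)].append(o)
    return list(groups.values())

def subgraph_split(objects, links, test_fraction=0.3, seed=0):
    """Assign whole linked components to train or test, so no test
    object is linked to any training object."""
    rng = random.Random(seed)
    components = connected_components(objects, links)
    rng.shuffle(components)

    train, test = [], []
    target = test_fraction * len(objects)
    for comp in components:
        # Fill the test set with whole components until the target
        # size is reached; everything else goes to training.
        if len(test) < target:
            test.extend(comp)
        else:
            train.extend(comp)
    return train, test

# Example: objects 1-2-3 are linked and end up on the same side.
train, test = subgraph_split([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3)])
```

A plain random split over the objects would, by contrast, routinely place linked objects on both sides of the split, which is the source of the bias the abstract describes.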