Distributed Pattern Recognition in RapidMiner

Alexander Arimond, Christian Kofler, Faisal Shafait

In: RapidMiner Community Meeting and Conference (RCOMM-10), September 13-16, Dortmund, Germany. Online 9/2010.


RapidMiner already provides easy-to-use interfaces for developing and evaluating pattern recognition and machine learning applications. However, it offers only limited support for parallelization and lacks functionality to spread long-running computations over multiple machines. One solution is distributed computing with paradigms such as MapReduce. In this paper, we present a system called DisPaRe, which integrates distributed computing frameworks into RapidMiner, with a special focus on MapReduce as a programming model. The frameworks GridGain and Oracle Coherence are reviewed and evaluated with respect to how well they fit into the context of RapidMiner. The system provides effective means for transparently utilizing these frameworks, enabling RapidMiner processes to parallelize their computations within a distributed environment.
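To illustrate the MapReduce programming model mentioned above, here is a minimal, hypothetical sketch of evaluating a classifier's accuracy in MapReduce style. It is not DisPaRe's code and does not use the GridGain or Oracle Coherence APIs: the function names (`split`, `map_task`, `reduce_task`, `distributed_accuracy`) are invented for illustration, and a local thread pool stands in for the remote nodes to which a distributed framework would dispatch the map tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split(data, n_parts):
    """Split the input into at most n_parts contiguous chunks, one per worker."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_task(chunk):
    """Map phase: count correct predictions in one chunk of (predicted, actual) pairs."""
    correct = sum(1 for predicted, actual in chunk if predicted == actual)
    return (correct, len(chunk))


def reduce_task(a, b):
    """Reduce phase: merge two partial (correct, total) results."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_accuracy(pairs, n_workers=4):
    """Hypothetical driver: threads stand in for the cluster nodes of a real framework."""
    if not pairs:
        raise ValueError("no predictions to evaluate")
    chunks = split(pairs, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_task, chunks))
    correct, total = reduce(reduce_task, partials)
    return correct / total
```

For example, `distributed_accuracy([(1, 1), (0, 1), (1, 1), (0, 0)], n_workers=2)` splits the four predictions into two chunks, maps them to the partial results `(1, 2)` and `(2, 2)`, and reduces these to an accuracy of `0.75`. Because the partial results are combined with an associative operation, the order in which the workers finish does not affect the final value.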



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence