Comparative Study between Traditional Machine Learning and Deep Learning Approaches for Text Classification

Cannannore Nidhi Narayana Kamath, Syed Saqib Bukhari, Andreas Dengel

In: DocEng. ACM Symposium on Document Engineering (DocEng-2018), August 28-31, Halifax, Nova Scotia, Canada. ACM, 2018.


Any organization that works with documents eventually faces the tedious task of classifying large volumes of them, which is an early step toward information retrieval and data mining. Classifying so many documents into multiple classes by hand demands considerable time and labor, so a system that classifies them with acceptable accuracy would be of great help in document engineering. We built multiple classifiers for document classification and compared their accuracy on raw and processed data. For the comparison we collected data used in a corporate organization as well as publicly available data. The data is processed by removing stop-words and applying stemming to reduce words to their roots. Several traditional machine learning techniques, namely Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest classifier, and Multi-Layer Perceptron, are used to classify the documents. The classifiers are applied to raw and processed data separately and their accuracy is recorded. In addition, a deep learning technique, the Convolutional Neural Network, is used to classify the data, and its accuracy is compared with that of the traditional machine learning techniques. We are also exploring hierarchical classifiers for classifying classes and subclasses. The system classifies the data faster and with better accuracy than manual classification. The results are discussed in the results and evaluation section.
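
The raw-versus-processed comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping `stem` function (a crude stand-in for a real stemmer such as Porter's), the small `NaiveBayes` class, and the toy corpus are all assumptions made for illustration, using only the Python standard library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is",
              "of", "on", "the", "to", "was", "with"}

# Crude suffix stripping; an assumed stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (raw data)."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Remove stop-words, then stem the remaining tokens (processed data)."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


# Toy corpus (invented for this sketch, not the paper's data).
train = [
    ("The invoice was paid on time", "finance"),
    ("Paying invoices and billing customers", "finance"),
    ("The server crashed during the update", "tech"),
    ("Updating servers and fixing crashes", "tech"),
]
test = [
    ("billed the customer for an invoice", "finance"),
    ("server updates are crashing", "tech"),
]
train_y = [y for _, y in train]
test_y = [y for _, y in test]

# Train and evaluate the same classifier on raw and on processed tokens.
raw_model = NaiveBayes().fit([tokenize(x) for x, _ in train], train_y)
proc_model = NaiveBayes().fit([preprocess(x) for x, _ in train], train_y)
raw_acc = accuracy(raw_model, [tokenize(x) for x, _ in test], test_y)
proc_acc = accuracy(proc_model, [preprocess(x) for x, _ in test], test_y)
print("raw accuracy:", raw_acc)
print("processed accuracy:", proc_acc)
```

A corpus this small says nothing about which variant is better in practice. The sketch only shows the shape of the experiment: the same classifier is trained and scored twice, once on raw tokens and once on stop-word-filtered, stemmed tokens.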

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence