Exploring Deployment of Linguistic Features in Classification of Polish Texts

Jakub Piskorski, Marcin Sydow

In: Z. Vetulani (ed.). 2nd Language and Technology Conference. International Language Technologies Conference (IS-LTC), Poznan, Poland, pages 81-84, 4/2005.


This paper reports on preliminary experiments in deploying linguistic features for the classification of Polish texts. In particular, we explore the impact of lemmatization and of various term-selection strategies that rely on the inclusion and exclusion of certain named-entity classes. A slight improvement over the bag-of-words approach can be observed, but there is still considerable room for improvement.

