DFKI-LT - A Corpus Study and Annotation Schema for Named Entity Recognition of Business Products
Proceedings of the 11th International Conference on Language Resources and Evaluation, Miyazaki, Japan, European Language Resources Association, 2018
Recognizing non-standard entity types, such as B2B products and product classes, in news and forum texts is important in application areas such as supply chain monitoring and market research. However, there is a decided lack of annotated corpora and annotation guidelines in this domain. In this work, we present a corpus study, an annotation schema and associated guidelines, and a preliminary corpus for English B2B product named entity recognition. We find that although product mentions are often realized as noun phrases, defining the exact extent of a mention is difficult due to high boundary ambiguity and the broad syntactic and semantic variety of surface realizations of products.