Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Valia Kordoni; Carlos Ramisch; Aline Villavicencio
Association for Computational Linguistics, Portland, Oregon, USA, 6/2011.
Abstract
The ACL 2011 Workshop on Multiword Expressions: from Parsing and Generation to the Real World
(MWE 2011) took place on June 23, 2011 in Portland, Oregon, USA, in conjunction with the 49th Annual
Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT
2011). The workshop has been held every year since 2003 in conjunction with ACL, EACL, COLING
and LREC.
Multiword Expressions (MWEs) range over linguistic constructions such as idioms (a frog in the throat,
kill some time), fixed phrases (per se, by and large, rock 'n' roll), noun compounds (telephone booth,
cable car), compound verbs (give a presentation, go by [a name]), etc. While easily mastered by native
speakers, their interpretation poses a major challenge for computational systems, due to their flexible
and heterogeneous nature. Surprisingly enough, MWEs are not nearly as frequent in NLP resources
(dictionaries, grammars) as they are in real-world text, where they have been reported to account for
over 70% of the terms in a domain. Thus, MWEs are a key issue and a current weakness for tasks
like Natural Language Parsing (NLP) and Generation (NLG), as well as real-life applications such as
Machine Translation.
MWE 2011 is the 8th event in the series, and the time has come to move from basic preliminary research
and theoretical results to actual applications in real-world NLP tasks. Therefore, continuing the
trend of previous MWE workshops, we have now turned our focus towards MWEs in NLP applications,
specifically towards Parsing and Generation of MWEs, as there is a wide range of open problems that
prevent MWE treatment techniques from being fully integrated in current NLP systems. We have thus asked
our contributors for original research related (but not limited) to the following topics:
Lexical representations: In spite of several proposals for MWE representation ranging along
the continuum from words-with-spaces to compositional approaches connecting lexicon and
grammar, to date, it remains unclear how MWEs should be represented in electronic dictionaries,
thesauri and grammars. New methodologies that take into account the type of MWE and its
properties are needed for efficiently handling manually and/or automatically acquired expressions
in NLP systems. Moreover, strategies are also needed to represent deep attributes and semantic
properties for these multiword entries.
Task and Application-oriented evaluation: Evaluation is a crucial aspect for MWE research.
Various evaluation techniques have been proposed, from manual inspection of top-n candidates to
classic precision/recall measures. However, to get a clear indication of the effect of incorporating
a treatment of MWEs in a particular context, task and application-oriented evaluations are needed.
We have thus called for submissions that study the impact of MWE handling in the context of
Parsing, Generation, Information Extraction, Machine Translation, Summarization, etc.
Type-dependent analysis: While there is no unique definition or classification of MWEs,
most researchers agree on some major classes such as named entities, collocations, multiword
terminology and verbal expressions. These, though, are very heterogeneous in terms of syntactic
and semantic properties, and should thus be treated differently by applications. Type-dependent
analyses could shed some light on the best methodologies to integrate MWE knowledge in our
analysis and generation systems.
MWE engineering: Where do MWEs go after being extracted? Do they belong to the lexicon
and/or to the grammar? In the pipeline of linguistic analysis and/or generation, where should we
insert MWEs? And even more important: HOW? Because all the effort put in automatic MWE
extraction will not be useful if we do not know how to employ these rich resources in our real-life
NLP applications!
This year, we had three different submission types: long, short and demonstration papers. We received
a total of 31 submissions, of which 16 were long papers, 9 were short papers and 6 were demo
papers. Given our limited capacity as a one-day workshop, we were only able to accept 6 long papers
for oral presentation and 4 long papers as posters: an acceptance rate of 62.5%. We further accepted 4
short papers for oral presentation and 2 short papers as posters (67% acceptance), as well as 5 out of
the 6 proposed demonstrations. The oral presentations were distributed in three sessions: Short Papers,
Identification and Representation, and Tasks and Applications. The workshop also featured two invited
talks, by Timothy Baldwin and by Kenneth Church, and a panel discussion.
We would like to thank the members of the Program Committee for their timely reviews. We would also
like to thank the authors for their valuable contributions.
Valia Kordoni, Carlos Ramisch, Aline Villavicencio
Co-Organizers
@book{pub5662,
author = {
Kordoni, Valia
and
Ramisch, Carlos
and
Villavicencio, Aline
},
title = {Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World},
year = {2011},
month = {6},
publisher = {Association for Computational Linguistics}
}