DeepThought Consortium

Introduction to the Project Idea

A novel combination of deep and shallow processing methods is used to extend
classical information retrieval with high-precision concept indexing and relation
detection. Both extensions will exploit domain knowledge and a highly specialized
deep semantic analysis.

The project combines three goals:

1. To perform pioneering research with the aim of demonstrating the potential
of deep semantic processing when combined with shallow methods for robustness.

2. To demonstrate the feasibility of three ambitious applications on the basis
of the novel approach.

3. To lay out a road map for an even more ambitious interdisciplinary endeavour
within the 6th Framework Programme building on the results of this and related
projects.

The need for deep semantic processing of language has always been obvious. The
challenges of the knowledge society cannot be met without getting at the contents
of the vast volume of digital information. The concept of a semantic web is
a viable vision; hoping, however, that the semantic structuring of such large
volumes of unstructured information can be achieved by human authors or editors,
is rather naive. There is no chance to circumvent the challenge of true semantic
analysis on the way to the knowledge society.

So far, all attempts to accomplish a full semantic analysis of unrestricted
texts have failed. It has been impossible to meet the minimum requirements for
efficiency, robustness and coverage: at most one of these desiderata could
be satisfied at a time. Earlier attempts to process texts with large-coverage
deep-analysis grammars suffered from a lack of efficiency. Even the extension
and testing of large grammars turned out to be unfeasible because of the efficiency
problem. It was not until very recently that some members of this consortium
could report a jointly achieved breakthrough in deep syntactic and semantic
analysis.

By exploiting cross-fertilization in an international collaboration, the first
HPSG (head-driven phrase structure grammar) parser fast enough for real-time
speech applications was put into service. This parser is already employed in
several projects, among them industrial developments. In a large speech translation
project, grammars for English, German and Japanese were developed that combined
coverage and depth. This still left the robustness issue unsolved. One partner
in our consortium experimented with improving robustness through partial syntactic/semantic
analysis. Another partner approached the problem by combining deep processing
with shallow processing in such a way that the robustness of the hybrid system
could not be worse than the robustness of the shallow components.

The novel idea was to preserve the advantages of shallow processing while adding
more accuracy and depth in a controlled fashion at places where the application
has a real demand for such increase in semantic analysis. The goal is the detection
of relevant types of information, not full text understanding. Shallow processing
enriches a text with XML annotations (POS, phrases, named entities, simple relations).
Deep processing is only called at places where shallow analysis hypothesizes
relevant relations but cannot detect or select the correct relations. This approach
has important advantages. Robustness is maintained. The necessary coverage can
be provided by adding specialized deep grammars for the relevant domains and
semantic relations to the full-coverage shallow grammars. Efficiency is guaranteed
by adding fast deep processing to the super-fast shallow analysis only at places
where it is needed and where it has a fair chance.
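The control flow described above can be sketched roughly as follows. All class and function names here are hypothetical placeholders invented for illustration, not actual project code; the point is only that deep analysis is invoked selectively, on spans the shallow components flag but cannot resolve.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Annotation:
    """One shallow, XML-style annotation over a text span (hypothetical structure)."""
    start: int
    end: int
    label: str                      # e.g. POS tag, phrase type, named-entity class
    relation: Optional[str] = None  # resolved semantic relation, if any
    ambiguous: bool = False         # a relation was hypothesized but not selected

def shallow_annotate(text: str) -> List[Annotation]:
    """Stand-in for the fast shallow pipeline (tokenizer, POS tagger, chunker, NE finder)."""
    # The real system emits XML annotations; here we fake one unresolved span.
    return [Annotation(0, len(text), "sentence", ambiguous=True)]

def deep_analyze(span_text: str, ann: Annotation) -> str:
    """Stand-in for the specialized deep (HPSG/MRS) analysis of a single span."""
    return "acquires(CompanyA, CompanyB)"  # purely illustrative output

def hybrid_process(text: str) -> List[Annotation]:
    """Run shallow analysis everywhere; call deep analysis only where it is needed."""
    annotations = shallow_annotate(text)
    for ann in annotations:
        if ann.ambiguous and ann.relation is None:
            # Deep processing fires only on spans where the shallow components
            # hypothesize a relevant relation but cannot select the right one.
            ann.relation = deep_analyze(text[ann.start:ann.end], ann)
    return annotations
```

Because the shallow annotations always cover the whole text, robustness is preserved even when the deep component fails or is never triggered.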

Four types of technologies are combined:

1. statistical methods for tokenization, IR indexing and search, POS-tagging
and chunk parsing

2. wordnets and domain ontologies for conceptual indexing and detection of ambiguity
and polysemy

3. weighted finite-state transduction technology for named entity and simple
relation detection

4. HPSG parsing with minimal recursion semantics (MRS) for the detection of
complex and covert relations
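To illustrate the third technology, a weighted finite-state transducer can be reduced to a table of arcs mapping (state, input class) pairs to (next state, output label, weight) triples. The states, labels, token classes and weights below are invented for this toy example and bear no relation to the project's actual grammars; in a real system the weights would rank competing analyses, while this deterministic toy merely accumulates them.

```python
# Toy weighted finite-state transducer for named-entity labelling.
# Arcs: (state, input_token_class) -> (next_state, output_label, weight).
# Lower weight = better; all entries are illustrative only.
ARCS = {
    ("S",  "cap"):   ("NE", "B-ORG", 0.5),  # capitalized token may open an org name
    ("S",  "other"): ("S",  "O",     0.1),
    ("NE", "cap"):   ("NE", "I-ORG", 0.3),  # continuation of the org name
    ("NE", "other"): ("S",  "O",     0.1),
}

def token_class(tok: str) -> str:
    """Map a token to one of the two input classes used by the toy arc table."""
    return "cap" if tok[:1].isupper() else "other"

def transduce(tokens):
    """Follow the single applicable arc per token (this toy FST is deterministic)."""
    state, labels, cost = "S", [], 0.0
    for tok in tokens:
        state_next, label, weight = ARCS[(state, token_class(tok))]
        labels.append(label)
        cost += weight
        state = state_next
    return labels, cost
```

Running `transduce(["Acme", "Corp", "hired", "staff"])` tags the first two tokens as an organization and the rest as outside material, together with the accumulated path weight.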

We have now almost reached a point where this type of hybrid processing can
be applied in realistic applications. As the breakthrough in efficiency could
only be achieved through cooperation among a few well-suited centres, the next step again
requires international collaboration. Research labs with extensive experience
in deep and shallow processing are needed to improve the hybrid technology and
apply it to plausible domains and tasks. Only very few innovative companies
possess the necessary competence and market orientation for the adoption and
exploitation of the new technology. Two of those few have joined our consortium.

The research, including the creation of the necessary grammars and lexicons, has
three main themes:

1. Integration of Shallow and Deep Processing and Representations

2. Dynamic Stochastic HPSG for Disambiguation and Robustness

3. Multilingual HPSG Grammar Engineering

Three knowledge-intensive application types have been identified that will benefit
immensely from the increased depth in semantic analysis. All three applications
will work on semantically annotated texts:

1. Information extraction for business intelligence

2. Email response management for customer relationship management

3. Creativity support for document production and collective brainstorming

All three applications are innovative. Information extraction and automatic
email response have been tried before but were seriously hampered by the lack
of semantic analysis. Completely new is the use of semantic information retrieval
to support creative processes, which opens up a range of truly novel applications
with unprecedented functionality. Combined with speech recognition, they provide
valuable IT support for real and virtual meetings.

The project participants regard the project as one important step towards a
new generation of powerful knowledge-intensive applications, integrating text
and speech processing with knowledge and web technologies. In order to cooperate
with other projects and initiatives in this exciting endeavour and to plan for
a joint major technology push in the Sixth Framework Programme, a special workpackage
is proposed for technology forecasting and planning. In this workpackage, a roadmap
will be drawn up for a large interdisciplinary initiative or project that unites
several existing or new developments under a common goal and work programme.

To this end a workshop is planned that will bring together experts from several
academic fields and engineering disciplines as well as representatives from
several industrial sectors for a creative brainstorming exercise. Software for
constructing, displaying and editing the roadmap will be taken over from another
project. The results of the roadmap workpackage will also be fed as specialized
input into the broader roadmap activities of the European Network of Language
and Speech (ELSNET).