In the TWENTYONE project, environmental organisations, technology providers and research institutes from various European countries are working together to make documents on environmental issues, in particular on the subject of sustainable development, available on CD-ROM and on the Internet. To date these documents exist in a number of different media (paper, electronic documents, audio-visual material), formats (HTML, Word, PageMaker) and languages, and are often not available through standard publication channels. This diversity of media, formats and languages impedes both the distribution of the documents and targeted searching for and retrieval of the required material.

As a result of the TWENTYONE project, all of these types of documents can be made available on CD-ROM and on the Internet. Search engines are being developed to locate information, while automatic translation systems are employed to aid the comprehension of foreign-language texts. Authors and editors benefit economically from the wider distribution of their documents, while users benefit from easier access to the documents and their contents.
- Documents which are available on paper rather than as electronic documents are being scanned in and converted.
- The document structure is automatically analysed and the content of the text disclosed via OCR (Optical Character Recognition) and linguistic processing.
- The use of fuzzy matching during the search process makes the system robust against spelling mistakes in documents and queries, and against OCR errors.
- As a result of linguistic analysis of the texts it is possible to search not only for individual words, but also for relevant phrases (e.g. "sustainable development").
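
The fuzzy matching and phrase search described in the list above can be illustrated with a minimal sketch. The project text does not say which matching technique is actually used. The example below assumes plain Levenshtein edit distance on word tokens. The function names (`edit_distance`, `fuzzy_lookup`, `find_phrase`) and the distance thresholds are hypothetical and chosen only for illustration.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_lookup(query, index_terms, max_distance=2):
    """Return index terms within max_distance edits of the query,
    closest first. The threshold is an assumed, illustrative value."""
    query = query.lower()
    scored = [(edit_distance(query, term.lower()), term) for term in index_terms]
    return [term for dist, term in sorted(scored) if dist <= max_distance]


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def find_phrase(text, phrase, max_distance=1):
    """Return token positions where every word of the phrase matches
    the text within max_distance edits per word."""
    words = tokenize(text)
    target = tokenize(phrase)
    if not target:
        return []
    hits = []
    for start in range(len(words) - len(target) + 1):
        window = words[start:start + len(target)]
        if all(edit_distance(w, t) <= max_distance for w, t in zip(window, target)):
            hits.append(start)
    return hits


# A typical OCR confusion: the letter "l" read as the digit "1".
terms = ["sustainable", "development", "environment"]
print(fuzzy_lookup("sustainab1e", terms))
print(find_phrase("Local plans for sustainab1e deve1opment", "sustainable development"))
```

Comparing every query against every index term is only practical for small vocabularies. It is used here to keep the sketch short, not as a claim about how the TWENTYONE search engines are built.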
Project partners: Getronics Software NL (Co-ordinator), Environ Trust Ltd UK, Highland Software UK, Friends of the Earth Europe Belgium, Klima-Bündnis, Rank Xerox Research Center Grenoble, MOOI Foundation NL, TNO-TPD Delft NL, University of Tübingen, University of Twente NL, VODO Belgium