Centrum für Informations- und Sprachverarbeitung





EU Project "IMPACT"

The following content is available in English:


CONCEPT

As part of its i2010 vision of a European Digital Library, the EU launched an ambitious plan for large-scale digitisation projects that transform Europe’s printed heritage into digitally available resources. The aim of fully integrating intellectual content into the modern information and communication technology environment can only be achieved by full-text digitisation: transforming digital images of scanned books into electronic text.

Over the last two to three years, mass digitisation has become one of the most prominent issues in the library world. Today, a number of advanced libraries in Europe scan millions of pages each year, and large-scale digitisation is a fact rather than a vision. However, these efforts can tackle only a fraction of the total heritage held by cultural memory organisations. Digitised material is becoming available too slowly, in too small quantities and from too few sources, for three reasons.

  1. There is a lack of institutional knowledge and expertise which causes inefficiency and ‘re-inventing the wheel’. This is a problem for the vast majority of libraries, museums and archives in Europe.
  2. The cost of producing full-featured electronic text from historical documents is far too high. Cultural heritage institutions will therefore not be able to meet their users’ demand for electronic text rather than mere digital images. Manual keying costs around 1 EUR per page, so a typical book comes to 400, 500 or even 1000 EUR.
  3. Automated text recognition by Optical Character Recognition (OCR) engines often fails to produce satisfactory results for historical documents. Recognition rates are poor, and the output is sometimes useless. No commercial or other OCR engine copes satisfactorily with the wide range of printed material published between the start of the Gutenberg age in the 15th century and the beginning of industrial book production in the mid-19th century.

The IMPACT project will remove many of these barriers. The project will push innovation in OCR technology and language technology for historical document processing and retrieval, and share expertise to build capacity in digitisation across Europe. During the project a Centre of Competence will be set up in order to provide a central service entry point for all libraries, archives and museums involved in the digitisation of textual material.

The consortium brings together twenty-six national and regional libraries, research institutions and commercial suppliers, who will share their know-how and best practices, develop innovative tools to enhance the capabilities of OCR engines and the accessibility of digitised text, and lay the foundations for the mass digitisation programmes that will take place over the next decade.


Objectives

Significantly improve access to historical text

  • Innovate OCR technology
    – By approaching the challenges from several directions rather than from just one
    – By developing cutting-edge approaches such as collaborative correction
  • Provide innovative language technologies to remove the historical language barrier
  • Ensure the interoperability of the results
    – By defining an overall technical architecture and monitoring technical integration across all parts of the project

 
Remove the barriers that stand in the way of mass digitisation of Europe’s cultural heritage

  • Provide best-practice guidance on the operational context for digitisation
  • Deliver a coherent programme of dissemination, training and demonstration aimed at capacity-building in and beyond participating institutions
  • Address the needs of end-users and holders of collections of material in languages other than English


Ensure that tools and services will be sustained after the end of the project

  • Build a network of competence centres in order to provide a single access point for all players involved in mass-digitisation and full-text generation
  • Define, during the project, strategies for exploiting its results

FACTS AND FIGURES

Total Budget
16.5 M Euro

EC contribution
12.1 M Euro

Funding
IMPACT is funded under the Seventh Framework Programme of the European Commission (FP7). It is part of the Cooperation Work Programme for ICT and responds to the fourth challenge in this programme: Digital Libraries and Content. http://cordis.europa.eu/fp7/ict

Project Partners
26 national and regional libraries, research institutions and commercial suppliers

Coordinator
IMPACT is coordinated by the National Library of the Netherlands (KB)

Sub-Projects
There are 23 Work Packages spread over 4 Sub-Projects:
  • Operational Context (led by The British Library)
  • Text Recognition (led by the University of Innsbruck)
  • Enhancement and Enrichment (led by the Austrian National Library)
  • Capacity Building (led by The British Library)

Start date
1 January 2008

Duration
4 years

