Agency: Cordis | Branch: FP7 | Program: CP | Phase: ICT-2007.4.1 | Award Amount: 16.56M | Year: 2008
Text that is not digital is virtually invisible. Today's readers search the internet for electronically accessible texts rather than visit the reading room of a library. Born-digital and digitised contemporary materials contain the richness that allows tools such as text mining and the semantic web to offer superior accessibility, but the story is very different for historic documents. A vital part of the European heritage, encompassing more than four centuries of historic books and bound periodicals, is becoming less and less visible to the public at large.
With the i2010 vision of a European Digital Library, the EU has launched an ambitious plan for large-scale digitisation projects transforming Europe's printed heritage into digitally available resources. However, a lack of institutional knowledge and expertise slows down the pace at which this vision can be realised. The state of the art in OCR performance and machine understanding of the original document is inadequate, especially for historically important material with archaic fonts and spellings, newspapers with complex layouts, bound volumes, microfilm or typescript.
The IMPACT project will remove many of these barriers. It brings together fifteen national and regional libraries, research institutions and commercial suppliers, all of them centres of competence with unequalled experience of large-scale text digitisation processes and technologies. The project will let them share their know-how and best practices, develop innovative tools to enhance the capabilities of OCR engines and the accessibility of digitised text, and lay down the foundations for the mass-digitisation programmes that will take place over the next decade. This project will facilitate a more collaborative approach to mass-digitisation. It will build capacity and lower the barriers to entry for organisations in the early stages of their own digitisation activity.
Agency: Cordis | Branch: FP7 | Program: CSA | Phase: INFRA-2010-3.3 | Award Amount: 759.69K | Year: 2010
The transition from science to e-Science is happening: a data deluge is emerging from publicly funded research facilities, the product of a massive investment of public funds in potential answers to the grand challenges of our times. This potential can only be realised by adding an interoperable data sharing, re-use and preservation layer to the emerging ecosystem of e-Infrastructures. The importance of this layer, on top of the emerging connectivity and computational layers, has not yet been addressed coherently at the ERA or global level. All stakeholders in the scientific process must be involved in the design of this layer: policy makers, funders, infrastructure operators, data centres, data providers and users, libraries and publishers.
ODE is proposed by the Alliance for Permanent Access and four of its members: CERN, the Finnish Computer Centre for Science, the Helmholtz Association and the UK Science and Technology Funding Council. Collectively, we represent all the stakeholder groups listed above and have a significant sphere of influence within those communities. This will allow us to identify, collate, interpret and deliver, in forms suited to each audience, evidence of emerging best practices in sharing, re-using, preserving and citing data, of the drivers for these changes and of the barriers impeding progress. We will:
- Enable operators, funders, designers and users of national and pan-European e-Infrastructures to compare their vision and explore shared opportunities
- Provide projections of potential data re-use within research and educational communities in and beyond the ERA, their needs and differences
- Demonstrate and improve understanding of best practices in the design of e-Infrastructures leading to more coherent national policies
- Document success stories in data sharing, visionary policies to enable data re-use, and the needs and opportunities for interoperability of data layers to fully enable e-Science
- Make that information available in readiness for FP8
Brocks H.,University of Hagen |
Kranstedt A.,Deutsche Nationalbibliothek |
Jaschke G.,Globale Informationstechnik GmbH |
Hemmje M.,University of Hagen
Studies in Computational Intelligence | Year: 2010
Digital preservation can be regarded as ensuring communication with the future, that is, ensuring the persistence of digital resources and rendering them findable, accessible and understandable, both to support contemporary reuse and to safeguard the interests of future generations. The context of a digital object to be preserved over time comprises the representation of all known properties associated with it and of all operations that have been carried out on it. This includes the information needed to decode the data stream and to restore the original content, information about its creation environment, including the actors and resources involved, and information about the organizational and technical processes associated with the production, preservation, access and reuse of the digital object. In this article we propose a generic context model which provides a formal representation for capturing all these aspects, to enable retracing information paths for future reuse. Building on experiences with the preservation of digital documents in so-called memory institutions, we demonstrate the feasibility of our approach within the domain of scientific publishing. © 2010 Springer-Verlag Berlin Heidelberg.
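The article defines its context model formally; the Python sketch below is not that model, only a minimal illustration of the kinds of information the abstract lists, assuming a simple record that holds decoding information (a format name), descriptive properties and a time-stamped log of operations with the actors and tools involved. All class, field and method names (`DigitalObjectContext`, `Operation`, `Actor`, `retrace`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Actor:
    """A person or organisation involved with the object (hypothetical type)."""
    name: str
    role: str  # e.g. "creator", "repository"


@dataclass
class Operation:
    """One recorded operation in the object's lifecycle (hypothetical type)."""
    kind: str  # e.g. "creation", "ingest", "migration", "access"
    timestamp: datetime
    actor: Actor
    tools: list[str] = field(default_factory=list)  # software or hardware resources used


@dataclass
class DigitalObjectContext:
    """Illustrative context record for one preserved digital object."""
    identifier: str
    format_name: str  # information needed to decode the data stream
    properties: dict[str, str] = field(default_factory=dict)  # known descriptive properties
    operations: list[Operation] = field(default_factory=list)  # lifecycle history

    def record(self, op: Operation) -> None:
        """Append an operation; the order of recording does not matter."""
        self.operations.append(op)

    def retrace(self) -> list[str]:
        """Return the information path as 'kind by actor' entries, oldest first."""
        ordered = sorted(self.operations, key=lambda op: op.timestamp)
        return [f"{op.kind} by {op.actor.name}" for op in ordered]


# Usage: operations recorded out of order are replayed chronologically.
ctx = DigitalObjectContext("doc-001", "application/pdf", {"title": "Sample article"})
ctx.record(Operation("ingest", datetime(2010, 3, 1), Actor("Library", "repository")))
ctx.record(Operation("creation", datetime(2009, 11, 5), Actor("Author", "creator"), ["LaTeX"]))
print(ctx.retrace())  # ['creation by Author', 'ingest by Library']
```

Sorting by timestamp at read time, rather than requiring operations to be logged in order, reflects the idea that context may be gathered from several sources after the fact.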
Agency: Cordis | Branch: FP7 | Program: CP | Phase: ICT-2007.4.3 | Award Amount: 3.97M | Year: 2009
KEEP (Keeping Emulation Environments Portable) will develop an Emulation Access Platform to enable accurate rendering of both static and dynamic digital objects: text, sound and image files, multimedia documents, websites, databases, video games, etc. The overall aim of the project is to facilitate universal access to our cultural heritage by developing flexible tools for accessing and storing a wide range of digital objects.

The very success of computing technology, where machines are rapidly superseded, has created a serious and growing challenge: how to preserve access to digital material produced on obsolete machines. Cultural heritage organisations are particularly sensitive to the threat of major data loss resulting from technical obsolescence. KEEP will develop an Emulation Access Platform to enable the accurate rendering of these objects, designed for a wide variety of computer systems, so that they can be securely accessed in the long term.

KEEP will address the problems of transferring digital objects stored on outdated computer media such as floppy discs onto current storage devices. This will involve specifying file formats and producing transfer tools that operate within a framework, taking into account possible legal and technical issues. KEEP will address all aspects, ranging from safeguarding the original bits from the carrier to offering online services to end-users via a highly portable emulation framework running on any possible device. In addition to producing a software package, the project will deliver an understanding of how to integrate emulation-based solutions with an operational electronic deposit system. Existing metadata models will be researched, and guidelines will be developed for mapping digital objects to emulated manifestations. Overall, KEEP will create the foundation for the next generation of permanent access strategies based on emulation.

Although primarily aimed at those involved in Cultural Heritage, suc
Nandzik J.,Acosta Consult GmbH |
Litz B.,Deutsche Nationalbibliothek |
Flores-Herr N.,Acosta Consult GmbH |
Lohden A.,Deutsche Nationalbibliothek |
And 10 more authors.
Multimedia Tools and Applications | Year: 2013
An ever-growing amount of digitized content urges libraries and archives to integrate new media types from a large number of origins, such as publishers, record labels and film archives, into their existing collections. This is a challenging task, since the multimedia content itself as well as the associated metadata is inherently heterogeneous: the different sources lead to different data structures, data quality and trustworthiness. This paper presents the contentus approach towards an automated media processing chain for cultural heritage organizations and content holders. Our workflow allows for unattended processing from media ingest to availability through our search and retrieval interface. We aim to provide a set of tools for the processing of digitized print media, audio/visual, speech and musical recordings. Media-specific functionalities include quality control for the digitization of still image and audio/visual media and restoration of the most common quality issues encountered with these media. Furthermore, the contentus tools include modules for content analysis, such as segmentation of printed, audio and audio/visual media, optical character recognition (OCR), speech-to-text transcription, speaker recognition and the extraction of musical features from audio recordings, all aimed at a textual representation of the information inherent in the media assets. Once the information is extracted and transcribed in textual form, media-independent processing modules offer extraction and disambiguation of named entities and text classification. All contentus modules are designed to be flexibly recombined within a scalable workflow environment using cloud computing techniques. In the next step, analyzed media assets can be retrieved and consumed through a search interface using all available metadata.
The search engine combines Semantic Web technologies for representing relations between the media and entities such as persons, locations and organizations with a full-text approach for searching within transcribed information gathered through the preceding processing steps. The contentus unified search interface integrates text, images, audio and audio/visual content. Queries can be narrowed and expanded in an exploratory manner, and search results can be refined by disambiguating entities and topics. Furthermore, semantic relationships not only become apparent but can also be navigated. © 2012 Springer Science+Business Media, LLC.
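As a rough illustration of the chain described in this abstract (media-specific transcription to text, a media-independent entity step, then a combined full-text and entity search), the Python sketch below wires stub modules together through a swappable registry. It is not the contentus implementation: the OCR and speech-to-text functions are placeholders that pass through a pre-supplied transcript, entity extraction is a naive gazetteer lookup rather than real disambiguation, and the entity filter is only a crude stand-in for entity-based refinement. All names (`TRANSCRIBERS`, `GAZETTEER`, `SearchIndex`, `process`) are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Asset:
    asset_id: str
    media_type: str  # "print", "audio" or "video"
    payload: str  # stand-in for raw media; here already a transcript
    text: str = ""
    entities: set[str] = field(default_factory=set)


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens used for both indexing and querying."""
    return re.findall(r"\w+", text.lower())


# Placeholder media-specific modules (hypothetical): real ones would run OCR or ASR.
def ocr_stub(asset: Asset) -> str:
    return asset.payload


def speech_to_text_stub(asset: Asset) -> str:
    return asset.payload


# Swappable registry: each media type maps to the module producing its text.
TRANSCRIBERS: dict[str, Callable[[Asset], str]] = {
    "print": ocr_stub,
    "audio": speech_to_text_stub,
    "video": speech_to_text_stub,
}

# Hypothetical gazetteer mapping lower-case tokens to canonical entity names.
GAZETTEER = {"berlin": "Berlin", "goethe": "Goethe"}


def extract_entities(text: str) -> set[str]:
    """Media-independent step: naive gazetteer lookup, not real disambiguation."""
    return {GAZETTEER[tok] for tok in tokenize(text) if tok in GAZETTEER}


def process(asset: Asset) -> Asset:
    """Run the media-specific transcriber, then the media-independent entity step."""
    asset.text = TRANSCRIBERS[asset.media_type](asset)
    asset.entities = extract_entities(asset.text)
    return asset


class SearchIndex:
    """Inverted index over transcribed text plus an entity-to-asset map."""

    def __init__(self) -> None:
        self.terms: defaultdict[str, set[str]] = defaultdict(set)
        self.by_entity: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, asset: Asset) -> None:
        for tok in tokenize(asset.text):
            self.terms[tok].add(asset.asset_id)
        for ent in asset.entities:
            self.by_entity[ent].add(asset.asset_id)

    def search(self, query: str, entity: str | None = None) -> set[str]:
        """Assets containing every query term, optionally restricted to one entity."""
        toks = tokenize(query)
        if not toks:
            return set()
        # set.intersection always returns a new set, so the index is not mutated below.
        hits = set.intersection(*(self.terms.get(t, set()) for t in toks))
        if entity is not None:
            hits &= self.by_entity.get(entity, set())
        return hits


index = SearchIndex()
for a in [Asset("p1", "print", "Goethe visited Berlin"),
          Asset("a1", "audio", "An interview recorded in Berlin")]:
    index.add(process(a))
print(sorted(index.search("berlin")))  # ['a1', 'p1']
print(sorted(index.search("berlin", entity="Goethe")))  # ['p1']
```

Keeping the transcribers in a dictionary keyed by media type is one simple way to mirror the idea of recombinable modules: replacing a stub with a real engine changes one entry without touching the entity step or the index.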