Nandzik J.,Acosta Consult GmbH |
Litz B.,Deutsche Nationalbibliothek |
Flores-Herr N.,Acosta Consult GmbH |
Lohden A.,Deutsche Nationalbibliothek |
And 10 more authors.
Multimedia Tools and Applications | Year: 2013
An ever-growing amount of digitized content urges libraries and archives to integrate new media types from a large number of origins, such as publishers, record labels and film archives, into their existing collections. This is a challenging task, since both the multimedia content itself and the associated metadata are inherently heterogeneous: the different sources lead to different data structures, data quality and trustworthiness. This paper presents the contentus approach towards an automated media processing chain for cultural heritage organizations and content holders. Our workflow allows for unattended processing from media ingest to availability through our search and retrieval interface. We aim to provide a set of tools for the processing of digitized print media, audio/visual media, speech and musical recordings. Media-specific functionalities include quality control for the digitization of still images and audio/visual media, and restoration of the most common quality issues encountered with these media. Furthermore, the contentus tools include modules for content analysis, such as segmentation of printed, audio and audio/visual media, optical character recognition (OCR), speech-to-text transcription, speaker recognition and the extraction of musical features from audio recordings, all aimed at a textual representation of the information inherent in the media assets. Once the information has been extracted and transcribed in textual form, media-independent processing modules offer named-entity extraction and disambiguation as well as text classification. All contentus modules are designed to be flexibly recombined within a scalable workflow environment using cloud computing techniques. In the next step, the analyzed media assets can be retrieved and consumed through a search interface using all available metadata.
The search engine combines Semantic Web technologies, which represent relations between the media and entities such as persons, locations and organizations, with a full-text approach for searching within the transcribed information gathered in the preceding processing steps. The contentus unified search interface integrates text, images, audio and audio/visual content. Queries can be narrowed and expanded in an exploratory manner, and search results can be refined by disambiguating entities and topics. Furthermore, semantic relationships not only become apparent but can also be navigated. © 2012 Springer Science+Business Media, LLC.
Agency: European Commission | Branch: FP7 | Program: NOE | Phase: ICT-2009.4.1 | Award Amount: 8.21M | Year: 2011
Digital preservation offers the economic and social benefits associated with the long-term preservation of information, knowledge and know-how for reuse by later generations. However, digital preservation faces a major problem: its support structures are built on short-lived, fragmented projects. The unique feature of the APARSEN network is that it builds on the already established Alliance for Permanent Access (APA), a membership organisation of major European stakeholders in digital data and digital preservation. These stakeholders have come together to create a shared vision and framework for a sustainable digital information infrastructure providing permanent access to digitally encoded information.

To this self-sustaining grouping APARSEN will bring a wide range of other experts in digital preservation, including academic and commercial researchers as well as researchers in other cross-European organisations. The members of the APA and other members of the consortium already undertake research in digital preservation individually, but even here the effort is fragmented, despite smaller groupings of these organisations working together in specific EU and national projects. APARSEN will help to combine and integrate these programmes into a shared programme of work, unified by a common vision, thereby creating the pre-eminent Virtual Centre of Excellence in digital preservation in Europe, if not the world. The APA provides a natural basis for a longer-term consolidation of digital preservation research and expertise.

The Joint Programme of Activity will cover:
- technical methods for preservation, access and, most importantly, reuse of data holdings over the whole lifecycle;
- legal and economic issues, including costs and governance as well as digital rights;
- outreach within and outside the consortium to help create a discipline of data curators with appropriate qualifications.
Agency: European Commission | Branch: FP7 | Program: CSA | Phase: INFRA-2010-3.3 | Award Amount: 759.69K | Year: 2010
The transition from science to e-Science is happening: a data deluge is emerging from publicly funded research facilities, representing a massive investment of public funds in potential answers to the grand challenges of our times. This potential can only be realised by adding an interoperable data sharing, re-use and preservation layer to the emerging ecosystem of e-Infrastructures. The importance of this layer, on top of the emerging connectivity and computational layers, has not yet been addressed coherently at the ERA or global level. All stakeholders in the scientific process must be involved in the design of this layer: policy makers, funders, infrastructure operators, data centres, data providers and users, libraries and publishers.
ODE is proposed by the Alliance for Permanent Access and four of its members: CERN, Finnish Computer Centre for Science, Helmholtz Association and UK Science and Technology Funding Council. Collectively, we represent all the stakeholder groups listed above and have a significant sphere of influence within those communities. This will allow us to identify, collate, interpret and deliver, in forms suited to each audience, evidence of emerging best practices in sharing, re-using, preserving and citing data, of the drivers for these changes and of the barriers impeding progress. We will:
- Enable operators, funders, designers and users of national and pan-European e-Infrastructures to compare their visions and explore shared opportunities
- Provide projections of potential data re-use within research and educational communities in and beyond the ERA, together with their needs and differences
- Demonstrate and improve understanding of best practices in the design of e-Infrastructures, leading to more coherent national policies
- Document success stories in data sharing, visionary policies to enable data re-use, and the needs and opportunities for interoperability of data layers to fully enable e-Science
- Make that information available in readiness for FP8
Agency: European Commission | Branch: FP7 | Program: CP | Phase: ICT-2007.4.1 | Award Amount: 16.56M | Year: 2008
Text that is not digital is virtually invisible. Today's readers search the internet for electronically accessible texts rather than visit the reading room of a library. Born-digital and digitised contemporary materials have the richness that allows tools such as text mining and the Semantic Web to offer superior accessibility, but the story is very different for historic documents. A vital part of the European heritage, encompassing more than four centuries of historic books and bound periodicals, is becoming less and less visible to the public at large.
With the i2010 vision of a European Digital Library, the EU has launched an ambitious plan for large-scale digitisation projects transforming Europe's printed heritage into digitally available resources. However, a lack of institutional knowledge and expertise slows down the pace at which this vision can be realised. The state of the art in OCR performance and machine understanding of the original document is inadequate, especially for historically important material with archaic fonts and spellings, newspapers with complex layouts, bound volumes, microfilm or typescript.
The IMPACT project will remove many of these barriers. It brings together fifteen national and regional libraries, research institutions and commercial suppliers, all centres of competence with unequalled experience of large-scale text digitisation processes and technologies. The project will let them share their know-how and best practices, develop innovative tools to enhance the capabilities of OCR engines and the accessibility of digitised text, and lay down the foundations for the mass-digitisation programmes that will take place over the next decade. This project will facilitate a more collaborative approach to mass digitisation. It will build capacity and lower the barriers to entry for organisations in the early stages of their own digitisation activity.
Agency: European Commission | Branch: FP7 | Program: CSA | Phase: ICT-2011.4.3 | Award Amount: 1.66M | Year: 2013
The Collaboration to Clarify the Costs of Curation (4C) project will help organisations across Europe to invest more effectively in digital curation and preservation. Making an investment inevitably involves a cost, and existing research on cost modelling provides the starting point for the 4C work. But the point of an investment is to realise a benefit, so work on cost must also focus on benefit, which in turn encompasses related concepts such as risk, value, quality and sustainability. Organisations that understand this will be better able to control and manage their digital assets over time, and they may also be able to create new cost-effective solutions and services for others.

Existing research into cost modelling is far from complete; there has been little uptake of the tools and methods that have been developed and very little integration into other digital curation processes. The main objective of the 4C project is, therefore, to ensure that, where existing work is relevant, stakeholders realise this and understand how to employ those resources. A further aim is to examine closely how these resources might be made more fit for purpose, relevant and usable for a wide range of organisations operating at different scales in both the public and the private sector.

These objectives will be achieved through a coordinated programme of outreach and engagement that will identify existing and emerging research and analyse user requirements. This will inform an assessment of where there are gaps in the current provision of tools, frameworks and models. The project will help stakeholders to better understand and articulate their requirements and will clarify some of the complex relationships between cost and other factors.
The outputs of this project will include various stakeholder engagement and dissemination events (focus groups, workshops, a conference), a series of reports, the creation of models and specifications, and the establishment of an international Curation Costs Exchange framework. All of this activity will enable the definition of a research and development agenda and a business engagement strategy, which will be delivered to the European Commission in the form of a roadmap.

The consortium undertaking this project includes organisations with extensive domain expertise and experience with curation cost modelling issues. It includes national libraries and archives, specialist preservation and curation membership organisations, service providers, research departments and SMEs. It will be coordinated by a national funding organisation that specialises in supporting the innovative use of ICT methods and technologies.
Brocks H.,University of Hagen |
Kranstedt A.,Deutsche Nationalbibliothek |
Jaschke G.,Globale Informationstechnik GmbH |
Hemmje M.,University of Hagen
Studies in Computational Intelligence | Year: 2010
Digital preservation can be regarded as ensuring communication with the future, that is, ensuring the persistence of digital resources and rendering them findable, accessible and understandable, both to support contemporary reuse and to safeguard the interests of future generations. The context of a digital object to be preserved over time comprises the representation of all known properties associated with it and of all operations that have been carried out on it. This includes the information needed to decode the data stream and restore the original content, information about its creation environment (including the actors and resources involved), and information about the organizational and technical processes associated with the production, preservation, access and reuse of the digital object. In this article we propose a generic context model that provides a formal representation for capturing all these aspects, so that information paths can be retraced for future reuse. Building on experience with the preservation of digital documents in so-called memory institutions, we demonstrate the feasibility of our approach within the domain of scientific publishing. © 2010 Springer-Verlag Berlin Heidelberg.
Agency: European Commission | Branch: FP7 | Program: CP | Phase: ICT-2007.4.1 | Award Amount: 12.29M | Year: 2007
The aim of the SHAMAN Integrated Project is to investigate and develop a long-term, next-generation digital preservation (DP) framework and corresponding application solution environments for analysing, ingesting, managing, accessing and reusing information objects and data across libraries and archives. Three prototypical application solutions will be built on the basis of this framework environment; they will support the trialling and validation of the results in scientific publishing, parliamentary archives, and industrial design and engineering, and finally, experimentally, in scientific application domains. To achieve these goals, SHAMAN applies radically new and promising methods for supporting DP as the core of its approach. Within SHAMAN, the core functions are organized within the SHAMAN reference architecture. Using this architecture, the project will create a framework and application development environment supporting the creation of test-beds for Digital Preservation support infrastructures and services. The core services of the SHAMAN framework are constructed by integrating Data Grid (DG), Digital Library (DL), Persistent Archive (PA), Context Representation, Annotation, and Preservation (CRAP), as well as Deep Linguistic Analysis (DLA) and corresponding Semantic Representation and Annotation (SRA) technologies for simple and connected data types, such as document, media, CAD and scientific data, knowledge and information collections. This will result in an unprecedented level of functionality and will lay the foundations for the long-term unification of knowledge preservation and analysis across domains within a distributed grid-based infrastructure.
Agency: European Commission | Branch: FP7 | Program: CP | Phase: ICT-2007.4.3 | Award Amount: 3.97M | Year: 2009
KEEP (Keeping Emulation Environments Portable) will develop an Emulation Access Platform to enable accurate rendering of both static and dynamic digital objects: text, sound and image files, multimedia documents, websites, databases, video games, etc. The overall aim of the project is to facilitate universal access to our cultural heritage by developing flexible tools for accessing and storing a wide range of digital objects.

The very success of computing technology, in which machines are rapidly superseded, has created a serious and growing challenge: how to preserve access to digital material produced on obsolete machines. Cultural heritage organisations are particularly sensitive to the threat of major data loss resulting from technical obsolescence. KEEP will develop an Emulation Access Platform to enable the accurate rendering of these objects, designed for a wide variety of computer systems, so that they can be securely accessed in the long term.

KEEP will address the problems of transferring digital objects stored on outdated computer media, such as floppy discs, onto current storage devices. This will involve the specification of file formats and the production of transfer tools used within a framework, taking into account possible legal and technical issues. KEEP will address all aspects, ranging from safeguarding the original bits from the carrier to offering online services to end users via a highly portable emulation framework running on any possible device. In addition to producing a software package, the project will deliver an understanding of how to integrate emulation-based solutions with an operational electronic deposit system. Existing metadata models will be researched, and guidelines will be developed for mapping digital objects to emulated manifestations.
Overall, KEEP will create the foundation for the next generation of permanent access strategies based on emulation.

Although primarily aimed at those involved in Cultural Heritage, suc