Berlin, Germany

Daiber J.,University of Groningen | Jakob M.,Neofonie GmbH | Hokamp C.,University of North Texas | Mendes P.N.,Wright State University
ACM International Conference Proceeding Series | Year: 2013

There has recently been an increased interest in named entity recognition and disambiguation systems at major conferences such as WWW, SIGIR, ACL, KDD, etc. However, most work has focused on algorithms and evaluations, leaving little space for implementation details. In this paper, we discuss some implementation and data processing challenges we encountered while developing a new multilingual version of DBpedia Spotlight that is faster, more accurate and easier to configure. We compare our solution to the previous system, considering time performance, space requirements and accuracy in the context of the Dutch and English languages. Additionally, we report results for 9 additional languages among the largest Wikipedias. Finally, we present challenges and experiences to foment the discussion with other developers interested in recognition and disambiguation of entities in natural language text. © 2013 ACM.


Lehmann J.,University of Leipzig | Isele R.,Brox IT Solutions GmbH | Jakob M.,Neofonie GmbH | Jentzsch A.,Hasso Plattner Institute for IT Systems Engineering | And 8 more authors.
Semantic Web | Year: 2015

The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base which is extracted from the English edition of Wikipedia consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases that are extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a world-wide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 out of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. Several hundred data sets on the Web publish RDF links pointing to DBpedia themselves and make DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications. © IOS Press and the authors


Grant
Agency: Cordis | Branch: FP7 | Program: CP | Phase: ICT-2009.4.3 | Award Amount: 3.51M | Year: 2010

The goal of the Dicode project is to facilitate and augment collaboration and decision making in data-intensive and cognitively-complex settings. To do so, it will exploit and build on the most prominent high-performance computing paradigms and large data processing technologies, such as cloud computing, MapReduce, Hadoop, Mahout, and column databases, to meaningfully search, analyze and aggregate data existing in diverse, extremely large, and rapidly evolving sources. Building on current advancements, the solution foreseen in the Dicode project will bring together the reasoning capabilities of both the machine and the humans. It can be viewed as an innovative workbench incorporating and orchestrating a set of interoperable services that reduce the data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities. Services to be developed are: (i) scalable data mining services (including services for text mining and opinion mining), (ii) collaboration support services, and (iii) decision making support services. The achievement of the Dicode project's goal will be validated through three use cases addressing clearly established problems. These cases were chosen to test the transferability of the Dicode solution in different collaboration and decision making settings, associated with diverse types of data and data sources, thus covering the full range of the foreseen solution's features and functionalities. They concern: (i) scientific collaboration supported by integrated large-scale knowledge discovery in clinico-genomic research, (ii) delivering pertinent information from heterogeneous data to communities of doctors and patients in medical treatment decision making, and (iii) capturing tractable, commercially valuable high-level information from unstructured Web 2.0 data for opinion mining.


Karacapilidis N.,University of Patras | Loeffler R.,Publicis Frankfurt Zweigniederlassung der PWW GmbH | Maassen D.,Neofonie GmbH | Tzagarakis M.,University of Patras
Frontiers in Artificial Intelligence and Applications | Year: 2012

This paper presents an innovative solution to Social Media monitoring. The proposed approach builds on the synergy between machine and collective human intelligence to enhance the underlying sense-making and decision making processes. In the setting under consideration, our approach reduces the data-intensiveness and overall complexity of real-life collaboration and decision making to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities. This is achieved through a meaningful integration of dedicated data mining and collaborative decision making services. © 2012 The authors and IOS Press. All rights reserved.


Kuhlmann F.,Neofonie GmbH | Hannemann J.,German National Library of Science and Technology | Traub M.,German National Library of Science and Technology | Bohme C.,German National Library of Science and Technology | And 12 more authors.
Cognitive Technologies | Year: 2014

The THESEUS research program assembled key companies with market power from all types of sectors to jointly develop the innovative products that will enable the knowledge society. Six use cases were carried out to demonstrate applications based on the developments of the THESEUS Core Technology Cluster. In this article, we give a short overview of selected results of each use case. © Springer International Publishing Switzerland 2014.


Grossmann B.,Neofonie GmbH | Todor A.,Free University of Berlin | Paschke A.,Free University of Berlin
ACM International Conference Proceeding Series | Year: 2015

Traditional keyword-based IR approaches take the document context into account only in a limited manner. In our paper we present a novel document ranking approach based on the semantic relationships between named entities. In the first step we annotate all documents with named entities from a knowledge base (for example people, places and organisations). In the next step these annotations, in combination with the relationships from the knowledge base, are used to rank documents in order to perform a semantic search. Documents that contain the specific named entity that was searched for as well as other strongly related entities receive a higher ranking. The inclusion of the document context in the ranking approach achieves a higher precision in the Top-K results. © 2015 ACM.
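The two-step idea in this abstract (annotate documents with entities, then rank by relatedness to the searched entity) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy relatedness table `RELATED`, the scoring rule in `score`, and all entity names are assumptions made purely for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy knowledge base: symmetric relatedness weights between entities.
RELATED: Dict[FrozenSet[str], float] = {
    frozenset({"Berlin", "Germany"}): 0.9,
    frozenset({"Berlin", "Brandenburg_Gate"}): 0.8,
    frozenset({"Berlin", "Paris"}): 0.2,
}


def relatedness(a: str, b: str) -> float:
    """Look up the assumed relatedness of two entities (0.0 if unknown)."""
    return RELATED.get(frozenset({a, b}), 0.0)


def score(query_entity: str, doc_entities: Set[str]) -> float:
    """Assumed scoring rule: 1.0 for containing the searched entity,
    plus the relatedness of every other entity annotated in the document."""
    if query_entity not in doc_entities:
        return 0.0
    context = sum(
        relatedness(query_entity, e) for e in doc_entities if e != query_entity
    )
    return 1.0 + context


def rank(query_entity: str, docs: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Return (document id, score) pairs, highest score first."""
    scored = [(doc_id, score(query_entity, ents)) for doc_id, ents in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Documents are represented only by their entity annotations (step one).
docs = {
    "d1": {"Berlin"},
    "d2": {"Berlin", "Germany", "Brandenburg_Gate"},
    "d3": {"Paris"},
}
ranking = rank("Berlin", docs)
```

Under these assumptions, `d2` ranks above `d1` because it mentions the searched entity together with strongly related ones, while `d3` scores zero because it does not mention the entity at all.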


Wendt M.,Neofonie GmbH | Gerlach M.,Neofonie GmbH | Duwiger H.,Neofonie GmbH
Cognitive Technologies | Year: 2014

The Semantic Web came with the prospect of one day providing amounts of information just as immense as those now available from the Internet, ready for evaluation and analysis by machines. About a decade later, more and more data hubs (a data hub is comparable to a web site) emerged that provide information free of charge. Massive amounts of such information, also called Linked Open Data (LOD), make the vision of the Semantic Web come to life. As an example, DBpedia - by harvesting information from Wikipedia - already contains hundreds of millions of general knowledge facts. Such data can be used to conveniently make information of general interest available to the public. Beyond this, the structure of this information and the fact that it is present in machine-readable form make more structured ways of information access possible. One of these technologies is Question Answering (QA) - a task that has always hinged on the availability of massive amounts of information. This paper reports on our approach to implementing a QA system backed by Linked Open Data. The QA system is part of the Alexandria use case. © Springer International Publishing Switzerland 2014.


Kemmerer S.,Neofonie GmbH | Grossmann B.,Neofonie GmbH | Muller C.,Neofonie GmbH | Adolphs P.,Neofonie GmbH | Ehrig H.,Neofonie GmbH
ERD 2014 - Proceedings of the 1st ACM International Workshop on Entity Recognition and Disambiguation, Co-located with SIGIR 2014 | Year: 2014

This paper describes Neofonie NERD, our Named Entity Recognition and Disambiguation system submitted to the ERD Challenge 2014. The system uses a vector space model approach for disambiguation, based on the link structure of Freebase, in combination with precomputed statistical measures from Wikipedia and Freebase. It was originally developed for the German language and has now been adapted for English. We achieved a 70.0% F1-score in the final evaluation, which is 5.7 percentage points above the average of all participating teams. Copyright is held by the owner/author(s).
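As a rough illustration of vector-space disambiguation in general, the sketch below picks the candidate entity whose link vector is most similar to the context under cosine similarity. It is not the Neofonie NERD system: the sparse-vector representation, the function names, and the example entities are all hypothetical, and the precomputed statistical measures mentioned in the abstract are left out.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # sparse vector: linked entity -> weight (assumed form)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context: Vector, candidates: Dict[str, Vector]) -> str:
    """Return the candidate whose link vector is closest to the context."""
    return max(candidates, key=lambda name: cosine(context, candidates[name]))


# Hypothetical context built from other entities found in the same text.
context = {"Germany": 1.0, "Capital": 1.0}
# Hypothetical candidates for the mention "Berlin", each with its link vector.
candidates = {
    "Berlin_(city)": {"Germany": 1.0, "Capital": 1.0, "Spree": 1.0},
    "Berlin_(band)": {"Music": 1.0, "Los_Angeles": 1.0},
}
best = disambiguate(context, candidates)
```

Here the city is chosen because its assumed link vector shares two dimensions with the context, whereas the band's shares none.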


Wendt M.,Neofonie GmbH | Gerlach M.,Neofonie GmbH | Duwiger H.,Neofonie GmbH
CEUR Workshop Proceedings | Year: 2012

With the evolution of linked open data sources, question answering regains importance as a way to make data accessible and explorable to the public. At the same time, the triple structure of RDF data seems to predetermine that question answering be devised in its native subject-verb-object form. The devices of natural language, however, often exceed this triple-centered model. But RDF does not preclude this point of view. Rather, it depends on the modeling. As part of a government-funded research project named Alexandria, we implemented an approach to question answering that enables the user to ask questions in ways that may involve more than binary relations.


Hahn R.,Neofonie GmbH | Bizer C.,Free University of Berlin | Sahnwaldt C.,Neofonie GmbH | Herta C.,Neofonie GmbH | And 4 more authors.
Lecture Notes in Business Information Processing | Year: 2010

Wikipedia articles contain, besides free text, various types of structured information in the form of wiki markup. The type of wiki content that is most valuable for search is the Wikipedia infobox, which displays an article's most relevant facts as a table of attribute-value pairs on the top right-hand side of the Wikipedia page. Infobox data is not used by Wikipedia's own search engine. Standard Web search engines like Google or Yahoo also do not take advantage of the data. In this paper, we present Faceted Wikipedia Search, an alternative search interface for Wikipedia, which uses infobox data in order to enable users to ask complex questions against Wikipedia knowledge. By allowing users to query Wikipedia like a structured database, Faceted Wikipedia Search helps them to truly exploit Wikipedia's collective intelligence. © Springer-Verlag Berlin Heidelberg 2010.
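Querying attribute-value pairs "like a structured database" can be shown with a small faceted-filtering sketch. It does not reflect how Faceted Wikipedia Search is built: the `ARTICLES` records, their attribute names, and both helper functions are invented for illustration.

```python
from collections import Counter
from typing import Dict, List

Article = Dict[str, str]  # infobox-like record: attribute -> value (assumed form)

# Hypothetical records standing in for extracted infobox data.
ARTICLES: List[Article] = [
    {"title": "Berlin", "type": "City", "country": "Germany"},
    {"title": "Hamburg", "type": "City", "country": "Germany"},
    {"title": "Paris", "type": "City", "country": "France"},
    {"title": "Rhine", "type": "River", "country": "Germany"},
]


def filter_by_facets(articles: List[Article], selected: Dict[str, str]) -> List[Article]:
    """Keep the articles whose attributes match every selected facet value."""
    return [
        article
        for article in articles
        if all(article.get(attr) == value for attr, value in selected.items())
    ]


def facet_counts(articles: List[Article], attribute: str) -> Counter:
    """Count how many articles carry each value of one attribute."""
    return Counter(a[attribute] for a in articles if attribute in a)


# Narrow the result set step by step, as a facet interface would.
cities = filter_by_facets(ARTICLES, {"type": "City"})
german_cities = filter_by_facets(ARTICLES, {"type": "City", "country": "Germany"})
countries_of_cities = facet_counts(cities, "country")
```

Each facet selection narrows the result set, and the counts show which values remain available for the next refinement.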
