Athena Research Center

Athens, Greece

Efthymiou V.,ICS FORTH | Papadakis G.,National and Kapodistrian University of Athens | Papastefanatos G.,Athena Research Center | Stefanidis K.,ICS FORTH | Palpanas T.,University of Paris Descartes
Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 | Year: 2015

Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. Typically, it scales to large volumes of data through blocking: similar entities are clustered into blocks so that it suffices to perform comparisons only within each block. Meta-blocking further increases efficiency by cleaning the overlapping blocks from unnecessary comparisons. However, even Meta-blocking can be time-consuming: applying it to blocks with 7.4 million entities and 2.2 × 10^11 comparisons takes almost 8 days on a modern high-end server. In this paper, we parallelize Meta-blocking based on MapReduce. We propose a simple strategy that explicitly creates the core concept of Meta-blocking, the blocking graph. We then describe an advanced strategy that creates the blocking graph implicitly, reducing the overhead of data exchange. We also introduce a load balancing algorithm that distributes the computationally intensive workload evenly among the available compute nodes. Our experimental analysis verifies the superiority of our advanced strategy and demonstrates an almost linear speedup for all meta-blocking techniques with respect to the number of available nodes. © 2015 IEEE.
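The blocking and blocking-graph ideas in this abstract can be illustrated with a small single-machine sketch. It is not the paper's MapReduce implementation: the token-based blocking, the "number of shared blocks" edge weight, the mean-weight pruning rule, and the sample entities are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Group entity ids into blocks keyed by each token they contain."""
    blocks = defaultdict(set)
    for eid, text in entities.items():
        for token in set(text.lower().split()):
            blocks[token].add(eid)
    # Blocks with a single entity yield no comparisons, so drop them.
    return {t: ids for t, ids in blocks.items() if len(ids) > 1}


def blocking_graph(blocks):
    """Build the blocking graph explicitly.

    Nodes are entities; an edge links two entities that co-occur in a block.
    Edge weight = number of shared blocks (an assumed weighting scheme).
    """
    weights = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            weights[(a, b)] += 1
    return dict(weights)


def prune(weights):
    """Keep edges whose weight is at least the mean weight (assumed rule)."""
    if not weights:
        return {}
    mean = sum(weights.values()) / len(weights)
    return {edge: w for edge, w in weights.items() if w >= mean}


# Hypothetical sample data, not taken from the paper.
entities = {
    "e1": "Athena Research Center Athens",
    "e2": "Athena Research Centre Athens",
    "e3": "University of Athens",
    "e4": "University of Paris",
}
blocks = token_blocking(entities)
graph = blocking_graph(blocks)
kept = prune(graph)
print(kept)
```

In this toy input, blocking yields four candidate pairs and pruning keeps the two with the most shared blocks, which is the kind of comparison reduction the abstract attributes to Meta-blocking.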

Ioannidis Y.,National and Kapodistrian University of Athens | Ioannidis Y.,Athena Research Center
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

In 1981 the "1st LBL Workshop on Statistical Database Management" was held in Berkeley, CA. It was essentially the first step towards establishing what was then a new and very important branch of the data management field, the one that deals with scientific data. A few years later, the third edition of the event already had another 'S' added in its acronym and was named the "3rd Int'l Workshop on Statistical and Scientific Database Management". Eventually, the series became the well-known annual SSDBM conference. For more than 30 years, the requirements for managing and analyzing the data created in the context of research and other activities in many domain sciences have brought out several major challenging problems that have motivated and inspired much data management research. Through the years there have been a very large number of related research papers and keynote presentations in all major database conferences and journals, many significant contributions that have pushed the state of the art in various directions, several dedicated funding programs around the world, and numerous specialized and generic software systems that have been developed targeting scientific data management. Scientists from the biological, medical, physical, natural, and other sciences, as well as the arts and the humanities have worked closely together with data management researchers to obtain solutions to critical problems. All these activities have created a solid body of work that is now considered part of the data management research mainstream, on topics ranging, for example, from the traditional indexing and query processing to the more specialized data mining, real-time streaming, and provenance. Scientific data management is an area of great importance with a long history that will continue to be at the forefront of many interesting developments in the field. © 2012 Springer-Verlag.

Kaoudi Z.,University Paris - Sud | Kaoudi Z.,Athena Research Center | Manolescu I.,University Paris - Sud
VLDB Journal | Year: 2015

The Resource Description Framework (RDF) pioneered by the W3C is increasingly being adopted to model data in a variety of scenarios, in particular data to be published or exchanged on the Web. Managing large volumes of RDF data is challenging, due to the sheer size, the heterogeneity, and the further complexity brought by RDF reasoning. To tackle the size challenge, distributed storage architectures are required. Cloud computing is an emerging paradigm massively adopted in many applications for the scalability, fault-tolerance, and elasticity features it provides, enabling the easy deployment of distributed and parallel architectures. In this article, we survey RDF data management architectures and systems designed for a cloud environment, and more generally, those large-scale RDF data management systems that can be easily deployed therein. We first give the necessary background, then describe the existing systems and proposals in this area, and classify them according to dimensions related to their capabilities and implementation techniques. The survey ends with a discussion of open problems and perspectives. © 2014, Springer-Verlag Berlin Heidelberg.

Smart O.,Emory University | Smart O.,Georgia Institute of Technology | Tsoulos I.G.,Technological Educational Institute TEI of Epirus | Gavrilis D.,Athena Research Center | Georgoulas G.,Technological Educational Institute TEI of Epirus
Expert Systems with Applications | Year: 2011

This paper presents grammatical evolution (GE) as an approach to select and combine features for detecting epileptic oscillations within clinical intracranial electroencephalogram (iEEG) recordings of patients with epilepsy. Clinical iEEG is used in preoperative evaluations of a patient who may have surgery to treat epileptic seizures. Literature suggests that pathological oscillations may indicate the region(s) of brain that cause epileptic seizures, which could be surgically removed for therapy. If this presumption is true, then the effectiveness of surgical treatment could depend on the effectiveness in pinpointing critically diseased brain, which in turn depends on the most accurate detection of pathological oscillations. Moreover, the accuracy of detecting pathological oscillations depends greatly on the selected feature(s) that must objectively distinguish epileptic events from average activity, a task that visual review is inevitably too subjective and insufficient to resolve. Consequently, this work suggests an automated algorithm that incorporates grammatical evolution (GE) to construct the most sufficient feature(s) to detect epileptic oscillations within the iEEG of a patient. We estimate the performance of GE relative to three alternative methods of selecting or combining features that distinguish an epileptic gamma (∼65-95 Hz) oscillation from normal activity: forward sequential feature-selection, backward sequential feature-selection, and genetic programming. We demonstrate that a detector with a grammatically evolved feature exhibits a sensitivity and selectivity that is comparable to a previous detector with a genetically programmed feature, making GE a useful alternative to designing detectors. © 2011 Published by Elsevier Ltd.

Gkirtzou K.,Athena Research Center | Karozos K.,Athens University of Economics and Business | Vassalos V.,Athens University of Economics and Business | Dalamagas T.,Athena Research Center
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2015

Linked Data is the most common practice for publishing and sharing information in the Data Web. As new data become available, their exploration is a fundamental step towards integration and interoperability. However, typical search methods such as SPARQL queries require knowing both the SPARQL syntax and the vocabulary used in the data. For this reason, keyword-based search has been proposed, allowing an intuitive way of searching an RDF dataset. In this paper, we present a novel approach for keyword search on graph-structured data, and in particular temporal RDF graphs, i.e. RDF data that involve temporal properties. Our method, instead of providing answers directly from the RDF data graph, automatically generates a set of candidate SPARQL queries that try to capture the user's information need as expressed by the keywords used. To support temporal exploration, our method is enriched with temporal operators allowing the user to explore data within predefined time ranges. To evaluate our approach, we perform an effectiveness study using two real-world datasets. © Springer International Publishing Switzerland 2015.
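As a rough illustration of turning keywords into candidate SPARQL queries, the sketch below matches each keyword against a tiny vocabulary and emits one query per match. The abstract does not describe the authors' generation procedure, so the vocabulary, the `http://example.org/` IRIs, and the one-query-per-keyword rule are hypothetical, and temporal operators are left out.

```python
# Hypothetical vocabulary: keyword label -> (kind, IRI). Not taken from the paper.
VOCAB = {
    "person": ("class", "http://example.org/Person"),
    "name": ("property", "http://example.org/name"),
    "birthdate": ("property", "http://example.org/birthDate"),
}


def candidate_queries(keywords):
    """Return one candidate SPARQL query string per keyword found in VOCAB."""
    queries = []
    for kw in keywords:
        match = VOCAB.get(kw.lower())
        if match is None:
            # Keywords with no vocabulary match produce no candidate.
            continue
        kind, iri = match
        if kind == "class":
            pattern = f"?s a <{iri}> ."
        else:
            pattern = f"?s <{iri}> ?o ."
        queries.append(f"SELECT * WHERE {{ {pattern} }}")
    return queries


queries = candidate_queries(["Person", "name", "unknown"])
for q in queries:
    print(q)
```

The point of the design is that the user supplies only keywords, while knowledge of the SPARQL syntax and of the dataset vocabulary stays inside the generator.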

News Article | October 28, 2016

INNOETICS, a company specializing in text-to-speech synthesis, has launched the VoiceCrafts platform at the Interspeech 2016 Conference in San Francisco. Powered by INNOETICS’s award-winning technology that delivers top-quality, near-natural synthetic speech, and its unique process for rapidly developing new voices, VoiceCrafts is a two-sided platform that allows any voice talent to create their own synthetic voices in just a few hours and any developer and app to use them through a simple API. “VoiceCrafts makes it extremely easy and fast to create new, quality synthetic voices in an automated way,” says Dr. Aimilios Chalamandaris, CEO and co-founder of INNOETICS. “A few hours of recorded speech are enough to produce a natural-sounding synthetic voice that can then be used to liven up a whole range of next generation voice-enabled applications, such as conversational agents, speaking robots, the smart home, games, or even branded voice personas for enterprises.” VoiceCrafts helps applications go well beyond what ready-made, uniform synthetic voices can offer. Each application can now have its own unique voice with a character and style that perfectly match its singular requirements. With VoiceCrafts, the voice talents become an organic part of the process. Each time their synthetic voice is used in an application or a service, they get a share of the revenues. This provides a clear incentive for them to nurture their synthetic vocal alter ego and ensure it sounds perfect. The VoiceCrafts platform will integrate simple and intuitive tools for such a task. “The voice building technology is now mature enough to take it out of the lab and into the hands of its stakeholders,” says Dr. Spyros Raptis, co-founder of INNOETICS and Director of Research at the Institute for Language and Speech Processing.
“Empowering the voice talents to easily create their digital voice and directly monetize it through a rich set of speech services and APIs, can disrupt the speech synthesis business and change the rules of how synthetic voices are developed and used.” VoiceCrafts’ goal is to become the host of the most extensive collection of synthetic voices in the world, covering an unprecedented range of languages, dialects and styles. But further to that, it aspires to be a vivid community of voice designers and developers that will rely on the platform to give a digital presence to their voices and link them to great new apps and services that urgently need them. The VoiceCrafts platform is now in beta and until it is fully launched, all its services are free for non-commercial use. ABOUT INNOETICS INNOETICS is a spin-off company from the Institute for Language and Speech Processing / Athena Research Center, offering leading solutions in the field of Text-to-Speech. Based on its award-winning proprietary technology, INNOETICS aims to develop the largest portfolio of top-quality synthetic voices for the world of tomorrow. For more information visit

Kaoudi Z.,Athena Research Center | Kementsietsidis A.,Google
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

RDF has recently become a very popular data model used in a variety of applications and use cases in both academia and industry. Query processing and evaluation is a central component in data management in general and is, thus, unsurprisingly one of the most active areas of research in the field of RDF data management. In this chapter we provide an overview of query processing techniques for the RDF data model using different system architectures. We survey techniques for both centralized and distributed RDF stores, including peer-to-peer, federated and cloud-based systems. © Springer International Publishing Switzerland 2014.

Bikakis N.,National Technical University of Athens | Bikakis N.,Athena Research Center | Benouaret K.,French Institute for Research in Computer Science and Automation | Sacharidis D.,Athena Research Center
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

Given a set of objects and a set of user preferences, both defined over a set of categorical attributes, the Multiple Categorical Preferences (MCP) problem is to determine the objects that are considered preferable by all users. In a naïve interpretation of MCP, matching degrees between objects and users are aggregated into a single score which ranks objects. Such an approach, though, obscures and blurs individual preferences, and can be unfair, favoring users with precise preferences and objects with detailed descriptions. Instead, we propose an objective and fair interpretation of the MCP problem, based on two Pareto-based aggregations. We introduce an efficient approach that is based on a transformation of the categorical attribute values and an index structure. Moreover, we propose an extension for controlling the number of returned objects. An experimental study on real and synthetic data finds that our index-based technique is an order of magnitude faster than a baseline approach, scaling up to millions of objects. © 2014 Springer International Publishing Switzerland.
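To make the contrast with single-score ranking concrete, here is a minimal sketch of plain Pareto dominance over per-user matching degrees. It does not reproduce the paper's two aggregations, attribute transformation, or index structure; the object names and degree values are hypothetical, and each object is assumed to carry one matching degree per user, where higher is better.

```python
def dominates(a, b):
    """True if a is at least as good as b for every user and strictly better for one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the sorted names of objects that no other object dominates."""
    return sorted(
        obj
        for obj, vec in scores.items()
        if not any(dominates(other, vec) for o, other in scores.items() if o != obj)
    )


# Hypothetical matching degrees for two users: (user_1, user_2).
scores = {
    "hotel_a": (0.9, 0.2),
    "hotel_b": (0.5, 0.5),
    "hotel_c": (0.4, 0.4),
    "hotel_d": (0.1, 0.8),
}
front = pareto_front(scores)
print(front)
```

Here `hotel_c` is dropped because `hotel_b` is better for both users, while the other three remain because each is the best choice for at least one trade-off between the users. Summing the degrees would instead impose a single ranking and hide those trade-offs.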

Gkirtzou K.,Athena Research Center | Papastefanatos G.,Athena Research Center | Dalamagas T.,Athena Research Center
NWSearch 2015 - Proceedings of the 1st International Workshop on Novel Web Search Interfaces and Systems | Year: 2015

In this paper, we present a summary of our work on RDF keyword search. Given a set of keywords, our method automatically generates a set of candidate SPARQL queries, and their natural language description, to be evaluated on the RDF data graph. We discuss our approach, highlighting current and future directions. © 2015 ACM.

Kaoudi Z.,Athena Research Center | Manolescu I.,French Institute for Research in Computer Science and Automation
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2014

The W3C's Resource Description Framework (or RDF, in short) is set to deliver many of the original semi-structured data promises: flexible structure, optional schema, and rich, flexible URIs as a basis for information sharing. Moreover, RDF is uniquely positioned to benefit from the efforts of scientific communities studying databases, knowledge representation, and Web technologies. As a consequence, numerous collections of RDF data are published, going from scientific data to general-purpose ontologies to open government data, in particular published as part of the Linked Data movement. Managing such large volumes of RDF data is challenging, due to the sheer size, the heterogeneity, and the further complexity brought by RDF reasoning. To tackle the size challenge, distributed storage architectures are required. Cloud computing is an emerging distributed paradigm massively adopted in many applications for the scalability, fault-tolerance and elasticity features it provides. This tutorial presents the challenges faced in order to efficiently handle massive amounts of RDF data in a cloud environment. We provide the necessary background, analyze and classify existing solutions, and discuss open problems and perspectives. © 2014 ACM.
