Dong X.L.,AT&T | Naumann F.,Hasso Plattner Institute HPI
SIGMOD Record | Year: 2010

WebDB 2010, the 13th International Workshop on the Web and Databases, took place on June 6, 2010. Christian Bizer, cofounder of the DBpedia project, compared the Linked Data movement, which stems from the Semantic Web research area, with research in the field of Dataspaces. The research session 'Linked data and Wikipedia' featured the papers 'An agglomerative query model for discovery in linked data: semantics and approach' and 'XML-based RDF data management for efficient query processing'. The other sessions of the workshop included the papers 'Find your advisor: robust knowledge gathering from the Web', 'Redundancy-driven web data extraction and integration', and 'Using latent-structure to detect objects on the Web'. The papers 'Manimal: relational optimization for data-intensive programs' and 'Learning topical transition probabilities in click through data with regression models' were also presented.

Kokash N.,Centrum Wiskunde and Informatica CWI | Krause C.,Hasso Plattner Institute HPI | De Vink E.,TU Eindhoven
Formal Aspects of Computing | Year: 2012

The paradigm of service-oriented computing revolutionized the field of software engineering. According to this paradigm, new systems are composed of existing stand-alone services to support complex cross-organizational business processes. Correct communication of these services is not possible without a proper coordination mechanism. The Reo coordination language is a channel-based modeling language that introduces various types of channels and their composition rules. By composing Reo channels, one can specify Reo connectors that realize arbitrarily complex behavioral protocols. Several formalisms have been introduced to give semantics to Reo. In their most basic form, they reflect service synchronization and dataflow constraints imposed by connectors. To ensure that the composed system behaves as intended, we need a wide range of automated verification tools to assist service composition designers. In this paper, we present our framework for the verification of Reo using the mCRL2 toolset. We unify our previous work on mapping various semantic models for Reo, namely, constraint automata, timed constraint automata, coloring semantics and the newly developed action constraint automata, to the process algebraic specification language of mCRL2, address the correctness of this mapping, discuss tool support, and present a detailed example that illustrates the use of Reo empowered with mCRL2 for the analysis of dataflow in service-based process models. © 2011 BCS.

Moon Y.-J.,French Institute for Research in Computer Science and Automation | Silva A.,Centrum Wiskunde and Informatica CWI | Silva A.,Radboud University Nijmegen | Silva A.,University of Minho | And 2 more authors.
Science of Computer Programming | Year: 2014

In this paper, we present a compositional semantics for the channel-based coordination language Reo that enables the analysis of quality of service (QoS) properties of service compositions. For this purpose, we annotate Reo channels with stochastic delay rates and explicitly model data-arrival rates at the boundary of a connector, to capture its interaction with the services that comprise its environment. We propose Stochastic Reo Automata as an extension of Reo automata, in order to compositionally derive a QoS-aware semantics for Reo. We further present a translation of Stochastic Reo Automata to Continuous-Time Markov Chains (CTMCs). This translation enables us to use third-party CTMC verification tools to do an end-to-end performance analysis of service compositions. In addition, we discuss to what extent Interactive Markov Chains (IMCs) can serve as an alternative semantic model for Stochastic Reo. We show that the semantics of Stochastic Reo cannot be specified compositionally using the product operator provided by IMCs. © 2013 Elsevier B.V. All rights reserved.
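To make the idea of rate-annotated channels concrete, the sketch below works through the smallest possible CTMC: a one-place buffer that alternates between an empty and a full state. It is an illustration only, not the translation from Stochastic Reo Automata described in the abstract; the function name and the rates `fill_rate` and `drain_rate` are hypothetical.

```python
def two_state_steady_state(fill_rate, drain_rate):
    """Steady-state distribution of a two-state CTMC (hypothetical helper).

    State "empty" moves to "full" at rate fill_rate (data arrives);
    state "full" moves to "empty" at rate drain_rate (data is taken).
    The balance equation pi_empty * fill_rate = pi_full * drain_rate,
    together with pi_empty + pi_full = 1, gives the closed form below.
    """
    total = fill_rate + drain_rate
    return {"empty": drain_rate / total, "full": fill_rate / total}


# Assumed example rates (events per time unit), chosen only for illustration.
pi = two_state_steady_state(fill_rate=2.0, drain_rate=6.0)

# Long-run throughput: how often data leaves the buffer per time unit.
throughput = pi["full"] * 6.0

print(pi, throughput)
```

Larger models have no such closed form and are normally handed to a CTMC solver, which is the role the abstract assigns to third-party verification tools.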

Abedjan Z.,Hasso Plattner Institute HPI | Quiane-Ruiz J.-A.,Qatar Computing Research Institute QCRI | Naumann F.,Hasso Plattner Institute HPI
Proceedings - International Conference on Data Engineering | Year: 2014

The discovery of all unique (and non-unique) column combinations in an unknown dataset is at the core of any data profiling effort. Unique column combinations resemble candidate keys of a relational dataset. Several research approaches have focused on their efficient discovery in a given, static dataset. However, none of these approaches is suitable for applications on dynamic datasets, such as transactional databases, social networks, and scientific applications. In these cases, data profiling techniques should be able to efficiently discover new uniques and non-uniques (and validate old ones) after tuple inserts or deletes, without re-profiling the entire dataset. We present Swan, the first approach to efficiently discover unique and non-unique constraints on dynamic datasets that is independent of the initial dataset size. In particular, Swan makes use of intelligently chosen indices to minimize access to old data. We perform an exhaustive analysis of Swan and compare it with two state-of-the-art techniques for unique discovery: Gordian and Ducc. The results show that Swan significantly outperforms both, as well as their incremental adaptations. For inserts, Swan is more than 63x faster than Gordian and up to 50x faster than Ducc. For deletes, Swan is more than 15x faster than Gordian and up to one order of magnitude faster than Ducc. In fact, Swan even improves on the static case by dividing the dataset into a static part and a set of inserts. © 2014 IEEE.
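The notion of a unique column combination, and why an insert can invalidate one, can be sketched with a brute-force check. This is not Swan, Gordian, or Ducc: it re-profiles the whole table after the insert, which is exactly the cost the abstract says an incremental approach avoids. The table contents and the helper names `is_unique` and `minimal_uniques` are made up for illustration.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if no two rows share the same values on all of cols."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uniques(rows, columns):
    """Brute force: test combinations by increasing size, keep minimal ones."""
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # Skip supersets of an already found unique: they are not minimal.
            if any(set(u) <= set(combo) for u in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


# Hypothetical example table.
columns = ["id", "name", "city"]
rows = [
    {"id": 1, "name": "Ann", "city": "Berlin"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
    {"id": 3, "name": "Ann", "city": "Potsdam"},
]
before = minimal_uniques(rows, columns)

# One inserted tuple duplicates an existing (name, city) pair.
rows_after = rows + [{"id": 4, "name": "Ann", "city": "Berlin"}]
after = minimal_uniques(rows_after, columns)

print(before)  # [('id',), ('name', 'city')]
print(after)   # [('id',)]
```

The insert turns the combination (name, city) from a unique into a non-unique, while (id) stays valid; an incremental method only needs to re-check the uniques that the new tuple could violate.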

Kruse S.,Hasso Plattner Institute HPI | Papotti P.,Qatar Computing Research Institute QCRI | Naumann F.,Hasso Plattner Institute HPI
EDBT 2015 - 18th International Conference on Extending Database Technology, Proceedings | Year: 2015

Data cleaning and data integration have been the topic of intensive research for at least the past thirty years, resulting in a multitude of specialized methods and integrated tool suites. All of them require at least some and in most cases significant human input in their configuration, during processing, and for evaluation. For managers (and for developers and scientists) it would therefore be of great value to be able to estimate the effort of cleaning and integrating some given data sets and to know the pitfalls of such an integration project in advance. Such estimates help in deciding on an integration project using cost/benefit analysis, budgeting a team with funds and manpower, and monitoring its progress. Further, knowledge of how well a data source fits into a given data ecosystem improves source selection. We present an extensible framework for the automatic effort estimation for mapping and cleaning activities in data integration projects with multiple sources. It comprises a set of measures and methods for estimating integration complexity and ultimately effort, taking into account heterogeneities of both schemas and instances and regarding both integration and cleaning operations. Experiments on two real-world scenarios show that our proposal is two to four times more accurate than a current approach in estimating the time duration of an integration process, and provides a meaningful breakdown of the integration problems as well as the required integration activities. © 2015, Copyright is with the authors.