
Wang T.,University of Toronto | Johnson R.,LogicBlox | Fekete A.,University of Sydney | Pandis I.,Amazon
VLDB Journal | Year: 2017

Concurrency control (CC) algorithms must trade off strictness for performance. In particular, serializable CC schemes generally pay a higher cost to prevent anomalies, both in runtime overhead such as the maintenance of lock tables and in effort wasted by aborting transactions. We propose the serial safety net (SSN), a serializability-enforcing certifier that can be applied on top of various CC schemes that offer higher performance but admit anomalies, such as snapshot isolation and read committed. The underlying CC mechanism retains control of scheduling and transactional accesses, while SSN tracks the resulting dependencies. At commit time, SSN performs a validation test by examining only direct dependencies of the committing transaction to determine whether it can commit safely or must abort to avoid a potential dependency cycle. SSN performs robustly for a variety of workloads. It maintains the characteristics of the underlying CC without biasing toward certain types of transactions, even if the underlying CC scheme does. Besides traditional OLTP workloads, SSN also efficiently handles heterogeneous workloads that include a significant portion of long, read-mostly transactions. SSN can avoid tracking the vast majority of reads (thus reducing the overhead of serializability certification) and still produce serializable executions with little overhead. The dependency tracking and validation tests can be done efficiently, fully in parallel and latch-free, for multi-version systems on modern hardware with substantial core counts and large main memory. We demonstrate the efficiency, accuracy and robustness of SSN using extensive simulations and an implementation that overlays snapshot isolation in ERMIA, a memory-optimized OLTP engine that supports multiple CC schemes. Evaluation results confirm that SSN is a promising approach to serializability with robust performance and low overhead for various workloads. © 2017 Springer-Verlag Berlin Heidelberg
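The commit-time test described above can be sketched as a toy certifier. The names and bookkeeping below are hypothetical simplifications of SSN's actual watermark scheme: each transaction tracks a high-water mark over the commit stamps of its direct predecessors (`pstamp`) and a low-water mark over its direct successors (`sstamp`), and a commit is refused when the resulting "exclusion window" could admit a dependency cycle.

```python
INF = float("inf")

class Txn:
    def __init__(self, name):
        self.name = name
        self.cstamp = None   # commit timestamp, assigned at commit
        self.pstamp = 0      # high-water mark over direct predecessors
        self.sstamp = INF    # low-water mark over direct successors

class Certifier:
    def __init__(self):
        self.clock = 0

    def note_dependency(self, pred, succ):
        """Record a direct dependency pred -> succ observed by the
        underlying CC layer (e.g. a read or an anti-dependency)."""
        if pred.cstamp is not None:
            # a committed predecessor raises the successor's pstamp
            succ.pstamp = max(succ.pstamp, pred.cstamp)
        if succ.cstamp is not None:
            # a committed successor lowers the predecessor's sstamp
            pred.sstamp = min(pred.sstamp, succ.cstamp)

    def try_commit(self, t):
        self.clock += 1
        t.cstamp = self.clock
        # abort if a predecessor committed at or after the earliest
        # committed successor: a dependency cycle could exist
        if t.pstamp >= min(t.sstamp, t.cstamp):
            t.cstamp = None
            return False
        return True
```

Only the committing transaction's own stamps are consulted, mirroring the paper's point that validation examines direct dependencies rather than the full dependency graph.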

Halpin T.,LogicBlox | Halpin T.,INTI Education Group | Wijbenga J.P.,University of Groningen
Lecture Notes in Business Information Processing | Year: 2010

A conceptual schema of an information system specifies the fact structures of interest as well as the business rules that apply to the business domain being modeled. These rules, which may be complex, are best validated with subject matter experts, since they best understand the business domain. In practice, business domain experts often lack expertise in the technical languages used by modelers to capture or query the information model. Controlled natural languages offer a potential solution to this problem, by allowing business experts to validate models and queries expressed in language they understand, while still being executable, with automated generation of implementation code. This paper describes FORML 2, a controlled natural language based on ORM 2 (second generation Object-Role Modeling), featuring rich expressive power, intelligibility, and semantic stability. Design guidelines are discussed, as well as a prototype implemented as an extension to the open source NORMA (Natural ORM Architect) tool. © 2010 Springer-Verlag Berlin Heidelberg.
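The idea of validation by verbalization can be illustrated with a toy generator (this is illustrative only, not FORML 2's actual grammar or the NORMA implementation): a constraint captured in the model is rendered as a controlled natural-language sentence a domain expert can confirm or reject.

```python
def verbalize_mandatory(entity, role, other):
    """Render a mandatory-role constraint as a controlled
    natural-language sentence (toy pattern, not FORML 2 syntax)."""
    return f"Each {entity} {role} at least one {other}."

sentence = verbalize_mandatory("Employee", "works for", "Company")
# "Each Employee works for at least one Company."
```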

Agency: GTR | Branch: EPSRC | Program: | Phase: Research Grant | Award Amount: 4.56M | Year: 2015

Data is everywhere, generated by increasing numbers of applications, devices and users, with few or no guarantees on the format, semantics, and quality. The economic potential of data-driven innovation is enormous, estimated by the Centre for Economics and Business Research to reach as much as £40B in 2017. To realise this potential, and to provide meaningful data analyses, data scientists must first spend a significant portion of their time (estimated as 50% to 80%) on data wrangling - the process of collecting, reorganising, and cleaning data. This heavy toll is due to what are referred to as the four Vs of big data: Volume - the scale of the data, Velocity - speed of change, Variety - different forms of data, and Veracity - uncertainty of data. There is an urgent need to provide data scientists with a new generation of tools that will unlock the potential of data assets and significantly reduce the data wrangling component. As many traditional tools are no longer applicable in the 4 Vs environment, a radical paradigm shift is required. The proposal aims to achieve this paradigm shift by adding value to data, by handling data management tasks in an environment that is fully aware of data and user contexts, and by closely integrating key data management tasks in a way not yet attempted, but desperately needed by many innovative companies in today's data-driven economy. The VADA research programme will define principles and solutions for Value Added Data Systems, which support users in discovering, extracting, integrating, accessing and interpreting the data of relevance to their questions. In so doing, it uses the context of the user, e.g., requirements in terms of the trade-off between completeness and correctness, and the data context, e.g., its availability, cost, provenance and quality. The user context characterises not only what data is relevant, but also the properties it must exhibit to be fit for purpose.
Adding value to data then involves the best-effort provision of data to users, along with comprehensive information on the quality and origin of the data provided. Users can provide feedback on the results obtained, enabling changes to all data management tasks, and thus a continuous improvement in the user experience. Establishing the principles behind Value Added Data Systems requires a revolutionary approach to data management, informed by interlinked research in data extraction, data integration, data quality, provenance, query answering, and reasoning. This will enable each of these areas to benefit from synergies with the others. Research has developed focused results within such sub-disciplines; VADA develops these specialisms in ways that both transform the techniques within the sub-disciplines and enable the development of architectures that bring them together to add value to data. The commercial importance of the research area has been widely recognised. The VADA programme brings together university researchers with commercial partners who are in desperate need of a new generation of data management tools. They will be contributing to the programme by funding research staff and students, providing substantial amounts of staff time for research collaborations, supporting internships, hosting visitors, contributing challenging real-life case studies, sharing experiences, and participating in technical meetings. These partners are both developers of data management technologies (LogicBlox, Microsoft, Neo) and data user organisations in healthcare (The Christie), e-commerce (LambdaTek, PricePanda), finance (AllianceBernstein), social networks (Facebook), security (Horus), smart cities (FutureEverything), and telecommunications (Huawei).

Karvounarakis G.,LogicBlox | Green T.J.,LogicBlox | Ives Z.G.,University of Pennsylvania | Tannen V.,University of Pennsylvania
ACM Transactions on Database Systems | Year: 2013

Recent work [Ives et al. 2005] proposed a new class of systems for supporting data sharing among scientific and other collaborations: a collaborative data sharing system connects heterogeneous logical peers using a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to incorporate related data from other peers as well. To achieve this, every peer's data and updates propagate along the mappings to the other peers. However, this operation, termed update exchange, is filtered by trust conditions expressing what data and sources a peer judges to be authoritative, which may cause a peer to reject another's updates. In order to support such filtering, updates carry provenance information. This article develops methods for realizing such systems: we build upon techniques from data integration, data exchange, incremental view maintenance, and view update to propagate updates along mappings, both to derived and optionally to source instances. We incorporate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance. We implement our techniques in a layer above an off-the-shelf RDBMS, and we experimentally demonstrate the viability of these techniques in the ORCHESTRA prototype system. © 2013 ACM.
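The update-exchange-with-trust idea can be sketched in miniature (all names here are hypothetical, not ORCHESTRA's API): updates flow along a schema mapping from one peer to another, each carrying its provenance, and the receiving peer's trust condition decides which updates to apply.

```python
def propagate(updates, mapping, trust):
    """Apply `mapping` to each incoming update, keeping only those
    whose provenance satisfies the peer's trust condition.

    updates: list of (tuple, provenance) pairs
    mapping: function transforming a source tuple into a target tuple
    trust:   predicate over provenance (the trust condition)
    """
    accepted = []
    for tup, prov in updates:
        if trust(prov):
            accepted.append((mapping(tup), prov))
    return accepted

# Example: peer B trusts only updates that originated at peer A.
updates = [(("alice", 30), {"origin": "A"}),
           (("bob", 25), {"origin": "C"})]
result = propagate(updates,
                   mapping=lambda t: ("person",) + t,
                   trust=lambda p: p["origin"] == "A")
```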

Karvounarakis G.,LogicBlox | Ives Z.G.,University of Pennsylvania | Tannen V.,University of Pennsylvania
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2010

Many advanced data management operations (e.g., incremental maintenance, trust assessment, debugging schema mappings, keyword search over databases, or query answering in probabilistic databases) involve computations that look at how a tuple was produced, e.g., to determine its score or existence. This requires answers to queries such as, "Is this data derivable from trusted tuples?"; "What tuples are derived from this relation?"; or "What score should this answer receive, given initial scores of the base tuples?". Such questions can be answered by consulting the provenance of query results. In recent years there has been significant progress on formal models for provenance. However, the issues of provenance storage, maintenance, and querying have not yet been addressed in an application-independent way. In this paper, we adopt the most general formalism for tuple-based provenance, semiring provenance. We develop a query language for provenance, which can express all of the aforementioned types of queries, as well as many more; we propose storage, processing and indexing schemes for data provenance in support of these queries; and we experimentally validate the feasibility of provenance querying and the benefits of our indexing techniques across a variety of application classes and queries. © 2010 ACM.
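The semiring idea behind the questions above can be illustrated with a small sketch (representation and names are my own, not the paper's storage scheme): a query result is annotated with a provenance expression over base-tuple variables, where `+` records alternative derivations and `*` records joint use of tuples; evaluating the same expression in different semirings answers different questions.

```python
def eval_prov(expr, plus, times, assignment):
    """Evaluate a provenance expression given as a nested tuple tree:
    ('var', x) for a base tuple, ('+', a, b) for alternative
    derivations, ('*', a, b) for joint use of tuples."""
    op = expr[0]
    if op == "var":
        return assignment[expr[1]]
    a = eval_prov(expr[1], plus, times, assignment)
    b = eval_prov(expr[2], plus, times, assignment)
    return plus(a, b) if op == "+" else times(a, b)

# (t1 * t2) + t3: derived by joining t1 with t2, or from t3 alone.
prov = ("+", ("*", ("var", "t1"), ("var", "t2")), ("var", "t3"))

# Boolean semiring (or/and): is the answer derivable from trusted tuples?
trusted = eval_prov(prov, lambda a, b: a or b, lambda a, b: a and b,
                    {"t1": True, "t2": False, "t3": True})

# Counting semiring (+/*): how many distinct derivations exist?
count = eval_prov(prov, lambda a, b: a + b, lambda a, b: a * b,
                  {"t1": 1, "t2": 1, "t3": 1})
```

Here `trusted` is true (via t3 alone) even though t2 is untrusted, and `count` is 2, since there are two derivations.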

Theoharis Y.,Institute of Computer Science Forth ICS | Fundulaki I.,World Wide Web Conferences W3C Greece Office | Karvounarakis G.,LogicBlox | Christophides V.,University of Crete
IEEE Internet Computing | Year: 2011

Capturing trustworthiness, reputation, and reliability of Semantic Web data manipulated by SPARQL requires researchers to represent adequate provenance information, usually modeled as source data annotations and propagated to query results during query evaluation. Alternatively, abstract provenance models can capture the relationship between query results and source data by taking into account the employed query operators. The authors argue the benefits of the latter for settings in which query results are materialized in several repositories and analyzed by multiple users. They also investigate how relational provenance models can be leveraged for SPARQL queries, and advocate for new provenance models. © 2011 IEEE.

Nanevski A.,IMDEA Madrid Institute for Advanced Studies | Ley-Wild R.,LogicBlox | Sergey I.,IMDEA Madrid Institute for Advanced Studies | Delbianco G.A.,IMDEA Madrid Institute for Advanced Studies
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2014

We present a novel model of concurrent computations with shared memory and provide a simple, yet powerful, logical framework for uniform Hoare-style reasoning about partial correctness of coarse- and fine-grained concurrent programs. The key idea is to specify arbitrary resource protocols as communicating state transition systems (STS) that describe valid states of a resource and the transitions the resource is allowed to make, including transfer of heap ownership. We demonstrate how reasoning in terms of communicating STS makes it easy to crystallize behavioral invariants of a resource. We also provide entanglement operators to build large systems from an arbitrary number of STS components, by interconnecting their lines of communication. Furthermore, we show how the classical rules from Concurrent Separation Logic (CSL), such as scoped resource allocation, can be generalized to fine-grained resource management. This allows us to give specifications as powerful as Rely-Guarantee, in a concise, scoped way, and yet regain the compositionality of CSL-style resource management. We prove the soundness of our logic with respect to the denotational semantics of action trees (a variation on Brookes' action traces). We formalized the logic as a shallow embedding in Coq and implemented a number of examples, including a construction of coarse-grained CSL resources as a modular composition of various logical and semantic components. © 2014 Springer-Verlag.
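The core notion of a resource protocol as a state transition system can be sketched operationally (a hypothetical toy, far from the paper's Coq formalization): states are resource snapshots, transitions carry guards saying when they may fire, and a trace of steps is valid only if every step is permitted from the current state.

```python
class STS:
    """A resource protocol: an initial state plus named, guarded
    transitions (guard decides if the step is allowed; step computes
    the next state)."""
    def __init__(self, initial, transitions):
        self.initial = initial
        self.transitions = transitions  # name -> (guard, step)

    def run(self, trace):
        """Replay `trace` (a list of transition names); return the
        final state, or None if some step violates its guard."""
        state = self.initial
        for name in trace:
            guard, step = self.transitions[name]
            if not guard(state):
                return None
            state = step(state)
        return state

# A lock protocol: 'acquire' is valid only from the unlocked state,
# 'release' only from the locked state.
lock = STS({"locked": False},
           {"acquire": (lambda s: not s["locked"],
                        lambda s: {"locked": True}),
            "release": (lambda s: s["locked"],
                        lambda s: {"locked": False})})
```

The guards play the role of the protocol's behavioral invariants: any interleaving of threads that only takes guarded transitions keeps the resource in a valid state.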

Halpin T.,LogicBlox | Halpin T.,INTI International University | Curland M.,LogicBlox
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2011

Fact-oriented modeling approaches such as Object-Role Modeling (ORM) have long supported several varieties of ring constraints, such as irreflexivity, asymmetry, intransitivity, and acyclicity, on pairs of compatible roles. The latest version of the Web Ontology Language (OWL 2) supports five kinds of ring constraint on binary predicates. Recently, three more ring constraint types (local reflexivity, strong intransitivity, and transitivity) were added to ORM. This paper discusses these new additions to ORM, as implemented in the Natural ORM Architect (NORMA) tool, and identifies important ways in which ORM and OWL differ in their support for ring constraints, while noting different mapping alternatives. We determine which combinations of elements from this expanded set of ring constraints are permitted, and provide verbalization patterns for the new additions. Graphical shapes for the new constraints and constraint combinations are introduced and motivated, and NORMA's new user interface for entry of ring constraints is illustrated. © 2011 Springer-Verlag.
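The ring constraint kinds named above can be stated as simple checks over a binary relation given as a set of (a, b) pairs. This is a hedged sketch of the constraint semantics only, not NORMA's implementation:

```python
def irreflexive(r):
    # no element relates to itself
    return all(a != b for a, b in r)

def asymmetric(r):
    # if a relates to b, b never relates back to a
    return all((b, a) not in r for a, b in r)

def intransitive(r):
    # if a relates to b and b to c, a never relates to c
    return all((a, c) not in r
               for a, b in r for b2, c in r if b == b2)

def acyclic(r):
    # depth-first search for a cycle in the directed graph of r
    graph = {}
    for a, b in r:
        graph.setdefault(a, []).append(b)
    visiting, done = set(), set()
    def dfs(n):
        if n in visiting:
            return False          # back edge: cycle found
        if n in done:
            return True
        visiting.add(n)
        ok = all(dfs(m) for m in graph.get(n, []))
        visiting.discard(n)
        if ok:
            done.add(n)
        return ok
    return all(dfs(a) for a, _ in r)
```

For example, a "reports to" relation among employees would typically be required to satisfy all four checks.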

Halpin T.,LogicBlox | Halpin T.,INTI Education Group
International Journal of Information System Modeling and Design | Year: 2010

Object-Role Modeling (ORM) is an approach for modeling and querying information at the conceptual level, and for transforming ORM models and queries to or from other representations. Unlike attribute-based approaches such as Entity-Relationship (ER) modeling and class modeling within the Unified Modeling Language (UML), ORM is fact-oriented, where all facts and rules are modeled in terms of natural sentences easily understood and validated by nontechnical business users. ORM's modeling procedure facilitates validation by verbalization and population with concrete examples. ORM's graphical notation is far more expressive than that of ER diagrams or UML class diagrams, and its attribute-free nature makes it more stable and adaptable to changing business requirements. This article explains the fundamentals of ORM, illustrates some of its advantages as a data modeling approach, and outlines some recent research to extend ORM, with special attention to mappings to deductive databases. Copyright © 2010, IGI Global.

Curland M.,LogicBlox | Halpin T.,LogicBlox | Halpin T.,INTI International University
Lecture Notes in Business Information Processing | Year: 2011

Second generation Object-Role Modeling (ORM 2) is a prime exemplar of fact-orientation, an approach that models the underlying facts of interest in an attribute-free way, using natural sentences to identify objects and the roles they play in relationships. ORM 2 provides languages and procedures for modeling and querying information systems at a conceptual level as well as mapping procedures for transforming between ORM structures and other structures, such as Entity Relationship (ER) models, class models in the Unified Modeling Language (UML), relational database models, extensible markup language schemas (XSD), and datalog. This paper provides an overview of Natural ORM Architect (NORMA), an ORM 2 tool under development that is implemented as a plug-in to Microsoft Visual Studio. For data modeling purposes, ORM typically provides greater expressive power and semantic stability than provided by UML or industrial versions of ER. NORMA's support for automated verbalization and sample populations facilitates validation with subject matter experts, and its live error-checking provides efficient feedback to modelers. © 2011 Springer-Verlag Berlin Heidelberg.
