Ma S.,Beihang University | Cao Y.,Beihang University | Fan W.,Laboratory for Foundations of Computer Science | Huai J.,Beihang University | Wo T.,Beihang University
ACM Transactions on Database Systems

Graph pattern matching is the problem of finding all matches in a data graph for a given pattern graph, and it is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow graph pattern matching to be conducted in cubic time. However, they fall short of capturing the topology of data graphs, that is, graphs may have a structure drastically different from the pattern graphs they match, and the matches found are often too large to understand and analyze. To rectify these problems, this article proposes a notion of strong simulation, a revision of graph simulation for graph pattern matching. (1) We identify a set of criteria for preserving the topology of graphs matched. We show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of graph simulation by providing a cubic-time algorithm for computing strong simulation. (3) We present the locality property of strong simulation, which allows us to develop an effective distributed algorithm to conduct graph pattern matching on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms using both real-life and synthetic data. © 2014 ACM.
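
For orientation, plain graph simulation (the notion that strong simulation revises) can be computed by a naive fixpoint. The sketch below is an illustration under assumptions, not the article's cubic-time algorithm: it assumes directed edges and exact label equality, all function and parameter names are hypothetical, and it does not implement the extra topology-preserving conditions of strong simulation.

```python
def graph_simulation(pattern_labels, pattern_edges, data_labels, data_edges):
    """Return the maximum simulation as {pattern node: set of data nodes}.

    pattern_labels / data_labels: dict mapping node -> label
    pattern_edges / data_edges: iterable of directed (source, target) pairs
    The pattern matches the data graph only if every returned set is nonempty.
    """
    # Successor sets for both graphs.
    p_succ = {u: set() for u in pattern_labels}
    for u, u2 in pattern_edges:
        p_succ[u].add(u2)
    d_succ = {v: set() for v in data_labels}
    for v, v2 in data_edges:
        d_succ[v].add(v2)

    # Start from all label-compatible pairs.
    sim = {
        u: {v for v in data_labels if data_labels[v] == pattern_labels[u]}
        for u in pattern_labels
    }

    # Remove pairs that violate the simulation condition until nothing changes.
    changed = True
    while changed:
        changed = False
        for u in pattern_labels:
            for v in list(sim[u]):
                # Every pattern edge (u, u2) needs a data edge (v, v2)
                # with v2 still a candidate match of u2.
                if any(not (d_succ[v] & sim[u2]) for u2 in p_succ[u]):
                    sim[u].discard(v)
                    changed = True
    return sim


# Pattern: A -> B. Data: a1 -> b1, while a2 has no outgoing edge.
result = graph_simulation(
    {"A": "A", "B": "B"},
    [("A", "B")],
    {"a1": "A", "a2": "A", "b1": "B"},
    [("a1", "b1")],
)
```

In this small example, a2 carries the right label but is dropped because it has no successor that matches B.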

Fan W.,Harbin Institute of Technology | Geerts F.,Harbin Institute of Technology | Geerts F.,Laboratory for Foundations of Computer Science
ACM Transactions on Database Systems

This article investigates the question of whether a partially closed database has complete information to answer a query. In practice an enterprise often maintains master data Dm, a closed-world database. We say that a database D is partially closed if it satisfies a set V of containment constraints of the form q(D) ⊆ p(Dm), where q is a query in a language LC and p is a projection query. The part of D not constrained by (Dm, V) is open, from which some tuples may be missing. The database D is said to be complete for a query Q relative to (Dm, V) if for all partially closed extensions D′ of D, Q(D′) = Q(D), i.e., adding tuples to D either violates some constraints in V or does not change the answer to Q. We first show that the proposed model can also capture the consistency of data, in addition to its relative completeness. Indeed, integrity constraints studied for data consistency can be expressed as containment constraints. We then study two problems. One is to decide, given Dm, V, a query Q in a language LQ, and a partially closed database D, whether D is complete for Q relative to (Dm, V). The other is to determine, given Dm, V and Q, whether there exists a partially closed database that is complete for Q relative to (Dm, V). We establish matching lower and upper bounds on these problems for a variety of languages LQ and LC. We also provide characterizations for a database to be relatively complete, and for a query to allow a relatively complete database, when LQ and LC are conjunctive queries. © 2010 ACM.
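
The definition of relative completeness can be made concrete with a brute-force check. This is a minimal sketch under strong assumptions: databases are Python sets of tuples, queries are plain functions, and extensions are drawn only from a finite, caller-supplied pool of candidate tuples (the definition itself ranges over all partially closed extensions). All names are hypothetical and the article's decision procedures are not reproduced here.

```python
from itertools import combinations


def is_partially_closed(db, master, constraints):
    """Check every containment constraint q(D) ⊆ p(Dm)."""
    return all(q(db) <= p(master) for q, p in constraints)


def is_relatively_complete(db, master, constraints, query, candidate_tuples):
    """True if no partially closed extension built from candidate_tuples
    changes the answer to query. Exponential in the size of the pool."""
    base_answer = query(db)
    extra = [t for t in candidate_tuples if t not in db]
    for k in range(1, len(extra) + 1):
        for added in combinations(extra, k):
            extension = db | set(added)
            if (is_partially_closed(extension, master, constraints)
                    and query(extension) != base_answer):
                return False
    return True


# Master data: the closed set of known customers.
master = {"alice", "bob"}
# Constraint: customers appearing in D must appear in the master data.
constraints = [(lambda d: {c for c, _ in d}, lambda m: m)]
# Query: which customers appear in D?
customers = lambda d: {c for c, _ in d}

full_db = {("alice", "NYC"), ("bob", "LA")}
pool = [("carol", "SF"), ("alice", "SF")]
complete = is_relatively_complete(full_db, master, constraints, customers, pool)

partial_db = {("alice", "NYC")}
incomplete = is_relatively_complete(
    partial_db, master, constraints, customers, [("bob", "LA")]
)
```

In the first case, adding carol violates the constraint and adding a second alice tuple leaves the answer unchanged, so the database is complete for the query. In the second case, bob can still be added legally and changes the answer.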

Fan W.,Laboratory for Foundations of Computer Science | Wang X.,Laboratory for Foundations of Computer Science | Wu Y.,University of California at Santa Barbara
ACM Transactions on Database Systems

Graph pattern matching is commonly used in a variety of emerging applications such as social network analysis. These applications highlight the need for studying the following two issues. First, graph pattern matching is traditionally defined in terms of subgraph isomorphism or graph simulation. These notions, however, often impose too strong a topological constraint on graphs to identify meaningful matches. Second, in practice a graph is typically large, and is frequently updated with small changes. It is often prohibitively expensive to recompute matches starting from scratch via batch algorithms when the graph is updated. This article studies these two issues. (1) We propose to define graph pattern matching based on a notion of bounded simulation, which extends graph simulation by specifying the connectivity of nodes in a graph within a predefined number of hops. We show that bounded simulation is able to find sensible matches that the traditional matching notions fail to catch. We also show that matching via bounded simulation is in cubic time, by giving such an algorithm. (2) We provide an account of results on incremental graph pattern matching, for matching defined with graph simulation, bounded simulation, and subgraph isomorphism. We show that the incremental matching problem is unbounded, that is, its cost is not determined alone by the size of the changes in the input and output, for all these matching notions. Nonetheless, when matching is defined in terms of simulation or bounded simulation, incremental matching is semibounded, that is, its worst-time complexity is bounded by a polynomial in the size of the changes in the input, output, and auxiliary information that is necessarily maintained to reuse previous computation, and the size of graph patterns. We also develop incremental matching algorithms for graph simulation and bounded simulation, by minimizing unnecessary recomputation. In contrast, matching based on subgraph isomorphism is neither bounded nor semibounded. (3) We experimentally verify the effectiveness and efficiency of these algorithms, and show that: (a) the revised notion of graph pattern matching allows us to identify communities commonly found in real-life networks, and (b) the incremental algorithms substantially outperform their batch counterparts in response to small changes. These suggest a promising framework for real-life graph pattern matching. © 2013 ACM.
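
The idea of matching a pattern edge to a path within a predefined number of hops can be sketched as follows. This is a naive illustration, not the article's cubic-time or incremental algorithm: it assumes each pattern edge carries an integer hop bound k, that such an edge is satisfied by any nonempty data path of at most k edges, and that labels must be equal. All names are hypothetical.

```python
def within_hops(succ, start, k):
    """Nodes reachable from start by a nonempty path of at most k edges."""
    reached = set()
    frontier = {start}
    for _ in range(k):
        frontier = {w for v in frontier for w in succ[v]} - reached
        if not frontier:
            break
        reached |= frontier
    return reached


def bounded_simulation(pattern_labels, pattern_edges, data_labels, data_edges):
    """pattern_edges: dict mapping (u, u2) -> hop bound k (assumed encoding).

    Returns {pattern node: set of data nodes}; the pattern matches only if
    every set is nonempty.
    """
    succ = {v: set() for v in data_labels}
    for v, w in data_edges:
        succ[v].add(w)

    # Cache of bounded reachability sets, keyed by (data node, bound).
    cache = {}

    def reachable(v, k):
        if (v, k) not in cache:
            cache[(v, k)] = within_hops(succ, v, k)
        return cache[(v, k)]

    sim = {
        u: {v for v in data_labels if data_labels[v] == pattern_labels[u]}
        for u in pattern_labels
    }
    changed = True
    while changed:
        changed = False
        for (u, u2), k in pattern_edges.items():
            for v in list(sim[u]):
                # v must reach some candidate match of u2 within k hops.
                if not (reachable(v, k) & sim[u2]):
                    sim[u].discard(v)
                    changed = True
    return sim


# Data: a1 -> x -> c1 (two hops) and a2 -> y -> z -> c2 (three hops).
labels = {"a1": "A", "x": "B", "c1": "C",
          "a2": "A", "y": "B", "z": "B", "c2": "C"}
edges = [("a1", "x"), ("x", "c1"), ("a2", "y"), ("y", "z"), ("z", "c2")]
pattern_labels = {"A": "A", "C": "C"}
two_hops = bounded_simulation(pattern_labels, {("A", "C"): 2}, labels, edges)
```

With a bound of 2, a1 matches A because it reaches c1 in two hops, while a2 is dropped because c2 is three hops away. Plain graph simulation would reject both, since neither A-node has a direct edge to a C-node.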

Fan W.,Laboratory for Foundations of Computer Science | Geerts F.,Middelheimlaan | Wijsen J.,Institute d'Informatique
ACM Transactions on Database Systems

Data in real-life databases become obsolete rapidly. One often finds that multiple values of the same entity reside in a database. While all of these values were once correct, most of them may have become stale and inaccurate. Worse still, the values often do not carry reliable timestamps. With this comes the need for studying data currency, to identify the current value of an entity in a database and to answer queries with the current values, in the absence of reliable timestamps. This article investigates the currency of data. (1) We propose a model that specifies partial currency orders in terms of simple constraints. The model also allows us to express what values are copied from other data sources, bearing currency orders in those sources, in terms of copy functions defined on correlated attributes. (2) We study fundamental problems for data currency, to determine whether a specification is consistent, whether a value is more current than another, and whether a query answer is certain no matter how partial currency orders are completed. (3) Moreover, we identify several problems associated with copy functions, to decide whether a copy function imports sufficient current data to answer a query, whether a copy function can be extended to import necessary current data for a query while respecting the constraints, and whether it suffices to copy data of a bounded size. (4) We establish upper and lower bounds of these problems, all matching, for combined complexity and data complexity, and for a variety of query languages. We also identify special cases that warrant lower complexity. © 2012 ACM.
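
The notion of an answer that is certain "no matter how partial currency orders are completed" can be illustrated with a toy check. This sketch rests on a much simplified, hypothetical encoding: a single attribute of a single entity, a partial currency order given as (older, newer) tuple pairs, and brute-force enumeration of all total orders. It ignores the currency constraints and copy functions of the article's model.

```python
from itertools import permutations


def certain_current_value(values, older_than):
    """values: dict mapping tuple id -> attribute value for one entity.
    older_than: iterable of (a, b) pairs meaning tuple a is less current than b.

    Returns the value that is most current in every total order consistent
    with older_than, or None if no such value exists (including the case
    where older_than is cyclic, i.e. the specification is inconsistent).
    """
    older_than = list(older_than)
    last_values = set()
    for order in permutations(values):
        position = {t: i for i, t in enumerate(order)}
        # Keep only total orders that respect the known partial order.
        if all(position[a] < position[b] for a, b in older_than):
            last_values.add(values[order[-1]])
    if len(last_values) == 1:
        return next(iter(last_values))
    return None


# t1 is known to be older than t2 and t3; t2 and t3 are unordered
# but agree on the value, so the current value is still certain.
status = certain_current_value(
    {"t1": "single", "t2": "married", "t3": "married"},
    [("t1", "t2"), ("t1", "t3")],
)
# No ordering information at all: either city could be the current one.
city = certain_current_value({"t1": "NYC", "t2": "LA"}, [])
```

The enumeration is factorial in the number of tuples, so it serves only to make the definition tangible on small inputs.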
