Kimelfeld B., LogicBlox Inc.
Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems | Year: 2014
Information Extraction commonly refers to the task of populating a relational schema, having predefined underlying semantics, from textual content. This task is pervasive in contemporary computational challenges associated with Big Data. This tutorial gives an overview of the algorithmic concepts and techniques used for performing Information Extraction tasks, and describes some of the declarative frameworks that provide abstractions and infrastructure for programming extractors. In addition, the tutorial highlights opportunities for research impact through principles of data management, illustrates these opportunities through recent work, and proposes directions for future research. Copyright 2014 ACM.
Green T.J., LogicBlox Inc.
Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems | Year: 2015
We give an overview of LogiQL, a declarative, Datalog-based language for data management and analytics, along with techniques for efficient evaluation of LogiQL programs, emphasizing theoretical foundations when possible. These techniques include: leapfrog triejoin and its associated incremental maintenance algorithm, which we measure against appropriate optimality criteria; purely-functional data structures, which provide elegant versioning and branching capabilities that are indispensable for LogiQL; and transaction repair, a lock-free concurrency control scheme that uses LogiQL, incremental maintenance, and purely-functional data structures as essential ingredients. © 2015 ACM.
LogicBlox Inc. | Date: 2015-10-21
A method for joining records from database tables is proposed. Join attributes are ordered into a sequence S
LogicBlox Inc. | Date: 2015-03-19
An aspect includes concurrently executing two or more transactions over a database. A plurality of transactions is executed in parallel while recording each transaction's sensitivities and output deltas. A sensitivity of a transaction identifies an aspect of a database state whose modification has the potential to alter an output of the transaction, and an output delta of the transaction indicates a change to the database state that results from the transaction being executed. The output deltas are fed from a first transaction through a filter for a second transaction. The filter is based on the second transaction's sensitivities. The filtered deltas are processed in the second transaction to incrementally compute revised deltas and sensitivities for the second transaction. For each transaction that successfully commits, the transaction's deltas are applied to update the database.
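The flow described in this abstract can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the patented method: the database is a plain dictionary, a sensitivity is simply a key the transaction read, an output delta is a key/value pair it wants to write, and the two transactions run one after the other against the same snapshot rather than truly in parallel. Where the abstract calls for incremental recomputation, the sketch simply re-executes the second transaction. The names RecordingView, execute, and run_pair are hypothetical.

```python
class RecordingView:
    """Read-only view of a snapshot that records every key read (hypothetical helper)."""

    def __init__(self, snapshot):
        self._snapshot = snapshot
        self.sensitivities = set()

    def read(self, key):
        # Any key that is read could change the transaction's output if modified.
        self.sensitivities.add(key)
        return self._snapshot.get(key, 0)


def execute(txn, snapshot):
    """Run txn against snapshot; return (sensitivities, output deltas)."""
    view = RecordingView(snapshot)
    deltas = txn(view)  # txn returns a dict of key -> new value
    return view.sensitivities, deltas


def run_pair(db, first, second):
    """Run two transactions on the same snapshot and repair the second if needed.

    Assumption: both transactions commit, in the order (first, second).
    Returns the new database state and whether a repair was required.
    """
    _, deltas1 = execute(first, db)
    sens2, deltas2 = execute(second, db)

    # Filter the first transaction's deltas through the second's sensitivities.
    filtered = {k: v for k, v in deltas1.items() if k in sens2}
    repaired = bool(filtered)
    if repaired:
        # Simplification: full re-execution stands in for incremental repair.
        sens2, deltas2 = execute(second, {**db, **deltas1})

    # Apply the deltas of the committed transactions in commit order.
    return {**db, **deltas1, **deltas2}, repaired


def deposit(view):
    return {"a": view.read("a") + 10}


def double_a(view):
    return {"b": view.read("a") * 2}


def bump_c(view):
    return {"c": view.read("c") + 1}


if __name__ == "__main__":
    db = {"a": 5, "b": 0}
    print(run_pair(db, deposit, double_a))  # second transaction reads "a", so it is repaired
    print(run_pair(db, deposit, bump_c))    # no overlap, so no repair is needed
```

In the first call the second transaction read a key that the first one changed, so its deltas are recomputed and the result matches running the two transactions serially. In the second call the filter passes nothing through, and the originally computed deltas are applied unchanged.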
LogicBlox Inc. | Date: 2014-06-06
Salient sampling for query size estimation includes identifying two or more columns in a database table that have corresponding columns in one or more other tables. One or more hash functions are applied to domains of each of the identified columns. A first hash function is applied to a domain of the first column and a second hash function to a domain of the second column. A subset of the rows in the database table is selected. The selecting includes selecting rows in the database table where results of the first hash function meet a first numeric threshold and selecting rows in the database table where results of the second hash function meet a second numeric threshold. A sample database table corresponding to the database table is created. The sample database table includes the selected subset of the rows in the database table.
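The row-selection step in this abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions that the abstract does not settle: rows are dictionaries, each hash is a salted SHA-256 truncated to 64 bits, "meeting a threshold" means the hash falls below a cutoff derived from a sampling fraction, and a row is kept if either column qualifies (a union of the two selections). The names column_hash and salient_sample, and the (domain, fraction) spec format, are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 64  # hashes are integers in [0, HASH_SPACE)


def column_hash(value, domain):
    """Deterministically hash a column value, salted by its domain name."""
    data = f"{domain}:{value!r}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def salient_sample(rows, spec):
    """Select the rows whose hashed join-column values fall below a cutoff.

    spec maps column name -> (domain name, fraction in [0, 1]).
    Assumption: a row is kept if ANY listed column meets its threshold.
    """
    cutoffs = {
        column: (domain, int(fraction * HASH_SPACE))
        for column, (domain, fraction) in spec.items()
    }
    sample = []
    for row in rows:
        for column, (domain, cutoff) in cutoffs.items():
            if column_hash(row[column], domain) < cutoff:
                sample.append(row)
                break  # keep each row at most once
    return sample


if __name__ == "__main__":
    orders = [{"customer_id": i, "product_id": i % 7} for i in range(200)]
    spec = {
        "customer_id": ("customer", 0.25),
        "product_id": ("product", 0.25),
    }
    sample = salient_sample(orders, spec)
    print(len(sample), "of", len(orders), "rows selected")
```

Because the hash in this sketch depends only on the domain name and the value, a given value is kept or dropped consistently in any table sampled with the same domain and fraction.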