Time filter

Source Type

Khalil I.,Qatar Computing Research Institute QCRI | Khreishah A.,New Jersey Institute of Technology | Azeem M.,United Arab Emirates University
Computer Networks | Year: 2014

Security issues in cloud computing are shown to be the biggest obstacle that could lower the wide benefits of the cloud systems. This obstacle may be strengthened when cloud services are accessed by mobile devices. Mobile devices could be easily lost or stolen and hence, they are easy to compromise. Additionally, mobile users tend to store access credentials, passwords and other Personal Identifiable Information (PII) in an improperly protected way. We conduct a survey and found that more than 66% of the surveyed users store PIIs in unprotected text files, cookies, or applications. To strengthen the legitimate access process over the clouds and to facilitate authentication and authorization with multiple cloud service providers, third-party Identity Management Systems (IDMs) have been proposed and implemented. In this paper, we discuss the limitations of the state-of-the-art cloud IDMs with respect to mobile clients. Specifically, we show that the current IDMs are vulnerable to three attacks, namely - IDM server compromise, mobile device compromise, and network traffic interception. Most importantly, we propose and validate a new IDM architecture dubbed Consolidated IDM (CIDM) that countermeasures these attacks. We conduct experiments to evaluate the performance and the security guarantees of CIDM and compare them with those of current IDM systems. Our experiments show that CIDM provides its clients with better security guarantees and that it has less energy and communication overhead compared to the current IDM systems. © 2014 Elsevier B.V. All rights reserved. Source

Wang J.,University of California at Berkeley | Tang N.,Qatar Computing Research Institute QCRI
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2014

One of the main challenges that data cleaning systems face is to automatically identify and repair data errors in a dependable manner. Though data dependencies (a.k.a. integrity constraints) have been widely studied to capture errors in data, automated and dependable data repairing on these errors has remained a notoriously hard problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of fixing rules. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. The heart of fixing rules is deterministic: given a tuple, the evidence pattern and the negative patterns of a fixing rule are combined to precisely capture which attribute is wrong, and the fact indicates how to correct this error. We study several fundamental problems associated with fixing rules, and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules is consistent, and discuss approaches to resolve inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. We experimentally demonstrate that our techniques outperform other automated algorithms in terms of the accuracy of repairing data errors, using both real-life and synthetic data. © 2014 ACM. Source

Geerts F.,University of Antwerp | Mecca G.,University of Basilicata | Papotti P.,Qatar Computing Research Institute QCRI | Santoro D.,University of Basilicata | Santoro D.,Third University of Rome
Proceedings of the VLDB Endowment | Year: 2013

Data-cleaning (or data-repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a set of given constraints. In recent years, repairing methods have been proposed for several classes of constraints. However, these methods rely on ad hoc decisions and tend to hard-code the strategy to repair conflicting values. As a consequence, there is currently no general algorithm to solve database repairing problems that involve different kinds of constraints and different strategies to select preferred values. In this paper we develop a uniform framework to solve this problem. We propose a new semantics for repairs, and a chase-based algorithm to compute minimal solutions. We implemented the framework in a DBMSbased prototype, and we report experimental results that confirm its good scalability and superior quality in computing repairs. Source

Naumann F.,Qatar Computing Research Institute QCRI
SIGMOD Record | Year: 2013

Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and occasionally functional dependencies and association rules. Individual research projects have proposed several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional dependencies. Data profiling deserves a fresh look for two reasons: First, the area itself is neither established nor defined in any principled way, despite significant research activity on individual parts in the past. Second, more and more data beyond the traditional relational databases are being created and beg to be profiled. The article proposes new research directions and challenges, including interactive and incremental profiling and profiling heterogeneous and non-relational data. Source

Abedjan Z.,Hasso Plattner Institute HPI | Quiane-Ruiz J.-A.,Qatar Computing Research Institute QCRI | Naumann F.,Hasso Plattner Institute HPI
Proceedings - International Conference on Data Engineering | Year: 2014

The discovery of all unique (and non-unique) column combinations in an unknown dataset is at the core of any data profiling effort. Unique column combinations resemble candidate keys of a relational dataset. Several research approaches have focused on their efficient discovery in a given, static dataset. However, none of these approaches are suitable for applications on dynamic datasets, such as transactional databases, social networks, and scientific applications. In these cases, data profiling techniques should be able to efficiently discover new uniques and non-uniques (and validate old ones) after tuple inserts or deletes, without re-profiling the entire dataset. We present the first approach to efficiently discover unique and non-unique constraints on dynamic datasets that is independent of the initial dataset size. In particular, Swan makes use of intelligently chosen indices to minimize access to old data. We perform an exhaustive analysis of Swan and compare it with two state-of-the-art techniques for unique discovery: Gordian and Ducc. The results show that Swan significantly outperforms both, as well as their incremental adaptations. For inserts, Swan is more than 63x faster than Gordian and up to 50x faster than Ducc. For deletes, Swan is more than 15x faster than Gordian and up to 1 order of magnitude faster than Ducc. In fact, Swan even improves on the static case by dividing the dataset into a static part and a set of inserts. © 2014 IEEE. Source

Discover hidden collaborations