Hsueh Y.-L.,Teradata |
Zimmermann R.,National University of Singapore |
Ku W.-S.,Auburn University |
Jin Y.,University of Hong Kong
Proceedings - International Conference on Data Engineering | Year: 2011
Skyline query processing has become an important feature in multi-dimensional, data-intensive applications. Such computations are especially challenging under dynamic conditions, when either snapshot queries need to be answered with short user response times or when continuous skyline queries need to be maintained efficiently over a set of objects that are frequently updated. To achieve high performance, we have recently designed the ESC algorithm, an Efficient update approach for Skyline Computations. ESC creates a pre-computed candidate skyline set behind the first skyline (a second line of defense, so to speak) that facilitates an incremental, two-stage skyline update strategy which results in a quicker query response time for the user. Our demonstration presents the two-threaded SkyEngine system that builds upon and extends the base features of the ESC algorithm with innovative, user-oriented functionalities that are termed SkyAlert and AutoAdjust. These functions enable a data or service provider to be informed about and gain the opportunity of automatically promoting its data records to remain part of the skyline, if so desired. The SkyEngine demonstration includes both a server and a web browser based client. Finally, the SkyEngine system also provides visualizations that reveal its internal performance statistics. © 2011 IEEE.
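The dominance test at the heart of any skyline computation can be sketched in a few lines. The example below is a naive O(n²) sketch, not the ESC algorithm itself (ESC maintains the skyline incrementally under updates); the hotel tuples and the idea of keeping a second skyline behind the first are purely illustrative, assuming lower values are better in every dimension.

```python
def dominates(p, q):
    # p dominates q when p is no worse in every dimension and strictly
    # better in at least one (lower is better here).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    # Naive O(n^2) scan: keep the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (distance, price) records for hotels.
hotels = [(3, 100), (1, 200), (2, 150), (4, 80), (2, 250)]
first = skyline(hotels)
# A "second line of defense": the skyline of the remaining points, loosely
# analogous to ESC's pre-computed candidate set behind the first skyline.
second = skyline([p for p in hotels if p not in first])
print(first)   # [(3, 100), (1, 200), (2, 150), (4, 80)]
print(second)  # [(2, 250)]
```

When a skyline point is deleted, its replacements can only come from this candidate set, which is what makes a two-stage update strategy attractive.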
Rizk A.,Teradata |
Elragal A.,German University in Cairo
18th Americas Conference on Information Systems 2012, AMCIS 2012 | Year: 2012
Recent developments in wireless technology, mobility, and networking infrastructures have increased the amount of data being captured every second. Data captured from the digital traces of moving objects and devices is called trajectory data. With the increasing volume of spatiotemporal trajectories, constructive and meaningful knowledge needs to be extracted. In this paper, a conceptual framework is proposed to apply data mining techniques to trajectories and semantically enrich the extracted patterns. A design science research approach is followed, in which the framework is tested and evaluated using a prototypical instantiation built to support decisions in the context of the Egyptian tourism industry. By applying association rule mining, the revealed time-stamped frequently visited regions of interest (ROI) patterns show that specific semantic annotations are required at early stages in the process and on lower levels of detail, refuting the presumption of cross-application usable patterns. © (2012) by the AIS/ICIS Administrative Office All rights reserved.
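The association-rule step above can be illustrated with a minimal support computation over co-visited regions of interest. This is a sketch of the frequent-itemset idea only (pairwise co-occurrence, a stand-in for a full Apriori-style miner), and the tourist trajectories and ROI names below are invented for illustration.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    # Count co-occurring ROI pairs across trips and keep those whose
    # support (fraction of trips containing the pair) meets the threshold.
    n = len(transactions)
    counts = Counter()
    for trip in transactions:
        for pair in combinations(sorted(set(trip)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical tourist trajectories, each reduced to its set of visited ROIs.
trips = [
    {"Pyramids", "Sphinx", "Museum"},
    {"Pyramids", "Sphinx"},
    {"Museum", "Citadel"},
    {"Pyramids", "Sphinx", "Citadel"},
]
print(frequent_pairs(trips, min_support=0.5))  # {('Pyramids', 'Sphinx'): 0.75}
```

Semantic enrichment would then annotate a surviving pair with domain meaning (e.g. two landmarks on the same tour circuit), which is the part the paper argues must happen early and at a low level of detail.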
Anandan B.,Purdue University |
Clifton C.,Purdue University |
Jiang W.,Missouri University of Science and Technology |
Murugesan M.,Teradata |
And 2 more authors.
Transactions on Data Privacy | Year: 2012
De-identified data has the potential to be shared widely to support decision making and research. While significant advances have been made in the anonymization of structured data, anonymization of textual information is in its infancy. Document sanitization requires finding and removing personally identifiable information. While current tools are effective at removing specific types of information (names, addresses, dates), they fail on two counts. The first is that complete text redaction may not be necessary to prevent re-identification, and such redaction degrades the readability and usability of the text. More serious is that identifying information, as well as sensitive information, can be quite subtle and still be present in the text even after the removal of obvious identifiers. Observe that a diagnosis of "tuberculosis" is sensitive, but in some situations it can also be identifying. Replacing it with the less sensitive term "infectious disease" also reduces identifiability. That is, instead of simply removing sensitive terms, these terms can be hidden behind more general but semantically related terms to protect sensitive and identifying information without unnecessarily degrading the amount of information contained in the document. Based on this observation, the main contribution of this paper is a novel information-theoretic approach to text sanitization, together with efficient heuristics to sanitize text documents.
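The tuberculosis example generalizes to a simple substitution scheme: map each sensitive term to a broader parent term rather than redacting it. The sketch below assumes a hand-built generalization hierarchy and plain string replacement; the paper's actual approach chooses generalization levels information-theoretically, which this does not attempt.

```python
# Hypothetical generalization hierarchy: each sensitive term maps to a
# broader, semantically related term rather than being redacted outright.
GENERALIZATIONS = {
    "tuberculosis": "infectious disease",
    "HIV": "infectious disease",
    "schizophrenia": "mental illness",
}

def sanitize(text, hierarchy):
    # Replace each sensitive term with its more general parent term,
    # preserving readability instead of blacking the term out.
    for term, general in hierarchy.items():
        text = text.replace(term, general)
    return text

note = "Patient was diagnosed with tuberculosis in March."
print(sanitize(note, GENERALIZATIONS))
# Patient was diagnosed with infectious disease in March.
```

A real sanitizer would also need tokenization, case handling, and a principled way to decide how far up the hierarchy to climb for a given re-identification risk.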
Scanlon J.R.,Teradata |
Gerber M.S.,University of Virginia
IEEE Transactions on Information Forensics and Security | Year: 2015
The Internet's increasing use as a means of communication has led to the formation of cyber communities, which have become appealing to violent extremist (VE) groups. This paper presents research on forecasting the daily level of cyber-recruitment activity of VE groups. We used a previously developed support vector machine model to identify recruitment posts within a Western jihadist discussion forum. We analyzed the textual content of this data set with latent Dirichlet allocation (LDA), and we fed these analyses into a variety of time series models to forecast cyber-recruitment activity within the forum. Quantitative evaluations showed that employing LDA-based topics as predictors within time series models reduces forecast error compared with naive (random-walk), autoregressive integrated moving average, and exponential smoothing baselines. To the best of our knowledge, this is the first result reported on this forecasting task. This research could ultimately assist with the efficient allocation of intelligence analysts in response to predicted levels of cyber-recruitment activity. © 2015 IEEE.
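The evaluation design can be illustrated by comparing a random-walk baseline against a forecast that uses a topic signal as a predictor. Everything below is invented for illustration: the daily post counts, the single "topic intensity" series, and the use of an ordinary least-squares fit as a stand-in for the paper's richer time-series models.

```python
def mae(actual, forecast):
    # Mean absolute error between two equal-length series.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical daily recruitment-post counts and one LDA topic's intensity.
posts = [2, 5, 3, 8, 4, 9, 6, 10]
topic = [0.1, 0.3, 0.2, 0.5, 0.25, 0.55, 0.35, 0.6]

# Naive (random-walk) baseline: tomorrow's count equals today's.
naive = posts[:-1]
actual = posts[1:]

# Topic-based forecast: simple linear fit posts[t] ~ a + b * topic[t],
# a toy stand-in for ARIMA/exponential-smoothing models with topic predictors.
n = len(posts)
mx, my = sum(topic) / n, sum(posts) / n
b = (sum((x - mx) * (y - my) for x, y in zip(topic, posts))
     / sum((x - mx) ** 2 for x in topic))
a = my - b * mx
topic_forecast = [a + b * x for x in topic[1:]]

print(round(mae(actual, naive), 2), round(mae(actual, topic_forecast), 2))
```

On this toy data the topic-informed forecast has a much lower MAE than the random-walk baseline, mirroring the direction of the paper's quantitative result.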
Johnston J.,CGG |
Proceedings of the Annual Offshore Technology Conference | Year: 2015
In the current cost-saving and high-tech environment, this paper aims to demonstrate that significant business value can be derived from advanced information technology. The objective was to identify and reduce risk in the Drilling and Wells domains using iterative, multi-disciplinary Big Data analytics and workflows. Examples of operational risk identified in this project include low borehole quality, poor wellbore stability, and stuck pipe. Subject-matter expertise and advanced analytical capabilities were assembled to mine and analyze large amounts of different data types across drilling parameters, petrophysics and well logs, and geological formation tops for a released data set of approximately 350 oil and gas wells in the UK North Sea. The data set covered a large geographical area, which conventional analysis techniques would find difficult, if not impossible, to handle and analyze in its entirety. Results of this study showed that iterative Big Data "discovery workflows" uncover hidden patterns and unexpected correlations across the data set. The study also confirmed the possibility of improving drilling models using business analytics. In addition, the correlations found allow predictive statistics to be computed. Finally, advanced visualization capabilities provided an aid to interpreting, understanding, and making recommendations for drilling plans and operations. This novel approach showed that patterns and correlations can be detected across a disparate data set, where data types are not traditionally linked, by integrating a large variety and complexity of data in one analytical environment. Furthermore, the multi-domain analyses run during the study were all performed 'on-the-fly', without preconceptions or business requirements. As a final point, Big Data analytics can also be used as a quality control tool and will certainly be leveraged for further multivariate analysis in oil and gas.
Copyright © (2015) by the Offshore Technology Conference All rights reserved.
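The kind of cross-domain correlation such a discovery workflow surfaces can be sketched with a plain Pearson coefficient between two series that are not traditionally analyzed together. The variable names and per-well readings below (weight-on-bit versus stuck-pipe incident counts) are invented for illustration and are not from the study's data set.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-well readings drawn from two different data domains:
# drilling parameters (weight-on-bit) vs. operational risk (stuck-pipe events).
wob = [10, 12, 15, 18, 20, 25]
stuck = [0, 1, 1, 2, 3, 4]
print(round(pearson(wob, stuck), 3))
```

In a real discovery workflow this pairwise screen would run across many drilling, petrophysical, and geological variables at once, with strong correlations flagged for subject-matter experts to validate.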