Tang X., Teradata Aster |
Wehrmeister R.,Teradata Aster |
Shau J.,Teradata Aster |
Chakraborty A.,Teradata Aster |
And 16 more authors.
2016 IEEE 32nd International Conference on Data Engineering, ICDE 2016 | Year: 2016
There is increasing demand to integrate big data analytic systems using SQL. Given the vast ecosystem of SQL applications, enabling SQL capabilities allows big data platforms to expose their analytic potential to a wide variety of end users, accelerating discovery processes and providing significant business value. Most existing big data frameworks are based on one particular programming model such as MapReduce or Graph. As a result, data scientists are often forced to manually create ad hoc data pipelines to connect various big data tools and platforms to serve their analytic needs. When the analytic tasks change, these data pipelines may be costly to modify and maintain. In this paper we present SQL-SA, a polymorphic and parallelizable SQL scalar and aggregate infrastructure in Aster 6.20. This infrastructure extends Aster 6's MapReduce and Graph capabilities to support polymorphic user-defined scalar and aggregate functions using flexible SQL syntax. The implementation extensively enhances the main Aster components, including query syntax, API, planning, and execution. By integrating these new user-defined scalar and aggregate functions with Aster MapReduce and Graph functions, Aster 6.20 enables data scientists to combine diverse programming models in a single SQL statement. The statement is automatically converted to an optimal data pipeline and executed in parallel. Using a real-world business problem and data, Aster 6.20 demonstrates a significant performance advantage (25%+) over Hadoop Pig and Hive. © 2016 IEEE.
Schnaitter K.,Teradata Aster |
Polyzotis N.,University of California at Santa Cruz
Proceedings of the VLDB Endowment | Year: 2012
To obtain a high level of system performance, a database administrator (DBA) must choose a set of indices that is appropriate for the workload. The system can aid in this challenging task by providing recommendations for the index configuration. We propose a new index recommendation technique, termed semi-automatic tuning, that keeps the DBA "in the loop" by generating recommendations that use feedback about the DBA's preferences. The technique also works online, which avoids the limitations of commercial tools that require the workload to be known in advance. The foundation of our approach is the Work Function Algorithm, which can solve a wide variety of online optimization problems with strong competitive guarantees. We present an experimental analysis that validates the benefits of semi-automatic tuning in a wide variety of conditions. © 2012 VLDB Endowment.
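As a rough illustration of the Work Function Algorithm mentioned in the abstract above, the following Python sketch runs one common textbook formulation on a toy problem with two index configurations. It is not the paper's semi-automatic tuning technique: there is no DBA feedback, and the state names, query costs, and the symmetric transition cost of 5 are all hypothetical values chosen for the example.

```python
def run_wfa(states, start, tasks, dist):
    """Toy Work Function Algorithm over a small set of configurations.

    states: list of configuration names (hypothetical)
    start:  initial configuration
    tasks:  list of dicts mapping each state to the cost of serving one query there
    dist:   function giving the (assumed symmetric) cost of switching states
    """
    # w[s] = cheapest offline cost of serving all tasks so far and ending in s.
    w = {s: dist(start, s) for s in states}
    current = start
    path = []
    for cost in tasks:
        # Work function update: serve the task in some state x, then move to s.
        w_next = {
            s: min(w[x] + cost[x] + dist(x, s) for x in states)
            for s in states
        }
        # Only consider states whose value is achieved by serving the task in place.
        supported = [s for s in states if w_next[s] == w[s] + cost[s]]
        # Among those, balance offline cost against the cost of moving there.
        current = min(supported, key=lambda s: w_next[s] + dist(current, s))
        w = w_next
        path.append(current)
    return path


# Hypothetical scenario: a query costs 4 without the index and 1 with it,
# and switching configurations costs 5 in either direction.
STATES = ["no_index", "index"]
switch_cost = lambda a, b: 0 if a == b else 5
queries = [{"no_index": 4, "index": 1}] * 5

print(run_wfa(STATES, "no_index", queries, switch_cost))
```

In this toy run the algorithm keeps the original configuration for the first three queries and switches to the indexed one only once the accumulated evidence outweighs the switching cost, which is the kind of online behavior the abstract alludes to.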
Simmen D.,Teradata Aster |
Schnaitter K.,Teradata Aster |
Davis J.,Teradata Aster |
He Y.,Teradata Aster |
And 5 more authors.
Proceedings of the VLDB Endowment | Year: 2014
Graph analytics is an important big data discovery technique. Applications include identifying influential employees for retention, detecting fraud in a complex interaction network, and determining product affinities by exploiting community buying patterns. Specialized platforms have emerged to satisfy the unique processing requirements of large-scale graph analytics; however, these platforms do not enable graph analytics to be combined with other analytics techniques, nor do they work well with the vast ecosystem of SQL-based business applications. Teradata Aster 6.0 adds support for large-scale graph analytics to its repertoire of analytics capabilities. The solution extends the multi-engine processing architecture with support for bulk synchronous parallel execution, and a specialized graph engine that enables iterative analysis of graph structures. Graph analytics functions written to the vertex-oriented API exposed by the graph engine can be invoked from the context of an SQL query and composed with existing SQL-MR functions, thereby enabling data scientists and business applications to express computations that combine large-scale graph analytics with techniques better suited to a different style of processing. The solution includes a suite of pre-built graph analytic functions adapted for parallel execution. © 2014 VLDB Endowment 2150-8097/14/08.
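To make the idea of vertex-oriented, bulk synchronous parallel (BSP) execution in the abstract above more concrete, here is a minimal single-process Python sketch. It does not use the Aster graph engine or its API; the function name `bsp_min_label` and the label-propagation task (tagging each vertex with the smallest vertex id in its connected component) are assumptions chosen purely for illustration.

```python
def bsp_min_label(edges):
    """Label every vertex with the smallest vertex id in its connected component.

    Simulates BSP supersteps: in each superstep a vertex only reads the messages
    sent to it during the previous superstep, then sends new messages.
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    labels = {v: v for v in neighbors}

    # Initial superstep: every vertex announces its own id to its neighbors.
    inbox = {v: [] for v in neighbors}
    for v in neighbors:
        for n in neighbors[v]:
            inbox[n].append(labels[v])

    # Keep running supersteps until no vertex has pending messages.
    while any(inbox.values()):
        outbox = {v: [] for v in neighbors}
        for v, messages in inbox.items():
            # A vertex acts only if it received a smaller label.
            if messages and min(messages) < labels[v]:
                labels[v] = min(messages)
                for n in neighbors[v]:
                    outbox[n].append(labels[v])
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
    return labels


print(bsp_min_label([(1, 2), (2, 3), (4, 5)]))
```

Each vertex runs the same small piece of logic and communicates only through messages delivered at superstep boundaries, which is what allows an engine to partition vertices across workers and execute them in parallel.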
Pandit A.,Teradata Aster |
Kondo D.,Teradata Aster |
Simmen D.,Teradata Aster |
Norwood A.,Teradata Aster |
Bai T.,Teradata Aster
Proceedings - International Conference on Data Engineering | Year: 2015
The volume, velocity, and variety of Big Data necessitate the development of new and innovative data processing software. A multitude of SQL implementations on distributed systems have emerged in recent years to enable large-scale data analysis. User-Defined Table operators (written in procedural languages) embedded in these SQL implementations are a powerful mechanism to succinctly express and perform analytic operations typical in Big Data discovery workloads. Table operators can be easily customized to implement different processing models such as map, reduce, and graph execution. Despite an inherently parallel execution model, the performance and scalability of these table operators are greatly restricted because they appear as a black box to a typical SQL query optimizer. The optimizer is not able to infer even the basic properties of table operators, which prevents it from applying optimization rules and strategies. In this paper, we introduce an innovative concept of 'Collaborative Planning', which results in the removal of redundant operations and a more optimal rearrangement of query plan operators. The optimization of the query proceeds through a collaborative exchange between the planner and the table operator. Plan properties and context information of surrounding query plan operations are exchanged between the optimizer and the table operator. Knowing these properties also allows the author of the table operator to optimize its embedded logic. Our main contribution in this paper is the design and implementation of Collaborative Planning in the Teradata Aster 6 system. Using real-world workloads, we show that Collaborative Planning reduces query execution times by as much as 90.0% in common use cases, resulting in a 24x speedup. © 2015 IEEE.