Wesley R.,Seattle Software |
Xu F.,Seattle Software
Proceedings of the VLDB Endowment | Year: 2016
Windowed aggregates are a SQL 2003 feature for computing aggregates in moving windows. Common examples include cumulative sums, local maxima and moving quantiles. With the advent over the last few years of easy-to-use data analytics tools, these functions are becoming widely used by more and more analysts, but some aggregates (such as local maxima) are much easier to compute than others (such as moving quantiles). Nevertheless, aggregates that are more difficult to compute, like quantile and mode (or "most frequent"), provide more appropriate statistical summaries in the common situation when a distribution is not Gaussian and are an essential part of a data analysis toolkit. Recent work has described highly efficient windowed implementations of the most common aggregate function categories, including distributive aggregates such as cumulative sums and algebraic aggregates such as moving averages. But little has been published on either the implementation or the performance of the more complex holistic windowed aggregates such as moving quantiles. This paper provides the first in-depth study of how to efficiently implement the three most common holistic windowed aggregates (count distinct, mode and quantile) by reusing the aggregate state between consecutive frames. Our measurements show that these incremental algorithms generally achieve improvements of about 10× over naïve implementations, and that they can effectively detect when to reset the internal state during extreme frame variation. © 2016 VLDB Endowment 2150-8097/16/08.
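As an illustration of reusing aggregate state between consecutive frames, the following is a minimal sketch of a moving count distinct over a trailing window. It is not the paper's implementation: the function name `moving_count_distinct`, the fixed-width trailing frame, and the use of a hash-based counter are all assumptions made for this example.

```python
from collections import Counter


def moving_count_distinct(values, width):
    """Count distinct values in each trailing frame of at most `width` rows.

    Hypothetical helper: the frame for row i covers rows i-width+1 .. i.
    """
    counts = Counter()  # aggregate state carried from one frame to the next
    result = []
    for i, value in enumerate(values):
        counts[value] += 1  # row entering the frame
        if i >= width:
            leaving = values[i - width]  # row leaving the frame
            counts[leaving] -= 1
            if counts[leaving] == 0:
                del counts[leaving]  # value no longer present in the frame
        result.append(len(counts))
    return result


print(moving_count_distinct([1, 2, 2, 3, 1, 1], 3))  # [1, 2, 2, 2, 3, 2]
```

Each step touches only the row that enters and the row that leaves the frame, whereas a naive version would rebuild the set of distinct values for every frame.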
Eckels J.,Seattle Software
Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] | Year: 2011
LabKey Server (formerly CPAS, the Computational Proteomics Analysis System) provides a Web-based platform for mining data from liquid chromatography-tandem mass spectrometry (LC-MS/MS) proteomic experiments. This open source platform supports systematic proteomic analyses and secure data management, integration, and sharing. LabKey Server incorporates several tools currently used in proteomic analysis, including the X! Tandem search engine, the ProteoWizard toolkit, and the PeptideProphet and ProteinProphet data mining tools. These tools and others are integrated into LabKey Server, which provides an extensible architecture for developing high-throughput biological applications. The LabKey Server analysis pipeline acts on data in standardized file formats, so that researchers may use LabKey Server with other search engines, including Mascot or SEQUEST, that follow a standardized format for reporting search engine results. Supported builds of LabKey Server are freely available at http://www.labkey.com/. Documentation and source code are available under the Apache License 2.0 at http://www.labkey.org. © 2011 by John Wiley & Sons, Inc.
Morton K.,University of Washington |
Balazinska M.,University of Washington |
Grossman D.,University of Washington |
Mackinlay J.,Seattle Software
Proceedings of the VLDB Endowment | Year: 2014
We present a vision of next-generation visual analytics services. We argue that these services should have three related capabilities: support visual and interactive data exploration as they do today, but also suggest relevant data to enrich visualizations, and facilitate the integration and cleaning of that data. Most importantly, they should provide all these capabilities seamlessly in the context of an uninterrupted data analysis cycle. We present the challenges and opportunities in building next-generation visual analytics services. © 2014 VLDB Endowment.
Wesley R.,Seattle Software |
Terlecki P.,Seattle Software
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2014
Data sets are growing rapidly and there is an attendant need for tools that facilitate human analysis of them in a timely manner. To help meet this need, column-oriented databases (or "column stores") have come into wide use because of their low latency on analytic workloads. Column stores use a number of techniques to produce these dramatic performance improvements, including the ability to perform operations directly on compressed data. In this paper, we describe how the Tableau Data Engine (an internally developed column store) leverages a number of compression techniques to improve query performance. The approach is simpler than existing systems for operating on compressed data and more unified, removing the necessity for custom data access mechanisms. The approach also uses some novel metadata extraction techniques to improve the choices made by the system's run-time optimizer. © 2014 ACM.
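To make the idea of operating directly on compressed data concrete, here is a small sketch using run-length encoding. The abstract does not name the compression techniques used by the Tableau Data Engine, so run-length encoding and the helper names `rle_encode`, `rle_sum` and `rle_count_equal` are assumptions chosen only for illustration.

```python
from itertools import groupby


def rle_encode(column):
    """Compress a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_sum(runs):
    """Sum the column without expanding the runs back into rows."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs, target):
    """Count rows equal to `target` by inspecting one entry per run."""
    return sum(length for value, length in runs if value == target)


column = [5, 5, 5, 2, 2, 7]
runs = rle_encode(column)
print(runs)                      # [(5, 3), (2, 2), (7, 1)]
print(rle_sum(runs))             # 26
print(rle_count_equal(runs, 5))  # 3
```

Both aggregates do work proportional to the number of runs rather than the number of rows, which is where the benefit on highly repetitive columns would come from.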
Wesley R.,Seattle Software |
Eldridge M.,Seattle Software |
Terlecki P.T.,Seattle Software
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2011
Efficient data processing is critical for interactive visualization of analytic data sets. Inspired by the large amount of recent research on column-oriented stores, we have developed a new specialized analytic data engine tightly-coupled with the Tableau data visualization system. The Tableau Data Engine ships as an integral part of Tableau 6.0 and is intended for the desktop and server environments. This paper covers the main requirements of our project, system architecture and query-processing pipeline. We use real-life visualization scenarios to illustrate basic concepts and provide experimental evaluation. © 2011 ACM.
Kosara R.,Seattle Software |
MacKinlay J.,Seattle Software
Computer | Year: 2013
Presentation, specifically its use of elements from storytelling, is the next logical step in visualization research and should be a focus of at least equal importance with exploration and analysis. © 1970-2012 IEEE.
Setlur V.,Tableau Software |
Mackinlay J.D.,Seattle Software
Conference on Human Factors in Computing Systems - Proceedings | Year: 2014
Authors use icon encodings to indicate the semantics of categorical information in visualizations. The default icon libraries found in visualization tools often do not match the semantics of the data. Users often manually search for or create icons that are more semantically meaningful. This process can hinder the flow of visual analysis, especially when the amount of data is large, leading to a suboptimal user experience. We propose a technique for automatically generating semantically relevant icon encodings for categorical dimensions of data points. The algorithm employs natural language processing in order to find relevant imagery from the Internet. We evaluate our approach on Mechanical Turk by generating large libraries of icons using Tableau Public workbooks that represent real analytical effort by people out in the world. Our results show that the automatic algorithm does nearly as well as the manually created icons, and particularly has higher user satisfaction for larger cardinalities of data.
Morton K.,University of Washington |
Bunker R.,Seattle Software |
MacKinlay J.,Seattle Software |
Morton R.,Seattle Software |
Stolte C.,Seattle Software
Proceedings of the ACM SIGMOD International Conference on Management of Data | Year: 2012
Tableau is a commercial business intelligence (BI) software tool that supports interactive, visual analysis of data. Armed with a visual interface to data and a focus on usability, Tableau enables a wide audience of end-users to gain insight into their datasets. The user experience is a fluid process of interaction in which exploring and visualizing data takes just a few simple drag-and-drop operations (no programming or DB experience necessary). In this context of exploratory, ad-hoc visual analysis, we describe a novel approach to integrating large, heterogeneous data sources. We present a new feature in Tableau called data blending, which gives users the ability to create data visualization mashups from structured, heterogeneous data sources dynamically without any upfront integration effort. Users can author visualizations that automatically integrate data from a variety of sources, including data warehouses, data marts, text files, spreadsheets, and data cubes. Because our data blending system is workload driven, we are able to bypass many of the pain-points and uncertainty in creating mediated schemas and schema-mappings in current pay-as-you-go integration systems. © 2012 ACM.
Heer J.,Stanford University |
Stone M.,Seattle Software
Conference on Human Factors in Computing Systems - Proceedings | Year: 2012
Our ability to reliably name colors provides a link between visual perception and symbolic cognition. In this paper, we investigate how a statistical model of color naming can enable user interfaces to meaningfully mimic this link and support novel interactions. We present a method for constructing a probabilistic model of color naming from a large, unconstrained set of human color name judgments. We describe how the model can be used to map between colors and names and define metrics for color saliency (how reliably a color is named) and color name distance (the similarity between colors based on naming patterns). We then present a series of applications that demonstrate how color naming models can enhance graphical interfaces: a color dictionary & thesaurus, name-based pixel selection methods for image editing, and evaluation aids for color palette design. Copyright 2012 ACM.
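A rough sketch of how such a model could be organized is given below. It assumes the judgments arrive as (color bin, name) pairs and scores reliability with the negative entropy of the name distribution; the abstract does not state the exact formulas, so the functions `build_model`, `name_probabilities` and `saliency` and the choice of negative entropy are assumptions for illustration, not the authors' definitions.

```python
import math
from collections import Counter, defaultdict


def build_model(judgments):
    """Tally how often each name was given to each color bin."""
    counts = defaultdict(Counter)
    for color, name in judgments:
        counts[color][name] += 1
    return counts


def name_probabilities(model, color):
    """Estimate P(name | color) from the tallies."""
    total = sum(model[color].values())
    return {name: n / total for name, n in model[color].items()}


def saliency(model, color):
    """Negative entropy of P(name | color); 0.0 means perfect agreement."""
    probs = name_probabilities(model, color).values()
    return sum(p * math.log(p) for p in probs)


# Hypothetical judgments: bin "c1" is always called red, "c2" splits evenly.
judgments = [("c1", "red")] * 4 + [("c2", "teal")] * 2 + [("c2", "green")] * 2
model = build_model(judgments)
print(name_probabilities(model, "c2"))
print(saliency(model, "c1") > saliency(model, "c2"))
```

Under this scoring, a color that everyone names the same way scores highest, while a color whose names are split evenly scores lower.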