Ayasdi Inc. | Date: 2016-09-12
An example method includes receiving a data set, each data point in the data set being associated with an indication of time, and a distance function, determining overlapping intervals over a time period range, identifying subsets of data in each overlapping interval based on the indications of time, applying the distance function to each subset of data to identify groups, constructing a node for each group to create a plurality of nodes, determining if two nodes of the plurality of nodes in adjacent time periods are connected by scoring shared data point membership between the two nodes and comparing a score of the shared data point membership to a threshold, and displaying at least two nodes with an indication of time, the two nodes being connected by a line based on the comparison of the score and the threshold.
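The interval construction and node-linking steps can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not name a scoring function, so Jaccard similarity is assumed here, and the helper names `overlapping_intervals`, `jaccard`, and `link_nodes` are hypothetical.

```python
def overlapping_intervals(start, end, n, overlap):
    """Split [start, end] into n equal-width intervals that overlap by a fraction."""
    # Choose the width so that n intervals stepped by width * (1 - overlap) span the range.
    width = (end - start) / (n - (n - 1) * overlap)
    step = width * (1 - overlap)
    return [(start + i * step, start + i * step + width) for i in range(n)]


def jaccard(a, b):
    """Shared-membership score (assumed): |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_nodes(nodes_t, nodes_next, threshold):
    """Return (i, j) pairs of nodes in adjacent time periods whose score meets the threshold."""
    return [
        (i, j)
        for i, a in enumerate(nodes_t)
        for j, b in enumerate(nodes_next)
        if jaccard(a, b) >= threshold
    ]


intervals = overlapping_intervals(0, 10, 3, 0.5)
edges = link_nodes([{1, 2, 3}, {4, 5}], [{2, 3, 9}, {7}], threshold=0.5)
```

Each returned pair corresponds to a line drawn between two nodes in the display.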
Ayasdi Inc. | Date: 2017-01-11
Exemplary systems and methods to improve capture of relationships within information are provided. In various embodiments, a system comprises a landmark module configured to choose a set of landmarks from data in a finite metric space, the set of landmarks being a subset of points in the finite metric space, a nearest neighbor module configured to compute, for each landmark, a predetermined number of nearest neighbor landmarks in the set of landmarks, a graph construction module configured to identify at least one pair of landmarks that are nearest neighbors to each other, an edge generator module configured to add an edge between the at least one pair of landmarks, and a non-landmark projection module configured to project non-landmark points based on the landmarks and one or more edges, thereby enabling at least one shape to indicate relationships in the data.
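The nearest neighbor, graph construction, and edge generator steps amount to building a mutual k-nearest-neighbor graph over the landmarks. A minimal sketch, assuming Euclidean distance and hypothetical function names (the abstract only requires a finite metric space):

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest other points."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(set(order[:k]))
    return result


def mutual_knn_edges(landmarks, k):
    """Add an edge (i, j) only when i and j are each among the other's k nearest landmarks."""
    nn = knn_indices(landmarks, k)
    return sorted(
        (i, j)
        for i in range(len(landmarks))
        for j in nn[i]
        if i < j and i in nn[j]
    )


landmarks = [(0, 0), (1, 0), (5, 0), (6, 0)]
edges = mutual_knn_edges(landmarks, k=1)
```

The projection of non-landmark points is omitted here because the abstract does not say how it is computed.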
Ayasdi Inc. | Date: 2016-12-12
An example method includes receiving a first set of data identifying entities and performance information for analysis, receiving a second set of data identifying entities and performance information associated with known or suspected past fraud or abuse, receiving metric and lens selections, performing metric and lens functions based on the metric and lens selections on the first and second sets of data, generating a cover of a reference space and clustering the mapped performance information to identify nodes in a graph, each node including one or more entities as members, each node being connected to another node if the two nodes share at least one common entity as a member, identifying nodes that include at least one member from the second set of data, determining entities that are members of the identified nodes and that are from the first set of data, and generating a first report listing the determined entities as possibly involved in fraud or waste.
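Once the graph exists, the final identification steps reduce to set operations on node membership. The sketch below assumes nodes are already available as sets of entity identifiers; `flag_entities` is a hypothetical name, and the metric, lens, cover, and clustering steps are not shown.

```python
def flag_entities(nodes, known_fraud):
    """Return entities that share a node with at least one known or suspected fraud entity."""
    known = set(known_fraud)
    flagged = set()
    for members in nodes:
        members = set(members)
        if members & known:            # node contains a member from the second data set
            flagged |= members - known  # keep only members from the first data set
    return sorted(flagged)


nodes = [{"a", "b", "x"}, {"c", "d"}, {"d", "y"}]
report = flag_entities(nodes, known_fraud={"x", "y"})
```

Entity `c` is not flagged because its only node contains no known-fraud member, while `d` is flagged through its second node.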
Ayasdi Inc. | Date: 2016-05-26
An example method includes determining a point from a data set closest to a particular data point using a particular metric and scoring the particular data point based on whether the closest point shares a similar characteristic, selecting a subset of metrics based on the metric scores, evaluating a metric-lens combination by calculating a metric-lens score based on entropy of shared characteristics across subspaces of a reference map generated by the metric-lens combination, selecting a metric-lens combination based on the metric-lens score, generating topological representations using the received data set, associating each node with at least one shared characteristic based on member data points of that particular node sharing the shared characteristic, scoring groups within each topological representation based on entropy, scoring each topological representation based on the group scores, and providing a visualization of at least one topological representation based on the graph scores.
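Two of the scoring ideas can be illustrated compactly. This sketch assumes the metric score is the fraction of points whose nearest neighbor shares their label, and that the group score is Shannon entropy of the labels in a group; the abstract states neither formula, and both function names are hypothetical.

```python
import math
from collections import Counter


def metric_score(points, labels, metric):
    """Fraction of points whose nearest neighbor under `metric` has the same label."""
    hits = 0
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = min(others, key=lambda j: metric(p, points[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(points)


def label_entropy(labels):
    """Shannon entropy (in bits) of the labels in one group; lower means purer."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


score = metric_score([0, 1, 10, 11], ["a", "a", "b", "b"], lambda x, y: abs(x - y))
pure = label_entropy(["a", "a", "a", "a"])
mixed = label_entropy(["a", "b"])
```

Under these assumptions, a metric that places same-label points close together scores near 1, and a group of uniform label has entropy 0.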
Ayasdi Inc. | Date: 2015-10-15
An exemplary method comprises receiving data points, selecting a first subset of the data points to generate an initial set of landmarks, each data point of the first subset defining a landmark point, and, for each non-landmark data point: calculating first data point distances between the respective non-landmark data point and each landmark point of the initial set of landmarks, identifying a first shortest data point distance from among those first data point distances, and storing the first shortest data point distance as a first landmark distance for the respective non-landmark data point. The method further comprises identifying the non-landmark data point with the longest first landmark distance in comparison with the other first landmark distances and adding the identified non-landmark data point as a first landmark point to the initial set of landmarks.
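The procedure described is a farthest-point (max-min) selection step. A minimal sketch, assuming Euclidean distance and index-based bookkeeping; `add_farthest_landmark` is a hypothetical name.

```python
import math


def add_farthest_landmark(points, landmark_idx):
    """Add the non-landmark point whose distance to its closest landmark is largest."""
    landmark_set = set(landmark_idx)
    best, best_dist = None, -1.0
    for i, p in enumerate(points):
        if i in landmark_set:
            continue
        # Landmark distance: shortest distance from this point to any current landmark.
        d = min(math.dist(p, points[l]) for l in landmark_idx)
        if d > best_dist:
            best, best_dist = i, d
    return list(landmark_idx) if best is None else list(landmark_idx) + [best]


points = [(0, 0), (1, 0), (10, 0), (4, 0)]
first = add_farthest_landmark(points, [0])
second = add_farthest_landmark(points, first)
```

Repeating the call grows the landmark set one point at a time, spreading landmarks across the data.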
Ayasdi Inc. | Date: 2016-07-25
An example method includes receiving a data set, generating a topological representation using topological data analysis, at least one metric-lens combination, and the data set, the representation including a plurality of nodes, each of the nodes having one or more data points as members, receiving a new data point, determining distances between the new data point and at least some of the one or more data points, locating the new data point in a location relative to one or more of the nodes using the distances, identifying a subset of the data points closest to the location of the new data point, comparing the subset of the data points to at least some information regarding the new data point to identify a regime, and generating a report indicating a model associating factors associated with the subset of the data points with the new data point for predicting future outcomes.
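Placing a new data point relative to existing nodes can be sketched as a nearest-neighbor lookup followed by a vote over node membership. The voting rule, the Euclidean distance, and the name `locate` are assumptions made for illustration; the abstract does not specify how the location is derived from the distances.

```python
import math
from collections import Counter


def locate(new_point, points, nodes, k):
    """Return the k closest data point indices and the node holding most of them."""
    order = sorted(range(len(points)), key=lambda i: math.dist(new_point, points[i]))
    nearest = order[:k]
    # Count, per node, how many of the nearest points are members of that node.
    votes = Counter(n for n, members in enumerate(nodes) for i in nearest if i in members)
    best_node = votes.most_common(1)[0][0] if votes else None
    return nearest, best_node


points = [(0, 0), (1, 0), (10, 0), (11, 0)]
nodes = [{0, 1}, {2, 3}]
nearest, node = locate((9, 0), points, nodes, k=2)
```

The returned subset of nearest points is what would then be compared against information about the new point.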
Agency: Department of Defense | Branch: Defense Advanced Research Projects Agency | Program: SBIR | Phase: Phase II | Award Amount: 2.00M | Year: 2014
Ayasdi, a leader in the new field of Topological Data Analysis, proposes to extend the previous effort in utilizing persistent homology in exploring data fusion. As part of this work, we propose both to deploy existing proprietary Ayasdi technology and to develop new analytical approaches to interpret and provide users with a deeper understanding of disparate data sources. This work will include the development of persistent tools for fusing multiple data streams. In particular, we intend to explore two use cases: 1) looking at multiple modalities of the same source (brain), for example a) physiological (electric impulses, etc.), b) genetic, and c) anatomical; and 2) looking at the same modality of different data sources (brains). The final demonstration would include an automated analysis strategy for topological networks representing multi-modal data, employing persistence in identifying the networks that best characterize the underlying geometry of the data.
Ayasdi Inc. | Date: 2016-03-11
A method comprises receiving a network of a plurality of nodes and a plurality of edges, each of the nodes comprising members representative of at least one subset of training data points, each of the edges connecting nodes that share at least one data point, grouping the data points into a plurality of groups, each data point being a member of at least one group, creating a first transformation data set, the first transformation data set including the training data set as well as a plurality of feature subsets associated with at least one group, values of a particular data point for a particular feature subset for a particular group being based on values of the particular data point if the particular data point is a member of the particular group, and applying a machine learning model to the first transformation data set to generate a prediction model.
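The transformation data set can be pictured as the original features followed by one copy of the features per group, populated only for group members. The sketch below fills non-member entries with zeros, which is an assumption; the abstract only states what happens when the data point is a member. `transform` is a hypothetical name.

```python
def transform(rows, groups):
    """Append, for each group, a feature subset that is non-zero only for group members."""
    out = []
    for i, row in enumerate(rows):
        new_row = list(row)  # keep the original training features
        for members in groups:
            # Copy the point's values if it belongs to the group, else pad (assumed zeros).
            new_row.extend(row if i in members else [0.0] * len(row))
        out.append(new_row)
    return out


rows = [[1.0, 2.0], [3.0, 4.0]]
groups = [{0}, {0, 1}]  # sets of row indices, e.g. derived from the network's nodes
expanded = transform(rows, groups)
```

A machine learning model fitted on `expanded` can then learn separate coefficients per group.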
Ayasdi Inc. | Date: 2016-03-14
Autogrouping is described. An example method includes receiving a data set, building a first partition of subsets of the data set, computing a first subset score for each subset using a scoring function, generating a next partition including at least one subset that includes the elements of two or more subsets of the first partition, computing a second subset score for each subset of the next partition using the scoring function, defining a max score for each particular subset using a max score function, each max score being based on maximal subset scores of that particular subset and at least the subsets of the first partition related to that particular subset, selecting output subsets, selection of each of the output subsets being made using a maximum score of previously computed subset scores, and generating a report indicating an output partition, the output subsets being associated with the received data set.
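One way to read the max-score selection is as a recursion over a hierarchy of partitions: a merged subset is kept when its own score is at least the best score found beneath it, and otherwise the selection descends to its constituent subsets. The sketch below follows that reading; the tuple layout, the tie-breaking rule, and the name `select` are assumptions, and the scores are supplied directly rather than computed by a scoring function.

```python
def select(node):
    """node = (score, members, children). Return (max_score, list of chosen member sets)."""
    score, members, children = node
    if not children:
        return score, [members]
    results = [select(child) for child in children]
    child_max = max(m for m, _ in results)
    if score >= child_max:
        # This subset scores at least as well as anything below it: output it whole.
        return score, [members]
    # Otherwise keep the selections made within the finer subsets.
    return child_max, [s for _, chosen in results for s in chosen]


a = (0.9, {1, 2}, [])
b = (0.2, {3}, [])
c = (0.3, {4}, [])
bc = (0.8, {3, 4}, [b, c])
root = (0.5, {1, 2, 3, 4}, [a, bc])
max_score, output_partition = select(root)
```

Here the output partition mixes levels of the hierarchy: one original subset and one merged subset.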
Ayasdi Inc. | Date: 2016-05-05
An example method comprises receiving data points, determining at least one size of a plurality of subsets based on a constraint of at least one computation device or an analysis server, transferring each of the subsets to different computation devices, each computation device selecting a group of data points to generate a first sub-subset of landmarks and adding non-landmark data points that have the farthest distance to the closest landmark to create an expanded sub-subset of landmarks, creating an analysis landmark set based on a combination of the expanded sub-subsets of landmarks from the different computation devices, performing a similarity function on the analysis landmark set, generating a cover of a mathematical reference space to create overlapping subsets, clustering the mapped landmark points based on the overlapping subsets, and creating a plurality of nodes, each node being based on the clustering, each landmark point being a member of at least one node.
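The partition-then-merge part of this method can be sketched on one-dimensional data. The loop below stands in for dispatching subsets to separate computation devices; the chunking rule, the choice of the first point as the initial landmark, and all function names are assumptions for illustration.

```python
def chunks(points, max_size):
    """Split the data into subsets no larger than a device's assumed capacity."""
    return [points[i:i + max_size] for i in range(0, len(points), max_size)]


def expand_landmarks(chunk, n_landmarks):
    """Start from one landmark and repeatedly add the point farthest from its closest landmark."""
    landmarks = [chunk[0]]
    while len(landmarks) < min(n_landmarks, len(chunk)):
        candidates = [p for p in chunk if p not in landmarks]
        if not candidates:  # only duplicates remain
            break
        landmarks.append(max(candidates, key=lambda p: min(abs(p - l) for l in landmarks)))
    return landmarks


def analysis_landmarks(points, max_size, n_landmarks):
    """Combine the expanded landmark sub-subsets from every chunk."""
    merged = []
    for chunk in chunks(points, max_size):  # each chunk would run on a different device
        merged.extend(expand_landmarks(chunk, n_landmarks))
    return merged


combined = analysis_landmarks([0, 1, 9, 10, 20, 25], max_size=3, n_landmarks=2)
```

The similarity function, cover, and clustering steps would then operate on `combined` rather than on the full data set.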