Computer Science and Artificial Intelligence Laboratory and collaborators

PubMed | Free University of Berlin, Computer Science and Artificial Intelligence Laboratory, and Massachusetts Institute of Technology
Type: Journal Article | Journal: Cerebral cortex (New York, N.Y. : 1991) | Year: 2016

Every human cognitive function, such as visual object recognition, is realized in a complex spatio-temporal activity pattern in the brain. Current brain imaging techniques in isolation cannot resolve the brain's spatio-temporal dynamics, because they provide either high spatial or high temporal resolution but not both. To overcome this limitation, we developed an integration approach that uses representational similarities to combine measurements of magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) to yield a spatially and temporally integrated characterization of neuronal activation. Applying this approach to 2 independent MEG-fMRI data sets, we observed that neural activity first emerged in the occipital pole at 50-80 ms, before spreading rapidly and progressively in the anterior direction along the ventral and dorsal visual streams. Further region-of-interest analyses established that dorsal and ventral regions showed MEG-fMRI correspondence in representations later than early visual cortex. Together, these results provide a novel and comprehensive, spatio-temporally resolved view of the rapid neural dynamics during the first few hundred milliseconds of object vision. They further demonstrate the feasibility of spatially unbiased representational similarity-based fusion of MEG and fMRI, promising new insights into how the brain computes complex cognitive functions.
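The abstract describes the fusion approach only in outline. As a rough, hypothetical sketch of representational similarity-based fusion, the code below computes a representational dissimilarity matrix (RDM) from fMRI response patterns in one region and correlates it with an MEG RDM at every time point; the array shapes, variable names, and the choice of correlation distance and Spearman correlation are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of representational-similarity-based MEG-fMRI fusion.
# Assumed (hypothetical) inputs: `meg` with shape (n_times, n_conditions, n_sensors)
# and `fmri_roi` with shape (n_conditions, n_voxels) for one region of interest,
# with the same condition ordering in both.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Condensed representational dissimilarity matrix (1 - Pearson correlation)."""
    return pdist(patterns, metric="correlation")

def fusion_time_course(meg, fmri_roi):
    """Correlate one region's fMRI RDM with the MEG RDM at each time point."""
    fmri_rdm = rdm(fmri_roi)
    return np.array([spearmanr(rdm(meg[t]), fmri_rdm)[0] for t in range(meg.shape[0])])

# Toy example with random data: 30 conditions, 100 time points, 306 sensors, 500 voxels.
rng = np.random.default_rng(0)
meg = rng.standard_normal((100, 30, 306))
fmri_roi = rng.standard_normal((30, 500))
print(fusion_time_course(meg, fmri_roi).shape)  # (100,): one correlation per time point
```

A per-region time course of this kind is the sort of spatio-temporally resolved readout the abstract describes: the time at which it peaks indicates when a region's fMRI representation best matches the evolving MEG signal.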


News Article | December 19, 2016
Site: www.scientificcomputing.com

One way to handle big data is to shrink it. If you can identify a small subset of your data set that preserves its salient mathematical relationships, you may be able to perform useful analyses on it that would be prohibitively time consuming on the full set. The methods for creating such “coresets” vary according to application, however.

Last week, at the Annual Conference on Neural Information Processing Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and the University of Haifa in Israel presented a new coreset-generation technique that’s tailored to a whole family of data analysis tools with applications in natural-language processing, computer vision, signal processing, recommendation systems, weather prediction, finance, and neuroscience, among many others.

“These are all very general algorithms that are used in so many applications,” says Daniela Rus, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT and senior author on the new paper. “They’re fundamental to so many problems. By figuring out the coreset for a huge matrix for one of these tools, you can enable computations that at the moment are simply not possible.”

As an example, in their paper the researchers apply their technique to a matrix — that is, a table — that maps every article on the English version of Wikipedia against every word that appears on the site. That’s 1.4 million articles, or matrix rows, and 4.4 million words, or matrix columns. That matrix would be much too large to analyze using low-rank approximation, an algorithm that can deduce the topics of free-form texts. But with their coreset, the researchers were able to use low-rank approximation to extract clusters of words that denote the 100 most common topics on Wikipedia. The cluster that contains “dress,” “brides,” “bridesmaids,” and “wedding,” for instance, appears to denote the topic of weddings; the cluster that contains “gun,” “fired,” “jammed,” “pistol,” and “shootings” appears to designate the topic of shootings.

Joining Rus on the paper are Mikhail Volkov, an MIT postdoc in electrical engineering and computer science, and Dan Feldman, director of the University of Haifa’s Robotics and Big Data Lab and a former postdoc in Rus’s group.

The researchers’ new coreset technique is useful for a range of tools with names like singular-value decomposition, principal-component analysis, and latent semantic analysis. What they all have in common is dimension reduction: They take data sets with large numbers of variables and find approximations of them with far fewer variables. In this, these tools are similar to coresets. But coresets simply reduce the size of a data set, while the dimension-reduction tools change its description in a way that’s guaranteed to preserve as much information as possible. That guarantee, however, makes the tools much more computationally intensive than coreset generation — too computationally intensive for practical application to large data sets.

The researchers believe that their technique could be used to winnow a data set with, say, millions of variables — such as descriptions of Wikipedia pages in terms of the words they use — to merely thousands. At that point, a widely used technique like principal-component analysis could reduce the number of variables to mere hundreds, or even lower.

The researchers’ technique works with what is called sparse data. Consider, for instance, the Wikipedia matrix, with its 4.4 million columns, each representing a different word. Any given article on Wikipedia will use only a few thousand distinct words. So in any given row — representing one article — only a few thousand matrix slots out of 4.4 million will have any values in them. In a sparse matrix, most of the values are zero.
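The article includes no code; as a rough illustration of the analysis the coreset is meant to make feasible, the sketch below runs a low-rank approximation (truncated SVD) on a small random sparse matrix standing in for the article-by-word table. This is ordinary off-the-shelf dimension reduction, not the researchers' coreset construction, and the toy sizes are assumptions chosen only so the example runs quickly.

```python
# Minimal sketch: low-rank approximation of a sparse term-document-style matrix,
# the kind of computation the coreset is meant to enable at Wikipedia scale.
# Not the researchers' method; a toy stand-in using ordinary truncated SVD.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

n_articles, n_words, k = 1000, 5000, 10        # toy sizes; the real matrix is 1.4M x 4.4M
X = sp.random(n_articles, n_words, density=0.001, random_state=0, format="csr")

# Rank-k approximation: X is roughly U @ diag(s) @ Vt.  Each row of Vt is a "topic"
# direction over words; the largest entries in a row pick out its word cluster.
U, s, Vt = svds(X, k=k)
top_words_per_topic = np.argsort(-np.abs(Vt), axis=1)[:, :5]
print(top_words_per_topic.shape)               # (10, 5): 5 strongest word indices per topic
```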
Crucially, the new technique preserves that sparsity, which makes its coresets much easier to deal with computationally. Calculations become a lot easier if they involve a lot of multiplication by and addition of zero.

The new coreset technique uses what’s called a merge-and-reduce procedure. It starts by taking, say, 20 data points in the data set and selecting 10 of them as most representative of the full 20. Then it performs the same procedure with another 20 data points, giving it two reduced sets of 10, which it merges to form a new set of 20. Then it does another reduction, from 20 down to 10. Even though the procedure examines every data point in a huge data set, because it deals with only small collections of points at a time, it remains computationally efficient. And in their paper, the researchers prove that, for applications involving an array of common dimension-reduction tools, their reduction method provides a very good approximation of the full data set.

That method depends on a geometric interpretation of the data, involving something called a hypersphere, which is the multidimensional analogue of a circle. Any piece of multivariable data can be thought of as a point in a multidimensional space. In the same way that the pair of numbers (1, 1) defines a point in a two-dimensional space — the point one step over on the X-axis and one step up on the Y-axis — a row of the Wikipedia table, with its 4.4 million numbers, defines a point in a 4.4-million-dimensional space.

The researchers’ reduction algorithm begins by finding the average value of the subset of data points — let’s say 20 of them — that it’s going to reduce. This, too, defines a point in a high-dimensional space; call it the origin. Each of the 20 data points is then “projected” onto a hypersphere centered at the origin. That is, the algorithm finds the unique point on the hypersphere that’s in the direction of the data point.

The algorithm selects one of the 20 data projections on the hypersphere. It then selects the projection on the hypersphere farthest away from the first. It finds the point midway between the two and then selects the data projection farthest away from the midpoint; then it finds the point midway between those two points and selects the data projection farthest away from it; and so on.

The researchers were able to prove that the midpoints selected through this method will converge very quickly on the center of the hypersphere. The method will quickly select a subset of points whose average value closely approximates that of the 20 initial points. That makes them particularly good candidates for inclusion in the coreset.

More information: Dimensionality Reduction of Massive Sparse Datasets Using Coresets: arxiv.org/pdf/1503.01663v1.pdf
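The following is a minimal sketch of the merge-and-reduce flow and the hypersphere-projection selection step as the article describes them. The actual construction in the paper also assigns weights to the selected points and carries provable approximation guarantees, none of which is reproduced here, so treat this strictly as an illustration.

```python
# Sketch of merge-and-reduce with the hypersphere-projection selection rule
# described in the article (no weights, no error bounds -- illustration only).
import numpy as np

def reduce_chunk(points, keep):
    """Pick `keep` points whose average stays close to the chunk average."""
    origin = points.mean(axis=0)
    directions = points - origin
    norms = np.linalg.norm(directions, axis=1)
    norms[norms == 0] = 1.0
    proj = directions / norms[:, None]          # projections onto a unit hypersphere

    selected = [0]                              # start from an arbitrary projection
    ref = proj[0]
    while len(selected) < min(keep, len(points)):
        dists = np.linalg.norm(proj - ref, axis=1)
        dists[selected] = -np.inf               # never re-pick a selected point
        far = int(np.argmax(dists))             # projection farthest from the reference
        selected.append(far)
        ref = (ref + proj[far]) / 2.0           # the midpoint drives the next selection
    return points[selected]

def merge_and_reduce(data, chunk=20, keep=10):
    """Stream over the data, reducing each chunk and folding it into a running summary."""
    summary = np.empty((0, data.shape[1]))
    for start in range(0, len(data), chunk):
        reduced = reduce_chunk(data[start:start + chunk], keep)
        merged = np.vstack([summary, reduced])
        summary = reduce_chunk(merged, keep) if len(merged) > keep else merged
    return summary

rng = np.random.default_rng(0)
coreset = merge_and_reduce(rng.standard_normal((1000, 8)))
print(coreset.shape)                            # (10, 8): a tiny stand-in "coreset"
```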


News Article | December 16, 2016
Site: www.eurekalert.org

When data sets get too big, sometimes the only way to do anything useful with them is to extract much smaller subsets and analyze those instead. Those subsets have to preserve certain properties of the full sets, however, and one property that's useful in a wide range of applications is diversity. If, for instance, you're using your data to train a machine-learning system, you want to make sure that the subset you select represents the full range of cases that the system will have to confront.

Last week at the Conference on Neural Information Processing Systems, researchers from MIT's Computer Science and Artificial Intelligence Laboratory and its Laboratory for Information and Decision Systems presented a new algorithm that makes the selection of diverse subsets much more practical. Whereas the running times of earlier subset-selection algorithms depended on the number of data points in the complete data set, the running time of the new algorithm depends on the number of data points in the subset. That means that if the goal is to winnow a data set with 1 million points down to one with 1,000, the new algorithm is 1 billion times faster than its predecessors.

"We want to pick sets that are diverse," says Stefanie Jegelka, the X-Window Consortium Career Development Assistant Professor in MIT's Department of Electrical Engineering and Computer Science and senior author on the new paper. "Why is this useful? One example is recommendation. If you recommend books or movies to someone, you maybe want to have a diverse set of items, rather than 10 little variations on the same thing. Or if you search for, say, the word 'Washington.' There's many different meanings that this word can have, and you maybe want to show a few different ones. Or if you have a large data set and you want to explore -- say, a large collection of images or health records -- and you want a brief synopsis of your data, you want something that is diverse, that captures all the directions of variation of the data.

"The other application where we actually use this thing is in large-scale learning. You have a large data set again, and you want to pick a small part of it from which you can learn very well."

Joining Jegelka on the paper are first author Chengtao Li, a graduate student in electrical engineering and computer science; and Suvrit Sra, a principal research scientist at MIT's Laboratory for Information and Decision Systems.

Traditionally, if you want to extract a diverse subset from a large data set, the first step is to create a similarity matrix -- a huge table that maps every point in the data set against every other point. The intersection of the row representing one data item and the column representing another contains the points' similarity score on some standard measure. There are several standard methods to extract diverse subsets, but they all involve operations performed on the matrix as a whole. With a data set with a million data points -- and a million-by-million similarity matrix -- this is prohibitively time consuming.

The MIT researchers' algorithm begins, instead, with a small subset of the data, chosen at random. Then it picks one point inside the subset and one point outside it and randomly selects one of three simple operations: swapping the points, adding the point outside the subset to the subset, or deleting the point inside the subset.
The probability with which the algorithm selects one of those operations depends on both the size of the full data set and the size of the subset, so it changes slightly with every addition or deletion. But the algorithm doesn't necessarily perform the operation it selects. Again, the decision to perform the operation or not is probabilistic, but here the probability depends on the improvement in diversity that the operation affords. For additions and deletions, the decision also depends on the size of the subset relative to that of the original data set. That is, as the subset grows, it becomes harder to add new points unless they improve diversity dramatically.

This process repeats until the diversity of the subset reflects that of the full set. Since the diversity of the full set is never calculated, however, the question is how many repetitions are enough. The researchers' chief results are a way to answer that question and a proof that the answer will be reasonable.
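As a rough illustration of the swap/add/delete procedure described above, the sketch below scores a subset's diversity by the log-determinant of its similarity submatrix; that scoring rule, and the simplified acceptance test, are assumptions made for the example (common choices for samplers of this kind), not the paper's exact algorithm or probabilities.

```python
# Sketch of a swap/add/delete sampler for diverse subsets.
# Assumption: diversity of a subset S is scored by log det of the similarity
# submatrix L[S, S]; the acceptance rule below is a simplified stand-in.
import numpy as np

def log_det(L, subset):
    sign, val = np.linalg.slogdet(L[np.ix_(subset, subset)])
    return val if sign > 0 else -np.inf

def diverse_subset(L, k, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    subset = [int(i) for i in rng.choice(n, size=k, replace=False)]  # small random start
    score = log_det(L, subset)

    for _ in range(steps):
        inside = int(rng.choice(subset))                  # a point inside the subset
        outside = int(rng.choice([i for i in range(n) if i not in subset]))
        move = rng.choice(["swap", "add", "delete"])

        if move == "swap":
            candidate = [i for i in subset if i != inside] + [outside]
        elif move == "add":
            candidate = subset + [outside]
        else:
            candidate = [i for i in subset if i != inside]
            if not candidate:
                continue

        new_score = log_det(L, candidate)
        # Accept with a probability that grows with the improvement in diversity.
        if np.log(rng.uniform()) < new_score - score:
            subset, score = candidate, new_score
    return subset

# Toy similarity matrix: an RBF kernel over 200 random 2-D points.
rng = np.random.default_rng(1)
pts = rng.standard_normal((200, 2))
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
L = np.exp(-d2)
print(sorted(diverse_subset(L, k=10)))          # indices of a (locally) diverse subset
```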


News Article | November 7, 2016
Site: www.sciencenewsdaily.org

Researchers from MIT's Computer Science and Artificial Intelligence Laboratory and Stony Brook University have developed a new system that allows users to describe what they want their programs to do in very general terms. It then automatically produces versions of those programs that are optimized to run on multicore chips. It also guarantees that the new versions will yield exactly the same results that the single-core versions would, albeit much faster.

Dynamic programming is a technique that can yield relatively efficient solutions to computational problems in economics, genomic analysis, and other fields. But adapting it to chips with multiple processing cores normally demands parallel-programming expertise; the new system lets nonexperts optimize such programs to run on multiprocessor chips.
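For readers unfamiliar with the technique, here is an ordinary single-core dynamic-programming example: edit distance between two short DNA-like strings, a staple of genomic analysis. This is textbook code offered purely as a point of reference for the kind of computation such a system would parallelize; it is not the MIT and Stony Brook system or its output.

```python
# Classic dynamic programming: edit (Levenshtein) distance between two strings.
def edit_distance(a: str, b: str) -> int:
    # dp[i][j] = minimum number of edits to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i                            # delete all of a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j                            # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[len(a)][len(b)]

print(edit_distance("GATTACA", "GCATGCU"))  # 4
```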
