Jenatton R., CNRS ENS Informatics Department | Audibert J.-Y., University Paris Est Creteil | Bach F., CNRS ENS Informatics Department
Journal of Machine Learning Research | Year: 2011

We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsity-inducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual l1-norm and the group l1-norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for least-squares linear regression in low and high-dimensional settings. © 2011 Rodolphe Jenatton, Jean-Yves Audibert and Francis Bach.
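To make the regularizer concrete, here is a minimal numerical sketch (not code from the paper) that evaluates a structured sparsity-inducing norm of the kind described above: a weighted sum of Euclidean norms over possibly overlapping groups of variables. The group indices and weights in the example are illustrative.

```python
import numpy as np

def structured_norm(w, groups, weights=None):
    """Sum of Euclidean norms over (possibly overlapping) groups of variables.

    w       : 1-D array of coefficients
    groups  : list of integer index arrays, one per group (groups may overlap)
    weights : optional positive weight per group
    """
    if weights is None:
        weights = np.ones(len(groups))
    return sum(eta * np.linalg.norm(w[g]) for eta, g in zip(weights, groups))

# Toy example with overlapping groups {0,1} and {1,2}: the overlap is what
# shapes the set of nonzero patterns the regularizer favors.
w = np.array([0.5, 0.0, -1.2])
print(structured_norm(w, [np.array([0, 1]), np.array([1, 2])]))
```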


Mairal J., CNRS ENS Informatics Department | Bach F., CNRS ENS Informatics Department | Ponce J., CNRS ENS Informatics Department | Sapiro G., University of Minnesota
Journal of Machine Learning Research | Year: 2010

Sparse coding, that is, modelling data vectors as sparse linear combinations of basis elements, is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets. © 2010 Julien Mairal, Francis Bach, Jean Ponce and Guillermo Sapiro.
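For illustration only, the following compact sketch shows an online dictionary-learning loop in the spirit of the algorithm described above: sparse codes are computed one sample at a time, sufficient statistics are accumulated, and the dictionary columns are refreshed by block-coordinate descent. The lasso step uses plain ISTA as a stand-in for the solver used in the paper, and all parameter values are placeholders.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam, n_iter=50):
    """ISTA for the lasso step  min_a 0.5||x - D a||^2 + lam ||a||_1."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + D.T @ (x - D @ a) / L, lam / L)
    return a

def online_dictionary_learning(X, n_atoms, lam=0.1, n_epochs=1, seed=0):
    """Minimal sketch of online matrix factorization: accumulate sufficient
    statistics (A, B) and update the atoms by block-coordinate descent."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    D = rng.standard_normal((n_features, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((n_atoms, n_atoms))
    B = np.zeros((n_features, n_atoms))
    for _ in range(n_epochs):
        for x in X:
            a = sparse_code(x, D, lam)
            A += np.outer(a, a)
            B += np.outer(x, a)
            for j in range(n_atoms):               # dictionary update, one atom at a time
                if A[j, j] > 1e-10:
                    u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
                    D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D
```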


Furukawa Y., Google | Ponce J., CNRS ENS Informatics Department
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2010

This paper proposes a novel algorithm for multiview stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these before using visibility constraints to filter away false matches. The keys to the performance of the proposed algorithm are effective techniques for enforcing local photometric consistency and global visibility constraints. Simple but effective methods are also proposed to turn the resulting patch model into a mesh which can be further refined by an algorithm that enforces both photometric consistency and regularization constraints. The proposed approach automatically detects and discards outliers and obstacles and does not require any initialization in the form of a visual hull, a bounding box, or valid depth ranges. We have tested our algorithm on various data sets including objects with fine surface details, deep concavities, and thin structures, outdoor scenes observed from a restricted set of viewpoints, and crowded scenes where moving obstacles appear in front of a static structure of interest. A quantitative evaluation on the Middlebury benchmark shows that the proposed method outperforms all others submitted so far for four out of the six data sets. © 2010 IEEE.
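As a rough illustration of the photometric-consistency check (not the paper's exact measure), a patch hypothesized in 3D can be scored by normalized cross-correlation between its projection in a reference image and its projections in the other views, and filtered out when the score or the number of supporting views is too low; the threshold values below are placeholders.

```python
import numpy as np

def photometric_consistency(patch_ref, patches_other):
    """Mean normalized cross-correlation between a reference patch and its
    projections in other views; high values indicate a photometrically
    consistent 3D patch."""
    def ncc(a, b):
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        return float((a * b).mean())
    scores = [ncc(patch_ref, p) for p in patches_other]
    return float(np.mean(scores)) if scores else 0.0

# A patch would survive the filter step only if enough views agree, e.g.:
#   keep = photometric_consistency(ref, others) > 0.7 and len(others) >= 3
```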


Jenatton R., CNRS ENS Informatics Department | Mairal J., CNRS ENS Informatics Department | Obozinski G., CNRS ENS Informatics Department | Bach F., CNRS ENS Informatics Department
Journal of Machine Learning Research | Year: 2011

Sparse coding consists of representing signals as sparse linear combinations of atoms selected from a dictionary. We consider an extension of this framework where the atoms are further assumed to be embedded in a tree. This is achieved using a recently introduced tree-structured sparse regularization norm, which has proven useful in several applications. This norm leads to regularized problems that are difficult to optimize, and in this paper, we propose efficient algorithms for solving them. More precisely, we show that the proximal operator associated with this norm is computable exactly via a dual approach that can be viewed as the composition of elementary proximal operators. Our procedure has a complexity linear, or close to linear, in the number of atoms, and allows the use of accelerated gradient techniques to solve the tree-structured sparse approximation problem at the same computational cost as traditional ones using the ℓ1-norm. Our method is efficient and scales gracefully to millions of variables, which we illustrate in two types of applications: first, we consider fixed hierarchical dictionaries of wavelets to denoise natural images. Then, we apply our optimization tools in the context of dictionary learning, where learned dictionary elements naturally self-organize in a prespecified arborescent structure, leading to better performance in reconstruction of natural image patches. When applied to text documents, our method learns hierarchies of topics, thus providing a competitive alternative to probabilistic topic models. © 2011 Rodolphe Jenatton, Julien Mairal, Guillaume Obozinski and Francis Bach.
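A minimal sketch of the composition result stated above, assuming the groups form a tree (any two groups are either disjoint or nested): the proximal operator of the tree-structured norm is obtained by applying elementary group soft-thresholdings from the leaves up to the root. Group indices, weights, and the regularization level in the toy example are illustrative.

```python
import numpy as np

def prox_tree_group_lasso(v, groups, lam):
    """Proximal operator of  lam * sum_g weight_g * ||w_g||_2  when the groups
    form a tree: apply elementary group soft-thresholdings, smallest (deepest)
    groups first, the root group last.

    groups : list of (indices, weight) pairs ordered from leaves to root
    """
    w = v.copy()
    for idx, weight in groups:                     # leaves first, root last
        norm = np.linalg.norm(w[idx])
        if norm > 0:
            w[idx] *= max(0.0, 1.0 - lam * weight / norm)
    return w

# Toy tree on 4 variables: leaf groups {0}, {1}, {2,3}, then the root {0,1,2,3}.
v = np.array([0.3, -0.8, 1.5, 0.1])
groups = [(np.array([0]), 1.0), (np.array([1]), 1.0),
          (np.array([2, 3]), 1.0), (np.arange(4), 1.0)]
print(prox_tree_group_lasso(v, groups, lam=0.2))
```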


Mairal J., University of California at Berkeley | Bach F., CNRS ENS Informatics Department | Ponce J., CNRS ENS Informatics Department
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2012

Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations. © 2012 IEEE.
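Schematically, and with notation that may differ from the paper's, the supervised (task-driven) formulation couples the dictionary D with the task parameters W through the sparse codes:

$$
\min_{D,\;W} \;\; \mathbb{E}_{(x,y)}\!\left[\ell\big(y,\,W,\,\alpha^{\star}(x,D)\big)\right] \;+\; \frac{\nu}{2}\,\lVert W\rVert_F^{2},
\qquad
\alpha^{\star}(x,D) \;=\; \operatorname*{arg\,min}_{\alpha}\; \tfrac{1}{2}\lVert x - D\alpha\rVert_2^{2} \;+\; \lambda\lVert\alpha\rVert_1 .
$$

The difficulty alluded to in the abstract is that D enters only through the minimizer α⋆(x, D), so tuning the dictionary for the task requires optimizing through the sparse-coding step rather than solving a standard matrix factorization.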


Bach F., CNRS ENS Informatics Department
Journal of Machine Learning Research | Year: 2014

In this paper, we consider supervised learning problems such as logistic regression and study the stochastic gradient method with averaging, in the usual stochastic approximation setting where observations are used only once. We show that after N iterations, with a constant step-size proportional to 1/(R²√N), where N is the number of observations and R is the maximum norm of the observations, the convergence rate is always of order O(1/√N), and improves to O(R²/(μN)), where μ is the lowest eigenvalue of the Hessian at the global optimum (when this eigenvalue is greater than R²/√N). Since μ does not need to be known in advance, this shows that averaged stochastic gradient is adaptive to unknown local strong convexity of the objective function. Our proof relies on the generalized self-concordance properties of the logistic loss and thus extends to all generalized linear models with uniformly bounded features. © 2014 Francis Bach.
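A minimal single-pass sketch of the averaged stochastic gradient method described above, for logistic regression with labels in {−1, +1}: the constant step size follows the 1/(R²√N) scaling from the abstract (the multiplicative constant is a placeholder), and the Polyak-Ruppert average of the iterates is returned.

```python
import numpy as np

def averaged_sgd_logistic(X, y, step_scale=1.0):
    """Averaged SGD for logistic regression, each observation used exactly once."""
    N, d = X.shape
    R = np.max(np.linalg.norm(X, axis=1))          # maximum norm of the observations
    gamma = step_scale / (R ** 2 * np.sqrt(N))     # constant step size ~ 1/(R^2 sqrt(N))
    w = np.zeros(d)
    w_avg = np.zeros(d)
    for n in range(N):
        x, label = X[n], y[n]
        # gradient of log(1 + exp(-label * <w, x>)) with respect to w
        grad = -label * x / (1.0 + np.exp(label * np.dot(w, x)))
        w -= gamma * grad
        w_avg += (w - w_avg) / (n + 1)             # running average of the iterates
    return w_avg
```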


Philbin J., University of Oxford | Sivic J., CNRS ENS Informatics Department | Zisserman A., University of Oxford
International Journal of Computer Vision | Year: 2011

Given a large-scale collection of images, our aim is to efficiently associate images which contain the same entity, for example a building or object, and to discover the significant entities. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections. The model explicitly represents images as mixtures of particular objects or facades, and builds rich latent topic models which incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked. Additionally, to reduce the computational cost of applying the gLDA model to large datasets, we propose a scalable method that first computes a matching graph over all the images in a dataset. This matching graph connects images that contain the same object, and rough image groups can be mined from this graph using standard clustering techniques. The gLDA model can then be applied to generate a more nuanced representation of the data. We also discuss how "hub images" (images representative of an object or landmark) can easily be extracted from our matching graph representation. We evaluate our techniques on the publicly available Oxford buildings dataset (5K images) and show examples of automatically mined objects. The methods are evaluated quantitatively on this dataset using a ground truth labeling for a number of Oxford landmarks. To demonstrate the scalability of the matching graph method, we show qualitative results on two larger datasets of images taken of the Statue of Liberty (37K images) and Rome (1M+ images). © 2010 Springer Science+Business Media, LLC.
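The matching-graph stage can be pictured with the following sketch, in which match_fn and the inlier threshold are placeholders for whatever spatially verified matching score is used: images become nodes, verified matches become edges, and rough object groups are read off as connected components. The brute-force pairwise loop is for clarity only; at the scales reported in the paper the candidate pairs would come from an efficient retrieval step.

```python
import itertools

def build_matching_graph(descriptors, match_fn, min_inliers=20):
    """Connect two images when match_fn returns at least min_inliers verified
    matches, then mine rough groups as connected components (union-find)."""
    n = len(descriptors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]          # path halving
            i = parent[i]
        return i

    edges = []
    for i, j in itertools.combinations(range(n), 2):
        if match_fn(descriptors[i], descriptors[j]) >= min_inliers:
            edges.append((i, j))
            parent[find(i)] = find(j)              # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return edges, list(groups.values())
```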


Aubry M., CNRS ENS Informatics Department | Russell B.C., CNRS ENS Informatics Department | Sivic J., CNRS ENS Informatics Department
ACM Transactions on Graphics | Year: 2014

This article describes a technique that can reliably align arbitrary 2D depictions of an architectural site, including drawings, paintings, and historical photographs, with a 3D model of the site. This is a tremendously difficult task, as the appearance and scene structure in the 2D depictions can be very different from the appearance and geometry of the 3D model, for example, due to the specific rendering style, drawing error, age, lighting, or change of seasons. In addition, we face a hard search problem: the number of possible alignments of the painting to a large 3D model, such as a partial reconstruction of a city, is huge. To address these issues, we develop a new compact representation of complex 3D scenes. The 3D model of the scene is represented by a small set of discriminative visual elements that are automatically learned from rendered views. Similar to object detection, the set of visual elements, as well as the weights of individual features for each element, are learned in a discriminative fashion. We show that the learned visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, historical photograph) and structural changes (e.g., missing scene parts, large occluders) of the scene. We demonstrate an application of the proposed approach to automatic rephotography to find an approximate viewpoint of historical paintings and photographs with respect to a 3D model of the site. The proposed alignment procedure is validated via a human user study on a new database of paintings and sketches spanning several sites. The results demonstrate that our algorithm produces significantly better alignments than several baseline methods. © 2014 ACM.
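As a simplified sketch (not the paper's implementation), each discriminative visual element can be thought of as a linear detector trained on features of one rendered view; scoring a 2D depiction then amounts to evaluating these detectors on its patches and letting the matches vote for the rendered view, and hence the camera, they came from. Feature extraction, the detectors, and the decision threshold below are all assumed given.

```python
import numpy as np

def match_visual_elements(depiction_patches, elements):
    """depiction_patches : list of feature vectors extracted from the depiction
    elements           : list of (w, b, view_id) linear detectors, one per
                         visual element learned from a rendered view
    Returns (patch index, element index, view_id, score) for every match."""
    matches = []
    for i, phi in enumerate(depiction_patches):
        for j, (w, b, view_id) in enumerate(elements):
            score = float(np.dot(w, phi) + b)
            if score > 0:                          # assumed decision threshold
                matches.append((i, j, view_id, score))
    # Matched elements vote for the rendered view they came from; the
    # most-voted view gives an approximate viewpoint for the depiction.
    return matches
```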


Bach F., CNRS ENS Informatics Department
SIAM Journal on Optimization | Year: 2015

Given a convex optimization problem and its dual, there are many possible first-order algorithms. In this paper, we show the equivalence between mirror descent algorithms and algorithms generalizing the conditional gradient method. This is done through convex duality and implies notably that for certain problems, such as for supervised machine learning problems with nonsmooth losses or problems regularized by nonsmooth regularizers, the primal subgradient method and the dual conditional gradient method are formally equivalent. The dual interpretation leads to a form of line search for mirror descent, as well as guarantees of convergence for primal-dual certificates. © 2015 Society for Industrial and Applied Mathematics.
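Independently of the duality argument, it may help to recall what a conditional gradient (Frank-Wolfe) step looks like. The sketch below, a toy example of ours rather than the paper's, minimizes a smooth least-squares objective over an ℓ1-ball, where the linear minimization oracle returns a signed vertex of the ball.

```python
import numpy as np

def frank_wolfe_l1(grad, x0, radius=1.0, n_iter=200):
    """Conditional gradient over the l1-ball of the given radius: each step
    solves a linear minimization oracle (a signed vertex of the ball) and
    moves toward it with the standard step size 2/(k+2)."""
    x = x0.copy()
    for k in range(n_iter):
        g = grad(x)
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])             # vertex minimizing <g, s>
        gamma = 2.0 / (k + 2.0)
        x = (1 - gamma) * x + gamma * s
    return x

# Toy usage: least squares restricted to the unit l1-ball.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
x_hat = frank_wolfe_l1(lambda x: A.T @ (A @ x - b), np.zeros(10), radius=1.0)
```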


Oquab M., French Institute for Research in Computer Science and Automation | Oquab M., CNRS ENS Informatics Department | Bottou L., MSR | Laptev I., French Institute for Research in Computer Science and Automation | And 3 more authors.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | Year: 2014

Convolutional neural networks (CNN) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations as opposed to hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute mid-level image representation for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on Pascal VOC 2007 and 2012 datasets. We also show promising results for object and action localization. © 2014 IEEE.
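A minimal transfer-learning sketch in the spirit of the paper, using PyTorch with a torchvision ResNet-18 purely as a stand-in for the network actually used: the ImageNet-pretrained layers are frozen and only a new final classifier for the target task (for example, 20 PASCAL VOC classes) is trained. The exact spelling of the pretrained-weights argument depends on the torchvision version.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pretrained on ImageNet (stand-in for the paper's architecture;
# on older torchvision versions use models.resnet18(pretrained=True) instead).
backbone = models.resnet18(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False                        # keep the transferred layers fixed

# Replace the last layer by a new classifier for the target task;
# only these parameters are trained on the small target dataset.
backbone.fc = nn.Linear(backbone.fc.in_features, 20)
optimizer = torch.optim.SGD(backbone.fc.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.BCEWithLogitsLoss()                 # multi-label objective for VOC-style data
```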
