Le Touquet – Paris-Plage, France

Olteanu M.,SAMM | Villa-Vialaneix N.,SAMM | Villa-Vialaneix N.,French National Institute for Agricultural Research
Neurocomputing | Year: 2015

In some applications, and in order to address real-world situations better, data may be more complex than simple numerical vectors. In some cases, data can be known only through their pairwise dissimilarities or through multiple dissimilarities, each of them describing a particular feature of the data set. Several variants of the Self-Organizing Map (SOM) algorithm were introduced to generalize the original algorithm to the framework of dissimilarity data. Whereas median SOM is based on a rough representation of the prototypes, relational SOM allows representing these prototypes by a virtual linear combination of all elements in the data set, referring to a pseudo-Euclidean framework. In the present article, an on-line version of relational SOM is introduced and studied. Similar to the situation in the Euclidean framework, this on-line algorithm provides a better organization and is much less sensitive to prototype initialization than standard (batch) relational SOM. In a more general case, this stochastic version allows us to integrate an additional stochastic gradient descent step in the algorithm, which can tune the respective weights of several dissimilarities in an optimal way: the resulting multiple relational SOM thus has the ability to integrate several sources of data of different types, or to build a consensus between several dissimilarities describing the same data. The algorithms introduced in this paper are tested on several data sets, including categorical data and graphs. On-line relational SOM is currently available in the R package SOMbrero, which can be downloaded at http://sombrero.r-forge.r-project.org/ or directly tested on its Web User Interface at http://shiny.nathalievilla.org/sombrero. © 2014 Elsevier B.V.
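
The idea of representing a prototype as a combination of the observations can be illustrated with a minimal sketch. It assumes the prototype is p = Σ_j beta_j x_j with coefficients summing to 1, and that D behaves like a matrix of squared distances, in which case the distance from observation i to p is (D beta)_i − ½ betaᵀ D beta. The function names `relational_distances` and `online_update` are hypothetical and are not taken from SOMbrero; the update shown is a plain move of the coefficients toward one observation, offered as an illustration rather than as the paper's exact algorithm.

```python
import numpy as np


def relational_distances(D, beta):
    """Squared pseudo-distance from every observation to one prototype.

    D    : (n, n) symmetric dissimilarity matrix with zero diagonal,
           assumed here to behave like squared distances.
    beta : (n,) coefficients of the prototype, assumed to sum to 1.
    """
    D = np.asarray(D, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return D @ beta - 0.5 * (beta @ D @ beta)


def online_update(beta, i, rate):
    """Move a prototype's coefficients toward observation i.

    rate stands for (learning rate x neighborhood value) and should lie
    in [0, 1]; the update keeps the coefficients summing to 1.
    """
    beta = np.asarray(beta, dtype=float)
    target = np.zeros_like(beta)
    target[i] = 1.0
    return beta + rate * (target - beta)


# Toy check with three points in the plane (illustrative data only).
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
beta = np.array([0.5, 0.25, 0.25])
dists = relational_distances(D, beta)
```

In this Euclidean toy case the values in `dists` coincide with the squared distances from each point to the explicit prototype `beta @ X`, although the function itself never uses the coordinates.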

Massoni S.,Paris-Sorbonne University | Olteanu M.,SAMM | Villa-Vialaneix N.,SAMM | Villa-Vialaneix N.,French National Institute for Agricultural Research
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2013

Originally developed in bioinformatics, sequence analysis is increasingly used in the social sciences for the study of life-course processes. The methodology generally employed consists of computing dissimilarities between the trajectories and, if typologies are sought, clustering the trajectories according to their similarities or dissimilarities. The choice of an appropriate dissimilarity measure is a major issue when dealing with sequence analysis for life sequences. Several dissimilarities are available in the literature, but none of them has emerged as the undisputed choice. In this paper, instead of deciding upon one dissimilarity measure, we propose to use an optimal convex combination of different dissimilarities. The optimality is automatically determined by the clustering procedure and is defined with respect to the within-class variance. © 2013 Springer-Verlag Berlin Heidelberg.
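
A convex combination of dissimilarities and a within-class criterion computed from it can be sketched as follows. This is only an illustration under stated assumptions: the weights are supplied by hand rather than optimized, the within-class dispersion uses the usual identity Σ_C (1 / (2|C|)) Σ_{i,j∈C} D_ij (valid when D holds squared distances), and the names `combine_dissimilarities` and `within_class_dispersion` are hypothetical.

```python
import numpy as np


def combine_dissimilarities(matrices, alpha):
    """Return sum_k alpha_k * D_k for weights alpha_k >= 0 summing to 1."""
    mats = np.asarray(matrices, dtype=float)   # shape (K, n, n)
    alpha = np.asarray(alpha, dtype=float)     # shape (K,)
    if mats.ndim != 3 or alpha.shape != (mats.shape[0],):
        raise ValueError("expected K matrices and K weights")
    if np.any(alpha < 0) or not np.isclose(alpha.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return np.tensordot(alpha, mats, axes=1)


def within_class_dispersion(D, labels):
    """Within-class dispersion computed from dissimilarities only."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        total += D[np.ix_(idx, idx)].sum() / (2.0 * idx.size)
    return total


# Two toy dissimilarities on three items (illustrative values only).
D_a = np.array([[0.0, 4.0, 100.0], [4.0, 0.0, 64.0], [100.0, 64.0, 0.0]])
D_b = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
D_mix = combine_dissimilarities([D_a, D_b], [0.5, 0.5])
dispersion = within_class_dispersion(D_mix, [0, 0, 1])
```

A procedure in the spirit of the abstract would search over the weights to reduce this dispersion for the current clustering; that optimization step is deliberately left out here.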

Karolak S.,University Paris - Sud | Nefau T.,University Paris - Sud | Bailly E.,University Paris - Sud | Solgadi A.,SAMM | Levi Y.,University Paris - Sud
Forensic Science International | Year: 2010

Illicit drug consumption is currently an important public health concern that needs to be well characterized in order to be managed. A new method, known as sewage epidemiology, was proposed by Daughton and developed by Zuccato. This method involves estimating consumption from the measurement of drug residues in sewage. Several studies have been carried out, leading to an assessment of drug consumption in some European countries. This work, carried out in the Paris area (France), brings new data to this assessment and allows a comparison of cocaine and MDMA consumption with European estimates. Four wastewater treatment plants (WWTPs) were retained for the study, taking into account biological treatment, volume capacity, geographic location and social environment. Cocaine and its major metabolite benzoylecgonine (BZE), amphetamine, 3,4-methylenedioxymethamphetamine (MDMA) and buprenorphine were measured in raw water and WWTP effluent using HPLC-MS/MS after SPE extraction. Amphetamine was rarely detected. Cocaine and BZE were quantified at levels from 5 to 282 ng/L and 15 to 849 ng/L, respectively. MDMA and buprenorphine concentrations remained under 20 ng/L. Cocaine consumption was estimated from cocaine or BZE concentrations measured in raw water, and the results showed a significant difference in drug taking between weekdays and weekends. The estimated doses observed in this study are lower than those reported for other countries, especially Spain and Italy. MDMA consumption was estimated at lower levels than cocaine. © 2010 Elsevier Ireland Ltd.
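
The general shape of a sewage-epidemiology back-calculation can be sketched as below. The abstract does not give the formula used in the study, so this is an assumption: residue load = concentration × daily flow, scaled by a correction factor (accounting for excretion rate and molar mass ratio) and normalized per 1000 inhabitants. The function name and every input value are hypothetical placeholders, not data from the study.

```python
def consumption_mg_per_day_per_1000(conc_ng_per_l, flow_l_per_day,
                                    correction_factor, population):
    """Estimate drug consumption in mg/day per 1000 inhabitants.

    conc_ng_per_l     : residue concentration in raw wastewater (ng/L)
    flow_l_per_day    : daily flow entering the plant (L/day)
    correction_factor : converts residue load to parent-drug consumption
                        (must come from the literature; none is assumed)
    population        : number of inhabitants served by the plant
    """
    if population <= 0:
        raise ValueError("population must be positive")
    load_mg_per_day = conc_ng_per_l * flow_l_per_day * 1e-6  # ng -> mg
    consumed_mg_per_day = load_mg_per_day * correction_factor
    return consumed_mg_per_day / population * 1000.0


# Placeholder inputs chosen only to make the unit arithmetic easy to follow.
estimate = consumption_mg_per_day_per_1000(
    conc_ng_per_l=100.0,
    flow_l_per_day=1e8,
    correction_factor=2.0,
    population=1_000_000,
)
```

With these placeholder inputs, the load is 10 000 mg/day, the corrected consumption 20 000 mg/day, and the normalized estimate about 20 mg/day per 1000 inhabitants.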

Rynkiewicz J.,SAMM
Neurocomputing | Year: 2012

Multilayer perceptrons (MLPs) with one hidden layer have long been used for non-linear regression. However, in some tasks, MLPs are overly powerful models, and a small mean square error (MSE) may be due more to overfitting than to actual modeling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is arbitrary, if not false. In this paper, we present a universal bound for the overfitting of such models under weak assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model when the amount of data goes to infinity. As an illustration, we use this theoretical result to propose and compare effective criteria to find the true architecture of an MLP. © 2012 Elsevier B.V.

Olteanu M.,SAMM | Rynkiewicz J.,SAMM
Neurocomputing | Year: 2011

The statistical properties of the likelihood ratio test statistic (LRTS) for mixture-of-experts models are addressed in this paper. This question is essential when estimating the number of experts in the model. Our purpose is to extend the existing results for simple mixture models (Liu and Shao, 2003 [8]) and mixtures of multilayer perceptrons (Olteanu and Rynkiewicz, 2008 [9]). In this paper we first study a simple example which embodies all the difficulties arising in such models. We find that in the most general case the LRTS diverges but, with additional assumptions, the behavior of such models can be fully characterized. © 2011 Elsevier B.V.
