French Argentine International Center for Information and Systems science

Rosario del Tala, Argentina


Crossa J.,Biometrics and Statistics Unit | Perez P.,Colegio de Mexico | Hickey J.,Biometrics and Statistics Unit | Hickey J.,University of New England of Australia | And 9 more authors.
Heredity | Year: 2014

Genomic selection (GS) has been implemented in animal and plant species, and is regarded as a useful tool for accelerating genetic gains. Varying levels of genomic prediction accuracy have been obtained in plants, depending on the prediction problem assessed and on several other factors, such as trait heritability, the relationship between the individuals to be predicted and those used to train the models for prediction, number of markers, sample size and genotype × environment interaction (GE). The main objective of this article is to describe the results of genomic prediction in International Maize and Wheat Improvement Center's (CIMMYT's) maize and wheat breeding programs, from the initial assessment of the predictive ability of different models using pedigree and marker information to the present, when methods for implementing GS in practical global maize and wheat breeding programs are being investigated. Results show that pedigree (population structure) accounts for a sizeable proportion of the prediction accuracy when a global population is the prediction problem to be assessed. However, when the prediction uses unrelated populations to train the prediction equations, prediction accuracy becomes negligible. When genomic prediction includes modeling GE, an increase in prediction accuracy can be achieved by borrowing information from correlated environments. Several questions on how to incorporate GS into CIMMYT's maize and wheat programs remain unanswered and subject to further investigation, for example, prediction within and between related bi-parental crosses. Further research on the quantification of breeding value components for GS in plant breeding populations is required. © 2014 Macmillan Publishers Limited All rights reserved.


Ornella L.,French Argentine International Center for Information and Systems science | Perez P.,Colegio de Mexico | Tapia E.,French Argentine International Center for Information and Systems science | Gonzalez-Camacho J.M.,Colegio de Mexico | And 9 more authors.
Heredity | Year: 2014

Pearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait-environment combinations. Six different models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35, 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure, which we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RKHS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian kernels (rbf). The performance of regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme but with the response vector of the original training sets dichotomised using a given threshold.
For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets) and the best RE in the same 13 data sets, with values ranging from 0.393 to 0.948 (statistically significant in 12 data sets). RR produced the best mean for both κ and RE in one data set (0.148 and 0.381, respectively). Regarding the wheat data sets, SVC-lin presented the best κ in 12 of the 16 data sets, with outcomes ranging from 0.280 to 0.580 (statistically significant in 4 data sets) and the best RE in 9 data sets ranging from 0.484 to 0.821 (statistically significant in 5 data sets). SVC-rbf (0.235), RR (0.265) and RKHS (0.422) gave the best κ in one data set each, while RKHS and BL tied for the last one (0.234). Finally, BL presented the best RE in two data sets (0.738 and 0.750), RFR (0.636) and SVC-rbf (0.617) in one and RKHS in the remaining three (0.502, 0.458 and 0.586). The difference between the performance of SVC-lin and that of the rest of the models was not so pronounced at higher percentiles of the distribution. The behaviour of regression and classification algorithms varied markedly when selection was done at different thresholds, that is, κ and RE for each algorithm depended strongly on the selection percentile. Based on the results, we propose classification methods as a promising alternative for GS in plant breeding. © 2014 Macmillan Publishers Limited All rights reserved.
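
The abstract above scores models by how well they pick the best α fraction of individuals, using Cohen's kappa (κ). The following is a minimal Python sketch of that idea, not the authors' code: the function names, the tie-breaking rule and the toy numbers are assumptions made for illustration. The relative efficiency (RE) measure is not defined in the abstract, so it is not reproduced here.

```python
def top_fraction_labels(values, alpha):
    """Label the best `alpha` fraction (largest values) with 1, the rest with 0.

    Assumption: ties are broken by original position, and at least one
    individual is always selected.
    """
    n_selected = max(1, round(alpha * len(values)))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    selected = set(ranked[:n_selected])
    return [1 if i in selected else 0 for i in range(len(values))]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two binary label vectors of equal length."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    # Agreement expected by chance if the two labelings were independent.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # degenerate case: both vectors are constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical observed and predicted trait values for 20 individuals.
observed_values = [float(20 - i) for i in range(20)]
predicted_values = list(observed_values)
# The prediction swaps the 3rd and 4th best individuals.
predicted_values[2], predicted_values[3] = predicted_values[3], predicted_values[2]

alpha = 0.15
kappa = cohens_kappa(
    top_fraction_labels(observed_values, alpha),
    top_fraction_labels(predicted_values, alpha),
)
print(f"kappa at alpha={alpha}: {kappa:.3f}")
```

In this toy case the two rankings are almost perfectly correlated, yet one of the three selected individuals differs, which is the kind of tail disagreement κ is meant to expose.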


Baya A.E.,French Argentine International Center for Information and Systems science | Granitto P.M.,French Argentine International Center for Information and Systems science
IEEE/ACM Transactions on Computational Biology and Bioinformatics | Year: 2013

Clustering validation indexes are intended to assess the goodness of clustering results. Many methods used to estimate the number of clusters rely on a validation index as a key element to find the correct answer. This paper presents a new validation index based on graph concepts, which has been designed to find arbitrary shaped clusters by exploiting the spatial layout of the patterns and their clustering label. This new clustering index is combined with a solid statistical detection framework, the gap statistic. The resulting method is able to find the right number of arbitrary-shaped clusters in diverse situations, as we show with examples where this information is available. A comparison with several relevant validation methods is carried out using artificial and gene expression data sets. The results are very encouraging, showing that the underlying structure in the data can be more accurately detected with the new clustering index. Our gene expression data results also indicate that this new index is stable under perturbation of the input data. © 2004-2012 IEEE.


Castro R.,University of Buenos Aires | Castro R.,CONICET | Kofman E.,French Argentine International Center for Information and Systems science | Kofman E.,National University of Rosario
Simulation | Year: 2015

In this work we generalize the concept of activity of continuous time signals. We define the activity of order n of a signal and show that it allows us to estimate the number of sections of polynomials up to order n which are needed to represent that signal with a certain accuracy. Then we apply this concept to obtain a lower bound for the number of steps performed by quantization-based integration algorithms in the simulation of ordinary differential equations. We perform an exhaustive analysis over two examples, computing the activity of order n and comparing it with the number of steps performed by different integration methods. This analysis corroborates the theoretical predictions and also allows us to measure the suitability of the different algorithms depending on how close to the theoretical lower bound they perform. © 2015 The Author(s).
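
The abstract above does not give the formula for the activity of order n, so the sketch below only illustrates the classical first-order notion, taken here as an assumption: the activity of a signal over an interval is the integral of |dx/dt|, and dividing it by a quantum size gives a rough count of the quantum crossings a first-order quantization-based method has to process. The function names and the sine example are hypothetical.

```python
import math


def first_order_activity(samples):
    """Approximate the integral of |dx/dt| dt from uniformly sampled values.

    For a sampled signal this reduces to the sum of absolute increments
    (the total variation of the sample sequence).
    """
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def estimated_min_steps(activity, quantum):
    """Rough lower bound on steps: each step advances the signal by one quantum."""
    return activity / quantum


# Example: one full period of sin(t), which rises and falls by 4 units in total.
n = 10000
signal = [math.sin(2 * math.pi * k / n) for k in range(n + 1)]
activity = first_order_activity(signal)
print(f"activity: {activity:.6f}")
print(f"estimated minimum steps for quantum 0.01: {estimated_min_steps(activity, 0.01):.1f}")
```

Comparing such an estimate with the number of steps an integrator actually takes is one way to read the abstract's idea of measuring how close an algorithm performs to its theoretical lower bound.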


Ornella L.,French Argentine International Center for Information and Systems science | Sukhwinder-Singh,Maize and Wheat Improvement Center | Perez P.,Colegio de Mexico | Burgueno J.,Maize and Wheat Improvement Center | And 7 more authors.
Plant Genome | Year: 2012

Durable resistance to the rust diseases of wheat (Triticum aestivum L.) can be achieved by developing lines that have race-nonspecific adult plant resistance conferred by multiple minor slow-rusting genes. Genomic selection (GS) is a promising tool for accumulating favorable alleles of slow-rusting genes. In this study, five CIMMYT wheat populations evaluated for resistance were used to predict resistance to stem rust (Puccinia graminis) and yellow rust (Puccinia striiformis) using Bayesian least absolute shrinkage and selection operator (LASSO) (BL), ridge regression (RR), and support vector regression with linear or radial basis function kernel models. All parents and populations were genotyped using 1400 Diversity Arrays Technology markers and different prediction problems were assessed. Results show that prediction ability for yellow rust was lower than for stem rust, probably due to differences in the conditions of infection of both diseases. For within population and environment, the correlation between predicted and observed values (Pearson's correlation [ρ]) was greater than 0.50 in 90% of the evaluations whereas for yellow rust, ρ ranged from 0.0637 to 0.6253. The BL and RR models have similar prediction ability, with a slight superiority of the BL confirming reports about the additive nature of rust resistance. When making predictions between environments and/or between populations, including information from another environment or environments or another population or populations improved prediction. © Crop Science Society of America.


Leale G.,I-Systems | Baya A.,French Argentine International Center for Information and Systems science | Milone D.H.,I-Systems | Granitto P.,French Argentine International Center for Information and Systems science | Stegmayer G.,I-Systems
IEEE/ACM Transactions on Computational Biology and Bioinformatics | Year: 2016

Characterizing genes with semantic information is an important process regarding the description of gene products. Although the complete genomes of many organisms have already been sequenced, not all of their genes have known biological functions. Since experimentally studying the functions of those genes, one by one, would be unfeasible, new computational methods for gene function inference are needed. We present here a novel computational approach for inferring biological function for a set of genes with previously unknown function, given a set of genes with well-known information. This approach is based on the premise that genes with similar behaviour should be grouped together. This is known as the guilt-by-association principle. Thus, it is possible to take advantage of clustering techniques to obtain groups of unknown genes that are co-clustered with genes that have well-known semantic information (GO annotations). Meaningful knowledge to infer unknown semantic information can therefore be provided by these well-known genes. We provide a method to explore the potential function of new genes according to those currently annotated. The results obtained indicate that the proposed approach could be a useful and effective tool when used by biologists to guide the inference of biological functions for recently discovered genes. Our work sets an important landmark in the field of identifying unknown gene functions through clustering, using an external source of biological input. A simple web interface to this proposal can be found at http://fich.unl.edu.ar/sinc/webdemo/gamma-am/. © 2016 IEEE.
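
The guilt-by-association principle described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the method of the paper: clusters are taken as given, annotations are plain strings, and each unannotated gene simply inherits the most frequent terms among annotated genes in its cluster. The function name, gene names and term labels are hypothetical placeholders.

```python
from collections import Counter


def infer_annotations(clusters, known_annotations, top_k=1):
    """Propose terms for unannotated genes from their co-clustered annotated genes.

    clusters: list of lists of gene identifiers.
    known_annotations: dict mapping a gene to its list of annotation terms.
    Returns a dict mapping each unannotated gene to its `top_k` proposed terms.
    """
    proposals = {}
    for members in clusters:
        # Count how often each term occurs among annotated genes in this cluster.
        counts = Counter(
            term
            for gene in members
            for term in known_annotations.get(gene, ())
        )
        ranked = [term for term, _ in counts.most_common(top_k)]
        for gene in members:
            if gene not in known_annotations:
                proposals[gene] = ranked
    return proposals


# Hypothetical clustering result and annotation table.
clusters = [["g1", "g2", "g3", "u1"], ["g4", "u2"]]
known = {
    "g1": ["term_A", "term_B"],
    "g2": ["term_A"],
    "g3": ["term_C"],
    "g4": ["term_D"],
}
print(infer_annotations(clusters, known))
```

A majority vote is the simplest possible rule; a real pipeline would also need to weigh how specific and how well supported each term is.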


Larese M.G.,French Argentine International Center for Information and Systems science | Larese M.G.,CONICET | Larese M.G.,Instituto Nacional de Tecnologia Agropecuaria | Craviotto R.M.,Instituto Nacional de Tecnologia Agropecuaria | And 4 more authors.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | Year: 2012

In this paper we propose an automatic algorithm able to classify legume leaf images considering only the leaf venation patterns (leaf shape, color and texture are excluded). This method processes leaf images captured with a standard scanner and segments the veins using the Unconstrained Hit-or-Miss Transform (UHMT) and adaptive thresholding. We measure several morphological features on the veins and classify them using Random forests. We applied the process to recognize several legumes (soybean, white bean and red bean). We analyze the importance of the features and select a small set which is relevant for the recognition task. Our automatic procedure outperforms the expert manual classification. © 2012 Springer-Verlag.


Caruso N.,French Argentine International Center for Information and Systems science | Portapila M.,French Argentine International Center for Information and Systems science | Power H.,University of Nottingham
Engineering Analysis with Boundary Elements | Year: 2016

In this work we present an improvement of the Localized Regular Dual Reciprocity Method (LRDRM). LRDRM is an integral domain decomposition method with two distinguishing features: the boundary conditions are imposed at the local interpolation level and all the calculated integrals are regular. The enhancement presented here is that the interpolation functions themselves satisfy the partial differential equation to be solved. Results for 1D and 2D convection-diffusion, 2D Helmholtz and 2D Poisson equations are presented, attaining accuracies two to three orders of magnitude higher than the original version of the LRDRM. © 2015 Elsevier Ltd. All rights reserved.


PubMed | French Argentine International Center for Information and Systems science
Type: Journal Article | Journal: IEEE transactions on neural networks | Year: 2011

Many learning problems, including some critical real-world applications, may vary slowly over time. When facing such a problem, it is desirable that the learning method find the correct input-output function and also detect the change in the concept and adapt to it. We introduce the time-adaptive support vector machine (TA-SVM), which is a new method for generating adaptive classifiers, capable of learning concepts that change with time. The basic idea of TA-SVM is to use a sequence of classifiers, each one appropriate for a small time window but, in contrast to other proposals, learning all the hyperplanes in a global way. We show that the addition of a new term in the cost function of the set of SVMs (that penalizes the diversity between consecutive classifiers) produces a coupling of the sequence that allows TA-SVM to learn as a single adaptive classifier. We evaluate different aspects of the method using appropriate drifting problems. In particular, we analyze the regularizing effect of changing the number of classifiers in the sequence or adapting the strength of the coupling. A comparison with other methods in several problems, including the well-known STAGGER dataset and the real-world electricity pricing domain, shows the good performance of TA-SVM in all tested situations.
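
The coupling idea in the abstract above, a cost that sums ordinary SVM terms over a sequence of classifiers plus a penalty on the difference between consecutive ones, can be sketched as follows. The exact TA-SVM cost function is not given in the abstract, so the form below is an assumption: linear classifiers, hinge loss, and a squared-difference penalty averaged over consecutive pairs. The function name, the normalisation and the parameters `C` and `gamma` are hypothetical.

```python
def coupled_svm_cost(weights, biases, samples, C=1.0, gamma=1.0):
    """Cost of a sequence of linear classifiers with a coupling penalty.

    weights: list of weight vectors, one per time window.
    biases: list of bias terms, one per time window.
    samples: list of (window_index, x, y) with y in {-1, +1}.
    Assumed form: mean ||w_t||^2 + C * total hinge loss
                  + gamma * mean squared difference of consecutive (w, b).
    """
    m = len(weights)
    margin_term = sum(sum(w_i * w_i for w_i in w) for w in weights) / m

    hinge = 0.0
    for t, x, y in samples:
        score = sum(w_i * x_i for w_i, x_i in zip(weights[t], x)) + biases[t]
        hinge += max(0.0, 1.0 - y * score)

    coupling = 0.0
    for t in range(m - 1):
        coupling += sum((a - b) ** 2 for a, b in zip(weights[t + 1], weights[t]))
        coupling += (biases[t + 1] - biases[t]) ** 2
    if m > 1:
        coupling /= m - 1

    return margin_term + C * hinge + gamma * coupling


# Two time windows whose classifiers point in different directions.
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
samples = [(0, [2.0, 0.0], 1), (1, [0.0, 0.5], 1)]
for g in (0.0, 1.0, 10.0):
    print(f"gamma={g}: cost={coupled_svm_cost(weights, biases, samples, gamma=g)}")
```

Raising `gamma` makes abrupt changes between neighbouring classifiers more expensive, which is the regularizing effect of the coupling strength that the abstract mentions; with `gamma=0` the classifiers are scored independently.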
