Saccenti E.,University of Amsterdam |
Saccenti E.,Netherlands Bioinformatics Center |
Westerhuis J.A.,University of Amsterdam |
Smilde A.K.,University of Amsterdam |
And 4 more authors.
PLoS ONE | Year: 2011
One of the first steps in analyzing high-dimensional functional genomics data is an exploratory analysis of such data. Cluster Analysis and Principal Component Analysis are then usually the methods of choice. Despite their versatility, they also have a severe drawback: they do not always generate simple and interpretable solutions. On the basis of the observation that functional genomics data often contain both informative and non-informative variation, we propose a method that finds sets of variables containing informative variation. This informative variation is subsequently expressed in easily interpretable simplivariate components. We present a new implementation of the recently introduced simplivariate models. In this implementation, the informative variation is described by multiplicative models that can adequately represent the relations between functional genomics data. The method performs well on one simulated and two real-life metabolomics data sets. © 2011 Saccenti et al.
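As a rough illustration of what a multiplicative description of informative variation could look like, the sketch below assumes a rank-one form x_ij ≈ a_i · b_j for a chosen subset of variables and fits it with a truncated SVD. The rank-one form, the function name `rank_one_fit`, and the simulated data are assumptions made for illustration only. The abstract does not specify the model or how variable subsets are selected, so this is not the authors' implementation.

```python
import numpy as np

def rank_one_fit(X_sub):
    """Fit X_sub ~ outer(a, b) by least squares using the leading SVD component.

    Returns a (one score per sample), b (one loading per variable) and the
    fraction of the sum of squares of X_sub reproduced by the rank-one fit.
    """
    U, s, Vt = np.linalg.svd(X_sub, full_matrices=False)
    a = U[:, 0] * s[0]          # sample scores
    b = Vt[0]                   # variable loadings
    fitted = np.outer(a, b)
    explained = 1.0 - ((X_sub - fitted) ** 2).sum() / (X_sub ** 2).sum()
    return a, b, explained

# Hypothetical simulated data: 20 samples, 12 variables, of which the
# first 5 share multiplicative structure and the rest are pure noise.
rng = np.random.default_rng(0)
a_true = rng.normal(size=20)
b_true = rng.normal(size=5)
X = rng.normal(scale=0.1, size=(20, 12))
X[:, :5] += np.outer(a_true, b_true)

_, _, ev_informative = rank_one_fit(X[:, :5])   # structured subset
_, _, ev_noise = rank_one_fit(X[:, 5:])         # noise-only subset
print(ev_informative, ev_noise)
```

Under these assumptions, a subset of variables with shared multiplicative structure is reproduced far better by a single component than a noise-only subset, which is the kind of contrast a subset search could exploit.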
Hageman J.A.,Biometris Applied Statistics |
Hageman J.A.,Center for BioSystems Genomics |
Malosetti M.,Biometris Applied Statistics |
van Eeuwijk F.A.,Biometris Applied Statistics |
van Eeuwijk F.A.,Center for BioSystems Genomics
Euphytica | Year: 2012
In this paper, we demonstrate the use of two-mode clustering for genotype-by-trait and genotype-by-environment data. In contrast to two separate (one-mode) clusterings on genotypes or traits/environments, two-mode clustering simultaneously produces homogeneous groups of genotypes and traits/environments. For two-mode clustering, we first scan all two-mode cluster solutions with all possible numbers of clusters using k-means. After deciding on the final numbers of clusters, we continue with a two-mode clustering algorithm based on a genetic algorithm. This ensures optimal solutions even for large data sets. We discuss the application of two-mode clustering to multiple trait data stemming from genomic research on tomatoes as well as an application to multi-environment data on barley. © 2010 The Author(s).
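The k-means scanning step described above can be sketched as an alternating procedure that assigns rows (genotypes) and columns (traits or environments) to clusters so that each row-cluster by column-cluster block is close to its own mean. This is a minimal sketch under stated assumptions: the function names, the random initialization, and the handling of empty blocks are illustrative choices, and the genetic-algorithm refinement mentioned in the abstract is not included.

```python
import numpy as np

def block_means(X, rows, cols, P, Q):
    """Mean of X within each (row cluster, column cluster) block."""
    centers = np.full((P, Q), X.mean())  # empty blocks fall back to the grand mean
    for p in range(P):
        for q in range(Q):
            block = X[np.ix_(rows == p, cols == q)]
            if block.size:
                centers[p, q] = block.mean()
    return centers

def two_mode_kmeans(X, P, Q, n_iter=50, seed=0):
    """Alternately reassign rows and columns to minimise the squared
    deviation of each cell from its block mean."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    rows = rng.integers(P, size=n)
    cols = rng.integers(Q, size=m)
    for _ in range(n_iter):
        centers = block_means(X, rows, cols, P, Q)
        # Cost of placing each row in each row cluster, columns held fixed.
        row_cost = ((X[:, None, :] - centers[:, cols][None, :, :]) ** 2).sum(axis=2)
        new_rows = row_cost.argmin(axis=1)
        centers = block_means(X, new_rows, cols, P, Q)
        # Cost of placing each column in each column cluster, rows held fixed.
        col_cost = ((X.T[:, None, :] - centers[new_rows, :].T[None, :, :]) ** 2).sum(axis=2)
        new_cols = col_cost.argmin(axis=1)
        converged = np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols)
        rows, cols = new_rows, new_cols
        if converged:
            break
    centers = block_means(X, rows, cols, P, Q)
    sse = ((X - centers[rows][:, cols]) ** 2).sum()
    return rows, cols, centers, sse

# Hypothetical genotype-by-trait matrix: 30 genotypes, 8 traits.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))
X[:15, :4] += 3.0  # one block with a shifted mean
rows, cols, centers, sse = two_mode_kmeans(X, P=2, Q=2)
```

Scanning different numbers of clusters would amount to calling `two_mode_kmeans` over a grid of `P` and `Q` values and comparing the resulting `sse`. Because the result depends on the random start, several seeds per setting would normally be tried.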