Biometrics and Statistics Unit

Mexico City, Mexico


Cleveland M.A.,Genus plc. | Hickey J.M.,University of New England of Australia | Hickey J.M.,Biometrics and Statistics Unit
Journal of Animal Science | Year: 2013

Genomic selection can be implemented in pig breeding at a reduced cost using genotype imputation. Accuracy of imputation and its impact on the resulting genomic breeding values (gEBV) were investigated. High-density genotype data were available for 4,763 animals from a single pig line. Three low-density genotype panels were constructed with SNP densities of 450 (L450), 3,071 (L3k) and 5,963 (L6k). Accuracy of imputation was determined using 184 test individuals with no genotyped descendants in the data but with parents and grandparents genotyped using the Illumina PorcineSNP60 Beadchip. Alternative genotyping scenarios were created in which parents, grandparents, and individuals that were not direct ancestors of test animals (Other) were genotyped at high density (S1), grandparents were not genotyped (S2), dams and granddams were not genotyped (S3), and dams and granddams were genotyped at low density (S4). Four additional scenarios were created by excluding Other animal genotypes. Test individuals were always genotyped at low density. Imputation was performed with AlphaImpute. Genomic breeding values were calculated using the single-step genomic evaluation. Test animals were evaluated for the information retained in the gEBV, calculated as the correlation between gEBV using imputed genotypes and gEBV using true genotypes. Accuracy of imputation was high for all scenarios but decreased with fewer SNP on the low-density panel (0.995 to 0.965 for S1) and with reduced genotyping of ancestors, where the largest changes were for L450 (0.965 in S1 to 0.914 in S3). Exclusion of genotypes for Other animals resulted in only small accuracy decreases. Imputation accuracy was not consistent across the genome. Information retained in the gEBV was related to genotyping scenario and thus to imputation accuracy. Reducing the number of SNP on the low-density panel reduced the information retained in the gEBV, with the largest decrease observed from L3k to L450. Excluding Other animal genotypes had little impact on imputation accuracy but caused large decreases in the information retained in the gEBV. These results indicate that accuracy of gEBV from imputed genotypes depends on the level of genotyping in close relatives and the size of the genotyped dataset. Fewer high-density genotyped individuals are needed to obtain accurate imputation than are needed to obtain accurate gEBV. Strategies to optimize development of low-density panels can improve both imputation and gEBV accuracy. © 2013 American Society of Animal Science. All rights reserved.
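
The abstract above measures both imputation accuracy and the information retained in the gEBV as correlations. The following is a minimal sketch of that kind of calculation, not the authors' code. It assumes genotypes are coded as allele dosages (0, 1, 2) and that accuracy is summarised as the mean per-animal correlation; the abstract does not state either choice. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def imputation_accuracy(true_genotypes, imputed_genotypes):
    """Mean per-animal correlation between true and imputed allele dosages.

    Assumption: one row per animal, one column per SNP, dosages coded 0/1/2.
    """
    per_animal = [
        pearson(true_row, imputed_row)
        for true_row, imputed_row in zip(true_genotypes, imputed_genotypes)
    ]
    return sum(per_animal) / len(per_animal)


# Hypothetical toy data: two animals, six SNP, one imputation error.
true_dosages = [[0, 1, 2, 1, 0, 2], [2, 2, 1, 0, 1, 0]]
imputed_dosages = [[0, 1, 2, 1, 0, 2], [2, 1, 1, 0, 1, 0]]
accuracy = imputation_accuracy(true_dosages, imputed_dosages)
```

Under the same assumptions, the information retained in the gEBV would be `pearson(gebv_true, gebv_imputed)` across the test animals, where both lists are hypothetical vectors of breeding values computed from true and imputed genotypes.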

Hickey J.M.,University of New England of Australia | Hickey J.M.,Biometrics and Statistics Unit | Kranis A.,Aviagen Ltd.
Genetics Selection Evolution | Year: 2013

AlphaImpute is a flexible and accurate genotype imputation tool that was originally designed for the imputation of genotypes on autosomal chromosomes. In some species, sex chromosomes comprise a large portion of the genome. For example, chromosome Z represents approximately 8% of the chicken genome and therefore is likely to be important in determining genetic variation in a population. When breeding programs make selection decisions based on genomic information, chromosomes that are not represented on the genotyping platform will not be subject to selection. Therefore imputation algorithms should be able to impute genotypes for all chromosomes. The objective of this research was to extend AlphaImpute so that it could impute genotypes on sex chromosomes. The accuracy of imputation was assessed using different genotyping strategies in a real commercial chicken population. The correlation between true and imputed genotypes was high in all the scenarios and was 0.96 for the most favourable scenario. Overall, the accuracy of imputation of the sex chromosome was slightly lower than that of autosomes for all scenarios considered. © 2013 Hickey and Kranis; licensee BioMed Central Ltd.

Hickey J.M.,University of New England of Australia | Crossa J.,Biometrics and Statistics Unit | Babu R.,International Maize and Wheat Improvement Center | de los Campos G.,University of Alabama at Birmingham
Crop Science | Year: 2012

Genomic selection and association mapping offer great potential to increase rates of genetic progress in plants. The prediction of genomic breeding values usually requires that missing genotypes be imputed because a proportion of genotypes is usually uncalled by the genotyping algorithm, different individuals may be genotyped using different platforms, or low cost genotyping strategies can involve genotyping some individuals at high density and others at low density. The objective of this paper was to quantify the accuracy of imputation in a maize (Zea mays L.) data set and explore some of the factors that affect it. The factors studied were the density of the low-density platform, level of linkage disequilibrium, minor allele frequency of the marker being imputed, and degree of genetic relationship between the line being imputed and the training population. The accuracy of imputation was high even when only 8774 genotypes constitute the low-density platform. The correlation between the true and imputed genotypes was 0.87. However, there was a dramatic reduction in the accuracy of imputation when the low-density platforms had fewer than 8774 genotypes. Genetic relatedness between an individual having its genotypes imputed and the individuals genotyped with the high-density platform was important. The design of an information nucleus that incorporates imputation for the purposes of implementing genomic selection and association mapping in small independent breeding programs was discussed. © Crop Science Society of America.

Daetwyler H.D.,Australian Department of Primary Industries and Fisheries | Calus M.P.L.,Wageningen University | Pong-Wong R.,Roslin Institute | de los Campos G.,University of Alabama at Birmingham | And 2 more authors.
Genetics | Year: 2013

The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward-in-time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward-in-time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals. © 2013 by the Genetics Society of America.

Federer W.T.,Cornell University | Crossa J.,Biometrics and Statistics Unit
Frontiers in Physiology | Year: 2012

Crop breeding programs using conventional approaches, as well as new biotechnological tools, rely heavily on data resulting from the evaluation of genotypes in different environmental conditions (agronomic practices, locations, and years). Statistical methods used for designing field and laboratory trials and for analyzing the data originating from those trials need to be accurate and efficient. The statistical analysis of multi-environment trials (MET) is useful for assessing genotype x environment interaction (GEI), mapping quantitative trait loci (QTLs), and studying QTL x environment interaction (QEI). Large populations are required for scientific study of QEI, and for determining the association between molecular markers and quantitative trait variability. Therefore, appropriate control of local variability through efficient experimental design is of key importance. In this chapter we present and explain several classes of augmented designs useful for achieving control of variability and assessing genotype effects in a practical and efficient manner. A popular procedure for unreplicated designs is the one known as "systematically spaced checks." Augmented designs contain "c" check or standard treatments replicated "r" times, and "n" new treatments or genotypes included once (usually) in the experiment. © 2012 Federer and Crossa.
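
The structure described in the last sentence of the abstract above (c checks replicated r times, n new entries included once) can be sketched as a layout generator. This is an illustrative sketch only. It assumes the simplest case, an augmented randomized complete block design in which every block contains all checks and the new entries are dealt across blocks; the chapter covers several other classes. The function name and entry labels are hypothetical.

```python
import random


def augmented_block_layout(checks, new_entries, n_blocks, seed=0):
    """Return a list of blocks, each a randomized list of plot labels.

    Every check appears once in every block (so it is replicated n_blocks
    times); every new entry appears in exactly one block.
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    rng = random.Random(seed)
    entries = list(new_entries)
    rng.shuffle(entries)  # random assignment of new entries to blocks
    blocks = []
    for b in range(n_blocks):
        # Deal new entries round-robin so block sizes differ by at most one.
        plots = list(checks) + entries[b::n_blocks]
        rng.shuffle(plots)  # randomize plot order within the block
        blocks.append(plots)
    return blocks


# Hypothetical example: c = 3 checks, r = 4 blocks, n = 10 new genotypes.
layout = augmented_block_layout(
    checks=["C1", "C2", "C3"],
    new_entries=[f"G{i}" for i in range(1, 11)],
    n_blocks=4,
)
total_plots = sum(len(block) for block in layout)  # c * r + n = 22
```

The total number of plots is c × r + n, which is why such designs are attractive when seed or space for the new entries is limited.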

Burgueno J.,Biometrics and Statistics Unit | de los Campos G.,University of Alabama at Birmingham | Weigel K.,University of Wisconsin - Madison | Crossa J.,Biometrics and Statistics Unit
Crop Science | Year: 2012

Genomic selection (GS) has become an important aid in plant and animal breeding. Multienvironment (multitrait) models allow borrowing of information across environments (traits), which could enhance prediction accuracy. This study presents multienvironment (multitrait) models for GS and compares the predictive accuracy of these models with: (i) multienvironment analysis without pedigree and marker information, and (ii) multienvironment pedigree- and/or marker-based models. A statistical framework for incorporating pedigree and molecular marker information in models for multienvironment data is described and applied to data that originate from wheat (Triticum aestivum L.) multienvironment trials. Two prediction problems relevant to plant breeders are considered: (CV1) predicting the performance of untested genotypes ("newly" developed lines), and (CV2) predicting the performance of genotypes that have been evaluated in some environments but not in others. Results confirmed the superiority of models using both marker and pedigree information over those based on pedigree information only. Models with pedigree and/or markers had better predictive accuracy than simple linear mixed models that do not include either of these two sources of information. We concluded that the evaluation of such trials can benefit greatly from using multienvironment GS models. © Crop Science Society of America.
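
The difference between the two prediction problems in the abstract above lies in which line-by-environment records are hidden from the model. The sketch below illustrates one way to build the two kinds of test sets. It is an assumption about how such partitions could be drawn, not the partitioning used in the study, and the function names and labels are hypothetical.

```python
import random


def cv1_test_cells(lines, envs, n_test_lines, seed=0):
    """CV1: hide every record of some lines, mimicking untested genotypes."""
    rng = random.Random(seed)
    held_out = rng.sample(list(lines), n_test_lines)
    return {(line, env) for line in held_out for env in envs}


def cv2_test_cells(lines, envs, n_test_cells, seed=0):
    """CV2: hide single line-environment records, but keep each line
    observed in at least one environment."""
    lines, envs = list(lines), list(envs)
    max_cells = len(lines) * (len(envs) - 1)
    if n_test_cells > max_cells:
        raise ValueError("too many test cells to keep every line observed")
    rng = random.Random(seed)
    cells = [(line, env) for line in lines for env in envs]
    rng.shuffle(cells)
    remaining = {line: len(envs) for line in lines}  # observed records left
    test = set()
    for line, env in cells:
        if len(test) == n_test_cells:
            break
        if remaining[line] > 1:  # never remove a line's last record
            test.add((line, env))
            remaining[line] -= 1
    return test


# Hypothetical grid: 10 lines evaluated in 3 environments.
lines = [f"L{i}" for i in range(10)]
envs = ["E1", "E2", "E3"]
cv1 = cv1_test_cells(lines, envs, n_test_lines=3)
cv2 = cv2_test_cells(lines, envs, n_test_cells=8)
```

In CV1 a model can only reach the hidden lines through pedigree or marker relationships, whereas in CV2 it can also borrow information from the same line's records in other environments.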

Wang D.,University of Nebraska - Lincoln | Salah El-Basyoni I.,University of Nebraska - Lincoln | Stephen Baenziger P.,University of Nebraska - Lincoln | Crossa J.,Biometrics and Statistics Unit | And 2 more authors.
Heredity | Year: 2012

Though epistasis has long been postulated to have a critical role in genetic regulation of important pathways as well as provide a major source of variation in the process of speciation, the importance of epistasis for genomic selection in the context of plant breeding is still being debated. In this paper, we report results on the prediction of genetic values with epistatic effects for 280 accessions in the Nebraska Wheat Breeding Program using adaptive mixed least absolute shrinkage and selection operator (LASSO). The development of adaptive mixed LASSO, originally designed for association mapping, for the context of genomic selection is reported. The results show that adaptive mixed LASSO can be successfully applied to the prediction of genetic values while incorporating both marker main effects and epistatic effects. In particular, the prediction accuracy is substantially improved by the inclusion of two-locus epistatic effects (more than onefold in some cases as measured by cross-validation correlation coefficient), which is observed for multiple traits and planting locations. This points to significant potential in using non-additive genetic effects for genomic selection in crop breeding practices. © 2012 Macmillan Publishers Limited. All rights reserved.

Montesinos-Lopez O.A.,University of Colima | Montesinos-Lopez A.,Research Center en Matematicas | Crossa J.,Biometrics and Statistics Unit | Eskridge K.,University of Nebraska - Lincoln
PLoS ONE | Year: 2012

Background: The group testing method has been proposed for the detection and estimation of genetically modified plants (adventitious presence of unwanted transgenic plants, AP). For binary response variables (presence or absence), group testing is efficient when the prevalence is low, so that estimation, detection, and sample size methods have been developed under the binomial model. However, when the event is rare (low prevalence <0.1), and testing occurs sequentially, inverse (negative) binomial pooled sampling may be preferred. Methodology/Principal Findings: This research proposes three sample size procedures (two computational and one analytic) for estimating prevalence using group testing under inverse (negative) binomial sampling. These methods provide the required number of positive pools (r_m), given a pool size (k), for estimating the proportion of AP plants using the Dorfman model and inverse (negative) binomial sampling. We give real and simulated examples to show how to apply these methods and the proposed sample-size formula. The Monte Carlo method was used to study the coverage and level of assurance achieved by the proposed sample sizes. An R program to create other scenarios is given in Appendix S2. Conclusions: The three methods ensure precision in the estimated proportion of AP because they guarantee that the width (W) of the confidence interval (CI) will be equal to, or narrower than, the desired width (ω), with a probability of γ. With the Monte Carlo study we found that the computational Wald procedure (method 2) produces the most precise sample size (with coverage and assurance levels very close to nominal values) and that the sample size based on the Clopper-Pearson CI (method 1) is conservative (overestimates the sample size); the analytic Wald sample size method we developed (method 3) sometimes underestimated the optimum number of pools. © 2012 Montesinos-López et al.
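
To make the sampling scheme in the abstract above concrete, the sketch below simulates inverse (negative) binomial pooled testing and computes the maximum-likelihood point estimate of prevalence. It is not the paper's sample-size procedure and does not reproduce the R program in Appendix S2. It assumes a perfect assay, so that a pool of size k is positive with probability 1 − (1 − p)^k, and all function names are hypothetical.

```python
import random


def pool_positive_prob(p, k):
    """Probability that a pool of k plants contains at least one AP plant."""
    return 1.0 - (1.0 - p) ** k


def estimate_prevalence(r, total_pools, k):
    """Maximum-likelihood estimate of p when testing stopped at the r-th
    positive pool after total_pools pools of size k were tested."""
    if not 0 < r <= total_pools:
        raise ValueError("need 0 < r <= total_pools")
    pi_hat = r / total_pools  # estimated pool-level positive rate
    return 1.0 - (1.0 - pi_hat) ** (1.0 / k)


def simulate_pools_until_r_positives(p, k, r, rng):
    """Test pools one at a time until r positives are seen; return the
    total number of pools tested."""
    pi = pool_positive_prob(p, k)
    tested = 0
    positives = 0
    while positives < r:
        tested += 1
        if rng.random() < pi:
            positives += 1
    return tested


# Hypothetical scenario: true prevalence 1%, pools of 20 plants, stop at r = 5.
rng = random.Random(1)
total = simulate_pools_until_r_positives(p=0.01, k=20, r=5, rng=rng)
p_hat = estimate_prevalence(r=5, total_pools=total, k=20)
```

Repeating the simulation many times and recording how often a confidence interval around `p_hat` meets a target width is the kind of Monte Carlo check the abstract describes; the interval construction itself is left out here because the abstract does not give its formulas.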

Burgueno J.,Biometrics and Statistics Unit | Crossa J.,Biometrics and Statistics Unit | Cotes J.M.,Biometrics and Statistics Unit | Vicente F.S.,Biometrics and Statistics Unit | Das B.,National University of Colombia
Crop Science | Year: 2011

Fixed linear models have been used for describing genotype × environment interaction (GE). Previous attempts have been made to assess the predictive ability of some linear mixed models when GE components are treated as random effects and modeled by the factor analytic (FA) model. This study compares the predictive ability of linear mixed models when the GE is modeled by the FA model with that of simple linear mixed models when the GE is not modeled. A cross-validation scheme is used that randomly deletes some genotypes from sites; the values for these genotypes are then predicted by the different models and correlated with their observed values to assess model accuracy. A total of six multienvironment trials (one potato [Solanum tuberosum L.] trial, three maize [Zea mays L.] trials, and two wheat [Triticum aestivum L.] trials) with GE of varying complexity were used in the evaluation. Results show that for data sets with complex GE, modeling GE using the FA model improved the predictability of the model by up to 6%. When GE is not complex, most models (with and without FA) gave high predictability, and models with FA did not seem to lose much predictive ability. Therefore, we concluded that modeling GE with the FA model is advisable: it improves predictability when GE is complex and costs little predictive ability when it is not. © Crop Science Society of America.
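
The cross-validation scheme in the abstract above (delete genotype-by-site records at random, predict them, correlate predictions with observations) can be sketched as follows. The predictor used here is a deliberately simple stand-in, genotype mean plus site mean minus grand mean, which ignores GE; it is not the factor analytic mixed model evaluated in the study. Function names and the toy data are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def additive_prediction(train, genotype, site):
    """Predict one cell as genotype mean + site mean - grand mean (no GE)."""
    values = list(train.values())
    grand = sum(values) / len(values)
    g_vals = [v for (g, s), v in train.items() if g == genotype]
    s_vals = [v for (g, s), v in train.items() if s == site]
    # Fall back to the grand mean if a genotype or site has no records left.
    g_mean = sum(g_vals) / len(g_vals) if g_vals else grand
    s_mean = sum(s_vals) / len(s_vals) if s_vals else grand
    return g_mean + s_mean - grand


def cross_validate(data, fraction_deleted, seed=0):
    """Delete a random fraction of (genotype, site) cells, predict them from
    the rest, and return the correlation of predicted with observed values."""
    rng = random.Random(seed)
    cells = sorted(data)
    n_test = max(2, round(fraction_deleted * len(cells)))
    test = rng.sample(cells, n_test)
    hidden = set(test)
    train = {cell: v for cell, v in data.items() if cell not in hidden}
    predicted = [additive_prediction(train, g, s) for g, s in test]
    observed = [data[cell] for cell in test]
    return pearson(observed, predicted)


# Hypothetical toy trial: 6 genotypes x 4 sites with purely additive effects.
toy = {(g, s): 10.0 * g + s for g in range(6) for s in range(4)}
predictive_correlation = cross_validate(toy, fraction_deleted=0.25)
```

Swapping `additive_prediction` for a model that includes a GE term and comparing the resulting correlations on the same deleted cells mirrors the comparison the abstract reports.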

Crossa J.,Biometrics and Statistics Unit
Current Genomics | Year: 2012

Historically in plant breeding a large number of statistical models have been developed and used for studying genotype × environment interaction. These models have helped plant breeders to assess the stability of economically important traits and to predict the performance of newly developed genotypes evaluated under varying environmental conditions. In the last decade, the use of relatively low numbers of markers has facilitated the mapping of chromosome regions associated with phenotypic variability (e.g., QTL mapping) and, to a lesser extent, revealed the differential response of these chromosome regions across environments (i.e., QTL × environment interaction). QTL technology has been useful for marker-assisted selection of simple traits; however, it has not been efficient for predicting complex traits affected by a large number of loci. Recently the appearance of cheap, abundant markers has made it possible to saturate the genome with high density markers and use marker information to predict genomic breeding values, thus increasing the precision of genetic value prediction over that achieved with the traditional use of pedigree information. Genomic data also allow assessing chromosome regions through marker effects and studying the pattern of covariability of marker effects across differential environmental conditions. In this review, we outline the most important models for assessing genotype × environment interaction, QTL × environment interaction, and marker effect (gene) × environment interaction. Since analyzing genetic and genomic data is one of the most challenging statistical problems researchers currently face, different models from different areas of statistical research must be attempted in order to make significant progress in understanding genetic effects and their interaction with environment. © 2012 Bentham Science Publishers.
