Crossa J.,Biometrics and Statistics Unit
Current Genomics | Year: 2012
Historically in plant breeding a large number of statistical models have been developed and used for studying genotype × environment interaction. These models have helped plant breeders to assess the stability of economically important traits and to predict the performance of newly developed genotypes evaluated under varying environmental conditions. In the last decade, the use of relatively low numbers of markers has facilitated the mapping of chromosome regions associated with phenotypic variability (e.g., QTL mapping) and, to a lesser extent, revealed the differential response of these chromosome regions across environments (i.e., QTL × environment interaction). QTL technology has been useful for marker-assisted selection of simple traits; however, it has not been efficient for predicting complex traits affected by a large number of loci. Recently the appearance of cheap, abundant markers has made it possible to saturate the genome with high density markers and use marker information to predict genomic breeding values, thus increasing the precision of genetic value prediction over that achieved with the traditional use of pedigree information. Genomic data also allow assessing chromosome regions through marker effects and studying the pattern of covariability of marker effects across differential environmental conditions. In this review, we outline the most important models for assessing genotype × environment interaction, QTL × environment interaction, and marker effect (gene) × environment interaction. Since analyzing genetic and genomic data is one of the most challenging statistical problems researchers currently face, different models from different areas of statistical research must be attempted in order to make significant progress in understanding genetic effects and their interaction with environment. © 2012 Bentham Science Publishers.
Federer W.T.,Cornell University |
Crossa J.,Biometrics and Statistics Unit
Frontiers in Physiology | Year: 2012
Crop breeding programs using conventional approaches, as well as new biotechnological tools, rely heavily on data resulting from the evaluation of genotypes in different environmental conditions (agronomic practices, locations, and years). Statistical methods used for designing field and laboratory trials and for analyzing the data originating from those trials need to be accurate and efficient. The statistical analysis of multi-environment trials (MET) is useful for assessing genotype × environment interaction (GEI), mapping quantitative trait loci (QTLs), and studying QTL × environment interaction (QEI). Large populations are required for scientific study of QEI, and for determining the association between molecular markers and quantitative trait variability. Therefore, appropriate control of local variability through efficient experimental design is of key importance. In this chapter we present and explain several classes of augmented designs useful for achieving control of variability and assessing genotype effects in a practical and efficient manner. A popular procedure for unreplicated designs is the one known as "systematically spaced checks." Augmented designs contain "c" check or standard treatments replicated "r" times, and "n" new treatments or genotypes included once (usually) in the experiment. © 2012 Federer and Crossa.
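The structure described in the last sentence of this abstract (c checks replicated r times, n new genotypes included once) can be illustrated with a minimal sketch. It assumes one simple variant, an augmented randomized complete block layout in which every check appears once in each of r blocks; the abstract does not say which classes of augmented design the chapter covers, and the function name, labels, and seed below are hypothetical.

```python
import random


def augmented_layout(checks, new_entries, n_blocks, seed=0):
    """Build a randomized augmented block layout (illustrative only).

    Each check appears once in every block (so it is replicated n_blocks
    times); each new entry appears exactly once in the whole experiment.
    """
    rng = random.Random(seed)
    entries = list(new_entries)
    rng.shuffle(entries)  # randomize which block each new entry falls in
    blocks = []
    for b in range(n_blocks):
        # Deal the shuffled new entries to blocks in round-robin fashion.
        plots = list(checks) + entries[b::n_blocks]
        rng.shuffle(plots)  # randomize plot order within the block
        blocks.append(plots)
    return blocks


checks = ["C1", "C2", "C3"]                 # c = 3 check treatments
new = [f"G{i}" for i in range(1, 13)]       # n = 12 unreplicated genotypes
layout = augmented_layout(checks, new, n_blocks=4)  # r = 4 blocks
for i, plots in enumerate(layout, start=1):
    print(f"block {i}: {plots}")
```

With these inputs the experiment uses c × r + n = 24 plots. The replicated checks are what allow block effects and error variance to be estimated even though the new genotypes are unreplicated.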
Daetwyler H.D.,Australian Department of Primary Industries and Fisheries |
Calus M.P.L.,Wageningen University |
Pong-Wong R.,Roslin Institute |
de los Campos G.,University of Alabama at Birmingham |
And 2 more authors.
Genetics | Year: 2013
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals. © 2013 by the Genetics Society of America.
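As a rough illustration of the two performance indicators highlighted in this abstract, the sketch below computes accuracy as the Pearson correlation between observed and predicted values in a validation set, and bias as the slope of the regression of observed on predicted values (a slope of 1 indicating no inflation or deflation). These are common working definitions assumed here, not necessarily the exact ones used in the article; the function name and the toy numbers are hypothetical.

```python
from math import sqrt


def accuracy_and_bias(observed, predicted):
    """Return (accuracy, slope) for one validation set.

    accuracy: Pearson correlation between observed and predicted values.
    slope:    regression coefficient of observed on predicted values.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    accuracy = s_op / sqrt(s_oo * s_pp)
    slope = s_op / s_pp
    return accuracy, slope


# Toy validation data (made up): predictions rank individuals perfectly
# but are shrunk to half the observed scale.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [0.5, 1.0, 1.5, 2.0]
acc, slope = accuracy_and_bias(observed, predicted)
print(acc, slope)
```

In this toy case the correlation is 1 while the slope is 2, which shows why the two indicators are worth reporting separately: a method can rank candidates well and still mis-scale its predictions.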
Cleveland M.A.,Genus plc |
Hickey J.M.,University of New England of Australia |
Hickey J.M.,Biometrics and Statistics Unit
Journal of Animal Science | Year: 2013
Genomic selection can be implemented in pig breeding at a reduced cost using genotype imputation. Accuracy of imputation and the impact on resulting genomic breeding values (gEBV) was investigated. High-density genotype data were available for 4,763 animals from a single pig line. Three low-density genotype panels were constructed with SNP densities of 450 (L450), 3,071 (L3k) and 5,963 (L6k). Accuracy of imputation was determined using 184 test individuals with no genotyped descendants in the data but with parents and grandparents genotyped using the Illumina PorcineSNP60 Beadchip. Alternative genotyping scenarios were created in which parents, grandparents, and individuals that were not direct ancestors of test animals (Other) were genotyped at high density (S1), grandparents were not genotyped (S2), dams and granddams were not genotyped (S3), and dams and granddams were genotyped at low density (S4). Four additional scenarios were created by excluding Other animal genotypes. Test individuals were always genotyped at low density. Imputation was performed with AlphaImpute. Genomic breeding values were calculated using the single-step genomic evaluation. Test animals were evaluated for the information retained in the gEBV, calculated as the correlation between gEBV using imputed genotypes and gEBV using true genotypes. Accuracy of imputation was high for all scenarios but decreased with fewer SNP on the low-density panel (0.995 to 0.965 for S1) and with reduced genotyping of ancestors, where the largest changes were for L450 (0.965 in S1 to 0.914 in S3). Exclusion of genotypes for Other animals resulted in only small accuracy decreases. Imputation accuracy was not consistent across the genome. Information retained in the gEBV was related to genotyping scenario and thus to imputation accuracy. Reducing the number of SNP on the low-density panel reduced the information retained in the gEBV, with the largest decrease observed from L3k to L450. 
Excluding Other animal genotypes had little impact on imputation accuracy but caused large decreases in the information retained in the gEBV. These results indicate that accuracy of gEBV from imputed genotypes depends on the level of genotyping in close relatives and the size of the genotyped dataset. Fewer high-density genotyped individuals are needed to obtain accurate imputation than are needed to obtain accurate gEBV. Strategies to optimize development of low-density panels can improve both imputation and gEBV accuracy. © 2013 American Society of Animal Science. All rights reserved.
Hickey J.M.,University of New England of Australia |
Crossa J.,Biometrics and Statistics Unit |
Babu R.,International Maize and Wheat Improvement Center |
de los Campos G.,University of Alabama at Birmingham
Crop Science | Year: 2012
Genomic selection and association mapping offer great potential to increase rates of genetic progress in plants. The prediction of genomic breeding values usually requires that missing genotypes be imputed because a proportion of genotypes is usually uncalled by the genotyping algorithm, different individuals may be genotyped using different platforms, or low cost genotyping strategies can involve genotyping some individuals at high density and others at low density. The objective of this paper was to quantify the accuracy of imputation in a maize (Zea mays L.) data set and explore some of the factors that affect it. The factors studied were the density of the low-density platform, level of linkage disequilibrium, minor allele frequency of the marker being imputed, and degree of genetic relationship between the line being imputed and the training population. The accuracy of imputation was high even when only 8774 genotypes constituted the low-density platform. The correlation between the true and imputed genotypes was 0.87. However, there was a dramatic reduction in the accuracy of imputation when the low-density platforms had fewer than 8774 genotypes. Genetic relatedness between an individual having its genotypes imputed and the individuals genotyped with the high-density platform was important. The design of an information nucleus that incorporates imputation for the purposes of implementing genomic selection and association mapping in small independent breeding programs was discussed. © Crop Science Society of America.
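The accuracy measure reported in this abstract, the correlation between true and imputed genotypes, can be sketched as follows. The sketch assumes genotypes are coded as allele dosages (0, 1, 2) and that accuracy is computed only over positions that were masked and then imputed; the coding, the function names, and the toy data are assumptions for illustration, not details taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a sequence has no variance")
    return s_xy / sqrt(s_xx * s_yy)


def imputation_accuracy(true_dosages, imputed_dosages):
    """Correlation between true and imputed dosages at the masked markers."""
    return pearson(true_dosages, imputed_dosages)


# Toy data (made up): six masked markers for one line, one imputation error.
true_dosages = [0, 1, 2, 2, 0, 1]
imputed_dosages = [0, 1, 2, 1, 0, 1]
r = imputation_accuracy(true_dosages, imputed_dosages)
print(round(r, 3))
```

A correlation is often preferred over a simple proportion of correctly imputed genotypes because it is less flattering for markers with low minor allele frequency, where guessing the common genotype is right most of the time.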