Hayes B.J.,Australian Department of Primary Industries and Fisheries | Hayes B.J.,Dairy Futures Cooperative Research Center
Journal of Dairy Science | Year: 2011

Large numbers of dairy cattle are now routinely genotyped with dense single nucleotide polymorphism (SNP) arrays for the purpose of predicting genomic estimated breeding values. Such SNP arrays contain very good information for parentage assignment and pedigree reconstruction. The main challenge in using this information is the development of computationally efficient strategies that enable a candidate animal to be assigned its sire and dam despite the large volume of data. Here we describe an efficient algorithm for parentage assignment with SNP data and demonstrate very accurate assignment with 50,000-SNP and 3,000-SNP panels. The computer code implementing the algorithm is given in the Appendix. © 2011 American Dairy Science Association.
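As an illustration of how SNP genotypes can exclude a putative parent, the sketch below counts opposing homozygotes (loci where the two animals are homozygous for different alleles, which a true parent-offspring pair should not show apart from genotyping errors). This is a generic exclusion approach and is not claimed to be the algorithm given in the paper's Appendix. The genotype coding (0, 1, 2 copies of one allele, `None` for missing), the function names, and the `max_conflicts` threshold are all assumptions made for the example.

```python
def opposing_homozygotes(offspring, parent):
    """Count loci where the two animals are homozygous for different alleles."""
    return sum(
        1
        for o, p in zip(offspring, parent)
        if o is not None and p is not None and abs(o - p) == 2
    )


def assign_parent(offspring, candidates, max_conflicts=1):
    """Return (candidate_id, conflicts) for the candidate with fewest conflicts.

    The id is None when even the best candidate exceeds max_conflicts
    (a hypothetical tolerance for genotyping errors).
    """
    best_id, best_count = None, None
    for cand_id, genotype in candidates.items():
        count = opposing_homozygotes(offspring, genotype)
        if best_count is None or count < best_count:
            best_id, best_count = cand_id, count
    if best_count is not None and best_count <= max_conflicts:
        return best_id, best_count
    return None, best_count


# Toy data: six SNPs, two candidate sires.
calf = [0, 1, 2, 2, 0, 1]
sires = {
    "sireA": [0, 1, 2, 1, 1, 2],
    "sireB": [2, 1, 0, 2, 2, 1],
}
print(assign_parent(calf, sires))
```

With real panels the tolerance would need to scale with the number of SNPs and the expected genotyping error rate.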


Druet T.,University of Liege | Macleod I.M.,University of Melbourne | Hayes B.J.,La Trobe University | Hayes B.J.,Australian Department of Primary Industries and Fisheries | Hayes B.J.,Dairy Futures Cooperative Research Center
Heredity | Year: 2014

Genomic prediction from whole-genome sequence data is attractive, as the accuracy of genomic prediction is no longer bounded by the extent of linkage disequilibrium between DNA markers and causal mutations affecting the trait, given the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population, and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. Performance of these strategies (number of variants detected and accuracy of imputation) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP), which selected a subset of individuals for sequencing that maximized the number of unique haplotypes (from single-nucleotide polymorphism panel data) sequenced, gave good performance across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence by fold coverage given a maximum total sequencing effort. At 600 total fold coverage (x 600), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data compared with dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%. © 2014 Macmillan Publishers Limited. All rights reserved.
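The trade-off between number of animals and depth under a fixed sequencing effort is simple arithmetic: total fold coverage is roughly the number of animals times the per-animal coverage. The sketch below enumerates the designs a 600-fold budget allows. The function name and the list of candidate coverages are assumptions for illustration; it does not evaluate which design is best, which in the study depended on variant detection and imputation accuracy.

```python
def designs_for_budget(total_fold, coverages):
    """Map each per-animal coverage to the number of animals affordable."""
    return {c: total_fold // c for c in coverages if c > 0}


designs = designs_for_budget(600, [1, 2, 4, 8, 12, 16])
for coverage, n_animals in designs.items():
    print(f"{n_animals:4d} animals at {coverage}x")
```

The eightfold row corresponds to the 75-animal design reported in the abstract.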


Rodriguez-Ramilo S.T.,Instituto Nacional Of Investigacion Y Tecnologia Agraria Y Alimentaria Inia | Garcia-Cortes L.A.,Instituto Nacional Of Investigacion Y Tecnologia Agraria Y Alimentaria Inia | Gonzalez-Recio O.,Instituto Nacional Of Investigacion Y Tecnologia Agraria Y Alimentaria Inia | Gonzalez-Recio O.,Australian Department of Primary Industries and Fisheries | Gonzalez-Recio O.,Dairy Futures Cooperative Research Center
PLoS ONE | Year: 2014

Genome-enhanced genotypic evaluations are becoming popular in several livestock species. For this purpose, the combination of the pedigree-based relationship matrix with a genomic similarities matrix between individuals is a common approach. However, the weight placed on each matrix has been so far established with ad hoc procedures, without formal estimation thereof. In addition, when using marker- and pedigree-based relationship matrices together, the resulting combined relationship matrix needs to be adjusted to the same scale in reference to the base population. This study proposes a semi-parametric Bayesian method for combining marker- and pedigree-based information on genome-enabled predictions. A kernel matrix from a reproducing kernel Hilbert spaces regression model was used to combine genomic and genealogical information in a semi-parametric scenario, avoiding inversion and adjustment complications. In addition, the weights on marker- versus pedigree-based information were inferred from a Bayesian model with Markov chain Monte Carlo. The proposed method was assessed involving a large number of SNPs and a large reference population. Five phenotypes, including production and type traits of dairy cattle were evaluated. The reliability of the genome-based predictions was assessed using the correlation, regression coefficient and mean squared error between the predicted and observed values. The results indicated that when a larger weight was given to the pedigree-based relationship matrix the correlation coefficient was lower than in situations where more weight was given to genomic information. Importantly, the posterior means of the inferred weight were near the maximum of 1. The behavior of the regression coefficient and the mean squared error was similar to the performance of the correlation, that is, more weight to the genomic information provided a regression coefficient closer to one and a smaller mean squared error. Our results also indicated a greater accuracy of genomic predictions when using a large reference population. © 2014 Rodríguez-Ramilo et al.
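The three validation statistics named in the abstract (correlation, regression coefficient, and mean squared error between predicted and observed values) can be computed as in the sketch below. It assumes the regression coefficient is the slope of observed values regressed on predicted values, a common convention that the abstract does not state explicitly; the function name and the toy numbers are illustrative only.

```python
def prediction_metrics(predicted, observed):
    """Correlation, slope of observed on predicted, and mean squared error."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    s_po = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    return {
        "correlation": s_po / (ss_p * ss_o) ** 0.5,
        "slope": s_po / ss_p,  # assumed: observed regressed on predicted
        "mse": sum((o - p) ** 2 for p, o in zip(predicted, observed)) / n,
    }


# Toy example: predictions perfectly ranked but shrunk by a factor of two.
metrics = prediction_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(metrics)
```

A slope close to one indicates predictions that are neither inflated nor deflated, which is why it is reported alongside the correlation.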


Moser G.,University of Queensland | Lee S.H.,University of Queensland | Hayes B.J.,Australian Department of Primary Industries and Fisheries | Hayes B.J.,Dairy Futures Cooperative Research Center | And 4 more authors.
PLoS Genetics | Year: 2015

Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches. © 2015 Moser et al.


Brondum R.F.,University of Aarhus | Su G.,University of Aarhus | Lund M.S.,University of Aarhus | Bowman P.J.,Australian Department of Primary Industries and Fisheries | And 5 more authors.
BMC Genomics | Year: 2012

Background: The accuracy of genomic prediction is highly dependent on the size of the reference population. For small populations, including information from other populations could improve this accuracy. The usual strategy is to pool data from different populations; however, this has not proven as successful as hoped for with distantly related breeds. BayesRS is a novel approach to share information across populations for genomic predictions. The approach allows information to be captured even where the phase of SNP alleles and causative mutation alleles is reversed across populations, or the actual causative mutation is different between the populations but affects the same gene. Proportions of a four-distribution mixture for SNP effects in segments of fixed size along the genome are derived from one population and set as location-specific prior proportions of distributions of SNP effects for the target population. The model was tested using dairy cattle populations of different breeds: 540 Australian Jersey bulls, 2297 Australian Holstein bulls and 5214 Nordic Holstein bulls. The traits studied were protein, fat and milk yield. Genotypic data were Illumina 777K SNPs, real or imputed. Results: Results showed an increase in accuracy of up to 3.5% for the Jersey population when using BayesRS with a prior derived from Australian Holstein compared to a model without location-specific priors. The increase in accuracy was, however, lower than was achieved when reference populations were combined to estimate SNP effects, except in the case of fat yield. The small size of the Jersey validation set meant that these improvements in accuracy were not significant using a Hotelling-Williams t-test at the 5% level. An increase in accuracy of 1-2% for all traits was observed in the Australian Holstein population when using a prior derived from the Nordic Holstein population compared to using no prior information. These improvements were significant (P<0.05) using the Hotelling-Williams t-test for protein and fat yield. Conclusion: For some traits the method might be advantageous compared to pooling of reference data for distantly related populations, but further investigation is needed to confirm the results. For closely related populations the method does not perform better than pooling reference data. However, it does give an increased accuracy compared to analysis based on only one reference population, without an increased computational burden. The approach described here provides a general setup for inclusion of location-specific priors: the approach could be used to include biological information in genomic predictions. © 2012 Brøndum et al.; licensee BioMed Central Ltd.
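The idea of location-specific prior proportions can be sketched as follows: SNPs are grouped into fixed-size segments along the genome, and within each segment the share of SNPs falling in each of the four mixture components in the source population becomes the prior for that segment in the target population. The sketch is a simplification under stated assumptions: it uses one hard component label (0 to 3) per SNP, whereas a Bayesian analysis would typically work with posterior probabilities, and the function name, labels, and segment size are hypothetical rather than taken from the paper.

```python
from collections import Counter


def segment_prior_proportions(assignments, segment_size, n_components=4):
    """Per-segment proportions of SNPs assigned to each mixture component."""
    priors = []
    for start in range(0, len(assignments), segment_size):
        segment = assignments[start:start + segment_size]
        counts = Counter(segment)
        priors.append([counts.get(k, 0) / len(segment) for k in range(n_components)])
    return priors


# Toy source-population labels for eight SNPs, in segments of four SNPs.
source_labels = [0, 0, 0, 1, 0, 2, 3, 3]
priors = segment_prior_proportions(source_labels, segment_size=4)
for i, proportions in enumerate(priors):
    print(f"segment {i}: {proportions}")
```

In practice a small pseudo-count per component may be needed so that no component receives a prior proportion of exactly zero in the target population.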
