PubMed | Biomedical Informatics Training Program.
Type: | Journal: Bioinformatics (Oxford, England) | Year: 2016
Microarray measurements of gene expression constitute a large fraction of publicly shared biological data, and are available in the Gene Expression Omnibus (GEO). Many studies use GEO data to shape hypotheses and improve statistical power. Within GEO, the Affymetrix HG-U133A and HG-U133 Plus 2.0 are the two most commonly used microarray platforms for human samples; the HG-U133 Plus 2.0 platform contains 54 220 probes and the HG-U133A array contains a proper subset (21 722 probes). When different platforms are involved, the subset of common genes is most easily compared. This approach results in the exclusion of substantial measured data and can limit downstream analysis. To predict the expression values for the genes unique to the HG-U133 Plus 2.0 platform, we constructed a series of gene expression inference models based on genes common to both platforms. Our model predicts gene expression values that are within the variability observed in controlled replicate studies and are highly correlated with measured data. Using six previously published studies, we also demonstrate the improved performance of the enlarged feature space generated by our model in downstream analysis. The gene inference model described in this paper is available as an R package (affyImpute), which can be downloaded at http://firstname.lastname@example.org. Supplementary information: Supplementary data are available at Bioinformatics online.
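The general idea of inferring platform-specific genes from genes shared by both platforms can be sketched as a multi-output regression. The abstract does not state which model family the authors used, so the ridge regression below is an assumption chosen for illustration, not the affyImpute method. The data are synthetic, and the names `fit_ridge`, `predict_targets`, and `run_demo` are hypothetical.

```python
import numpy as np


def fit_ridge(x_common, y_target, alpha=1.0):
    """Fit one ridge regression per target gene, solved jointly.

    x_common: (n_samples, n_common_genes) expression of shared genes
    y_target: (n_samples, n_target_genes) expression of platform-only genes
    """
    x_mean = x_common.mean(axis=0)
    y_mean = y_target.mean(axis=0)
    xc = x_common - x_mean
    yc = y_target - y_mean
    # Closed-form ridge solution: (X^T X + alpha * I)^-1 X^T Y
    gram = xc.T @ xc + alpha * np.eye(xc.shape[1])
    weights = np.linalg.solve(gram, xc.T @ yc)
    return weights, x_mean, y_mean


def predict_targets(x_common, weights, x_mean, y_mean):
    """Predict target-gene expression from common-gene expression."""
    return (x_common - x_mean) @ weights + y_mean


def run_demo(seed=0):
    """Train on synthetic 'both platforms' samples, test on held-out ones."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(200, 20))          # 20 hypothetical common genes
    true_w = rng.normal(size=(20, 5))       # 5 hypothetical target genes
    y = x @ true_w + 0.1 * rng.normal(size=(200, 5))
    x_train, x_test = x[:150], x[150:]
    y_train, y_test = y[:150], y[150:]
    weights, x_mean, y_mean = fit_ridge(x_train, y_train)
    pred = predict_targets(x_test, weights, x_mean, y_mean)
    # Per-gene Pearson correlation between predicted and measured values
    corr = [np.corrcoef(pred[:, j], y_test[:, j])[0, 1] for j in range(5)]
    return pred, corr
```

Per-gene correlation on held-out samples mirrors the kind of check the abstract describes (predicted values compared against measured data); real expression data would also need normalization across arrays, which this sketch omits.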
PubMed | Biomedical Informatics Training Program
Type: Comparative Study | Journal: Journal of evaluation in clinical practice | Year: 2010
The assessment of statistical significance of survivorship differences of model-predicted groups is an important step in survivorship studies. Some models determined to be significant using current methodologies are assumed to have predictive capabilities. These methods compare parameters from predicted classes, not random samples from homogeneous populations, and they may be insensitive to prediction errors. Type I-like errors can result wherein models with high prediction error rates are accepted. We have developed and evaluated an alternate statistic for determining the significance of survivorship between or among model-derived survivorship classes. We propose and evaluate a new statistical test, the F* test, which incorporates parameters that reflect prediction errors that are unobserved by the current methods of evaluation. We found that the Log Rank test identified fewer failed models than the F* test. When both tests were significant, we found a more accurate model. Using two prediction models applied to eight datasets, we found that the F* test gave a correct inference five out of eight times, whereas the Log Rank test only identified one model out of the eight correctly. Our empirical evaluation reveals that the hypothesis testing inferences derived using the F* test exhibit better parity with the accuracy of prediction models than the other options. The generalizable prediction accuracy of prediction models should be of paramount importance for model-based survivorship prediction studies.
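For reference, the baseline that the abstract compares against can be sketched as the standard two-group log-rank test. This is the textbook statistic only; the F* test is not defined in the abstract, so no attempt is made to reproduce it here. The function name `logrank_test` and its argument layout are hypothetical.

```python
import math

import numpy as np


def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test.

    time_*:  follow-up times for each subject
    event_*: 1 if the event was observed, 0 if the subject was censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    times = np.concatenate([time_a, time_b]).astype(float)
    events = np.concatenate([event_a, event_b]).astype(bool)
    in_a = np.concatenate(
        [np.ones(len(time_a), dtype=bool), np.zeros(len(time_b), dtype=bool)]
    )

    observed_a = 0.0
    expected_a = 0.0
    variance = 0.0
    for t in np.unique(times[events]):       # distinct event times
        at_risk = times >= t
        n = int(at_risk.sum())               # at risk in both groups
        n_a = int((at_risk & in_a).sum())    # at risk in group A
        died = (times == t) & events
        d = int(died.sum())                  # events at time t
        d_a = int((died & in_a).sum())       # events in group A at time t
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a under the null hypothesis
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return float(chi2), float(p_value)
```

Note that the statistic depends only on the observed survival of the predicted groups, not on how often the model assigned subjects to the wrong group, which is the insensitivity to prediction error that the abstract raises.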