Chhangawala S.,New York Medical College |
Rudy G.,Golden Helix Inc. |
Mason C.E.,New York Medical College |
Mason C.E.,Rutgers Cancer Institute of New Jersey |
And 2 more authors.
The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. As read lengths have steadily increased, it has been widely assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR. Results: We found that, with the exception of 25 bp reads, read length makes little difference to the detection of differential expression. Once single-end reads reach a length of 50 bp, the results do not change substantially for any configuration up to, and including, 100 bp paired-end. However, splice junction detection improves significantly as the read length increases, with 100 bp paired-end reads showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results, confirming that our conclusions have broad application. Conclusions: A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of using longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be chosen based on the final goal of the study. © 2015 Chhangawala et al.
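The read-length simulation described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`parse_fastq`, `trim_record`, `simulate_reads`) are hypothetical, trimming is assumed to keep the first bases of each read (removing bases from the 3' end), and "single-end" is assumed to mean keeping only the first mate of each pair.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record reduced to (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from standard 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def trim_record(record: FastqRecord, length: int) -> FastqRecord:
    """Keep the first `length` bases and their quality scores (assumed 3' trim)."""
    header, seq, qual = record
    return header, seq[:length], qual[:length]


def simulate_reads(
    mate1: Iterable[FastqRecord],
    mate2: Iterable[FastqRecord],
    length: int,
    paired: bool,
) -> Tuple[List[FastqRecord], List[FastqRecord]]:
    """Trim reads to `length`; drop the second mate when simulating single-end data."""
    trimmed1 = [trim_record(r, length) for r in mate1]
    if not paired:
        return trimmed1, []  # assumption: single-end keeps mate 1 only
    trimmed2 = [trim_record(r, length) for r in mate2]
    return trimmed1, trimmed2


if __name__ == "__main__":
    # Toy 10 bp pair standing in for a real 101 bp paired-end read.
    r1 = [("@read1/1", "ACGTACGTAC", "IIIIIIIIII")]
    r2 = [("@read1/2", "TTGGCCAATT", "JJJJJJJJJJ")]
    for length in (5, 8):
        for paired in (False, True):
            print(length, paired, simulate_reads(r1, r2, length, paired))
```

Each (length, paired) combination produced this way would then be aligned and quantified separately, which is how the abstract frames its comparison across read lengths and paired status.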
Lambert C.G.,Golden Helix Inc. |
Black L.J.,Montana State University |
Black L.J.,Greer Black Company
Many public and private genome-wide association studies that we have analyzed include flaws in design, with avoidable confounding appearing as the norm rather than the exception. Rather than recognizing and addressing flawed research design, a category of quality-control statistical methods has arisen to treat only the symptoms. Reflecting more deeply, we examine elements of current genomic research in light of the traditional scientific method and find that hypotheses are often detached from data collection, experimental design, and causal theories. Association studies independent of causal theories, along with multiple testing errors, too often drive health care and public policy decisions. In an era of large-scale biological research, we ask questions about the role of statistical analyses in advancing coherent theories of diseases and their mechanisms. We advocate for reinterpretation of the scientific method in the context of large-scale data analysis opportunities and for renewed appreciation of falsifiable hypotheses, so that we can learn more from our best mistakes. © 2012 The Author.
Golden Helix Inc. | Date: 2013-04-09
Computer software for statistical analysis and visualization of data, and user manuals sold therewith.
Golden Helix Inc. | Date: 2012-06-09
Agency: Department of Health and Human Services | Branch: | Program: SBIR | Phase: Phase II | Award Amount: 750.00K | Year: 2003
DESCRIPTION (provided by applicant): The development of a software system is proposed that will combine statistical theory, computer science algorithms, and genetics expertise to take advantage of the great influx of data generated by the study of the human genome, clinical trials, and the creation of inexpensive genotyping techniques. This software will elucidate the complex relationships among drug efficacy and side effects, multiple interacting genes, and environmental factors. Our Phase I results show it is feasible to link phenotype to genotype for a list of "candidate" genes. A novel haplotype trend test has been developed to aid in finding associations across large SNP maps. Commercialization of this technique is essential for companies that intend to use large public or private SNP maps to locate genes that are associated with disease and with drug safety and efficacy. Our statistical methods are expected to be successful even if the disease mechanism differs from one person to another. By analyzing and interpreting clinical trial data, the software will match drugs to target populations according to their specific genotype. This will enable pharmaceutical companies to create novel drugs that deliver maximum effectiveness with minimum side effects, i.e., the right drug for the right person.