Martens L.,Vlaams Institute for Biotechnology |
Martens L.,Ghent University |
Chambers M.,Vanderbilt University |
Sturm M.,University of Tübingen |
And 16 more authors.
Molecular and Cellular Proteomics | Year: 2011
Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With the rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that will facilitate data sharing and analysis. Initially, the efforts to develop a standard format for mass spectrometry data resulted in multiple formats, each designed with a different underlying philosophy. To resolve the issues associated with having multiple formats, vendors, researchers, and software developers convened under the banner of the HUPO PSI to develop a single standard. The new data format incorporated many of the desirable technical attributes from the previous data formats, while adding a number of improvements, including features such as a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring data, and immediately available implementations to facilitate rapid adoption by the community. The resulting standard data format, mzML, is a well tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology. © 2011 by The American Society for Biochemistry and Molecular Biology, Inc.
Vaisar T.,University of Washington |
Mayer P.,University of Washington |
Nilsson E.,INSILICOS |
Zhao X.-Q.,University of Washington |
And 2 more authors.
Clinica Chimica Acta | Year: 2010
Background: Alterations in protein composition and oxidative damage of high density lipoprotein (HDL) have been proposed to impair the cardioprotective properties of HDL. We tested whether relative levels of proteins in HDL2 could be used as biomarkers for coronary artery disease (CAD). Methods: Twenty control and eighteen CAD subjects matched for HDL-cholesterol, age, and sex were studied. HDL2 isolated from plasma was digested with trypsin and analyzed by high-resolution matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) and pattern recognition analysis. Results: Partial least squares discriminant analysis (PLS-DA) of mass spectra clearly differentiated CAD from control subjects with area under the receiver operating characteristic curve (ROCAUC) of 0.94. Targeted tandem mass spectrometric analysis of the model's significant features revealed that HDL2 of CAD subjects contained oxidized methionine residues of apolipoprotein A-I and elevated levels of apolipoprotein C-III. A proteomic signature composed of MALDI-MS signals from apoA-I, apoC-III, Lp(a) and apoC-I accurately classified CAD and control subjects (ROCAUC=0.82). Conclusions: HDL2 of CAD subjects carries a distinct protein cargo, and protein oxidation helps generate dysfunctional HDL. Moreover, models based on selected identified peptides in MALDI-TOF mass spectra of HDL may have diagnostic potential. © 2010 Elsevier B.V.
Insilicos and University of Washington | Date: 2012-07-06
The invention provides methods of screening a mammalian subject to determine if the subject is at risk of developing, or is suffering from, cardiovascular disease. In one embodiment, the method comprises detecting a measurable feature of at least two biomarkers in an HDL subfraction, or in a complex containing apoA-I or apoA-II isolated from a biological sample obtained from the subject, wherein the at least two biomarkers are selected from the group consisting of apoA-I, apoA-II, apoB-100, Lp(a), apoC-I, and apoC-III, combinations or portions and/or derivatives thereof, and comparing the measurable features of the at least two biomarkers from the biological sample to a reference standard, wherein a difference in the measurable features of the at least two biomarkers from the biological sample and the reference standard is indicative of the presence or risk of cardiovascular disease in the subject.
Pratt B.,INSILICOS |
Howbert J.J.,INSILICOS |
Tasman N.I.,INSILICOS |
Bioinformatics | Year: 2012
MR-Tandem adapts the popular X!Tandem peptide search engine to work with Hadoop MapReduce for reliable parallel execution of large searches. MR-Tandem runs on any Hadoop cluster but offers special support for Amazon Web Services for creating inexpensive on-demand Hadoop clusters, enabling search volumes that might not otherwise be feasible with the compute resources a researcher has at hand. MR-Tandem is designed to drop in wherever X!Tandem is already in use and requires no modification to existing X!Tandem parameter files, and only minimal modification to X!Tandem-based workflows. © The Author 2011. Published by Oxford University Press. All rights reserved.
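The MapReduce pattern that the abstract refers to can be illustrated with a minimal, single-process sketch. This is not MR-Tandem's code and does not use Hadoop or X!Tandem's scoring: the function names, the integer peak masses, and the shared-peak score are all hypothetical, chosen only to show how per-spectrum scoring (map) and best-hit selection (reduce) separate cleanly.

```python
from collections import defaultdict

def map_phase(spectrum_id, peaks, peptides):
    """Emit (spectrum_id, (score, peptide)) for every candidate peptide."""
    for peptide, fragments in peptides.items():
        # Toy score (an assumption): number of peak masses shared with the peptide.
        score = len(set(peaks) & set(fragments))
        yield spectrum_id, (score, peptide)

def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(spectrum_id, scored):
    """Keep the highest-scoring candidate for one spectrum."""
    return spectrum_id, max(scored)

# Hypothetical inputs: integer fragment masses keep the matching exact.
peptides = {"PEPTIDE": [100, 200, 300], "PROTEIN": [100, 250, 350]}
spectra = {"s1": [100, 200, 300], "s2": [250, 350, 400]}

emitted = [pair
           for sid, peaks in spectra.items()
           for pair in map_phase(sid, peaks, peptides)]
best = dict(reduce_phase(sid, scored) for sid, scored in shuffle(emitted).items())
```

Because each spectrum is scored independently in the map step, the work can be spread over as many nodes as are available, which is the property a cluster-based search relies on.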
Agency: Department of Health and Human Services | Branch: | Program: SBIR | Phase: Phase I | Award Amount: 181.86K | Year: 2010
DESCRIPTION (provided by applicant): High-throughput Epistasis Screening using Genetical Genomics. A fast software tool is proposed for identifying potential sets of interacting genes involved in human disease pathways. A meta-analysis of marker and expression-trait studies is performed using penalized regression software running in parallel on commodity graphics cards. The research team includes experts from genomics, statistics and software acceleration. Data will come from published studies. Initial results suggest promise for our approach. Epistasis is a key area of investigation in the elucidation of human-disease pathways. eQTL experiments have shown promise in identifying epistasis for given expression traits. We will leverage the success of eQTLs by employing the results of GWAS experiments to suggest specific expression traits to study. In this way we will exploit the findings of multiple, disparate studies in an overall meta-analysis of a disease trait. Various forms of regression analysis are currently used to screen eQTL data for epistasis, especially stepwise linear regression. We will employ penalized regression techniques, because of their speed advantage, their ability to identify multiple candidates simultaneously and their relative novelty. We will apply several distinct types of penalized regression, each with its own predictor-selection characteristics. We have strong in-house expertise in penalized regression. As more and larger genomic data sets become available, effective means for combining and mining them become essential. The sheer mass of the data, moreover, will require high-performance software in order to provide analysis in reasonable time. Parallel computation is one promising area for improving software performance. We will employ the new generation of inexpensive, widely-available graphics coprocessors to run our software in parallel.
Successful application will demonstrate that relevant, large-data bioinformatics solutions can be implemented on modestly-priced desktop hardware. PUBLIC HEALTH RELEVANCE: Personalized medicine is based on the observation that susceptibility to disease has a strong genetic component. This genetic component consists of groups of highly interacting genes. We will develop high-speed software able to process the huge amounts of data needed to identify these interactions and the role they play in disease susceptibility.
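As an illustration of how penalized regression can select several interaction candidates at once, the sketch below fits a lasso (one common penalized method; the abstract does not say which types the project uses) by cyclic coordinate descent over pairwise marker products. Everything here is an assumption made for the example: the function names, the marker coding of -1/+1, the toy trait, and the penalty value. It runs serially on the CPU and does not attempt the GPU parallelism described above.

```python
import itertools

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(columns, y, penalty, sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n = len(y)
    coefs = [0.0] * len(columns)
    residual = list(y)  # equals y - X @ coefs, since coefs start at zero
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the residual, adding back its own fit.
            rho = sum(c * r for c, r in zip(col, residual)) / n + norm * coefs[j]
            new = soft_threshold(rho, penalty) / norm
            delta = new - coefs[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                coefs[j] = new
    return coefs

def interaction_columns(markers):
    """Build one product column per marker pair as a candidate epistatic term."""
    names, cols = [], []
    for (a, col_a), (b, col_b) in itertools.combinations(markers.items(), 2):
        names.append((a, b))
        cols.append([x * z for x, z in zip(col_a, col_b)])
    return names, cols

# Hypothetical toy data: three markers over all eight -1/+1 combinations,
# with a trait driven only by the m1 x m2 interaction.
rows = list(itertools.product([-1, 1], repeat=3))
markers = {f"m{i + 1}": [row[i] for row in rows] for i in range(3)}
trait = [2 * row[0] * row[1] for row in rows]

names, cols = interaction_columns(markers)
coefs = lasso_coordinate_descent(cols, trait, penalty=0.5)
selected = [name for name, c in zip(names, coefs) if c != 0.0]
```

The penalty drives the coefficients of uninformative pairs exactly to zero, so the surviving terms form the candidate list in a single fit rather than through stepwise addition.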
Agency: Department of Health and Human Services | Branch: | Program: SBIR | Phase: Phase I | Award Amount: 179.50K | Year: 2010
DESCRIPTION (provided by applicant): An enhancement to the popular peptide search engine X!Tandem is proposed, to allow it to run on new super-scalable MapReduce computer clusters. This will allow much faster and less-expensive operation, allowing proteomics researchers to routinely search for post-translational modifications (PTMs). Proteomics has led to many important advances in biological understanding. Yet, many valuable data sets are not searched for PTMs, simply because the computer power necessary to conduct the searches is not available. With this project, we plan to substantially reduce the computational cost of proteomics experiments, via a peptide search engine operating on highly-scalable computer clusters. The research team is well-qualified to undertake this research, having extensive, direct experience in all the scientific disciplines and specific software elements necessary. The research team includes experts in proteomics, mass spectrometry, peptide search, cloud computing, and MapReduce. PUBLIC HEALTH RELEVANCE: High-throughput analysis of post-translational modifications is increasingly pivotal for understanding the molecular function and dynamics of living cells. This proposal thus addresses key opportunities for applying proteomics to human health research.
Agency: Department of Health and Human Services | Branch: | Program: SBIR | Phase: Phase II | Award Amount: 751.57K | Year: 2010
DESCRIPTION (provided by applicant): The ability to conduct basic and applied biomedical research is becoming increasingly dependent on data produced by new and emerging technologies. This data has an unprecedented amount of detail and volume. Researchers are therefore dependent on computing and computational tools to be able to visualize, analyze, model, and interpret these large and complex sets of data. Tools for disease detection, diagnosis, treatment, and prevention are common goals of many, if not all, biomedical research programs. Sound analytical and statistical theory and methodology for class prediction and class discovery lay the foundation for building these tools, of which the machine learning techniques of classification (supervised learning) and clustering (unsupervised learning) are crucial. Our goal is to produce software for analysis and interpretation of large data sets using ensemble machine learning techniques and parallel computing technologies. Ensemble techniques are recent advances in machine learning theory and methodology leading to great improvements in accuracy and stability in data set analysis and interpretation. The results from a committee of primary machine learners (classifiers or clusterers) that have been trained on different instance or feature subsets are combined through techniques such as voting. The high prediction accuracy of classifier ensembles (such as boosting, bagging, and random forests) has generated much excitement in the statistics and machine learning communities. Recent research extends the ensemble methodology to clustering, where class information is unavailable, also yielding superior performance in terms of accuracy and stability. In theory, most ensemble techniques are inherently parallel. However, existing implementations are generally serial and assume the data set is memory resident. Therefore current software will not scale to the large data sets produced in today's biomedical research.
We propose to take two approaches to scale ensemble techniques to large data sets: data partitioning approaches and parallel computing. The focus of Phase I will be to prototype scalable classifier ensembles using parallel architectures. We intend to: establish the parallel computing infrastructures; produce a preliminary architecture and software design; investigate a wide range of ensemble generation schemes using data partitioning strategies; and implement scalable bagging and random forests based on the preliminary design. The focus of Phase II will be to complete the software architecture and implement the scalable classifier ensembles and scalable clusterer ensembles within this framework. We intend to: complete research and development of classifier ensembles; extend the classification framework to clusterer ensembles; research and develop a unified interface for building ensembles with differing generation mechanisms and combination strategies; and evaluate the effectiveness of the software on simulated and real data. PUBLIC HEALTH RELEVANCE: The common goals of many, if not all, biomedical research programs are the development of tools for disease detection, diagnosis, treatment, and prevention. These programs often rely on new types of data that have an unprecedented amount of detail and volume. Our goal is to produce software for the analysis and interpretation of large data sets using ensemble machine learning techniques and parallel computing technologies to enable researchers who are dependent on computational tools to have the ability to visualize, analyze, model, and interpret these large and complex sets of data.
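The combination of data partitioning and voting described above can be sketched in a few lines. This is a simplified illustration, not the proposed software: the one-feature threshold learner, the interleaved partitioning scheme, the toy data, and all function names are assumptions, and a thread pool stands in for the parallel infrastructure only to show that each partition is trained independently.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def train_stump(rows):
    """Fit a one-feature threshold classifier: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for candidate, _ in rows:
        errors = sum((x >= candidate) != bool(label) for x, label in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold

def partition(rows, parts):
    """Split rows into disjoint, interleaved subsets, one per worker."""
    return [rows[i::parts] for i in range(parts)]

def vote(thresholds, x):
    """Combine the committee's predictions by majority vote."""
    ballots = Counter(int(x >= t) for t in thresholds)
    return ballots.most_common(1)[0][0]

# Hypothetical data: a single feature x in 0..11, labeled 1 when x >= 5.
data = [(x, int(x >= 5)) for x in range(12)]

# Each learner sees only its own partition, so no worker needs the full data set.
with ThreadPoolExecutor(max_workers=3) as pool:
    thresholds = list(pool.map(train_stump, partition(data, 3)))

prediction_high = vote(thresholds, 6)
prediction_low = vote(thresholds, 4)
```

An odd number of partitions avoids tied votes in this two-class toy case. Individual learners disagree near the class boundary because each saw different samples, and the vote is what smooths that disagreement out.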
Agency: Department of Health and Human Services | Branch: | Program: SBIR | Phase: Phase II | Award Amount: 1.20M | Year: 2010
DESCRIPTION (provided by applicant): Heart disease is the leading cause of death in Western societies. The plasma level of high density lipoprotein cholesterol (HDL-C), the good form of cholesterol, is negatively correlated with the risk of heart disease, myocardial infarction, and coronary death. However, current composite risk score tests such as the Framingham tests identify only one-third of individuals at risk, and HDL-C contributes very little to these composite risk scores. Phase I studies demonstrated that the protein composition of HDL measured by MALDI-MS differs markedly between patients with established CAD and age- and sex-matched healthy subjects. Phase I also established that CAD subjects could be distinguished from age- and sex-matched healthy subjects with a high sensitivity and specificity using HDL protein signals. The goal of the proposed work is to demonstrate that the HDL protein signals can be used to improve the accuracy of composite MI risk scores. This goal will be met through experiments that measure banked samples and new samples collected from 400 subjects who will be monitored for adverse events during the period of the project. PUBLIC HEALTH RELEVANCE: Our overall hypothesis is that pattern recognition MS analysis of HDL will be a powerful tool for detecting people at risk for myocardial infarctions (MI). Each year, more than a million Americans have an MI, of whom half die. However, our ability to identify subjects at increased risk for these events is severely limited. Therefore, the availability of diagnostics that could accurately predict risk in time to ward off an MI would have an enormous impact on health care costs and public health.
Agency: Department of Health and Human Services | Branch: | Program: SBIR | Phase: Phase I | Award Amount: 222.75K | Year: 2012
DESCRIPTION (provided by applicant): This Small Business Innovation Research project addresses the problem of biomarker detection in clinical and high-throughput data. The objective is to investigate new approaches for determining, from data consisting of many possibly irrelevant or redundant measurements, a highly predictive and interpretable model that involves only a small number of measurements. These new methods will be studied for modeling subjects' time-to-event (such as stroke, heart attack, or metastasis in cancer). The proposed approaches will be compared with existing methods that attempt to use relatively few measurements in modeling survival (time-to-event) data. The data to be analyzed will include ion-mobility and clinical data from a large cardiovascular disease cohort, as well as high-throughput genomic data from cancer research with many more measurements than samples. Relevance. Although today's advanced technologies offer the possibility of revolutionizing clinical practice, the analytical tools available for extracting information from this amount of data are not yet sufficiently developed for targeted exploration of the underlying biology. This project directly addresses the need to make what the FDA terms IVDMIA (In Vitro Diagnostic Multivariate Index Assays) transparent and interpretable, and is thus an opportunity to improve analysis services or products provided to companies that identify, characterize, and validate biomarkers for clinical diagnostics and drug development decision points. The proposed project will produce robust methods for parsimonious biomarker detection that will speed the development of cheaper and more effective diagnostic tests for disease diagnosis, treatment monitoring, and therapeutic drug development. PUBLIC HEALTH RELEVANCE: There is a great need in medical research for prognostic models that can accurately predict time to an event, such as a heart attack, from a few observed features.
These models can be used in establishing new diagnostic and screening tests, and in advancing new therapies. New methods for time-to-event modeling are proposed that will speed the development of cheaper and more effective clinical support systems, and have a far-reaching impact on public health.
Agency: Department of Health and Human Services | Branch: | Program: SBIR | Phase: Phase I | Award Amount: 203.60K | Year: 2011
DESCRIPTION (provided by applicant): The use of Graphics Processing Unit (GPU) devices is proposed to increase the speed of peptide search by as much as an order of magnitude. Peptide search engines perform the most computation-intensive task in shotgun proteomics. This proposal is to speed up peptide search using the increasingly popular approach of general-purpose computation on highly-parallel GPU devices. Computation using GPUs to gain desktop supercomputer performance at low cost is an active and fruitful area of research. This project will obtain these benefits for peptide search. As a result, researchers will be able to take experiments that today must be analyzed on a computer cluster, and instead analyze them on their desktop computer. PUBLIC HEALTH RELEVANCE: An enhancement to the popular peptide search engine X!Tandem is proposed, to allow it to run in parallel on graphics processing unit (GPU) cards with dramatically faster performance. This will allow much faster and less-expensive operation, allowing proteomics researchers to analyze large proteomics experiments or search for post-translational modifications (PTMs) using their existing desktop computer. Proteomics has led to many important advances in biological understanding. Yet, many valuable data sets are not searched for PTMs, simply because the computer power necessary to conduct the searches is not available. With this project, we plan to substantially reduce the computational cost of proteomics experiments, via a peptide search engine making use of inexpensive, highly parallel GPU hardware. Thus, this project, if successful, will allow proteomics to be more successfully exploited for public health. The research team is well-qualified to undertake this research, having extensive, direct experience in all the scientific disciplines and specific software elements necessary. The research team includes experts in proteomics, mass spectrometry, peptide search, and GPU computing.