The Broad Institute of MIT and Harvard

Cambridge, MA, United States

The Eli and Edythe L. Broad Institute of MIT and Harvard, often referred to as the Broad Institute, is a biomedical and genomic research center located in Cambridge, Massachusetts, United States. The institute is independently governed and supported as a 501(c)(3) nonprofit research organization under the name Broad Institute Inc., and partners with the Massachusetts Institute of Technology, Harvard University, and the five Harvard teaching hospitals. Wikipedia.

News Article | May 25, 2017

The Broad Institute of MIT and Harvard will release version 4 of the industry-leading Genome Analysis Toolkit under an open source software license. The software package, designated GATK4, contains new tools and rebuilt architecture. It is available currently as an alpha preview on the Broad Institute’s GATK website, with a beta release expected in mid-June. Broad engineers announced the upgrade, as well as the decision to release the tool as an open source product, at Bio-IT World today.

The new version is built on a new architecture, allowing significant streamlining of individual tools and support for performance-enhancing technologies such as Apache Spark. This new framework brings improvements to parallelization, capitalizing on cloud deployment and making the process of analyzing vast amounts of genomic data easier, faster, and more efficient.

“We wanted to remove traditional barriers of scale while offering the same high level of data quality our users expect,” said Eric Banks, Senior Director of Data Sciences and Data Engineering at Broad and a creator of the original GATK software package. “Thanks to the rapid adoption of cloud computing, researchers can finally do away with many of the infrastructure-related complications that have hampered progress, especially at smaller institutions and startups.”

Today, more than 45,000 academic and commercial users worldwide rely on the GATK, running millions of analyses. The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. In addition to improving the performance of these established tools, GATK4 extends this scope of analysis to include copy number and structural variation, for both germline and somatic research applications.

GATK4 will be released as a fully open source product, thanks in part to a collaboration between Broad Institute and Intel Corporation to advance high-performance analytics so researchers can study massive amounts of genomic data from diverse sources worldwide. At the Intel-Broad Center for Genomic Data Engineering, software engineers and researchers have spent the last several months building, optimizing, and widely sharing new tools and infrastructure to help scientists integrate and process genomic data. GATK4 has benefited from this collaboration, which has helped engineers optimize best practices in hardware and software for genome analytics to make it possible to combine and use research data sets that reside on private, public, and hybrid clouds.

“Releasing GATK4 as open source was the obvious next step for our team,” said Geraldine Van der Auwera, Associate Director of Outreach and Communications within the Data Science and Data Engineering group at the Broad Institute. “We believe it’s the most effective way to support the community, and we hope it continues to grow, innovate, and help researchers make insights that are essential for future human health breakthroughs.”

“It is critical for progress in biomedicine that the software we use for analysing the genomes of millions of people is robust and well understood,” said Ewan Birney, Director of EMBL-EBI and Chair of the Global Alliance for Genomics and Health (GA4GH). “Releasing GATK software with an open source license directly supports open innovation, data re-use and data re-analysis in the global biomedical community.”

“The GATK tools are crucial for both germline and cancer analyses,” said Robert L. Grossman of the University of Chicago Department of Medicine and an expert in biomedical informatics. “Releasing GATK4 as an open source software package will increase adoption, and benefit the community.”

“Open sourcing the GATK is a big deal for open genomics, and for open science in general,” said Jeremy Freeman, manager of computational biology at the Chan Zuckerberg Initiative (CZI). “Not only does it make this critical tool available to as broad as possible an audience for use, reuse, inspection, and contribution -- it provides a powerful example to the community for how an existing project can embrace open source.”

“Open source code is a foundation of efficient biomedical research,” said Brad Chapman, a research scientist at the Harvard T.H. Chan School of Public Health. “It enables reproducibility, reuse and remixing by removing barriers for sharing and distributing analyses. The Broad Institute’s GATK team leads in the development of scalable, sensitive and specific variant calling algorithms, and open sourcing GATK4 will allow frameworks like Blue Collar Bioinformatics to make these methods broadly available to the scientific research community.”

“Cloudera has always been a supporter and believer in the power of open source code,” said Tom White, data scientist at Cloudera and a member of the Apache Hadoop PMC. “We’ve been excited to contribute to the GATK codebase, to make it run smoothly on Apache Spark™ and Cloudera. This next phase of the GATK, powered by Spark and open source software, will expand access and improve collaboration among genomic data scientists.”

“The open sourcing of GATK4 is a great step for genomics, allowing for scalability and performance gains to be openly available to the research, biotech and pharmaceutical communities,” said Jason Waxman, corporate vice president and general manager of Data Center Solutions at Intel. “GATK4, when run on Intel’s new reference architecture, can achieve a 5X speed-up compared to earlier versions of the software.”

“We at Google are excited to see this new release,” said Ilia Tulchinsky, Google Cloud Healthcare Engineering Lead. “We’ve been collaborating with the Broad Institute for the past three years to enhance genomic processing on Google Cloud Platform. As a strong supporter for open source technology, we believe that making GATK available this way will facilitate its use by genomic scientists everywhere. As fellow collaborators with Intel, we particularly look forward to enabling researchers to run GATK4 on Google Cloud using the upcoming Intel Xeon processor Scalable family.”

“The GATK is one of the most widely-utilized software packages in the life sciences, and our team has worked very productively with Broad to accelerate it for use on Azure,” said Geralyn Miller, Director, AI & Research, Microsoft. “This new model will greatly facilitate this effort going forward, and we are excited to continue and expand our efforts around GATK on Azure.”

“With the open source launch of GATK4, there is an opportunity to create a global community that can collaborate together and advance the state of art in bioinformatics,” said Hong Tang, chief architect at Alibaba Cloud, the cloud computing arm of Alibaba Group. “We look forward to closely working with Broad Institute in bringing the cloud-based GATK service to genomics customers in China, as well as in ongoing GATK research and development.”

In addition to offering GATK4 as an open source toolkit, Broad Institute will continue to offer user support, training, and outreach on its popular user support forum. GATK4, like many of the Broad Institute’s genome analysis tools, will be available through the Broad Institute’s cloud based analysis platform, FireCloud.

Hotamisligil G.S.,The Broad Institute of MIT and Harvard
Nature | Year: 2017

Proper regulation and management of energy, substrate diversity and quantity, as well as macromolecular synthesis and breakdown processes, are fundamental to cellular and organismal survival and are paramount to health. Cellular and multicellular organization are defended by the immune response, a robust and critical system through which self is distinguished from non-self, pathogenic signals are recognized and eliminated, and tissue homeostasis is safeguarded. Many layers of evolutionarily conserved interactions occur between immune response and metabolism. Proper maintenance of this delicate balance is crucial for health and has important implications for many pathological states such as obesity, diabetes, and other chronic non-communicable diseases. © 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

Agency: European Commission | Branch: H2020 | Program: CSA | Phase: INT-01-2015 | Award Amount: 2.42M | Year: 2016

The EU and the USA have highly productive, immensely innovative and excellence-driven research and innovation systems. Acknowledging the particular strengths of each landscape, a balanced transatlantic STI partnership of equals bears great potential and contributes to the ultimate goal of tackling societal challenges and boosting economic competitiveness. International cooperation between power nodes results in a constant knowledge exchange and more efficient use of STI investment. BILAT USA 4.0's targeted transatlantic activities work towards:
i) Strategic priority setting for EU-US cooperation through identifying emerging STI fields with a high benefit and added value from cooperation, thus providing evidence-based input for policy decision-making.
ii) Stronger interaction between EU and US researchers through thematic events promoting funding opportunities on both sides, thus strengthening the quality and quantity of partnerships between STI actors in EU MS/AC and the USA.
iii) Establishing optimal framework conditions through proposing concrete solutions for eliminating cooperation obstacles derived from researchers' and innovators' feedback, thus creating an environment that favors joint solutions for global challenges.
iv) Enhanced coordination and synergies between different policies through analyzing EU, MS/AC and US programmes and detecting duplications, thus contributing to greater coherence, joint ownership and resource efficiency.
v) Ensuring close synergies with calls launched in H2020 and their international dimension through screening of US-targeted actions in H2020 and liaising with relevant (ERA) projects to guarantee a consistent information exchange.
To ensure sustainability, project activities build on former initiatives and liaise with existing ones. Relevance and exploitation of project actions will be assured by close coordination with the EC. The project will pursue a targeted communication connecting the diverse range of EU-US STI stakeholders.

Blainey P.C.,The Broad Institute of MIT and Harvard | Blainey P.C.,Massachusetts Institute of Technology
FEMS Microbiology Reviews | Year: 2013

Interest in the expanding catalog of uncultivated microorganisms, increasing recognition of heterogeneity among seemingly similar cells, and technological advances in whole-genome amplification and single-cell manipulation are driving considerable progress in single-cell genomics. Here, the spectrum of applications for single-cell genomics, key advances in the development of the field, and emerging methodology for single-cell genome sequencing are reviewed by example with attention to the diversity of approaches and their unique characteristics. Experimental strategies transcending specific methodologies are identified and organized as a road map for future studies in single-cell genomics of environmental microorganisms. Over the next decade, increasingly powerful tools for single-cell genome sequencing and analysis will play key roles in accessing the genomes of uncultivated organisms, determining the basis of microbial community functions, and fundamental aspects of microbial population biology. © 2013 Federation of European Microbiological Societies.

Wagner J.C.,The Broad Institute of MIT and Harvard
Nature Methods | Year: 2014

Malaria is a major cause of global morbidity and mortality, and new strategies for treating and preventing this disease are needed. Here we show that the Streptococcus pyogenes Cas9 DNA endonuclease and single guide RNAs (sgRNAs) produced using T7 RNA polymerase (T7 RNAP) efficiently edit the Plasmodium falciparum genome. Targeting the genes encoding native knob-associated histidine-rich protein (kahrp) and erythrocyte binding antigen 175 (eba-175), we achieved high (≥ 50-100%) gene disruption frequencies within the usual time frame for generating transgenic parasites.

The inability to quantify large numbers of proteins in tissues and biofluids with high precision, sensitivity, and throughput is a major bottleneck in biomarker studies. We previously demonstrated that coupling immunoaffinity enrichment using anti-peptide antibodies (SISCAPA) to multiple reaction monitoring mass spectrometry (MRM-MS) produces Immunoprecipitation MRM-MS (immuno-MRM-MS) assays that can be multiplexed to quantify proteins in plasma with high sensitivity, specificity, and precision. Here we report the first systematic evaluation of the interlaboratory performance of multiplexed (8-plex) immuno-MRM-MS in three independent labs. A staged study was carried out in which the effect of each processing and analysis step on assay coefficient of variance, limit of detection, limit of quantification, and recovery was evaluated. Limits of detection were at or below 1 ng/ml for the assayed proteins in 30 μl of plasma. Assay reproducibility was acceptable for verification studies, with median intra- and interlaboratory coefficients of variance above the limit of quantification of 11% and <14%, respectively, for the entire immuno-MRM-MS assay process, including enzymatic digestion of plasma. Trypsin digestion and its requisite sample handling contributed the most to assay variability and reduced the recovery of target peptides from digested proteins. Using a stable isotope-labeled protein as an internal standard instead of stable isotope-labeled peptides to account for losses in the digestion process nearly doubled assay accuracy for this while improving assay precision 5%. Our results demonstrate that multiplexed immuno-MRM-MS can be made reproducible across independent laboratories and has the potential to be adopted widely for assaying proteins in matrices as complex as plasma.

Hyman S.E.,The Broad Institute of MIT and Harvard
Neuropsychopharmacology | Year: 2014

Despite high prevalence and enormous unmet medical need, the pharmaceutical industry has recently de-emphasized neuropsychiatric disorders as 'too difficult' a challenge to warrant major investment. Here I describe major obstacles to drug discovery and development including a lack of new molecular targets, shortcomings of current animal models, and the lack of biomarkers for clinical trials. My major focus, however, is on new technologies and scientific approaches to neuropsychiatric disorders that give promise for revitalizing therapeutics and may thus answer industry's concerns. © 2014 American College of Neuropsychopharmacology.

Hotamisligil G.S.,The Broad Institute of MIT and Harvard
Cell | Year: 2010

The endoplasmic reticulum (ER) is the major site in the cell for protein folding and trafficking and is central to many cellular functions. Failure of the ER's adaptive capacity results in activation of the unfolded protein response (UPR), which intersects with many different inflammatory and stress signaling pathways. These pathways are also critical in chronic metabolic diseases such as obesity, insulin resistance, and type 2 diabetes. The ER and related signaling networks are emerging as a potential site for the intersection of inflammation and metabolic disease. © 2010 Elsevier Inc.

Hyman S.E.,The Broad Institute of MIT and Harvard
Cell | Year: 2014

In the face of growing controversy about the utility of genetic mouse models of human disease, Rothwell et al. report on a shared mechanism by which two different neuroligin-3 mutations, associated with autism spectrum disorders in humans, produce an enhancement in motor learning. The open question is how much we can learn about human ills from such models. © 2014 Elsevier Inc.

Li H.,The Broad Institute of MIT and Harvard
Bioinformatics | Year: 2011

Summary: Tabix is the first generic tool that indexes position sorted files in TAB-delimited formats such as GFF, BED, PSL, SAM and SQL export, and quickly retrieves features overlapping specified regions. Tabix features include few seek function calls per query, data compression with gzip compatibility and direct FTP/HTTP access. Tabix is implemented as a free command-line tool as well as a library in C, Java, Perl and Python. It is particularly useful for manually examining local genomic features on the command line and enables genome viewers to support huge data files and remote custom tracks over networks. © The Author 2011. Published by Oxford University Press. All rights reserved.
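
The core idea in the summary above, retrieving features that overlap a region from position-sorted records, can be illustrated with a small in-memory sketch. This is a toy under stated assumptions, not Tabix's actual index or file format: the function names `build_index` and `query`, the record layout `(chrom, start, end, name)`, and the BED-style half-open coordinates are all hypothetical choices made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(features):
    """Group (chrom, start, end, name) records by chromosome, sorted by start.

    Hypothetical helper for illustration; coordinates are half-open [start, end).
    """
    index = defaultdict(lambda: {"starts": [], "records": [], "max_len": 0})
    for chrom, start, end, name in sorted(features):
        entry = index[chrom]
        entry["starts"].append(start)
        entry["records"].append((start, end, name))
        # Track the longest feature so queries can bound how far back to look.
        entry["max_len"] = max(entry["max_len"], end - start)
    return dict(index)


def query(index, chrom, q_start, q_end):
    """Return records on `chrom` overlapping the half-open region [q_start, q_end)."""
    entry = index.get(chrom)
    if entry is None:
        return []
    starts = entry["starts"]
    # A feature starting at or before q_start - max_len must end at or before q_start.
    lo = bisect_right(starts, q_start - entry["max_len"])
    # A feature starting at or after q_end cannot overlap the region.
    hi = bisect_left(starts, q_end)
    return [rec for rec in entry["records"][lo:hi] if rec[1] > q_start]


features = [
    ("chr1", 100, 200, "geneA"),
    ("chr1", 150, 400, "geneB"),
    ("chr1", 500, 600, "geneC"),
    ("chr2", 50, 80, "geneD"),
]
index = build_index(features)
hits = query(index, "chr1", 200, 500)
```

Because the records are sorted by start position, two binary searches narrow the candidates before any record is inspected, which is the property that makes position-sorted files cheap to query by region.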
