i2b2 National Center for Biomedical Computing

Boston, MA, United States


Jung J.-Y.,Harvard University | Kohane I.S.,Harvard University | Kohane I.S.,Children's Hospital | Kohane I.S.,i2b2 National Center for Biomedical Computing | And 2 more authors.
Translational Psychiatry | Year: 2011

The role of the immune system in neuropsychiatric diseases, including autism spectrum disorder (ASD), has long been hypothesized. This hypothesis has mainly been supported by family cohort studies and the immunological abnormalities found in ASD patients, but had limited findings in genetic association testing. Two cross-disorder genetic association tests were performed on the genome-wide data sets of ASD and six autoimmune disorders. In the polygenic score test, we examined whether ASD risk alleles with low effect sizes work collectively in specific autoimmune disorders and show significant association statistics. In the genetic variation score test, we tested whether allele-specific associations between ASD and autoimmune disorders can be found using nominally significant single-nucleotide polymorphisms. In both tests, we found that ASD is probabilistically linked to ankylosing spondylitis (AS) and multiple sclerosis (MS). Association coefficients showed that ASD and AS were positively associated, meaning that autism susceptibility alleles may have a similar collective effect in AS. The association coefficients were negative between ASD and MS. Significant associations between ASD and two autoimmune disorders were identified. This genetic association supports the idea that specific immunological abnormalities may underlie the etiology of autism, at least in a number of cases. © 2011 Macmillan Publishers Limited All rights reserved.


Jacobsen J.C.,Massachusetts General Hospital | Gregory G.C.,Massachusetts General Hospital | Woda J.M.,Massachusetts General Hospital | Woda J.M.,Athersys | And 11 more authors.
Human Molecular Genetics | Year: 2011

Huntington's disease is initiated by the expression of a CAG repeat-encoded polyglutamine region in full-length huntingtin, with dominant effects that vary continuously with CAG size. The mechanism could involve a simple gain of function or a more complex gain of function coupled to a loss of function (e.g. dominant negative-graded loss of function). To distinguish these alternatives, we compared genome-wide gene expression changes correlated with CAG size across an allelic series of heterozygous CAG knock-in mouse embryonic stem (ES) cell lines (HdhQ20/7, HdhQ50/7, HdhQ91/7, HdhQ111/7), to genes differentially expressed between Hdhex4/5/ex4/5 huntingtin null and wild-type (HdhQ7/7) parental ES cells. The set of 73 genes whose expression varied continuously with CAG length had minimal overlap with the 754-member huntingtin-null gene set but the two were not completely unconnected. Rather, the 172 CAG length-correlated pathways and 238 huntingtin-null significant pathways clustered into 13 shared categories at the network level. A closer examination of the energy metabolism and the lipid/sterol/lipoprotein metabolism categories revealed that CAG length-correlated genes and huntingtin-null-altered genes either were different members of the same pathways or were in unique, but interconnected pathways. Thus, varying the polyglutamine size in full-length huntingtin produced gene expression changes that were distinct from, but related to, the effects of lack of huntingtin. These findings support a simple gain-of-function mechanism acting through a property of the full-length huntingtin protein and point to CAG-correlative approaches to discover its effects. Moreover, for therapeutic strategies based on huntingtin suppression, our data highlight processes that may be more sensitive to the disease trigger than to decreased huntingtin levels. © The Author 2011. Published by Oxford University Press. All rights reserved.


Galkina E.I.,Massachusetts General Hospital | Shin A.,Massachusetts General Hospital | Coser K.R.,Massachusetts General Hospital | Shioda T.,Massachusetts General Hospital | And 8 more authors.
PLoS ONE | Year: 2014

Background: The length of the huntingtin (HTT) CAG repeat is strongly correlated with both age at onset of Huntington's disease (HD) symptoms and age at death of HD patients. Dichotomous analysis comparing HD to controls is widely used to study the effects of HTT CAG repeat expansion. However, a potentially more powerful approach is a continuous analysis strategy that takes advantage of all of the different CAG lengths, to capture effects that are expected to be critical to HD pathogenesis. Methodology/Principal Findings: We used continuous and dichotomous approaches to analyze microarray gene expression data from 107 human control and HD lymphoblastoid cell lines. Of all probes found to be significant in a continuous analysis by CAG length, only 21.4% were so identified by a dichotomous comparison of HD versus controls. Moreover, of probes significant by dichotomous analysis, only 33.2% were also significant in the continuous analysis. Simulations revealed that the dichotomous approach would require substantially more than 107 samples to either detect 80% of the CAG-length correlated changes revealed by continuous analysis or to reduce the rate of significant differences that are not CAG length-correlated to 20% (n = 133 or n = 206, respectively). Given the superior power of the continuous approach, we calculated the correlation structure between HTT CAG repeat lengths and gene expression levels and created a freely available searchable website, "HD CAGnome," that allows users to examine continuous relationships between HTT CAG and expression levels of ∼20,000 human genes. Conclusions/Significance: Our results reveal limitations of dichotomous approaches compared to the power of continuous analysis to study a disease where human genotype-phenotype relationships strongly support a role for a continuum of CAG length-dependent changes. The compendium of HTT CAG length-gene expression level relationships found at the HD CAGnome now provides convenient routes for discovery of candidates influenced by the HD mutation. © 2014 Galkina et al.
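
The contrast between the two analysis styles named in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function names, the made-up expression values, and the cutoff of 36 repeats used to split the samples are all assumptions introduced here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def dichotomous_difference(cag, expr, cutoff=36):
    """Mean expression of samples at or above the cutoff minus the rest.

    The cutoff is a hypothetical parameter chosen for this sketch.
    """
    expanded = [e for c, e in zip(cag, expr) if c >= cutoff]
    others = [e for c, e in zip(cag, expr) if c < cutoff]
    return sum(expanded) / len(expanded) - sum(others) / len(others)

# Made-up CAG lengths and expression levels for one hypothetical probe.
cag = [17, 20, 25, 42, 45, 60, 80]
expr = [1.0, 1.1, 1.3, 2.0, 2.1, 2.7, 3.5]

r = pearson_r(cag, expr)                   # continuous: uses every CAG length
diff = dichotomous_difference(cag, expr)   # dichotomous: two group means only
```

The continuous statistic uses the ordering of all repeat lengths, whereas the dichotomous one collapses them into two groups and so ignores variation within each group.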


Lee J.-M.,Massachusetts General Hospital | Galkina E.I.,Massachusetts General Hospital | Levantovsky R.M.,Massachusetts General Hospital | Fossale E.,Massachusetts General Hospital | And 15 more authors.
Human Molecular Genetics | Year: 2013

In Huntington's disease (HD), the size of the expanded HTT CAG repeat mutation is the primary driver of the processes that determine age at onset of motor symptoms. However, correlation of cellular biochemical parameters also extends across the normal repeat range, supporting the view that the CAG repeat represents a functional polymorphism with dominant effects determined by the longer allele. A central challenge to defining the functional consequences of this single polymorphism is the difficulty of distinguishing its subtle effects from the multitude of other sources of biological variation. We demonstrate that an analytical approach based upon continuous correlation with CAG size was able to capture the modest (~21%) contribution of the repeat to the variation in genome-wide gene expression in 107 lymphoblastoid cell lines, with alleles ranging from 15 to 92 CAGs. Furthermore, a mathematical model from an iterative strategy yielded predicted CAG repeat lengths that were significantly positively correlated with true CAG allele size and negatively correlated with age at onset of motor symptoms. Genes negatively correlated with repeat size were also enriched in a set of genes whose expression was CAG-correlated in human HD cerebellum. These findings both reveal the relatively small, but detectable impact of variation in the CAG allele in global data in these peripheral cells and provide a strategy for building multi-dimensional data-driven models of the biological network that drives the HD disease process by continuous analysis across allelic panels of neuronal cells vulnerable to the dominant effects of the HTT CAG repeat. © The Author 2013. Published by Oxford University Press. All rights reserved.


Kurreeman F.,Brigham and Women's Hospital | Kurreeman F.,Cambridge Broad Institute | Kurreeman F.,Leiden University | Liao K.,Brigham and Women's Hospital | And 25 more authors.
American Journal of Human Genetics | Year: 2011

Discovering and following up on genetic associations with complex phenotypes require large patient cohorts. This is particularly true for patient cohorts of diverse ancestry and clinically relevant subsets of disease. The ability to mine the electronic health records (EHRs) of patients followed as part of routine clinical care provides a potential opportunity to efficiently identify affected cases and unaffected controls for appropriate-sized genetic studies. Here, we demonstrate proof-of-concept that it is possible to use EHR data linked with biospecimens to establish a multi-ethnic case-control cohort for genetic research of a complex disease, rheumatoid arthritis (RA). In 1,515 EHR-derived RA cases and 1,480 controls matched for both genetic ancestry and disease-specific autoantibodies (anti-citrullinated protein antibodies [ACPA]), we demonstrate that the odds ratios and aggregate genetic risk score (GRS) of known RA risk alleles measured in individuals of European ancestry within our EHR cohort are nearly identical to those derived from a genome-wide association study (GWAS) of 5,539 autoantibody-positive RA cases and 20,169 controls. We extend this approach to other ethnic groups and identify a large overlap in the GRS among individuals of European, African, East Asian, and Hispanic ancestry. We also demonstrate that the distribution of a GRS based on 28 non-HLA risk alleles in ACPA+ cases partially overlaps with ACPA- subgroup of RA cases. Our study demonstrates that the genetic basis of rheumatoid arthritis risk is similar among cases of diverse ancestry divided into subsets based on ACPA status and emphasizes the utility of linking EHR clinical data with biospecimens for genetic studies. © 2011 The American Society of Human Genetics.
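
As a rough illustration of what an aggregate genetic risk score can look like, the sketch below sums risk-allele counts weighted by the natural log of each allele's odds ratio. The abstract does not state how its GRS was weighted, so the log-odds weighting, the function name, and the example numbers are assumptions made here.

```python
from math import log

def genetic_risk_score(allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant).

    Each count is weighted by log(odds ratio); this weighting is an
    assumption of the sketch, not a detail taken from the study.
    """
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is needed per variant")
    return sum(c * log(o) for c, o in zip(allele_counts, odds_ratios))

# Hypothetical individual carrying 2 and 1 risk alleles at two variants.
score = genetic_risk_score([2, 1], [1.2, 1.5])
```

Comparing the distribution of such scores between cohorts is one way to check whether known risk alleles behave similarly in each.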


Sinnott J.A.,Harvard University | Dai W.,Harvard University | Liao K.P.,Brigham and Women's Hospital | Shaw S.Y.,Massachusetts General Hospital | And 11 more authors.
Human Genetics | Year: 2014

To reduce costs and improve clinical relevance of genetic studies, there has been increasing interest in performing such studies in hospital-based cohorts by linking phenotypes extracted from electronic medical records (EMRs) to genotypes assessed in routinely collected medical samples. A fundamental difficulty in implementing such studies is extracting accurate information about disease outcomes and important clinical covariates from large numbers of EMRs. Recently, numerous algorithms have been developed to infer phenotypes by combining information from multiple structured and unstructured variables extracted from EMRs. Although these algorithms are quite accurate, they typically do not provide perfect classification due to the difficulty in inferring meaning from the text. Some algorithms can produce for each patient a probability that the patient is a disease case. This probability can be thresholded to define case–control status, and this estimated case–control status has been used to replicate known genetic associations in EMR-based studies. However, using the estimated disease status in place of true disease status results in outcome misclassification, which can diminish test power and bias odds ratio estimates. We propose to instead directly model the algorithm-derived probability of being a case. We demonstrate how our approach improves test power and effect estimation in simulation studies, and we describe its performance in a study of rheumatoid arthritis. Our work provides an easily implemented solution to a major practical challenge that arises in the use of EMR data, which can facilitate the use of EMR infrastructure for more powerful, cost-effective, and diverse genetic studies. © 2014, Springer-Verlag Berlin Heidelberg.
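
The difference between thresholding an algorithm-derived probability and keeping it as a continuous outcome can be shown with a toy comparison. This is not the model proposed in the paper, whose details the abstract does not give; the helper names, the 0.5 cutoff, and the data are assumptions for illustration.

```python
def thresholded_status(probs, cutoff=0.5):
    """Turn case probabilities into 0/1 case status (hypothetical cutoff)."""
    return [1 if p >= cutoff else 0 for p in probs]

def mean_by_genotype(values, genotypes):
    """Average an outcome within each genotype group."""
    groups = {}
    for v, g in zip(values, genotypes):
        groups.setdefault(g, []).append(v)
    return {g: sum(vs) / len(vs) for g, vs in groups.items()}

# Made-up case probabilities and carrier status (1 = carrier, 0 = not).
probs = [0.9, 0.55, 0.45, 0.1]
genotypes = [1, 1, 0, 0]

by_status = mean_by_genotype(thresholded_status(probs), genotypes)
by_prob = mean_by_genotype(probs, genotypes)
```

In this toy data the thresholded outcome treats a patient at 0.55 exactly like one at 0.9, while the probability-based summary retains how uncertain each classification is.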


Ananthakrishnan A.N.,Massachusetts General Hospital | Ananthakrishnan A.N.,Harvard University | Guzman-Perez R.,Partners Health Care System | Gainer V.,Partners Health Care System | And 7 more authors.
Alimentary Pharmacology and Therapeutics | Year: 2012

Background: The increasing incidence of Clostridium difficile (C. difficile) infection (CDI) among patients with inflammatory bowel disease is well recognised. However, most studies have focused on demonstrating that CDI is associated with adverse outcomes in IBD patients. Few have attempted to identify predictors of severe outcomes associated with CDI among IBD patients. Aim: To identify clinical and laboratory factors that predict severe outcomes associated with CDI in IBD patients. Methods: From a multi-institution EMR database, we identified all hospitalised patients with at least one diagnosis code for C. difficile from among those with a diagnosis of Crohn's disease or ulcerative colitis. Our primary outcome was time to total colectomy or death with follow-up censored at 180 days after CDI. Cox proportional hazards models were used to identify predictors of the primary outcome from among demographic, disease-related, laboratory and medication variables. Results: A total of 294 patients with CDI-IBD were included in our study. Of these, 58 patients (20%) met our primary outcome (45 deaths, 13 colectomies) at a median of 31 days. On multivariate analysis, serum albumin <3 g/dL (HR 5.75, 95% CI 1.34-24.56), haemoglobin below 9 g/dL (HR 5.29, 95% CI 1.58-17.69) and creatinine above 1.5 mg/dL (HR 1.98, 95% CI 1.04-3.79) were independent predictors of our primary outcome. Examining laboratory parameters as continuous variables or shortening our primary outcome to include events within 90 days yielded similar results. Conclusion: Serum albumin below 3 g/dL, haemoglobin below 9 g/dL and serum creatinine above 1.5 mg/dL were independent predictors of severe outcomes in hospitalised IBD patients with Clostridium difficile infection. © 2012 Blackwell Publishing Ltd.


Ananthakrishnan A.N.,Massachusetts General Hospital | Ananthakrishnan A.N.,Harvard University | Cagan A.,HealthCare Partners | Cai T.,Harvard University | And 16 more authors.
Inflammatory Bowel Diseases | Year: 2015

Background: The accuracy and utility of electronic health record (EHR)-derived phenotypes in replicating genotype-phenotype relationships have been infrequently examined. Low circulating Vitamin D levels are associated with severe outcomes in inflammatory bowel disease (IBD); however, the genetic basis for Vitamin D insufficiency in this population has not been examined previously. Methods: We compared the accuracy of physician-assigned phenotypes in a large prospective IBD registry to that identified by an EHR algorithm incorporating codified and structured data. Genotyping for IBD risk alleles was performed on the Immunochip and a genetic risk score calculated and compared between EHR-defined patients and those in the registry. Additionally, 4 Vitamin D risk alleles were genotyped and serum 25-hydroxy Vitamin D [25(OH)D] levels compared across genotypes. Results: A total of 1131 patients captured by our EHR algorithm were also included in our prospective registry (656 Crohn's disease, 475 ulcerative colitis). The overall genetic risk score for Crohn's disease (P = 0.13) and ulcerative colitis (P = 0.32) was similar between EHR-defined patients and a prospective registry. Three of the 4 Vitamin D risk alleles were associated with low Vitamin D levels in patients with IBD and contributed an additional 3% of the variance explained. Vitamin D genetic risk score did not predict normalization of Vitamin D levels. Conclusions: EHR cohorts form valuable data sources for examining genotype-phenotype relationships. Vitamin D risk alleles explain 3% of the variance in Vitamin D levels in patients with IBD. © 2015 Crohn's & Colitis Foundation of America, Inc.


Fossale E.,Massachusetts General Hospital | Seong I.S.,Massachusetts General Hospital | Coser K.R.,Massachusetts General Hospital | Shioda T.,Massachusetts General Hospital | And 6 more authors.
Human Molecular Genetics | Year: 2011

Huntington's disease (HD) involves marked early neurodegeneration in the striatum, whereas the cerebellum is relatively spared despite the ubiquitous expression of full-length mutant huntingtin, implying that inherent tissue-specific differences determine susceptibility to the HD CAG mutation. To understand this tissue specificity, we compared early mutant huntingtin-induced gene expression changes in striatum to those in cerebellum in young Hdh CAG knock-in mice, prior to onset of evident pathological alterations. Endogenous levels of full-length mutant huntingtin caused qualitatively similar, but quantitatively different gene expression changes in the two brain regions. Importantly, the quantitatively different responses in the striatum and cerebellum in mutant mice were well accounted for by the intrinsic molecular differences in gene expression between the striatum and cerebellum in wild-type animals. Tissue-specific gene expression changes in response to the HD mutation, therefore, appear to reflect the different inherent capacities of these tissues to buffer qualitatively similar effects of mutant huntingtin. These findings highlight a role for intrinsic quantitative tissue differences in contributing to HD pathogenesis, and likely to other neurodegenerative disorders exhibiting tissue-specificity, thereby guiding the search for effective therapeutic interventions. © The Author 2011. Published by Oxford University Press. All rights reserved.
