Stockholm Bioinformatics Center

Solna, Sweden


Alexeyenko A.,KTH Royal Institute of Technology | Alexeyenko A.,Science for Life Laboratory | Schmitt T.,Science for Life Laboratory | Schmitt T.,Stockholm Bioinformatics Center | And 15 more authors.
Nucleic Acids Research | Year: 2012

FunCoup is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0, in which the data sources have been updated and the methodology has been refined. It contains a new data type, Genetic Interaction, and three new species: chicken, dog and zebrafish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website. © The Author(s) 2011.

Dessimoz C.,Swiss Institute of Bioinformatics | Gabaldon T.,Center for Genomic Regulation CRG and UPF | Roos D.S.,University of Pennsylvania | Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Herrero J.,European Bioinformatics Institute
Bioinformatics | Year: 2012

The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications. © The Author(s) 2012. Published by Oxford University Press.

Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,Swedish eScience Research Center | Sonnhammer E.L.L.,University of Stockholm | Gabaldon T.,Center for Genomic Regulation | And 10 more authors.
Bioinformatics | Year: 2014

Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. © The Author 2014. Published by Oxford University Press.

Schmitt T.,Stockholm Bioinformatics Center | Schmitt T.,University of Stockholm | Ogris C.,Stockholm Bioinformatics Center | Ogris C.,University of Stockholm | And 3 more authors.
Nucleic Acids Research | Year: 2014

We present an update of the FunCoup database of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher-level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidence and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction. © 2013 The Author(s). Published by Oxford University Press.
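
The abstract above does not spell out how the evidence types are combined, so the following is only a minimal sketch of one common form of Bayesian integration (naive Bayes over likelihood ratios). The function name `integrate_evidence`, the prior and the example likelihood ratios are illustrative assumptions, not FunCoup's actual scoring scheme, and the sketch ignores the redundancy downweighting mentioned in the abstract.

```python
from math import exp, log


def integrate_evidence(likelihood_ratios, prior):
    """Combine independent evidence into a posterior coupling probability.

    likelihood_ratios: one value per evidence type, each being
        P(evidence | coupled) / P(evidence | not coupled)  (hypothetical inputs).
    prior: assumed prior probability that a random gene pair is coupled.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    # Naive Bayes assumption: evidence types are independent, so their
    # log-likelihood ratios simply add to the prior log-odds.
    log_odds = log(prior / (1.0 - prior)) + sum(log(lr) for lr in likelihood_ratios)
    # Convert posterior log-odds back to a probability.
    return 1.0 / (1.0 + exp(-log_odds))


if __name__ == "__main__":
    # Two supporting evidence types and an assumed 1% prior.
    print(round(integrate_evidence([10.0, 5.0], prior=0.01), 4))
```

With no evidence the function returns the prior unchanged; each likelihood ratio above 1 raises the posterior, and each ratio below 1 lowers it.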

Frings O.,Stockholm Bioinformatics Center | Frings O.,University of Stockholm | Augsten M.,Karolinska Institutet | Tobin N.P.,Karolinska Institutet | And 11 more authors.
American Journal of Pathology | Year: 2013

In this study, we describe a novel gene expression signature of platelet-derived growth factor (PDGF)-activated fibroblasts, which is able to identify breast cancers with a PDGF-stimulated fibroblast stroma and displays an independent and strong prognostic significance. Global gene expression was compared between PDGF-stimulated human fibroblasts and cultured resting fibroblasts. The most differentially expressed genes were reduced to a gene expression signature of 113 genes. The biological significance and prognostic capacity of this signature were investigated using four independent clinical breast cancer data sets. Concomitant high expression of PDGFβ receptor and its cognate ligands is associated with a high PDGF signature score. This supports the notion that the signature detects tumors with PDGF-activated stroma. Subsequent analyses indicated significant associations between high PDGF signature score and clinical characteristics, including human epidermal growth factor receptor 2 positivity, estrogen receptor negativity, high tumor grade, and large tumor size. A high PDGF signature score is associated with shorter survival in univariate analysis. Furthermore, the high PDGF signature score acts as a significant marker of poor prognosis in multivariate survival analyses, including classic prognostic markers, Ki-67 status, a proliferation gene signature, or other recently described stroma-derived gene expression signatures. Copyright © 2013 American Society for Investigative Pathology.

Ostlund G.,Stockholm Bioinformatics Center | Ostlund G.,University of Stockholm | Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,University of Stockholm | Sonnhammer E.L.L.,Swedish eScience Research Center
Gene | Year: 2012

mRNA expression is widely used as a proxy for protein expression. However, their true relation is not known and two genes with the same mRNA levels might have different abundances of respective proteins. A related question is whether the coexpression of mRNA for gene pairs is reflected by the corresponding protein pairs. We examined the mRNA-protein correlation for both expression and coexpression. This analysis yielded insights into the relationship between mRNA and protein abundance, and allowed us to identify subsets of greater mRNA-protein coherence. The correlation between mRNA and protein was low for both expression and coexpression, 0.12 and 0.06 respectively. However, applying the best-performing quality measure, high-quality subsets reached a Spearman correlation of 0.31 for expression, 0.34 for coexpression and 0.49 for coexpression when restricted to functionally coupled genes. Our methodology can thus identify subsets for which the mRNA levels are expected to be most strongly correlated with protein levels. © 2012 Elsevier B.V.
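
For readers unfamiliar with the statistic reported above, here is a minimal sketch of how a Spearman rank correlation between mRNA and protein levels can be computed. The function names and the toy abundance values are assumptions made for illustration; they are not data or code from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    mrna = [1.0, 2.0, 3.0, 4.0, 5.0]     # toy mRNA levels (hypothetical)
    protein = [2.0, 1.0, 4.0, 3.0, 5.0]  # toy protein levels (hypothetical)
    print(spearman(mrna, protein))
```

Because only ranks enter the calculation, the measure is insensitive to the different scales and nonlinear relationships typical of mRNA and protein abundance measurements.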

Ostlund G.,Stockholm Bioinformatics Center | Ostlund G.,University of Stockholm | Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,University of Stockholm | Sonnhammer E.L.L.,Swedish eScience Research Center
Genomics | Year: 2014

Differential gene expression analysis between healthy and diseased groups is a widely used approach to understand the molecular underpinnings of disease. A wide variety of experimental and bioinformatics techniques are available for this type of analysis, yet their impact on the reliability of the results has not been systematically studied. We performed a large-scale comparative analysis of clinical expression data, using several background corrections and differential expression metrics. The agreement between studies was analyzed for study pairs of the same cancer type, of different cancer types, and between cancer and non-cancer studies. We also replicated the analysis using differential coexpression. We found that agreement of differential expression is primarily dictated by the microarray platform, while differential coexpression requires large sample sizes. Two studies using different differential expression metrics may show no agreement, even if they agree strongly using the same metric. Our analysis provides practical recommendations for gene (co)expression analysis. © 2013 Elsevier Inc.

Forslund K.,Stockholm Bioinformatics Center | Forslund K.,University of Stockholm | Pekkari I.,Stockholm Bioinformatics Center | Pekkari I.,University of Stockholm | And 3 more authors.
BMC Bioinformatics | Year: 2011

Background: As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results: The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions: On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. © 2011 Forslund et al.; licensee BioMed Central Ltd.

Schmitt T.,Stockholm Bioinformatics Center | Messina D.N.,Stockholm Bioinformatics Center | Schreiber F.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,University of Stockholm
Briefings in Bioinformatics | Year: 2011

There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information. © The Author 2011. Published by Oxford University Press.

Forslund K.,Stockholm Bioinformatics Center | Schreiber F.,Stockholm Bioinformatics Center | Thanintorn N.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,University of Stockholm
Briefings in Bioinformatics | Year: 2011

Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups. © The Author 2011. Published by Oxford University Press.
