Swedish eScience Research Center

Swedish, Sweden


Humphreys K.,Karolinska Institutet | Grankvist A.,Karolinska Institutet | Leu M.,Karolinska Institutet | Hall P.,Karolinska Institutet | And 27 more authors.
PLoS ONE | Year: 2011

Patterns of genetic diversity have previously been shown to mirror geography on a global scale and within continents and individual countries. Using genome-wide SNP data on 5174 Swedes with extensive geographical coverage, we analyzed the genetic structure of the Swedish population. We observed strong differences between the far northern counties and the remaining counties. The population of Dalarna county, in north middle Sweden, which borders southern Norway, also appears to differ markedly from other counties, possibly due to this county having more individuals with remote Finnish or Norwegian ancestry than other counties. An analysis of genetic differentiation (based on pairwise F_ST) indicated that the populations of Sweden's southernmost counties are genetically closer to the HapMap CEU samples of Northern European ancestry than to the populations of Sweden's northernmost counties. In a comparison of extended homozygous segments, we detected a clear divide between southern and northern Sweden, with small differences between the southern counties and considerably more segments in northern Sweden. Both the increased degree of homozygosity in the north and the large genetic differences between the south and the north may have arisen due to a small population in the north and the vast geographical distances between towns and villages in the north, in contrast to the more densely settled southern parts of Sweden. Our findings have implications for future genome-wide association studies (GWAS) with respect to the matching of cases and controls and the need for within-county matching. We have shown that genetic differences within a single country may be substantial, even when viewed on a European scale. Thus, population stratification needs to be accounted for, even within a country like Sweden, which is often perceived to be relatively homogeneous and a favourable resource for genetic mapping; otherwise, inferences based on genetic data may lead to false conclusions.
© 2011 Humphreys et al.
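
To make the pairwise F_ST comparison in the abstract above more concrete, the following is a minimal sketch of Wright's textbook definition, F_ST = (H_T - H_S) / H_T, computed from per-SNP allele frequencies in two populations. The abstract does not say which F_ST estimator the authors used, so the formula choice, the equal weighting of the two populations, and the function name `fst_two_populations` are all assumptions made for illustration.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Wright's F_ST from per-SNP allele frequencies in two populations.

    Assumption: biallelic SNPs, both populations weighted equally, and
    heterozygosities summed over SNPs before taking the ratio. This is a
    textbook definition, not necessarily the estimator used in the study.
    """
    h_total = 0.0   # expected heterozygosity of the pooled population (H_T)
    h_within = 0.0  # mean expected heterozygosity within populations (H_S)
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total += 2 * p_bar * (1 - p_bar)
        h_within += (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
    if h_total == 0:
        raise ValueError("all SNPs are monomorphic across both populations")
    return (h_total - h_within) / h_total
```

Applied to every pair of counties, a function like this would yield the kind of pairwise differentiation matrix the abstract refers to; identical allele frequencies give a value of 0, and fixed opposite alleles give 1.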

McCormack T.,Stockholm Bioinformatics Center | McCormack T.,University of Stockholm | Frings O.,Stockholm Bioinformatics Center | Frings O.,University of Stockholm | And 5 more authors.
PLoS ONE | Year: 2013

Motivation: Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Results: Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). Availability and Implementation: CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions. © 2013 McCormack et al.
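
The z-score idea described in the abstract above can be sketched as follows: count the links between two gene groups in the real network, then compare that count with the counts obtained from randomized networks. This is only an illustration and not the CrossTalkZ implementation (which is in C++); the double-edge swap used here preserves node degrees only and may differ from the randomization methods evaluated in the paper. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_crosstalk(edges, group_a, group_b):
    """Count edges with one endpoint in group_a and the other in group_b."""
    return sum(
        1 for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )


def rewire(edges, n_swaps, rng):
    """Randomize a simple undirected network with double-edge swaps.

    Each accepted swap turns (a, b), (c, d) into (a, d), (c, b), so every
    node keeps its degree. Assumption: degree preservation only.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop or change nothing
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def crosstalk_z(edges, group_a, group_b, n_random=200, seed=0):
    """Return (observed, null mean, z-score) for links between two groups."""
    rng = random.Random(seed)
    observed = count_crosstalk(edges, group_a, group_b)
    null = [
        count_crosstalk(rewire(edges, 10 * len(edges), rng), group_a, group_b)
        for _ in range(n_random)
    ]
    sd = pstdev(null)
    z = float("nan") if sd == 0 else (observed - mean(null)) / sd
    return observed, mean(null), z
```

A large positive z-score would suggest more links between the two groups than expected by chance under this null model. The sketch assumes the two groups are disjoint and the input network has no duplicate edges.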

Frings O.,Stockholm Bioinformatics Center | Frings O.,University of Stockholm | Augsten M.,Karolinska Institutet | Tobin N.P.,Karolinska Institutet | And 11 more authors.
American Journal of Pathology | Year: 2013

In this study, we describe a novel gene expression signature of platelet-derived growth factor (PDGF)-activated fibroblasts, which is able to identify breast cancers with a PDGF-stimulated fibroblast stroma and displays an independent and strong prognostic significance. Global gene expression was compared between PDGF-stimulated human fibroblasts and cultured resting fibroblasts. The most differentially expressed genes were reduced to a gene expression signature of 113 genes. The biological significance and prognostic capacity of this signature were investigated using four independent clinical breast cancer data sets. Concomitant high expression of PDGFβ receptor and its cognate ligands is associated with a high PDGF signature score. This supports the notion that the signature detects tumors with PDGF-activated stroma. Subsequent analyses indicated significant associations between high PDGF signature score and clinical characteristics, including human epidermal growth factor receptor 2 positivity, estrogen receptor negativity, high tumor grade, and large tumor size. A high PDGF signature score is associated with shorter survival in univariate analysis. Furthermore, the high PDGF signature score acts as a significant marker of poor prognosis in multivariate survival analyses, including classic prognostic markers, Ki-67 status, a proliferation gene signature, or other recently described stroma-derived gene expression signatures. Copyright © 2013 American Society for Investigative Pathology.

Ostlund G.,Stockholm Bioinformatics Center | Ostlund G.,University of Stockholm | Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,University of Stockholm | Sonnhammer E.L.L.,Swedish eScience Research Center
Gene | Year: 2012

mRNA expression is widely used as a proxy for protein expression. However, their true relation is not known, and two genes with the same mRNA levels might have different abundances of their respective proteins. A related question is whether the coexpression of mRNA for gene pairs is reflected by the corresponding protein pairs. We examined the mRNA-protein correlation for both expression and coexpression. This analysis yielded insights into the relationship between mRNA and protein abundance, and allowed us to identify subsets of greater mRNA-protein coherence. The correlation between mRNA and protein was low for both expression and coexpression, 0.12 and 0.06, respectively. However, applying the best-performing quality measure, high-quality subsets reached a Spearman correlation of 0.31 for expression, 0.34 for coexpression and 0.49 for coexpression when restricted to functionally coupled genes. Our methodology can thus identify subsets for which the mRNA levels are expected to be most strongly correlated with protein levels. © 2012 Elsevier B.V.
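
The Spearman correlation reported in the abstract above is a Pearson correlation computed on ranks. The sketch below shows that calculation in plain Python for paired mRNA and protein measurements; the function names are hypothetical, and nothing here reproduces the authors' data or their quality measures.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Calling `spearman(mrna_levels, protein_levels)` on two equal-length lists, one value per gene, gives the expression correlation; for coexpression, the same function would be applied to paired coexpression scores per gene pair. Both inputs must contain some variation, otherwise the denominator is zero.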

Ostlund G.,Stockholm Bioinformatics Center | Ostlund G.,University of Stockholm | Sonnhammer E.L.L.,Stockholm Bioinformatics Center | Sonnhammer E.L.L.,University of Stockholm | Sonnhammer E.L.L.,Swedish eScience Research Center
Genomics | Year: 2014

Differential gene expression analysis between healthy and diseased groups is a widely used approach to understand the molecular underpinnings of disease. A wide variety of experimental and bioinformatics techniques are available for this type of analysis, yet their impact on the reliability of the results has not been systematically studied. We performed a large-scale comparative analysis of clinical expression data, using several background corrections and differential expression metrics. The agreement between studies was analyzed for study pairs of the same cancer type, of different cancer types, and between cancer and non-cancer studies. We also replicated the analysis using differential coexpression. We found that agreement of differential expression is primarily dictated by the microarray platform, while differential coexpression requires large sample sizes. Two studies using different differential expression metrics may show no agreement, even if they agree strongly using the same metric. Our analysis provides practical recommendations for gene (co)expression analysis. © 2013 Elsevier Inc.

Forslund K.,Stockholm Bioinformatics Center | Forslund K.,University of Stockholm | Pekkari I.,Stockholm Bioinformatics Center | Pekkari I.,University of Stockholm | And 3 more authors.
BMC Bioinformatics | Year: 2011

Background: As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of domain architecture changes between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results: For homologs that have diverged beyond a certain threshold, the analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions: On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. © 2011 Forslund et al.; licensee BioMed Central Ltd.
