Institute of Advanced Study TUM IAS

Garching bei München, Germany


Kloppmann E.,TU Munich | Kloppmann E.,New York Structural Biology Center | Punta M.,Wellcome Trust Sanger Institute | Rost B.,TU Munich | And 2 more authors.
Current Opinion in Structural Biology | Year: 2012

Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the setup of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in the absence of homology. © 2012 Elsevier Ltd.

Vicedo E.,TUM | Vicedo E.,11 Health | Schlessinger A.,Mount Sinai School of Medicine | Rost B.,TUM | And 2 more authors.
PLoS ONE | Year: 2015

Many prokaryotic organisms have adapted to incredibly extreme habitats. The genomes of such extremophiles differ from their non-extremophile relatives. For example, some proteins in thermophiles sustain high temperatures by being more compact than homologs in nonextremophiles. Conversely, some proteins have increased volumes to compensate for freezing effects in psychrophiles that survive in the cold. Here, we revealed that some differences in organisms surviving in extreme habitats correlate with a simple single feature, namely the fraction of proteins predicted to have long disordered regions. We predicted disorder with different methods for 46 completely sequenced organisms from diverse habitats and found a correlation between protein disorder and the extremity of the environment. More specifically, the overall percentage of proteins with long disordered regions tended to be more similar between organisms of similar habitats than between organisms of similar taxonomy. For example, predictions tended to detect substantially more proteins with long disordered regions in prokaryotic halophiles (survive high salt) than in their taxonomic neighbors. Another peculiar environment is that of high radiation survived, e.g. by Deinococcus radiodurans. The relatively high fraction of disorder predicted in this extremophile might provide a shield against mutations. Although our analysis fails to establish causation, the observed correlation between such a simplistic, coarse-grained, microscopic molecular feature (disorder content) and a macroscopic variable (habitat) remains stunning. © 2015 Vicedo et al.
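
The per-organism feature described in this abstract, the fraction of proteins predicted to have long disordered regions, can be sketched as follows. This is only an illustration, not the authors' pipeline: the input format (one list of per-residue disorder flags per protein), the function names, and the 30-residue cutoff for "long" are all assumptions that the abstract does not state.

```python
from typing import Dict, List


def longest_disordered_run(flags: List[bool]) -> int:
    """Length of the longest stretch of consecutive residues flagged as disordered."""
    longest = current = 0
    for is_disordered in flags:
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)
    return longest


def fraction_with_long_disorder(
    proteome: Dict[str, List[bool]],
    min_len: int = 30,  # assumed cutoff for a "long" region; not given in the abstract
) -> float:
    """Fraction of proteins whose longest disordered run is at least min_len residues."""
    if not proteome:
        return 0.0
    hits = sum(
        1 for flags in proteome.values() if longest_disordered_run(flags) >= min_len
    )
    return hits / len(proteome)


# Hypothetical toy proteome: per-residue flags as a disorder predictor might emit them.
toy_proteome = {
    "p1": [True] * 35 + [False] * 10,               # one long disordered region
    "p2": [False] * 50,                             # fully ordered
    "p3": [True] * 10 + [False] * 5 + [True] * 10,  # two short regions only
    "p4": [True] * 30,                              # exactly at the cutoff
}
toy_fraction = fraction_with_long_disorder(toy_proteome)
```

Comparing this single number across organisms grouped by habitat and by taxonomy is the kind of analysis the abstract reports; the comparison itself is not shown here.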

Hamp T.,TUM | Kassner R.,TUM | Seemayer S.,TUM | Vicedo E.,TUM | And 18 more authors.
BMC Bioinformatics | Year: 2013

Background: Any method that de novo predicts protein function should do better than random. More challenging, it also ought to outperform simple homology-based inference. Methods: Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar or lower limit for future improvements. Results and conclusions: During the development of these methods, we faced two surprises. Firstly, our most successful implementation for the baseline ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Secondly, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods span from top to bottom performers at CAFA, but also the reasons for these differences were unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA. © 2013 Hamp et al.; licensee BioMed Central Ltd.

Hecht M.,TU Munich | Bromberg Y.,Rutgers University | Bromberg Y.,Institute of Advanced Study TUM IAS | Rost B.,TU Munich | Rost B.,Institute of Advanced Study TUM IAS
BMC Genomics | Year: 2015

Definitions used: Delta, input feature that results from computing the difference between the feature scores for the native amino acid and the feature scores for the variant amino acid; nsSNP, non-synonymous SNP; PMD, Protein Mutant Database; SNAP, Screening for non-acceptable polymorphisms; SNP, single nucleotide polymorphism; variant, any amino acid changing sequence variant. © 2015 Hecht et al.; licensee BioMed Central Ltd.
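
As a rough illustration of the "Delta" definition above, the sketch below computes the difference between a per-residue feature score for the native amino acid and for the variant. The sign convention (native minus variant), the function name, and the choice of the Kyte-Doolittle hydropathy scale as the example feature are assumptions made for this sketch; the definitions do not specify which features or which direction the method uses.

```python
from typing import Mapping

# Example feature: a subset of the Kyte-Doolittle hydropathy scale.
# Using this scale here is an assumption for illustration only.
HYDROPATHY: Mapping[str, float] = {
    "A": 1.8, "D": -3.5, "E": -3.5, "G": -0.4, "I": 4.5,
    "K": -3.9, "L": 3.8, "R": -4.5, "S": -0.8, "V": 4.2,
}


def delta_feature(scores: Mapping[str, float], native: str, variant: str) -> float:
    """Difference of feature scores, native minus variant (assumed sign convention)."""
    try:
        return scores[native] - scores[variant]
    except KeyError as missing:
        raise ValueError(f"no feature score for residue {missing}") from None


# Hypothetical variant: alanine replaced by aspartate.
ala_to_asp = delta_feature(HYDROPATHY, "A", "D")
```

A synonymous position (native equal to variant) gives a Delta of zero under this convention, which is why such a feature only carries information for amino acid changing variants.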

Kajan L.,TU Munich | Hopf T.A.,TU Munich | Hopf T.A.,Harvard University | Kalas M.,University of Bergen | And 3 more authors.
BMC Bioinformatics | Year: 2014

Background: Twenty years of improved technology and a growing number of sequences now render residue-residue contact constraints, inferred through correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software. Results: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library "libfreecontact", complete with command line tool "freecontact", as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. Conclusions: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud). © 2014 Kaján et al.; licensee BioMed Central Ltd.

Reeb J.,TU Munich | Kloppmann E.,TU Munich | Kloppmann E.,New York Structural Biology Center | Bernhofer M.,TU Munich | And 4 more authors.
Proteins: Structure, Function and Bioinformatics | Year: 2015

Experimental structure determination continues to be challenging for membrane proteins. Computational prediction methods are therefore needed and widely used to supplement experimental data. Here, we re-examined the state of the art in transmembrane helix prediction based on a nonredundant dataset with 190 high-resolution structures. Analyzing 12 widely-used and well-known methods using a stringent performance measure, we largely confirmed the expected high level of performance. On the other hand, all methods performed worse for proteins that could not have been used for development. A few results stood out: First, all methods predicted proteins in eukaryotes better than those in bacteria. Second, methods worked less well for proteins with many transmembrane helices. Third, most methods correctly discriminated between soluble and transmembrane proteins. However, several older methods often mistook signal peptides for transmembrane helices. Some newer methods have overcome this shortcoming. In our hands, PolyPhobius and MEMSAT-SVM outperformed other methods. © 2014 Wiley Periodicals, Inc.

Bernhofer M.,TU Munich | Kloppmann E.,TU Munich | Kloppmann E.,New York Structural Biology Center | Reeb J.,TU Munich | And 4 more authors.
Proteins: Structure, Function and Bioinformatics | Year: 2016

Transmembrane proteins (TMPs) are important drug targets because they are essential for signaling, regulation, and transport. Despite important breakthroughs, experimental structure determination remains challenging for TMPs. Various methods have bridged the gap by predicting transmembrane helices (TMHs), but room for improvement remains. Here, we present TMSEG, a novel method identifying TMPs and accurately predicting their TMHs and their topology. The method combines machine learning with empirical filters. Testing it on a non-redundant dataset of 41 TMPs and 285 soluble proteins, and applying strict performance measures, TMSEG outperformed the state-of-the-art in our hands. TMSEG correctly distinguished helical TMPs from other proteins with a sensitivity of 98 ± 2% and a false positive rate as low as 3 ± 1%. Individual TMHs were predicted with a precision of 87 ± 3% and recall of 84 ± 3%. Furthermore, in 63 ± 6% of helical TMPs the placement of all TMHs and their inside/outside topology was correctly predicted. There are two main features that distinguish TMSEG from other methods. First, the errors in finding all helical TMPs in an organism are significantly reduced. For example, in human this leads to 200 and 1600 fewer misclassifications compared to the second and third best methods available, and 4400 fewer mistakes than by a simple hydrophobicity-based method. Second, TMSEG provides an add-on improvement for any existing method to benefit from. Proteins 2016; 84:1706–1716. © 2016 Wiley Periodicals, Inc.
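
The performance measures quoted in this abstract (sensitivity, false positive rate, precision, recall) follow the standard confusion-matrix definitions, sketched below. The class and function names are hypothetical, the counts are toy values rather than results from the paper, and the abstract's "strict performance measures" for deciding when an individual helix counts as correct are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of true/false positives and negatives for a binary classifier."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: Confusion) -> float:
    """Also called recall: fraction of actual positives that were found."""
    return _ratio(c.tp, c.tp + c.fn)


def false_positive_rate(c: Confusion) -> float:
    """Fraction of actual negatives wrongly labeled positive."""
    return _ratio(c.fp, c.fp + c.tn)


def precision(c: Confusion) -> float:
    """Fraction of predicted positives that are correct."""
    return _ratio(c.tp, c.tp + c.fp)


# Toy counts (not from the paper): 10 membrane proteins, 10 soluble proteins.
toy = Confusion(tp=8, fp=1, tn=9, fn=2)
toy_report = {
    "sensitivity": sensitivity(toy),
    "false_positive_rate": false_positive_rate(toy),
    "precision": precision(toy),
}
```

On a test set dominated by soluble proteins, such as the 41 TMPs and 285 soluble proteins mentioned above, even a small false positive rate can translate into a noticeable number of misclassified proteins, which is why the abstract reports both sensitivity and false positive rate.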

Hamp T.,TU Munich | Goldberg T.,TU Munich | Rost B.,TU Munich | Rost B.,Institute of Advanced Study TUM IAS | Rost B.,Columbia University
PLoS ONE | Year: 2013

One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements render it too slow for practical applications of large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a maximal speed-up as high as 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster and render the kernel as possibly the top contender in a low ratio of speed/performance. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at Bugs and other issues may be reported at © 2013 Hamp et al.
