Szalkai B.,Eötvös Loránd University | Grolmusz V.,Eötvös Loránd University | Grolmusz V.,Uratim Ltd.
Genomics | Year: 2016

Discoveries of new biomarkers for frequently occurring diseases are of special importance in today's medicine. While fully developed type II diabetes (T2D) can be detected easily, the early identification of high-risk individuals is also an area of interest in T2D. Metagenomic analysis of the human bacterial flora has shown subtle changes in diabetic patients, but no specific microbes are known to cause or promote the disease. Moderate changes were also detected in the microbial gene composition of the metagenomes of diabetic patients, but again, no specific gene was found that is present in disease-related metagenomes and missing from healthy ones. However, these fine differences in microbial taxon and gene composition are difficult to apply as quantitative biomarkers for diagnosing or predicting type II diabetes. In the present work we report some nucleotide 9-mers with significantly differing frequencies in diabetic and healthy intestinal flora. To our knowledge, this is the first time such short DNA fragments have been associated with T2D. The automated, quantitative analysis of the frequencies of short nucleotide sequences seems more feasible than accurate phylogenetic and functional analysis, and thus it might be a promising direction for diagnostic research. © 2016 Elsevier Inc.
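The basic quantity behind this approach is the relative frequency of each 9-mer in a metagenome. The following is a minimal sketch, assuming the reads are already available as plain strings. It is not the authors' pipeline: the function name `kmer_frequencies` is hypothetical, the two DNA strands are not merged, and the statistical comparison between diabetic and healthy samples is not shown.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")

def kmer_frequencies(reads, k=9):
    """Relative frequency of every k-mer across a collection of reads.

    k-mers containing anything other than A, C, G or T (e.g. an ambiguous N)
    are skipped; returns an empty dict if no valid k-mer is found.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if VALID_BASES.issuperset(kmer):
                counts[kmer] += 1
    total = sum(counts.values())
    # Normalize counts to frequencies so samples of different depth are comparable.
    return {kmer: c / total for kmer, c in counts.items()} if total else {}

print(kmer_frequencies(["ACGTACGTAC"]))  # {'ACGTACGTA': 0.5, 'CGTACGTAC': 0.5}
```

Normalizing to relative frequencies is what allows metagenomes of different sequencing depth to be compared at all. A per-sample frequency table like this would then be fed into whatever significance test is used.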

Szalkai B.,Eötvös Loránd University | Varga B.,Eötvös Loránd University | Grolmusz V.,Eötvös Loránd University | Grolmusz V.,Uratim Ltd.
PLoS ONE | Year: 2015

Deep graph-theoretic ideas in the context of the World Wide Web graph led to the definition of Google's PageRank and the subsequent rise of the most popular search engine to date. Brain graphs, or connectomes, are being widely explored today. We believe that non-trivial graph-theoretic concepts, as happened in the case of the World Wide Web, will lead to discoveries illuminating both the structural and the functional details of animal and human brains. When scientists examine large networks of tens or hundreds of millions of vertices, only fast algorithms can be applied because of the size constraints. In the case of diffusion MRI-based structural human brain imaging, the effective vertex number of the connectomes, or brain graphs, derived from the data is on the scale of several hundred today. That size makes it possible to apply exact mathematical graph algorithms even for some hard-to-compute (or NP-hard) quantities like vertex cover or balanced minimum cut. In the present work we have examined brain graphs computed from the data of the Human Connectome Project, recorded from male and female subjects between ages 22 and 35. Significant differences were found between the male and female structural brain graphs: we show that the average female connectome has more edges, is a better expander graph, has larger minimal bisection width, and has more spanning trees than the average male connectome. Since the average female brain weighs less than the brain of males, these properties show that the female brain has better graph-theoretical properties, in a sense, than the brain of males. It is known that the female brain has a smaller gray matter/white matter ratio than that of males, that is, a larger white matter/gray matter ratio; this observation is in line with our findings concerning the number of edges, since the white matter consists of myelinated axons, which, in turn, roughly correspond to the connections in the brain graph.
We have also found that the minimum bisection width, normalized by the edge number, is significantly larger in both the right and the left hemispheres in females; therefore, the differing bisection widths are independent of the difference in the number of edges. © 2015 Szalkai et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
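Two of the graph parameters compared above can be illustrated on toy graphs. The number of spanning trees follows from Kirchhoff's matrix-tree theorem: it equals any cofactor of the graph Laplacian. The minimum bisection width is the smallest number of edges cut by splitting the vertices into two halves of equal (or, for odd n, nearly equal) size. This is a minimal sketch with hypothetical function names. The bisection routine uses brute-force enumeration, which is feasible only for a handful of vertices. The abstract does not say which exact solver handled the several-hundred-vertex connectomes, so this is not a claim about the authors' method.

```python
from fractions import Fraction
from itertools import combinations

def spanning_tree_count(n, edges):
    """Matrix-tree theorem: determinant of the Laplacian with one row/column removed."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    m = [row[:-1] for row in lap[:-1]]  # drop last row and column
    size = n - 1
    det = Fraction(1)
    # Exact Gaussian elimination over the rationals.
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)

def min_bisection_width(n, edges):
    """Brute force over all balanced vertex splits; exponential in n."""
    best = None
    for side in combinations(range(n), n // 2):
        s = set(side)
        cut = sum((u in s) != (v in s) for u, v in edges)
        if best is None or cut < best:
            best = cut
    return best

def normalized_bisection_width(n, edges):
    """Bisection width divided by the number of edges."""
    return min_bisection_width(n, edges) / len(edges)

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(spanning_tree_count(4, cycle4), min_bisection_width(4, cycle4))  # 4 2
```

Dividing by the edge count, as in `normalized_bisection_width`, mirrors the normalization described in the abstract. It separates "more edges overall" from "harder to cut in half".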

Tothmeresz L.,H+ Technology | Grolmusz V.,H+ Technology | Grolmusz V.,Uratim Ltd.
Protein and Peptide Letters | Year: 2013

New methods for reliable quantitative analysis of biological network data are in high demand in today's bioinformatics and systems biology. Here we demonstrate the applicability of co-citation, a measure developed earlier for the analysis of the scientific literature, to finding functionally similar nodes in the protein-protein interaction networks of several model organisms. We validate our approach in a novel way: the enzymes predicted to be closely related are compared by the closeness of their Enzyme Commission (EC) numbers, so we can evaluate our prediction method numerically. We have found a clear correspondence between related enzymatic functions and high co-citation of proteins in interaction networks. © 2013 Bentham Science Publishers.
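In an undirected interaction network, the simplest reading of "co-citation" is the number of common neighbours of two proteins, in analogy with two papers cited together by the same third papers. The following is a minimal sketch under that assumption. The paper may normalize or weight the score differently. Measuring EC-number closeness by the number of agreeing leading fields is likewise an illustrative assumption. Both function names are hypothetical.

```python
from itertools import combinations

def cocitation_scores(edges):
    """Raw co-citation of node pairs in an undirected network: count of common neighbours."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(nbrs), 2):
        common = len(nbrs[u] & nbrs[v])
        if common:  # keep only pairs that are co-cited at least once
            scores[(u, v)] = common
    return scores

def ec_shared_levels(ec_a, ec_b):
    """Number of leading EC-number fields that agree, e.g. 2.7.11.1 vs 2.7.1.1 -> 2."""
    shared = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y:
            break
        shared += 1
    return shared
```

A validation in the spirit of the abstract would check whether highly co-cited enzyme pairs tend to share more EC levels than random pairs do.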

Kerepesi C.,Eötvös Loránd University | Grolmusz V.,Eötvös Loránd University | Grolmusz V.,Uratim Ltd.
Archives of Virology | Year: 2016

The Kutch Desert (Great Rann of Kutch, Gujarat, India) is a unique ecosystem: for most of the year it is a hot, salty desert, and it is flooded regularly in the Indian monsoon season. In the dry season, the crystallized salt deposits form the “white desert” in large regions. The first metagenomic analysis of the soil samples of Kutch was published in 2013, and the data were deposited in the NCBI Sequence Read Archive. At the same time, the sequences were analyzed phylogenetically for prokaryotes, especially for bacteria. In the present work, we identified DNA sequences of recently discovered giant viruses in the soil samples from the Kutch Desert. Since most giant viruses have been discovered in biofilms in industrial cooling towers, ocean water, and freshwater ponds, we were surprised to find their DNA sequences in soil samples from a seasonally very hot, arid and salty environment. © 2015, Springer-Verlag Wien.

Grolmusz V.,Eötvös Loránd University | Grolmusz V.,Uratim Ltd.
Information Processing Letters | Year: 2015

The PageRank is a widely used scoring function of networks in general and of the World Wide Web graph in particular. The PageRank is defined for directed graphs, but in some special cases it is also applied to undirected graphs. In the literature it is widely, but not exclusively, noted that the PageRank for undirected graphs is proportional to the degrees of the vertices of the graph. We prove that statement for a particular personalization vector in the definition of the PageRank, and we also show that in general, the PageRank of an undirected graph is not exactly proportional to the degree distribution of the graph: our main theorem gives an upper and a lower bound on the l1 norm of the difference between the PageRank and the degree distribution vectors. A necessary and sufficient condition is also given for the PageRank to be proportional to the degree. © 2015 Elsevier B.V. All rights reserved.
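The gap between PageRank and degree can be observed numerically. The sketch below runs power iteration for p = (1 − α)v + α Pᵀp on an undirected graph and measures the l1 distance to the degree distribution. With a uniform personalization vector v, the distance is non-zero even on a 3-vertex path. With v equal to the normalized degree vector itself, the distance vanishes, because that vector is stationary for the random walk. We do not claim this is exactly the personalization vector used in the paper. The damping value α = 0.85, the iteration count and the function names are illustrative assumptions.

```python
def pagerank_undirected(adj, alpha=0.85, personalization=None, iters=200):
    """Power iteration for p = (1 - alpha) * v + alpha * P^T p.

    adj maps each vertex to its list of neighbours; every vertex needs degree >= 1.
    """
    nodes = list(adj)
    n = len(nodes)
    v = personalization or {u: 1 / n for u in nodes}  # default: uniform
    p = dict(v)
    for _ in range(iters):
        nxt = {u: (1 - alpha) * v[u] for u in nodes}
        for u in nodes:
            share = alpha * p[u] / len(adj[u])  # random-walk step from u
            for w in adj[u]:
                nxt[w] += share
        p = nxt
    return p

def degree_distribution(adj):
    """Degrees divided by 2m (the sum of all degrees)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {u: len(adj[u]) / two_m for u in adj}

def l1_distance(p, q):
    return sum(abs(p[u] - q[u]) for u in p)

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
d = degree_distribution(path)
print(l1_distance(pagerank_undirected(path), d))                     # ≈ 1/37 ≈ 0.027
print(l1_distance(pagerank_undirected(path, personalization=d), d))  # ≈ 0
```

On this path, the uniform-personalization PageRank of the middle vertex is 18/37 rather than 1/2. This is a concrete instance of "not exactly proportional to the degree".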

Szalkai B.,Eötvös Loránd University | Grolmusz V.,Eötvös Loránd University | Grolmusz V.,Uratim Ltd.
Biochimica et Biophysica Acta - General Subjects | Year: 2016

Background: Metagenomic analysis of environmental and clinical samples is gaining considerable importance in today's literature. Changes in the composition of the intestinal microbial communities, relative to healthy controls, are reported in numerous conditions. Methods: We have carefully analyzed the frequencies of short nucleotide sequences in the metagenomes of two different enterotypes, namely of Chinese and European origin. Results: We have identified 255 nucleotide sequences of length up to 9 whose frequencies differ significantly between the two enterotypes examined. Conclusions: We have demonstrated that short nucleotide sequences are capable of differentiating not only metagenomes originating from healthy and diseased subjects, but also enterotypes. General significance: Our results may imply that the frequency differences of certain short nucleotide sequences have diagnostic value if properly applied to different clusters of metagenomes. © 2016 Elsevier B.V.
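The abstract does not name the statistical test used to call a sequence's frequency "significantly different" between the two groups. As a generic, hedged illustration only, the following shows an exact two-sided permutation test on the difference of group means for one sequence. It is practical only for small groups, since it enumerates every relabelling. Note that there are 4 + 4² + … + 4⁹ = 349,524 possible sequences of length up to 9. A screen over all of them therefore needs some form of multiple-testing correction, and this sketch does not show which one the authors applied. The function name is hypothetical.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference of means.

    Every split of the pooled values into groups of the original sizes is
    enumerated, so the cost grows combinatorially with sample size.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical relative frequencies of one k-mer in two groups of three samples.
print(exact_permutation_pvalue([0.10, 0.11, 0.12], [0.20, 0.21, 0.22]))  # 0.1
```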

Kerepesi C.,Eötvös Loránd University | Banky D.,Eötvös Loránd University | Banky D.,Uratim Ltd. | Grolmusz V.,Eötvös Loránd University | Grolmusz V.,Uratim Ltd.
Gene | Year: 2014

Motivation: Metagenomics has gone through an astonishing development in the past few years. Today not only gene sequencing experts, but numerous laboratories of other specializations need to analyze DNA sequences obtained from clinical or environmental samples. Phylogenetic analysis of metagenomic data presents significant challenges for the biologist and the bioinformatician. The program suite AMPHORA and its workflow version are examples of publicly available software that yields reliable phylogenetic results for metagenomic data. Results: Here we present AmphoraNet, an easy-to-use webserver that is capable of assigning a probability-weighted taxonomic group to each phylogenetic marker gene found in the input metagenomic sample; the webserver is based on the AMPHORA2 workflow. Since a large proportion of molecular biologists use the BLAST program and its clones on public webservers instead of locally installed versions, we believe that occasional users may find it convenient that, in this version, no time-consuming installation of the components of the AMPHORA2 suite is needed and no expertise in the Linux environment is required. Availability: The webserver is freely available at; no registration is required. © 2013 Elsevier B.V.

Banky D.,H+ Technology | Banky D.,Uratim Ltd. | Szalkai B.,H+ Technology | Grolmusz V.,H+ Technology | Grolmusz V.,Uratim Ltd.
Gene | Year: 2014

Every day tens of thousands of sequence search and sequence alignment queries are submitted to webservers. The capitalized word "BLAST" has become a verb, describing the act of performing sequence search and alignment. However, if one needs to search for sequences that contain, for example, two hydrophobic and three polar residues at five given positions, forming the query on the most frequently used webservers will be difficult. Some servers support queries written as regular expressions, but most users are unfamiliar with their syntax. Here we present an intuitive, easily applicable webserver, the Protein Sequence Analysis server, that allows the formation of multiple-choice queries by simply drawing the residues to their positions; if more than one residue is drawn to the same position, they are stacked on the user interface, indicating the multiple choice at the given position. This computer-game-like interface is natural and intuitive, and the coloring of the residues makes it possible to form queries requiring not just certain amino acids at the given positions, but also small nonpolar, negatively charged, hydrophobic, positively charged, or polar ones. The webserver is available at © 2014 Elsevier B.V.
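A drawn multiple-choice query is equivalent to a regular expression with one character class per position. That regular-expression syntax is the one the abstract says most users find unfamiliar. The sketch below makes the correspondence explicit for the "two hydrophobic, three polar" example. It does not describe how the server works internally. The residue groupings `HYDROPHOBIC` and `POLAR` are hypothetical and chosen for illustration, since different sources group amino acids differently. Positions are assumed to contain one-letter amino-acid codes.

```python
import re

# Hypothetical residue classes for illustration; groupings differ between sources.
HYDROPHOBIC = frozenset("AVILMFW")
POLAR = frozenset("STNQ")

def build_regex(positions):
    """Turn one set of allowed one-letter residues per position into a regex."""
    parts = []
    for allowed in positions:
        if not allowed:
            raise ValueError("every position needs at least one allowed residue")
        letters = "".join(sorted(allowed))
        # A single residue is a literal; several residues become a character class.
        parts.append(letters if len(letters) == 1 else f"[{letters}]")
    return "".join(parts)

def find_matches(sequence, positions):
    """Start indices of all (possibly overlapping) matches in a protein sequence."""
    pattern = re.compile(f"(?=({build_regex(positions)}))")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(sequence.upper())]

query = [HYDROPHOBIC, HYDROPHOBIC, POLAR, POLAR, POLAR]
print(build_regex(query))                # [AFILMVW][AFILMVW][NQST][NQST][NQST]
print(find_matches("GGLVSTNKK", query))  # [2]
```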

Banky D.,Eötvös Loránd University | Banky D.,Uratim Ltd. | Ivan G.,Eötvös Loránd University | Ivan G.,Uratim Ltd. | And 2 more authors.
PLoS ONE | Year: 2013

Biological network data, such as metabolic, signaling or physical interaction graphs of proteins, are increasingly available in public repositories for important species. Tools for the quantitative analysis of these networks are being developed today. Protein network-based drug target identification methods usually return protein hubs with large degrees in the networks as potentially important targets. Some known, important protein targets, however, are not hubs at all, and perturbing protein hubs in these networks may have several unwanted physiological effects, due to their interaction with numerous partners. Here, we show a novel method applicable to networks with directed edges (such as metabolic networks) that compensates for the low degree of non-hub vertices in the network and identifies important nodes regardless of their hub properties. Our method computes the PageRank for the nodes of the network and divides the PageRank by the in-degree (i.e., the number of incoming edges) of the node. This quotient is the same for all nodes in an undirected graph (for large- and low-degree nodes alike, that is, for hubs and non-hubs as well), but may differ significantly from node to node in directed graphs. We suggest assigning importance to non-hub nodes with a large PageRank/in-degree quotient. Consequently, our method gives high scores to nodes with large PageRank relative to their degrees; therefore, important non-hub nodes can easily be identified in large networks. We demonstrate that these relatively high PageRank scores have biological relevance: the method correctly finds numerous already validated drug targets in distinct organisms (Mycobacterium tuberculosis, Plasmodium falciparum and MRSA Staphylococcus aureus), and consequently, it may suggest new possible protein targets as well.
Additionally, our scoring method was not chosen arbitrarily: its value is constant over all nodes of any undirected graph; therefore, a high value captures importance in the directed edge structure of the graph. © 2013 Bánky et al.
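The PageRank/in-degree quotient on a directed graph can be sketched as follows. The damping value α = 0.85, uniform teleportation, and the rule that dangling nodes (no out-edges) spread their score uniformly are common PageRank conventions assumed here, not details taken from the paper. The function names are hypothetical. Every edge target must also appear as a key of `out_edges`. Nodes with in-degree 0 are left out, since the quotient is undefined for them.

```python
def pagerank_directed(out_edges, alpha=0.85, iters=200):
    """Power-iteration PageRank with uniform teleportation.

    out_edges maps every node to the list of nodes it points to; dangling
    nodes (no out-edges) spread their score uniformly over all nodes.
    """
    nodes = list(out_edges)
    n = len(nodes)
    p = {u: 1 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(p[u] for u in nodes if not out_edges[u])
        nxt = {u: (1 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            targets = out_edges[u]
            for w in targets:
                nxt[w] += alpha * p[u] / len(targets)
        p = nxt
    return p

def pagerank_per_indegree(out_edges, alpha=0.85):
    """PageRank divided by in-degree; nodes with in-degree 0 are omitted."""
    indeg = {u: 0 for u in out_edges}
    for targets in out_edges.values():
        for w in targets:
            indeg[w] += 1
    pr = pagerank_directed(out_edges, alpha)
    return {u: pr[u] / indeg[u] for u in out_edges if indeg[u] > 0}

# Node 0 is a small hub (in-degree 3); node 1 has in-degree 1 but is fed by the hub.
graph = {0: [1], 1: [0], 2: [0], 3: [0]}
print(pagerank_per_indegree(graph))  # node 1 scores about 0.445, node 0 about 0.160
```

In this toy graph the non-hub node 1 receives all of the hub's outgoing score through a single edge. It therefore ends up with the larger quotient, which is the kind of node the method is meant to surface.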

Ivan G.,Eötvös Loránd University | Ivan G.,Uratim Ltd. | Grolmusz V.,Eötvös Loránd University | Grolmusz V.,Uratim Ltd.
Bioinformatics | Year: 2011

Motivation: An enormous and constantly increasing quantity of biological information is represented in metabolic and protein interaction network databases. Most of these data are freely accessible through large public repositories. The robust analysis of these resources requires novel technologies, which are being developed today. Results: Here we demonstrate a technique, originating from the PageRank computation for the World Wide Web, for analyzing large interaction networks. The method is fast, scalable and robust, and its capabilities are demonstrated on the metabolic network data of the tuberculosis bacterium and on the proteomics analysis of the blood of melanoma patients. © The Author 2010. Published by Oxford University Press. All rights reserved.
