Center for Synthetic and Systems Biology
Lu Y.Y.,University of Southern California |
Chen T.,University of Southern California |
Chen T.,Center for Synthetic and Systems Biology |
Fuhrman J.A.,University of Southern California |
And 2 more authors.
Bioinformatics | Year: 2017
Motivation: The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, known as contigs, rather than entire genomes, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework that automatically bins contigs into OTUs based on sequence composition and coverage across multiple samples. Results: The effectiveness of COCACOLA is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is using L1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering through sparsity regularization. In addition, the COCACOLA framework readily incorporates customized knowledge to improve binning accuracy. In our study, we investigated two types of additional knowledge, co-alignment to reference genomes and linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT. © The Author 2016.
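To make the L1 versus Euclidean comparison in the abstract above concrete, here is a minimal Python sketch that computes both distances between the k-mer composition profiles of two contigs. It is illustrative only: the abstract does not specify COCACOLA's feature construction or normalization, so the function names (kmer_profile, l1_distance, euclidean_distance), the choice of k = 2, and the toy sequences are all assumptions.

```python
from itertools import product


def kmer_profile(seq, k=2):
    """Return normalized k-mer frequencies of seq over the alphabet ACGT.

    Hypothetical helper: the real tool's feature construction may differ.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def l1_distance(p, q):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Toy contigs (made up for illustration, not from the paper).
contig_a = "ACGTACGTACGTACGT"
contig_b = "AAAACCCCGGGGTTTT"
profile_a = kmer_profile(contig_a)
profile_b = kmer_profile(contig_b)
print("L1:", l1_distance(profile_a, profile_b))
print("Euclidean:", euclidean_distance(profile_a, profile_b))
```

For any pair of vectors the L1 distance is at least as large as the Euclidean distance, because it sums every coordinate difference linearly rather than letting the largest differences dominate after squaring. Whether that property explains the improved initialization reported above is not stated in the abstract.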
Su H.-W.,Tsinghua University |
Zhu J.-H.,Tsinghua University |
Zhu J.-H.,Peking University |
Li H.,Tsinghua University |
And 10 more authors.
Nature Microbiology | Year: 2016
Although regulation of translation fidelity is an essential process [1-7], diverse organisms and organelles have differing requirements of translational accuracy [8-15], and errors in gene translation serve an adaptive function under certain conditions [16-20]. Therefore, optimal levels of fidelity may vary according to context. Most bacteria utilize a two-step pathway for the specific synthesis of aminoacylated glutamine and/or asparagine tRNAs, involving the glutamine amidotransferase GatCAB [21-25], but it had not been appreciated that GatCAB may play a role in modulating mistranslation rates. Here, by using a forward genetic screen, we show that the mycobacterial GatCAB enzyme complex mediates the translational fidelity of glutamine and asparagine codons. We identify mutations in gatA that cause partial loss of function in the holoenzyme, with a consequent increase in rates of mistranslation. By monitoring single-cell transcription dynamics, we demonstrate that reduced gatCAB expression leads to increased mistranslation rates, which result in enhanced rifampicin-specific phenotypic resistance. Consistent with this, strains with mutations in gatA from clinical isolates of Mycobacterium tuberculosis show increased mistranslation, with associated antibiotic tolerance, suggesting a role for mistranslation as an adaptive strategy in tuberculosis. Together, our findings demonstrate a potential role for the indirect tRNA aminoacylation pathway in regulating translational fidelity and adaptive mistranslation. © 2016 Macmillan Publishers Limited, part of Springer Nature.
Chen N.,Center for Synthetic and Systems Biology |
Zhu J.,Center for Synthetic and Systems Biology |
Xia F.,Center for Synthetic and Systems Biology |
Zhang B.,Center for Synthetic and Systems Biology
IEEE Transactions on Pattern Analysis and Machine Intelligence | Year: 2015
Relational topic models (RTMs) provide a probabilistic generative process to describe both the link structure and document contents for document networks, and they have shown promise in predicting network structures and discovering latent topic representations. However, existing RTMs are limited by both their restricted model expressiveness and their inability to deal with imbalanced network data. To expand the scope and improve the inference accuracy of RTMs, this paper presents three extensions: 1) unlike the common link likelihood with a diagonal weight matrix that allows same-topic interactions only, we generalize it to use a full weight matrix that captures all pairwise topic interactions and is applicable to asymmetric networks; 2) instead of doing standard Bayesian inference, we perform regularized Bayesian inference (RegBayes) with a regularization parameter to deal with the imbalanced link structure issue in real networks and improve the discriminative ability of learned latent representations; and 3) instead of doing variational approximation with strict mean-field assumptions, we present collapsed Gibbs sampling algorithms for the generalized relational topic models by exploring data augmentation without making restrictive assumptions. Under the generic RegBayes framework, we carefully investigate two popular discriminative loss functions, namely, the logistic log-loss and the max-margin hinge loss. Experimental results on several real network datasets demonstrate the significance of these extensions in improving prediction performance. © 2014 IEEE.
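The difference between a diagonal and a full weight matrix, and the two loss functions named in the abstract above, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the paper's implementation: the bilinear score form, the function names (link_score, logistic_log_loss, hinge_loss), the {0, 1} label encoding, and the toy topic vectors are all hypothetical, and neither RegBayes inference nor the collapsed Gibbs sampler is shown.

```python
import math


def link_score(z_i, z_j, U):
    """Bilinear score z_i^T U z_j for two topic-proportion vectors.

    With a diagonal U only same-topic pairs contribute; a full U weights
    every topic pair and need not be symmetric.
    """
    return sum(z_i[a] * U[a][b] * z_j[b]
               for a in range(len(z_i)) for b in range(len(z_j)))


def logistic_log_loss(score, y):
    """Negative log-likelihood of label y in {0, 1} under a logistic link.

    Equals log(1 + exp(score)) - y * score, computed in a stable form.
    """
    return max(score, 0.0) + math.log1p(math.exp(-abs(score))) - y * score


def hinge_loss(score, y, margin=1.0):
    """Max-margin hinge loss, with y in {0, 1} mapped to {-1, +1}."""
    sign = 2 * y - 1
    return max(0.0, margin - sign * score)


# Toy example: document i is purely topic 0, document j purely topic 1.
z_i, z_j = [1.0, 0.0], [0.0, 1.0]
U_diag = [[1.0, 0.0], [0.0, 1.0]]   # same-topic interactions only
U_full = [[1.0, 2.0], [0.0, 1.0]]   # cross-topic, asymmetric weights

print(link_score(z_i, z_j, U_diag))  # 0.0: no shared topic, no signal
print(link_score(z_i, z_j, U_full))  # 2.0: topic 0 -> topic 1 is weighted
print(link_score(z_j, z_i, U_full))  # 0.0: the reverse direction differs
```

In this toy case the diagonal matrix cannot score a link between documents with disjoint topics, while the full matrix can, and it scores the two directions differently, which is the property that makes it applicable to asymmetric networks.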