Illkirch-Graffenstaden, France

Michel C.J.,Equipe de Bioinformatique Theorique | Pirillo G.,CNR Institute for System Analysis and Computer Science Antonio Ruberti | Pirillo G.,University of Marne-la-Vallee
Computational Biology and Chemistry | Year: 2010

A new trinucleotide proposition is proved here and allows all the trinucleotide circular codes on the genetic alphabet to be identified (their numbers and their sets of words). This new class of genetic motifs, i.e. circular codes (or synchronizing genetic motifs), may be involved in the structure and the origin of the genetic code, and in reading frames of genes. © 2010 Elsevier Ltd. All rights reserved.
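To make the notion of a circular code concrete, here is a minimal sketch of a circularity test. It does not reproduce the proposition proved in the paper. Instead it assumes a graph criterion from later work (Fimmel, Michel and Strüngmann, 2016): a set of trinucleotides is circular exactly when the directed graph with edges b1 → b2b3 and b1b2 → b3, for every word b1b2b3 of the set, contains no cycle. The function name `is_circular` and the sample sets are illustrative assumptions.

```python
def is_circular(code):
    """Graph test for a set of trinucleotides (assumed criterion, see lead-in)."""
    # Build the directed graph: b1 -> b2b3 and b1b2 -> b3 for each word b1b2b3.
    graph = {}
    for word in code:
        graph.setdefault(word[0], set()).add(word[1:])
        graph.setdefault(word[:2], set()).add(word[2])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def has_cycle(node):
        # Depth-first search; reaching a GREY node again means a cycle.
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY or (state == WHITE and has_cycle(nxt)):
                return True
        color[node] = BLACK
        return False

    # Every cycle passes through a node with outgoing edges, so it is
    # enough to start the search from the keys of `graph`.
    return not any(
        color.get(node, WHITE) == WHITE and has_cycle(node) for node in list(graph)
    )


if __name__ == "__main__":
    print(is_circular({"AAC", "GTT"}))  # no cycle in the graph
    print(is_circular({"AAA"}))         # A -> AA -> A is a cycle
    print(is_circular({"ACG", "CGA"}))  # A -> CG -> A is a cycle
```

The last two sets fail for the expected reasons: a periodic trinucleotide such as AAA, or two circular permutations of the same trinucleotide, give a word on a circle more than one decomposition.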

Michel C.J.,Equipe de Bioinformatique Theorique
Computational Biology and Chemistry | Year: 2012

In 1996, a common trinucleotide circular code, called X, was identified in genes of eukaryotes and prokaryotes (Arquès and Michel, 1996). This circular code X is a set of 20 trinucleotides allowing the reading frames in genes to be retrieved locally, i.e. anywhere in genes and in particular without start codons. This reading frame retrieval needs a window length l of 12 nucleotides (l ≥ 12). With a window length strictly less than 12 nucleotides (l < 12), some words of X, called ambiguous words, are found in the shifted frames (the reading frame shifted by one or two nucleotides), preventing the reading frame in genes from being retrieved. Since 1996, these ambiguous words of X have never been studied. In the first part of this paper, we identify all the ambiguous words of the common trinucleotide circular code X. With a length l varying from 1 to 11 nucleotides, the type and the occurrence number (multiplicity) of ambiguous words of X are given in each shifted frame. Maximal ambiguous words of X, i.e. words which are not factors of other ambiguous words, are also determined. Two probability definitions based on these results show that the common trinucleotide circular code X retrieves the reading frame in genes with a probability of about 90% with a window length of 6 nucleotides, and a probability of 99.9% with a window length of 9 nucleotides (100% with a window length of 12 nucleotides, by definition of a circular code). In the second part of this paper, we identify X circular code motifs (X motifs for short) in transfer RNA and 16S ribosomal RNA: a tRNA X motif of 26 nucleotides including the anticodon stem-loop and seven 16S rRNA X motifs of length greater than or equal to 15 nucleotides. Window lengths of reading frame retrieval with each trinucleotide of these X motifs are also determined. Thanks to the crystal structure 3I8G (Jenner et al., 2010), a 3D visualization of X motifs in the ribosome shows several spatial configurations involving mRNA X motifs, A-tRNA and E-tRNA X motifs, and four 16S rRNA X motifs. Another identified 16S rRNA X motif is involved in the decoding center which recognizes the codon-anticodon helix in A-tRNA. From a code theory point of view, these identified X circular code motifs and their mathematical properties may constitute a translation code involved in retrieval, maintenance and synchronization of reading frames in genes. © 2011 Elsevier Ltd.
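The effect of the window length on reading frame retrieval can be illustrated with a short sketch. The abstract does not list the 20 trinucleotides of X, so the set below is the one commonly given in the literature for the Arquès and Michel code and should be read as an assumption. The helper `consistent_frames` is hypothetical and uses a simplified test (every complete trinucleotide of the window, read in a given frame, belongs to X), which is not the paper's exact definition of ambiguous words.

```python
# Assumed word list for the common circular code X (not given in the abstract).
X = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)


def consistent_frames(window, code=X):
    """Return the frame shifts (0, 1, 2) in which the window reads as words of `code`."""
    frames = []
    for shift in range(3):
        # Complete trinucleotides of the window when reading starts at `shift`.
        codons = [window[i:i + 3] for i in range(shift, len(window) - 2, 3)]
        if codons and all(codon in code for codon in codons):
            frames.append(shift)
    return frames


if __name__ == "__main__":
    # Both windows are built from words of X, so frame 0 is the true reading frame.
    print(consistent_frames("GGTACC"))        # short window: [0, 1, 2], ambiguous
    print(consistent_frames("GGTACCGAAGAC"))  # longer window: [0], frame retrieved
```

In the 6-nucleotide window, the shifted frames read GTA and TAC, which are both words of X, so the frame cannot be decided. Extending the window to 12 nucleotides brings in CCG and CGA, which are not in X, and only the true frame remains.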

Michel C.J.,Equipe de Bioinformatique Theorique | Pirillo G.,CNR Institute of Neuroscience | Pirillo G.,University of Marne-la-Vallee | Pirillo M.A.,Istituto Statale SS. Annunziata
Information and Computation | Year: 2012

Trinucleotide comma-free codes and trinucleotide circular codes are two important classes of codes in code theory and theoretical biology. A trinucleotide circular code containing exactly 20 elements is called here a 20-trinucleotide circular code. In this paper, solving a combinatorial problem of hard computational complexity, we extend and improve our results of C.J. Michel, G. Pirillo, and M.A. Pirillo (2008) [14] concerning the small class of 528 self-complementary 20-trinucleotide circular codes, to the complete class of the 20-trinucleotide circular codes which contains 12,964,440 elements. A surprising relation with the symmetric group Σ4 appears but it remains unexplained so far. © 2011 Elsevier Inc.
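A short sketch can show why 20 is the largest possible size and how large the search space is. It relies on a standard fact that the abstract does not state: a circular code cannot contain a periodic trinucleotide (AAA, CCC, GGG, TTT) or two circular permutations of the same trinucleotide. A 20-trinucleotide circular code therefore picks exactly one word from each of the 20 classes of circular permutations. The variable names are illustrative.

```python
from itertools import product


def rotations(word):
    """All circular permutations of a word, as a set."""
    return {word[i:] + word[:i] for i in range(len(word))}


# The 64 trinucleotides over the genetic alphabet.
trinucleotides = ["".join(p) for p in product("ACGT", repeat=3)]

# Group them into classes of circular permutations.
classes = {frozenset(rotations(w)) for w in trinucleotides}
periodic = [c for c in classes if len(c) == 1]      # AAA, CCC, GGG, TTT
full_classes = [c for c in classes if len(c) == 3]  # e.g. {ACG, CGA, GAC}

# One word per full class: the candidate sets of 20 trinucleotides.
candidates = 3 ** len(full_classes)

if __name__ == "__main__":
    print(len(periodic), len(full_classes))  # 4 20
    print(candidates)                        # 3486784401
    # Share of candidates that are circular, using the count from the abstract.
    print(12_964_440 / candidates)
```

Only a small fraction of the 3^20 candidate sets are circular codes, which gives a sense of the size of the combinatorial problem mentioned in the abstract.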

Lèbre S.,Equipe de Bioinformatique Theorique | Michel C.J.,Equipe de Bioinformatique Theorique
Computational Biology and Chemistry | Year: 2010

We develop here a new class of stochastic models of gene evolution based on residue Insertion-Deletion Independent from Substitution (IDIS). Indeed, in contrast to all existing evolution models, insertions and deletions are modeled here by a concept in population dynamics. Therefore, they are not only independent from each other, but also independent from the substitution process. After a separate stochastic analysis of the substitution and the insertion-deletion processes, we obtain a matrix differential equation combining these two processes defining the IDIS model. By deriving a general solution, we give an analytical expression of the residue occurrence probability at evolution time t as a function of a substitution rate matrix, an insertion rate vector, a deletion rate and an initial residue probability vector. Various mathematical properties of the IDIS model in relation with time t are derived: time scale, time step, time inversion and sequence length. Particular expressions of the nucleotide occurrence probability at time t are given for classical substitution rate matrices in various biological contexts: equal insertion rate, insertion-deletion only and substitution only. All these expressions can be directly used for biological evolutionary applications. The IDIS model shows a strongly different stochastic behavior from the classical substitution only model when compared on a gene dataset. Indeed, by considering three processes of residue insertion, deletion and substitution independently from each other, it allows a more realistic representation of gene evolution and opens new directions and applications in this research field. © 2010 Elsevier Ltd. All rights reserved.
