Bacterial chromosomes must be compacted by three orders of magnitude to fit within the cell. While such compaction could in theory yield disordered structures, it is becoming increasingly clear that bacterial chromosomes are in fact arranged in regular and reproducible fashions and that their configurations are tightly connected to fundamental processes such as chromosome segregation. Nonetheless, due to throughput and resolution limitations associated with traditional assays, many questions regarding bacterial chromosome structure and its relation to genome function remain. Here, I review two related technologies, chromosome conformation capture (3C) and chromosome conformation capture carbon copy (5C), which my collaborators and I recently introduced as tools to probe the high-resolution folding of entire bacterial genomes. These technologies use covalent cross-linking and proximity ligation to measure the spatial positioning of hundreds of genomic loci, thereby opening the door to high-throughput studies of bacterial chromosome structure. Hence, 3C and 5C represent powerful new tools for assaying the three-dimensional architecture of bacterial genomes. © 2012 Elsevier Inc.

The invention generally relates to methods for maintaining the integrity and identification of a nucleic acid template in a multiplex sequencing reaction. In certain embodiments, methods of the invention involve obtaining a template nucleic acid, incorporating a pair of sequence identifiers into the template, and sequencing the template.
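As a rough illustration of why a pair of identifiers helps, the sketch below tags each template at both ends and accepts a read only when the two tags belong to the same sample. The tag sequences, the fixed tag length, and the function names `tag_template` and `assign_read` are all hypothetical; the abstract does not specify how the identifiers are designed or checked.

```python
from typing import Optional, Tuple

# Hypothetical identifier pairs (5' tag, 3' tag), one pair per sample.
SAMPLE_TAGS = {
    "sample_A": ("ACGT", "TTAG"),
    "sample_B": ("GGCA", "CATC"),
}
TAG_LEN = 4  # assumed fixed tag length, chosen only for this sketch


def tag_template(template: str, sample: str) -> str:
    """Flank a template with the sample's pair of identifiers."""
    left, right = SAMPLE_TAGS[sample]
    return left + template + right


def assign_read(read: str) -> Optional[Tuple[str, str]]:
    """Return (sample, insert) if both identifiers agree, else None."""
    if len(read) <= 2 * TAG_LEN:
        return None  # too short to hold two tags plus an insert
    left, insert, right = read[:TAG_LEN], read[TAG_LEN:-TAG_LEN], read[-TAG_LEN:]
    for sample, tags in SAMPLE_TAGS.items():
        if tags == (left, right):
            return sample, insert
    return None  # unknown or mismatched pair, e.g. a chimeric molecule


good_read = tag_template("GATTACA", "sample_A")
mixed_read = "ACGT" + "GATTACA" + "CATC"  # 5' tag of A, 3' tag of B
```

Here `assign_read(good_read)` recovers the sample and the untagged insert, while `assign_read(mixed_read)` is rejected because its two tags come from different samples.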

The invention generally relates to methods for analyzing nucleic acids to identify novel mutations associated with diseases. In certain embodiments, methods of the invention involve obtaining nucleic acid from a subject having a disease, identifying at least one mutation in the nucleic acid, and comparing the mutation to a database of mutations known to be associated with the disease, wherein mutations that do not match any entry in the database are identified as novel mutations.
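The comparison step can be sketched as a set-membership test. This is a minimal sketch that assumes a mutation is fully described by chromosome, position, reference allele, and alternate allele, and that the database fits in an in-memory set; the `Mutation` type, the `find_novel` function, and all coordinates are invented for illustration.

```python
from typing import Iterable, List, NamedTuple, Set


class Mutation(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def find_novel(observed: Iterable[Mutation], known: Set[Mutation]) -> List[Mutation]:
    """Return the observed mutations that have no exact match in `known`."""
    return [m for m in observed if m not in known]


# Made-up entries standing in for a database of disease-associated mutations.
known_db = {
    Mutation("chr1", 1000, "A", "G"),
    Mutation("chr2", 2500, "C", "T"),
}

# Made-up mutations identified in a subject's nucleic acid.
subject_mutations = [
    Mutation("chr1", 1000, "A", "G"),  # already in the database
    Mutation("chr1", 1000, "A", "T"),  # same site, different allele
    Mutation("chr3", 42, "G", "GA"),   # not in the database
]

novel = find_novel(subject_mutations, known_db)
```

Because the match is exact, a different alternate allele at a known position counts as novel in this sketch; a real pipeline would also need to normalize how each mutation is represented before comparing.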

The present invention generally relates to storing sequence read data. The invention can involve obtaining a plurality of sequence reads from a sample, identifying one or more sets of duplicative sequence reads within the plurality of sequence reads, and storing only one of the sequence reads from each set of duplicative sequence reads in a text file using nucleotide characters.
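A minimal sketch of the storage step follows, assuming that "duplicative" means exact string identity and that one read per line is an acceptable text layout; neither assumption comes from the abstract, and the function name `store_unique_reads` is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def store_unique_reads(reads: Iterable[str], path: Path) -> int:
    """Write one copy of each distinct read to `path`, one per line.

    Returns the number of reads stored. First-seen order is preserved
    because dict keys keep insertion order.
    """
    unique = list(dict.fromkeys(reads))
    path.write_text("".join(read + "\n" for read in unique))
    return len(unique)


example_reads = ["ACGT", "TTGA", "ACGT", "ACGT", "TTGA", "GGCC"]
```

Calling `store_unique_reads(example_reads, Path("reads.txt"))` would write three lines to the file. If per-read counts matter downstream, they would have to be stored separately, since this sketch discards them.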

The invention relates to using a graph database in genetic analyses to link mutation data to extrinsic data. Entities such as mutations, patients, samples, alleles, and clinical information are individually represented and stored as nodes, and relationships between entities are also individually represented and stored. Each node and relationship can be stored using a fixed-size record, and nodes can be flexibly invoked to represent any entity without disrupting the existing data. Systems and methods of the invention may be used for obtaining data representing a mutation in an individual and using a node in a graph database to store a description of the mutation. The node has stored within it a pointer to an adjacent node that provides information about the clinical significance of the mutation. The graph database can be queried to provide a report of the clinical significance of the mutation.
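The node-and-pointer layout can be sketched with a toy in-memory store. Integer indexes stand in for pointers into fixed-size records, and each node keeps the index of its most recent relationship, which chains to the earlier ones. The class names, the relationship names `HAS_SIGNIFICANCE` and `CARRIES`, and the example values are all hypothetical; this is not the storage format of any particular graph database.

```python
from dataclasses import dataclass
from typing import List

NONE = -1  # sentinel index meaning "no record"


@dataclass
class Node:
    label: str             # entity kind, e.g. "mutation"
    value: str             # description of the entity
    first_rel: int = NONE  # pointer to this node's newest relationship


@dataclass
class Rel:
    kind: str
    start: int             # index of the source node
    end: int               # index of the adjacent node
    next_rel: int = NONE   # next relationship starting at `start`


class Graph:
    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self.rels: List[Rel] = []

    def add_node(self, label: str, value: str) -> int:
        self.nodes.append(Node(label, value))
        return len(self.nodes) - 1

    def link(self, start: int, end: int, kind: str) -> int:
        # Prepend to the start node's relationship chain.
        self.rels.append(Rel(kind, start, end, self.nodes[start].first_rel))
        self.nodes[start].first_rel = len(self.rels) - 1
        return self.nodes[start].first_rel

    def neighbors(self, start: int, kind: str) -> List[Node]:
        found, i = [], self.nodes[start].first_rel
        while i != NONE:  # follow pointers instead of scanning all records
            rel = self.rels[i]
            if rel.kind == kind:
                found.append(self.nodes[rel.end])
            i = rel.next_rel
        return found


def clinical_report(graph: Graph, mutation_id: int) -> str:
    mutation = graph.nodes[mutation_id]
    values = [n.value for n in graph.neighbors(mutation_id, "HAS_SIGNIFICANCE")]
    return f"{mutation.value}: {', '.join(values) or 'no significance recorded'}"


g = Graph()
mut = g.add_node("mutation", "example_variant_1")
sig = g.add_node("clinical_significance", "pathogenic")
g.link(mut, sig, "HAS_SIGNIFICANCE")
# A new entity type is added later without touching existing records.
patient = g.add_node("patient", "patient_001")
g.link(patient, mut, "CARRIES")
```

In this sketch `clinical_report(g, mut)` walks only the mutation node's own relationship chain, so adding the patient node afterwards leaves the report unchanged.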

