Northeast Structural Genomics Consortium

Piscataway, NJ, United States


Yang Y.,Miami University Ohio | Ramelot T.A.,Miami University Ohio | Cort J.R.,Northeast Structural Genomics Consortium | Cort J.R.,Pacific Northwest National Laboratory | And 18 more authors.
Journal of Structural and Functional Genomics | Year: 2011

Protein domain family YabP (PF07873) is a family of small protein domains that are conserved in a wide range of bacteria and involved in spore coat assembly during the process of sporulation. The 62-residue fragment of Dsy0195 from Desulfitobacterium hafniense, which belongs to the YabP family, exists as a homodimer in solution under the conditions used for structure determination using NMR spectroscopy. The structure of the Dsy0195 homodimer contains two identical 62-residue monomeric subunits, each consisting of five anti-parallel beta strands (β1, 23-29; β2, 31-38; β3, 41-46; β4, 49-59; β5, 69-80). The tertiary structure of the Dsy0195 monomer adopts a cylindrical fold composed of two beta sheets. The two monomer subunits fold into a homodimer about a single C2 symmetry axis, with the interface composed of two anti-parallel beta strands, β1-β1′ and β5b- β5b′, where β5b refers to the C-terminal half of the bent β5 strand, without any domain swapping. Potential functional regions of the Dsy0195 structure were predicted based on conserved sequence analysis. The Dsy0195 structure reported here is the first representative structure from the YabP family. © 2011 Springer Science+Business Media B.V.

Yang Y.,Miami University Ohio | Ramelot T.A.,Miami University Ohio | McCarrick R.M.,Miami University Ohio | Ni S.,Miami University Ohio | And 21 more authors.
Journal of the American Chemical Society | Year: 2010

There is a general need to develop more powerful and more robust methods for structural characterization of homodimers, homo-oligomers, and multiprotein complexes using solution-state NMR methods. In recent years, there has been increasing emphasis on integrating distinct and complementary methodologies for structure determination of multiprotein complexes. One approach not yet widely used is to obtain intermediate and long-range distance constraints from paramagnetic relaxation enhancements (PRE) and electron paramagnetic resonance (EPR)-based techniques such as double electron-electron resonance (DEER), which, when used together, can provide supplemental distance constraints spanning 10-70 Å. In this Communication, we describe integration of PRE and DEER data with conventional solution-state nuclear magnetic resonance (NMR) methods for structure determination of Dsy0195, a homodimer (62 amino acids per monomer) from Desulfitobacterium hafniense. Our results indicate that combining conventional NMR restraints with only one or a few DEER distance constraints and a small number of PRE constraints is sufficient for the automatic NMR-based structure determination program CYANA to build a network of interchain nuclear Overhauser effect constraints that can be used to accurately define both the homodimer interface and the global homodimer structure. The use of DEER distances as a source of supplemental constraints as described here has virtually no upper molecular weight limit, and utilization of the PRE constraints is limited only by the ability to make accurate assignments of the protein amide proton and nitrogen chemical shifts. © 2010 American Chemical Society.

Arbing M.A.,Columbia University | Arbing M.A.,University of California at Los Angeles | Handelman S.K.,Columbia University | Kuzin A.P.,Columbia University | And 21 more authors.
Structure | Year: 2010

Bacterial toxin-antitoxin (TA) systems serve a variety of physiological functions including regulation of cell growth and maintenance of foreign genetic elements. Sequence analyses suggest that TA families are linked by complex evolutionary relationships reflecting likely swapping of functional domains between different TA families. Our crystal structures of Phd-Doc from bacteriophage P1, the HigA antitoxin from Escherichia coli CFT073, and YeeU of the YeeUWV systems from E. coli K12 and Shigella flexneri confirm this inference and reveal additional, unanticipated structural relationships. The growth-regulating Doc toxin exhibits structural similarity to secreted virulence factors that are toxic for eukaryotic target cells. The Phd antitoxin possesses the same fold as both the YefM and NE2111 antitoxins that inhibit structurally unrelated toxins. YeeU, which has an antitoxin-like activity that represses toxin expression, is structurally similar to the ribosome-interacting toxins YoeB and RelE. These observations suggest extensive functional exchanges have occurred between TA systems during bacterial evolution. © 2010 Elsevier Ltd.

News Article | January 20, 2016

The data set analysed in this paper was culled from that described in our previous paper analysing correlations between amino acid sequence and protein expression/solubility levels (ref. 39). In brief, proteins were selected from a wide variety of source organisms based on structural uniqueness, meaning that no sequence with greater than 30% amino acid identity had an experimentally determined structure deposited into the Protein Data Bank at the time of selection. Compared with our earlier paper, we restricted the data set to non-redundant proteins encoded by genes that do not contain any codons affected by an alternative translation table in the source organism and that were expressed with a C-terminal LEHHHHHH tag. Homologous sequences were eliminated using an iterative procedure that reduced the level of amino acid sequence identity between any pair to less than 60%, which results in a lower level of nucleic acid sequence identity. At each step, all pairs of proteins sharing at least 60% amino acid sequence identity were transitively grouped together into a set, and the shortest sequence was eliminated from each set before reinitiating the same set-assignment procedure on all remaining proteins. The resulting data set included 6,348 genes from 171 organisms, as detailed in the cladogram in Extended Data Fig. 1 and Supplementary Data File 2. It contained 95 endogenous E. coli genes, including ycaQ that was examined in our follow-up biochemical experiments (Extended Data Fig. 6), and 6,253 genes from heterologous sources, including 47 from mammals, 809 from archaebacteria, and the remainder from 151 different eubacterial organisms. The methods used in our large-scale protein expression experiments were described in detail previously (refs 38, 51, 52), and they are similar to those described below for evaluation of protein expression in vivo except that induction was performed in 0.5-ml cultures in 96-well plates.
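The iterative homolog-elimination procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and `toy_identity` is a crude stand-in for a real alignment-based identity calculation, which the text does not specify. Tie-breaking between equally short sequences (by name) is also an assumption.

```python
from itertools import combinations


def toy_identity(a, b):
    """Hypothetical stand-in for an alignment-based identity score:
    fraction of matching positions relative to the longer sequence."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def reduce_redundancy(seqs, identity=toy_identity, threshold=0.60):
    """Repeat: group sequences transitively at >= threshold identity,
    drop the shortest member of every multi-member group, and regroup,
    until no pair reaches the threshold."""
    seqs = dict(seqs)  # name -> sequence; work on a copy
    while True:
        names = list(seqs)
        parent = {n: n for n in names}

        def find(n):
            # Union-find root lookup with path halving.
            while parent[n] != n:
                parent[n] = parent[parent[n]]
                n = parent[n]
            return n

        linked = False
        for a, b in combinations(names, 2):
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)  # transitive grouping
                linked = True
        if not linked:
            return seqs

        groups = {}
        for n in names:
            groups.setdefault(find(n), []).append(n)
        for members in groups.values():
            if len(members) > 1:
                shortest = min(members, key=lambda n: (len(seqs[n]), n))
                del seqs[shortest]


example = {"a": "AAAAAAAAAA", "b": "AAAAAAAAA", "c": "CCCCCCCC"}
kept = reduce_redundancy(example)
```

Each pass removes at least one sequence whenever any pair is still linked, so the loop terminates. The all-pairs comparison is quadratic in the number of sequences and is meant for illustration, not for a data set of several thousand genes.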
In brief, native genes for the 6,348 proteins were cloned with a C-terminal LEHHHHHH affinity tag under the control of the bacteriophage T7 promoter in pET21, a 5.4-kb pBR322-derived plasmid harbouring an ampicillin resistance marker (ref. 38). Protein expression (ref. 38) was induced overnight at 17 °C in E. coli strain BL21(DE3) growing in chemically defined medium containing glucose as a carbon source. The expression strain also contained pMGK (GenBank accession number KT203761), a 5.4-kb pACYC177-derived plasmid that harbours a kanamycin-resistance gene, a single copy of the lacI gene, and a single copy of the argU gene encoding the tRNA cognate to the rare AGA codon for Arg. As previously described, we scored the protein expression level from two transformants of the same plasmid on an integer scale from 0 (no expression) to 5 (highest expression), based on visual inspection of whole-cell lysates on Coomassie-blue-stained SDS–PAGE gels. There is an unmistakable difference between the 0 and 5 expression scores used for most of the analyses reported in this paper. A score of 5 indicates the target protein was the most abundant protein expressed in the cell, while a score of 0 indicates it was undetectable against the background of cellular proteins. The reproducibility of the integer scores in our large-scale data set was excellent, as analysed in detail previously (ref. 39). There was no difference between all measurements for over 70% of the genes and a maximum difference of one unit between all measurements for over 80% of the genes. When replicates gave different scores, the maximum score was used, because most sources of experimental error tend to reduce expression score, and bellwether analyses reported in our previously published paper (ref. 39) showed a small increase in the significance of correlations when using maximum rather than mean score.
Our binary multi-parameter logistic regression model gives θ, the logarithm of the ratio of the probabilities of obtaining the highest level of protein expression ($P_5$) versus none ($P_0$) from an mRNA sequence in the large-scale data set, as a linear function of generalized variables $x_i$:
$$\theta = \ln\left(\frac{P_5}{P_0}\right) = A + \sum_i \beta_i x_i$$
The probability π of obtaining the highest level (E = 5) versus no (E = 0) protein expression from a given sequence is therefore given by:
$$\pi = \frac{P_5}{P_5 + P_0} = \frac{e^{\theta}}{1 + e^{\theta}}$$
Note that, to capture nonlinear relationships between mRNA sequence parameters and outcome, the generalized variables $x_i$ can represent mathematical functions of mRNA sequence parameters as well as those parameters themselves. We used the R statistics program (ref. 53) to compute the most probable values of the model parameters ($A$, $\beta_i$). Logistic-regression slopes $\beta_i > 0$ indicate that the probability of high expression increases as the associated variable increases in numerical value. (Note that, because ΔG increases in numerical value as folding stability decreases, a positive slope for free-energy terms indicates an increase in the probability of high expression as predicted folding stability decreases, while a negative slope for these terms indicates an increase in the probability of high expression as predicted folding stability increases.) Our final model, which we call model M (Extended Data Table 1a and Fig. 4), is given in the main text, and the codon slopes β from this model are depicted in Fig. 3a. In principle, the probability of high protein expression can be increased by manipulating mRNA sequence properties to maximize the value of θ and thus π in the equations above using the parameters ($A$, $\beta_i$) from model M. Inclusion of parameters was guided by the likelihood ratio test in conjunction with the AIC (ref. 54), a standard measure of whether an improvement in model quality exceeds that expected at random from increasing the number of degrees of freedom in the model.
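As a minimal sketch of the logistic relationship described above, the following Python computes the log-odds θ from a constant term and a set of slopes, then converts it to the probability π of the highest versus no expression. The function names and the numerical values in the example are hypothetical and are not parameters of model M.

```python
import math


def log_odds(A, betas, xs):
    """theta = A + sum_i beta_i * x_i (linear in the generalized variables)."""
    return A + sum(b * x for b, x in zip(betas, xs))


def prob_high_expression(theta):
    """pi = e^theta / (1 + e^theta), written in the equivalent form
    1 / (1 + e^-theta)."""
    return 1.0 / (1.0 + math.exp(-theta))


# Made-up constant, slopes, and variable values, for illustration only.
theta = log_odds(1.0, [2.0, -1.0], [0.5, 1.0])
pi = prob_high_expression(theta)
```

A positive slope raises θ, and therefore π, as its variable increases, which matches the interpretation of slope signs given in the text.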
The likelihood ratio χ2 (LR χ2) is asymptotic to the χ2 distribution and defined as the reduction in the deviance D of the observed data from the predictions of the model compared to the null model containing just the constant term A (in the first equation above), while the AIC is given by the LR χ2 minus two times the number of degrees of freedom. The deviance is defined as:
$$D = -2\sum_{j=1}^{n}\left[E_j \ln \pi_j + (1 - E_j)\ln(1 - \pi_j)\right]$$
This sum is conducted over the n = 3,727 proteins giving expression scores of 5 or 0 among the 6,348 in the large-scale protein expression data set, and the logistic variable $E_j$ assumes values of 1 or 0 if protein ‘j’ is expressed at the E = 5 or E = 0 levels, respectively. The variable $\pi_j = \pi(\theta_j)$ gives the predicted probability of obtaining expression of protein ‘j’ at the E = 5 rather than E = 0 level according to the equations given above describing the multi-parameter binary logistic model. For the data set analysed in this paper, the deviance has values of 5,154 and 3,952 for the null model and our final model M, respectively (Extended Data Table 1a). In addition to using the AIC, we ensured that the final model is not over-fit via bootstrapping with replacement 1,000 times using the RMS package (ref. 55). This validation procedure is considered more robust than splitting the data set into training and test sets, which requires very careful selection of the test set.
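The deviance and likelihood-ratio arithmetic described above can be sketched in Python as follows. The function names are hypothetical, the AIC follows the convention stated in the text (LR χ2 minus twice the degrees of freedom), and the degrees-of-freedom value in the example is a placeholder because the text does not give it for model M.

```python
import math


def deviance(outcomes, probs):
    """D = -2 * sum_j [E_j ln(pi_j) + (1 - E_j) ln(1 - pi_j)],
    with outcomes E_j in {0, 1} and predicted probabilities pi_j."""
    return -2.0 * sum(
        math.log(p) if e == 1 else math.log(1.0 - p)
        for e, p in zip(outcomes, probs)
    )


def lr_chi2(null_deviance, model_deviance):
    """Reduction in deviance relative to the constant-only null model."""
    return null_deviance - model_deviance


def aic_as_defined(lr, degrees_of_freedom):
    """AIC in the text's convention: LR chi2 minus 2 * degrees of freedom."""
    return lr - 2 * degrees_of_freedom


# Deviances quoted in the text for the null model and model M.
lr_model_m = lr_chi2(5154, 3952)

# Placeholder degrees of freedom (an assumption, not from the source).
example_aic = aic_as_defined(lr_model_m, 10)
```

With the quoted deviances, the LR χ2 for model M comes out to 1,202.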
The sequence parameters explored in the course of model development (Extended Data Table 1 and additional data not shown) included the length of the gene, the individual codon frequencies in-frame or out-of-frame in the entire gene, the individual codon frequencies in-frame calculated separately in the head and the tail or in the first and second halves of the coding sequence, di-codon frequencies, the statistical entropy of the codon sequence, the codon and amino acid repetition rates (defined below), the frequencies of the nucleotide bases at each codon position in the entire gene and in defined windows within its sequence, and a variety of predicted mRNA-folding energy parameters including those shown in Fig. 1 and Extended Data Fig. 2, which were evaluated individually and as statistical aggregates. The codon repetition rate $r_{c}$ and amino acid repetition rate $r_{aa}$ are defined as $\langle d^{-1} \rangle$, where $d$ is the distance at every position in the sequence to the next occurrence of the same species moving towards the 3′ end of the gene. The value of $d^{-1}$ is set to zero if the codon or amino acid does not occur again, so the value of $r_{aa}$ for the protein sequence LRPRL is the average of (1/4, 1/2, 0, 0, 0), which is 0.15. The sequence of the C-terminal LEHHHHHH affinity tag was omitted from all computational analyses to avoid biasing statistics on its constituent amino acids and codons. Because this sequence is present in every gene included in our large-scale protein expression data set, it cannot directly influence outcome on its own and can only have an influence via differential interaction with other sequence features. No evidence of such interactions was detected in bellwether analyses including the tag sequence, so it was omitted in the final analyses reported in this paper. The number of degrees of freedom for codon variables is one fewer than the number of non-stop codons because their frequencies $f_c$ in a sequence must sum to 1 (that is, $\sum_c f_c = 1$).
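The repetition rate defined above is easy to check with a short Python sketch. The function names are hypothetical; the same routine works for amino acid strings and for lists of codons, and the codon sequence in the example is an arbitrary illustration.

```python
def repetition_rate(symbols):
    """Mean of 1/d over all positions, where d is the distance to the next
    occurrence of the same symbol towards the 3' (right-hand) end and
    1/d is taken as 0 when the symbol does not occur again."""
    n = len(symbols)
    if n == 0:
        return 0.0
    total = 0.0
    next_pos = {}  # symbol -> index of its nearest occurrence to the right
    for i in range(n - 1, -1, -1):
        s = symbols[i]
        if s in next_pos:
            total += 1.0 / (next_pos[s] - i)
        next_pos[s] = i
    return total / n


def split_codons(nucleotides):
    """Split an in-frame coding sequence into codons, ignoring any
    trailing partial codon."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return [nucleotides[i:i + 3] for i in range(0, usable, 3)]


r_aa = repetition_rate("LRPRL")
r_c = repetition_rate(split_codons("CTGCGTCCGCGTCTG"))
```

For LRPRL the per-position terms are (1/4, 1/2, 0, 0, 0), reproducing the worked value of 0.15 from the text.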
Therefore, for the analyses shown in Figs 3 and 4, we removed ATG, effectively constraining its slope to be zero (that is, $\beta_{ATG} = 0$) and its contribution to the model to be absorbed into the constant A. The inclusion of the mean codon-slope variables (the s terms) in model M uniformly reduces the individual codon slopes β to ~86% of their values when no mean-slope terms are included in the model, reflecting the disproportionate influence of codons near the 5′ terminus compared to those in the rest of the gene (Extended Data Fig. 6). We tested expanded codon models including the next base or the previous base in addition to the in-frame codon, but these were rejected based on the AIC and bootstrap validation criteria described above. We also examined introducing additional variables into model M (Extended Data Table 1b and additional data not shown). Adding the mean value of the predicted free energy of mRNA folding in the tail does not significantly improve the model, even though unstable folding in the tail correlates with reduced protein expression (Fig. 1g, h). Therefore, this correlation as well as those of the overall A, T, G and C content in the gene (Extended Data Fig. 2a–e) are captured more effectively by the cross-correlated sequence parameters (Extended Data Figs 3 and 4) that are included in the model, suggesting that these other parameters are more influential mechanistically. Adding the mean slope of codons 2–6 does not produce a statistically significant improvement, and using this term instead of the base-composition terms in this region yields inferior results, consistent with the analyses shown in Extended Data Fig. 5. Finally, adding the frequency of the Shine–Dalgarno consensus AGGA in any frame (f in Extended Data Fig. 2i, j and Extended Data Table 1b) fails to produce a statistically significant improvement.
We also used the Bindigo program to compute the binding energy of all hexamer sequences in a gene with the anti-Shine–Dalgarno sequence CACCUCCU, and neither the minimum nor the average value of the predicted free energy of hybridization to the anti-Shine–Dalgarno sequence has any correlation with protein expression level in our large-scale data set (Extended Data Table 1b).
In the 6AA method, codons for six amino acids were changed to the single codon specified in Extended Data Table 2, which has a larger slope than that of any synonymous codon in our single-parameter binary logistic regression analyses (dark grey symbols in Fig. 3a). Although no explicit free energy optimization was performed with the 6AA method, it produced genes in which the predicted free energies of mRNA folding were more favourable than those in the naturally occurring starting sequences. In the 31C-FO method, predicted mRNA-folding energy was optimized while selecting codons from the 31 listed in Extended Data Table 2, which have slopes greater than zero in our single-parameter binary logistic regression analyses (dark grey symbols in Fig. 3a). The predicted free energy of folding of the head plus 5′-UTR (ΔG) was maximized numerically (that is, to yield the least stable folding), while the predicted free energy of the folding in the tail was optimized to be near −10 kcal mol−1 in windows of 48 nucleotides. The 31C-FD method used the same set of codons to produce genes in which the predicted free energy of folding was minimized numerically (that is, to yield the most stable folding).
The E. coli strain DH5α was used for cloning. Expression experiments used E. coli strain BL21(DE3) pMGK (ref. 38). Ampicillin was added at 100 μg ml−1 for cultures harbouring pET21-based plasmids. Kanamycin was added at 25 μg ml−1 to maintain the pMGK plasmid.
Bacterial growth for protein expression and northern blot experiments employing pET21-based plasmids was performed using the same medium and conditions that were used to generate our high-throughput protein-expression data set (ref. 38) (that is, MJ9 minimal medium (ref. 56) with 250 r.p.m. agitation at 37 °C before induction at 17 °C). The pET-21 clones of the genes APE_0230.1 (Aeropyrum pernix K1), RSP_2139 (Rhodobacter sphaeroides), SRU_1983 (Salinibacter ruber), SCO1897 (Streptomyces coelicolor) and ycaQ (E. coli) were obtained from the protein-production laboratory of the Northeast Structural Genomics Consortium at Rutgers University (NESG targets Xr92, RhR13, SrR141, RR162 and ER449, respectively). The DNAs encoding the 6AA and 31C-FO /31C-FO variants of the genes were synthesized by GenScript. The head variants 31C-FO and 31C-FO were generated by PCR amplification using long forward primers containing an NcoI restriction site, the new head sequence, and a sequence complementary to the downstream region in the target gene. A plasmid containing the starting construct was used as DNA template for PCR amplification using the corresponding long forward primers and a reverse primer hybridizing at the 3′ end of the target gene including the XhoI restriction site. The resulting PCR products were cloned using the In-Fusion kit (Clontech) into a pET-21 derivative linearized with NcoI and XhoI. The full protein-coding sequence in every plasmid was verified by DNA sequencing (Genewiz and Eton Bioscience) and corrected when necessary using the QuikChange II Site-Directed Mutagenesis kit (Agilent Technologies). The wild-type and 31C-FO /31C-FO (31C-FO / ) genes for SRU_1983, APE_0230.1 and E. coli YcaQ were re-cloned into a pBAD expression plasmid (Life Technologies) with a C-terminal hexa-histidine tag for transcription by the native E. coli RNA polymerase under control of an arabinose-inducible promoter; these experiments yielded similar results (Extended Data Fig. 6e, f) to those shown for the same genes under T7 polymerase control in a pET plasmid (Fig. 5 and Extended Data Fig. 6a–d). DNA sequences of the final constructs are provided in Supplementary Data File 3.
Overnight cell growth was measured by transferring 200 μl of each induced culture to a 96-well sterile plate (Greiner Bio-One) and covering each well with 50 μl of sterile paraffin oil. A negative control non-induced sample was loaded for each wild-type target. Duplicate wells were measured for each sample. Plates were loaded into a plate reader (Biotek Synergy) at room temperature and shaken for 30 s. An initial A reading was taken, followed by 30 min of shaking until the next absorbance reading. Readings were repeated at 30-min intervals during 9 h of cell growth.
Starting cultures from a single colony were inoculated into 6 ml of LB medium containing 100 μg ml−1 of ampicillin and 30 μg ml−1 of kanamycin. Cultures were grown at 37 °C until highly turbid (4–6 h), then 40 μl was used to inoculate 2 ml of MJ9 chemically defined medium (ref. 56). This MJ9 pre-culture was grown overnight at 37 °C. The next day, A readings were taken of a 1:10 dilution of the turbid MJ9 pre-culture. This reading was used to calculate the volume of pre-culture necessary to normalize all cell samples to a starting culture density of 0.1 A in 6 ml of fresh medium. The reinoculated culture was grown at 37 °C until A reached 0.5–0.7. Cells were then induced with 1 mM IPTG, with one duplicate tube for each wild-type gene not induced to serve as a negative control. After induction, two 200-μl samples of each culture were removed and placed into a sterile 96-well plate to monitor cell growth rate (see above). The remaining 5.6 ml of induced samples were then transferred to 17 °C and shaken overnight. The next day, samples were removed from the shaker, placed on ice, and final A was measured.
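The pre-culture normalization step above is a simple dilution calculation, sketched here in Python. The function name and default arguments are hypothetical, and the sketch assumes that absorbance scales linearly with cell density and that the 6 ml refers to the final culture volume; the text does not state either point explicitly.

```python
def preculture_volume_ml(diluted_reading, dilution_factor=10.0,
                         target_density=0.1, final_volume_ml=6.0):
    """Volume of pre-culture to add so that the new culture starts at
    target_density, given the absorbance of a diluted pre-culture sample."""
    # Back-calculate the density of the undiluted pre-culture.
    undiluted_density = diluted_reading * dilution_factor
    # C1 * V1 = C2 * V2, solved for V1.
    return target_density * final_volume_ml / undiluted_density


# Example: a 1:10 dilution that reads 0.3 implies an undiluted density of 3.0.
volume = preculture_volume_ml(0.3)
```

In this example about 0.2 ml of pre-culture would be brought to 6 ml with fresh medium.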
Cells were centrifuged in 14-ml round-bottom Falcon tubes at 5,300g for 10 min, and the pellets were resuspended in 1.2 ml of lysis buffer (30 mM NaCl, 10 mM 2-mercaptoethanol, 50 mM NaH2PO4, pH 8.0) and then transferred to 1.5-ml Eppendorf tubes on ice. Lysis was accomplished by sonication on ice, using a 40 V setting (~12 W pulse) and pulsing for 1 s followed by a 2 s rest, for a total of 40 pulses. Then 120 μl of each lysed culture was mixed with 40 μl of 4× Laemmli buffer, and samples were analysed using SDS–PAGE (Bio-Rad, Ready Gel, 15% Tris-HCl), with Bio-Rad Precision Plus All Blue Standard markers. Final A measurements were used to calculate the load volume for each individual sample, normalizing all samples to the density of the least turbid of each unique target. We verified the integrity of the plasmids after growth and induction by DNA sequencing (Genewiz and Eton Bioscience). Every result was confirmed by repeating the experiment.
Conducting experiments at physiological protein expression levels (Extended Data Fig. 6e, f) required considerable changes in methods compared to the experiments conducted in pET vectors that were used to generate our large-scale protein-expression data set and the data shown in Fig. 5 and Extended Data Figs 6a, b and 7. Because mRNA expression from IPTG-controlled promoters tends to occur in an all-or-none fashion (refs 60, 61), it is not practical to control the level of mRNA expressed from pET vectors. Therefore, we re-cloned three pairs of synonymous native and codon-optimized 31C-FO / genes with C-terminal hexahistidine tags under control of the arabinose-inducible promoter in a pBAD vector (ref. 62), which provides a more gradual increase in expression as arabinose concentration is raised. This promoter drives transcription using the endogenous E. coli RNA polymerase rather than T7 RNA polymerase, which is employed by the pET vectors used for all other expression experiments reported in this paper.
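The gel load-volume normalization mentioned above can be illustrated with a short Python sketch. The function name, the maximum load volume and the absorbance values are hypothetical, and the sketch assumes that the amount of material loaded should scale inversely with final culture absorbance; the text does not give the exact formula used.

```python
def load_volumes_ul(final_absorbance, max_volume_ul=10.0):
    """Scale each sample's load volume so that every lane receives material
    equivalent to the least turbid culture of the same target.

    final_absorbance: dict mapping sample name -> final absorbance reading.
    The least turbid sample is loaded at max_volume_ul; denser samples
    are loaded at proportionally smaller volumes.
    """
    lowest = min(final_absorbance.values())
    return {
        name: max_volume_ul * lowest / reading
        for name, reading in final_absorbance.items()
    }


# Made-up readings for a wild-type and an optimized variant of one target.
volumes = load_volumes_ul({"wild_type": 2.0, "optimized": 1.0})
```

Here the denser wild-type culture would be loaded at half the volume of the optimized culture.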
Because transcription from the arabinose promoter is repressed by glucose, which is the carbon source in the chemically defined MJ9 medium used for our pET experiments, we instead used LB as the growth medium for pBAD experiments, which were conducted in BL21 pMGK cells (that is, an isogenic E. coli strain except for the removal of the λ(DE3) prophage carrying the gene for T7 RNA polymerase). Furthermore, because the arabinose inducer can be depleted during long growth periods, we evaluated expression after relatively short 1–4 h induction times during log-phase growth rather than after overnight growth into stationary phase, which was used for our pET experiments. We also changed the growth temperature during induction from 17 °C for pET experiments to 37 °C for pBAD experiments. Non-induced controls were grown in medium containing 0.4% glucose (+Glc). When the A of the cultures reached 0.6, transcription of the target genes was induced for 1 h using final arabinose concentrations of 0.001% (w/v) for APE_0230.1 and 0.01% (w/v) for SRU_1983 and E. coli YcaQ (+Ara).
The pET21 plasmids containing optimized or unoptimized inserts were digested with BlpI, phenol–chloroform purified, and concentrated by ethanol precipitation. From the digested samples, 2 μg was added to the RiboMax kit (Promega), and in vitro transcription with bacteriophage T7 RNA polymerase was conducted according to the manufacturer’s protocol. Upon completion of the reaction, samples were treated with DNase (Promega), isopropanol precipitated, and resuspended in RNA Storage Solution (Ambion). Transcript size and purity were verified by agarose gel electrophoresis with ethidium bromide staining. For kinetic analyses, 20-μl reactions with T7 polymerase were assembled and started by addition of 1 μg of template DNA. A 4.5-μl sample of each reaction was removed at 0-, 5-, 10- and 30-min time points for analysis on denaturing formaldehyde-agarose gels. Each experiment was conducted at least twice.
In vitro translation assays of the purified mRNAs were performed with the PURExpress system (New England Biolabs) using L-[35S]methionine premium (PerkinElmer). Each 25-μl reaction contained 10 μl of solution A, 7.5 μl of solution B and 2 μl of [35S]methionine (10 μCi). The reactions were started by adding 2 μl of purified mRNA (4 μg μl−1) and incubating at 37 °C. Aliquots of 5 μl were withdrawn from the reactions at 15, 30, 60 and 90 min, and translation was stopped by adding 10 μl of 2× Laemmli buffer and heating for 2 min at 60 °C. Then 14 μl of each aliquot was run on a 4–20% SDS–PAGE gel (Bio-Rad) with Bio-Rad Precision Plus All Blue Standard markers. The gel was dried on Whatman filter paper and subjected to autoradiography. Each reaction was repeated at least twice.
The northern blot probe was designed as the reverse complement of the 71 nucleotides of the 5′-UTR of the pET21 vector, and it was synthesized by Eurofins. The probe was labelled with biotin using the BrightStar Psoralen-Biotin Nonisotopic Labelling Kit. BL21(DE3) pMGK E. coli containing the plasmid of interest were grown overnight in LB at 37 °C with shaking. Cultures were diluted 1:50 into MJ9 medium and grown overnight at 37 °C with shaking. The next day, the cultures were diluted to an A of 0.15 in MJ9 medium and allowed to grow to an A of 0.6–0.7 before induction with 1 mM IPTG. Samples were taken at the indicated time points and RNAs were stabilized in two volumes of RNAProtect Bacteria Reagent. After pelleting, samples were lysozyme digested (15 mg ml−1) for 15 min, and RNAs were purified using the Direct-zol RNA Miniprep Kit and TRI-Reagent. Approximately 1–2 μg of total RNA per sample was separated on a 1.2% formaldehyde-agarose gel in MOPS-formaldehyde buffer. RNA integrity was verified by ethidium bromide staining. RNA was then transferred to a positively charged nylon membrane using downward capillary transfer with an alkaline transfer buffer (1 M NaCl, 10 mM NaOH, pH 9) for 2 h at room temperature.
RNAs were crosslinked to the membrane using 1,200 μJ ultraviolet irradiation (Stratalinker). Membranes were pre-hybridized in Ultrahyb hybridization buffer for 1 h at 42 °C in a hybridization oven. Heat-denatured, biotin-labelled probe was then added to 10–20 pM final concentration and hybridized overnight at 42 °C. Membranes were washed twice in buffer (0.2× SSC, 0.5% SDS), and probe signal was detected using the BrightStar BioDetect kit, as per protocol, via exposure to film. Each northern blot experiment was repeated at least twice.
E. coli MG1655 cells were cultured in M9 minimal medium containing 0.4% glucose to a final A of 1.0. Cells were treated with RNA Protect Bacteria Reagent (Qiagen), and RNA extracted using the RNeasy Mini Kit (Qiagen) was reverse-transcribed using SuperScript II Reverse Transcriptase (Invitrogen) followed by treatment with RNaseH (Invitrogen) and RNaseA (EpiCentre). The resulting cDNA preparation was purified using the MinElute Purification Kit (Qiagen) and then fragmented into 50–200-bp fragments using DNaseI (EpiCentre). Biotinylation was performed with Terminal Deoxynucleotidyl Transferase (New England Biolabs) and Biotin-N6-ddATP (Enzo Life Sciences). Biotinylated cDNA was hybridized on Affymetrix E. coli 2.0 arrays by the Gene Expression Center at the University of Wisconsin Biotechnology Center. Raw data (.cel) files were analysed using the RMA (Robust Multi-chip Average) algorithm in the Affymetrix Expression Console.
All predicted proteins in the version of the genome in the Ecocyc database (ref. 57) were analysed using the programs LipoP (ref. 58) and TMHMM (ref. 59), and those without a predicted transmembrane helix or a predicted signal peptide were classified as cytoplasmic proteins and included in the analyses in Fig. 6.
We analysed the data sets published previously (ref. 44) in which RNA-seq was used to quantify global mRNA levels as a function of time after treatment of either exponential or early stationary phase cultures with the transcription-initiation inhibitor rifampicin. To avoid potential complications arising from the encoding of multiple proteins in polycistronic transcripts, we limited our analyses to monocistronic transcripts, which constituted 76% and 82% of the mRNAs for which lifetimes were measured in exponential and stationary phase, respectively. The analyses presented in Fig. 6c, d were also limited to predicted cytoplasmic proteins to avoid possible biases from systematically lower expression of integral membrane proteins and secreted proteins. The set of genes for which Chen et al. (ref. 44) were able to measure lifetime is strongly biased towards more abundant mRNAs, and the measured lifetimes in both the exponential and stationary phase data sets are also strongly correlated with steady-state concentrations (data not shown).

Feldmann E.A.,Miami University Ohio | Seetharaman J.,Northeast Structural Genomics Consortium | Seetharaman J.,Columbia University | Ramelot T.A.,Miami University Ohio | And 19 more authors.
Journal of Structural and Functional Genomics | Year: 2012

The protein Pspto-3016 is a 117-residue member of the protein domain family PF04237 (DUF419), which is to date a functionally uncharacterized family of proteins. In this report, we describe the structure of Pspto-3016 from Pseudomonas syringae solved by both solution NMR and X-ray crystallography at 2.5 Å resolution. In both cases, the structure of Pspto-3016 adopts a "double wing" α/β sandwich fold similar to that of protein YjbR from Escherichia coli and to the C-terminal DNA binding domain of the MotA transcription factor (MotCF) from T4 bacteriophage, along with other uncharacterized proteins. Pspto-3016 was selected by the Protein Structure Initiative of the National Institutes of Health and the Northeast Structural Genomics Consortium (NESG ID PsR293). © 2012 Springer Science+Business Media B.V.

Aramini J.M.,Rutgers University | Tubbs J.L.,Scripps Research Institute | Kanugula S.,Scripps Research Institute | Rossi P.,Rutgers University | And 17 more authors.
Journal of Biological Chemistry | Year: 2010

Alkyltransferase-like proteins (ATLs) are a novel class of DNA repair proteins related to O6-alkylguanine-DNA alkyltransferases (AGTs) that tightly bind alkylated DNA and shunt the damaged DNA into the nucleotide excision repair pathway. Here, we present the first structure of a bacterial ATL, from Vibrio parahaemolyticus (vpAtl). We demonstrate that vpAtl adopts an AGT-like fold and that the protein is capable of tightly binding to O6-methylguanine-containing DNA and disrupting its repair by human AGT, a hallmark of ATLs. Mutations of the highly conserved residues Tyr23 and Arg37 demonstrate their critical roles in a conserved mechanism of ATL binding to alkylated DNA. NMR relaxation data reveal a role for conformational plasticity in the guanine-lesion recognition cavity. Our results provide further evidence for the conserved role of ATLs in this primordial mechanism of DNA repair.

Zhao L.,Johnson University | Zhao K.Q.,Promega Corporation | Hurst R.,Promega Corporation | Slater M.R.,Promega Corporation | And 10 more authors.
Journal of Structural and Functional Genomics | Year: 2010

Wheat germ cell-free methods provide an important approach for the production of eukaryotic proteins. We have developed a protein expression vector for the TNT® SP6 High-Yield Wheat Germ Cell-Free (TNT WGCF) expression system (Promega) that is also compatible with our T7-based Escherichia coli intracellular expression vector pET15-NESG. This allows cloning of the same PCR product into either one of several pET-NESG vectors and this modified WGCF vector (pWGHisAmp) by In-Fusion LIC cloning (Zhu et al. in Biotechniques 43:354-359, 2007). Integration of these two vector systems allowed us to explore the efficacy of the TNT WGCF system by comparing the expression and solubility characteristics of 59 human protein constructs in both WGCF and pET15-NESG E. coli intracellular expression. While only 30% of these human proteins could be produced in soluble form using the pET15-NESG based system, some 70% could be produced in soluble form using the TNT WGCF system. This high success rate underscores the importance of eukaryotic expression host systems like the TNT WGCF system for eukaryotic protein production in a structural genomics sample production pipeline. To further demonstrate the value of this WGCF system in producing protein suitable for structural studies, we scaled up, purified, and analyzed by 2D NMR two 15N-, 13C-enriched human proteins. The results of this study indicate that the TNT WGCF system is a successful salvage pathway for producing samples of difficult-to-express small human proteins for NMR studies, providing an important complementary pathway for eukaryotic sample production in the NESG NMR structure production pipeline. © 2010 The Author(s).
