News Article | May 16, 2017
A study published in Nature Communications* shows that breast cancer cells pass through a stiffened state before acquiring malignant features and becoming invasive. The discovery, made by a research team led by Florence Janody of the Instituto Gulbenkian de Ciencia (IGC; Portugal), identifies a new signal in tumor cells that can be explored further when designing cancer-targeting therapies. Breast cancer progresses through several stages, from a benign lesion to an invasive carcinoma, possibly with metastasis. However, only 20 to 50% of benign tumors end up as invasive cancer. Predicting which lesions fall within this group could allow therapeutics to be better matched to the severity of the disease. Florence Janody's group has been looking for signals inside cells that could help predict which benign tumors will progress to invasive carcinoma. Their attention focuses on the cell skeleton, the cytoskeleton, an intricate network of fibers that can either exert or resist forces and that may influence tumor invasion and malignancy. These fibers can be organized into distinct architectures that give cells a more rigid or a softer structure. "Previously, it had been shown that cancer cell invasion requires cell softening. What we observed now is that prior to becoming invasive cells undergo a transient stiffening state caused by the accumulation of cytoskeleton fibers," explains Sandra Tavares, first author of this study. The research team discovered that cell stiffening induces the activity of proteins that promote cell proliferation, driving the growth of benign tumors. Most importantly, this rigid state also triggers the subsequent progression into invasive cancer. The proteins involved in this mechanism were identified through studies of a human breast cell line that recapitulates the multistep development of breast cancer, and of breast cancer biopsies.
The importance of these proteins for the formation of tumors was further confirmed in the fruit fly. Florence Janody says: "Our work adds an important piece to the intricate puzzle of breast tumor progression. Knowing what happens inside the cell before a cell becomes pre-invasive and acquires malignant features may help us predict, in the future, which tumors may result in metastasis. Also, it may help designing therapeutics better tailored for each type of lesion." This study was conducted at the IGC in collaboration with the Instituto de Investigacao e Inovacao em Saude (i3S), Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP), Faculty of Medicine of the University of Porto (Portugal), Biotechnology Center from the Technische Universitat Dresden (Germany), and Ophiomics - Precision Medicine (Portugal). Fundacao para a Ciencia e a Tecnologia, Liga Portuguesa contra o Cancro/Pfizer and Laco Grant in Breast Cancer 2015 funded this research. *Tavares, S., Vieira, A.F., Taubenberger, A.V., Araujo, M., Martins, N.P.S., Bras-Pereira, C., Polonia, A., Herbig, M., Barreto, C., Otto, O., Cardoso, J., Pereira-Leal, J.B., Guck, J., Paredes, J., Janody, F. (2017) Actin stress fiber organization promotes cell stiffening and proliferation of pre-invasive breast cancer cells. Nature Communications. DOI: 10.1038/NCOMMS15237
Mellbin L.G.,Karolinska Institutet |
Ryden L.,Karolinska Institutet |
Brismar K.,Karolinska Institutet |
Morgenthaler N.G.,Biotechnology Center |
And 2 more authors.
Diabetes Care | Year: 2010
OBJECTIVE - To determine whether C-terminal provasopressin (copeptin) explains the prognostic importance of insulin-like growth factor binding protein-1 (IGFBP-1) in patients with myocardial infarction and type 2 diabetes. RESEARCH DESIGN AND METHODS - Copeptin and IGFBP-1 were analyzed in 393 patients participating in the Diabetes Mellitus Insulin-Glucose Infusion in Acute Myocardial Infarction (DIGAMI) 2 trial. RESULTS - Copeptin was associated with IGFBP-1 (Spearman rank correlation test, r = 0.53; P < 0.001). During follow-up there were 138 cardiovascular events (cardiovascular death, myocardial infarction, and stroke). In univariate Cox proportional hazard regression analyses both biomarkers were predictors of events: the hazard ratio for log copeptin was 1.59 (95% CI 1.41-1.81; P < 0.001) and for log IGFBP-1 was 1.49 (95% CI 1.26-1.77; P < 0.001). In the final model, adjusting for age and renal function, copeptin was the only independent predictor (hazard ratio 1.35 [95% CI 1.16-1.57]; P < 0.001). CONCLUSIONS - Copeptin is an independent predictor of cardiovascular events and appears to at least partly explain the prognostic impact of IGFBP-1 in patients with type 2 diabetes and myocardial infarction. Copeptin may be a pathogenic factor to address to improve outcome in these patients. © 2010 by the American Diabetes Association.
News Article | October 28, 2016
The Pennsylvania Biotechnology Center of Bucks County is pleased to announce that it has been awarded a $4.7 million federal grant and expects to secure the needed matching funds that will clear the way for a desperately needed $10.6 million expansion. The grant from the U.S. Economic Development Administration was the key to funding the expansion plan at the biotech incubator just outside of Doylestown, which has been in the planning stage for years. The expansion will add 38,000 square feet of space, including at least 40 offices, 15 laboratories, 80 freezer spaces, a new auditorium and a new cafeteria. It also will add more than 100 parking spaces to the parking lot outside the building on Old Easton Road, which frequently is filled to capacity early each workday. Supported by local, county and state officials, as well as the entrepreneurs and business leaders at the Biotech Center, the expansion will enable the center to admit new tenants that have lingered on a waiting list for space at the fully occupied center. “I am delighted that the Pennsylvania Biotechnology Center will begin its much-needed expansion,” said Pennsylvania State Sen. Chuck McIlhinney, a longtime supporter of the center. “They have taken that almost-empty warehouse and turned it into a facility with more than 300 scientists and entrepreneurs. That they need more space says a lot about their success and this community.” A study released in March found that the PA Biotechnology Center has created 727 jobs and spurred more than $1.8 billion in economic impact in Bucks County and throughout Pennsylvania over the past three years. Chief Operating Officer Lou Kassa said the expansion will boost that impact even further. “The PA Biotechnology Center continues to fulfill its mission, which is to bring knowledge-economy jobs to the heart of Bucks County,” Kassa said. 
“This expansion enables us to act as a catalyst for even more success by welcoming additional biotech companies to the center, thereby creating jobs, fostering entrepreneurship, and facilitating discoveries that may well change the world.” The PA Biotechnology Center employs an unusual model in which an anchor, nonprofit organization – in this case, the Baruch S. Blumberg Institute of the Hepatitis B Foundation, dedicated to drug and diagnostics discovery for liver disease – actively spins out and attracts new companies and innovative scientists. To date, that model has resulted in 325 jobs directly associated with the center, 237 additional indirect jobs in Bucks County and 165 jobs elsewhere in Pennsylvania, the economic impact survey found. Additionally, the companies located at the center are collectively valued at more than $1.2 billion. “For too long, we’ve had to say ‘no vacancy’ to world-class scientists and entrepreneurs who wanted to join us and be part of the collaborative culture that we’ve built here at the PA Biotechnology Center,” said Dr. Timothy Block, President and Co-founder of the biotech center and president of the Hepatitis B Foundation and the Baruch S. Blumberg Institute. “Starting today, we can tell them ‘Yes, we will have a place for you soon.’ This is the start of something incredible.” Managed by the Baruch S. Blumberg Institute and led by a board appointed by the Hepatitis B Foundation, the nonprofit biotech center opened its doors in 2006. Launching an expansion as it marks its 10th birthday is an appropriate way to celebrate, said Pennsylvania State Rep. Marguerite Quinn. “When we talk about ways that government and the private sector can work together to attract these technology-driven jobs, we point to the PA Biotechnology Center of Bucks County as a prime example,” said Quinn, an early supporter of the center. 
“The center continues to create the 21st-century jobs that Pennsylvania needs in order to be competitive and attract the top-notch talent that will enable us to thrive in today’s economy.” About the Pennsylvania Biotechnology Center: The Pennsylvania Biotechnology Center of Bucks County offers state-of-the-art laboratory and office space to nonprofit research companies and biotech companies. Managed by the Baruch S. Blumberg Institute and led by a board appointed by the Hepatitis B Foundation, the Center was funded in part by the Commonwealth of Pennsylvania. The facility opened in 2006 in a formerly abandoned warehouse and has since grown to encompass 110,000 square feet on a 10-acre campus. To learn more, visit http://www.pabiotechbc.org. About the Hepatitis B Foundation: The Hepatitis B Foundation is the nation’s leading nonprofit organization solely dedicated to finding a cure for hepatitis B and improving the quality of life for those affected worldwide through research, education and patient advocacy. To learn more, go to http://www.hepb.org, read our blog at http://www.hepb.org/blog/, follow us on Twitter @HepBFoundation, find us on Facebook at http://www.facebook.com/hepbfoundation or call 215-489-4900. About the Baruch S. Blumberg Institute: The Baruch S. Blumberg Institute is an independent, nonprofit research institute established in 2003 by the Hepatitis B Foundation to conduct discovery research and nurture translational biotechnology in an environment conducive to interaction, collaboration and focus. It was renamed in 2013 to honor Baruch S. Blumberg, who won the Nobel Prize for his discovery of the hepatitis B virus and co-founded the Hepatitis B Foundation. To learn more, visit http://www.blumberginstitute.org.
News Article | August 26, 2016
Capture of a vesicle by an endosome through the tethering factor EEA1 binding Rab5. Active Rab5 (shiny blue particles) induces a change in the flexibility of EEA1 (green filaments), generating an entropic collapse force that pulls the vesicle toward the target membrane to dock and fuse. Credit: Mario Avellaneda
For cells to function properly, cargo must be constantly transported from one point to another within the cell, as in a freight yard. This cargo is located in or on intracellular membrane compartments called vesicles. These membranes carry a signature, and only those with the correct signature may fuse with the membrane of another organelle to form one compartment. The vesicle membrane must itself be recognized by the target membrane, which uses long tethering proteins to find its match. David Murray and Marcus Jahnel from the labs of Marino Zerial at the Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) and Stephan Grill at the Biotechnology Center of the TU Dresden wanted to find out how these large tether proteins recognize the signature of a membrane compartment and pull it in so that the small fusion proteins can engage. They and their colleagues discovered that when the vesicle docks via an active protein, the GTPase Rab5, this protein sends a signal along the rigid tether protein, causing it to become flexible. This change in flexibility produces a force that starts the vesicle's trip toward the target membrane to initiate docking and fusion. The newly found mechanism, published in the journal Nature, intuitively explains how traffic within the cell can be efficient and selective, and it resolves a paradox of sizes. More information: David H. Murray et al, An endosomal tether undergoes an entropic collapse to bring vesicles together, Nature (2016). DOI: 10.1038/nature19326
News Article | March 28, 2016
Kent Bradford, left, and Alfred Huo, seen here with a flowering lettuce plant, found that lettuce could be prevented from flowering by increasing the expression of a specific microRNA in the plants. The high levels of this microRNA prevent the plant from transitioning to adulthood and flowering, and the plant continues to make numerous baby leaves rather than forming a compact head of lettuce. Credit: Gregory Urquiaga/UC Davis Like most annuals, lettuce plants live out their lives in quiet, three-act dramas that follow the seasons. Seed dormancy gives way to germination; the young plant emerges and grows; and finally in the climax of flowering, a new generation of seeds is produced. It's remarkably predictable, but the genetics that coordinates these changes with environmental cues has not been well understood. In a recent study of lettuce and the model plant Arabidopsis, researchers at the UC Davis Seed Biotechnology Center and in China show for the first time that a gene known to direct the depth of seed dormancy and the timing of germination also influences flowering. The study further suggests that the gene does this by influencing production of certain microRNAs—tiny snippets of genetic material that govern transition from one phase of the plant's life cycle to another. The findings, which have important implications for the $1.9 billion annual U.S. lettuce crop, will be reported during the week of March 28 in the Proceedings of the National Academy of Sciences. "It appears that the 'Delay of Germination 1,' or DOG1, gene is an environmental sensor, detecting environmental changes and enabling the plant to not only keep the seed dormant but to also delay flowering," said study co-author Kent Bradford, a plant scientist and director of the Seed Biotechnology Center. "This gene could be a particularly valuable tool as climate change shifts our growing seasons and we are forced to develop plants that can adapt to those environmental changes," Bradford said. 
Annual flowering plants match their life cycles—especially seed germination and flowering—to the appropriate season. If a flowering plant germinates too early, the seedling might appear before temperatures are warm enough for the plant to survive. Such coordination of life cycles and environmental conditions is equally important for cultivated crops. An entire lettuce crop can be lost if the plants respond to early warm temperatures and "bolt," producing flowers and seeds before marketable heads of lettuce have formed. In the new study, the researchers found that suppressing the DOG1 gene in lettuce or Arabidopsis decreased the levels of one microRNA and increased levels of another. As a result, seeds germinated at higher temperatures, and the plants flowered earlier than normal. "This provides evidence of a molecular genetic mechanism that is at work, coordinating adaptation of seed dormancy and flowering traits in the plants to accommodate environmental conditions," said study co-author Heqiang "Alfred" Huo, a postdoctoral researcher in the Bradford lab. "Our results also suggest that the period between seed dormancy and seed germination is a distinct phase in the plant's lifecycle and that this phase appears to be influenced by the same microRNA systems that govern the plant's maturation and flowering stages," Huo said. More information: DELAY OF GERMINATION1 (DOG1) regulates both seed dormancy and flowering time through microRNA pathways, PNAS, www.pnas.org/cgi/doi/10.1073/pnas.1600558113
News Article | April 6, 2016
The studies can shed some light on the perennial question of how life arose, but they also have slightly more practical applications in the search for life in space, says senior author Eric Roden, a professor of geoscience at UW–Madison. Animals use oxygen and "reduce" it to produce water, but some bacteria use iron that is deficient in electrons, reducing it to a more electron-rich form of the element. Ironically, electron-rich forms of iron can also supply electrons in the opposite "oxidation" reaction, in which the bacteria literally "eat" the iron to get energy. Iron is the fourth-most abundant element on the planet, and because free oxygen is scarce underwater and underground, bacteria have "thought up," or evolved, a different solution: moving electrons to iron while metabolizing organic matter. These bacteria "eat organic matter like we do," says Roden. "We pass electrons from organic matter to oxygen. Some of these bacteria use iron oxide as their electron acceptor. On the flip side, some other microbes receive electrons donated by other iron compounds. In both cases, the electron transfer is essential to their energy cycles." Whether the reaction is oxidation or reduction, the ability to move an electron is essential for the bacteria to process energy to power its lifestyle. Roden has spent decades studying iron-metabolizing bacteria. "I focus on the activities and chemical processing of microorganisms in natural systems," he says. "We collect material from the environment, bring it back to the lab, and study the metabolism through a series of geochemical and microbiological measurements." The current studies focus on bacteria samples from Chocolate Pot hot spring, a relatively cool geothermal spring in Yellowstone National Park that is named for the dark, reddish-brown color of ferric oxide. Related studies deal with a culture obtained from a much less auspicious environment—a ditch in Germany. 
Both studies are online, in Applied and Environmental Microbiology and in Geobiology. During the studies, Roden and doctoral student Nathan Fortney and research scientist Shaomei He explored how the cultured organisms changed the oxidation state—the number of electrons—in the iron compounds. They also used an advanced genome-sequencing instrument at the UW–Madison Biotechnology Center to identify strings of DNA in the genomes. "More than 99 percent of microbial diversity cannot be obtained in pure culture," says He, meaning they cannot be grown as a single strain for analysis. "Instead of going through the long, laborious and often unsuccessful process of isolating strains, we apply genomic tools to understand how the organisms were doing what they were doing in mixed communities." The researchers found some unknown bacteria capable of iron metabolism, and also got genetic data on a unique capacity that some of them have: the ability to transport electrons in both directions across the cell's outer membrane. "Bacteria have not only evolved a metabolism that opens niches to use iron as an energy," says He, "but these new electron transport mechanisms give them a way to use forms of iron that can't be brought inside the cell." "These are fundamental studies, but these chemical transformations are at the heart of all kinds of environmental systems, related to soil, sediment, groundwater and waste water," says Roden. "For example, the Department of Energy is interested in finding a way to derive energy from organic matter through the activity of iron-metabolizing bacteria." These bacteria are also critical to the life-giving process of weathering rocks into soil. Iron-metabolizing bacteria have been known for a century, Roden says, and were actually discovered in Madison-area groundwater. "Geologists saw organisms that formed these unique structures that were visible under the light microscope. 
They formed stalks or sheaths, and it turned out they were used to move iron." Roden and He are geobiologists, interested in how microbes affect geology, but the significance of microbes in Earth's evolution is only now being fully appreciated, Roden says. "Eyebrows rose when we contacted the Biotech Center three or four years ago to discuss sequencing: 'Who are these people from geology, and what are they talking about?' But we stuck with it, and it's turned into a pretty cool collaboration that has allowed us to apply their excellent tools that are more typically applied to biomedical and related microbial issues." Some of the iron-metabolizing bacteria appear quite early on the tree of life, making the studies relevant to discovering the origins of life, but the findings also have implications in the search for life in space, Roden says. "Our support comes from NASA's astrobiology institute at UW–Madison. It's possible that on a rocky planet like Mars, life could rely on iron metabolism instead of oxygen. "A fundamental approach in astrobiology is to use terrestrial sites as analogs, where we look for insight into the possibilities on other worlds," Roden continues. "Some people believe that use of iron oxide as an electron acceptor could have been the first, or one of the first, forms of respiration on Earth. And there's so much iron around on the rocky planets." More information: N. W. Fortney et al. Microbial Fe(III) oxide reduction potential in Chocolate Pots hot spring, Yellowstone National Park, Geobiology (2016). DOI: 10.1111/gbi.12173
News Article | December 20, 2016
COLUMBUS, Ohio--(BUSINESS WIRE)--BioOhio is excited to announce its 30th Anniversary with a celebration to be held in Cleveland where the organization was founded in 1987. The 2017 BioOhio Annual Conference and 30th Anniversary Celebration will be a celebration of statewide growth in all areas of bioscience, from medical devices, pharmaceuticals, lab equipment, and regenerative medicine, to agricultural compounds, alternative fuels, wellness products, clinical research, digital health, and wearables. The venue, the Great Lakes Science Center, was established to make science, technology, engineering and math come alive; and during the Conference, it will come alive with excitement and success. Attendees will experience different parts of the Center as breakfast, networking breaks, lunch, and a reception move around this unique facility. When: April 23-24, 2017. An opening reception will be held April 23, with the main conference being held on April 24. Join us for a fun and entertaining reception in Cleveland, Ohio. Details to be announced. A full day of engaging panels and excellent networking, set at the fascinating Great Lakes Science Center, will begin with breakfast at 7:30 and conclude with a networking reception at 5:00. Patient Perspectives - The event will include multiple opportunities to hear first-hand how bioscience has positively impacted individuals' lives. Ohio Bioscience Timeline - From 1803 to today, including the Edison Biotechnology Center (EBTC), OMERIS, and BioOhio. Join us for an insightful trip through Ohio's bioscience history, from how we got started to where we are now. Ohio Success Stories - Several fireside chat style discussions will be held to explore success and lessons learned in Ohio's bioscience community. Licensing and Start-up Opportunities - BioOhio Founding and Leadership Members will present exciting opportunities and discoveries from their institutions. 
Startup Pitch Competition - Six companies chosen from Ohio's six Entrepreneurial Signature Programs will have five minutes to pitch their product or service, followed by five minutes of questions from a panel of Ohio experts. Keynote Futurist Speaker - The day started with how we began, now hear where we are headed... Visit BioOhio.com/ac to learn more and register today! There are a number of sponsorship opportunities available for this event to accommodate any budget, ranging from 30th Anniversary memorabilia, to a photo booth, to the conference mobile app. All sponsorship levels offer a broad range of marketing exposure for your organization. Support BioOhio's Annual Conference and 30th Anniversary Celebration by contacting Jen Goldsberry at email@example.com or (614) 675-3686 x 1004. Come join us to celebrate what Ohio's bioscience community has become, engage with friends, make new connections, and set our sights on where the industry is headed.
News Article | January 20, 2016
The data set analysed in this paper was culled from that described in our previous paper analysing correlations between amino acid sequence and protein expression/solubility levels (ref. 39). In brief, proteins were selected from a wide variety of source organisms based on structural uniqueness, meaning that no sequence with greater than 30% amino acid identity had an experimentally determined structure deposited in the Protein Data Bank at the time of selection. Compared with the data set used in our earlier paper, we restricted the present one to non-redundant proteins encoded by genes that do not contain any codons affected by an alternative translation table in the source organism and that were expressed with a C-terminal LEHHHHHH tag. Homologous sequences were eliminated using an iterative procedure that reduced the level of amino acid sequence identity between any pair to less than 60%, which results in a lower level of nucleic acid sequence identity. At each step, all pairs of proteins sharing at least 60% amino acid sequence identity were transitively grouped together into a set, and the shortest sequence was eliminated from each set before reinitiating the same set-assignment procedure on all remaining proteins. The resulting data set included 6,348 genes from 171 organisms, as detailed in the cladogram in Extended Data Fig. 1 and Supplementary Data File 2. It contained 95 endogenous E. coli genes, including ycaQ, which was examined in our follow-up biochemical experiments (Extended Data Fig. 6), and 6,253 genes from heterologous sources, including 47 from mammals, 809 from archaebacteria, and the remainder from 151 different eubacterial organisms. The methods used in our large-scale protein expression experiments were described in detail previously (refs 38, 51, 52), and they are similar to those described below for evaluation of protein expression in vivo except that induction was performed in 0.5-ml cultures in 96-well plates.
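The iterative homolog-elimination procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the union-find grouping, and especially `toy_identity` (a crude position-by-position comparison standing in for a real alignment-based identity measure) are hypothetical.

```python
from itertools import combinations


def find(parent, name):
    """Return the representative of the set containing `name` (union-find)."""
    while parent[name] != name:
        parent[name] = parent[parent[name]]  # path halving
        name = parent[name]
    return name


def remove_homologs(seqs, identity, threshold=0.60):
    """Iteratively drop the shortest member of each transitively linked set.

    seqs: dict mapping a protein name to its amino acid sequence.
    identity: callable(seq_a, seq_b) -> fractional identity (assumed helper).
    """
    remaining = dict(seqs)
    while True:
        names = list(remaining)
        parent = {n: n for n in names}
        linked = False
        # Link every pair at or above the threshold; union-find makes the
        # grouping transitive, as the procedure requires.
        for a, b in combinations(names, 2):
            if identity(remaining[a], remaining[b]) >= threshold:
                parent[find(parent, a)] = find(parent, b)
                linked = True
        if not linked:
            return remaining  # no pair reaches the threshold any more
        groups = {}
        for n in names:
            groups.setdefault(find(parent, n), []).append(n)
        # Remove the shortest sequence from each multi-member set, then repeat.
        for members in groups.values():
            if len(members) > 1:
                shortest = min(members, key=lambda n: len(remaining[n]))
                del remaining[shortest]


def toy_identity(a, b):
    """Hypothetical stand-in: matching positions over the longer length."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


toy = {"p1": "A" * 12, "p2": "A" * 11, "p3": "A" * 10, "q": "G" * 12}
kept = remove_homologs(toy, toy_identity)
print(sorted(kept))
```

Removing only one sequence per set in each round, rather than keeping a single representative, means a long chain of loosely linked homologs can retain several members once the links between them are broken.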
In brief, native genes for the 6,348 proteins were cloned with a C-terminal LEHHHHHH affinity tag under the control of the bacteriophage T7 promoter in pET21, a 5.4-kb pBR322-derived plasmid harbouring an ampicillin resistance marker (ref. 38). Protein expression was induced overnight at 17 °C in E. coli strain BL21(DE3) growing in chemically defined medium containing glucose as a carbon source (ref. 38). The expression strain also contained pMGK (GenBank accession number KT203761), a 5.4-kb pACYC177-derived plasmid that harbours a kanamycin-resistance gene, a single copy of the lacI gene, and a single copy of the argU gene encoding the tRNA cognate to the rare AGA codon for Arg. As previously described, we scored the protein expression level from two transformants of the same plasmid on an integer scale from 0 (no expression) to 5 (highest expression), based on visual inspection of whole-cell lysates on Coomassie-blue-stained SDS–PAGE gels. There is an unmistakable difference between the 0 and 5 expression scores used for most of the analyses reported in this paper. A score of 5 indicates the target protein was the most abundant protein expressed in the cell, while a score of 0 indicates it was undetectable against the background of cellular proteins. The reproducibility of the integer scores in our large-scale data set was excellent, as analysed in detail previously (ref. 39). All measurements agreed for over 70% of the genes, and they differed by at most one unit for over 80% of the genes. When replicates gave different scores, the maximum score was used, because most sources of experimental error tend to reduce expression score, and bellwether analyses reported in our previously published paper (ref. 39) showed a small increase in the significance of correlations when using maximum rather than mean score.
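The scoring rule above (take the maximum over replicates, then keep only the unambiguous 0 and 5 scores for the binary analysis) can be written as a short sketch. The gene names and the helper `binary_subset` are hypothetical and serve only to illustrate the rule.

```python
def binary_subset(scores_by_gene):
    """Map each gene to 1 (score 5) or 0 (score 0), dropping scores 1-4.

    scores_by_gene: dict mapping a gene name to its replicate scores (0-5).
    The maximum replicate score is used, as described in the text.
    """
    subset = {}
    for gene, replicates in scores_by_gene.items():
        score = max(replicates)
        if score == 5:
            subset[gene] = 1
        elif score == 0:
            subset[gene] = 0
    return subset


# Hypothetical replicate scores for four genes.
example = {"geneA": [5, 4], "geneB": [0, 0], "geneC": [2, 3], "geneD": [0, 1]}
print(binary_subset(example))
```

Here `geneD` is excluded because its maximum score is 1, even though one replicate scored 0.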
Our binary multi-parameter logistic regression model gives θ, the logarithm of the ratio of the probabilities of obtaining the highest level of protein expression ($P_{E=5}$) versus none ($P_{E=0}$) from an mRNA sequence in the large-scale data set, as a linear function of generalized variables $x_i$:
$$\theta = \ln\left(\frac{P_{E=5}}{P_{E=0}}\right) = A + \sum_i \beta_i x_i$$
The probability π of obtaining the highest level (E = 5) versus no (E = 0) protein expression from a given sequence is therefore given by:
$$\pi = \frac{P_{E=5}}{P_{E=5} + P_{E=0}} = \frac{e^{\theta}}{1 + e^{\theta}}$$
Note that, to capture nonlinear relationships between mRNA sequence parameters and outcome, the generalized variables $x_i$ can represent mathematical functions of mRNA sequence parameters as well as those parameters themselves. We used the R statistics program (ref. 53) to compute the most probable values of the model parameters ($A$, $\beta_i$). Logistic-regression slopes $\beta_i > 0$ indicate that the probability of high expression increases as the associated variable increases in numerical value. (Note that, because ΔG increases in numerical value as folding stability decreases, a positive slope for free-energy terms indicates an increase in the probability of high expression as predicted folding stability decreases, while a negative slope for these terms indicates an increase in the probability of high expression as predicted folding stability increases.) Our final model, which we call model M (Extended Data Table 1a and Fig. 4), is given in the main text, and the codon slopes β from this model are depicted in Fig. 3a. In principle, the probability of high protein expression can be increased by manipulating mRNA sequence properties to maximize the value of θ and thus π in the equations above using the parameters ($A$, $\beta_i$) from model M. Inclusion of parameters was guided by the likelihood ratio test in conjunction with the Akaike information criterion (AIC; ref. 54), a standard measure of whether an improvement in model quality exceeds that expected at random from increasing the number of degrees of freedom in the model.
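To make the relationship between θ and π concrete, the following minimal Python sketch evaluates a binary logistic model of the form described above. The intercept, slopes and variable names are hypothetical placeholders and are not the fitted parameters of model M.

```python
import math


def log_odds(intercept, slopes, variables):
    """theta = A + sum_i beta_i * x_i for one sequence."""
    return intercept + sum(slopes[name] * variables[name] for name in slopes)


def probability_high(theta):
    """pi = exp(theta) / (1 + exp(theta)), written to avoid overflow."""
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    e = math.exp(theta)
    return e / (1.0 + e)


# Hypothetical parameters and sequence variables (illustration only).
A = -0.5
beta = {"x1": 2.0, "x2": -1.0}
x = {"x1": 0.5, "x2": 0.5}

theta = log_odds(A, beta, x)   # -0.5 + 1.0 - 0.5
pi = probability_high(theta)
```

With these placeholder values the log-odds is zero, so the two outcomes are equally probable; raising a variable with a positive slope raises both θ and π.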
The likelihood ratio χ² (LR χ²) is asymptotic to the χ² distribution and defined as the reduction in the deviance D of the observed data from the predictions of the model compared to the null model containing just the constant term A (in the first equation above), while the AIC is given by the LR χ² minus two times the number of degrees of freedom. The deviance is defined as:
$$D = -2 \sum_{j=1}^{n} \left[ E_j \ln \pi_j + (1 - E_j) \ln (1 - \pi_j) \right]$$
This sum is conducted over the n = 3,727 proteins giving expression scores of 5 or 0 among the 6,348 in the large-scale protein expression data set, and the logistic variable $E_j$ assumes values of 1 or 0 if protein ‘j’ is expressed at the E = 5 or E = 0 levels, respectively. The variable $\pi_j = \pi(\theta_j)$ gives the predicted probability of obtaining expression of protein ‘j’ at the E = 5 rather than E = 0 level according to the equations given above describing the multi-parameter binary logistic model. For the data set analysed in this paper, the deviance has values of 5,154 and 3,952 for the null model and our final model M, respectively (Extended Data Table 1a). In addition to using the AIC, we ensured that the final model is not over-fit via bootstrapping with replacement 1,000 times using the RMS package (ref. 55). This validation procedure is considered more robust than splitting the data set into training and test sets, which requires very careful selection of the test set.
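The quantities defined above can be computed as in the following Python sketch, which assumes the standard binomial deviance for a binary logistic model and follows the sign convention stated in the text for the AIC. The function names are hypothetical; the two deviance values are those reported above.

```python
import math


def deviance(outcomes, probabilities):
    """D = -2 * sum_j [E_j ln(pi_j) + (1 - E_j) ln(1 - pi_j)]."""
    total = 0.0
    for e, p in zip(outcomes, probabilities):
        total += math.log(p) if e == 1 else math.log(1.0 - p)
    return -2.0 * total


def lr_chi2(null_deviance, model_deviance):
    """Likelihood ratio chi-squared: reduction in deviance from the null model."""
    return null_deviance - model_deviance


def aic_as_defined(lr, degrees_of_freedom):
    """AIC in the convention used in the text: LR chi-squared minus 2 * d.f."""
    return lr - 2 * degrees_of_freedom


# Deviances reported for the null model and for model M.
lr_model_m = lr_chi2(5154, 3952)

# Toy check: two proteins, each predicted with probability 0.5.
toy = deviance([1, 0], [0.5, 0.5])   # equals 4 * ln(2)
```

Under this convention a larger value indicates a better model, so an added parameter is retained only if it raises the LR χ² by more than two units per degree of freedom.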
The sequence parameters explored in the course of model development (Extended Data Table 1 and additional data not shown) included the length of the gene, the individual codon frequencies in-frame or out-of-frame in the entire gene, the individual codon frequencies in-frame calculated separately in the head and the tail or in the first and second halves of the coding sequence, di-codon frequencies, the statistical entropy of the codon sequence, the codon and amino acid repetition rates (defined below), the frequencies of the nucleotide bases at each codon position in the entire gene and in defined windows within its sequence, and a variety of predicted mRNA-folding energy parameters including those shown in Fig. 1 and Extended Data Fig. 2, which were evaluated individually and as statistical aggregates. The codon and amino acid repetition rates r are each defined as $\langle d^{-1} \rangle$, where d is the distance at every position in the sequence to the next occurrence of the same species moving towards the 3′ end of the gene. The value of $d^{-1}$ is set to zero if the codon or amino acid does not occur again, so the value of r for the protein sequence LRPRL is the average of (1/4, 1/2, 0, 0, 0), which is 0.15. The sequence of the C-terminal LEHHHHHH affinity tag was omitted from all computational analyses to avoid biasing statistics on its constituent amino acids and codons. Because this sequence is present in every gene included in our large-scale protein expression data set, it cannot directly influence outcome on its own and can only have an influence via differential interaction with other sequence features. No evidence of such interactions was detected in bellwether analyses including the tag sequence, so it was omitted in the final analyses reported in this paper. The number of degrees of freedom for codon variables is one fewer than the number of non-stop codons because their frequencies f in a sequence must sum to 1 (that is, $\sum_c f_c = 1$).
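The repetition rate defined above can be computed with a single backward pass over the sequence, as in this Python sketch. The function names are hypothetical, and the codon example simply encodes LRPRL with one codon per amino acid for illustration.

```python
def repetition_rate(units):
    """Average of 1/d over all positions, where d is the distance to the
    next occurrence of the same unit towards the 3' (or C-terminal) end;
    positions with no later occurrence contribute zero."""
    n = len(units)
    if n == 0:
        return 0.0
    next_position = {}
    total = 0.0
    # Walk from the end so the nearest later occurrence is always known.
    for i in range(n - 1, -1, -1):
        unit = units[i]
        if unit in next_position:
            total += 1.0 / (next_position[unit] - i)
        next_position[unit] = i
    return total / n


def split_codons(coding_sequence):
    """Split an in-frame nucleotide sequence into codons, ignoring any
    trailing partial codon."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    return [coding_sequence[i:i + 3] for i in range(0, usable, 3)]


r_protein = repetition_rate("LRPRL")                           # (1/4 + 1/2) / 5
r_codon = repetition_rate(split_codons("CTGCGTCCGCGTCTG"))     # same pattern
```

Both calls reproduce the worked example in the text, giving 0.15.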
Therefore, for the analyses shown in Figs 3 and 4, we removed ATG, effectively constraining its slope to be zero (that is, $\beta_{ATG} = 0$) and its contribution to the model to be absorbed into the constant A. The inclusion of the two mean codon-slope variables s in model M uniformly reduces the individual codon slopes β to ~86% of their values when no mean-slope terms are included in the model, reflecting the disproportionate influence of codons near the 5′ terminus compared to those in the rest of the gene (Extended Data Fig. 6). We tested expanded codon models including the next base or the previous base in addition to the in-frame codon, but these were rejected based on the AIC and bootstrap validation criteria described above. We also examined introducing additional variables into model M (Extended Data Table 1b and additional data not shown). Adding the mean value of the predicted free energy of mRNA folding in the tail does not significantly improve the model, even though unstable folding in the tail correlates with reduced protein expression (Fig. 1g, h). Therefore, this correlation as well as those of the overall A, T, G and C content in the gene (Extended Data Fig. 2a–e) are captured more effectively by the cross-correlated sequence parameters (Extended Data Figs 3 and 4) that are included in the model, suggesting that these other parameters are more influential mechanistically. Adding the mean slope of codons 2–6 does not produce a statistically significant improvement, and using this term instead of the base-composition terms in this region yields inferior results, consistent with the analyses shown in Extended Data Fig. 5. Finally, adding the frequency of the Shine–Dalgarno consensus AGGA in any frame (f in Extended Data Fig. 2i, j and Extended Data Table 1b) fails to produce a statistically significant improvement.
We also used the Bindigo program (http://rna.williams.edu/) to compute the binding energy of all hexamer sequences in a gene with the anti-Shine–Dalgarno sequence CACCUCCU, and neither the minimum nor the average value of the predicted free energy of hybridization to the anti-Shine–Dalgarno sequence has any correlation with protein expression level in our large-scale data set (Extended Data Table 1b).
In the 6AA method, codons for six amino acids were changed to the single codon specified in Extended Data Table 2, which has a larger slope than that of any synonymous codon in our single-parameter binary logistic regression analyses (dark grey symbols in Fig. 3a). Although no explicit free energy optimization was performed with the 6AA method, it produced genes in which the predicted free energies of mRNA folding were more favourable than those in the naturally occurring starting sequences. In the 31C-FO method, predicted mRNA-folding energy was optimized while selecting codons from the 31 listed in Extended Data Table 2, which have slopes greater than zero in our single-parameter binary logistic regression analyses (dark grey symbols in Fig. 3a). The predicted free energy of folding of the head plus 5′-UTR (ΔG) was maximized numerically (that is, to yield the least stable folding), while the predicted free energy of the folding in the tail was optimized to be near −10 kcal mol−1 in windows of 48 nucleotides. The 31C-FD method used the same set of codons to produce genes in which the predicted free energy of folding was minimized numerically (that is, to yield the most stable folding).
The E. coli strain DH5α was used for cloning. Expression experiments used E. coli strain BL21(DE3) pMGK (ref. 38). Ampicillin was added at 100 μg ml−1 for cultures harbouring pET21-based plasmids. Kanamycin was added at 25 μg ml−1 to maintain the pMGK plasmid.
Bacterial growth for protein expression and northern blot experiments employing pET21-based plasmids was performed using the same medium and conditions that were used to generate our high-throughput protein-expression data set (ref. 38) (that is, MJ9 minimal medium (ref. 56) with 250 r.p.m. agitation at 37 °C before induction at 17 °C).
The pET-21 clones of the genes APE_0230.1 (Aeropyrum pernix K1), RSP_2139 (Rhodobacter sphaeroides), SRU_1983 (Salinibacter ruber), SCO1897 (Streptomyces coelicolor) and ycaQ (E. coli) were obtained from the protein-production laboratory of the Northeast Structural Genomics Consortium (http://www.NESG.org) at Rutgers University (NESG targets Xr92, RhR13, SrR141, RR162 and ER449, respectively). The DNAs encoding the 6AA and 31C-FO /31C-FO variants of the genes were synthesized by GenScript. The head variants 31C-FO and 31C-FO were generated by PCR amplification using long forward primers containing an NcoI restriction site, the new head sequence, and a sequence complementary to the downstream region in the target gene. A plasmid containing the starting construct was used as the DNA template for PCR amplification using the corresponding long forward primers and a reverse primer hybridizing at the 3′ end of the target gene including the XhoI restriction site. The resulting PCR products were cloned using the In-Fusion kit (Clontech) into a pET-21 derivative linearized with NcoI and XhoI. The full protein-coding sequence in every plasmid was verified by DNA sequencing (Genewiz and Eton Bioscience) and corrected when necessary using the QuikChange II Site-Directed Mutagenesis kit (Agilent Technologies). The wild-type and 31C-FO /31C-FO (31C-FO / ) genes for SRU_1983, APE_0230.1 and E. coli YcaQ were re-cloned into a pBAD expression plasmid (Life Technologies) with a C-terminal hexa-histidine tag for transcription by the native E.
coli RNA polymerase under control of an arabinose-inducible promoter; these experiments yielded similar results (Extended Data Fig. 6e, f) to those shown for the same genes under T7 polymerase control in a pET plasmid (Fig. 5 and Extended Data Fig. 6a–d). DNA sequences of the final constructs are provided in Supplementary Data File 3.
Overnight cell growth was measured by transferring 200 μl of each induced culture to a 96-well sterile plate (Greiner Bio-One) and covering each well with 50 μl of sterile paraffin oil. A negative control non-induced sample was loaded for each wild-type target. Duplicate wells were measured for each sample. Plates were loaded into a plate reader (Biotek Synergy) at room temperature and shaken for 30 s. An initial A reading was taken and then followed by 30 min of shaking until the next absorbance reading. Readings were repeated at 30-min intervals during 9 h of cell growth.
Starting cultures from a single colony were inoculated into 6 ml of LB medium containing 100 μg ml−1 of ampicillin and 30 μg ml−1 of kanamycin. Cultures were grown at 37 °C until highly turbid (4–6 h), then 40 μl was used to inoculate 2 ml of MJ9 chemically defined medium (ref. 56). This MJ9 pre-culture was grown overnight at 37 °C. The next day, A readings were taken of a 1:10 dilution of the turbid MJ9 pre-culture. This reading was used to calculate the volume of pre-culture necessary to normalize all cell samples to a starting culture density of 0.1 A in 6 ml of fresh medium. The reinoculated culture was grown at 37 °C until A reached 0.5–0.7. Cells were then induced with 1 mM IPTG, with one duplicate tube for each wild-type gene not induced to serve as a negative control. After induction, two 200-μl aliquots of each culture were removed and placed into a sterile 96-well plate to monitor cell growth rate (see above). The remaining 5.6 ml of induced samples were then transferred to 17 °C and shaken overnight.
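The normalization of starting culture density described above amounts to a simple dilution calculation, sketched here in Python. The function name and the example reading are hypothetical, and the sketch assumes that absorbance scales linearly with cell density and that the 6 ml refers to the final culture volume.

```python
def inoculum_volume_ml(diluted_reading, dilution_factor=10.0,
                       target_density=0.1, final_volume_ml=6.0):
    """Volume of pre-culture needed so that the final culture starts at the
    target absorbance (C1 * V1 = C2 * V2)."""
    undiluted_density = diluted_reading * dilution_factor
    if undiluted_density <= 0:
        raise ValueError("pre-culture reading must be positive")
    return target_density * final_volume_ml / undiluted_density


# Hypothetical reading of 0.25 for the 1:10 dilution of the pre-culture,
# corresponding to an undiluted density of 2.5.
volume = inoculum_volume_ml(0.25)   # about 0.24 ml of pre-culture
```

Removing two 200-μl aliquots from the 6-ml culture after induction leaves the 5.6 ml that is shaken overnight.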
The next day, samples were removed from the shaker, placed on ice, and final A was measured. Cells were centrifuged in 14-ml round-bottom Falcon tubes at 5,300g for 10 min, and the pellets were resuspended in 1.2 ml of lysis buffer (30 mM NaCl, 10 mM 2-mercaptoethanol, 50 mM NaH2PO4, pH 8.0) and then transferred to 1.5-ml Eppendorf tubes on ice. Lysis was accomplished by sonication on ice, using a 40 V setting (~12 W pulse) and pulsing for 1 s followed by a 2 s rest, for a total of 40 pulses. Then 120 μl of each lysed culture was mixed with 40 μl of 4× Laemmli buffer, and samples were analysed using SDS–PAGE (Bio-Rad, Ready Gel, 15% Tris-HCl), with Bio-Rad Precision Plus All Blue Standard markers. Final A measurements were used to calculate the load volume for each individual sample, normalizing all samples to the density of the least turbid of each unique target. We verified the integrity of the plasmids after growth and induction by DNA sequencing (Genewiz and Eton Bioscience). Every result was confirmed by repeating the experiment.
Conducting experiments at physiological protein expression levels (Extended Data Fig. 6e, f) required considerable changes in methods compared to the experiments conducted in pET vectors that were used to generate our large-scale protein-expression data set and the data shown in Fig. 5 and Extended Data Figs 6a, b and 7. Because mRNA expression from IPTG-controlled promoters tends to occur in an all-or-none fashion (refs 60, 61), it is not practical to control the level of mRNA expressed from pET vectors. Therefore, we re-cloned three pairs of synonymous native and codon-optimized 31C-FO / genes with C-terminal hexahistidine tags under control of the arabinose-inducible promoter in a pBAD vector (ref. 62), which provides a more gradual increase in expression as arabinose concentration is raised. This promoter drives transcription using the endogenous E.
coli RNA polymerase rather than T7 RNA polymerase, which is employed by the pET vectors used for all other expression experiments reported in this paper. Because transcription from the arabinose promoter is repressed by glucose, which is the carbon source in the chemically defined MJ9 medium used for our pET experiments, we instead used LB as the growth medium for pBAD experiments, which were conducted in BL21 pMGK cells (that is, an isogenic E. coli strain except for the removal of the λ(DE3) prophage carrying the gene for T7 RNA polymerase). Furthermore, because the arabinose inducer can be depleted during long growth periods, we evaluated expression after relatively short 1–4 h induction times during log-phase growth rather than after overnight growth into stationary phase, which was used for our pET experiments. We also changed the growth temperature during induction from 17 °C for pET experiments to 37 °C for pBAD experiments. Non-induced controls were grown in medium containing 0.4% glucose (+Glc). When the A of the cultures reached 0.6, transcription of the target genes was induced for 1 h using final arabinose concentrations of 0.001% (w/v) for APE_0230.1 and 0.01% (w/v) for SRU_1983 and E. coli YcaQ (+Ara).
The pET21 plasmids containing optimized or unoptimized inserts were digested with BlpI, phenol–chloroform purified, and concentrated by ethanol precipitation. From the digested samples, 2 μg was added to the RiboMax kit (Promega), and in vitro transcription with bacteriophage T7 RNA polymerase was conducted according to the manufacturer’s protocol. Upon completion of the reaction, samples were treated with DNase (Promega), isopropanol precipitated, and resuspended in RNA Storage Solution (Ambion). Transcript size and purity were verified by agarose gel electrophoresis with ethidium bromide staining. For kinetic analyses, 20-μl reactions with T7 polymerase were assembled and started by addition of 1 μg of template DNA.
A 4.5-μl sample of each reaction was removed at 0-, 5-, 10- and 30-min time points for analysis on denaturing formaldehyde-agarose gels. Each experiment was conducted at least twice.
In vitro translation assays of the purified mRNAs were performed with the PURExpress system (New England Biolabs) using L-[35S]methionine premium (PerkinElmer). Each 25-μl reaction contained 10 μl of solution A, 7.5 μl of solution B and 2 μl of [35S]methionine (10 μCi). The reactions were started by adding 2 μl of purified mRNA (4 μg μl−1) and incubating at 37 °C. Aliquots of 5 μl were withdrawn from the reactions at 15, 30, 60 and 90 min, and translation was stopped by adding 10 μl of 2× Laemmli buffer and heating for 2 min at 60 °C. Then 14 μl of each aliquot was run on a 4–20% SDS–PAGE gel (Bio-Rad) with Bio-Rad Precision Plus All Blue Standard markers. The gel was dried on Whatman filter paper and subjected to autoradiography. Each reaction was repeated at least twice.
The probe was designed as the reverse complement of the 71 nucleotides of the 5′-UTR of the pET21 vector, and it was synthesized by Eurofins. The probe was labelled with biotin using the BrightStar Psoralen-Biotin Nonisotopic Labelling Kit.
BL21(DE3) pMGK E. coli containing the plasmid of interest were grown overnight in LB at 37 °C with shaking. Cultures were diluted 1:50 into MJ9 medium and grown overnight at 37 °C with shaking. The next day, the cultures were diluted to an A of 0.15 in MJ9 medium and allowed to grow to an A of 0.6–0.7 before induction with 1 mM IPTG. Samples were taken at the indicated time points and RNAs were stabilized in two volumes of RNAProtect Bacteria Reagent. After pelleting, samples were lysozyme digested (15 mg ml−1) for 15 min, and RNAs were purified using the Direct-zol RNA Miniprep Kit and TRI-Reagent. Approximately 1–2 μg of total RNA per sample was separated on a 1.2% formaldehyde-agarose gel in MOPS-formaldehyde buffer. RNA integrity was verified by ethidium bromide staining.
RNA was then transferred to a positively charged nylon membrane using downward capillary transfer with an alkaline transfer buffer (1 M NaCl, 10 mM NaOH, pH 9) for 2 h at room temperature. RNAs were crosslinked to the membrane using 1,200 μJ ultraviolet irradiation (Stratalinker). Membranes were pre-hybridized in Ultrahyb hybridization buffer for 1 h at 42 °C in a hybridization oven. Heat-denatured, biotin-labelled probe was then added to 10–20 pM final concentration and hybridized overnight at 42 °C. Membranes were washed twice in buffer (0.2× SSC, 0.5% SDS), and probe signal was detected using the BrightStar BioDetect kit, as per protocol, via exposure to film. Each northern blot experiment was repeated at least twice.
E. coli MG1655 cells were cultured in M9 minimal medium containing 0.4% glucose to a final A of 1.0. Cells were treated with RNA Protect Bacteria Reagent (Qiagen), and RNA extracted using the RNeasy Mini Kit (Qiagen) was reverse-transcribed using SuperScript II Reverse Transcriptase (Invitrogen) followed by treatment with RNaseH (Invitrogen) and RNaseA (EpiCentre). The resulting cDNA preparation was purified using the MinElute Purification Kit (Qiagen) and then fragmented into 50–200-bp fragments using DNaseI (EpiCentre). Biotinylation was performed with Terminal Deoxynucleotidyl Transferase (New England Biolabs) and Biotin-N6-ddATP (Enzo Life Sciences). Biotinylated cDNA was hybridized on Affymetrix E. coli 2.0 arrays by the Gene Expression Center at the University of Wisconsin Biotechnology Center. Raw data (.cel) files were analysed using the RMA (Robust Multi-chip Average) algorithm in the Affymetrix Expression Console.
All predicted proteins in the version of the genome in the Ecocyc database (ref. 57) were analysed using the programs LipoP (ref. 58) and TMHMM (ref. 59), and those without a predicted transmembrane helix or a predicted signal peptide were classified as cytoplasmic proteins and included in the analyses in Fig. 6.
We analysed the data sets published previously (ref. 44) in which RNA-seq was used to quantify global mRNA levels as a function of time after treatment of either exponential or early stationary phase cultures with the transcription-initiation inhibitor rifampicin. To avoid potential complications arising from the encoding of multiple proteins in polycistronic transcripts, we limited our analyses to monocistronic transcripts, which constituted 76% and 82% of the mRNAs for which lifetimes were measured in exponential and stationary phase, respectively. The analyses presented in Fig. 6c, d were also limited to predicted cytoplasmic proteins to avoid possible biases from systematically lower expression of integral membrane proteins and secreted proteins. The set of genes for which Chen et al. (ref. 44) were able to measure lifetime is strongly biased towards more abundant mRNAs, and the measured lifetimes in both the exponential and stationary phase data sets are also strongly correlated with steady-state concentrations (data not shown).