Aiyar L.,University of Toronto |
Aiyar L.,Fetal Diagnostic Center |
Shuman C.,University of Toronto |
Hayeems R.,Hospital for Sick Children |
And 7 more authors.
Genetics in Medicine | Year: 2014
Purpose: Personal genome testing allows the identification of single-nucleotide polymorphisms associated with an increased risk for common complex disorders. An area of concern in the use of personal genome testing is how risk estimates generated differ from traditional measures of risk (e.g., family history analysis). We sought to analyze the concordance of risk estimates generated by family history analysis and by personal genome testing. Methods: Risk categorizations for 20 complex conditions included in Navigenics personal genome testing were compared with risk categorization estimates derived from family history assessment using the kappa (κ) statistic. Results: The only conditions showing slight agreement between risk assessment methods were Alzheimer disease (κ = 0.131), breast cancer (κ = 0.154), and deep vein thrombosis (κ = 0.201) in females, and colon cancer (κ = 0.124) in males. Eighty-six individuals (11.4%) were found to have additional genetic risks not assessed by personal genome testing after family and medical history assessment, including 38 individuals with family histories suggestive of hereditary cancer syndromes. Conclusion: Discordance between personal genome testing and family history risk estimates suggests that these methods may provide independent information that could be used in a complementary manner. Results also support that eliciting family history adds value to overall risk assessment for individuals undergoing personal genome testing. © American College of Medical Genetics and Genomics.
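The kappa statistic used above measures how much two categorical assessments agree beyond what chance alone would produce. The following is a minimal sketch of Cohen's kappa, assuming two equal-length lists of risk categories; the function name `cohen_kappa` and the six example ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal frequencies, summed per category.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical risk categories for six people (illustration only).
family_history = ["high", "avg", "avg", "high", "avg", "avg"]
genome_test = ["high", "avg", "high", "avg", "avg", "avg"]
kappa = cohen_kappa(family_history, genome_test)
print(round(kappa, 3))
```

A value near 0 means the two methods agree about as often as chance would predict, and a value of 1 means perfect agreement.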
Lee S.-J.,Center for Computational Biology |
Schlesinger P.H.,Washington University in St. Louis |
Wickline S.A.,Washington University in St. Louis |
Lanza G.M.,Washington University in St. Louis |
Baker N.A.,Pacific Northwest National Laboratory
Journal of Physical Chemistry B | Year: 2011
Melittin, an antimicrobial peptide, forms pores in biological membranes and triggers cell death. Therefore, it has potential as an anticancer therapy. However, until recently, the therapeutic application of melittin has been impractical because a suitable platform for delivery was not available. Recently, we showed that phospholipid-stabilized perfluorooctyl bromide based nanoemulsion particles (PFOB-NEPs) were resistant to destruction by melittin and enabled specific delivery of melittin to tumor cells, killing them and reducing tumor growth. Earlier work also showed that melittin adsorbed onto the stabilizing phospholipid monolayer of PFOB-NEP but did not disrupt the phospholipid monolayer or produce "cracking" of the PFOB-NEPs. The present work identifies the important structural motifs for melittin binding to PFOB-NEPs through a series of atomistic molecular dynamics simulations. The conformational ensemble of melittin bound to the PFOB-NEP lipid monolayer was compared with the structure from a control simulation of melittin bound to a lipid bilayer to identify several differences in melittin-lipid interactions between the two systems. First, melittin was deeply buried in the hydrophobic tail region of the bilayer, while its depth was attenuated in the PFOB-NEP monolayer. Second, a helical conformation was the major secondary structure in the bilayer, but the fraction of helix was reduced in the PFOB-NEP. Finally, the overall pattern for the direct interaction of melittin with surrounding lipids was similar between liposome and PFOB-NEP, but the level of interaction was slightly decreased in the PFOB-NEP. These results suggest that melittin interacts with the monolayer of PFOB-NEP in a way that is similar to its interaction with bilayers but that deeper penetration into the hydrophobic interior is inhibited. © 2011 American Chemical Society.
Billingsley G.,Hospital for Sick Children |
Bin J.,Hospital for Sick Children |
Fieggen K.J.,University of Cape Town |
Duncan J.L.,University of California at San Francisco |
And 14 more authors.
Journal of Medical Genetics | Year: 2010
Background: Bardet-Biedl syndrome is a pleiotropic disorder with 14 BBS genes identified. BBS1, BBS2, BBS4, BBS5, BBS7, BBS8, and BBS9 form a complex called the BBSome, which is believed to recruit Rab8GTP to the primary cilium and promote ciliogenesis. The second group, the chaperonin-like proteins BBS6, BBS10, and BBS12, have been defined as a vertebrate-specific branch of the type II chaperonin superfamily. These may play a role in the regulation of BBSome assembly. Methods and results: Using sequence analysis, the role of BBS6, 10 and 12 was assessed in the patient population comprising 93 cases from 74 families. Systemic and ocular phenotypes were defined. In the study, chaperonin-like BBS gene mutations accounted for the disease in approximately 36.5% of BBS families. A total of 38 different non-polymorphic exonic sequence variants were identified in 40.5% of BBS families (41.9% cases), of which 26 were novel (68%). Six cases had mutations present in more than one chaperonin-like BBS gene. One case with four mutations in BBS10 had a phenotype of overall greater severity. The phenotypes observed were beyond the classic BBS phenotype as they overlapped with characteristics of MKKS (congenital heart defect, vaginal atresia, hydrometrocolpos, cryptorchidism), as well as Alström syndrome (diabetes, hearing loss, liver abnormalities, endocrine anomalies, cardiomyopathy). Conclusions: While overlap between the MKKS and BBS phenotypes has previously been reported for cases with BBS6 mutations, we also observed MKKS phenotypes involving BBS10 and BBS12 and Alström-like phenotypes associated with mutations in BBS1, BBS2, BBS6, BBS7, BBS9, BBS10 and BBS12 for the first time.
Strug L.J.,Child Health Evaluative science |
Strug L.J.,University of Toronto |
Hodge S.E.,Columbia University |
Hodge S.E.,New York State Psychiatric Institute |
And 4 more authors.
European Journal of Human Genetics | Year: 2010
Investigators performing genetic association studies grapple with how to measure strength of association evidence, choose sample size, and adjust for multiple testing. We apply the evidential paradigm (EP) to genetic association studies, highlighting its strengths. The EP uses likelihood ratios (LRs), as opposed to P-values or Bayes factors, to measure strength of association evidence. We derive EP methodology to estimate sample size, adjust for multiple testing, and provide informative graphics for drawing inferences, as illustrated with a Rolandic Epilepsy (RE) fine-mapping study. We focus on controlling the probability of observing weak evidence for or against association (W) rather than type I errors (M). For example, for LR≥32 representing strong evidence, at one locus with n=200 cases, n=200 controls, W=0.134, whereas M=0.005. For n=300 cases and controls, W=0.039 and M=0.004. These calculations are based on detecting an OR1.5. Despite the common misconception, one is not tied to this planning value for analysis; rather one calculates the likelihood at all possible values to assess evidence for association. We provide methodology to adjust for multiple tests across m loci, which adjusts M and W for m. We do so for (a) single-stage designs, (b) two-stage designs, and (c) simultaneously controlling family-wise error rate (FWER) and W. Method (c) chooses larger sample sizes than (a) or (b), whereas (b) has smaller bounds on the FWER than (a). The EP, using our innovative graphical display, identifies important SNPs in elongator protein complex 4 (ELP4) associated with RE that may not have been identified using standard approaches. © 2010 Macmillan Publishers Limited. All rights reserved.
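To make the idea of a likelihood ratio as a measure of evidence concrete, here is a minimal sketch that compares two simple hypotheses about a proportion and applies the LR≥32 benchmark mentioned above. It assumes a plain binomial model, which is a simplification and not the case-control likelihood used in the study; the function names and the counts are hypothetical.

```python
import math


def binomial_likelihood_ratio(k, n, p1, p0):
    """Likelihood ratio L(p1) / L(p0) for k successes in n trials.

    The binomial coefficient is identical in both likelihoods and cancels.
    """
    log_lr = k * math.log(p1 / p0) + (n - k) * math.log((1 - p1) / (1 - p0))
    return math.exp(log_lr)


def classify_evidence(lr, threshold=32.0):
    """Label evidence as strong for H1, strong for H0, or weak."""
    if lr >= threshold:
        return "strong evidence for H1"
    if lr <= 1.0 / threshold:
        return "strong evidence for H0"
    return "weak evidence"


# Hypothetical example: 60 carriers among 100 cases,
# H1: carrier frequency 0.6 versus H0: carrier frequency 0.5.
lr = binomial_likelihood_ratio(60, 100, 0.6, 0.5)
print(round(lr, 2), classify_evidence(lr))
```

In this framing, a result that falls between the two thresholds is the "weak evidence" outcome whose probability (W) the abstract proposes to control through sample size.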
Handling big data can sometimes feel like driving on an unpaved road for researchers with a need for speed and supercomputers. "When you're in the world of data, there are rocks and bumps in the way, and a lot of things that you have to take care of," said Niall Gaffney, a former Hubble Space Telescope scientist who now heads the Data Intensive Computing group at the Texas Advanced Computing Center (TACC). Gaffney led the effort to bring online a new kind of supercomputer, called Wrangler. Like the old Western cowboys who tamed wild horses, Wrangler tames beasts of big data, such as computing problems that involve analyzing thousands of files that need to be quickly opened, examined and cross-correlated. Wrangler fills a gap in the supercomputing resources of XSEDE, the Extreme Science and Engineering Discovery Environment, supported by the National Science Foundation (NSF). XSEDE is a collection of advanced digital resources that scientists can easily use to share and analyze the massive datasets being produced in nearly every field of research today. In 2013, NSF awarded TACC and its academic partners Indiana University and the University of Chicago $11.2 million to build and operate Wrangler, a supercomputer to handle data-intensive high performance computing. Wrangler was designed to work closely with the Stampede supercomputer, the 10th most powerful in the world according to the bi-annual Top500 list, and the flagship of TACC at The University of Texas at Austin (UT Austin). Stampede has computed over six million jobs for open science since it came online in 2013. "We kept a lot of what was good with systems like Stampede," said Gaffney, "but added new things to it like a very large flash storage system, a very large distributed spinning disc storage system, and high-speed network access. This allows people who have data problems that weren't being fulfilled by systems like Stampede and Lonestar to be able to do those in ways that they never could before." 
Gaffney made the analogy that supercomputers like Stampede are like racing sports cars, with fantastic compute engines optimized for going fast on smooth, well-defined race-tracks. Wrangler, on the other hand, is built like a rally car to go fast on unpaved, bumpy roads with muddy gravel. "If you take a Ferrari off-road you may want to change the way that the suspension is done," Gaffney said. "You want to change the way that the entire car is put together, even though it uses the same components, to build something suitable for people who have a different job." At the heart of Wrangler lie 600 terabytes of flash memory shared via PCI interconnect across Wrangler's over 3,000 Haswell compute cores. "All parts of the system can access the same storage," Gaffney said. "They can work in parallel together on the data that are stored inside this high-speed storage system to get larger results they couldn't get otherwise." This massive amount of flash storage comes from DSSD, a startup co-founded by Andy Bechtolsheim of Sun Microsystems fame and acquired in May of 2015 by EMC. Bechtolsheim's influence at TACC goes back to the 'Magnum' Infiniband network switch he led design on for the now-decommissioned Ranger supercomputer, the predecessor to Stampede. What's new is that DSSD took a shortcut between the CPU and the data. "The connection from the brain of the computer goes directly to the storage system. There's no translation in between," Gaffney said. "It actually allows people to compute directly with some of the fastest storage that you can get your hands on, with no bottlenecks in between." Gaffney recalled the hang-up scientists had with code called OrthoMCL, which combs through DNA sequences to find common genetic ancestry in seemingly unrelated species. The problem was that OrthoMCL let loose databases wild as a bucking bronco. 
"It generates a very large database and then runs computational programs outside and has to interact with this database," said biologist Rebecca Young of the Department of Integrative Biology and the Center for Computational Biology and Bioinformatics at UT Austin. She added, "That's not what Lonestar and Stampede and some of the other TACC resources were set up for." Young recounted how at first, using OrthoMCL with online resources, she was only able to pull out 350 comparable genes across 10 species. "When I run OrthoMCL on Wrangler, I'm able to get almost 2,000 genes that are comparable across the species," Young said. "This is an enormous improvement from what is already available. What we're looking to do with OrthoMCL is to allow us to make an increasing number of comparisons across species when we're looking at these very divergent, these very ancient species separated by 450 million years of evolution." "We were able to go through all of these work cases in anywhere between 15 minutes and 6 hours," Gaffney said. "This is a game changer." Gaffney added that getting results quickly lets scientists explore new and deeper questions by working with larger collections of data and driving previously unattainable discoveries. Computer scientist Joshua New with the Oak Ridge National Laboratory (ORNL) hopes to take advantage of Wrangler's ability to tame big data. New is the principal investigator of the Autotune project, which creates a software version of a building and calibrates the model with over 3,000 different data inputs from sources like utility bills to generate useful information, such as what an optimal energy-efficient retrofit might be. "Wrangler has enough horsepower that we can run some very large studies and get meaningful results in a single run," New said. He currently uses the Titan supercomputer of ORNL to run 500,000 simulations and write 45 TB of data to disk in 68 minutes. 
He said he wants to scale out his parametric studies to simulate all 125.1 million buildings in the U.S. "I think that Wrangler fills a specific niche for us in that we're turning our analysis into an end-to-end workflow, where we define what parameters we want to vary," New said. "It creates the sampling matrix. It creates the input files. It does the computationally challenging task of running all the simulations in parallel. It creates the output. Then we run our artificial intelligence and statistical techniques to analyze that data on the back end. Doing that from beginning to end as a solid workflow on Wrangler is something that we're very excited about." When Gaffney talks about storage on Wrangler, he's talking about a lot of data storage: a 10 petabyte Lustre-based file system hosted at TACC and replicated at Indiana University. "We want to preserve data," Gaffney said. "The system for Wrangler has been set up for making data a first-class citizen amongst what people do for research, allowing one to hold onto data and curate, share and work with people with it. Those are the founding tenets of what we wanted to do with Wrangler." "Data is really the biggest challenge with our project," said UT Austin astronomer Steve Finkelstein. His NSF-funded project is called HETDEX, the Hobby-Eberly Telescope Dark Energy Experiment. It's the largest survey of galaxies ever attempted. Scientists expect HETDEX to map over a million galaxies in three dimensions, in the process discovering thousands of new galaxies. The main goal is to study dark energy, a mysterious force pushing galaxies apart. "Every single night that we observe, and we plan to observe more or less every single night for at least three years, we're going to make 200 GB of data," Finkelstein said. HETDEX will measure the spectra of 34,000 points of skylight every six minutes. "On Wrangler is our pipeline," Finkelstein said. "It's going to live there. 
As the data comes in, it's going to have a little routine that basically looks for new data, and as it comes in every six minutes or so it will process it. By the end of the night, it will actually be able to take all the data together to find new galaxies." Another example of a new HPC user Wrangler enables is an NSF-funded science initiative called PaleoCore. It hopes to take advantage of Wrangler's swiftness with databases to build a repository for scientists to dig through geospatially-aware data on all fossils related to human origins. This would combine older digital collections in formats like Excel worksheets and SQL databases with newer ways of gathering data, such as real-time fossil GPS information collected from iPhones or iPads. "We're looking at big opportunities in linked open data," PaleoCore principal investigator Denne Reed said. Reed is an associate professor in the Department of Anthropology at UT Austin. Linked open data allows for queries to get meaning from the relationships of seemingly disparate pieces of data. "Wrangler is the type of platform that enables that," Reed said. "It enables us to store large amounts of data, both in terms of photo imagery, satellite imagery and related things that go along with geospatial data. Then also, it allows us to start looking at ways to effectively link those data with other data repositories in real time." Wrangler's shared memory supports data analytics on the Hadoop and Apache Spark frameworks. "Hadoop is a big buzzword in all of data science at this point," Gaffney said. "We have all of that and are able to configure the system to be able to essentially be like the Google Search engines are today in data centers. The big difference is that we are servicing a few people at a time, as opposed to Google." Users bring data in and out of Wrangler in one of the fastest ways possible. 
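The HETDEX routine Finkelstein describes, one that looks for new data and processes it as it arrives every six minutes or so, could be sketched roughly as a polling loop. This is only an illustration and not the HETDEX pipeline: the directory layout, the function names `poll_for_new_files` and `watch`, and the `process` callback are all assumptions.

```python
import time
from pathlib import Path


def poll_for_new_files(incoming_dir, process, seen=None, pattern="*"):
    """Process each file in incoming_dir not seen before; return the updated set."""
    seen = set() if seen is None else seen
    for path in sorted(Path(incoming_dir).glob(pattern)):
        if path.is_file() and path.name not in seen:
            process(path)        # hypothetical per-exposure processing step
            seen.add(path.name)  # remember it so it is not processed twice
    return seen


def watch(incoming_dir, process, interval_seconds=360, max_cycles=None):
    """Poll every interval_seconds (six minutes by default).

    max_cycles limits the number of polls; None means run until interrupted.
    """
    seen = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        seen = poll_for_new_files(incoming_dir, process, seen)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
    return seen
```

A call such as `watch("incoming", print, max_cycles=3)` would check a hypothetical `incoming` directory three times, six minutes apart, printing each new file path once.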
Wrangler connects to Internet2, an optical network which provides 100 gigabits per second worth of throughput to most of the other academic institutions around the country. What's more, TACC has tools and techniques to transfer users' data in parallel. "It's sort of like being at the supermarket," explained Gaffney. "If there's only one lane open, it is just as fast as one person checking you out. But if you go in and have 15 lanes open, you can spread that traffic across and get more people through in less time." Biologists, astronomers, energy efficiency experts, and paleontologists are just a small slice of the new user community Wrangler aims to attract. Wrangler is also more web-enabled than is typical in high performance computing. A web portal allows users to manage the system and gives them the ability to use web interfaces such as VNC, RStudio, and Jupyter Notebooks to support more desktop-like user interactions with the system. "We need these bigger systems for science," Gaffney said. "We need more kinds of systems. And we need more kinds of users. That's where we're pushing towards with these sort of portals. This is going to be the new face, I believe, for many of these systems that we're moving forward with now. Much more web-driven, much more graphical, much less command line driven." Wrangler is primed to lead the way in computing the bumpy world of data-intensive science research. "There are some great systems and great researchers out there who are doing groundbreaking and very important work on data, to change the way we live and to change the world," Gaffney said. "Wrangler is pushing forth on the sharing of these results, so that everybody can see what's going on."
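Gaffney's supermarket analogy for parallel transfers can be illustrated with a small sketch in which several worker threads each handle one file at a time. It is not one of TACC's transfer tools: local file copies stand in for network transfers, and the function name `copy_in_parallel` and the `lanes` parameter are hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_in_parallel(files, dest_dir, lanes=15):
    """Copy files into dest_dir using up to `lanes` concurrent workers."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def copy_one(src):
        # Each worker is one open "checkout lane" handling a single file.
        return Path(shutil.copy2(src, dest / Path(src).name))

    with ThreadPoolExecutor(max_workers=lanes) as pool:
        # map() preserves input order in the returned list of destinations.
        return list(pool.map(copy_one, files))
```

With `lanes=1` the files are copied one after another; raising `lanes` lets transfers overlap, which tends to help when each transfer spends most of its time waiting on the network or disk.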