No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. We screened 433 next-generation sequencing libraries from 270 distinct samples for authentic ancient DNA using previously reported protocols7. All libraries that we included in nuclear genome analysis were treated with uracil-DNA-glycosylase (UDG) to reduce characteristic errors of ancient DNA42. We performed in-solution enrichment for a targeted set of 1,237,207 SNPs using previously reported protocols4, 7, 43. The targeted SNP set merges 394,577 SNPs first reported in ref. 7 (390k capture), and 842,630 SNPs first reported in ref. 44 (840k capture). For 67 samples for which we newly report data in this study, there was pre-existing 390k capture data7. For these samples, we only performed 840k capture and merged the resulting sequences with previously generated 390k data. For the remaining samples, we pooled the 390k and 840k reagents together to produce a single enrichment reagent, which we called 1240k. We attempted to sequence each enriched library up to the point where we estimated that it was economically inefficient to sequence further. Specifically, we iteratively sequenced more and more from each sample and only stopped when we estimated that the expected increase in the number of targeted SNPs hit at least once would be less than about one for every 100 new read pairs generated. After sequencing, we filtered out samples with <30,000 targeted SNPs covered at least once, with evidence of contamination based on mitochondrial DNA polymorphism43, a high rate of heterozygosity on chromosome X despite being male45, or an atypical ratio of X to Y sequences. 
Of the targeted SNPs, 47,384 are ‘potentially functional’ sites chosen as follows (with some overlap): 1,290 SNPs identified as targets of selection in Europeans by the Composite of Multiple Signals (CMS) test1; 21,723 SNPs identified as significant hits by genome-wide association studies, or with known phenotypic effect (GWAS); 1,289 SNPs with extremely differentiated frequencies between HapMap populations46 (HiDiff); 9,116 ‘Immunochip’ SNPs chosen for study of immunity-related phenotypes (Immune); 347 SNPs phenotypically relevant to South America (mostly altitude adaptation SNPs in EGLN1 and EPAS1); 5,387 SNPs which tag HLA haplotypes; and 13,672 expression quantitative trait loci47 (eQTL). We used two data sets for population history analysis. ‘HO’ consists of 592,169 SNPs, taking the intersection of the SNP targets and the Human Origins SNP array4; we used this data set for co-analysis of present-day and ancient samples. ‘HOIll’ consists of 1,055,209 SNPs and additionally includes sites from the Illumina genotype array48; we used this data set for analyses only involving the ancient samples. On the HO data set, we carried out principal components analysis in smartpca49 using a set of 777 West Eurasian individuals4, and projected the ancient individuals with the option ‘lsqproject: YES’. We carried out admixture analysis on a set of 2,345 present-day individuals and the ancient samples after pruning for LD in PLINK 1.9 (https://www.cog-genomics.org/plink2)50 with parameters ‘--indep-pairwise 200 25 0.4’. We varied the number of ancestral populations between K = 2 and K = 20, and used cross-validation (--cv) to identify the value of K = 17 to plot in Extended Data Fig. 2f. We used ADMIXTOOLS11 to compute f-statistics, determining standard errors with a block jackknife and default parameters. 
We used the option ‘inbreed: YES’ when computing $f_3$-statistics of the form $f_3(\text{ancient}; \text{Ref}_1, \text{Ref}_2)$, as the ancient samples are represented by randomly sampled alleles rather than by diploid genotypes. For the same reason, we estimated $F_{ST}$ genetic distances between populations with at least two individuals on the HO data set in smartpca, also using the ‘inbreed: YES’ option. We estimated ancestral proportions as in supplementary information section 9 of ref. 7, using a method that fits mixture proportions on a ‘test’ population as a mixture of n ‘reference’ populations by using $f_4$-statistics of the form $f_4(\text{test or ref}, O_1; O_2, O_3)$ that exploit allele frequency correlations of the test or reference populations with triples of outgroup populations. We used a set of 15 world outgroup populations4, 7. In Extended Data Fig. 2, we added WHG and EHG as outgroups for those analyses in which they are not used as reference populations. We plot the squared 2-norm of the residuals, $\lVert t - R\hat{a} \rVert_2^2$, where $\hat{a}$ is a vector of n estimated mixture proportions (summing to 1), t is a vector of $f_4$-statistics of the form $f_4(\text{test}, O_1; O_2, O_3)$ for m outgroups, and R is a matrix of the form $f_4(\text{ref}, O_1; O_2, O_3)$ (supplementary information section 9 of ref. 7). We determined sex by examining the ratio of aligned reads to the sex chromosomes51. We assigned Y-chromosome haplogroups to males using version 9.1.129 of the nomenclature of the International Society of Genetic Genealogy (http://www.isogg.org), restricting analysis using samtools52 to sites with map quality and base quality of at least 30, and excluding two bases at the ends of each sequenced fragment. For most ancient samples, we did not have sufficient coverage to make reliable diploid calls. We therefore used the counts of sequences covering each SNP to compute the likelihood of the allele frequency in each population. 
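The mixture-proportion fit described above amounts to a least-squares problem with a sum-to-one constraint. The following is a minimal sketch of that idea, not the authors' implementation: it assumes only the equality constraint (no bounds on the proportions), solves the resulting KKT linear system with a small hand-written Gaussian elimination, and uses made-up numbers in place of real f-statistics. The names `solve_linear` and `fit_mixture` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular; no special handling otherwise.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mixture(t, R):
    """Minimise ||t - R a||^2 subject to sum(a) == 1.

    t: list of k statistics for the test population.
    R: k x n matrix; column j holds the same statistics for reference j.
    Returns (a_hat, squared 2-norm of the residuals).
    """
    k, n = len(R), len(R[0])
    # Normal-equation pieces: G = R^T R and g = R^T t.
    G = [[sum(R[i][p] * R[i][q] for i in range(k)) for q in range(n)]
         for p in range(n)]
    g = [sum(R[i][p] * t[i] for i in range(k)) for p in range(n)]
    # KKT system for the Lagrangian: [2G 1; 1^T 0] [a; mu] = [2g; 1].
    kkt = [[2 * G[p][q] for q in range(n)] + [1.0] for p in range(n)]
    kkt.append([1.0] * n + [0.0])
    rhs = [2 * v for v in g] + [1.0]
    a_hat = solve_linear(kkt, rhs)[:n]
    resid = [t[i] - sum(R[i][j] * a_hat[j] for j in range(n)) for i in range(k)]
    return a_hat, sum(r * r for r in resid)


# Toy data (illustrative only): a test population that is exactly
# 30% reference 1 and 70% reference 2.
R_toy = [[0.01, 0.05], [0.02, -0.01], [-0.03, 0.04], [0.00, 0.02]]
t_toy = [0.3 * r1 + 0.7 * r2 for r1, r2 in R_toy]
a_hat, resnorm = fit_mixture(t_toy, R_toy)
```

Because the toy test vector is built as an exact mixture, the fitted proportions recover the mixing weights and the residual norm is essentially zero; with real data a large residual norm would indicate that the reference populations fit the test population poorly.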
Suppose that at a particular site, for each population we have M samples with sequence level data, and N samples with full diploid genotype calls (Loschbour, Stuttgart and the 1000 Genomes samples). For samples i = 1…N, with diploid genotype data, we observe X copies of the reference allele out of 2N total chromosomes. For each of samples i = (N+1)…(N+M), with sequence level data, we observe $R_i$ sequences with the reference allele out of $T_i$ total sequences. Then, the likelihood of the population reference allele frequency, p, given data $D = \{X, N, R_i, T_i\}$ is given by
$$L(p; D) = B(X, 2N, p) \prod_{i=N+1}^{N+M} \left\{ p^2 B(R_i, T_i, 1-\varepsilon) + 2p(1-p) B(R_i, T_i, 0.5) + (1-p)^2 B(R_i, T_i, \varepsilon) \right\}$$
where $B(k, n, p) = \binom{n}{k} p^k (1-p)^{n-k}$ is the binomial probability distribution and ε is a small probability of error, which we set to 0.001. We write $\ell(p; D)$ for the log-likelihood. To estimate allele frequencies, for example in Fig. 3 or for the polygenic selection test, we maximized this likelihood numerically for each population. To scan for selection across the genome, we used the following test. Consider a single SNP. Assume that we can model the allele frequencies $p_{\mathrm{mod}}$ in A modern populations as a linear combination of allele frequencies in B ancient populations $p_{\mathrm{anc}}$. That is, $p_{\mathrm{mod}} = C p_{\mathrm{anc}}$, where C is an A by B matrix with rows summing to 1. We have data $D_j$ from population j which is some combination of sequence counts and genotypes as described above. Then, writing $\vec{p} = [p_{\mathrm{anc}}, p_{\mathrm{mod}}] = [p_1, \ldots, p_{A+B}]$, the log-likelihood of the allele frequencies equals the sum of the log-likelihoods for each population:
$$\ell(\vec{p}; D) = \sum_{j=1}^{A+B} \ell(p_j; D_j)$$
To detect deviations in allele frequency from expectation, we test the null hypothesis $H_0$: $p_{\mathrm{mod}} = C p_{\mathrm{anc}}$ against the alternative $H_1$: $p_{\mathrm{mod}}$ unconstrained. We numerically maximize this likelihood in both the constrained and unconstrained model and use the fact that twice the difference in log-likelihood is approximately $\chi^2_A$ distributed to compute a test statistic and P value. We defined the ancient source populations by the ‘Selection group 1’ label in Extended Data Table 1 and Supplementary Table 1 and used the 1000 Genomes CEU, GBR, IBS and TSI as the present-day populations. 
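The per-population likelihood described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: it assumes Hardy-Weinberg genotype proportions for the samples with sequence-level data, binomial sampling of reads with error probability ε = 0.001, and a simple grid search standing in for the unspecified numerical maximization. The function names and the example counts are hypothetical.

```python
import math

EPS = 0.001  # per-read error probability, the value stated in the text


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def log_likelihood(p, x, n_dip, reads):
    """Log-likelihood of reference allele frequency p in one population.

    x:      reference allele copies among the 2 * n_dip called chromosomes.
    n_dip:  number of samples with diploid genotype calls.
    reads:  list of (R_i, T_i) pairs, reference and total read counts
            for each sample with sequence-level data only.
    """
    ll = 0.0
    if n_dip > 0:
        ll += math.log(binom_pmf(x, 2 * n_dip, p))
    for r, t in reads:
        # Sum over the three unobserved genotypes (assumed Hardy-Weinberg).
        mix = (p * p * binom_pmf(r, t, 1 - EPS)
               + 2 * p * (1 - p) * binom_pmf(r, t, 0.5)
               + (1 - p) ** 2 * binom_pmf(r, t, EPS))
        ll += math.log(mix)
    return ll


def mle_frequency(x, n_dip, reads, grid=1000):
    """Grid-search maximum likelihood estimate on the open interval (0, 1)."""
    candidates = [i / grid for i in range(1, grid)]
    return max(candidates, key=lambda p: log_likelihood(p, x, n_dip, reads))


# Hypothetical population: 10 diploid calls carrying 6 reference alleles,
# plus three low-coverage samples described only by read counts.
p_hat = mle_frequency(6, 10, [(1, 2), (0, 1), (3, 3)])
```

A grid search is slow but has no convergence failures, which makes it a reasonable choice for a sketch; a bounded one-dimensional optimizer would be the natural replacement when scanning many SNPs.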
We removed SNPs that were monomorphic in all four of these modern populations as well as in 1000 Genomes Yoruba (YRI). We did not use FIN as one of the modern populations, because it does not fit this three-population model well. We estimated the proportions of (HG, EF, SA) to be CEU = (0.196, 0.257, 0.547), GBR = (0.362, 0.229, 0.409), IBS = (0, 0.686, 0.314) and TSI = (0, 0.645, 0.355). In practice, we found that there was substantial inflation in the test statistic, most likely due to unmodelled ancestry or additional drift. To address this, we applied a genomic control correction14, dividing all the test statistics by a constant, λ, chosen so that the median P value matched the median of the null distribution. Excluding sites in the potentially functional set, we estimated λ = 1.38 and used this value as a correction throughout. One limitation of this test is that, although it identifies likely signals of selection, it cannot provide much information about the strength or date of selection. If the ancestral populations in the model are, in fact, close to the real ancestral populations, then any selection must have occurred after the first admixture event (in this case, after 6500 BC), but if the ancestral populations are mis-specified, even this might not be true. To estimate power, we randomly sampled allele counts from the full data set, restricting to polymorphic sites with a mean frequency across all populations of <0.1. We then simulated what would happen if the allele had been under selection in all of the modern populations by simulating a Wright–Fisher trajectory with selection for 50, 100 or 200 generations, starting at the observed frequency. We took the final frequency from this simulation, sampled observations to replace the actual observations in that population, and counted the proportion of simulations that gave a genome-wide significant result after GC correction (Extended Data Fig. 6a). 
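The genomic control correction described above can be illustrated as follows. This is a sketch under assumptions, not the authors' code: it assumes the test statistic is compared with a χ² distribution whose degrees of freedom equal the number of modern populations (four here), it uses the closed-form χ² tail that exists only for even degrees of freedom, and it estimates λ from whatever statistics are passed in (the text estimates it on sites outside the potentially functional set). All function names and the example statistics are hypothetical.

```python
import math
from statistics import median


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable, even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k      # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def chi2_median_even(df):
    """Median of the chi-square distribution, found by bisection."""
    lo, hi = 0.0, 10.0 * df
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def genomic_control(stats, df):
    """Return (lambda, corrected statistics, corrected P values)."""
    lam = median(stats) / chi2_median_even(df)
    corrected = [s / lam for s in stats]
    pvals = [chi2_sf_even(s, df) for s in corrected]
    return lam, corrected, pvals


# Hypothetical statistics whose median is inflated by a factor of 1.38.
null_median = chi2_median_even(4)
lam, corrected, pvals = genomic_control([1.0, 1.38 * null_median, 20.0], df=4)
```

After the correction, the median statistic sits at the median of the null distribution, so the median P value is 0.5 by construction; the correction rescales every statistic by the same factor and therefore leaves their ranking unchanged.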
We resampled sequence counts from the observed distribution for each population to simulate the effect of increasing sample size, assuming that the coverage and distribution of the sequences remained the same (Extended Data Fig. 6b). We investigated how the genomic control correction responded when we simulated small amounts of admixture from a highly diverged population (Yoruba; 1000 Genomes YRI) into a randomly chosen modern population. The genomic inflation factor increases from around 1.38 to around 1.51 with 10% admixture, but there is little reduction in power (Extended Data Fig. 6c). Finally, we investigated how robust the test was to misspecification of the mixture matrix C. We re-ran the power simulations using a matrix $C' = xC + (1 - x)R$ for $x \in [0,1]$, where R was a random matrix chosen so that for each modern population the mixture proportions of the three ancient populations were jointly uniformly distributed on [0,1]. Increasing x increases the genomic inflation factor and reduces power, demonstrating the advantage of explicitly modelling the ancestries of the modern populations (Extended Data Fig. 6d). We implemented the test for polygenic selection described by ref. 37. This evaluates whether trait-associated alleles, weighted by their effect size, are over-dispersed compared to randomly sampled alleles, in the directions associated with the effects measured by genome-wide association studies (GWAS). For each trait, we obtained a list of significant SNP associations and effect estimates from GWAS data, and then applied the test both to all populations combined and to selected pairs of populations. For height, we restricted the list of GWAS associations to 169 SNPs where we observed at least two chromosomes in all tested populations (Selection population 2). We estimated frequencies in each population by computing the maximum likelihood estimate (MLE), using the likelihood described above. 
For each test we sampled SNPs, frequency-matched in 20 bins, computed the test statistic Q and, for ease of comparison, converted these to Z scores, signed according to the direction of the genetic effects. Theoretically Q has a χ² distribution but, in practice, it is over-dispersed. Therefore, we report bootstrap P values computed by sampling 10,000 sets of frequency-matched SNPs. To estimate population-level genetic height in Fig. 4a, we assumed a uniform prior on [0,1] for the frequency of all height-associated alleles, and then sampled from the posterior joint frequency distribution of the alleles, assuming they were independent, using a Metropolis–Hastings sampler with an N(0, 0.001) proposal density. We then multiplied the sampled allele frequencies by the effect sizes to get a distribution of genetic height. Code implementing the selection analysis is available at https://github.com/mathii/europe_selection.
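The posterior sampling step described above can be sketched as a random-walk Metropolis–Hastings sampler. This is a simplified illustration, not the code from the linked repository: it assumes 0.001 is the variance of the normal proposal (the text does not say whether it is a variance or a standard deviation), it uses a plain binomial likelihood in place of the full read-based likelihood, and the counts and effect sizes are invented. The names `mh_sample`, `binomial_log_lik` and `genetic_height_draws` are hypothetical.

```python
import math
import random


def mh_sample(log_lik, n_iter=5000, burn_in=1000,
              proposal_sd=math.sqrt(0.001), rng=None):
    """Sample an allele frequency from its posterior under a uniform prior.

    log_lik: function mapping p in (0, 1) to a log-likelihood.
    """
    rng = rng or random.Random(0)
    p = 0.5
    current = log_lik(p)
    samples = []
    for it in range(n_iter):
        candidate = p + rng.gauss(0.0, proposal_sd)
        # The uniform prior has zero density outside (0, 1): reject outright.
        if 0.0 < candidate < 1.0:
            proposed = log_lik(candidate)
            # 1 - random() lies in (0, 1], so the logarithm is always defined.
            if math.log(1.0 - rng.random()) < proposed - current:
                p, current = candidate, proposed
        if it >= burn_in:
            samples.append(p)
    return samples


def binomial_log_lik(x, n):
    """Log-likelihood for x reference alleles among n chromosomes."""
    def log_lik(p):
        return x * math.log(p) + (n - x) * math.log(1 - p)
    return log_lik


def genetic_height_draws(freq_samples, effects):
    """Combine per-SNP posterior draws into draws of sum(effect * frequency)."""
    n_draws = len(freq_samples[0])
    return [sum(b * s[i] for b, s in zip(effects, freq_samples))
            for i in range(n_draws)]


# Two hypothetical height-associated SNPs, sampled independently.
snp_a = mh_sample(binomial_log_lik(30, 100), rng=random.Random(1))
snp_b = mh_sample(binomial_log_lik(12, 40), rng=random.Random(2))
height = genetic_height_draws([snp_a, snp_b], effects=[0.04, -0.02])
```

Sampling each SNP separately and combining the draws index by index relies on the independence assumption stated in the text; the spread of `height` then reflects the uncertainty in the allele frequencies rather than a single point estimate.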
News Article | June 19, 2015
Capgemini subsidiary Sogeti and IBM forged an alliance today under which Sogeti will bring IBM's Cloud Foundry-based Bluemix platform-as-a-service (PaaS) offering to its developers and clients in 15 countries. Sogeti is also turning to Bluemix to help it power hybrid cloud applications for commerce, the Internet of Things (IoT) and data analytics for clients in industries ranging from retail and healthcare to transportation, energy and utilities. The new partnership has grown out of an existing relationship between Sogeti and IBM in which Sogeti built its smartEngine gateway for managing buildings using IBM Bluemix. The smartEngine gateway provides a means for using Bluemix to connect sensors that use different protocols and data formats, allowing Sogeti to provide clients with insights on the performance of their heating, ventilation and air conditioning (HVAC), lighting and other energy-producing processes. These insights are used by clients to minimize energy costs, optimize environmental impact and increase security. Sogeti plans to use Bluemix to extend the smartEngine gateway's IoT capabilities to provide insight for clients in other verticals, including manufacturing, agriculture and utilities management. "As cloud continues to transform how we collaborate and work with technology, many of our clients are looking to build more and more of their apps and systems with the cloud," Andreas Sjöström, vice president and global head of Digital at Sogeti, said in a statement Thursday. "With Bluemix Dedicated, we are able to address these concerns by offering dedicated servers and cloud data centers which easily conform to these requisites, while still offering the highly valuable benefits of a cloud infrastructure, such as accelerated business speed, collaboration and visibility." Mohamed Abdula, vice president, Strategy & Offering Management for Cloud Foundation Services, IBM, says three things are attracting customers like Sogeti to the Bluemix PaaS:
News Article | August 14, 2015
A third report in a series on disruptive technologies has been launched today by Sogeti's Research Institute VINT, and is devoted to the new design principle of the blockchain. It outlines the potential impact of a new way of organizing trust in the presence of unreliable parties. Although the blockchain owes its current fame to the currency bitcoin in particular, the cryptographic capacities of the network can be deployed in a variety of other ways. The blockchain is presented as a special kind of platform, which in turn is a basis for numerous other platforms; in other words, a platform for platforms. "Disruption is the New Normal" is the key message from the series of reports of Sogeti's disruptive technologies research project. The project, Design to Disrupt, outlines the exponential growth of the new digital opportunities we're facing today. The allegedly inferior propositions of startups and new technologies confuse prominent players, who should in fact be the very first to be open to disruptive innovation. This innovator's dilemma brings us back to a major question: to disrupt or to be disrupted. The disruptive potential of blockchain exceeds bitcoin. Menno van Doorn, Director of Sogeti's Research Institute: "The blockchain is basically a possibility to do frictionless business, as the control is already embedded in the transaction. This may well be the key to lots of new possibilities: a strong chain without weak links offering a solution to numerous actual problems within the digital economy." Or, as van Doorn puts it: "Imagine that with every transaction that you execute on the internet a notary should look over your shoulder to make sure that nothing is wrong. That would be a very costly affair indeed. But it is an entirely different matter if it could be computerized." The report outlines the potential impact of this new application in three steps.
Crypto-economy 1.0
The first step is termed the crypto-economy 1.0. It concerns the currency, the bitcoin, and financial transactions. It also provides a vital explanation of how the protocol operates. The report lists seven pros and cons of bitcoin, dwells on questions regarding the supervision of the system, and wraps up with the official viewpoints of the financial authorities.
Crypto-economy 2.0
When outlining the crypto-economy 2.0, the report goes more deeply into the other possibilities of the blockchain (bitcoin without bitcoin). This concerns two kinds of applications in particular: smart contracts and smart products, and how the economy can be made to run more smoothly; or, in other words, how waste can be stamped out.
Crypto-economy 3.0
The final result is the crypto-economy 3.0. It describes the DACs (Decentralized Autonomous Corporations), also called Robocorps. It is an Internet-of-Things scenario where objects are increasingly getting a free hand to make decisions and stimulate the economy: a potential forerunner of a zero marginal cost society. In such a case blockchain will be part of a collaborative commons, an advanced form of blockchain technology in society.
About SogetiLabs and VINT
SogetiLabs is a network of over 120 technology leaders from Sogeti worldwide. SogetiLabs covers a wide range of digital technology expertise: from embedded software, cyber security, simulation, and cloud to business information management, mobile apps, analytics, testing, and the Internet of Things. The focus is always on leveraging technologies, systems and applications in actual business situations to maximize results. Together with the Sogeti trend lab VINT, SogetiLabs provides insight, research, and inspiration through articles, presentations, and videos that can be downloaded via the extensive SogetiLabs presence on its website, online portals, and social media. 
About Sogeti Sogeti is a leading provider of technology and software testing, specializing in Application, Infrastructure and Engineering Services. Sogeti offers cutting-edge solutions around Testing, Business Intelligence & Analytics, Mobile, Cloud and Cyber Security, combining world class methodologies and its global delivery model, Rightshore®. Sogeti brings together more than 20,000 professionals in 15 countries and has a strong local presence in over 100 locations in Europe, USA and India. Sogeti is a wholly-owned subsidiary of Cap Gemini S.A., listed on the Paris Stock Exchange. Learn more on www.sogeti.com. The publication of a press release on this page should not be viewed as an endorsement by CoinDesk. Customers should do their own research before investing funds in any company.
News Article | April 2, 2015
It's a common theme that spans functional areas within the organization: data remains stuck in silos, making it all but impossible for decision-makers to get a glimpse at the big picture. Zeroing in on marketers' experience of this problem, Oracle on Wednesday rolled out several enhancements to its Marketing Cloud designed to help companies develop a more holistic view of their customers. Among the new features unveiled at Oracle's Modern Marketing Experience event this week in Las Vegas are Oracle ID Graph, Rapid Retargeter and AppCloud Connect. Oracle ID Graph is designed to help marketers connect the many identities a consumer may have across channels and devices and understand that they all belong to the same person. "Customers interact with companies in dozens of highly fragmented touchpoints," said Kevin Akeroyd, general manager for the Oracle Marketing Cloud. "Most companies can't put together the 'you' they know in each of those -- you end up being 17 different customers in 17 different databases within one company." By surfacing the person behind all those identities, the new capabilities are designed to help marketers orchestrate a consistent and personalized experience, Oracle said. Rapid Retargeter, meanwhile, works toward a similar goal by enabling marketers to tailor customer interactions as they happen and deliver the most relevant and timely message. When a shopper abandons an online shopping cart, for instance, the marketer can immediately send an email reminder of the cart's contents. 
AppCloud Connect, on the other hand, is a set of APIs and open frameworks designed to enable organizations and technology vendors to leverage apps and media within the context of Oracle Marketing Cloud. Focusing on the data-management portion of the Oracle Marketing Cloud, meanwhile, are two new features designed to help marketers connect audience data from across disparate marketing tools for better targeting. New Lookalike Modeling capabilities aim to make it easy for users to transfer audience data into the data-management platform, where a self-learning algorithm and automated workflow can help identify a target audience. OnDemand On-Board capabilities allow marketers to import data from their Web analytics platforms and use it as audience data within the data-management platform. Finally, Oracle also announced on Wednesday new integrations between Oracle Marketing Cloud and the company's Commerce and WebCenter Sites products. The goal, once again, is to unify the platforms, people and processes underlying the customer experience for more consistent messaging and branding. "It's no longer acceptable to treat the customer as 17 different people," Akeroyd said. "CIOs and CMOs need to be locked at the hip."
News Article | March 12, 2015