Royston P., MRC Clinical Trials Unit |
Altman D.G., Centre for Statistics in Medicine
Statistics in Medicine | Year: 2010
Logistic regression models are widely used in medicine for predicting patient outcome (prognosis) and constructing diagnostic tests (diagnosis). Multivariable logistic models yield an (approximately) continuous risk score, a transformation of which gives the estimated event probability for an individual. A key aspect of model performance is discrimination, that is, the model's ability to distinguish between patients who have (or will have) an event of interest and those who do not (or will not). Graphical aids are important in understanding a logistic model. The receiver-operating characteristic (ROC) curve is familiar, but not necessarily easy to interpret. We advocate a simple graphic that provides further insight into discrimination, namely a histogram or dot plot of the risk score in the outcome groups. The most popular performance measure for the logistic model is the c-index, numerically equivalent to the area under the ROC curve. We discuss the comparative merits of the c-index and the (standardized) mean difference in risk score between the outcome groups. The latter statistic, sometimes known generically as the effect size, has been computed in slightly different ways by several different authors, including Glass, Cohen and Hedges. An alternative measure is the overlap between the distributions in the outcome groups, defined as the area under the minimum of the two density functions. The larger the overlap, the weaker the discrimination. Under certain assumptions about the distribution of the risk score, the c-index, effect size and overlap are functionally related. We illustrate the ideas with simulated and real data sets. Copyright © 2010 John Wiley & Sons, Ltd.
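The functional relationship mentioned above can be made concrete under the binormal special case (risk scores normally distributed with equal variance in the two outcome groups): the c-index is then Φ(δ/√2) and the overlap is 2Φ(−δ/2), where δ is the standardized mean difference. The following stdlib-only Python sketch (variable names and sample sizes are ours, purely illustrative) checks the c-index relation by simulation:

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def c_index(event_scores, nonevent_scores):
    """Empirical c-index: P(event score > non-event score), ties counted 1/2."""
    wins = 0.0
    for a in event_scores:
        for b in nonevent_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(event_scores) * len(nonevent_scores))

random.seed(1)
delta = 1.0  # standardized mean difference (effect size) between outcome groups
events = [random.gauss(delta, 1.0) for _ in range(800)]
nonevents = [random.gauss(0.0, 1.0) for _ in range(800)]

emp_c = c_index(events, nonevents)
theory_c = phi(delta / math.sqrt(2.0))  # c-index under the binormal model
overlap = 2.0 * phi(-delta / 2.0)       # area under the minimum of the two densities
print(f"empirical c = {emp_c:.3f}, theoretical c = {theory_c:.3f}, overlap = {overlap:.3f}")
```

Note how the three quantities move together: a larger δ raises the c-index and shrinks the overlap, which is the sense in which they measure the same discrimination.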
Bown M.J., University of Leicester |
Sweeting M.J., MRC Biostatistics Unit |
Brown L.C., MRC Clinical Trials Unit |
Powell J.T., Imperial College London |
Thompson S.G., University of Cambridge
JAMA - Journal of the American Medical Association | Year: 2013
Importance: Small abdominal aortic aneurysms (AAAs [3.0 cm-5.4 cm in diameter]) are monitored by ultrasound surveillance. The intervals between surveillance scans should be chosen to detect an expanding aneurysm prior to rupture. Objective: To limit risk of aneurysm rupture or excessive growth by optimizing ultrasound surveillance intervals. Data Sources and Study Selection: Individual patient data from studies of small AAA growth and rupture were assessed. Studies were identified for inclusion through a systematic literature search through December 2010. Study authors were contacted, which yielded 18 data sets providing repeated ultrasound measurements of AAA diameter over time in 15 471 patients. Data Extraction: AAA diameters were analyzed using a random-effects model that allowed for between-patient variability in size and growth rate. Rupture rates were analyzed by proportional hazards regression using the modeled AAA diameter as a time-varying covariate. Predictions of the risks of exceeding 5.5-cm diameter and of rupture within given time intervals were estimated and pooled across studies by random-effects meta-analysis. Results: AAA growth and rupture rates varied considerably across studies. For each 0.5-cm increase in AAA diameter, growth rates increased on average by 0.59 mm per year (95% CI, 0.51-0.66) and rupture rates increased by a factor of 1.91 (95% CI, 1.61-2.25). For example, to keep the risk in men of AAA growth exceeding 5.5 cm below 10%, on average, a 7.4-year surveillance interval (95% CI, 6.7-8.1) is sufficient for a 3.0-cm AAA, while an 8-month interval (95% CI, 7-10) is necessary for a 5.0-cm AAA. To keep the risk of rupture in men below 1%, the corresponding estimated surveillance intervals are 8.5 years (95% CI, 7.0-10.5) and 17 months (95% CI, 14-22).
Conclusion and Relevance: In contrast to the commonly adopted surveillance intervals in current AAA screening programs, surveillance intervals of several years may be clinically acceptable for the majority of patients with small AAA. ©2013 American Medical Association. All rights reserved.
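The pooled rupture estimate implies simple multiplicative scaling with diameter. As an illustration (the function name and the 3.0-cm reference diameter are our choices, not the paper's), a 1.91-fold increase per 0.5 cm gives:

```python
def relative_rupture_rate(diameter_cm, reference_cm=3.0, factor_per_half_cm=1.91):
    """Rupture rate at `diameter_cm` relative to `reference_cm`, using the
    pooled estimate of a 1.91-fold increase per 0.5 cm of AAA diameter."""
    return factor_per_half_cm ** ((diameter_cm - reference_cm) / 0.5)

for d in (3.0, 4.0, 5.0):
    print(f"{d:.1f} cm AAA: {relative_rupture_rate(d):.1f}x the 3.0 cm rupture rate")
```

A 5.0-cm AAA thus carries roughly a 13-fold higher rupture rate than a 3.0-cm AAA (1.91 to the fourth power), which is why the surveillance intervals above shorten so sharply with diameter.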
Kahan B.C., MRC Clinical Trials Unit |
Morris T.P., MRC Clinical Trials Unit |
Morris T.P., Institute of Public Health
Statistics in Medicine | Year: 2013
In multicentre trials, randomisation is often carried out using permuted blocks stratified by centre. It has previously been shown that stratification variables used in the randomisation process should be adjusted for in the analysis to obtain correct inference. For continuous outcomes, the two primary methods of accounting for centres are fixed-effects and random-effects models. We discuss the differences in interpretation between these two models and the implications each poses for analysis. We then perform a large simulation study comparing the performance of these analysis methods in a variety of situations. In total, we assessed 378 scenarios. We found that random centre effects performed as well as or better than fixed-effects models in all scenarios. Random centre effects models led to increases in power and precision when the number of patients per centre was small (e.g. 10 patients or fewer) and, in some scenarios, when there was an imbalance between treatments within centres, either due to the randomisation method or to the distribution of patients across centres. With small sample sizes, random-effects models maintained nominal coverage rates when a degrees-of-freedom (DF) correction was used. We assessed the robustness of random-effects models when assumptions regarding the distribution of the centre effects were incorrect and found this had no impact on results. We conclude that random-effects models offer many advantages over fixed-effects models in certain situations and should be used more often in practice. © 2012 John Wiley & Sons, Ltd.
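For a continuous outcome y on patient i in centre j with treatment indicator t, the two models being compared can be written in one standard formulation (the notation is ours, not the paper's):

```latex
% Fixed centre effects: a separate intercept \alpha_j for each centre j
y_{ij} = \alpha_j + \beta t_{ij} + \varepsilon_{ij}, \qquad
  \varepsilon_{ij} \sim N(0, \sigma^2)

% Random centre effects: centre intercepts drawn from a common distribution
y_{ij} = \alpha + u_j + \beta t_{ij} + \varepsilon_{ij}, \qquad
  u_j \sim N(0, \tau^2)
```

The interpretive difference is that the fixed-effects model conditions on the centres actually in the trial, while the random-effects model treats them as a sample from a wider population, borrowing strength across centres through the shared variance τ².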
Kahan B.C., MRC Clinical Trials Unit |
Morris T.P., MRC Clinical Trials Unit
Statistics in Medicine | Year: 2012
Many clinical trials restrict randomisation using stratified blocks or minimisation to balance prognostic factors across treatment groups. It is widely acknowledged in the statistical literature that the subsequent analysis should reflect the design of the study, and any stratification or minimisation variables should be adjusted for in the analysis. However, a review of recent general medical literature showed only 14 of 41 eligible studies reported adjusting their primary analysis for stratification or minimisation variables. We show that balancing treatment groups using stratification leads to correlation between the treatment groups. If this correlation is ignored and an unadjusted analysis is performed, standard errors for the treatment effect will be biased upwards, resulting in 95% confidence intervals that are too wide, type I error rates that are too low and a reduction in power. Conversely, an adjusted analysis will give valid inference. We explore the extent of this issue using simulation for continuous, binary and time-to-event outcomes where treatment is allocated using stratified block randomisation or minimisation. © 2011 John Wiley & Sons, Ltd.
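The upward bias in unadjusted standard errors is straightforward to reproduce. The stdlib-only Python sketch below (scenario and parameter values are illustrative, not taken from the paper) simulates trials randomised with permuted blocks stratified by a prognostic stratum, and compares the true sampling SD of the unadjusted treatment-effect estimate with the naive model-based SE:

```python
import math
import random
import statistics

def permuted_blocks(n_pairs):
    """Permuted blocks of size 2: guarantees treatment balance within a stratum."""
    alloc = []
    for _ in range(n_pairs):
        block = [0, 1]
        random.shuffle(block)
        alloc += block
    return alloc

def simulate_once(n_per_stratum=50, stratum_effect=2.0):
    """One trial: two strata; outcome depends on stratum but not on treatment."""
    y, t = [], []
    for s in (0, 1):
        for arm in permuted_blocks(n_per_stratum // 2):
            t.append(arm)
            y.append(s * stratum_effect + random.gauss(0.0, 1.0))
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    estimate = statistics.mean(y1) - statistics.mean(y0)
    # Naive SE of an unadjusted two-sample comparison: it uses the marginal
    # outcome variance, which still contains the between-stratum variability
    # that stratified randomisation has already balanced out.
    naive_se = math.sqrt(statistics.variance(y1) / len(y1)
                         + statistics.variance(y0) / len(y0))
    return estimate, naive_se

random.seed(2)
results = [simulate_once() for _ in range(2000)]
empirical_sd = statistics.stdev(e for e, _ in results)
naive_se_mean = statistics.mean(se for _, se in results)
print(f"true sampling SD of estimate: {empirical_sd:.3f}")
print(f"average unadjusted SE:        {naive_se_mean:.3f}")
```

The unadjusted SE overstates the true sampling variability, which is exactly the mechanism behind the overly wide confidence intervals, deflated type I error, and power loss described above; an analysis adjusted for the stratum recovers the correct SE.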
Kahan B.C., MRC Clinical Trials Unit |
Morris T.P., MRC Clinical Trials Unit
BMC Medical Research Methodology | Year: 2013
Background: Recent reviews have shown that while clustering is extremely common in individually randomised trials (for example, clustering within centre, therapist, or surgeon), it is rarely accounted for in the trial analysis. Our aim is to develop a general framework for assessing whether potential sources of clustering must be accounted for in the trial analysis to obtain valid type I error rates (non-ignorable clustering), with a particular focus on individually randomised trials. Methods: A general framework for assessing clustering is developed based on theoretical results and a case study of a recently published trial is used to illustrate the concepts. A simulation study is used to explore the impact of not accounting for non-ignorable clustering in practice. Results: Clustering is non-ignorable when there is both correlation between patient outcomes within clusters, and correlation between treatment assignments within clusters. This occurs when the intraclass correlation coefficient is non-zero, and when the cluster has been used in the randomisation process (e.g. stratified blocks within centre) or when patients are assigned to clusters after randomisation with different probabilities (e.g. a surgery trial in which surgeons treat patients in only one arm). A case study of an individually randomised trial found multiple sources of clustering, including centre of recruitment, attending surgeon, and site of rehabilitation class. Simulations show that failure to account for non-ignorable clustering in trial analyses can lead to type I error rates over 20% in certain cases; conversely, adjusting for the clustering in the trial analysis gave correct type I error rates. Conclusions: Clustering is common in individually randomised trials. Trialists should assess potential sources of clustering during the planning stages of a trial, and account for any sources of non-ignorable clustering in the trial analysis. © 2013 Kahan and Morris; licensee BioMed Central Ltd.
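Both conditions for non-ignorable clustering are easy to exhibit in simulation. The stdlib-only Python sketch below (cluster counts, cluster sizes, and variance components are our illustrative choices) mimics the surgery-trial example, where each surgeon treats patients in only one arm, and shows the inflated type I error of an analysis that ignores the clustering:

```python
import math
import random
import statistics

def naive_test_rejects(n_clusters=10, patients_per_cluster=20, cluster_sd=0.5):
    """One null trial: each 'surgeon' (cluster) treats patients in only one
    arm, and outcomes within a cluster share a random cluster effect."""
    y, t = [], []
    for c in range(n_clusters):
        arm = c % 2                        # surgeons split evenly between arms
        u = random.gauss(0.0, cluster_sd)  # shared cluster effect (ICC > 0)
        for _ in range(patients_per_cluster):
            t.append(arm)
            y.append(u + random.gauss(0.0, 1.0))  # no true treatment effect
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    se = math.sqrt(statistics.variance(y1) / len(y1)
                   + statistics.variance(y0) / len(y0))
    z = (statistics.mean(y1) - statistics.mean(y0)) / se
    return abs(z) > 1.96                   # naive test that ignores clustering

random.seed(3)
rejections = sum(naive_test_rejects() for _ in range(1000))
print(f"type I error ignoring the clustering: {rejections / 10:.1f}%")
```

Because treatment assignment is perfectly correlated within clusters here, the naive SE badly understates the variability of the treatment-effect estimate and the nominal 5% test rejects far too often; setting `cluster_sd=0` (ICC of zero) restores the nominal rate, matching the framework's condition that both correlations must be present.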
Babiker A.G., MRC Clinical Trials Unit
Clinical trials (London, England) | Year: 2013
Untreated human immunodeficiency virus (HIV) infection is characterized by progressive depletion of CD4+ T lymphocyte (CD4) count leading to the development of opportunistic diseases (acquired immunodeficiency syndrome (AIDS)), and more recent data suggest that HIV is also associated with an increased risk of serious non-AIDS (SNA) diseases including cardiovascular, renal, and liver diseases and non-AIDS-defining cancers. Although combination antiretroviral treatment (ART) has resulted in a substantial decrease in morbidity and mortality in persons with HIV infection, viral eradication is not feasible with currently available drugs. The optimal time to start ART for asymptomatic HIV infection is controversial and remains one of the key unanswered questions in the clinical management of HIV-infected individuals. In this article, we outline the rationale and methods of the Strategic Timing of AntiRetroviral Treatment (START) study, an ongoing multicenter international trial designed to assess the risks and benefits of initiating ART earlier than is currently practiced. We also describe some of the challenges encountered in the design and implementation of the study and how these challenges were addressed. A total of 4000 study participants who are HIV type 1 (HIV-1) infected, ART naïve with CD4 count > 500 cells/μL are to be randomly allocated in a 1:1 ratio to start ART immediately (early ART) or defer treatment until CD4 count is <350 cells/μL (deferred ART) and followed for a minimum of 3 years. The primary outcome is time to AIDS, SNA, or death. The study had a pilot phase to establish feasibility of accrual, which was set as the enrollment of at least 900 participants in the first year. 
Challenges encountered in the design and implementation of the study included the limited amount of data on the risk of a major component of the primary endpoint (SNA) in the study population, changes in treatment guidelines when the pilot phase was well underway, and the complexities of conducting the trial in a geographically wide population with diverse regulatory requirements. With the successful completion of the pilot phase, more than 1000 participants from 100 sites in 23 countries have been enrolled. The study will expand to include 237 sites in 36 countries to reach the target accrual of 4000 participants. START is addressing one of the most important questions in the clinical management of ART. The randomization provided a platform for the conduct of several substudies aimed at increasing our understanding of HIV disease and the effects of antiretroviral therapy beyond the primary question of the trial. The lessons learned from its design and implementation will hopefully be of use to future publicly funded international trials.
Gibb D.M., MRC Clinical Trials Unit
The Lancet | Year: 2013
Background No trials have investigated routine laboratory monitoring for children with HIV, nor four-drug induction strategies to increase durability of first-line antiretroviral therapy (ART). Methods In this open-label parallel-group trial, Ugandan and Zimbabwean children or adolescents with HIV, aged 3 months to 17 years and eligible for ART, were randomly assigned in a factorial design. Randomisation was to either clinically driven monitoring or routine laboratory and clinical monitoring for toxicity (haematology and biochemistry) and efficacy (CD4 cell counts; non-inferiority monitoring randomisation); and simultaneously to standard three-drug or to four-drug induction first-line ART, in three groups: three-drug treatment (non-nucleoside reverse transcriptase inhibitor [NNRTI], lamivudine, abacavir; group A) versus four-drug induction (NNRTI, lamivudine, abacavir, zidovudine; groups B and C), decreasing after week 36 to three-drug NNRTI, lamivudine, plus abacavir (group B) or lamivudine, abacavir, plus zidovudine (group C; superiority ART-strategy randomisation). For patients assigned to routine laboratory monitoring, results were returned every 12 weeks to clinicians; for clinically driven monitoring, toxicity results were only returned for requested clinical reasons or if grade 4. Children switched to second-line ART for WHO stage 3 or 4 events or (routine laboratory monitoring only) age-dependent WHO CD4 criteria. Randomisation used computer-generated sequentially numbered tables incorporated securely within the database. Primary efficacy endpoints were new WHO stage 4 events or death for monitoring and change in CD4 percentage at 72 and 144 weeks for ART-strategy randomisations; the co-primary toxicity endpoint was grade 3 or 4 adverse events. Analysis was by intention to treat. This trial is registered, ISRCTN24791884. 
Findings 1206 children were randomly assigned to clinically driven (n=606) versus routine laboratory monitoring (n=600), and groups A (n=397), B (n=404), and C (n=405). 47 (8%) children on clinically driven monitoring versus 39 (7%) on routine laboratory monitoring had a new WHO stage 4 event or died (hazard ratio [HR] 1·13, 95% CI 0·73-1·73, p=0·59; non-inferiority criterion met). However, in years 2-5, rates were higher in children on clinically driven monitoring (1·3 vs 0·4 per 100 child-years, difference 0·99, 0·37-1·60, p=0·002). One or more grade 3 or 4 adverse events occurred in 283 (47%) children on clinically driven versus 282 (47%) on routine laboratory monitoring (HR 0·98, 0·83-1·16, p=0·83). Mean CD4 percentage change did not differ between ART groups at week 72 (16·5% [SD 8·6] vs 17·1% [8·5] vs 17·3% [8·0], p=0·33) or week 144 (p=0·69), but four-drug groups (B, C) were superior to three-drug group A at week 36 (12·4% [7·2] vs 14·1% [7·1] vs 14·6% [7·3], p<0·0001). Excess grade 3 or 4 events in groups B (one or more events reported by 157 [40%] children in A, 190 [47%] in B; HR [B:A] 1·32, 1·07-1·63) and C (218 [54%] children in C; HR [C:A] 1·58, 1·29-1·94; global p=0·0001) were driven by asymptomatic neutropenia in zidovudine-containing groups (B, C; 86 group A, 133 group B, 184 group C), but resulted in drug substitutions in only zero versus two versus four children, respectively. Interpretation NNRTI plus NRTI-based three-drug or four-drug ART can be given across childhood without routine toxicity monitoring; CD4 monitoring provided clinical benefit after the first year on ART, but event rates were very low and long-term survival high, suggesting ART rollout should take priority. CD4 benefits from four-drug induction were not durable, but three-NRTI long-term maintenance was immunologically and clinically similar to NNRTI-based ART and could be valuable during tuberculosis co-treatment.
First-line antiretroviral therapy with a protease inhibitor versus non-nucleoside reverse transcriptase inhibitor and switch at higher versus low viral load in HIV-infected children: An open-label, randomised phase 2/3 trial
Harrison L., MRC Clinical Trials Unit
The Lancet Infectious Diseases | Year: 2011
Background: Children with HIV will be on antiretroviral therapy (ART) longer than adults, and therefore the durability of first-line ART and timing of switch to second-line are key questions. We assess the long-term outcome of protease inhibitor and non-nucleoside reverse transcriptase inhibitor (NNRTI) first-line ART and viral load switch criteria in children. Methods: In a randomised open-label factorial trial, we compared effectiveness of two nucleoside reverse transcriptase inhibitors (NRTIs) plus a protease inhibitor versus two NRTIs plus an NNRTI and of switch to second-line ART at a viral load of 1000 copies per mL versus 30 000 copies per mL in previously untreated children infected with HIV from Europe and North and South America. Random assignment was by computer-generated sequentially numbered lists stratified by age, region, and by exposure to perinatal ART. Primary outcome was change in viral load between baseline and 4 years. Analysis was by intention to treat, which we defined as all patients that started treatment. This study is registered with ISRCTN, number ISRCTN73318385. Findings: Between Sept 25, 2002, and Sept 7, 2005, 266 children (median age 6·5 years; IQR 2·8-12·9) were randomly assigned treatment regimens: 66 to receive protease inhibitor and switch to second-line at 1000 copies per mL (PI-low), 65 protease inhibitor and switch at 30 000 copies per mL (PI-higher), 68 NNRTI and switch at 1000 copies per mL (NNRTI-low), and 67 NNRTI and switch at 30 000 copies per mL (NNRTI-higher). Median follow-up was 5·0 years (IQR 4·2-6·0) and 188 (71%) children were on first-line ART at trial end. 
At 4 years, mean reductions in viral load were -3·16 log10 copies per mL for protease inhibitors versus -3·31 log10 copies per mL for NNRTIs (difference -0·15 log10 copies per mL, 95% CI -0·41 to 0·11; p=0·26), and -3·26 log10 copies per mL for switching at the low versus -3·20 log10 copies per mL for switching at the higher threshold (difference 0·06 log10 copies per mL, 95% CI -0·20 to 0·32; p=0·56). Protease inhibitor resistance was uncommon and there was no increase in NRTI resistance in the PI-higher compared with the PI-low group. NNRTI resistance was selected early, and about 10% more children accumulated NRTI mutations in the NNRTI-higher than the NNRTI-low group. Nine children had new CDC stage-C events and 60 had grade 3/4 adverse events; both were balanced across randomised groups. Interpretation: Good long-term outcomes were achieved with all treatment strategies. Delayed switching of protease-inhibitor-based ART might be reasonable where future drug options are limited, because the risk of selecting for NRTI and protease-inhibitor resistance is low. Funding: Paediatric European Network for Treatment of AIDS (PENTA) and Pediatric AIDS Clinical Trials Group (PACTG/IMPAACT). © 2011 Elsevier Ltd.
Kahan B.C., MRC Clinical Trials Unit
Statistics in Medicine | Year: 2013
Factorial trials are an efficient method of assessing multiple treatments in a single trial, saving both time and resources. However, they rely on the assumption of no interaction between treatment arms. Ignoring the possibility of an interaction in the analysis can lead to bias and potentially misleading conclusions. Therefore, it is often recommended that the size of the interaction be assessed during analysis. This approach can be formalised as a two-stage analysis; if the interaction test is not significant, a factorial analysis (where all patients receiving treatment A are compared with all not receiving A, and similarly for treatment B) is performed. If the interaction is significant, the analysis reverts to that of a four-arm trial (where each treatment combination is regarded as a separate treatment arm). We show that estimated treatment effects from the two-stage analysis can be biased, even in the absence of a true interaction. This occurs because the interaction estimate is highly correlated with treatment effect estimates from a four-arm analysis. Simulations show that bias can be severe (over 100% in some cases), leading to inflated type I error rates. Therefore, the two-stage analysis should not be used in factorial trials. A preferable approach may be to design multi-arm trials (i.e. four separate treatment groups) instead. This approach leads to straightforward interpretation of results, is unbiased regardless of the presence of an interaction, and allows investigators to ensure adequate power by basing sample size requirements on a four-arm analysis. © 2013 John Wiley & Sons, Ltd.
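The correlation driving this bias can be seen directly by simulation. The stdlib-only Python sketch below (all parameter choices are ours; true main effects and the true interaction are zero) conditions on the interaction estimate exceeding 1.96 standard errors in one direction, a deliberate simplification of the two-stage test so that the conditional bias has a single sign:

```python
import math
import random
import statistics

def cell_mean(n, mu=0.0):
    """Mean outcome in one cell of the 2x2 factorial."""
    return statistics.mean(random.gauss(mu, 1.0) for _ in range(n))

random.seed(4)
n = 50           # patients per cell; all true effects and the interaction are 0
conditional = []
for _ in range(10000):
    m00, m10, m01, m11 = (cell_mean(n) for _ in range(4))
    interaction = (m11 - m01) - (m10 - m00)
    se_interaction = math.sqrt(4.0 / n)
    if interaction / se_interaction > 1.96:   # 'significant' positive interaction
        conditional.append(m10 - m00)         # four-arm estimate of effect of A

# The true effect of A is zero, yet conditional on a significant interaction
# the four-arm estimate is systematically negative, because the interaction
# and treatment-effect estimates share the same cell means and are correlated.
print(f"mean four-arm estimate of A given a significant interaction: "
      f"{statistics.mean(conditional):.3f}")
```

This is the mechanism the abstract describes: because the interaction and four-arm effect estimates are built from the same cell means, selecting trials with a significant interaction also selects extreme treatment-effect estimates, biasing the reported effect even though no true interaction exists.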
Kahan B.C., MRC Clinical Trials Unit |
Morris T.P., MRC Clinical Trials Unit
BMJ (Online) | Year: 2012
Objectives: To assess how often stratified randomisation is used, whether analysis adjusted for all balancing variables, and whether the method of randomisation was adequately reported, and to reanalyse a previously reported trial to assess the impact of ignoring balancing factors in the analysis. Design: Review of published trials and reanalysis of a previously reported trial. Setting: Four leading general medical journals (BMJ, Journal of the American Medical Association, Lancet, and New England Journal of Medicine) and the second Multicenter Intrapleural Sepsis Trial (MIST2). Participants: 258 trials published in 2010 in the four journals. Cluster randomised, crossover, non-randomised, single arm, and phase I or II trials were excluded, as were trials reporting secondary analyses, interim analyses, or results that had been previously published in 2010. Main outcome measures: Whether the method of randomisation was adequately reported, how often balanced randomisation was used, and whether balancing factors were adjusted for in the analysis. Results: Reanalysis of MIST2 showed that an unadjusted analysis led to larger P values and a loss of power. The review of published trials showed that balanced randomisation was common, with 163 trials (63%) using at least one balancing variable. The most common methods of balancing were stratified permuted blocks (n=85) and minimisation (n=27). The method of randomisation was unclear in 37% of trials. Most trials that balanced on centre or prognostic factors were not adequately analysed; only 26% of trials adjusted for all balancing factors in their primary analysis. Trials that did not adjust for balancing factors in their analysis were less likely to show a statistically significant result (unadjusted 57% v adjusted 78%, P=0.02). Conclusion: Balancing on centre or prognostic factors is common in trials but often poorly described, and the implications of balancing are poorly understood. 
Trialists should adjust their primary analysis for balancing factors to obtain correct P values and confidence intervals and to avoid an unnecessary loss in power.