Each month we scan and review the literature to identify new papers describing approaches to active monitoring, signal detection, and other epidemiologic and statistical methods.
Understanding and diagnosing the potential for bias when using machine learning methods with doubly robust causal estimators. Bahamyirou A, Blais L, Forget A, Schnitzer ME. Stat Methods Med Res. 2019 Jun;28(6):1637-1650. doi: 10.1177/0962280218772065. Epub 2018 May 2. Read More
Data-adaptive methods have been proposed to estimate nuisance parameters when using doubly robust semiparametric methods for estimating marginal causal effects. However, in the presence of near practical positivity violations, these methods can produce a separation of the two exposure groups in terms of propensity score densities which can lead to biased estimates of the treatment effect. To motivate the problem, we evaluated the Targeted Minimum Loss-based Estimation procedure using a simulation scenario to estimate the average treatment effect. We highlight the divergence in estimates obtained when using parametric and data-adaptive methods to estimate the propensity score. We then adapted an existing diagnostic tool based on a bootstrap resampling of the subjects and simulation of the outcome data in order to show that the estimation using data-adaptive methods for the propensity score in this study may lead to large bias and poor coverage. The adapted bootstrap procedure is able to identify this instability and can be used as a diagnostic tool.
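The separation problem the authors describe is easy to reproduce: under a near-positivity violation, a flexible learner tends to push estimated propensity scores closer to 0 or 1 than a parametric model does. Below is a minimal sketch of that comparison in Python; it is not the authors' TMLE workflow or their bootstrap diagnostic, and the simulated data, the choice of gradient boosting as the data-adaptive learner, and the 0.025 cutoff are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=(n, 3))
# strong confounding pushes some subjects toward near-zero or near-one
# treatment probabilities (a practical positivity violation)
p_true = 1 / (1 + np.exp(-(3 * x[:, 0] + 2 * x[:, 1])))
a = rng.binomial(1, p_true)

for name, model in [("parametric", LogisticRegression(max_iter=1000)),
                    ("data-adaptive", GradientBoostingClassifier(random_state=1))]:
    ps = model.fit(x, a).predict_proba(x)[:, 1]
    extreme = np.mean((ps < 0.025) | (ps > 0.975))   # share with extreme estimated PS
    print(f"{name}: PS range [{ps.min():.3f}, {ps.max():.3f}], "
          f"share with PS outside [0.025, 0.975] = {extreme:.3f}")
```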
Emulating a trial of joint dynamic strategies: An application to monitoring and treatment of HIV-positive individuals. Caniglia EC, Robins JM, Cain LE, Sabin C, Logan R, Abgrall S, Mugavero MJ, Hernández-Díaz S, Meyer L, Seng R, Drozd DR, Seage III GR, Bonnet F, Le Marec F, Moore RD, Reiss P, van Sighem A, Mathews WC, Jarrín I, Alejos B, Deeks SG, Muga R, Boswell SL, Ferrer E, Eron JJ, Gill J, Pacheco A, Grinsztejn B, Napravnik S, Jose S, Phillips A, Justice A, Tate J, Bucher HC, Egger M, Furrer H, Miro JM, Casabona J, Porter K, Touloumi G, Crane H, Costagliola D, Saag M, Hernán MA. Stat Med. 2019 Jun 15;38(13):2428-2446. doi: 10.1002/sim.8120. Epub 2019 Mar 18. Read More
Decisions about when to start or switch a therapy often depend on the frequency with which individuals are monitored or tested. For example, the optimal time to switch antiretroviral therapy depends on the frequency with which HIV-positive individuals have HIV RNA measured. This paper describes an approach to use observational data for the comparison of joint monitoring and treatment strategies and applies the method to a clinically relevant question in HIV research: when can monitoring frequency be decreased and when should individuals switch from a first-line treatment regimen to a new regimen? We outline the target trial that would compare the dynamic strategies of interest and then describe how to emulate it using data from HIV-positive individuals included in the HIV-CAUSAL Collaboration and the Centers for AIDS Research Network of Integrated Clinical Systems. When, as in our example, few individuals follow the dynamic strategies of interest over long periods of follow-up, we describe how to leverage an additional assumption: no direct effect of monitoring on the outcome of interest. We compare our results with and without the "no direct effect" assumption. We found little difference in survival and AIDS-free survival between strategies where monitoring frequency was decreased at a CD4 threshold of 350 cells/μl compared with 500 cells/μl and where treatment was switched at an HIV-RNA threshold of 1000 copies/ml compared with 200 copies/ml. The "no direct effect" assumption resulted in efficiency improvements for the risk difference estimates ranging from a 7- to 53-fold increase in the effective sample size.
Bayesian estimation of the average treatment effect on the treated using inverse weighting. Capistrano ESM, Moodie EEM, Schmidt AM. Stat Med. 2019 Jun 15;38(13):2447-2466. doi: 10.1002/sim.8121. Epub 2019 Mar 11. Read More
We develop a Bayesian approach to estimate the average treatment effect on the treated in the presence of confounding. The approach builds on developments proposed by Saarela et al in the context of marginal structural models, using importance sampling weights to adjust for confounding and estimate a causal effect. The Bayesian bootstrap is adopted to approximate posterior distributions of interest and avoid the issue of feedback that arises in Bayesian causal estimation relying on a joint likelihood. We present results from simulation studies to estimate the average treatment effect on the treated, evaluating the impact of sample size and the strength of confounding on estimation. We illustrate our approach using the classic Right Heart Catheterization data set and find a negative causal effect of the exposure on 30-day survival, in accordance with previous analyses of these data. We also apply our approach to the data set of the National Center for Health Statistics Birth Data and obtain a negative effect of maternal smoking during pregnancy on birth weight.
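As a rough illustration of the kind of procedure described (not the authors' code), a Bayesian-bootstrap estimate of the ATT can be sketched by repeatedly drawing Dirichlet weights over subjects, refitting a weighted propensity score model, and re-weighting controls by the odds of the propensity score. The simulated data and the choice of 500 posterior draws below are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))          # confounded exposure
y = 1.0 * a + x[:, 0] + rng.normal(size=n)               # true effect = 1.0

draws = []
for _ in range(500):
    w = rng.dirichlet(np.ones(n)) * n                     # Bayesian-bootstrap weights
    ps = (LogisticRegression(max_iter=1000)
          .fit(x, a, sample_weight=w).predict_proba(x)[:, 1])
    odds = ps / (1 - ps)                                  # re-weights controls toward the treated
    att = (np.average(y[a == 1], weights=w[a == 1])
           - np.average(y[a == 0], weights=w[a == 0] * odds[a == 0]))
    draws.append(att)

print(np.mean(draws), np.percentile(draws, [2.5, 97.5]))  # posterior mean and 95% interval
```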
A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. J Clin Epidemiol. 2019 Jun;110:12-22. doi: 10.1016/j.jclinepi.2019.02.004. Epub 2019 Feb 11. Read More
OBJECTIVES: The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature. STUDY DESIGN AND SETTING: We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes. RESULTS: We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML. CONCLUSION: We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.
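For readers unfamiliar with the pooling scale, a logit(AUC) difference of 0.00 corresponds to identical discrimination. A small illustration (not the review's analysis code) with hypothetical AUC values:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

auc_lr, auc_ml = 0.74, 0.78            # hypothetical AUCs from one LR-vs-ML comparison
delta = logit(auc_ml) - logit(auc_lr)  # > 0 favors ML, 0 means no difference
print(round(delta, 3))
```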
Evaluating the use of bootstrapping in cohort studies conducted with 1:1 propensity score matching-A plasmode simulation study. Desai RJ, Wyss R, Abdia Y, Toh S, Johnson M, Lee H, Karami S, Major JM, Nguyen M, Wang SV, Franklin JM, Gagne JJ. Pharmacoepidemiol Drug Saf. 2019 Jun;28(6):879-886. doi: 10.1002/pds.4784. Epub 2019 Apr 24. Read More
PURPOSE: Bootstrapping can account for uncertainty in propensity score (PS) estimation and matching processes in 1:1 PS-matched cohort studies. While theory suggests that the classical bootstrap can fail to produce proper coverage, practical impact of this theoretical limitation in settings typical to pharmacoepidemiology is not well studied. METHODS: In a plasmode-based simulation study, we compared performance of the standard parametric approach, which ignores uncertainty in PS estimation and matching, with two bootstrapping methods. The first method only accounted for uncertainty introduced during the matching process (the observation resampling approach). The second method accounted for uncertainty introduced during both PS estimation and matching processes (the PS reestimation approach). Variance was estimated based on percentile and empirical standard errors, and treatment effect estimation was based on median and mean of the estimated treatment effects across 1000 bootstrap resamples. Two treatment prevalence scenarios (5% and 29%) across two treatment effect scenarios (hazard ratio of 1.0 and 2.0) were evaluated in 500 simulated cohorts of 10,000 patients each. RESULTS: We observed that 95% confidence intervals from the bootstrapping approaches, but not the standard approach, resulted in inaccurate coverage rates (98%-100% for the observation resampling approach, 99%-100% for the PS reestimation approach, and 95%-96% for the standard approach). Treatment effect estimation based on bootstrapping approaches resulted in lower bias than the standard approach (less than 1.4% vs 4.1%) at 5% treatment prevalence; however, the performance was equivalent at 29% treatment prevalence. CONCLUSION: Use of bootstrapping led to variance overestimation and inconsistent coverage, while coverage remained more consistent with parametric estimation.
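A compressed sketch of the "PS reestimation" bootstrap described above is given below: resample subjects, refit the propensity score, rematch 1:1, and re-estimate the effect. Greedy nearest-neighbor matching, a binary outcome, and a risk difference are simplifications made for the illustration (the study used hazard ratios), and all data are simulated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_and_estimate(x, a, y):
    """Fit a PS, greedily 1:1 match without replacement, return a risk difference."""
    ps = LogisticRegression(max_iter=1000).fit(x, a).predict_proba(x)[:, 1]
    treated, controls = np.where(a == 1)[0], np.where(a == 0)[0]
    used, pairs = set(), []
    for i in treated:
        free = [j for j in controls if j not in used]
        if not free:
            break
        j = min(free, key=lambda k: abs(ps[i] - ps[k]))   # nearest available control
        used.add(j)
        pairs.append((i, j))
    t, c = zip(*pairs)
    return y[list(t)].mean() - y[list(c)].mean()

rng = np.random.default_rng(0)
n = 600
x = rng.normal(size=(n, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * a + x[:, 0]))))

estimates = []
for _ in range(100):
    idx = rng.integers(0, n, n)                 # resample subjects with replacement
    estimates.append(match_and_estimate(x[idx], a[idx], y[idx]))
print("bootstrap percentile CI:", np.percentile(estimates, [2.5, 97.5]))
```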
Evaluation of Socioeconomic Status Indicators for Confounding Adjustment in Observational Studies of Medication Use. Gopalakrishnan C, Gagne JJ, Sarpatwari A, Dejene SZ, Dutcher SK, Levin R, Franklin JM, Schneeweiss S, Desai RJ. Clin Pharmacol Ther. 2019 Jun;105(6):1513-1521. doi: 10.1002/cpt.1348. Epub 2019 Feb 25. Read More
Methodologic research evaluating confounding due to socioeconomic status (SES) in observational studies of medications is limited. We identified 7,109 patients who initiated brand or generic atorvastatin from Medicare claims (2011-2013) linked to electronic medical records and census data. We created a propensity score (PS) containing only claims-based covariates and augmented it with additional claims-based proxies for SES, ZIP code, and block group level SES. Cox models with PS fine-stratification and weighting were used to compare rates of a cardiovascular end point and emergency department visits. Adjustment with only claims-based variables substantially improved balance on all SES variables compared with the unadjusted analysis. Although inclusion of SES in PS models further improved balance on SES variables compared with models with claims-based covariates only, it did not materially change point estimates for either outcome. Inclusion of claims-based proxies may mitigate confounding by SES when aggregate-level SES information is unavailable.
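One common formulation of PS fine-stratification weighting, offered here as an illustration rather than the exact study implementation, cuts strata at percentiles of the PS among the exposed and re-weights the unexposed to the exposed stratum distribution:

```python
import numpy as np

def fine_strat_weights(ps, exposed, n_strata=50):
    """ATT-type fine-stratification weights: exposed get 1, unexposed are re-weighted."""
    cuts = np.percentile(ps[exposed == 1], np.linspace(0, 100, n_strata + 1))
    strata = np.clip(np.searchsorted(cuts, ps, side="right") - 1, 0, n_strata - 1)
    w = np.ones_like(ps, dtype=float)
    for s in range(n_strata):
        n1 = np.sum((strata == s) & (exposed == 1))
        n0 = np.sum((strata == s) & (exposed == 0))
        if n1 == 0 or n0 == 0:
            w[strata == s] = 0.0                # drop non-overlapping strata
            continue
        w[(strata == s) & (exposed == 0)] = (n1 / exposed.sum()) / (n0 / (exposed == 0).sum())
    return w

# quick check on simulated data: weighted controls resemble the exposed on the PS
rng = np.random.default_rng(0)
ps = rng.uniform(0.05, 0.95, size=1000)
exposed = rng.binomial(1, ps)
w = fine_strat_weights(ps, exposed)
print(round(np.average(ps[exposed == 0], weights=w[exposed == 0]), 3),
      round(ps[exposed == 1].mean(), 3))
```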
On adaptive propensity score truncation in causal inference. Ju C, Schwab J, van der Laan MJ. Stat Methods Med Res. 2019 Jun;28(6):1741-1760. doi: 10.1177/0962280218774817. Epub 2018 Jul 11. Read More
The positivity assumption, or the experimental treatment assignment (ETA) assumption, is important for identifiability in causal inference. Even if the positivity assumption holds, practical violations of this assumption may jeopardize the finite sample performance of the causal estimator. One of the consequences of practical violations of the positivity assumption is extreme values in the estimated propensity score (PS). A common practice to address this issue is truncating the PS estimate when constructing PS-based estimators. In this study, we propose a novel adaptive truncation method, Positivity-C-TMLE, based on the collaborative targeted maximum likelihood estimation (C-TMLE) methodology. We demonstrate the outstanding performance of our novel approach in a variety of simulations by comparing it with other commonly studied estimators. Results show that by adaptively truncating the estimated PS with a more targeted objective function, the Positivity-C-TMLE estimator achieves the best performance for both point estimation and confidence interval coverage among all estimators considered.
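For context, fixed (non-adaptive) truncation in an inverse-probability-weighted estimator looks like the sketch below; the paper's contribution, Positivity-C-TMLE, instead selects the truncation level with a collaborative, targeted criterion, which is not reproduced here.

```python
import numpy as np

def iptw_ate(y, a, ps, bound=0.025):
    """Hajek-style IPTW ATE with the estimated PS truncated at [bound, 1 - bound]."""
    ps_t = np.clip(ps, bound, 1 - bound)          # truncate extreme PS values
    w = a / ps_t + (1 - a) / (1 - ps_t)           # inverse-probability weights
    return np.average(y, weights=a * w) - np.average(y, weights=(1 - a) * w)
```

In practice the bound (0.01, 0.025, and so on) is usually fixed in advance; the abstract's point is that this choice can instead be made adaptively with a more targeted objective function.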
Evaluating classification accuracy for modern learning approaches. Li J, Gao M, D'Agostino R. Stat Med. 2019 Jun 15;38(13):2477-2503. doi: 10.1002/sim.8103. Epub 2019 Jan 30. Read More
Deep learning neural network models such as multilayer perceptron (MLP) and convolutional neural network (CNN) are novel and attractive artificial intelligence computing tools. However, evaluation of the performance of these methods is not readily available for practitioners yet. We provide a tutorial for evaluating classification accuracy for various state-of-the-art learning approaches, including familiar shallow and deep learning methods. For qualitative response variables with more than two categories, many traditional accuracy measures such as sensitivity, specificity, and area under the receiver operating characteristic curve are not applicable and we have to consider their extensions properly. In this paper, a few important statistical concepts for multicategory classification accuracy are reviewed and their utilities for various learning algorithms are demonstrated with real medical examples. We offer problem-based R code to illustrate how to perform these statistical computations step by step. We expect that such analysis tools will become more familiar to practitioners and receive broader applications in biostatistics.
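As a small illustration (in Python rather than the paper's problem-based R code) of how two-class measures extend to the multicategory case, the sketch below computes per-class sensitivity and specificity from a confusion matrix and a macro-averaged one-vs-rest AUC on a standard example dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                      # three-category outcome
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)

cm = confusion_matrix(yte, clf.predict(Xte))
for k in range(cm.shape[0]):
    tp = cm[k, k]
    fn = cm[k].sum() - tp
    fp = cm[:, k].sum() - tp
    tn = cm.sum() - tp - fn - fp
    print(f"class {k}: sensitivity={tp/(tp+fn):.2f}, specificity={tn/(tn+fp):.2f}")

# macro-averaged one-vs-rest AUC as a multicategory extension of the usual AUC
print("macro OvR AUC:", roc_auc_score(yte, clf.predict_proba(Xte), multi_class="ovr"))
```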
A Machine-Learning Algorithm to Optimise Automated Adverse Drug Reaction Detection from Clinical Coding. McMaster C, Liew D, Keith C, Aminian P, Frauman A. Drug Saf. 2019 Jun;42(6):721-725. doi: 10.1007/s40264-018-00794-y. Read More
INTRODUCTION: Adverse drug reaction (ADR) detection in hospitals is heavily reliant on spontaneous reporting by clinical staff, with studies in the literature pointing to high rates of underreporting. International Classification of Diseases, 10th Revision (ICD-10) codes have been used in epidemiological studies of ADRs and offer the potential for automated ADR detection systems. OBJECTIVE: The aim of this study was to develop an automated ADR detection system based on ICD-10 codes, using machine-learning algorithms to improve accuracy and efficiency. METHODS: For a 12-month period from December 2016 to November 2017, every inpatient episode receiving an ICD-10 code in the range Y40.0-Y59.9 (ADR code) was flagged for review as a potential ADR. Each flagged admission was assessed by an expert pharmacist and, if needed, reviewed at regular ADR committee meetings. For each report, a determination was made about ADR probability and severity. The dataset was randomly split into training and test sets. A machine-learning model using the random forest algorithm was developed on the training set to discriminate between true and false ADR reports. The model was then applied to the test set to assess accuracy using the area under the receiver operating characteristic curve (AUC). RESULTS: In the study period, 2917 Y40.0-Y59.9 codes were applied to admissions, resulting in 245 ADR reports after review. These 245 reports accounted for 44.5% of all ADR reporting in our hospital in the study period. A random forest model built on the training set was able to discriminate between true and false reports on the test set with an AUC of 0.803. CONCLUSIONS: Automated ADR detection using ICD-10 coding significantly improved ADR detection in the study period, with improved discrimination between true and false reports by applying a machine-learning model.
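The modeling step the abstract describes follows a standard supervised-learning pattern. The sketch below is not the study's code: the features are synthetic placeholders standing in for coded admission data, the class balance is an assumption, and only the train/test split, random forest, and AUC evaluation mirror the described workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for ICD-10-flagged admissions; labels mimic pharmacist
# adjudications of true versus false ADR reports (class balance assumed)
X, y = make_classification(n_samples=2917, n_features=20, weights=[0.92, 0.08],
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(Xtr, ytr)
print("test AUC:", roc_auc_score(yte, rf.predict_proba(Xte)[:, 1]))
```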
Composite interaction tree for simultaneous learning of optimal individualized treatment rules and subgroups. Qiu X, Wang Y. Stat Med. 2019 Jun 30;38(14):2632-2651. doi: 10.1002/sim.8105. Epub 2019 Mar 19. Read More
Treatment response heterogeneity has long been observed in patients affected by chronic diseases. Administering an individualized treatment rule (ITR) offers an opportunity to tailor treatment strategies according to patient-specific characteristics. Overly complex machine learning methods for estimating ITRs may produce treatment rules that have higher benefit but lack transparency and interpretability. In clinical practice, it is desirable to derive a simple and interpretable ITR while maintaining certain optimality that leads to improved benefit in subgroups of patients, if not in the overall sample. In this work, we propose a tree-based robust learning method to estimate optimal piecewise linear ITRs and identify subgroups of patients with a large benefit. We achieve these goals by simultaneously identifying qualitative and quantitative interactions through a tree model, referred to as the composite interaction tree (CITree). We show that it has improved performance compared to existing methods on both the overall sample and subgroups via extensive simulation studies. Lastly, we fit CITree to the Research Evaluating the Value of Augmenting Medication with Psychotherapy trial for treating patients with major depressive disorder, where we identified both qualitative and quantitative interactions and subgroups of patients with a large benefit.
An Implementation and Visualization of the Tree-Based Scan Statistic for Safety Event Monitoring in Longitudinal Electronic Health Data. Schachterle SE, Hurley S, Liu Q, Petronis KR, Bate A. Drug Saf. 2019 Jun;42(6):727-741. doi: 10.1007/s40264-018-00784-0. Read More
INTRODUCTION: Longitudinal electronic healthcare data hold great potential for drug safety surveillance. The tree-based scan statistic (TBSS), as implemented by the TreeScan® software, allows for hypothesis-free signal detection in longitudinal data by grouping safety events according to branching, hierarchical data coding systems, and then identifying signals of disproportionate recording (SDRs) among individual events or event groups. OBJECTIVE: The objective of this analysis was to identify and visualize SDRs with the TBSS in historical data from patients using two antifungal drugs, itraconazole or terbinafine. By examining patients who used either itraconazole or terbinafine, we provide a conceptual replication of a previous TBSS analysis by varying methodological choices and using a data source that had not been previously used with the TBSS, i.e., the Optum Clinformatics™ claims database. With this analysis, we aimed to test a parsimonious design that could be the basis of a broadly applicable method for multiple drug and safety event pairs. METHODS: The TBSS analysis was used to examine incident events and any itraconazole or terbinafine use among US-based patients from 2002 through 2007. Event frequencies before and after the first day of drug exposure were compared over 14- and 56-day periods of observation in a Bernoulli model with a self-controlled design. Safety events were classified into a hierarchical tree structure using the Clinical Classifications Software (CCS), which mapped International Classification of Diseases, 9th Revision (ICD-9) codes to 879 diagnostic groups. The TBSS compared the log likelihood ratios of observed versus expected events in all groups along the CCS hierarchy, and groups of events that occurred at disproportionately high frequencies were identified as potential SDRs; p-values for the potential SDRs were estimated with Monte Carlo permutation-based methods. Output from TreeScan® was visualized and plotted as a network which followed the CCS tree structure. RESULTS: Terbinafine use (n = 223,968) was associated with SDRs for diseases of the circulatory system (14- and 56-day p = 0.001) and heart (14-day p = 0.026 and 56-day p = 0.001) as well as coronary atherosclerosis and other heart disease (14-day p = 0.003 and 56-day p = 0.004). For itraconazole use (n = 36,025), the TBSS identified SDRs for coronary atherosclerosis and other heart disease (p = 0.002) and complications of an implanted or grafted device (14-day p = 0.001 and 56-day p < 0.05). Use of both drugs was associated with SDRs for diseases of the digestive system at 14 days (p < 0.05), and this SDR had been observed among terbinafine users in a previous TBSS analysis with a different data source. The TreeScan® visualization facilitated the identification of the atherosclerosis and other heart disease SDRs as well as highlighting the consistency of the SDR for diseases of the digestive system across drugs and data sources. CONCLUSION: With the TBSS, we identified potential SDRs related to the circulatory system that may reflect the cardiac risk that was described in the itraconazole product label. SDRs for diseases of the digestive system among terbinafine users were also reported in a previous signal detection analysis, although other SDRs from the previous publications were not replicated. The TBSS visualizations aided in the understanding and interpretation of the TBSS output, including the comparisons to the previous publications.
In this conceptual replication, differences between the results of our analysis and those of the previous analyses could be attributable to variation in modeling and design choices as well as factors that were intrinsic to the underlying data sources. The broad consistency, but far from perfect concordance, of our results with the known safety profile of these antifungals, including the risks from the itraconazole product label, supports the rationale for continued investigations of signal detection methods across differing data sources and populations.
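At each node of the diagnosis tree, the Bernoulli model compares the number of events in the post-exposure window with the number expected under the null. A minimal sketch of that node-level log-likelihood-ratio calculation follows; equal pre- and post-exposure windows imply a null probability of 0.5, and the Monte Carlo permutation step that TreeScan® uses to assign p-values is not reproduced here.

```python
import math

def bernoulli_llr(c, n, p=0.5):
    """c = events in the post-exposure window, n = total events at the node,
    p = null probability that an event falls in the post-exposure window."""
    if n == 0 or c / n <= p:
        return 0.0                         # only excesses in the risk window count
    llr = c * math.log(c / (n * p))
    if c < n:
        llr += (n - c) * math.log((n - c) / (n * (1 - p)))
    return llr

print(bernoulli_llr(c=30, n=40))           # a node with 30 of 40 events post-exposure
```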
A Bayesian nonparametric causal inference model for synthesizing randomized clinical trial and real-world evidence. Wang C, Rosner GL. Stat Med. 2019 Jun 30;38(14):2573-2588. doi: 10.1002/sim.8134. Epub 2019 Mar 18. Read More
With the wide availability of various real-world data (RWD), there is an increasing interest in synthesizing information from both randomized clinical trials and RWD for health-care decision making. The task of addressing study-specific heterogeneities is one of the most difficult challenges in synthesizing data from disparate sources. Bayesian hierarchical models with nonparametric extension provide a powerful and convenient platform that formalizes the borrowing of strength across the sources. In this paper, we propose a propensity score-based Bayesian nonparametric Dirichlet process mixture model that summarizes subject-level information from randomized and registry studies to draw inference on the causal treatment effect. Simulation studies are conducted to evaluate the model performance under different scenarios. In addition, we demonstrate the proposed method using data from a clinical study on an angiotensin-converting enzyme inhibitor for treating congestive heart failure.