This three-year (2022-2025) Medical Research Council project [grant number MR/W021021/1] will improve the robustness of the outputs in public health research.
This will be achieved by increasing the adoption of well-established causal inference methods that draw on recent developments in statistics, machine learning and data science, such as targeted maximum likelihood estimation (TMLE), a data-adaptive estimation algorithm that uses machine learning techniques.
TMLE is a semiparametric, doubly robust, efficient substitution estimator that allows data-adaptive estimation while retaining valid statistical inference. Being doubly robust, it remains consistent if either the outcome model or the treatment (propensity score) model is correctly specified. TMLE also accommodates machine learning algorithms, which reduce the risk of model misspecification, a problem that persists for competing estimators.
Nonetheless, TMLE rests on relatively complex statistical and mathematical concepts that need to be demystified for wider adoption. We will therefore further develop the user-friendly Stata command eltmle (https://github.com/migariane/eltmle) to broaden the use, among applied and public health researchers, of advanced causal inference techniques that blend data-adaptive prediction, robust estimation, and statistical inference.
By 2025, we aim to provide applied researchers with:
- Tutorials designed to demystify complex mathematical and statistical concepts used in the latest developments of targeted machine learning estimation.
- An accessible yet detailed article in the Stata Journal, together with online tutorials and empirical applications illustrating the use of eltmle.
- Demonstrations of the statistical properties of TMLE, such as double robustness and efficiency, in simulated scenarios.
- Real-life applications of the eltmle command (1) to estimate how the type of working environment causally affects cancer incidence and mortality, and (2) to evaluate the causal effect of the type of colon cancer surgery (laparoscopy vs. open) on 30-day mortality.
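
To make the double robustness of TMLE concrete, the following is a minimal sketch of the kind of simulated scenario referred to above, written in Python with NumPy only. It is not the eltmle implementation: the function names (`fit_logistic`, `tmle_ate`), the data-generating process, and the truncation bounds on the propensity score are all assumptions chosen for illustration, and plain logistic regressions stand in for the machine learning algorithms. The outcome model deliberately omits the confounder, while the treatment model is correctly specified, so the targeting step is what removes the confounding bias.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=25):
    """Logistic regression fitted by Newton-Raphson, with an optional fixed offset."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def tmle_ate(y, a, w):
    """Hypothetical helper: TMLE of the average treatment effect, binary y and a."""
    n = len(y)
    ones = np.ones(n)

    # Step 1: initial outcome model, deliberately misspecified (it omits w).
    bq = fit_logistic(np.column_stack([ones, a]), y)
    q1 = np.full(n, expit(bq[0] + bq[1]))   # predicted outcome under a = 1
    q0 = np.full(n, expit(bq[0]))           # predicted outcome under a = 0
    naive = np.mean(q1 - q0)                # untargeted plug-in estimate

    # Step 2: propensity score model, correctly specified here.
    xg = np.column_stack([ones, w])
    g = np.clip(expit(xg @ fit_logistic(xg, a)), 0.01, 0.99)  # assumed bounds

    # Step 3: targeting step. Regress y on the clever covariate,
    # using the initial predictions (on the logit scale) as an offset.
    h1 = 1.0 / g
    h0 = -1.0 / (1.0 - g)
    ha = np.where(a == 1, h1, h0)
    qa = np.where(a == 1, q1, q0)
    eps = fit_logistic(ha[:, None], y, offset=logit(qa))[0]

    # Step 4: update the predictions and plug them into the ATE formula.
    q1s = expit(logit(q1) + eps * h1)
    q0s = expit(logit(q0) + eps * h0)
    ate = np.mean(q1s - q0s)

    # Approximate standard error from the estimated influence curve.
    qas = np.where(a == 1, q1s, q0s)
    ic = ha * (y - qas) + (q1s - q0s) - ate
    se = np.sqrt(np.var(ic, ddof=1) / n)
    return ate, se, naive

# Simulated data: w confounds the effect of treatment a on outcome y.
rng = np.random.default_rng(2022)
n = 10_000
w = rng.normal(size=n)
a = rng.binomial(1, expit(0.8 * w))
y = rng.binomial(1, expit(-0.5 + a + w))

# Average treatment effect implied by the simulation, averaged over the sampled w.
truth = np.mean(expit(0.5 + w) - expit(-0.5 + w))

ate, se, naive = tmle_ate(y, a, w)
print(f"truth={truth:.3f}  naive={naive:.3f}  tmle={ate:.3f}  (se={se:.3f})")
```

In this setup the untargeted estimate is a plain difference in means and inherits the confounding by `w`, whereas the targeted estimate should land close to the simulated truth. Swapping the roles (a correct outcome model with a misspecified treatment model) would illustrate the other half of the double robustness property.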