Variable selection for confounder control, flexible modeling and Collaborative Targeted Minimum Loss-based Estimation in causal inference
Mireille E. Schnitzer, Judith J. Lok and Susan Gruber
International Journal of Biostatistics, May 2016
Presented by
Rajesh Majumder
1st August 2023
What is causal inference, and what is a confounder?
Causal inference
In simple words, it is the study of cause-and-effect relationships.
Those who practice causal inference ask questions such as: does X cause Y? What are the effects of changing X on Y?
Examples:
Effect of treatment on a disease
Effect of climate change policy on emissions
Effect of social media on mental health
Effect of Telecom Tower on the extinction of sparrows
Confounder
A confounder is a variable that is causally
associated with both the exposure and the outcome, and
is not on the causal pathway between X and Y.
An unmeasured common cause can also be a source of
confounding of the X→Y relationship.
Example: Effect of sleeping with shoes on waking up with
a headache.
Causal structure, where drinking
the night before is a common cause of
sleeping with shoes on and of waking up
with a headache, i.e., a confounder.
Reason for variable selection in Causal inference
Unbiased (UE) and consistent estimation of the parameter of interest (the marginal treatment-specific mean) requires full or partial knowledge of the causal structure: we must preselect all potential confounders and avoid post-treatment variables.
This leads to a large set of potential pre-treatment variables (potential confounders)
=> (i) inflated variance, or (ii) inability to fit the propensity score model due to the curse of dimensionality.
Variable selection is required in this scenario to fit the propensity score model and to obtain an unbiased and consistent estimator of the parameter of interest (using, e.g., IPTW, or double robust estimation such as TMLE and C-TMLE).
What is Double-robust Estimator ?
Double robust estimators (Rotnitzky and Robins, 1995a; Robins et al., 1995)
are a class of methods that require fitting both the propensity score
model and a model for the expectation of the outcome conditional on
treatment and covariates.
These methods are called “double robust” because if either of these
two models is correctly specified, the estimator will be consistent for
the parameter of interest.
Examples: A-IPTW (augmented IPTW), TMLE, C-TMLE, etc.
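As a quick illustration of the double robustness property, the sketch below (Python with simulated data; the data-generating values and function names are my own, not from the paper) uses the augmented IPTW estimator, one member of this class. Misspecifying either the outcome model or the propensity score, one at a time, still recovers the true ATE of 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated data: X confounds the A -> Y relationship; true ATE = 2.
X = rng.normal(size=n)
pA = 1 / (1 + np.exp(-0.8 * X))            # true propensity score P(A=1|X)
A = rng.binomial(1, pA)
Y = 2 * A + 1.5 * X + rng.normal(size=n)

def aiptw(Y, A, g, Q1, Q0):
    """Augmented IPTW: a double robust estimator of the ATE."""
    psi1 = np.mean(A / g * (Y - Q1) + Q1)
    psi0 = np.mean((1 - A) / (1 - g) * (Y - Q0) + Q0)
    return psi1 - psi0

# (i) correct outcome model, badly misspecified (constant) propensity score
est1 = aiptw(Y, A, g=np.full(n, 0.5), Q1=2 + 1.5 * X, Q0=1.5 * X)
# (ii) correct propensity score, badly misspecified (constant) outcome model
est2 = aiptw(Y, A, g=pA, Q1=np.full(n, Y.mean()), Q0=np.full(n, Y.mean()))

print(round(est1, 2), round(est2, 2))  # both close to the true ATE of 2
```

Only when both working models are wrong does the estimator lose its consistency.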
Terminologies & Properties for Causal Inferences
Let $O_i = (X_i, A_i, Y_i)$, $i = 1, \dots, n$, be i.i.d. samples,
where
A ∈ {0,1} => indicator variable for the treatment,
X => vector of variables that might confound the A→Y relationship,
Y => outcome of interest.
Note that, in the Neyman–Rubin counterfactual framework (Rubin, 1974):
$Y^1$ => potential outcome under treatment,
$Y^0$ => potential outcome under no treatment.
Then, the causal effect would be: $Y^1 - Y^0$.
Fundamental problem of causal inference: only one of $Y^1$ and $Y^0$ can ever be observed for each unit.
Terminologies & Properties for Causal Inferences
Individual treatment effect (ITE): $Y_i^1 - Y_i^0$, for unit $i$.
Average treatment effect (ATE): $E[Y^1 - Y^0] = E[Y^1] - E[Y^0]$.
Average treatment effect $=$ associational difference $E[Y \mid A=1] - E[Y \mid A=0]$ (may/may not hold).
So, to estimate this ATE from a statistical model, the following criteria need to be satisfied:
Ignorability & Exchangeability
Identifiability
Conditional Exchangeability / Unconfoundedness
Positivity
No Interference
Consistency
Ignorability: $(Y^1, Y^0) \perp A$, so that $E[Y^1 - Y^0] = E[Y \mid A=1] - E[Y \mid A=0]$.
Allows us to reduce the ATE to the associational difference.
Exchangeability: $E[Y^a \mid A=1] = E[Y^a \mid A=0]$, $a \in \{0, 1\}$.
It means that if the treatment groups were swapped, the new treatment group would
observe the same outcomes as the old treatment group, and the new control group
would observe the same outcomes as the old control group.
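A small simulation can make the gap between the ATE and the associational difference concrete. In this sketch (illustrative Python; all data-generating values are my own), a common cause X makes treated units systematically different from controls, so the naive difference of observed means overstates the true effect of 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# X is a common cause of A and Y (a confounder).
X = rng.binomial(1, 0.5, size=n)
A = rng.binomial(1, 0.2 + 0.6 * X)      # X strongly raises P(treatment)
Y0 = 3.0 * X + rng.normal(size=n)       # potential outcome under A=0
Y1 = Y0 + 1.0                           # constant individual effect of 1
Y = np.where(A == 1, Y1, Y0)            # observed outcome (consistency)

ate = np.mean(Y1 - Y0)                  # true ATE = 1 (uses both potentials)
assoc = Y[A == 1].mean() - Y[A == 0].mean()

print(round(ate, 3), round(assoc, 3))
# assoc is biased upward: treated units have higher X, hence higher Y,
# even before any treatment effect.
```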
Terminologies & Properties for Causal Inferences
[Figure: causal structure of confounding of the effect of A on Y; and the causal structure when the treatment assignment mechanism is ignorable (an RCT).]
Identifiability: a causal quantity (e.g., $E[Y^a]$, $a \in \{0,1\}$) is identifiable
if we can compute it from a purely statistical quantity (e.g., $E[Y \mid A, X]$).
Conditional Exchangeability / Unconfoundedness: $(Y^1, Y^0) \perp A \mid X$,
i.e., the conditional ATE is
$E[Y^1 - Y^0 \mid X] = E[Y \mid A=1, X] - E[Y \mid A=0, X],$
i.e., there is no confounding within levels of X, because controlling for X
has made the treatment groups comparable.
So, the marginal ATE:
$E[Y^1 - Y^0] = E_X\left[ E[Y \mid A=1, X] - E[Y \mid A=0, X] \right].$
Terminologies & Properties for Causal Inferences
[Figure: causal structure of confounding of the effect of A on Y; illustration of conditioning on X leading to no confounding.]
Positivity: the condition that all subgroups of the data with different covariate values have
some probability of receiving any value of treatment:
$0 < P(A=1 \mid X=x) < 1$ for all values $x$ of the covariates such that $P(X=x) > 0$.
A positivity violation implies a zero-probability event: $P(A=1 \mid X=x) = 0$ or
$P(A=0 \mid X=x) = 0$ for some value $x$ with $P(X=x) > 0$.
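Practical (partial) positivity violations can be diagnosed by inspecting the range of estimated propensity scores. The sketch below (simulated data; coefficients chosen purely for illustration) shows how adding a strong predictor of treatment pushes propensities toward 0 and 1, which is exactly what produces extreme IPTW weights.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

X = rng.normal(size=n)
W = rng.binomial(1, 0.5, size=n)               # strong predictor of A only

g_X = 1 / (1 + np.exp(-0.5 * X))               # propensity given X alone
g_XW = 1 / (1 + np.exp(-(0.5 * X + 4.0 * (2 * W - 1))))  # adding W

for name, g in [("X only ", g_X), ("X and W", g_XW)]:
    print(name, round(g.min(), 4), round(g.max(), 4))
# Adding W drives propensities toward 0 and 1: a practical positivity
# violation, which translates into extreme IPTW weights 1/g.
```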
Terminologies & Properties for Causal Inferences
No interference: my outcome is unaffected by anyone else's treatment; rather, my
outcome is only a function of my own treatment:
$Y_i(a_1, \dots, a_n) = Y_i(a_i).$
Consistency: if the treatment is $A = a$, then the observed outcome is the potential outcome
under treatment $a$. Formally, $A = a \Rightarrow Y = Y^a$.
We could write this equivalently as: $Y = A Y^1 + (1 - A) Y^0$.
Assumptions and Motivation for variable reduction
Assumptions
1. The initial covariate set X is sufficient to control confounding.
2. Ignorability holds on the full set of variables as well as on any sufficient subset $X_s \subseteq X$, where any superset of $X_s$ is also sufficient to control confounding bias.
3. The positivity assumption holds, but partial (practical) positivity violations may occur.
4. There will be no pre-treatment collider bias / M-bias.
Motivation for variable reduction
1. Inability to fit the propensity score model due to the “curse of dimensionality”.
2. Variance inflation caused by the inclusion of instruments or weak confounders that
strongly predict treatment, and positivity violations.
[Figure: M-bias causal structure.]
Variance inflation for the inclusion of an instrumental variable
Let the observed data be $O = (W, A, Y)$, and suppose we are interested in estimating $\psi_0 = E(Y^1)$.
Let $g_0 = P(A=1)$, $g_0(W) = P(A=1 \mid W)$, and $\sigma^2 = \mathrm{Var}(Y^1)$.
Variance of IPTW: $\sigma^2 / (n g_0)$ (under ignorability without conditioning on $W$).
Now, suppose $W$ is an instrument: it influences $A$, but not $Y^1$.
Then, variance of IPTW conditioning on $W$: $(\sigma^2 / n) \, E[1 / g_0(W)]$.
So, the large-sample variance inflation is:
$\mathrm{VI} = g_0 \, E[1/g_0(W)] = E[g_0(W)] \, E[1/g_0(W)] \ge 1.$
Variance inflation for the inclusion of an instrumental variable
Large-sample variance inflation (VI) from including a binary instrument $W$
in the IPTW model, (a) when varying the probabilities of treatment $g_0(0)$ and $g_0(1)$
while holding $P(W=1)$ fixed, and (b) when varying the ratio of treatment probabilities
$g_0(1)/g_0(0)$ and the probability $P(W=1)$ of having the instrument characteristic.
Targeted Minimum Loss-based Estimation (TMLE)
TMLE (proposed by van der Laan and Rubin, 2006)
produces semiparametric efficient and double
robust plug-in estimators, by updating an initial
fit in the direction implied by the efficient
influence function.
Let the true conditional mean outcome model be
$\bar{Q}_0(A, X) = E(Y \mid A, X)$ (e.g., a simple regression model),
which yields the plug-in estimator of
$\psi_0 = E[\bar{Q}_0(1, X)].$
Let the true propensity score model be
$g_0(X) = P(A = 1 \mid X).$
TMLE Steps
1. Start with initial estimates $\bar{Q}_n^0$ and $g_n$.
2. Update the initial estimate $\bar{Q}_n^0$, using the efficient
influence function $D(\bar{Q}, g)$, to get $\bar{Q}_n^1$.
3. Then $\psi_n = \frac{1}{n}\sum_{i=1}^n \bar{Q}_n^1(1, X_i)$
is the TMLE estimate.
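The three steps can be sketched end-to-end for the target $\psi_0 = E(Y^1)$ with a binary outcome. This is a minimal illustration, not the paper's implementation (in practice one would use the tmle R package): the initial outcome fit is deliberately crude (a constant), the true propensity score is plugged in, and the targeting step fits the fluctuation parameter $\varepsilon$ along the clever covariate $H(A,X) = A/g(X)$ by Newton iterations on its score.

```python
import numpy as np

def expit(z):
    return 1 / (1 + np.exp(-z))

def logit(p):
    return np.log(p / (1 - p))

rng = np.random.default_rng(4)
n = 50_000

# Simulated data with one confounder X; binary outcome.
X = rng.normal(size=n)
g = expit(X)                                # true propensity P(A=1|X)
A = rng.binomial(1, g)
Y = rng.binomial(1, expit(0.5 * A + X))     # true E[Y|A,X] = expit(0.5A + X)

# Step 1: initial estimates -- a deliberately crude constant outcome
# fit, and the (here known) propensity score.
Q1 = np.full(n, Y[A == 1].mean())           # initial guess of E[Y|A=1,X]
QA = np.where(A == 1, Q1, Y[A == 0].mean())

# Step 2: targeting -- fluctuate Q along the clever covariate
# H(A,X) = A/g(X), solving the score equation for epsilon by Newton.
H = A / g
eps = 0.0
for _ in range(25):
    Qeps = expit(logit(np.clip(QA, 1e-6, 1 - 1e-6)) + eps * H)
    score = np.mean(H * (Y - Qeps))
    hess = -np.mean(H**2 * Qeps * (1 - Qeps))
    eps -= score / hess

# Step 3: plug-in estimate, evaluating the updated fit at A=1 (H = 1/g).
psi = expit(logit(np.clip(Q1, 1e-6, 1 - 1e-6)) + eps / g).mean()

psi0 = expit(0.5 + rng.normal(size=1_000_000)).mean()  # Monte Carlo truth
print(round(psi, 3), round(psi0, 3))  # psi should be close to psi0
```

Because the propensity score is correct, double robustness makes the estimate consistent despite the badly misspecified initial outcome fit.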
TMLE Example
Taken from the paper, page no.: 11
The propensity score model need only condition on the error of a misspecified outcome model.
Let $\bar{Q}$ be the correctly specified outcome model with estimate $\bar{Q}_n$, and let $\bar{Q}'$ be a
misspecified model with estimate $\bar{Q}_n'$ (both estimates convergent).
Then, a double robust estimator using $\bar{Q}_n'$ is unbiased for $\psi_0$ iff the propensity
score model conditions on the covariates that explain the residual confounding $\bar{Q}_0 - \bar{Q}'$.
This has interesting consequences for variable selection.
Let $X_s \subseteq X$ be a minimal sufficient confounding set, and suppose the initial outcome model
already adjusts for part of $X_s$.
Then, the double robust estimator (TMLE) with initial model estimate $\bar{Q}_n'$ and a propensity
score $g_n$ conditioning on the remaining covariates of $X_s$ will
produce a consistent estimate of the target parameter $\psi_0$.
Collaborative double robust adjustment
Collaborative double robust adjustment: Example
Collaborative Targeted Maximum Likelihood Estimation
(C-TMLE)
C-TMLE (van der Laan and Gruber, 2010) is based on two principles:
1. Variable selection.
2. Selecting an optimal estimator from a convergent sequence of candidate estimators.
C-TMLE starts with an initial “current” estimate of the conditional expectation
of the outcome and, given an estimate of the propensity score, the TMLE
step modifies it and produces an updated estimate of the conditional
expectation of the outcome.
The goodness-of-fit of this updated estimate is assessed through a chosen loss
function $L(\bar{Q})$.
C-TMLE Procedure
1. Start with the initial estimate $\bar{Q}_n^{0}(A, X)$, and set the initial propensity score model
$g_n^{(0)}$ to the intercept-only model. A TMLE update on $(\bar{Q}_n^{0}, g_n^{(0)})$
produces the first candidate $\bar{Q}_n^{*,0}$.
2. Add a new variable into the propensity score model $g_n^{(k)}$ using a forward
variable selection step, to get a new $g_n^{(k+1)}$; then a TMLE update on
$(\bar{Q}_n^{0}, g_n^{(k+1)})$ produces a new candidate $\bar{Q}_n^{*,k+1}$.
3. If $L(\bar{Q}_n^{*,k+1}) \le L(\bar{Q}_n^{*,k})$, keep $\bar{Q}_n^{*,k+1}$ as the next candidate
and continue the forward selection.
If $L(\bar{Q}_n^{*,k+1}) > L(\bar{Q}_n^{*,k})$, set the current updated fit as the new initial
estimate, and again add a new variable into the propensity score model by forward
selection to produce the next candidate.
4. This creates a list of candidate estimates $\bar{Q}_n^{*,k}$, $k = 0, 1, \dots, K$, where $K$
is the size of the variable set. The C-TMLE estimate is the candidate $\bar{Q}_n^{*,k}$ with the
lowest cross-validated estimate of the loss.
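The procedure can be sketched structurally on toy data (all names and data-generating values are mine; a real analysis would use V-fold cross-validation and the R implementations). A crude intercept-only outcome fit is targeted against a sequence of growing propensity score models, and each candidate is scored by validation loss:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Toy data: W0 is a confounder, W1 a strong instrument, W2 pure noise.
W = rng.binomial(1, 0.5, size=(n, 3))
g_true = 1 / (1 + np.exp(-(-1.0 + 1.5 * W[:, 0] + 2.0 * W[:, 1])))
A = rng.binomial(1, g_true)
Y = 1.0 * A + 2.0 * W[:, 0] + rng.normal(size=n)

train = rng.random(n) < 0.7          # single split stands in for V-fold CV

def fit_g(S):
    """Empirical P(A=1 | W_S) by cell means (all covariates binary)."""
    cells = W[:, S] @ (2 ** np.arange(len(S)))
    g = np.empty(n)
    for c in range(2 ** len(S)):
        g[cells == c] = A[(cells == c) & train].mean()
    return np.clip(g, 0.01, 0.99)

def tmle_update(Q, g):
    """Least-squares fluctuation Q + eps*H with H = A/g (continuous Y)."""
    H = A / g
    eps = np.mean(H[train] * (Y[train] - Q[train])) / np.mean(H[train] ** 2)
    return Q + eps * H, eps

Q0 = np.full(n, Y[train].mean())     # deliberately crude initial outcome fit

# Greedy forward selection: grow the propensity model one covariate at a
# time, keep the addition that best fits the training data, and score
# each resulting TMLE candidate by its validation loss.
S, remaining, candidates = [], [0, 1, 2], []
while remaining:
    trials = []
    for j in remaining:
        g = fit_g(S + [j])
        Qs, eps = tmle_update(Q0, g)
        trials.append((np.mean((Y[train] - Qs[train]) ** 2), j, eps))
    _, j_best, eps = min(trials)
    S = S + [j_best]
    remaining.remove(j_best)
    g = fit_g(S)
    loss_val = np.mean((Y[~train] - (Q0 + eps * A / g)[~train]) ** 2)
    psi = np.mean(Q0 + eps / g)      # plug-in E[Y^1]: updated Q at A=1
    candidates.append((loss_val, list(S), psi))

# The C-TMLE estimate is the candidate with the lowest validation loss.
loss_star, S_star, psi_star = min(candidates)
print(S_star, round(psi_star, 2))
```

This omits the paper's refinements (the "keep or re-initialize" rule, V-fold selection of the stopping point), but shows the collaborative idea: the propensity score model is built only as far as it improves the fit of the targeted outcome estimate.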
Simulation Study
Estimation in the presence of strong instruments
Data generated with:
1. 5 uncorrelated baseline variables from U(0,3),
which are confounders.
2. 2 instruments from a multivariate normal
distribution, which are strong predictors of
treatment.
3. 2 risk factors from U(0,3) that only influence
the outcome.
4. 1 treatment variable, generated by fitting a
logistic regression on the confounders &
instruments.
5. 1 outcome variable, linear in the confounders,
instruments & risk factors, and normally
distributed.
True value of the ATE is
Simulation Result
P(A=1| Confounders)= 0.2
P(A=0| Confounders)= 0.06
P(A=1| Confounders & Instruments)= 0.004
P(A=0| Confounders & Instruments)= 0.001
Variance estimation and convergence
Estimation in a high-dimensional covariate space
90 baseline variables: 20 confounders, 10 highly correlated instruments, 10 pure causes of the outcome, 20
noise variables, and 30 proxies of the observed confounders (generated using means that were linear
combinations of the realizations of the true confounders).
Estimation
Variance Estimation
Discussion
When the underlying DAG is not fully known, the sufficient variable selection
approach advises selecting all pre-treatment variables that may predict treatment
and/or outcome, which creates major problems.
Inclusion of variables that are unnecessary for confounding control (e.g., instrumental
variables) leads to a high-dimensional variable set and to variance inflation.
So, secondary variable selection is needed to get an efficient, consistent, unbiased
estimate of the ATE.
TMLE & C-TMLE are alternatives to IPTW that are more robust to the inclusion of
strong instruments.
Both classical and flexible estimators (IPTW, TMLE & C-TMLE) are not robust in the
presence of colliders (i.e., M-bias).
From the simulation we have seen that IPTW can lead to higher squared errors and
estimation bias (the propensity score model may include variables that are not true
confounders).
TMLE and C-TMLE address this problem through flexible modeling.
C-TMLE does not perform well when the initial model is poorly estimated, so use a
flexible method such as Super Learner.
The simulation revealed that influence function-based estimators of the SE of TMLE & C-TMLE
with Super Learner can be overly liberal in finite samples, resulting in less than 95%
coverage, although they performed well for TMLE with a logistic regression model.
Discussion
R Packages that were used:
1. SuperLearner
2. earth
3. ipred
4. rpart
5. gam
6. glmnet
7. tmle
Research paper soft copy link:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4733443/
THANK YOU
Supplementary:1 Construction of Asymptotic Variance
Let the observed data be $O = (W, A, Y)$, where $W$ is a single binary baseline covariate, $A$ a binary
treatment, and $Y$ the outcome.
We are interested in estimating $\psi_0 = E(Y^1)$.
Suppose ignorability holds without any conditioning on $W$.
Let $g_0 = P(A=1)$, $g_0(W) = P(A=1 \mid W)$, and $\sigma^2 = \mathrm{Var}(Y^1)$.
If $W$ is not included, $g_0$ can be estimated by
$g_n = \frac{1}{n}\sum_{i=1}^n A_i$ (the empirical mean of $A$).
Then, the IPTW estimating equation is:
$\frac{1}{n}\sum_{i=1}^n \left( \frac{A_i Y_i}{g_n} - \psi \right) = 0.$
Therefore, using the delta method, the asymptotic variance of IPTW is $\sigma^2 / (n g_0)$.
Supplementary:1 Construction of Asymptotic Variance
Similarly, under the assumption that ignorability requires conditioning on $W$:
let $g_n(W)$ be the estimate of $g_0(W) = P(A=1 \mid W)$, obtained empirically within levels of $W$.
Then, by solving the IPTW estimating equation, we get the IPTW estimator:
$\psi_n = \frac{1}{n}\sum_{i=1}^n \frac{A_i Y_i}{g_n(W_i)}.$
The asymptotic inference will depend on the causal relationships between $W$, $A$, and $Y$.
Supplementary:1 Variance inflation for the inclusion of an
instrumental variable
Here, suppose $W$ is an instrument: it influences $A$, but not $Y^1$.
So, including $W$ as a covariate in the propensity score model still leads to consistent & asymptotically
normal inference (if $0 < g_0(W) < 1$), and, using the delta method, we get:
Asymptotic variance of IPTW: $(\sigma^2 / n) \, E[1/g_0(W)]$.
So, the large-sample variance inflation is:
$\mathrm{VI} = g_0 \, E[1/g_0(W)] = E[g_0(W)] \, E[1/g_0(W)],$
which is independent of the distribution of $Y$.
By Jensen's inequality, $\mathrm{VI} \ge 1$, with equality iff $g_0(0) = g_0(1)$;
for $g_0(0) \ne g_0(1)$, the inflation is always greater than 1.
This indicates that including an instrument $W$ in the propensity score model will never decrease the variance.
Supplementary:2 Targeted Minimum Loss-based
Estimation (TMLE)
TMLE (proposed by van der Laan and Rubin, 2006) produces semiparametric
efficient and double robust plug-in estimators, by updating an initial fit in the
direction implied by the efficient influence function.
Preliminary assumptions:
Let the target parameter be $\psi_0 = E[\bar{Q}_0(1, X)]$ (under ignorability).
Let the true conditional mean outcome model be $\bar{Q}_0(A, X) = E(Y \mid A, X)$ (e.g., a simple
regression model), which yields the plug-in estimator of $\psi_0$.
Let the true propensity score model be $g_0(X) = P(A=1 \mid X)$.
TMLE steps:
TMLE solves the empirical mean of the efficient influence function of $\psi_0$ set equal to
zero.
Efficient influence function:
$D(\bar{Q}, g)(O) = \frac{A}{g(X)}\left(Y - \bar{Q}(A, X)\right) + \bar{Q}(1, X) - \psi.$
TMLE initially starts with estimates $\bar{Q}_n^0$ and $g_n$, and updates $\bar{Q}_n^0$ to produce
$\bar{Q}_n^1$.
Then, $\psi_n = \frac{1}{n}\sum_{i=1}^n \bar{Q}_n^1(1, X_i)$ is the TMLE estimate.
This $\psi_n$ then solves
$\frac{1}{n}\sum_{i=1}^n D(\bar{Q}_n^1, g_n)(O_i) = 0,$
and gives an asymptotically unbiased, locally efficient and double robust estimate of $\psi_0$.
Note that, if $X$ is a sufficient confounding set, the TMLE estimator of $\psi_0$ exists and is consistent.
Supplementary:3 Collaborative double robust
adjustment
This states that the propensity score model need only condition on the error of a misspecified
outcome model, to obtain an unbiased estimate of the target parameter $\psi_0$.
Let $\bar{Q}$ be the correctly specified model and $\bar{Q}_n$ its estimate; then $\bar{Q}_n$ converges to the truth $\bar{Q}_0$.
Now, let $\bar{Q}'$ be a misspecified model; then its estimate $\bar{Q}_n'$ converges to some
$\bar{Q}' \ne \bar{Q}_0$, with residual bias
$b(A, X) = \bar{Q}_0(A, X) - \bar{Q}'(A, X).$
Then, a double robust estimator using $\bar{Q}_n'$ is unbiased for the target parameter $\psi_0$ iff
the propensity score model conditions on a covariate set sufficient to adjust for the residual
bias $b(A, X)$.
This has interesting consequences for variable selection.
Let $X_s \subseteq X$ be a minimal sufficient confounding set (where $X_s \ne \emptyset$).
Then, if the chosen initial outcome model already adjusts for a subset of $X_s$, and the
propensity score model conditions on the remaining covariates of $X_s$,
the double robust estimator (TMLE) with these initial model estimates will produce a
consistent estimate of the target parameter $\psi_0$.
Supplementary:3 Collaborative double robust
adjustment (variable selection property)
Supplementary:4 Simulation study
Let $W$ be a set of potential confounders that is sufficient.
The estimation setting was based on two categories:
1. $W$ known, & 2. $W$ unknown.
The “Super Learner” method was used for estimation in some cases; otherwise,
logistic regression was used for estimation of the propensity score and the conditional outcome.
The simulation was carried out for n = 200, 1000, and 5000, respectively.
Each time, IPTW and TMLE were compared:
When $W$ is known: IPTW-W & TMLE-W (models conditioning on $W$).
When $W$ is unknown: IPTW-all & TMLE-all (models conditioning on all covariates);
IPTW-SL & TMLE-SL (models conditioning on all covariates, fit with Super Learner).
C-TMLE was defined two ways: (1) CTMLE-all-noQ ($\bar{Q}$ estimated as the intercept-only model,
without using any covariate information);
(2) CTMLE-SL (using Super Learner).
Also estimated: IPTW-select (fit both models on the full covariate set and select the statistically
significant covariates, i.e., p-value < 0.05).
Median statistics (median squared error, median bias) were used to evaluate the simulated estimates.