Modified interactive Q-learning for attenuating the impact of model misspecification with treatment effect heterogeneity

A sequential multiple assignment randomized trial, which incorporates multiple stages of randomization, is a popular approach for collecting data to inform personalized and adaptive treatments. There is an extensive literature on statistical methods for analyzing data collected in sequential multiple assignment randomized trials and estimating the optimal dynamic treatment regime. Q-learning with linear regression is widely used for this purpose due to its ease of implementation. However, model misspecification is a common problem with this approach, and little attention has been given to its impact when treatment effects are heterogeneous across subjects. This article describes the combined impact of two types of model misspecification related to treatment effect heterogeneity: omission of early-stage treatment effects from the late-stage main effect model, and violation of the assumed linear relationship between pseudo-outcomes and predictors, where the non-linearity arises from the optimization operation. The proposed method, which aims to address both types of misspecification simultaneously, builds interactive models into modified parametric Q-learning with Murphy’s regret function. Simulations show that the proposed method is robust to both sources of model misspecification. The proposed method is applied to a two-stage sequential multiple assignment randomized trial with embedded tailoring aimed at reducing binge drinking in first-year college students.


A.2 Estimation Bias
The proofs of the theorems can be found in Appendix B.

Theorem A.1 (Matrix Version of the Omitted Variable Bias Theorem). Suppose that the true regression model for $Y$ is $Y = \psi_0 + X^T\psi_1 + V^T\gamma + \varepsilon$, where $X$ is a random vector formed by measured covariates, $V$ is formed by unmeasured covariates, and $\varepsilon \sim N(0, \sigma^2)$. The parameters associated with measured covariates, $\psi \equiv (\psi_0, \psi_1^T)^T$, are thus estimated via the misspecified model $y = \beta_0 + x^T\beta_1 + \varepsilon^*$, where $y$ and $x$ are realizations of $Y$ and $X$, respectively. Then […]

Let $X_2 = (X_{20}^T, A_2, A_2 X_{21}^T)^T$ denote the full predictor vector and assume that the covariance matrix of $X_2$ is invertible. By Theorem A.1, […] where […].

The first element of $B$ corresponds to the bias associated with the intercept term and the second element of $B$ corresponds to the bias associated with the covariate predictors. The existence of bias in the estimation of covariate effects is then characterized by the term $\mathrm{Cov}(X_2, V_{20})$. The bias of $\beta_2$ can be rewritten as […] and $\mathrm{Cov}(A_2, V_{20}) = 0^T$ for SMARTs due to sequential randomization.
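For orientation, the textbook population form of the omitted variable bias can be sketched as follows, written in the notation of Theorem A.1. This is a sketch under added assumptions ($E(\varepsilon \mid X, V) = 0$, $\mathrm{Cov}(X)$ invertible, and $A_2$ independent of $(X_{21}, V_{20})$ jointly); here $\beta_0$ and $\beta_1$ denote the population least-squares coefficients of the misspecified model, and the expressions are not necessarily identical to the matrix $B$ of the theorem.

```latex
% Population least-squares coefficients of Y on (1, X) when V is omitted.
\begin{align*}
\beta_1 &= \mathrm{Cov}(X)^{-1}\,\mathrm{Cov}(X, Y)
         = \psi_1 + \mathrm{Cov}(X)^{-1}\,\mathrm{Cov}(X, V)\,\gamma, \\
\beta_0 &= E(Y) - E(X)^T \beta_1
         = \psi_0 + \bigl\{ E(V)^T - E(X)^T\,\mathrm{Cov}(X)^{-1}\,\mathrm{Cov}(X, V) \bigr\}\,\gamma .
\end{align*}

% Stage 2 predictor vector X_2 = (X_{20}^T, A_2, A_2 X_{21}^T)^T with A_2 randomized.
\begin{equation*}
\mathrm{Cov}(X_2, V_{20}) =
\begin{pmatrix}
\mathrm{Cov}(X_{20}, V_{20}) \\
\mathrm{Cov}(A_2, V_{20}) \\
\mathrm{Cov}(A_2 X_{21}, V_{20})
\end{pmatrix}
=
\begin{pmatrix}
\mathrm{Cov}(X_{20}, V_{20}) \\
0^T \\
E(A_2)\,\mathrm{Cov}(X_{21}, V_{20})
\end{pmatrix}.
\end{equation*}
```

The last block vanishes when $E(A_2) = 0$, which is why a balanced randomization removes the omitted-variable term from the treatment-effect rows.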
Theorem A.2 (Bias of Stage 2 Treatment Effect Estimators). Assume that $V_{20}$ is a vector of unmeasured covariates that are independent of $A_2$ and $\mathrm{Cov}(X_2)$ is invertible. The estimators of stage 2 heterogeneous treatment effects are unbiased if and only if at least one of the following conditions is satisfied:
• $E(A_2) = 0$.
• $V_{20}$ is correlated with neither $X_{20}$ nor $X_{21}$.

Theorem A.3 (Bias of Stage 2 Main Effect Estimators). Assume that $V_{20}$ is a vector of unmeasured covariates that are independent of $A_2$ and $E(A_2) = 0$. Suppose that $V_{20}$ is correlated with $X_{20}$ and $\mathrm{Cov}(X_2)$ is invertible. Then the estimators of stage 2 main effects are biased and the bias is $B'\gamma_{20}$, where […]

Theorem A.2 shows the importance of balancing sample sizes across the randomization arms. With unbalanced designs, the estimation of stage 2 treatment effects can be biased. However, this is not the case we consider in this paper. In the M-bridge study, heavy drinkers were re-randomized to $A_2 = 1$ and $A_2 = -1$ with equal probabilities, i.e., $E(A_2) = 0$, so no bias would be induced in the identification of stage 2 optimal rules. Theorem A.3 shows that the estimators of stage 2 main effects, however, can be biased if the unmeasured variable $V_{20}$ is correlated with $X_{20}$.
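The contrast between balanced and unbalanced randomization can be checked numerically. The following is a minimal sketch, not the simulation reported in this paper: the data-generating process, the coefficient values, and the function name `stage2_bias` are all hypothetical, with `x0` playing the role of $X_{20}$, `x1` the role of $X_{21}$, and `v` the role of the omitted $V_{20}$.

```python
import numpy as np

def stage2_bias(p_treat, n=200_000, seed=0):
    """Return estimate minus truth for (intercept, x0, A2, A2*x1) when V is omitted.

    Hypothetical toy model, chosen only to illustrate Theorems A.2 and A.3.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=n)   # main-effect covariate (role of X_20)
    x1 = rng.normal(size=n)   # tailoring covariate (role of X_21)
    # Unmeasured covariate, correlated with both measured covariates.
    v = 0.8 * x0 + 0.6 * x1 + rng.normal(size=n)
    # Randomized treatment in {-1, +1}, independent of x0, x1 and v.
    a2 = np.where(rng.random(n) < p_treat, 1.0, -1.0)

    truth = np.array([1.0, 0.5, 0.3, 0.7])
    y = (truth[0] + truth[1] * x0 + a2 * (truth[2] + truth[3] * x1)
         + 1.0 * v + rng.normal(size=n))

    # Misspecified stage 2 model: v is left out of the design matrix.
    design = np.column_stack([np.ones(n), x0, a2, a2 * x1])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat - truth

balanced = stage2_bias(p_treat=0.5)     # E(A2) = 0
unbalanced = stage2_bias(p_treat=0.8)   # E(A2) = 0.6
print("balanced   bias:", np.round(balanced, 3))
print("unbalanced bias:", np.round(unbalanced, 3))
```

Under this toy model the population formula gives a main-effect bias of about 0.8 for `x0` in both designs, no bias in the treatment-effect coefficients when the design is balanced, and a bias of about $E(A_2)\,\mathrm{Cov}(x_1, v) = 0.6 \times 0.6 = 0.36$ in the `A2*x1` coefficient when 80% of subjects receive $A_2 = 1$.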

Appendix B Proof
Theorem A.1 (Matrix Version of the Omitted Variable Bias Theorem). Suppose that the true regression model for $Y$ is $Y = \psi_0 + X^T\psi_1 + V^T\gamma + \varepsilon$, where $X$ is a random vector formed by measured covariates, $V$ is formed by unmeasured covariates, and $\varepsilon \sim N(0, \sigma^2)$. The parameters associated with measured covariates, $\psi \equiv (\psi_0, \psi_1^T)^T$, are thus estimated via the misspecified model $y = \beta_0 + x^T\beta_1 + \varepsilon^*$, where $y$ and $x$ are realizations of $Y$ and $X$, respectively. Then […]

Theorem A.1 is developed based on the omitted variable formula and example in Econometric Analysis by Greene (2002) [2].
Proof. Let $\mathbf{X}$ denote a design matrix with $x_i^T$ as the $i$th row, $i = 1, \ldots, n$, and $\mathbf{y}$ denote a vector with $y_i$ as the $i$th element. Suppose that $V$ is observable and let $\mathbf{V}$ denote a matrix with $v_i$, a realization of $V$, as the $i$th row. Define $\tilde{\mathbf{X}} = (\mathbf{1}\ \mathbf{X})$, where $\mathbf{1}$ is a column of 1's. The least squares […] Consider the multivariate regression […] The least squares estimator of […]. The true values of $\Lambda$ satisfy the equation […] The first derivative of $L$ with respect to $(\lambda_0, \Lambda_1)$ is […] Hence, the score equations of $(\lambda_0, \Lambda_1)$ are […]

Theorem A.2 (Bias of Stage 2 Treatment Effect Estimators). Assume that $V_{20}$ is a vector of unmeasured covariates that are independent of $A_2$ and $\mathrm{Cov}(X_2)$ is invertible. The estimators of stage 2 heterogeneous treatment effects are unbiased if and only if at least one of the following conditions is satisfied: […]

Suppose $E(A_2) \neq 0$. Since $\mathrm{Cov}(X_2)$ is invertible, for an arbitrary vector $\gamma_{20}$, we have […] Rewrite $\mathrm{Cov}(X_2)$ as a partitioned matrix: […] Therefore, the estimators of stage 2 treatment effects, $\beta_{210}$ and $\beta_{211}$, are unbiased if and only if […]

Theorem A.3 (Bias of Stage 2 Main Effect Estimators). Assume that $V_{20}$ is a vector of unmeasured covariates that are independent of $A_2$ and $E(A_2) = 0$. Suppose that $V_{20}$ is correlated with $X_{20}$ and $\mathrm{Cov}(X_2)$ is invertible. Then the estimators of stage 2 main effects are biased and the bias is $B'\gamma_{20}$, where […]

Proof. By Theorem A.1, we know that the bias of […] Note that $\mathrm{Cov}(X_{20}, A_2 X_{21}) = E(A_2)\,\mathrm{Cov}(X_{20}, X_{21}) = 0$.
Rewrite $\mathrm{Cov}(X_2)$ as a partitioned matrix: […]

As a preliminary study, we demonstrate the existence of the bias caused by an omitted variable in the stage 2 main effect model. The size of the omitted variable can be varied through $c_1$. We assume that stage 2 treatment effects are homogeneous in this simulation. For each data-generating mechanism with one of the three specifications of $\tilde{X}_{20,i}^T\psi_{20}$, we evaluate the estimators by Monte Carlo integration using samples of size $n$ ($n = 250$ or $n = 2500$) to predict the optimal DTR for a population of $N = 10000$ subjects with known potential outcomes under the four treatment regimes. The stage 2 model was specified as […] and the stage 1 model was specified as […]

[…] in the stage 2 main effect model causes significant prediction bias in the stage 1 rule identification. Omission of a variable that is uncorrelated with other stage 2 main predictors does not cause bias, but if the omitted variable is correlated with other main predictors, then a small bias is generated. Interactive Q-learning does not address this problem, but it does correct the bias caused by a falsely assumed linear stage 1 model under heterogeneous stage 2 treatment effects.
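The last point, that a linear stage 1 model can misfit the pseudo-outcome when stage 2 treatment effects are heterogeneous, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the toy data-generating process, the coefficient values, and the simplified interactive-style estimator (separate linear models for the stage 2 main effect and contrast, with the expected absolute contrast computed from empirical residuals). It is neither the simulation design nor the proposed method of this paper.

```python
import math
import numpy as np

rng = np.random.default_rng(1)

def ols(design, response):
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

def stage1_design(x, a1):
    return np.column_stack([np.ones_like(x), x, a1, a1 * x])

# Hypothetical two-stage trial with randomized A1, A2 in {-1, +1}.
n = 2000
x = rng.normal(size=n)
a1 = rng.choice([-1.0, 1.0], size=n)
x2 = x * (1 + 0.5 * a1) + 0.5 * rng.normal(size=n)   # stage 2 tailoring variable
a2 = rng.choice([-1.0, 1.0], size=n)
y = -0.2 * a1 + a2 * x2 + rng.normal(size=n)

# Stage 2 regression: main effect plus A2 * contrast.
b = ols(np.column_stack([np.ones(n), x, a1, a2, a2 * x2]), y)
main = b[0] + b[1] * x + b[2] * a1
contrast = b[3] + b[4] * x2

h1 = stage1_design(x, a1)

# Standard Q-learning: one linear model for the pseudo-outcome main + |contrast|.
beta_std = ols(h1, main + np.abs(contrast))

# Interactive-style alternative: model main and contrast separately,
# then average |fitted contrast + residual| over the empirical residuals.
beta_main = ols(h1, main)
beta_con = ols(h1, contrast)
resid = contrast - h1 @ beta_con

def q1_interactive(x_new, a1_new):
    h = stage1_design(x_new, a1_new)
    mu = h @ beta_con
    return h @ beta_main + np.abs(mu[:, None] + resid[None, :]).mean(axis=1)

# Compare both stage 1 rules with the true optimal rule on test data.
x_test = rng.normal(size=2000)
plus, minus = np.ones_like(x_test), -np.ones_like(x_test)
rule_std = np.where(beta_std[2] + beta_std[3] * x_test > 0, 1.0, -1.0)
rule_int = np.where(q1_interactive(x_test, plus) > q1_interactive(x_test, minus), 1.0, -1.0)

erf = np.vectorize(math.erf)

def folded_mean(mu, sd=0.5):
    """E|Z| for Z ~ N(mu, sd^2)."""
    return sd * math.sqrt(2 / math.pi) * np.exp(-mu**2 / (2 * sd**2)) + mu * erf(mu / (sd * math.sqrt(2)))

true_gain = -0.4 + folded_mean(1.5 * x_test) - folded_mean(0.5 * x_test)
rule_opt = np.where(true_gain > 0, 1.0, -1.0)

acc_std = np.mean(rule_std == rule_opt)
acc_int = np.mean(rule_int == rule_opt)
print(f"standard: {acc_std:.2f}, interactive-style: {acc_int:.2f}")
```

In this toy setting the true stage 1 rule depends on $|x|$, so a rule that is linear in $x$ can do little better than assigning everyone to one arm, whereas modeling the contrast separately keeps the absolute value outside the linear fit.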
Table C.3. Percentage of correctly identified stage 1 optimal rules when stage 2 treatment effects are homogeneous across patients ($\alpha_1 = \alpha_2 = 0$), predicted using standard Q-learning and interactive Q-learning, based on a set of test data ($N = 10000$) and 100 simulations of training data ($n = 250$).

Columns: Method; Omitted Variable.

D.2 Identification of Significant Predictors
We determine the most important variables for predicting the optimal rules and heterogeneous treatment effects using random forests. The optimal rules are calculated based on an absolute value function of the heterogeneous treatment effects, so their relationship with the predictors is hardly linear and is better represented by decision trees.

In the partial dependence plots (Figures D.6 and D.7), each panel represents a predictor, and the odds that students would benefit more from early intervention at stage 1, or that heavy drinkers would benefit more from an online health coach at stage 2, are plotted against the domain of that predictor, so that characteristics associated with higher or lower odds can be easily identified. For example, to minimize the maximum number of drinks consumed in a day, students with the intention and habit of drinking more would benefit more from late intervention at stage 1, and heavy drinkers whose parent has a significant drinking problem would benefit more from an online health coach at stage 2. The partial dependence plot for predicting $d_2^{\mathrm{opt}} = 1$ is more interpretable and meaningful for investigators seeking to identify the subgroup of students in need of the more expensive intervention (online health coach) and to allocate resources appropriately. The results revealed by the partial dependence plot for predicting $d_1^{\mathrm{opt}} = 1$ are more intuitive: completing personalized normative feedback corrects some of the misperceptions about college drinking held by students with the intention and habit of drinking more, and such students who received the intervention late would have a fresher memory of it, leading to a better final outcome.

Figure D.3. Model diagnostics for (a) $d_1^{\mathrm{opt}}$ and $d_2^{\mathrm{opt}}$ and (b) heterogeneous treatment effects at stages 1 and 2, based on the secondary outcome (total number of drinking-related consequences in the past 30 days).

Figure D.4 is the variable importance plot for the primary outcome, in which (a) shows the significant variables for predicting the optimal rules and (b) shows the significant variables for predicting the heterogeneous treatment effects, ranked according to their contribution to accuracy and precision. Panels (a) and (b) identify a similar set of significant predictors. At stage 1, the identified variables are the norm on the percentage of first-year college students who engaged in binge drinking during the last two weeks and the habit on the number of days drinking in the past month. At stage 2, the identified variables are whether a student's parent had a significant drinking problem and the intended number of drinks to consume at college on a typical occasion.

Figure D.4. Variable importance plot for predicting (a) $d_1^{\mathrm{opt}}$ and $d_2^{\mathrm{opt}}$ and (b) heterogeneous treatment effects at stages 1 and 2, based on the primary outcome (maximum number of drinks consumed within a 24-hour period).

Figure D.5. Variable importance plot for predicting (a) $d_1^{\mathrm{opt}}$ and $d_2^{\mathrm{opt}}$ and (b) heterogeneous treatment effects at stages 1 and 2, based on the secondary outcome (total number of drinking-related consequences in the past 30 days).

Figure D.5 is the variable importance plot for the secondary outcome. Panels (a) and (b) identify the same set of significant predictors. At stage 1, the identified variables are race, the habit on the number of days drinking in the past month, and the norm on the percentage of first-year college students who engaged in binge drinking during the last two weeks. At stage 2, the identified variables are the intention to pledge to Greek life, the norm on the percentage of first-year college students who used alcohol in a month, and the intended drinking frequency per month. Figures D.6 and D.7 show the partial dependence plots for predicting (a) $d_1^{\mathrm{opt}} = 1$ and (b) $d_2^{\mathrm{opt}} = 1$, based on the primary and secondary outcomes, respectively.

Figure D.7. Partial dependence plot for predicting (a) $d_1^{\mathrm{opt}} = 1$ and (b) $d_2^{\mathrm{opt}} = 1$, based on the secondary outcome (total number of drinking-related consequences in the past 30 days).

Table C.2. Bias (mean (SD)) of stage 2 main effect estimator $\tilde{X}_{2i}^T\beta_{20}$ and stage 2 treatment effect estimator $\tilde{X}_{2i}^T\beta_{21}$ when stage 2 treatment effects are homogeneous across patients ($\alpha_1 = \alpha_2 = 0$), based on a set of test data ($N = 10000$) and 100 simulations of training data.

We summarize the preliminary results in Table C.2 and Table C.3. Table C.2 shows that omission of the stage 1 heterogeneous treatment effects $V_i = Z_{1i}A_{1i}$ in the stage 2 main effect […] as expected from Theorem A.2 as we have a balanced design. Table C.3 shows that omission of […]