[Mediation-information] Causal mediation analysis using R package Mediation - Question

Kosuke Imai kimai at princeton.edu
Sat Dec 31 13:53:42 CET 2011


Hi Mieke,

  Using 169 covariates in a regression model would not be a good idea. I think propensity-score-type methods are a reasonable way to balance these variables without including them in a regression model. Our mediation approach can be applied to the matched sample without any modification. As for lmer(), we haven't implemented it yet, but the general algorithm described in our Psychological Methods paper can also be applied here if you know some statistical programming.
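
A rough sketch of the kind of simulation-based algorithm referred to above, for the simplest case of linear mediator and outcome models. It is an illustration under stated assumptions, not the package's implementation: the helper names `fit_ols` and `simulate_acme`, the simulated data, and the use of a single covariate `x` (standing in for, e.g., a propensity score) are all hypothetical. For clustered data, the model-fitting and parameter-drawing steps would have to be replaced by their multilevel counterparts.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients, their covariance matrix, and the residual SD."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, cov, np.sqrt(sigma2)


def simulate_acme(t, m, y, x, sims=500, seed=0):
    """Monte Carlo estimate of the average causal mediation effect (treated)."""
    rng = np.random.default_rng(seed)
    n = len(t)
    ones, zeros = np.ones(n), np.zeros(n)

    # Step 1: fit the mediator model M ~ T + X and the outcome model Y ~ T + M + X.
    bm, Vm, sd_m = fit_ols(np.column_stack([ones, t, x]), m)
    by, Vy, _ = fit_ols(np.column_stack([ones, t, m, x]), y)

    draws = np.empty(sims)
    for s in range(sims):
        # Step 2: draw model parameters from their approximate sampling distributions.
        b = rng.multivariate_normal(bm, Vm)
        g = rng.multivariate_normal(by, Vy)
        # Step 3: simulate potential mediator values under T = 1 and T = 0.
        m1 = np.column_stack([ones, ones, x]) @ b + rng.normal(0, sd_m, n)
        m0 = np.column_stack([ones, zeros, x]) @ b + rng.normal(0, sd_m, n)
        # Step 4: predict outcomes with treatment fixed at 1 and only the mediator varied.
        y_m1 = np.column_stack([ones, ones, m1, x]) @ g
        y_m0 = np.column_stack([ones, ones, m0, x]) @ g
        draws[s] = np.mean(y_m1 - y_m0)

    # Step 5: summarize the simulated effects (point estimate and 95% interval).
    return draws.mean(), np.percentile(draws, [2.5, 97.5])


# Hypothetical simulated data; the true mediation effect is 0.5 * 0.8 = 0.4.
rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)
t = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(float)
m = 0.5 * t + 0.3 * x + rng.normal(size=n)
y = 0.2 * t + 0.8 * m + 0.4 * x + rng.normal(size=n)

acme, ci = simulate_acme(t, m, y, x)
print(f"ACME estimate: {acme:.3f}, 95% interval: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

On a matched sample, the same steps would simply be run on the matched rows only.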

Best,
Kosuke

Department of Politics
Princeton University
http://imai.princeton.edu

On Dec 31, 2011, at 3:34 AM, Mieke Goos wrote:

> Dear Prof. Imai, Prof. Keele, Prof. Tingley and Prof. Yamamoto,
> 
> I am a junior applied researcher working in the field of educational effectiveness. The topic of my doctoral / postdoc project is grade retention. More specifically, I investigate the effects of repeating first grade on children's further academic achievement, psychosocial functioning, and school career throughout elementary school. In my two previous manuscripts, I focused on (1) the retention-year effects (short-term effects) and how they are moderated by the way the retention year is shaped in terms of additional support given, and (2) long-term effects, using 3-level growth curve models. My analyses were framed within the counterfactual framework of causal inference, making use of propensity score stratification (based on 169 child-, family-, teacher-, and school-level covariates) to account for selection bias. For my third manuscript, I want to take the next step and look at why grade retention has such negative effects on repeaters, as documented in the literature and found in my first two manuscripts. Therefore, I built a theoretical model of which I want to test one potential mediating pathway. To do this properly, I went through the recent literature on mediation analysis and came across your joint contributions. May I congratulate you all on this nice piece of collaborative work, particularly on how you integrated all relevant parts of mediation analysis (definition, identification, estimation, assumptions, and sensitivity analysis) into one overarching framework. For applied researchers like me, this really is a huge step forward. I read your articles attentively and must say that I learned a lot from them.
> 
> Still, I have two small practical questions left, and I wonder if you, as experts in the field of causal mediation analysis, could help me with them.
> - As should be clear from the above, my study is observational by nature. I follow about 4000 Belgian students from kindergarten until they enter secondary education; 298 of them repeated first grade. I have data on 169 potential confounders, all measured before any student was retained. Some relate to the student (e.g., achievement scores, family SES, participation in school, ...), some to his/her class or teacher (e.g., teacher characteristics, class composition variables, teacher attitudes, class practices, ...), and some to his/her school (e.g., principal characteristics, school size, team collaboration, leadership, school team attitudes, ...). Treatment is a dummy (1 = repeater, 0 = promoted student). I have several outcomes, measured for both promoted and retained students (math IRT-calibrated theta score, reading fluency composite score, and several psychosocial scales). My first practical question concerns the covariates. In the simple T - Y version (in my first two manuscripts), I checked which of these 169 covariates were related to both my T and Y. With this reduced set of covariates, I estimated each child's probability of first-grade retention by means of a 3-level logistic regression model, yielding one 'supercovariate' for the final 3-level growth curve models and 2-level linear regression models with which I estimated my average causal effects. Thus, a simple case with one simple supercovariate. Now I am wondering how to proceed in the T - M - Y case. In your articles, I read that X should be the same in the mediator model and the outcome model, and that you need as many pretreatment covariates as possible for the Sequential Ignorability assumption to hold credibly. But I guess your R package will not allow 169 covariates to be controlled for, right? (I didn't try that, as I assume it would cause problems with degrees of freedom.) Do you have any advice in this case?
> I ran your R package Mediation today with the PS I used in the past as X, and of course it gives some results, but I am wondering whether this is in fact theoretically okay, as I only control for T - Y confounders, regardless of their relation with M.
> - Second, like all educational scientists, I have clustered data, with students (level 1) nested in schools (level 2). Are there currently any plans to extend your package so that mediate() would allow for lmer models? (I already tried that myself today, very quickly, and surprisingly it gave some results, though with a coeff error.)
> 
> Many thanks in advance!
> 
> Happy new year and best wishes for 2012!
> 
> Kind regards,
> Mieke
> 
> 
> ------------------------------------------------------------------------------------
> Mieke Goos
> Centre for Educational Effectiveness and Evaluation
> Catholic University of Leuven
> Dekenstraat 2
> VHI 02.45
> 3000 Leuven
> Belgium
> 
> Tel: (+32)(0)16/32.58.14
> email: mieke.goos at ppw.kuleuven.be
> url: http://www.steunpuntloopbanen.be/


