Title: Statistical Methods for High-Dimensional Mediation Analysis
We describe statistical methods for mediation analysis in epidemiologic studies with an exposure, a high-dimensional set of biomarkers, and an outcome. We consider two scenarios. In the first scenario, we assume that the exposure directly influences a group of latent, or unmeasured, factors that are associated with both the outcome and a subset of the biomarkers. The biomarkers associated with the latent factors linking the exposure to the outcome are the "mediators". We derive the likelihood for this model and develop an algorithm that maximizes an L1-penalized version of it, which limits the number of factors and associated biomarkers. We demonstrate, in simulations and a data example, that these new procedures can have higher power for detecting mediators. In the second scenario, we assume there is a priori information about the biomarkers: they are grouped a priori into pathways. For this scenario, we offer a two-step approach for identifying mediating pathways. In the first step, we select groups of biomarkers showing nominal associations with both the exposure and the outcome, and in the second step, we identify specific biomarkers within those candidate groups.
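As a rough illustration of the two-step idea in the second scenario, the sketch below screens pathways first and individual biomarkers second. It is a minimal sketch under stated assumptions, not the procedure developed in this work: the Fisher z-statistic, the "strongest member exceeds the nominal threshold" rule for a pathway, the Bonferroni threshold in the second step, and the names `fisher_z` and `two_step_screen` are all hypothetical choices made for illustration. For simplicity the outcome association is marginal, that is, not adjusted for the exposure.

```python
import math
import random
from statistics import NormalDist


def fisher_z(x, y):
    """Fisher z-statistic for the Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return math.atanh(r) * math.sqrt(n - 3)


def two_step_screen(exposure, outcome, pathways, alpha=0.05):
    """pathways maps a pathway name to a list of biomarker columns (hypothetical layout)."""
    z_nominal = NormalDist().inv_cdf(1 - alpha / 2)
    # Score each biomarker against the exposure and (marginally) the outcome.
    scores = {
        name: [(abs(fisher_z(exposure, m)), abs(fisher_z(m, outcome))) for m in cols]
        for name, cols in pathways.items()
    }
    # Step 1 (assumed rule): keep pathways whose strongest member is nominally
    # associated with the exposure and whose strongest member is nominally
    # associated with the outcome.
    candidates = [
        name for name, s in scores.items()
        if max(zx for zx, _ in s) > z_nominal and max(zy for _, zy in s) > z_nominal
    ]
    n_tests = sum(len(scores[name]) for name in candidates)
    if n_tests == 0:
        return candidates, []
    # Step 2 (assumed rule): within candidate pathways, keep biomarkers whose
    # two statistics both pass a Bonferroni-corrected threshold.
    z_strict = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    mediators = [
        (name, j)
        for name in candidates
        for j, (zx, zy) in enumerate(scores[name])
        if min(zx, zy) > z_strict
    ]
    return candidates, mediators


# Simulated data: one true mediator (pathway "A", index 0), the rest pure noise.
rng = random.Random(0)
n = 500
exposure = [rng.gauss(0, 1) for _ in range(n)]
mediator = [0.8 * x + rng.gauss(0, 1) for x in exposure]
outcome = [0.8 * m + rng.gauss(0, 1) for m in mediator]


def noise():
    return [rng.gauss(0, 1) for _ in range(n)]


pathways = {
    "A": [mediator, noise(), noise()],
    "B": [noise(), noise(), noise()],
    "C": [noise(), noise(), noise()],
}
candidates, mediators = two_step_screen(exposure, outcome, pathways)
print(candidates, mediators)
```

Screening at the pathway level first shrinks the set of biomarkers tested in the second step, so the stricter threshold is applied to fewer tests. Noise pathways can still pass the nominal first step by chance.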