Title: Methods in Large Scale Multiple Testing: Mixture Null, Small Sample Replicates, and Power Boosting
In this dissertation, we study several methods in large-scale multiple testing. In the first topic, we consider gene expression experiments that use log fold change statistics, where the null distribution is assumed to be a mixture of two normal distributions. An important issue in this setting is choosing the optimal interval of statistic values from which to estimate the null distribution. A modified cumulative sum changepoint detection criterion is constructed for this purpose and incorporated into three different methods for estimating the local false discovery rate. Simulation studies show that two of these three methods successfully control the false discovery rate (FDR), and that both of them achieve higher power than a baseline method.
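As a point of reference for the changepoint idea, the following is a minimal sketch of the textbook cumulative sum (CUSUM) estimate of a single mean shift. It is not the modified criterion constructed in the dissertation, whose details are not given here; the function name `cusum_changepoint` and the toy data are illustrative assumptions.

```python
def cusum_changepoint(values):
    """Textbook CUSUM estimate of a single mean-shift changepoint.

    Returns the number of observations in the first segment, i.e. the
    k (1 <= k < n) that maximises |S_k|, where S_k is the cumulative sum
    of deviations from the overall mean. Illustrative only: this is the
    standard criterion, not the modified one from the dissertation.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    best_k, best_abs, running = 1, -1.0, 0.0
    # S_n is always zero, so the last observation is excluded from the search.
    for k, v in enumerate(values[:-1], start=1):
        running += v - mean
        if abs(running) > best_abs:
            best_abs, best_k = abs(running), k
    return best_k


# Toy sequence whose mean shifts after the fourth observation.
print(cusum_changepoint([0, 0, 0, 0, 5, 5, 5, 5]))
```

In the setting described above, a criterion of this kind would presumably be applied to ordered statistic values to decide where the null-dominated region ends, but that use is an assumption about the general approach rather than a description of the actual method.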
In the second topic, the problem of small sample replicates in experiments based on log fold change is addressed. A two-stage method is constructed that addresses the magnitude of the signal and the variability of the signal separately. It is shown that the method controls the false discovery rate and that it performs competitively with a baseline method when there is considerable variability in the weighted counts of replicates coming from the alternative distribution.
In the third topic, a new decision rule is proposed under some structural assumptions. When the p-values of the true nulls can be assumed to be uncorrelated, it is shown that this decision rule controls the family-wise error rate (FWER) in the weak sense. Furthermore, simulation studies are presented to show that, under some conditions, it controls the false discovery rate in the strong sense. Most importantly, it is demonstrated on data from genome-wide association studies how this method can be used as an "add-on" to existing FDR-controlling methods in order to "boost" overall power.
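For readers unfamiliar with what an existing FDR-controlling method looks like, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, one standard example of such a method. It is not the decision rule proposed in the dissertation and does not show the "add-on" step; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, one per input p-value, that is True
    where the corresponding hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= alpha * k / m (0 if none).
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Made-up p-values for illustration only.
print(benjamini_hochberg([0.02, 0.02, 0.03, 0.9], alpha=0.05))
```

Because the procedure is step-up, the smallest p-value here is rejected even though it misses its own threshold: a larger-ranked p-value meets its threshold, and everything ranked below it is rejected with it.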