5.5 Multiple Testing

#MultipleTesting #BonferroniCorrection #FWER #ScheffeSMethod #Chi2Distribution

1 Multiple Testing

In many testing problems, we want to test many hypotheses at a time, e.g.

Test $H_{0 j} : β_{j} = 0$ for $j = 1, \dots, d$ in linear regression.
Test whether each of 2 million single nucleotide polymorphisms (SNPs) is associated with a given phenotype (e.g. diabetes or schizophrenia).
Test whether each of $2000$ website tweaks affects user engagement.

Setup: $X \sim P_{\theta} \in \mathcal{P}$, with null hypotheses $H_{0i} : \theta \in \Theta_{0i}$ for $i = 1, \dots, m$ (e.g. $\theta_{i}(P) = 0$); commonly, $H_{0i} : \theta_{i} = 0$.
Goal: return an accept/reject decision for each $i$.

$$\mathcal{R}(X) = \{i : H_{0i} \text{ rejected}\}, \qquad \mathcal{H}_{0}(\theta) = \{i : H_{0i} \text{ true}\}.$$

$R(X) = |\mathcal{R}(X)|$, $m_{0} = |\mathcal{H}_{0}|$.
Problem: even if all $H_{0i}$ are true, we probably have $\mathbb{P}(\text{any } H_{0i} \text{ rejected}) \gg \alpha$.
Classical solution: control the family-wise error rate (FWER) at level $\alpha$, i.e. $\sup_{\theta \in \Theta} \mathrm{FWER}_{\theta} \leq \alpha$.

1.1 Familywise Error Rate

Problem: even if all $H_{0i}$ are true, we might have $\mathbb{P}(\text{any } H_{0i} \text{ rejected}) \gg \alpha$.

Example

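$X_{i} \overset{\text{ind.}}{\sim} N(\theta_{i}, 1)$, $i = 1, \dots, m$, $H_{0i} : \theta_{i} = 0$. If each $H_{0i}$ is tested at marginal level $\alpha$, then $\mathbb{P}_{0}(\text{any } H_{0i} \text{ rejected}) = 1 - (1 - \alpha)^{m} \to 1$ as $m \to \infty$.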

Is this a problem? Yes, if all attention will be focused on the (false) rejections and not on the (correct) non-rejections.
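The growth of $1 - (1 - \alpha)^{m}$ in the example above can be tabulated directly. A minimal sketch, assuming $m$ independent true nulls each tested at marginal level $\alpha$; the function name `prob_any_rejection` is made up for illustration.

```python
def prob_any_rejection(alpha: float, m: int) -> float:
    """P(at least one rejection) for m independent true nulls, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


# The chance of at least one false rejection climbs quickly with m.
for m in (1, 10, 100, 1000):
    print(m, round(prob_any_rejection(0.05, m), 4))
```

Already at $m = 100$ a false rejection is almost certain, which is what motivates controlling a family-wise criterion rather than each marginal level.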

The classical solution is to control the familywise error rate (FWER):

$$\mathrm{FWER}_{\theta}=\mathbb{P}_{\theta}(\text{any false rejections})=\mathbb{P}_{\theta}(\mathcal{R}\cap \mathcal{H}_{0}\neq \varnothing).$$

We want $\sup_{\theta\in \Theta}\mathrm{FWER}_{\theta} \leq \alpha$. This is typically achieved by "correcting" the marginal p-values $p_{1}(x),\dots,p_{m}(x)$, where each $p_{i}$ stochastically dominates $U[0,1]$ under $H_{0i}$ (written $p_{i} \overset{H_{0i}}{\geq} U[0,1]$), e.g. $p_{i}(x)=2(1-\Phi(|x_{i}|))$ in the Gaussian example.

2 Bonferroni Correction

Assume $p_{1},\dots,p_{m}$ are p-values for $H_{01},\dots,H_{0m}$ with $p_{i}\geq U[0,1]$ under $H_{0i}$. Under arbitrary dependence, we can guarantee FWER control by rejecting $H_{0i}$ iff $p_{i} \leq \frac{\alpha}{m}$. By the union bound,

$$\mathbb{P}_{\theta}(\text{any false rejection})\leq \sum_{i\in \mathcal{H}_{0}}\mathbb{P}_{\theta}(H_{0i}\text{ rejected})\leq \frac{m_{0}}{m}\alpha \leq \alpha.$$
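The Bonferroni rule is short enough to write out. A minimal sketch, assuming two-sided Gaussian p-values $p_{i}(x) = 2(1 - \Phi(|x_{i}|))$; the helper names and the observations in `xs` are made up for illustration.

```python
from statistics import NormalDist


def two_sided_p_value(x: float) -> float:
    """Two-sided p-value for H0: theta = 0 when X ~ N(theta, 1)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(x)))


def bonferroni_reject(p_values, alpha=0.05):
    """Indices i with p_i <= alpha / m, where m is the number of hypotheses."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


xs = [0.3, -4.5, 1.9, 2.2, -0.7]  # hypothetical observations, not real data
p_values = [two_sided_p_value(x) for x in xs]
rejected = bonferroni_reject(p_values, alpha=0.05)
print(rejected)
```

With $m = 5$ the per-test threshold is $0.01$, so observations such as $2.2$ that are marginally significant at level $0.05$ are no longer rejected.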

If the p-values are independent, we can instead use the threshold $\tilde{\alpha}_{m} = 1 - (1 - \alpha)^{\frac{1}{m}}$ (Sidak correction), rejecting $H_{0i}$ iff $p_{i} \leq \tilde{\alpha}_{m}$. Then
$$\mathbb{P}_{\theta}(\text{no false rejections}) = \prod_{i \in \mathcal{H}_{0}} \mathbb{P}_{\theta}(p_{i} > \tilde{\alpha}_{m}) \geq (1 - \tilde{\alpha}_{m})^{m_{0}} = (1 - \alpha)^{m_{0}/m} \geq 1 - \alpha.$$
For small $\alpha$, $1 - \tilde{\alpha}_{m} = (1 - \alpha)^{\frac{1}{m}} \approx 1 - \frac{\alpha}{m}$, so Sidak doesn't improve much on Bonferroni.
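To see how close the two per-test levels are, one can compute both. A small sketch; the function names are illustrative, not from any library.

```python
def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level alpha / m (valid under arbitrary dependence)."""
    return alpha / m


def sidak_level(alpha: float, m: int) -> float:
    """Per-test level 1 - (1 - alpha)^(1/m) (valid under independence)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


for m in (2, 20, 2000):
    b, s = bonferroni_level(0.05, m), sidak_level(0.05, m)
    print(m, b, s, s / b)  # the ratio stays close to 1
```

The Sidak level is always at least the Bonferroni level, but for $\alpha = 0.05$ the ratio of the two never exceeds about $1.03$.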

Example

$X \sim N_{d}(\theta, I_{d})$. Coordinate-wise multiple testing of $H_{0i} : \theta_{i} = 0$. How large does the Bonferroni threshold for $|X_{i}|$ have to be?
It turns out to be $\approx \sqrt{2 \log d}$ for large $d$.
Alternatively, test $H_{0\lambda} : \lambda^{T} \theta = 0$ for every $\lambda \in \mathbb{R}^{d}$ with $\|\lambda\| = 1$.
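The coordinate-wise Bonferroni threshold above is the upper $\alpha/(2d)$ quantile of $N(0,1)$, which can be compared with $\sqrt{2 \log d}$ numerically. A minimal sketch, assuming two-sided tests; `bonferroni_threshold` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def bonferroni_threshold(d: int, alpha: float = 0.05) -> float:
    """Cutoff c such that P(|N(0,1)| > c) = alpha / d."""
    # Use the lower tail and flip the sign to avoid precision loss near 1.
    return -NormalDist().inv_cdf(alpha / (2.0 * d))


def ratio_to_sqrt_2_log_d(d: int, alpha: float = 0.05) -> float:
    """Bonferroni threshold divided by sqrt(2 log d)."""
    return bonferroni_threshold(d, alpha) / math.sqrt(2.0 * math.log(d))


for d in (10, 10**3, 10**6):
    print(d, bonferroni_threshold(d), math.sqrt(2.0 * math.log(d)))
```

The ratio approaches 1 only slowly, so $\sqrt{2 \log d}$ is best read as the leading-order growth rate rather than a precise cutoff for moderate $d$.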

3 Testing with Dependence

Bonferroni isn't much worse than Sidak, e.g. for $\alpha = 5\%$, $m = 20$ the per-test levels are $0.0025$ vs. $0.00256$. But when the tests are highly dependent, we can often do much better.

Example (Scheffe's S-method)

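$X \sim N_{d}(\theta, I_{d})$, $\theta \in \mathbb{R}^{d}$, $H_{0\lambda} : \theta^{T} \lambda = 0$ for $\lambda \in S^{d-1}$ ($m = \infty$).
Reject $H_{0\lambda}$ if $(X^{T} \lambda)^{2} \geq \chi_{d}^{2}(\alpha)$, where $\chi_{d}^{2}(\alpha)$ is the upper-$\alpha$ quantile of $\chi_{d}^{2}$; roughly $\chi_{d}^{2}(\alpha) \approx d + 3\sqrt{d}$, because $\chi_{d}^{2} \approx N(d, 2d)$.
This controls FWER: if $\theta^{T} \lambda = 0$ then $X^{T} \lambda = (X - \theta)^{T} \lambda$, so
$$\sup_{\lambda : \theta^{T} \lambda = 0} (X^{T} \lambda)^{2} \leq \sup_{\lambda \in S^{d-1}} ((X - \theta)^{T} \lambda)^{2} = \|X - \theta\|^{2} \sim \chi_{d}^{2},$$
which exceeds $\chi_{d}^{2}(\alpha)$ with probability $\alpha$.
Can view this as a deduction from the confidence region $C(X) = \{\theta : \|\theta - X\|^{2} \leq \chi_{d}^{2}(\alpha)\}$.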
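The FWER claim for the S-method can be checked by simulation under the global null $\theta = 0$, where every $H_{0\lambda}$ is true and the most significant direction is $\lambda^{*} = X / \|X\|$. A rough sketch using only the standard library; the cutoff is itself a Monte Carlo estimate, and $d = 10$, the seeds, and the function names are illustrative assumptions.

```python
import math
import random


def chi2_upper_quantile_mc(d, alpha, n_sim=20000, seed=0):
    """Monte Carlo estimate of the upper-alpha quantile of chi^2_d."""
    rng = random.Random(seed)
    draws = sorted(
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)) for _ in range(n_sim)
    )
    return draws[math.ceil((1.0 - alpha) * n_sim) - 1]


def scheffe_rejects(x, lam, cutoff):
    """Reject H_{0,lambda}: theta^T lambda = 0 iff (x^T lambda)^2 >= cutoff."""
    projection = sum(xi * li for xi, li in zip(x, lam))
    return projection ** 2 >= cutoff


d, alpha = 10, 0.05
cutoff = chi2_upper_quantile_mc(d, alpha)

# Under theta = 0, some H_{0,lambda} is rejected exactly when the worst-case
# direction lambda* = x / ||x|| is rejected, i.e. when ||x||^2 >= cutoff.
rng = random.Random(1)
n_rep = 5000
any_rejection = 0
for _ in range(n_rep):
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(xi * xi for xi in x))
    lam_star = [xi / norm for xi in x]
    any_rejection += scheffe_rejects(x, lam_star, cutoff)
fwer_estimate = any_rejection / n_rep

print(cutoff, d + 3.0 * math.sqrt(d), fwer_estimate)
```

The estimated FWER should land near $\alpha$ even though infinitely many hypotheses are tested, and the printed cutoff can be compared with the rough $d + 3\sqrt{d}$ rule.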

4 Deduced Inference

Given any joint $1 - \alpha$ confidence region $C(X)$ for $\theta \in \Theta$, we may freely assume $\theta \in C(X)$ and "deduce" any and all implied conclusions without any FWER inflation:
$$\mathbb{P}_{\theta}(\text{any deduced inference is wrong}) \leq \mathbb{P}_{\theta}(\theta \notin C(X)) \leq \alpha.$$
Deduction is often a good paradigm for deriving simultaneous intervals.
We say $C_{1}(X), \dots, C_{m}(X)$ are simultaneous $1 - \alpha$ confidence intervals for $g_{1}(\theta), \dots, g_{m}(\theta)$ if $\mathbb{P}_{\theta}(g_{i}(\theta) \in C_{i}(X), \ \forall i = 1, \dots, m) \geq 1 - \alpha$.

Example (Simultaneous intervals for multivariate Gaussian)

Assume $X \sim N_{d}(\theta, \Sigma)$, where $\Sigma$ is known and $\Sigma_{ii} = 1$. Let $t_{\alpha}$ be the upper-$\alpha$ quantile of $\|X - \theta\|_{\infty}$. Then
$$\begin{aligned} C(X) &= \{\theta : |\theta_{i} - X_{i}| \leq t_{\alpha}, \ \forall i\} \\ &= (X_{1} \pm t_{\alpha}) \times (X_{2} \pm t_{\alpha}) \times \dots \times (X_{d} \pm t_{\alpha}) \\ &= C_{1}(X_{1}) \times \dots \times C_{d}(X_{d}), \\ \mathbb{P}_{\theta}(\theta_{i} \notin C_{i}(X_{i}) \text{ for some } i) &= \mathbb{P}_{\theta}(\theta \notin C(X)) = \alpha. \end{aligned}$$
Note we could have instead constructed an elliptical confidence region, but then the deduced intervals would be conservative.
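The quantile $t_{\alpha}$ of $\|X - \theta\|_{\infty}$ generally has no closed form but is easy to approximate by simulation. A sketch under a strong simplifying assumption: $\Sigma$ is taken to be equicorrelated (all off-diagonal entries equal to `rho`), so that draws need no matrix factorization. The values $d = 5$ and $\rho = 0.8$ are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist


def max_abs_quantile(d, rho, alpha, n_sim=20000, seed=0):
    """MC upper-alpha quantile of max_i |E_i|, E ~ N_d(0, Sigma) equicorrelated."""
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    maxima = []
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing correlation rho
        maxima.append(
            max(abs(a * shared + b * rng.gauss(0.0, 1.0)) for _ in range(d))
        )
    maxima.sort()
    return maxima[math.ceil((1.0 - alpha) * n_sim) - 1]


d, rho, alpha = 5, 0.8, 0.05
t_alpha = max_abs_quantile(d, rho, alpha)
z_marginal = NormalDist().inv_cdf(1.0 - alpha / 2.0)          # no correction
z_bonferroni = NormalDist().inv_cdf(1.0 - alpha / (2.0 * d))  # Bonferroni
print(z_marginal, t_alpha, z_bonferroni)
```

With strong positive correlation the simulated half-width sits between the uncorrected and Bonferroni half-widths, which is the gain from exploiting the known dependence.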

Example (Linear regression)

$n$ observations, $d$ variables, design matrix $X \in \mathbb{R}^{n \times d}$, $\hat{\beta} \sim N_{d}(\beta, \sigma^{2}(X^{T} X)^{-1})$. Estimate $\hat{\sigma}^{2} = \frac{\mathrm{RSS}}{n - d}$, which is independent of $\hat{\beta}$. Then $\frac{\hat{\beta} - \beta}{\hat{\sigma}} = \frac{Z}{\sqrt{V / (n - d)}}$, where $Z = \frac{\hat{\beta} - \beta}{\sigma} \sim N_{d}(0, (X^{T} X)^{-1})$ and $V = \frac{\mathrm{RSS}}{\sigma^{2}} \sim \chi_{n - d}^{2}$. Since $Z \perp\!\!\!\perp V$, the distribution of $\frac{\hat{\beta} - \beta}{\hat{\sigma}}$ is fully known.
Assume WLOG that $((X^{T} X)^{-1})_{jj} = 1$ for all $j$. Let $t_{\alpha}$ denote the upper-$\alpha$ quantile of $\left\| \frac{\hat{\beta} - \beta}{\hat{\sigma}} \right\|_{\infty}$ (compute $t_{\alpha}$ by simulation). Then $C_{j} = \hat{\beta}_{j} \pm \hat{\sigma} t_{\alpha}$ are simultaneous confidence intervals for $\beta_{j}$, $j = 1, \dots, d$, since
$$\mathbb{P}(\beta_{j} \in C_{j}, \ \forall j) = \mathbb{P}(|\hat{\beta}_{j} - \beta_{j}| \leq \hat{\sigma} t_{\alpha}, \ \forall j) = 1 - \alpha.$$
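The "compute $t_{\alpha}$ by simulation" step can be sketched by drawing the pivot $Z / \sqrt{V/(n-d)}$ directly. This is an illustration under simplifying assumptions, not a general implementation: it takes $d = 2$ and $(X^{T}X)^{-1} = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}$, and the estimates passed to `simultaneous_intervals` are made-up numbers.

```python
import math
import random


def simultaneous_t_quantile(r, df, alpha, n_sim=20000, seed=0):
    """MC upper-alpha quantile of max_j |Z_j| / sqrt(V / df) for d = 2.

    Z ~ N_2(0, [[1, r], [r, 1]]) and V ~ chi^2_df are drawn independently.
    """
    rng = random.Random(seed)
    c = math.sqrt(1.0 - r * r)
    stats = []
    for _ in range(n_sim):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1, z2 = g1, r * g1 + c * g2       # correlation r by construction
        v = rng.gammavariate(df / 2.0, 2.0)  # chi^2_df = Gamma(df/2, scale 2)
        stats.append(max(abs(z1), abs(z2)) / math.sqrt(v / df))
    stats.sort()
    return stats[math.ceil((1.0 - alpha) * n_sim) - 1]


def simultaneous_intervals(beta_hat, sigma_hat, t_alpha):
    """Intervals beta_hat_j +/- sigma_hat * t_alpha for every coordinate j."""
    half_width = sigma_hat * t_alpha
    return [(b - half_width, b + half_width) for b in beta_hat]


t_small_df = simultaneous_t_quantile(r=0.5, df=5, alpha=0.05)
t_large_df = simultaneous_t_quantile(r=0.5, df=200, alpha=0.05)
# Hypothetical estimates, for illustration only.
intervals = simultaneous_intervals([1.2, -0.4], sigma_hat=0.9, t_alpha=t_small_df)
print(t_small_df, t_large_df, intervals)
```

The quantile is noticeably larger when $n - d$ is small, reflecting the extra uncertainty from estimating $\sigma$.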

5 False Discovery Rate (FDR)