5.5 Multiple Testing

1 Multiple Testing

In many testing problems, we want to test many hypotheses at once.

Setup: $X\sim P_{\theta}\in \mathcal{P}$, with $H_{0i}:\theta\in \Theta_{0i}$, $i=1,\cdots,m$ (e.g. $\theta_{i}(P)=0$). Commonly, $H_{0i}:\theta_{i}=0$.
Goal: return an accept/reject decision for each i.

$\mathcal{R}(X)=\{i:H_{0i}\text{ rejected}\}$, $\mathcal{H}_{0}(\theta)=\{i:H_{0i}\text{ true}\}$.

$R(X)=|\mathcal{R}(X)|$, $m_{0}=|\mathcal{H}_{0}|$.
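Problem: Even if all $H_{0i}$ are true, we may well have $\mathbb{P}(\text{any }H_{0i}\text{ rejected})\gg \alpha$.
Classical solution: control the familywise error rate (FWER) at level $\alpha$, i.e. $\sup_{\theta\in \Theta}\mathrm{FWER}_{\theta}\leq \alpha$.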

1.1 Familywise Error Rate

Problem: even if all $H_{0i}$ are true, we might have $\mathbb{P}(\text{any }H_{0i}\text{ rejected})\gg \alpha$.

Example

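$X_{i}\overset{\text{ind.}}{\sim}N(\theta_{i},1)$, $i=1,\cdots,m$, $H_{0i}:\theta_{i}=0$. If each $H_{0i}$ is tested at marginal level $\alpha$, then under the global null $\mathbb{P}_{0}(\text{any }H_{0i}\text{ rejected})=1-(1-\alpha)^{m}\approx 1$ for large $m$.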

Is this a problem? Yes, if all attention will be focused on the (false) rejections and not on the (correct) non-rejections.
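To make the inflation concrete, here is a minimal sketch that evaluates $1-(1-\alpha)^{m}$ for a few values of $m$, assuming $m$ independent tests each run at marginal level $\alpha$ with no correction. The function name `fwer_uncorrected` is illustrative and not part of the notes.

```python
def fwer_uncorrected(alpha: float, m: int) -> float:
    """P(at least one rejection) under the global null, for m independent
    tests each run at marginal level alpha (no multiplicity correction)."""
    return 1.0 - (1.0 - alpha) ** m


# Print the uncorrected FWER for a few family sizes at alpha = 0.05.
for m in (1, 10, 100, 1000):
    print(m, round(fwer_uncorrected(0.05, m), 4))
```

Already at $m=10$ the chance of at least one false rejection is about 40%, and by $m=100$ it exceeds 99%.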

Classical solution is to control the familywise error rate (FWER):

$$\mathrm{FWER}_{\theta}=\mathbb{P}_{\theta}(\text{any false rejections})=\mathbb{P}_{\theta}(\mathcal{R}\cap \mathcal{H}_{0}\neq \varnothing).$$

We want $\sup_{\theta\in \Theta}\mathrm{FWER}_{\theta} \leq \alpha$. This is typically achieved by "correcting" the marginal p-values $p_{1}(x),\cdots,p_{m}(x)$ ($p_{i} \overset{H_{0i}}{\geq }U[0,1]$, i.e. $p_{i}$ is stochastically no smaller than uniform under $H_{0i}$), e.g. $p_{i}(x)=2(1-\Phi(|x_{i}|))$ for Gaussian.

2 Bonferroni Correction

Assume $p_{1},\cdots,p_{m}$ are p-values for $H_{01},\cdots,H_{0m}$ with $p_{i}\geq U[0,1]$ under $H_{0i}$. For general dependence, we can guarantee FWER control by rejecting $H_{0i}$ iff $p_{i} \leq \frac{\alpha}{m}$. By the union bound,

$$\mathbb{P}_{\theta}(\text{any false rejection})\leq \sum_{i\in \mathcal{H}_{0}}\mathbb{P}_{\theta}\left(p_{i}\leq \frac{\alpha}{m}\right)\leq \frac{m_{0}}{m}\alpha \leq \alpha.$$
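The Bonferroni rule can be sketched in a few lines. The example below assumes the Gaussian setting with two-sided p-values $p_{i}=2(1-\Phi(|x_{i}|))$; the observations in `x` are made up for illustration, and the helper names are not from the notes.

```python
from statistics import NormalDist


def two_sided_p_value(x: float) -> float:
    """p(x) = 2 * (1 - Phi(|x|)) for testing theta = 0 with X ~ N(theta, 1)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(x)))


def bonferroni_reject(p_values, alpha=0.05):
    """Return the indices i with p_i <= alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


# Hypothetical observations, made up for illustration only.
x = [0.3, -4.1, 2.2, 3.5, -1.0]
p = [two_sided_p_value(xi) for xi in x]
rejected = bonferroni_reject(p, alpha=0.05)
print(rejected)
```

Here the third observation ($x=2.2$, $p\approx 0.028$) would be rejected at the unadjusted level 0.05 but not at the corrected threshold $\alpha/m=0.01$.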

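If the p-values are independent, we can instead use the threshold $\tilde{\alpha}_{m}=1-(1-\alpha)^{1/m}$ (Sidak correction).
Then $\mathbb{P}_{\theta}(\text{no false rejections})=\prod_{i\in \mathcal{H}_{0}}\mathbb{P}_{\theta}(p_{i}>\tilde{\alpha}_{m})\geq (1-\tilde{\alpha}_{m})^{m_{0}}\geq 1-\alpha$.
For small $\alpha$, $1-\tilde{\alpha}_{m}=(1-\alpha)^{1/m}\approx 1-\frac{\alpha}{m}$, so Sidak doesn't improve much on Bonferroni.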
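As a quick numerical check of how close the two per-test thresholds are, the sketch below computes both for $\alpha=0.05$ and $m=20$. The function names are illustrative.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level alpha / m (valid under arbitrary dependence)."""
    return alpha / m


def sidak_threshold(alpha: float, m: int) -> float:
    """Per-test level alpha_tilde solving (1 - alpha_tilde)^m = 1 - alpha
    (valid when the p-values are independent)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


alpha, m = 0.05, 20
b = bonferroni_threshold(alpha, m)
s = sidak_threshold(alpha, m)
print(f"Bonferroni: {b:.5f}, Sidak: {s:.5f}")
# prints "Bonferroni: 0.00250, Sidak: 0.00256"
```

The Sidak threshold is always slightly larger, but the gain is in the fifth decimal place here.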

3 Testing with Dependence

Bonferroni isn't much worse than Sidak, e.g. for $\alpha=5\%$, $m=20$ the per-test thresholds are 0.0025 vs 0.00256. But when tests are highly dependent, we can often do much better.

Example (Scheffe's S-method)

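$X\sim N_{d}(\theta,I_{d})$, $\theta\in \mathbb{R}^{d}$, $H_{0\lambda}:\theta^{T}\lambda=0$ for $\lambda\in S^{d-1}$ ($m=\infty$).
Reject $H_{0\lambda}$ if $\|X^{T}\lambda\|^{2}>\chi_{d}^{2}(\alpha)\approx d+3\sqrt{d}$, where $\chi_{d}^{2}(\alpha)$ is the upper-$\alpha$ quantile. (Because $\chi_{d}^{2}\approx N(d,2d)$.) This controls FWER: $\sup_{\lambda:\theta^{T}\lambda=0}\|X^{T}\lambda\|^{2}\leq \sup_{\lambda\in S^{d-1}}\|(X-\theta)^{T}\lambda\|^{2}=\|X-\theta\|^{2}\leq \chi_{d}^{2}(\alpha)$ with probability $1-\alpha$. Can view as deduction from the confidence region $C(X)=\{\theta:\|\theta-X\|^{2}\leq \chi_{d}^{2}(\alpha)\}$.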
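A small Monte Carlo sketch can illustrate why the S-method controls FWER over infinitely many directions. It assumes $\theta=0$, so every $H_{0\lambda}$ is true, and uses the rough cutoff $d+3\sqrt{d}$ in place of an exact $\chi_{d}^{2}$ quantile. The function names, $d=10$, the seed, and the number of simulations are all illustrative choices.

```python
import math
import random


def scheffe_rejects_some_direction(x, threshold):
    """sup over unit vectors lambda of (x^T lambda)^2 equals ||x||^2
    (Cauchy-Schwarz), so some direction is rejected iff ||x||^2 > threshold."""
    return sum(xi * xi for xi in x) > threshold


def simulate_fwer(d, threshold, n_sim=20000, seed=0):
    """Monte Carlo FWER under theta = 0, where every H_{0,lambda} is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        hits += scheffe_rejects_some_direction(x, threshold)
    return hits / n_sim


d = 10
rough_threshold = d + 3 * math.sqrt(d)  # rule-of-thumb cutoff, not an exact quantile
fwer_hat = simulate_fwer(d, rough_threshold)
print(fwer_hat)
```

For $d=10$ the estimate should come out somewhat below 0.05, since $d+3\sqrt{d}\approx 19.5$ sits a little above the exact upper-5% point of $\chi_{10}^{2}$.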

4 Deduced Inference

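Given any joint $1-\alpha$ confidence region $C(X)$ for $\theta\in \Theta$, we may freely assume $\theta\in C(X)$ and "deduce" any and all implied conclusions without any FWER inflation: $\mathbb{P}_{\theta}(\text{any deduced inference is wrong})\leq \mathbb{P}_{\theta}(\theta\notin C(X))\leq \alpha$.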
Deduction is often a good paradigm for deriving simultaneous intervals.
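We say $C_{1}(X),\cdots,C_{m}(X)$ are simultaneous $1-\alpha$ confidence intervals for $g_{1}(\theta),\cdots,g_{m}(\theta)$ if $\mathbb{P}_{\theta}(g_{i}(\theta)\in C_{i}(X),\ \forall i=1,\cdots,m)\geq 1-\alpha$.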

Example (Simultaneous intervals for multivariate Gaussian)

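Assume $X\sim N_{d}(\theta,\Sigma)$, where $\Sigma$ is known and $\Sigma_{ii}=1$. Let $t_{\alpha}$ be the upper-$\alpha$ quantile of $\|X-\theta\|_{\infty}$, whose distribution does not depend on $\theta$. Then

$$C(X)=\{\theta:|\theta_{i}-X_{i}|\leq t_{\alpha},\ \forall i\}=(X_{1}\pm t_{\alpha})\times (X_{2}\pm t_{\alpha})\times \cdots \times (X_{d}\pm t_{\alpha})=C_{1}(X_{1})\times \cdots \times C_{d}(X_{d}),$$

and $\mathbb{P}_{\theta}(C_{i}(X_{i})\ni \theta_{i},\ \forall i)=\mathbb{P}_{\theta}(\theta\in C(X))=1-\alpha$.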
Note we could have instead constructed an elliptical confidence region but then the intervals would be conservative.
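The quantile $t_{\alpha}$ generally has no closed form, but it is easy to approximate by simulation. The sketch below assumes a hypothetical equicorrelated $\Sigma$ (unit diagonal, off-diagonal $\rho$), chosen only because it can be sampled with a single shared factor. The function name, $d=10$, $\rho=0.8$, and the seed are illustrative.

```python
import random
from statistics import NormalDist


def simulate_t_alpha(d, rho, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo upper-alpha quantile of max_i |X_i - theta_i| when
    X - theta ~ N_d(0, Sigma), Sigma_ii = 1, Sigma_ij = rho (i != j)."""
    rng = random.Random(seed)
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    maxima = []
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing correlation rho
        z = [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(d)]
        maxima.append(max(abs(v) for v in z))
    maxima.sort()
    return maxima[int((1.0 - alpha) * n_sim)]


d, rho, alpha = 10, 0.8, 0.05
t_alpha = simulate_t_alpha(d, rho, alpha)
z_marginal = NormalDist().inv_cdf(1 - alpha / 2)          # unadjusted half-width
z_bonferroni = NormalDist().inv_cdf(1 - alpha / (2 * d))  # Bonferroni half-width
print(z_marginal, t_alpha, z_bonferroni)
```

With strong positive correlation the simulated $t_{\alpha}$ should fall strictly between the unadjusted and Bonferroni half-widths, which is the gain from exploiting the known dependence.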

Example (Linear regression)

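$n$ observations, $d$ variables, $X\in \mathbb{R}^{n\times d}$ design matrix, $\hat{\beta}\sim N_{d}(\beta,\sigma^{2}(X^{T}X)^{-1})$. Estimate $\hat{\sigma}^{2}=\frac{\mathrm{RSS}}{n-d}\perp\!\!\!\perp \hat{\beta}$. Then $\frac{\hat{\beta}-\beta}{\hat{\sigma}}=\frac{Z}{\sqrt{V/(n-d)}}$, where $Z=\frac{\hat{\beta}-\beta}{\sigma}\sim N_{d}(0,(X^{T}X)^{-1})$ and $V=\frac{\mathrm{RSS}}{\sigma^{2}}\sim \chi_{n-d}^{2}$. Since $Z\perp\!\!\!\perp V$, the distribution of $\frac{\hat{\beta}-\beta}{\hat{\sigma}}$ is fully known.
Assume WLOG that $((X^{T}X)^{-1})_{jj}=1,\ \forall j$. Let $t_{\alpha}$ denote the upper-$\alpha$ quantile of $\left\|\frac{\hat{\beta}-\beta}{\hat{\sigma}}\right\|_{\infty}$ (compute $t_{\alpha}$ by simulation). Then $C_{j}=\hat{\beta}_{j}\pm \hat{\sigma}t_{\alpha}$ are simultaneous confidence intervals for $\beta_{j}$, $j=1,\cdots,d$: $\mathbb{P}(\beta_{j}\in C_{j},\ \forall j)=\mathbb{P}(|\hat{\beta}_{j}-\beta_{j}|\leq \hat{\sigma}t_{\alpha},\ \forall j)=1-\alpha$.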
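The simulation step for the regression case can be sketched directly from the representation $Z/\sqrt{V/(n-d)}$. As a stand-in for a real $(X^{T}X)^{-1}$, the code assumes a hypothetical equicorrelated matrix with unit diagonal; in practice one would sample $Z$ from the actual covariance. All names and numbers below are illustrative.

```python
import random


def simulate_regression_t_alpha(d, df, rho, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo upper-alpha quantile of max_j |Z_j| / sqrt(V / df), with
    Z ~ N_d(0, S), S_jj = 1, S_jk = rho, and V ~ chi^2_df independent of Z."""
    rng = random.Random(seed)
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    stats = []
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing correlation rho
        z = [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(d)]
        v = rng.gammavariate(df / 2.0, 2.0)  # chi^2_df = Gamma(df / 2, scale 2)
        stats.append(max(abs(zj) for zj in z) / (v / df) ** 0.5)
    stats.sort()
    return stats[int((1.0 - alpha) * n_sim)]


def simultaneous_intervals(beta_hat, sigma_hat, t_alpha):
    """C_j = beta_hat_j +/- sigma_hat * t_alpha (assumes unit diagonal)."""
    half = sigma_hat * t_alpha
    return [(bj - half, bj + half) for bj in beta_hat]


# Hypothetical numbers: d = 5 coefficients, n - d = 45 residual degrees of freedom.
t_alpha = simulate_regression_t_alpha(d=5, df=45, rho=0.3)
intervals = simultaneous_intervals([1.2, -0.4, 0.0, 2.5, 0.7], 1.1, t_alpha)
print(t_alpha, intervals)
```

Because $\hat{\sigma}$ is shared across coordinates, the same random denominator appears in every component, which is why the maximum is taken before dividing.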

5 False Discovery Rate (FDR)