Assume we have a model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ (as usual, $\Theta$ may be nonparametric). We have two competing hypotheses about $\theta$:
Null hypothesis: $H_0 : \theta \in \Theta_0$.
Alternative hypothesis: $H_1 : \theta \in \Theta_1$.
Here $\Theta_0, \Theta_1 \subseteq \Theta$ ($\Theta_0$ and $\Theta_1$ are disjoint).
Example
(Gaussian summary statistic, $z$ test) $X \sim N(\theta, 1)$. $X$ is what we observed, and we want to draw inference about $\theta$. Two common settings are one-sided ($H_0 : \theta \le 0$ vs. $H_1 : \theta > 0$) or two-sided ($H_0 : \theta = 0$ vs. $H_1 : \theta \ne 0$).
(Two-sample nonparametric testing) We observe $X_1, \dots, X_m \overset{\text{iid}}{\sim} P$ and $Y_1, \dots, Y_n \overset{\text{iid}}{\sim} Q$, with $P, Q$ unrestricted. We want to test the nonparametric hypothesis $H_0 : P = Q$ vs. $H_1 : P \ne Q$.
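A minimal sketch of one such nonparametric test is a permutation test: under $H_0 : P = Q$ the pooled sample is exchangeable, so we compare the observed statistic to its permutation distribution. The statistic (difference of means) and all sample sizes below are my own illustrative choices, not from the notes:

```python
import numpy as np

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sample permutation test of H0: P = Q,
    using |mean(x) - mean(y)| as the test statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    m = len(x)
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += abs(perm[:m].mean() - perm[m:].mean()) >= observed
    # add-one correction keeps the p-value valid (never exactly 0)
    return (1 + count) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=50)   # sample from P
y = rng.normal(1.0, 1.0, size=50)   # sample from Q (different mean)
p_value = permutation_test(x, y)
```

With these two clearly different distributions the permutation p-value is small, so the test rejects $H_0$.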
A hypothesis is called simple if it fully specifies the data distribution (like $H_0 : \theta = 0$ in the two-sided Gaussian example above); otherwise it is composite.
Now we want to use the data to determine whether $H_0$ or $H_1$ is true.
The Bayesian answer is simple: report the posterior probability $\Pr(\theta \in \Theta_0 \mid X)$.
But this is undesirable when prior information is scarce (as for drug companies, scientists, etc.).
The frequentist answer is to come up with a decision rule based on the data. We can either
Reject $H_0$ (conclude that $H_0$ is implausible and $H_1$ must be true), or
Accept $H_0$ (go on believing $H_0$).
Normally we set $H_0$ to be the default that the data must disconfirm. As when trying a crime, $H_0$: the suspect is innocent.
2 The Test Function
We describe a test by its critical/test function $\phi : \mathcal{X} \to [0, 1]$, interpreted as the probability of rejecting $H_0$ upon observing $X = x$:
$$\phi(x) = \begin{cases} 1, & \text{reject } H_0, \\ \gamma \in (0, 1), & \text{reject } H_0 \text{ with probability } \gamma, \\ 0, & \text{accept } H_0. \end{cases}$$
The randomized part is there to "top off" the Type I error rate when $X$ is discrete and exact level $\alpha$ cannot be attained. In practice, we will skip the random part.
Therefore we partition $\mathcal{X}$ into a rejection region $R = \{x : \phi(x) = 1\}$ and an acceptance region $A = \mathcal{X} \setminus R$.
We will define a test statistic $T(X)$ and some critical threshold $c$. So $\phi$ rejects for $T$ large, if
$$\phi(x) = 1\{T(x) \ge c\},$$
or rejects for $T$ extreme, if
$$\phi(x) = 1\{|T(x)| \ge c\}.$$
2.1 Significance Level, Power
Inevitably our decisions produce errors. There are two types of error we can make:
Type I error (False Positive): we reject $H_0$ when it is actually true.
Type II error (False Negative): we fail to reject $H_0$ when it is false.
Our goal is to make the Type II error as small as we can, while controlling the Type I error below a pre-specified value $\alpha$.
Define the power function
$$\beta(\theta) = E_\theta[\phi(X)],$$
the probability of rejecting $H_0$ when the true parameter is $\theta$. We can express the Type I/II errors explicitly through the power function:
Type I error: $\beta(\theta)$ for $\theta \in \Theta_0$;
Type II error: $1 - \beta(\theta)$ for $\theta \in \Theta_1$.
So our goal can be expressed as: maximize $\beta(\theta)$ over $\theta \in \Theta_1$, subject to $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.
We say $\phi$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$ (if strictly below $\alpha$, we say it is conservative). We commonly use $\alpha = 0.05$.
2.2 Example: $z$ test
Assume we observe $X \sim N(\theta, 1)$ and test $H_0 : \theta \le 0$ vs. $H_1 : \theta > 0$. We use the right-tailed test that rejects for $X$ large: $\phi(x) = 1\{x \ge z_\alpha\}$. Here $z_\alpha = \Phi^{-1}(1 - \alpha)$ is the upper $\alpha$ quantile of the $N(0, 1)$ distribution (top $100\alpha\%$).
If we want to test the two-sided hypothesis $H_0 : \theta = 0$ vs. $H_1 : \theta \ne 0$, we use the two-tailed test $\phi(x) = 1\{|x| \ge z_{\alpha/2}\}$.
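The power functions of both $z$ tests are available in closed form, which makes it easy to check that the Type I error at the boundary of the null equals $\alpha$. A sketch using scipy (the function names are my own):

```python
from scipy.stats import norm

alpha = 0.05
z_alpha = norm.ppf(1 - alpha)          # upper-alpha quantile of N(0, 1)
z_alpha_2 = norm.ppf(1 - alpha / 2)    # upper-alpha/2 quantile

def power_one_sided(theta):
    """beta(theta) = P_theta(X >= z_alpha) for X ~ N(theta, 1)."""
    return norm.sf(z_alpha - theta)

def power_two_sided(theta):
    """beta(theta) = P_theta(|X| >= z_{alpha/2})."""
    return norm.sf(z_alpha_2 - theta) + norm.cdf(-z_alpha_2 - theta)
```

At $\theta = 0$ both tests have rejection probability exactly $\alpha$; for the one-sided test the power increases in $\theta$, so the supremum over $H_0 : \theta \le 0$ is attained at the boundary $\theta = 0$.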
3 Optimal Testing
3.1 Likelihood Ratio Test
We start with the simplest case of two simple hypotheses:
$$H_0 : X \sim P_0 \quad \text{vs.} \quad H_1 : X \sim P_1. \tag{3.1}$$
WLOG assume $P_0, P_1$ have densities $p_0, p_1$ w.r.t. a common dominating measure $\mu$ (e.g. $\mu = P_0 + P_1$).
Define the likelihood ratio
$$L(x) = \frac{p_1(x)}{p_0(x)}.$$
LRT
The likelihood ratio test (LRT) is
$$\phi(x) = \begin{cases} 1, & L(x) > c, \\ \gamma, & L(x) = c, \\ 0, & L(x) < c. \end{cases}$$
Theorem (Neyman-Pearson Lemma)
The likelihood ratio test $\phi$ with $E_0[\phi(X)] = \alpha$ maximizes power among all level $\alpha$ tests of (3.1).
Proof
We want to show that $\phi$ solves:
$$\max_\psi \; E_1[\psi(X)] \quad \text{s.t.} \quad E_0[\psi(X)] \le \alpha, \quad 0 \le \psi \le 1.$$
The Lagrangian form is
$$\max_\psi \; E_1[\psi(X)] - \lambda E_0[\psi(X)] = \max_\psi \int \psi(x)\big(p_1(x) - \lambda p_0(x)\big)\, d\mu(x).$$
So $\psi(x)$ should be large when $p_1(x) > \lambda p_0(x)$ and small when $p_1(x) < \lambda p_0(x)$. The best values are $\psi(x) = 1\{p_1(x) > \lambda p_0(x)\}$, i.e., exactly the LRT with $c = \lambda$. So $\phi$ solves the Lagrangian form.
For another test $\psi$ with $E_0[\psi(X)] \le \alpha = E_0[\phi(X)]$, we have
$$E_1[\phi(X)] - E_1[\psi(X)] \ge \lambda\big(E_0[\phi(X)] - E_0[\psi(X)]\big) \ge 0.$$
We should set $c$ as the upper-$\alpha$ quantile of the distribution of $L(X)$ under $H_0$, i.e.
$$\Pr_0(L(X) > c) \le \alpha \le \Pr_0(L(X) \ge c).$$
So we can set
$$\gamma = \frac{\alpha - \Pr_0(L(X) > c)}{\Pr_0(L(X) = c)}.$$
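The lemma can be sanity-checked numerically in a simple Gaussian instance; the choice $P_0 = N(0,1)$, $P_1 = N(1,1)$ below is my own illustration:

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
# Under H0: X ~ N(0,1) vs H1: X ~ N(1,1), the likelihood ratio is
# L(x) = exp(x - 1/2), strictly increasing in x, so the level-alpha
# LRT rejects for x >= z_alpha (no randomization: L(X) is continuous).
z_alpha = norm.ppf(1 - alpha)
c = np.exp(z_alpha - 0.5)       # same threshold on the likelihood-ratio scale

size = norm.sf(z_alpha)         # P_0(reject), equals alpha
power = norm.sf(z_alpha - 1.0)  # P_1(reject)

# A competing level-alpha test, e.g. rejecting for x <= -z_alpha,
# has strictly smaller power, as the lemma predicts:
power_left = norm.cdf(-z_alpha - 1.0)
```

The right-tailed LRT has exact size $\alpha$ and dominates the (equally valid) left-tailed test in power.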
Example (Binomial)
Suppose we observe a binomial random variable measuring the same-side bias of some human coin flipper:
$$X \sim \mathrm{Bin}(n, \theta).$$
We want to test $H_0 : \theta = 1/2$ vs. $H_1 : \theta = \theta_1$ for some fixed $\theta_1 > 1/2$. By the NP lemma, use the likelihood ratio
$$L(x) = \frac{\theta_1^x (1 - \theta_1)^{n - x}}{(1/2)^n} = 2^n (1 - \theta_1)^n \left(\frac{\theta_1}{1 - \theta_1}\right)^x.$$
Since it is strictly increasing w.r.t. $x$ (as $\theta_1 > 1/2$), it is equivalent to reject for large $X$.
By discreteness, we can't exactly have Type I error be $\alpha$: if we take $c$ to be the upper $\alpha$ quantile of $\mathrm{Bin}(n, 1/2)$ under the null, then in general $\Pr_0(X > c) < \alpha < \Pr_0(X \ge c)$. So we randomize at the boundary by setting
$$\gamma = \frac{\alpha - \Pr_0(X > c)}{\Pr_0(X = c)}$$
and rejecting with probability $\gamma$ if $X = c$.
But in practice we seldom use randomized tests. So consider the conservative test $\phi(x) = 1\{x > c\}$, whose Type I error $\Pr_0(X > c)$ is at most $\alpha$.
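The boundary randomization can be computed explicitly; here is a sketch with illustrative values $n = 10$, $\alpha = 0.05$ (my own choices, not from the notes):

```python
from scipy.stats import binom

n, alpha = 10, 0.05
null = binom(n, 0.5)

# Smallest c with P_0(X > c) <= alpha; randomize at the boundary X = c.
c = min(k for k in range(n + 1) if null.sf(k) <= alpha)   # sf(k) = P(X > k)
gamma = (alpha - null.sf(c)) / null.pmf(c)

# Exact Type I error of the randomized test 1{X > c} + gamma * 1{X = c}:
type1_randomized = null.sf(c) + gamma * null.pmf(c)
# The conservative nonrandomized test 1{X > c}:
type1_conservative = null.sf(c)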
3.2 UMP Tests
UMP Tests
We say a test $\phi$ is a uniformly most powerful (UMP) level $\alpha$ test of $H_0 : \theta \in \Theta_0$ against $H_1 : \theta \in \Theta_1$ if
it is a valid level $\alpha$ test;
for any other valid level $\alpha$ test $\psi$, $\beta_\phi(\theta) \ge \beta_\psi(\theta)$ for all $\theta \in \Theta_1$.
MLR
The family $\{p_\theta\}$ has monotone likelihood ratio (MLR) in $T(x)$ if $p_{\theta_1}(x) / p_{\theta_0}(x)$ is a non-decreasing function of $T(x)$, for any $\theta_0 < \theta_1$.
MLR is sufficient for finding a UMP test:
Theorem
Assume $\{p_\theta\}$ has MLR in $T(x)$, and consider $H_0 : \theta \le \theta_0$ vs. $H_1 : \theta > \theta_0$, for some $\theta_0$.
If $\phi$ rejects for large $T(X)$ with $E_{\theta_0}[\phi(X)] = \alpha$, then $\phi$ is UMP at level $\alpha$.
Proof
Consider any other level $\alpha$ test $\psi$, and fix any $\theta_1 > \theta_0$. We know $\psi$ is also a valid level $\alpha$ test for the simple problem
$$H_0 : \theta = \theta_0 \quad \text{vs.} \quad H_1 : \theta = \theta_1, \tag{*}$$
and so is $\phi$. Since $p_{\theta_1}(x) / p_{\theta_0}(x)$ is non-decreasing w.r.t. $T(x)$, $\phi$ is a likelihood ratio test for this problem, so $\beta_\phi(\theta_1) \ge \beta_\psi(\theta_1)$.
Compared to the trivial test $\psi \equiv \alpha$, we have $\beta_\phi(\theta) \ge \alpha$ for $\theta > \theta_0$.
It remains to show that $\phi$ is a valid level $\alpha$ test, i.e. $\beta_\phi(\theta) \le \alpha$ for $\theta \le \theta_0$. Define $\tilde\phi = 1 - \phi$, which rejects for small $T$. Note that $\tilde\phi$ is a level $(1 - \alpha)$ LRT of $H_0 : \theta = \theta_0$ vs. $H_1 : \theta = \theta'$, for any $\theta' < \theta_0$. As a result, comparing $\tilde\phi$ with the trivial test of the same level, we can conclude that for $\theta' < \theta_0$,
$$E_{\theta'}[1 - \phi(X)] \ge 1 - \alpha, \quad \text{i.e.} \quad \beta_\phi(\theta') \le \alpha.$$
Example (One-parameter Exponential Family)
Consider $p_\theta(x) = h(x) \exp\{\eta(\theta) T(x) - A(\theta)\}$. Then for $\theta_1 > \theta_0$,
$$\frac{p_{\theta_1}(x)}{p_{\theta_0}(x)} = \exp\big\{(\eta(\theta_1) - \eta(\theta_0))\, T(x) - (A(\theta_1) - A(\theta_0))\big\},$$
which is increasing in $T(x)$ if $\eta(\cdot)$ is non-decreasing. As a result, any LRT will reject for large values of $T(X)$.
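For instance (my own illustrative instance), the Binomial family is a one-parameter exponential family with $T(x) = x$ and $\eta(\theta) = \log\frac{\theta}{1-\theta}$, so it has MLR in $x$ and the right-tailed test is UMP; a quick numerical check:

```python
import numpy as np
from scipy.stats import binom

n, theta0, theta1 = 20, 0.3, 0.6   # theta1 > theta0, values are my own
xs = np.arange(n + 1)
# Likelihood ratio p_{theta1}(x) / p_{theta0}(x), strictly increasing in x
ratio = binom.pmf(xs, n, theta1) / binom.pmf(xs, n, theta0)

def power(theta, c=10):
    """Power of the right-tailed test 1{X >= c}."""
    return binom.sf(c - 1, n, theta)
# Under MLR the power function is nondecreasing in theta, so the
# Type I error over H0: theta <= theta0 is maximized at theta = theta0.
```

Monotonicity of both the likelihood ratio and the power function is exactly what the theorem above uses.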