Assume we have a model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ (as usual, $\Theta$ may be nonparametric). We have two competing hypotheses about $\theta$:
Null hypothesis: $H_0 : \theta \in \Theta_0$.
Alternative hypothesis: $H_1 : \theta \in \Theta_1$.
Here $\Theta_0, \Theta_1 \subseteq \Theta$ ($\Theta_0$ and $\Theta_1$ are disjoint).
Example
(Gaussian summary statistic, $z$ test) $X \sim N(\theta, 1)$. $X$ is what we observed, and we want to draw inference about $\theta$. Two common settings are one-sided ($H_0 : \theta \le 0$ vs. $H_1 : \theta > 0$) or two-sided ($H_0 : \theta = 0$ vs. $H_1 : \theta \ne 0$).
(Two-sample nonparametric testing) We observe $X_1, \dots, X_m \overset{\text{iid}}{\sim} P$ and $Y_1, \dots, Y_n \overset{\text{iid}}{\sim} Q$, with $P, Q$ unrestricted. We want to test the nonparametric hypothesis $H_0 : P = Q$ vs. $H_1 : P \ne Q$.
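A minimal sketch of one such nonparametric test is a permutation test: under $H_0 : P = Q$ the pooled sample is exchangeable, so we compare the observed statistic to its permutation distribution. The statistic (difference of means) and all sample sizes below are my own illustrative choices, not from the notes:

```python
import numpy as np

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sample permutation test of H0: P = Q,
    using |mean(x) - mean(y)| as the test statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    m = len(x)
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += abs(perm[:m].mean() - perm[m:].mean()) >= observed
    # add-one correction keeps the p-value valid (never exactly 0)
    return (1 + count) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=50)   # sample from P
y = rng.normal(1.0, 1.0, size=50)   # sample from Q (different mean)
p_value = permutation_test(x, y)
```

With these two clearly different distributions the permutation p-value is small, so the test rejects $H_0$.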
A hypothesis is called simple if it fully specifies the data distribution (like $H_0 : \theta = 0$ in the two-sided Gaussian example above); otherwise it is composite.
Now we want to use the data to determine whether $H_0$ or $H_1$ is true.
The Bayesian answer is simple: report the posterior probability $\Pr(\theta \in \Theta_0 \mid X)$.
But this is undesirable when prior information is scarce (as for drug companies, scientists, etc.).
The frequentist answer is to come up with a decision rule based on the data. We can either
Reject $H_0$ (conclude that $H_0$ is implausible and $H_1$ must be true), or
Accept $H_0$ (go on believing $H_0$).
Normally we set $H_0$ to be the default that the data must disconfirm. As when trying a crime, $H_0$: the suspect is innocent.
2 The Test Function
We describe a test by its critical/test function $\phi : \mathcal{X} \to [0, 1]$, interpreted as the probability of rejecting $H_0$ upon observing $X = x$:
$$\phi(x) = \begin{cases} 1, & \text{reject } H_0, \\ \gamma \in (0, 1), & \text{reject } H_0 \text{ with probability } \gamma, \\ 0, & \text{accept } H_0. \end{cases}$$
The randomized part is there to "top off" the Type I error rate when $X$ is discrete and exact level $\alpha$ cannot be attained. In practice, we will skip the random part.
Therefore we partition $\mathcal{X}$ into a rejection region $R = \{x : \phi(x) = 1\}$ and an acceptance region $A = \mathcal{X} \setminus R$.
We will define a test statistic $T(X)$ and some critical threshold $c$. So $\phi$ rejects for $T$ large, if
$$\phi(x) = 1\{T(x) \ge c\},$$
or rejects for $T$ extreme, if
$$\phi(x) = 1\{|T(x)| \ge c\}.$$
2.1 Significance Level, Power
Inevitably our decisions produce errors. There are two types of error we can make:
Type I error (False Positive): we reject $H_0$ when it is actually true.
Type II error (False Negative): we fail to reject $H_0$ when it is false.
Our goal is to make the Type II error as small as we can, while controlling the Type I error below a pre-specified value $\alpha$.
Define the power function
$$\beta(\theta) = E_\theta[\phi(X)],$$
the probability of rejecting $H_0$ when the true parameter is $\theta$. We can express the Type I/II errors explicitly through the power function:
Type I error: $\beta(\theta)$ for $\theta \in \Theta_0$;
Type II error: $1 - \beta(\theta)$ for $\theta \in \Theta_1$.
So our goal can be expressed as: maximize $\beta(\theta)$ over $\theta \in \Theta_1$, subject to $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.
We say $\phi$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$ (if strictly below $\alpha$, we say it is conservative). We commonly use $\alpha = 0.05$.
2.2 Example: $z$ test
Assume we observe $X \sim N(\theta, 1)$ and test $H_0 : \theta \le 0$ vs. $H_1 : \theta > 0$. We use the right-tailed test that rejects for $X$ large: $\phi(x) = 1\{x \ge z_\alpha\}$. Here $z_\alpha = \Phi^{-1}(1 - \alpha)$ is the upper $\alpha$ quantile of the $N(0, 1)$ distribution (top $100\alpha\%$).
If we want to test the two-sided hypothesis $H_0 : \theta = 0$ vs. $H_1 : \theta \ne 0$, we use the two-tailed test $\phi(x) = 1\{|x| \ge z_{\alpha/2}\}$.
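The power functions of both $z$ tests are available in closed form, which makes it easy to check that the Type I error at the boundary of the null equals $\alpha$. A sketch using scipy (the function names are my own):

```python
from scipy.stats import norm

alpha = 0.05
z_alpha = norm.ppf(1 - alpha)          # upper-alpha quantile of N(0, 1)
z_alpha_2 = norm.ppf(1 - alpha / 2)    # upper-alpha/2 quantile

def power_one_sided(theta):
    """beta(theta) = P_theta(X >= z_alpha) for X ~ N(theta, 1)."""
    return norm.sf(z_alpha - theta)

def power_two_sided(theta):
    """beta(theta) = P_theta(|X| >= z_{alpha/2})."""
    return norm.sf(z_alpha_2 - theta) + norm.cdf(-z_alpha_2 - theta)
```

At $\theta = 0$ both tests have rejection probability exactly $\alpha$; for the one-sided test the power increases in $\theta$, so the supremum over $H_0 : \theta \le 0$ is attained at the boundary $\theta = 0$.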
3 Optimal Testing
3.1 Likelihood Ratio Test
We start with the simplest case of two simple hypotheses:
$$H_0 : X \sim P_0 \quad \text{vs.} \quad H_1 : X \sim P_1. \tag{3.1}$$
WLOG assume $P_0, P_1$ have densities $p_0, p_1$ w.r.t. a common dominating measure $\mu$ (e.g. $\mu = P_0 + P_1$).
Define the likelihood ratio
$$L(x) = \frac{p_1(x)}{p_0(x)}.$$
LRT
The likelihood ratio test (LRT) is
$$\phi(x) = \begin{cases} 1, & L(x) > c, \\ \gamma, & L(x) = c, \\ 0, & L(x) < c. \end{cases}$$
Theorem (Neyman-Pearson Lemma)
The likelihood ratio test $\phi$ with $E_0[\phi(X)] = \alpha$ maximizes power among all level $\alpha$ tests of (3.1).
Proof
We want to show that $\phi$ solves:
$$\max_\psi \; E_1[\psi(X)] \quad \text{s.t.} \quad E_0[\psi(X)] \le \alpha, \quad 0 \le \psi \le 1.$$
The Lagrangian form is
$$\max_\psi \; E_1[\psi(X)] - \lambda E_0[\psi(X)] = \max_\psi \int \psi(x)\big(p_1(x) - \lambda p_0(x)\big)\, d\mu(x).$$
So $\psi(x)$ should be large when $p_1(x) > \lambda p_0(x)$ and small when $p_1(x) < \lambda p_0(x)$. The best values are $\psi(x) = 1\{p_1(x) > \lambda p_0(x)\}$, i.e., exactly the LRT with $c = \lambda$. So $\phi$ solves the Lagrangian form.
For another test $\psi$ with $E_0[\psi(X)] \le \alpha = E_0[\phi(X)]$, we have
$$E_1[\phi(X)] - E_1[\psi(X)] \ge \lambda\big(E_0[\phi(X)] - E_0[\psi(X)]\big) \ge 0.$$
We should set $c$ as the upper-$\alpha$ quantile of the distribution of $L(X)$ under $H_0$, i.e.
$$\Pr_0(L(X) > c) \le \alpha \le \Pr_0(L(X) \ge c).$$
So we can set
$$\gamma = \frac{\alpha - \Pr_0(L(X) > c)}{\Pr_0(L(X) = c)}.$$
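The lemma can be sanity-checked numerically in a simple Gaussian instance; the choice $P_0 = N(0,1)$, $P_1 = N(1,1)$ below is my own illustration:

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
# Under H0: X ~ N(0,1) vs H1: X ~ N(1,1), the likelihood ratio is
# L(x) = exp(x - 1/2), strictly increasing in x, so the level-alpha
# LRT rejects for x >= z_alpha (no randomization: L(X) is continuous).
z_alpha = norm.ppf(1 - alpha)
c = np.exp(z_alpha - 0.5)       # same threshold on the likelihood-ratio scale

size = norm.sf(z_alpha)         # P_0(reject), equals alpha
power = norm.sf(z_alpha - 1.0)  # P_1(reject)

# A competing level-alpha test, e.g. rejecting for x <= -z_alpha,
# has strictly smaller power, as the lemma predicts:
power_left = norm.cdf(-z_alpha - 1.0)
```

The right-tailed LRT has exact size $\alpha$ and dominates the (equally valid) left-tailed test in power.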
Example (Binomial)
Suppose we observe a binomial random variable measuring the same-side bias of some human coin flipper:
$$X \sim \mathrm{Bin}(n, \theta).$$
We want to test $H_0 : \theta = 1/2$ vs. $H_1 : \theta = \theta_1$ for some fixed $\theta_1 > 1/2$. By the NP lemma, use the likelihood ratio
$$L(x) = \frac{\theta_1^x (1 - \theta_1)^{n - x}}{(1/2)^n} = 2^n (1 - \theta_1)^n \left(\frac{\theta_1}{1 - \theta_1}\right)^x.$$
Since it is strictly increasing w.r.t. $x$ (as $\theta_1 > 1/2$), it is equivalent to reject for large $X$.
By discreteness, we can't exactly have Type I error be $\alpha$: if we take $c$ to be the upper $\alpha$ quantile of $\mathrm{Bin}(n, 1/2)$ under the null, then in general $\Pr_0(X > c) < \alpha < \Pr_0(X \ge c)$. So we randomize at the boundary by setting
$$\gamma = \frac{\alpha - \Pr_0(X > c)}{\Pr_0(X = c)}$$
and rejecting with probability $\gamma$ if $X = c$.
But in practice we seldom use randomized tests. So consider the conservative test $\phi(x) = 1\{x > c\}$, whose Type I error $\Pr_0(X > c)$ is at most $\alpha$.
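The boundary randomization can be computed explicitly; here is a sketch with illustrative values $n = 10$, $\alpha = 0.05$ (my own choices, not from the notes):

```python
from scipy.stats import binom

n, alpha = 10, 0.05
null = binom(n, 0.5)

# Smallest c with P_0(X > c) <= alpha; randomize at the boundary X = c.
c = min(k for k in range(n + 1) if null.sf(k) <= alpha)   # sf(k) = P(X > k)
gamma = (alpha - null.sf(c)) / null.pmf(c)

# Exact Type I error of the randomized test 1{X > c} + gamma * 1{X = c}:
type1_randomized = null.sf(c) + gamma * null.pmf(c)
# The conservative nonrandomized test 1{X > c}:
type1_conservative = null.sf(c)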
3.2 UMP Tests
UMP Tests
We say a test $\phi$ is a uniformly most powerful (UMP) level $\alpha$ test of $H_0 : \theta \in \Theta_0$ against $H_1 : \theta \in \Theta_1$ if
it is a valid level $\alpha$ test;
for any other valid level $\alpha$ test $\psi$, $\beta_\phi(\theta) \ge \beta_\psi(\theta)$ for all $\theta \in \Theta_1$.
MLR
The family $\{p_\theta\}$ has monotone likelihood ratio (MLR) in $T(x)$ if $p_{\theta_1}(x) / p_{\theta_0}(x)$ is a non-decreasing function of $T(x)$, for any $\theta_0 < \theta_1$.
MLR is sufficient for finding a UMP test:
Theorem
Assume $\{p_\theta\}$ has MLR in $T(x)$, and consider $H_0 : \theta \le \theta_0$ vs. $H_1 : \theta > \theta_0$, for some $\theta_0$.
If $\phi$ rejects for large $T(X)$ with $E_{\theta_0}[\phi(X)] = \alpha$, then $\phi$ is UMP at level $\alpha$.
Proof
Consider any other level $\alpha$ test $\psi$, and fix any $\theta_1 > \theta_0$. We know $\psi$ is also a valid level $\alpha$ test for the simple problem
$$H_0 : \theta = \theta_0 \quad \text{vs.} \quad H_1 : \theta = \theta_1, \tag{*}$$
and so is $\phi$. Since $p_{\theta_1}(x) / p_{\theta_0}(x)$ is non-decreasing w.r.t. $T(x)$, $\phi$ is a likelihood ratio test for this problem, so $\beta_\phi(\theta_1) \ge \beta_\psi(\theta_1)$.
Compared to the trivial test $\psi \equiv \alpha$, we have $\beta_\phi(\theta) \ge \alpha$ for $\theta > \theta_0$.
It remains to show that $\phi$ is a valid level $\alpha$ test, i.e. $\beta_\phi(\theta) \le \alpha$ for $\theta \le \theta_0$. Define $\tilde\phi = 1 - \phi$, which rejects for small $T$. Note that $\tilde\phi$ is a level $(1 - \alpha)$ LRT of $H_0 : \theta = \theta_0$ vs. $H_1 : \theta = \theta'$, for any $\theta' < \theta_0$. As a result, comparing $\tilde\phi$ with the trivial test of the same level, we can conclude that for $\theta' < \theta_0$,
$$E_{\theta'}[1 - \phi(X)] \ge 1 - \alpha, \quad \text{i.e.} \quad \beta_\phi(\theta') \le \alpha.$$
Example (One-parameter Exponential Family)
Consider $p_\theta(x) = h(x) \exp\{\eta(\theta) T(x) - A(\theta)\}$. Then for $\theta_1 > \theta_0$,
$$\frac{p_{\theta_1}(x)}{p_{\theta_0}(x)} = \exp\big\{(\eta(\theta_1) - \eta(\theta_0))\, T(x) - (A(\theta_1) - A(\theta_0))\big\},$$
which is increasing in $T(x)$ if $\eta(\cdot)$ is non-decreasing. As a result, any LRT will reject for large values of $T(X)$.
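For instance (my own illustrative instance), the Binomial family is a one-parameter exponential family with $T(x) = x$ and $\eta(\theta) = \log\frac{\theta}{1-\theta}$, so it has MLR in $x$ and the right-tailed test is UMP; a quick numerical check:

```python
import numpy as np
from scipy.stats import binom

n, theta0, theta1 = 20, 0.3, 0.6   # theta1 > theta0, values are my own
xs = np.arange(n + 1)
# Likelihood ratio p_{theta1}(x) / p_{theta0}(x), strictly increasing in x
ratio = binom.pmf(xs, n, theta1) / binom.pmf(xs, n, theta0)

def power(theta, c=10):
    """Power of the right-tailed test 1{X >= c}."""
    return binom.sf(c - 1, n, theta)
# Under MLR the power function is nondecreasing in theta, so the
# Type I error over H0: theta <= theta0 is maximized at theta = theta0.
```

Monotonicity of both the likelihood ratio and the power function is exactly what the theorem above uses.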