5.4 Bootstrap

1.1 Nonparametric Estimation

Setting: $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} P$, where $P$ is an arbitrary distribution. We want to do inference on some "parameter" (functional) $\theta(P)$, e.g.

$\theta(P) = \operatorname{median}_P\left[\min_{1 \le i \ne j \le n} |X_i - X_j|\right]$, or $\theta(P) = \mathbb{E}\left[\max_{1 \le i \le n} X_i\right]$.

For example, say we must determine a dam height that has a 99% chance of exceeding the largest storm that year.

Recall that the empirical distribution of $X_1, \dots, X_n$ is $\hat{P}_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i}$, i.e. $\hat{P}_n(A) = \frac{\#\{i : X_i \in A\}}{n}$. The plug-in estimator of $\theta(P)$ is $\hat{\theta} = \theta(\hat{P}_n)$.

Does the plug-in estimator work? It depends.
Does $\hat{P}_n \overset{p}{\to} P$? It depends on the sense of convergence.
Does $\hat{P}_n(A) \overset{p}{\to} P(A)$ for each fixed $A$? Yes, by the law of large numbers.
By Glivenko–Cantelli, $\sup_x |\hat{P}_n((-\infty, x]) - P((-\infty, x])| \overset{p}{\to} 0$ for $X \in \mathbb{R}$. But

$\hat{P}_n$ is a nonparametric estimator of $P$, namely the empirical distribution.
The plug-in estimator $\hat{\theta} = \theta(\hat{P}_n)$ is called the bootstrap estimator.
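A minimal sketch of the plug-in principle; the functional ($\theta(P) = \operatorname{median}$ of $P$), the data-generating distribution, and all names below are illustrative assumptions, not from the notes:

```python
import random
import statistics

# Plug-in principle sketch. theta(P) = median of P is an illustrative choice;
# the plug-in estimate theta(P_hat_n) is then just the sample median, since
# P_hat_n puts mass 1/n on each observation.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]

def plug_in(theta, sample):
    # Apply the functional theta(.) to the empirical distribution P_hat_n,
    # represented simply by the list of observations.
    return theta(sample)

theta_hat = plug_in(statistics.median, x)   # estimates the median of N(0, 1), i.e. 0
```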

1.2 Bootstrap CI

Define the root $R_n(X, P) = \hat{\theta}_n(X) - \theta(P)$ and its distribution function $G_{n,P}(r) = \mathbb{P}_P\left(\hat{\theta}(X) - \theta(P) \le r\right)$. Let $r_1, r_2$ be the $\alpha/2$ and $1 - \alpha/2$ quantiles of $G_{n,P}$. Then $C(X) = [\hat{\theta}(X) - r_2,\ \hat{\theta}(X) - r_1]$ has coverage $1 - \alpha$.
Since $G_{n,P}$ is unknown, we can take the plug-in quantiles $\hat{r}_1 = G_{n,\hat{P}_n}^{-1}\left(\frac{\alpha}{2}\right)$ and $\hat{r}_2 = G_{n,\hat{P}_n}^{-1}\left(1 - \frac{\alpha}{2}\right)$.
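The shape of the interval (upper quantile on the left endpoint, lower on the right) follows by inverting the quantile statement for the root:

```latex
\mathbb{P}_P\!\left(r_1 \le \hat{\theta}(X) - \theta(P) \le r_2\right) = 1 - \alpha
\iff
\mathbb{P}_P\!\left(\hat{\theta}(X) - r_2 \le \theta(P) \le \hat{\theta}(X) - r_1\right) = 1 - \alpha .
```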

2 Double Bootstrap

Might have theory telling us, e.g., $\sup_{a<b} \left|G_{n,\hat{P}_n}([a,b]) - G_{n,P}([a,b])\right| \overset{p}{\to} 0$.
Let $\gamma_{n,P}(\alpha) = \mathbb{P}_P\left(C_{n,\alpha}(X) \ni \theta(P)\right) \approx 1 - \alpha$ denote the true coverage of the nominal $1-\alpha$ interval. In finite samples we might have $\gamma_{n,P}(\alpha) < 1 - \alpha$ (or more).
If, say, $\gamma_{n,P}(0.1) = 0.87 < 0.9$, use the double bootstrap:

  1. Estimate $\gamma_{n,P}(\cdot)$ with the plug-in $\gamma_{n,\hat{P}_n}(\cdot)$.
  2. Use $C_{n,\hat{\alpha}}(X)$, where $\hat{\alpha}$ solves $\hat{\gamma}(\hat{\alpha}) = 1 - \alpha$, i.e. $\hat{\alpha} = \hat{\gamma}^{-1}(1 - \alpha)$ with $\hat{\gamma}(\cdot) := \gamma_{n,\hat{P}_n}(\cdot)$.
Bootstrap CI Algorithm (MC version)

We want to estimate $r_{1,P} = G_{n,P}^{-1}\left(\frac{\alpha}{2}\right)$, i.e. the value of $r$ such that $\mathbb{P}_P\left(\hat{\theta}(X) - \theta(P) \le r\right) = \frac{\alpha}{2}$.
So we compute the plug-in estimator $\hat{r}_1 = \hat{r}_{1,\hat{P}_n} = G_{n,\hat{P}_n}^{-1}\left(\frac{\alpha}{2}\right)$, where $G_{n,\hat{P}_n}$ is the distribution of $\hat{\theta}_n(X^*) - \theta(\hat{P}_n)$ for $X^* = (X_1^*, \dots, X_n^*) \overset{\text{i.i.d.}}{\sim} \hat{P}_n$.

For $b = 1, \dots, B$:

  • Draw $X_1^{*b}, \dots, X_n^{*b} \overset{\text{i.i.d.}}{\sim} \hat{P}_n$, i.e. resample $n$ points with replacement from the data.
  • Compute $R_n^{*b} = \hat{\theta}(X^{*b}) - \theta(\hat{P}_n)$.

Then set $\hat{G}_{n,\hat{P}_n}(r) = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\{R_n^{*b} \le r\}$,
$\hat{r}_1 = \hat{G}_{n,\hat{P}_n}^{-1}\left(\frac{\alpha}{2}\right)$, $\hat{r}_2 = \hat{G}_{n,\hat{P}_n}^{-1}\left(1 - \frac{\alpha}{2}\right)$,
$C(X) = [\hat{\theta}(X) - \hat{r}_2,\ \hat{\theta}(X) - \hat{r}_1]$.
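The Monte Carlo recipe above, specialized (as illustrative assumptions, not from the notes) to $\theta(P) = \operatorname{median}$, standard normal data, $\alpha = 0.10$, and $B = 2000$, might look like:

```python
import random
import statistics

# Sketch of the MC bootstrap CI algorithm. Assumptions (not from the notes):
# theta = median, alpha = 0.10, standard normal data, B = 2000 resamples.
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
alpha, B = 0.10, 2000

theta = statistics.median      # the functional theta(.)
theta_hat = theta(x)           # theta(P_hat_n) = plug-in estimate

# R*_b = theta(X*b) - theta(P_hat_n), with X*b drawn i.i.d. from P_hat_n,
# i.e. a resample of size n with replacement from the data.
roots = sorted(theta(random.choices(x, k=len(x))) - theta_hat
               for _ in range(B))

# Empirical alpha/2 and 1 - alpha/2 quantiles of G_hat_{n, P_hat_n}.
r1 = roots[int(alpha / 2 * B)]
r2 = roots[int((1 - alpha / 2) * B) - 1]

# C(X) = [theta_hat - r2, theta_hat - r1]
lo, hi = theta_hat - r2, theta_hat - r1
```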

2.1 Double Bootstrap CI


We want to estimate $\gamma_{n,P}(\alpha) = \mathbb{P}_P\left(C_{n,\alpha}(X) \ni \theta(P)\right)$. So we compute the plug-in estimator $\hat{\gamma}_{n,\hat{P}_n}(\alpha) = \mathbb{P}_{\hat{P}_n}\left(C_{n,\alpha}(X^*) \ni \theta(\hat{P}_n)\right)$.

For $b = 1, \dots, B$:

  • Draw $X_1^{*b}, \dots, X_n^{*b} \overset{\text{i.i.d.}}{\sim} \hat{P}_n$.
  • Compute $C_{n,\alpha}^{*b} = C_{n,\alpha}(X^{*b})$ (each of these requires its own inner bootstrap).

$\hat{\gamma}(\alpha) = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\{C_{n,\alpha}^{*b} \ni \theta(\hat{P}_n)\}$.
$\hat{\alpha} = \hat{\gamma}^{-1}(1 - \alpha)$.
$C(X) = C_{n,\hat{\alpha}}(X)$.
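The nested procedure above can be sketched as follows. Everything concrete here is an illustrative assumption: $\theta = \operatorname{median}$, Exp(1) data, target coverage $1 - \alpha = 0.90$, a small grid for inverting $\hat{\gamma}$, and modest $B$'s to keep the inner loop cheap:

```python
import random
import statistics

# Double-bootstrap calibration sketch. Assumptions (not from the notes):
# theta = median, Exp(1) data, target 1 - alpha = 0.90, grid inversion.
random.seed(2)
x = [random.expovariate(1.0) for _ in range(100)]
theta = statistics.median
B_outer, B_inner = 100, 100

def basic_ci(sample, alpha, B):
    # Single-level bootstrap CI [theta_hat - r2, theta_hat - r1], as in the
    # MC algorithm of the previous section.
    th = theta(sample)
    roots = sorted(theta(random.choices(sample, k=len(sample))) - th
                   for _ in range(B))
    r1 = roots[int(alpha / 2 * B)]
    r2 = roots[int((1 - alpha / 2) * B) - 1]
    return th - r2, th - r1

def gamma_hat(alpha):
    # Plug-in coverage: fraction of outer resamples X*b whose inner-bootstrap
    # CI covers theta(P_hat_n).
    th_n = theta(x)
    hits = 0
    for _ in range(B_outer):
        xb = random.choices(x, k=len(x))
        lo, hi = basic_ci(xb, alpha, B_inner)
        hits += (lo <= th_n <= hi)
    return hits / B_outer

# Crude inversion of gamma_hat: pick alpha_hat on a grid so that
# gamma_hat(alpha_hat) is closest to the target 1 - alpha = 0.90.
grid = [0.02, 0.05, 0.10, 0.20]
alpha_hat = min(grid, key=lambda a: abs(gamma_hat(a) - 0.90))
lo, hi = basic_ci(x, alpha_hat, 1000)   # calibrated interval C_{n, alpha_hat}(X)
```

A grid search stands in for the exact inversion $\hat{\alpha} = \hat{\gamma}^{-1}(1 - \alpha)$, since $\hat{\gamma}$ is itself a Monte Carlo estimate and only needs to be inverted approximately.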