Consider a model $\{P_\theta : \theta \in \Theta\}$ for data $X$. Loss $L(\theta, d)$, risk $R(\theta, \delta) = E_\theta\, L(\theta, \delta(X))$.
The Bayes risk is the average-case risk, integrated w.r.t. some measure $\Lambda$ on $\Theta$, called the prior:
$$r(\Lambda, \delta) = \int_\Theta R(\theta, \delta)\, d\Lambda(\theta).$$
For now, assume $\Lambda(\Theta) = 1$ (a probability measure). Later we will allow it to be improper ($\Lambda(\Theta) = \infty$).
$\Lambda$ and $c\Lambda$ for $c > 0$ are functionally equivalent: they order estimators identically.
Average risk makes sense even if we don't "believe" $\Lambda$.
If we assume $\theta \sim \Lambda$, then $r(\Lambda, \delta)$ is the mean w.r.t. the joint distribution of $(\theta, X)$: $r(\Lambda, \delta) = E\, L(\theta, \delta(X))$.
An estimator minimizing $r(\Lambda, \delta)$ is called a Bayes estimator. It depends on $\Lambda$ (by the tower property again, $r(\Lambda, \delta) = E\big[E[L(\theta, \delta(X)) \mid X]\big]$).
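As a sanity check on this joint-expectation view, here is a minimal Monte Carlo sketch; the normal-normal setup and the two candidate estimators are illustrative assumptions, not part of the notes. It estimates $r(\Lambda, \delta)$ by drawing $\theta \sim \Lambda$, then $X \mid \theta$, and averaging the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayes_risk(delta, n_sims=200_000):
    """Monte Carlo estimate of r(Lambda, delta) = E L(theta, delta(X)),
    the mean under the joint law: theta ~ Lambda, then X | theta ~ P_theta.
    Illustrative setup: theta ~ N(0, 1), X | theta ~ N(theta, 1),
    squared error loss L(theta, d) = (d - theta)^2."""
    theta = rng.standard_normal(n_sims)       # theta ~ Lambda
    x = theta + rng.standard_normal(n_sims)   # X | theta ~ N(theta, 1)
    return np.mean((delta(x) - theta) ** 2)   # average loss over the joint

print(bayes_risk(lambda x: x / 2))  # posterior mean here (see Section 2): ~0.5
print(bayes_risk(lambda x: x))      # the MLE does worse on average:       ~1.0
```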
1.2 Prior and Posterior
The usual interpretation of $\Lambda$ is prior belief about $\theta$ before seeing the data.
The conditional distribution of $\theta \mid X = x$ is called the posterior distribution: "belief after seeing the data".
Now we can explicitly define the densities:
Definition
Prior: $\lambda(\theta)$, the density of $\Lambda$ w.r.t. a dominating measure $\nu$ on $\Theta$.
Likelihood: $p(x \mid \theta)$, the density of $P_\theta$ w.r.t. a dominating measure $\mu$.
Joint density: $p(x \mid \theta)\,\lambda(\theta)$.
Marginal density: $q(x) = \int_\Theta p(x \mid \theta)\,\lambda(\theta)\, d\nu(\theta)$.
Posterior density: $\lambda(\theta \mid x) = \dfrac{p(x \mid \theta)\,\lambda(\theta)}{q(x)}$.
The Bayes estimator depends on $\Lambda$ only through the posterior:
$$r(\Lambda, \delta) = E\big[E[L(\theta, \delta(X)) \mid X]\big] = \int E[L(\theta, \delta(x)) \mid X = x]\, q(x)\, d\mu(x).$$
So the Bayes estimator can be found one $x$ at a time: for each $x$, choose $\delta(x)$ to minimize the posterior expected loss $E[L(\theta, d) \mid X = x]$ over $d$.
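To see the pointwise recipe in action, here is a small sketch; the setup is again an illustrative assumption. For one fixed $x$ it minimizes the Monte Carlo posterior expected loss over a grid of $d$ values, recovering $x/2$, which Section 2 will identify as the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# One fixed x at a time: minimize d -> E[L(theta, d) | X = x] numerically.
# Illustrative assumption: theta ~ N(0,1), X | theta ~ N(theta,1), so the
# posterior is theta | X = x ~ N(x/2, 1/2); loss is squared error.
x = 1.7
theta_post = x / 2 + np.sqrt(0.5) * rng.standard_normal(200_000)  # posterior draws

grid = np.linspace(-5, 5, 4_001)
post_loss = [np.mean((d - theta_post) ** 2) for d in grid]
print(grid[int(np.argmin(post_loss))])  # ~0.85 = x/2
```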
Theorem
Suppose $\theta \sim \Lambda$, $X \mid \theta \sim P_\theta$, and $r(\Lambda, \delta_0) < \infty$ for some estimator $\delta_0$. Then $\delta$ is Bayes with respect to $\Lambda$ iff
$$\delta(x) \in \operatorname*{arg\,min}_d\; E[L(\theta, d) \mid X = x] \quad \text{for a.e. } x.$$
Proof
"$\Leftarrow$": let $\tilde\delta$ be any other estimator. Then
$$r(\Lambda, \delta) = E\big[E[L(\theta, \delta(X)) \mid X]\big] \le E\big[E[L(\theta, \tilde\delta(X)) \mid X]\big] = r(\Lambda, \tilde\delta).$$
"$\Rightarrow$": Define $B = \big\{x : E[L(\theta, \delta(x)) \mid X = x] > \inf_d E[L(\theta, d) \mid X = x]\big\}$, and suppose $B$ has positive marginal measure. Let $\delta^*$ agree with $\delta$ off $B$ and improve the posterior expected loss on $B$.
Then $E[L(\theta, \delta^*(X)) \mid X] \le E[L(\theta, \delta(X)) \mid X]$, with the inequality strict on a set of positive measure, so $r(\Lambda, \delta^*) < r(\Lambda, \delta)$, contradicting that $\delta$ is Bayes.
2 Posterior Mean
2.1 Squared Error Loss
If $L(\theta, d) = (d - g(\theta))^2$, then the Bayes estimator is the posterior mean: expanding the square,
$$E\big[(d - g(\theta))^2 \mid X = x\big] = \big(d - E[g(\theta) \mid X = x]\big)^2 + \operatorname{Var}\big(g(\theta) \mid X = x\big),$$
which is minimized at $d = E[g(\theta) \mid X = x]$, so $\delta_\Lambda(x) = E[g(\theta) \mid X = x]$.
2.2 Weighted Squared Error
If $L(\theta, d) = w(\theta)\,(d - g(\theta))^2$ for some weight $w(\theta) \ge 0$ (e.g. a relative squared error such as $(d - \theta)^2 / \theta^2$), then
$$E[L(\theta, d) \mid X = x] = d^2\, E[w(\theta) \mid X = x] - 2d\, E[w(\theta) g(\theta) \mid X = x] + E[w(\theta) g(\theta)^2 \mid X = x],$$
which is minimized at
$$\delta_\Lambda(x) = \frac{E[w(\theta)\, g(\theta) \mid X = x]}{E[w(\theta) \mid X = x]}.$$
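A quick numerical check of this ratio formula, under an assumed posterior (Gamma draws, $w(\theta) = \theta^{-2}$, $g(\theta) = \theta$, all chosen for illustration): brute-force grid minimization of the posterior expected weighted loss should land on $E[wg \mid x]/E[w \mid x]$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Check delta(x) = E[w g | x] / E[w | x] against brute-force minimization.
# Illustrative assumption: posterior draws theta | X = x ~ Gamma(shape=3, scale=1),
# weight w(theta) = 1/theta^2, target g(theta) = theta.
theta = rng.gamma(shape=3.0, scale=1.0, size=200_000)
w, g = 1.0 / theta**2, theta

closed_form = np.mean(w * g) / np.mean(w)
grid = np.linspace(0.1, 3.0, 300)
post_loss = [np.mean(w * (d - g) ** 2) for d in grid]
print(closed_form, grid[int(np.argmin(post_loss))])  # both ~1.0
```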
2.3 Other Examples
Example (Beta-Binomial)
$X \mid \theta \sim \mathrm{Binomial}(n, \theta)$, and prior $\theta \sim \mathrm{Beta}(\alpha, \beta)$ with density
$$\lambda(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha - 1}(1 - \theta)^{\beta - 1},$$
with gamma function $\Gamma(t) = \int_0^\infty u^{t-1} e^{-u}\, du$. The posterior is
$$\lambda(\theta \mid x) \propto \theta^{x}(1 - \theta)^{n - x}\, \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} = \theta^{\alpha + x - 1}(1 - \theta)^{\beta + n - x - 1},$$
so $\theta \mid X = x \sim \mathrm{Beta}(\alpha + x,\, \beta + n - x)$, so
$$E[\theta \mid X = x] = \frac{\alpha + x}{\alpha + \beta + n}.$$
The prior acts like $\alpha + \beta$ "pseudo-trials" with $\alpha$ successes.
We can see from several examples that the posterior mean is a weighted average of the sample mean and the prior mean. When $n$ is large, the weight on the sample dominates.
Example (Normal Mean)
$X \mid \theta \sim N(\theta, \sigma^2)$ with $\sigma^2$ known, $\theta \sim N(\mu, \tau^2)$. So
$$\lambda(\theta \mid x) \propto \exp\!\left(-\frac{(x - \theta)^2}{2\sigma^2} - \frac{(\theta - \mu)^2}{2\tau^2}\right),$$
so
$$\theta \mid X = x \sim N\!\left(\frac{\tau^2 x + \sigma^2 \mu}{\tau^2 + \sigma^2},\; \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}\right), \qquad E[\theta \mid X = x] = \frac{\tau^2}{\tau^2 + \sigma^2}\, x + \frac{\sigma^2}{\tau^2 + \sigma^2}\, \mu.$$
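In code the closed form is two lines. A minimal sketch (the inputs are arbitrary illustrative values):

```python
def normal_posterior(x, mu, sigma2, tau2):
    """Posterior mean and variance of theta | X = x for
    X | theta ~ N(theta, sigma2), theta ~ N(mu, tau2)."""
    mean = (tau2 * x + sigma2 * mu) / (tau2 + sigma2)
    var = sigma2 * tau2 / (sigma2 + tau2)
    return mean, var

# Arbitrary illustrative inputs: an informative observation (tau2 > sigma2)
# pulls the posterior mean toward x rather than mu.
print(normal_posterior(x=2.0, mu=0.0, sigma2=1.0, tau2=4.0))  # (1.6, 0.8)
```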
Example (Gaussian iid Sample)
$X_1, \dots, X_n \mid \theta \overset{\text{iid}}{\sim} N(\theta, \sigma^2)$, $\theta \sim N(\mu, \tau^2)$. It is commonly known that $\bar X$ is sufficient, with $\bar X \mid \theta \sim N(\theta, \sigma^2/n)$. So by the result above,
$$E[\theta \mid X] = \frac{\tau^2 \bar X + (\sigma^2/n)\mu}{\tau^2 + \sigma^2/n} = \frac{n \bar X + (\sigma^2/\tau^2)\mu}{n + \sigma^2/\tau^2}.$$
The prior acts like $\sigma^2/\tau^2$ pseudo-observations with mean $\mu$. If $n \gg \sigma^2/\tau^2$, the data swamps the prior. If $n \ll \sigma^2/\tau^2$, the prior swamps the data.
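The pseudo-observation reading is easy to see numerically. A minimal sketch (values assumed for illustration): with $\sigma^2/\tau^2 = 100$, the posterior mean stays near $\mu$ until $n$ passes 100, then moves toward $\bar x$.

```python
def posterior_mean(xbar, n, mu, sigma2, tau2):
    """Posterior mean of theta given an iid N(theta, sigma2) sample of size n
    with mean xbar, under the prior theta ~ N(mu, tau2)."""
    k = sigma2 / tau2                      # prior worth: k pseudo-observations
    return (n * xbar + k * mu) / (n + k)

# Illustrative values: strong prior (k = 100) vs. growing sample size.
mu, sigma2, tau2, xbar = 0.0, 1.0, 0.01, 3.0
for n in (1, 10, 100, 1_000, 10_000):
    print(n, posterior_mean(xbar, n, mu, sigma2, tau2))
# n << 100: estimate stays near mu = 0;  n >> 100: it approaches xbar = 3.
```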
If the posterior is from the same family as the prior, we say the prior is conjugate to the likelihood. This is most common in exponential families.
3 Conjugate Priors
Suppose $X \sim p_\eta(x) = h(x)\, e^{\eta^\top T(x) - A(\eta)}$, an exponential family in canonical form with $\eta \in \Xi \subseteq \mathbb{R}^s$.
For any carrier $\tilde h(\eta) \ge 0$ on $\Xi$, define the $(s+1)$-parameter exponential family
$$\lambda_{a,b}(\eta) = \tilde h(\eta)\, e^{a^\top \eta - b A(\eta) - B(a,b)},$$
so the sufficient statistic is $(\eta, -A(\eta))$, and the natural parameter is $(a, b)$.
So $\lambda_{a,b}(\eta) = \tilde h(\eta)\, e^{a^\top \eta - b A(\eta) - B(a,b)}$, where $B(a,b) = \log \int_\Xi \tilde h(\eta)\, e^{a^\top \eta - b A(\eta)}\, d\eta$ normalizes the density. Then
$$\lambda(\eta \mid x) \propto p_\eta(x)\, \lambda_{a,b}(\eta) \propto \tilde h(\eta)\, e^{(a + T(x))^\top \eta - (b+1) A(\eta)},$$
so the posterior is $\lambda_{a + T(x),\, b + 1}$, again in the family: the family is conjugate to the likelihood.
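The general update $(a, b) \mapsto (a + T(x), b + 1)$ specializes to every classical conjugate pair. A minimal sketch for one instance, chosen here for illustration (Poisson-Gamma): with $\eta = \log\lambda$, $T(x) = x$, $A(\eta) = e^\eta$, and carrier $\tilde h \equiv 1$, the family $\lambda_{a,b}$ is a $\mathrm{Gamma}(\text{shape } a, \text{rate } b)$ law on $\lambda$, and the update reduces to $a \mathrel{+}= x$, $b \mathrel{+}= 1$ per observation.

```python
from scipy import stats

# Conjugate update in an exponential family: (a, b) -> (a + T(x), b + 1).
# Instance (illustrative assumption): Poisson likelihood in natural form,
# eta = log(lam), T(x) = x, A(eta) = exp(eta); with carrier 1 the family
# lambda_{a,b} is Gamma(shape=a, rate=b) on lam.
a, b = 2.0, 1.0                     # prior hyperparameters (arbitrary)
data = [3, 0, 4, 2, 5]
for x in data:                      # one update per observation
    a, b = a + x, b + 1             # a += T(x), b += 1

post = stats.gamma(a, scale=1 / b)  # lam | data ~ Gamma(shape=a, rate=b)
print(post.mean())                  # (2 + sum(data)) / (1 + len(data)) = 16/6
```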