Throw two dice. Die 0 is fair, and die 1 is loaded. $X$ is the number shown. $Y \in \{0, 1\}$ is the die chosen. Define $g(y) = \mathbb{E}[X \mid Y = y]$. Then $g(0) = 3.5$, while $g(1)$ is determined by the loaded die's distribution.
For a fixed $y$, $\mathbb{E}[\,\cdot \mid Y = y]$ satisfies the usual properties of expectation, e.g., linearity.
$g(y) = \mathbb{E}[X \mid Y = y]$ is a function of $y$.
$\mathbb{E}[X \mid Y] = g(Y)$ is a function of the random variable $Y$, so it is itself a random variable. So it can also be seen as the composition $g \circ Y$.
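As a small sketch of $g$ and of the composition $g(Y)$, the example below computes the conditional means for the two dice; the loaded die's particular distribution is an invented assumption for illustration:

```python
import random

# Two dice: die 0 is fair; die 1 is loaded. This particular loading is an
# assumption made up for illustration (extra weight on high faces).
pmf = {
    0: {face: 1/6 for face in range(1, 7)},            # fair die
    1: {1: .05, 2: .05, 3: .1, 4: .1, 5: .2, 6: .5},   # loaded die (assumed)
}

def g(y):
    """g(y) = E[X | Y = y]: the conditional mean of the face shown."""
    return sum(face * p for face, p in pmf[y].items())

# g is an ordinary function of y ...
print(g(0))  # 3.5
print(g(1))  # 4.85 under the assumed loading

# ... and E[X | Y] = g(Y) is a random variable: compose g with a draw of Y.
Y = random.randint(0, 1)
conditional_mean = g(Y)   # takes value g(0) or g(1), at random
```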
Theorem (Law of total expectation/Law of iterated expectation/Tower property)
For any random variables $X$ and $Y$ s.t. $\mathbb{E}|X| < \infty$,
$$\mathbb{E}\big[\mathbb{E}[X \mid Y]\big] = \mathbb{E}[X].$$
The notation of expectation by default indicates what to integrate over: the inner layer $\mathbb{E}[X \mid Y]$ is an expectation over $X$ (with $Y$ held fixed), and the outer layer is an expectation over $Y$.
Proof
$$\begin{aligned}
\mathbb{E}\big[\mathbb{E}[X \mid Y]\big]
&= \int \mathbb{E}[X \mid Y = y]\, f_Y(y)\, dy \\
&= \int\!\!\int x\, f_{X \mid Y}(x \mid y)\, dx\, f_Y(y)\, dy \\
&= \int x \left( \int f_{X \mid Y}(x \mid y)\, f_Y(y)\, dy \right) dx \\
&= \int x\, f_X(x)\, dx = \mathbb{E}[X].
\end{aligned}$$
The second-to-last equality uses the law of total probability, $f_X(x) = \int f_{X \mid Y}(x \mid y)\, f_Y(y)\, dy$.
For the discrete case, replace $\int \cdots \, dy$ with $\sum_y$ (and densities with probability mass functions).
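A quick Monte Carlo check of the tower property on the two-dice setup; the loaded die's distribution and the fair coin flip choosing the die are assumptions for illustration:

```python
import random

random.seed(0)

# Assumed setup: Y picks die 0 or die 1 with equal probability;
# die 1's loading is illustrative.
pmf = {
    0: [1/6] * 6,
    1: [.05, .05, .1, .1, .2, .5],
}

def g(y):
    """g(y) = E[X | Y = y]."""
    return sum((face + 1) * p for face, p in enumerate(pmf[y]))

n = 200_000
draws_Y = [random.randint(0, 1) for _ in range(n)]

# Outer expectation over Y of the inner conditional expectation: E[g(Y)].
lhs = sum(g(y) for y in draws_Y) / n

# Direct Monte Carlo estimate of E[X].
draws_X = [random.choices(range(1, 7), weights=pmf[y])[0] for y in draws_Y]
rhs = sum(draws_X) / n

print(lhs, rhs)  # both close to (3.5 + 4.85) / 2 = 4.175
```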
Theorem (Wald's Identity)
Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with $\mathbb{E}|X_1| < \infty$, and $N$ is another positive-integer-valued random variable, s.t. $N$ is independent of $X_1, X_2, \ldots$, and $\mathbb{E}[N] < \infty$.
Let $S = \sum_{i=1}^{N} X_i$. Then
$$\mathbb{E}[S] = \mathbb{E}[N]\, \mathbb{E}[X_1].$$
Proof
Since $N$ is independent of the $X_i$, conditioning on $N$ gives
$$\mathbb{E}[S \mid N = n] = \mathbb{E}\Big[\sum_{i=1}^{n} X_i\Big] = n\, \mathbb{E}[X_1],$$
i.e., $\mathbb{E}[S \mid N] = N\, \mathbb{E}[X_1]$. By the law of total expectation,
$$\mathbb{E}[S] = \mathbb{E}\big[\mathbb{E}[S \mid N]\big] = \mathbb{E}\big[N\, \mathbb{E}[X_1]\big] = \mathbb{E}[N]\, \mathbb{E}[X_1].$$
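A simulation sketch of Wald's identity; the choices $X_i \sim \mathrm{Uniform}\{1,\dots,6\}$ (so $\mathbb{E}[X_1] = 3.5$) and $N$ geometric with success probability $p$ (so $\mathbb{E}[N] = 1/p$) are illustrative assumptions:

```python
import random

random.seed(1)

# Assumed illustration: roll a fair die N times, where N ~ Geometric(p)
# on {1, 2, ...} is drawn independently of the rolls.
p = 0.25               # E[N] = 1/p = 4
trials = 100_000
totals = []
for _ in range(trials):
    n = 1
    while random.random() > p:   # geometric number of terms
        n += 1
    totals.append(sum(random.randint(1, 6) for _ in range(n)))

mean_S = sum(totals) / trials
print(mean_S)  # close to E[N] * E[X_1] = 4 * 3.5 = 14
```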
2 Important Applications
2.1 Statistical risk minimization
$Y$ is a random variable of interest (we want to predict it). $\hat{Y} = f(X)$ is a prediction of $Y$ based on an observed random variable $X$. The loss function is $\ell(Y, f(X))$. The risk is $R(f) = \mathbb{E}\big[\ell(Y, f(X))\big]$. It's the expectation over both $X$ and $Y$.
The goal is to find $f^\star = \arg\min_f R(f)$.
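As a minimal sketch (a toy setup, not from the notes): restrict to constant predictors $f(x) = c$ under squared loss $\ell(y, c) = (y - c)^2$, with $Y$ a fair die roll. The risk $R(c) = \mathbb{E}[(Y - c)^2]$ is then minimized at $c = \mathbb{E}[Y] = 3.5$, and minimizing the empirical risk over a grid of candidates recovers this:

```python
import random

random.seed(2)

# Toy setup (assumed for illustration): Y ~ Uniform{1,...,6}, constant
# predictors f(x) = c, squared loss. R(c) = E[(Y - c)^2] is minimized
# at c = E[Y] = 3.5.
samples = [random.randint(1, 6) for _ in range(100_000)]

def empirical_risk(c):
    """Average squared loss of the constant predictor c over the sample."""
    return sum((y - c) ** 2 for y in samples) / len(samples)

candidates = [i / 10 for i in range(10, 61)]  # grid 1.0, 1.1, ..., 6.0
best = min(candidates, key=empirical_risk)
print(best)  # close to E[Y] = 3.5
```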