2.1 Exponential Families

1 Definition

We have discussed statistical models in general. We now study a family of models with a specific structure, which gives them many useful properties.

Exponential Family

$\mathcal{P}=\{P_\eta \mid \eta\in\Xi\}$ is an $s$-parameter exponential family if it is defined by a family of densities of the form
$$p_\eta(x)=e^{\eta^T T(x)-A(\eta)}\,h(x) \tag{1.1}$$
w.r.t. a common dominating measure $\mu$ ($P_\eta\ll\mu$ for all $\eta\in\Xi$).
The parts of the formula have distinct names:

  • $T:\mathcal{X}\to\mathbb{R}^s$: sufficient statistic,
  • $h:\mathcal{X}\to[0,\infty)$: carrier density/base density,
  • $\eta\in\Xi\subseteq\mathbb{R}^s$: natural parameter,
  • $A:\Xi\to\mathbb{R}$: log-partition function.

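By definition, integrating both sides of (1.1) w.r.t. $\mu$ over $\mathcal{X}$ gives
$$1=\int_{\mathcal{X}} e^{\eta^T T(x)-A(\eta)}\,h(x)\,d\mu(x) \;\Longrightarrow\; A(\eta)=\log\left(\int_{\mathcal{X}} e^{\eta^T T(x)}\,h(x)\,d\mu(x)\right). \tag{1.2}$$
We need $A(\eta)<\infty$, so we define the natural parameter space $\Xi_1=\{\eta \mid A(\eta)<\infty\}\subseteq\mathbb{R}^s$.
One can prove that $A(\eta)$ is a convex function, so $\Xi_1$ is a convex set.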
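As a concrete check of (1.2), consider the Poisson family written in exponential-family form. This is an illustrative choice, not taken from the text above: $T(x)=x$, $h(x)=1/x!$, and $\mu$ is counting measure on $\{0,1,2,\dots\}$, so that $A(\eta)=e^{\eta}$ and $\Xi_1=\mathbb{R}$. The sketch below approximates the defining sum numerically; the function name `log_partition_poisson` and the truncation point `n_terms` are hypothetical.

```python
import math

def log_partition_poisson(eta, n_terms=200):
    """Approximate A(eta) = log sum_x exp(eta * x) / x! by truncating the sum.

    Assumes T(x) = x, h(x) = 1/x!, counting measure on {0, 1, 2, ...}.
    """
    # Work in log space: log of each term is eta*x - log(x!).
    log_terms = [eta * x - math.lgamma(x + 1) for x in range(n_terms)]
    m = max(log_terms)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

if __name__ == "__main__":
    for eta in (-1.0, 0.0, 1.5):
        # The two printed values should agree closely: A(eta) = exp(eta).
        print(eta, log_partition_poisson(eta), math.exp(eta))
```

The truncation is adequate only for moderate $\eta$; for large $\eta$ the Poisson mean $e^{\eta}$ grows and more terms would be needed.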

1.1 Distribution of T(X)

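If $X\sim p_\eta(x)=e^{\eta^T T(x)-A(\eta)}$ w.r.t. $\mu$ (WLOG we take $h\equiv 1$; otherwise we let $\mu$ absorb $h$), then $T(X)\sim q_\eta(t)=e^{\eta^T t-A(\eta)}$ w.r.t. $\nu$, where $\nu$ is the push-forward of $\mu$ through $T:\mathcal{X}\to\mathbb{R}^s$: $\nu(B)\triangleq\mu(\{x: T(x)\in B\})$. So
$$P_\eta(T(X)\in B)=\int 1_B(T(x))\,e^{\eta^T T(x)-A(\eta)}\,d\mu(x)=\int 1_B(t)\,e^{\eta^T t-A(\eta)}\,d\nu(t).$$
This is simplest in the discrete case (where we can drop the $h\equiv 1$ assumption):

$$P_\eta(T(X)=t)=\sum_{x:\,T(x)=t} e^{\eta^T T(x)-A(\eta)}\,h(x)\,\mu(\{x\})=e^{\eta^T t-A(\eta)}\sum_{x:\,T(x)=t} h(x)\,\mu(\{x\}),$$

and we denote $\nu(\{t\})=\sum_{x:\,T(x)=t} h(x)\,\mu(\{x\})$.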
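The discrete formula can be illustrated with a small sketch. The setup is an assumption made for illustration: $x\in\{0,1\}^n$ with counting measure $\mu$, $h\equiv 1$, and $T(x)=\sum_i x_i$, so that $A(\eta)=n\log(1+e^{\eta})$ and $\nu(\{t\})$ counts the binary vectors with sum $t$. The function names are hypothetical.

```python
import itertools
import math

def pushforward_counts(n):
    """nu({t}) = #{x in {0,1}^n : sum(x) = t}, with h = 1 and mu = counting measure."""
    nu = [0] * (n + 1)
    for x in itertools.product((0, 1), repeat=n):
        nu[sum(x)] += 1
    return nu

def density_of_T(eta, n):
    """P_eta(T(X) = t) = exp(eta * t - A(eta)) * nu({t}) for t = 0, ..., n."""
    A = n * math.log1p(math.exp(eta))  # log-partition function for this family
    nu = pushforward_counts(n)
    return [math.exp(eta * t - A) * nu[t] for t in range(n + 1)]

if __name__ == "__main__":
    print(pushforward_counts(4))      # binomial coefficients C(4, t)
    print(sum(density_of_T(0.3, 5)))  # should be close to 1
```

Here $\nu(\{t\})=\binom{n}{t}$, so the distribution of $T(X)$ is Binomial with success probability $e^{\eta}/(1+e^{\eta})$.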

1.2 Canonical Form

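Based on the discussion above, we can simplify the structure of an exponential family: the carrier $h$ can be absorbed into the dominating measure, and we can work directly with the distribution of $T(X)$. This leads to the following definition.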

Canonical Form

$p_\eta(x)=e^{\eta^T x-A(\eta)}$ is called the canonical form.

2 Differential Identities

By (1.2) we have
$$e^{A(\eta)}=\int_{\mathcal{X}} e^{\eta^T T(x)}\,h(x)\,d\mu(x). \tag{2.1}$$
Differentiating this identity yields useful results. We use without proof that differentiation and integration can be interchanged in the interior of $\Xi_1$.

2.1 Mean of T(X)

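Let $T_j(x)$ denote the $j$-th coordinate of $T(x)$. Then
$$\begin{aligned}
\frac{\partial}{\partial\eta_j}e^{A(\eta)} &= \int_{\mathcal{X}}\frac{\partial}{\partial\eta_j}e^{\eta^T T(x)}\,h(x)\,d\mu(x)\\
e^{A(\eta)}\frac{\partial A}{\partial\eta_j}(\eta) &= \int_{\mathcal{X}}T_j(x)\,e^{\eta^T T(x)}\,h(x)\,d\mu(x)\\
\frac{\partial A}{\partial\eta_j}(\eta) &= \int_{\mathcal{X}}T_j(x)\,e^{\eta^T T(x)-A(\eta)}\,h(x)\,d\mu(x)=E_\eta[T_j(X)].
\end{aligned}$$
Collecting these for $j=1,\dots,s$:
$$\nabla A(\eta)=E_\eta[T(X)]. \tag{2.2}$$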

2.2 Variance of T(X)

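Take a second partial derivative:
$$\begin{aligned}
\frac{\partial^2}{\partial\eta_j\partial\eta_k}e^{A(\eta)} &= \int_{\mathcal{X}}\frac{\partial^2}{\partial\eta_j\partial\eta_k}e^{\eta^T T(x)}\,h(x)\,d\mu(x)\\
e^{A(\eta)}\left(\frac{\partial^2 A}{\partial\eta_j\partial\eta_k}+\frac{\partial A}{\partial\eta_j}\frac{\partial A}{\partial\eta_k}\right) &= \int_{\mathcal{X}}T_j(x)\,T_k(x)\,e^{\eta^T T(x)}\,h(x)\,d\mu(x)\\
\frac{\partial^2 A}{\partial\eta_j\partial\eta_k}+E_\eta[T_j(X)]\,E_\eta[T_k(X)] &= E_\eta[T_j(X)\,T_k(X)]\\
\frac{\partial^2 A}{\partial\eta_j\partial\eta_k} &= \mathrm{Cov}_\eta(T_j(X),T_k(X)).
\end{aligned}$$
Finally we get
$$\nabla^2 A(\eta)=\mathrm{Var}_\eta(T(X)). \tag{2.3}$$
Here $\mathrm{Var}_\eta(T(X))$ is the $s\times s$ covariance matrix of the random vector $T(X)$.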
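The identities (2.2) and (2.3) can be checked numerically with finite differences. The sketch below assumes a small one-parameter family on $\{0,\dots,5\}$ with counting measure, $T(x)=x$, and an arbitrarily chosen carrier $h(x)=1/(1+x)$. All of these are hypothetical choices made only to have something concrete to differentiate.

```python
import math

SUPPORT = range(6)  # sample space {0, 1, ..., 5}, mu = counting measure

def h(x):
    """Hypothetical carrier, chosen arbitrarily for illustration."""
    return 1.0 / (1 + x)

def T(x):
    """One-dimensional sufficient statistic."""
    return float(x)

def A(eta):
    """Log-partition function computed directly from its definition (1.2)."""
    return math.log(sum(math.exp(eta * T(x)) * h(x) for x in SUPPORT))

def moments(eta):
    """Mean and variance of T(X) under p_eta, computed by direct summation."""
    p = [math.exp(eta * T(x) - A(eta)) * h(x) for x in SUPPORT]
    mean = sum(T(x) * px for x, px in zip(SUPPORT, p))
    var = sum((T(x) - mean) ** 2 * px for x, px in zip(SUPPORT, p))
    return mean, var

def central_diff(f, eta, eps=1e-4):
    """Central finite-difference approximation to f'(eta)."""
    return (f(eta + eps) - f(eta - eps)) / (2 * eps)

def second_diff(f, eta, eps=1e-4):
    """Central finite-difference approximation to f''(eta)."""
    return (f(eta + eps) - 2 * f(eta) + f(eta - eps)) / eps ** 2

if __name__ == "__main__":
    eta = 0.7
    mean, var = moments(eta)
    print(central_diff(A, eta), mean)  # A'(eta) vs E[T(X)]
    print(second_diff(A, eta), var)    # A''(eta) vs Var(T(X))
```

Finite differences agree with the exact moments only up to discretization and rounding error, so the comparison is approximate.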

2.3 MGF of T(X)

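The moment generating function (MGF) of a $d$-dimensional random vector $X\sim P$ is defined as $M_X(u)=E[e^{u^T X}]$, $u\in\mathbb{R}^d$ (the one-dimensional case is introduced elsewhere in these notes). We can calculate moments by differentiating the MGF, as long as $M_X(u)$ is finite in a neighborhood of $0$. For the first moments,
$$\frac{\partial}{\partial u_j}M_X(u)=\int_{\mathcal{X}}\frac{\partial}{\partial u_j}e^{u^T x}\,dP(x)=\int_{\mathcal{X}}x_j\,e^{u^T x}\,dP(x).$$
Setting $u=0$, we obtain $\frac{\partial}{\partial u_j}M_X(0)=\int_{\mathcal{X}}x_j\,dP(x)=E[X_j]$. Similarly,
$$\frac{\partial^{m_1+\cdots+m_d}}{\partial u_1^{m_1}\cdots\partial u_d^{m_d}}M_X(u)\Big|_{u=0}=\int_{\mathcal{X}}x_1^{m_1}\cdots x_d^{m_d}\,e^{u^T x}\,dP(x)\Big|_{u=0}=E[X_1^{m_1}\cdots X_d^{m_d}]. \tag{2.4}$$
On the other hand, given $\eta$ and $P_\eta$, we can calculate the MGF of $T(X)$ explicitly for an exponential family, for $u$ such that $\eta+u\in\Xi_1$:
$$M_{T(X)}(u)=E_\eta[e^{u^T T(X)}]=\int_{\mathcal{X}}e^{u^T T(x)}\,e^{\eta^T T(x)-A(\eta)}\,h(x)\,d\mu(x)=e^{-A(\eta)}\int_{\mathcal{X}}e^{(u+\eta)^T T(x)}\,h(x)\,d\mu(x)=e^{A(\eta+u)-A(\eta)}.$$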
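The closed form $M_{T(X)}(u)=e^{A(\eta+u)-A(\eta)}$ can be compared against a direct expectation. The sketch below assumes the Poisson family in natural form ($T(x)=x$, $h(x)=1/x!$, $A(\eta)=e^{\eta}$), which is an illustrative choice rather than something fixed by the text; the function names and the truncation point are hypothetical.

```python
import math

def poisson_pmf(x, eta):
    """p_eta(x) = exp(eta*x - A(eta)) * h(x) with A(eta) = exp(eta), h(x) = 1/x!."""
    return math.exp(eta * x - math.exp(eta) - math.lgamma(x + 1))

def mgf_direct(u, eta, n_terms=200):
    """E_eta[exp(u * T(X))] by (truncated) direct summation over x."""
    return sum(math.exp(u * x) * poisson_pmf(x, eta) for x in range(n_terms))

def mgf_from_A(u, eta):
    """exp(A(eta + u) - A(eta)) using the closed form A(eta) = exp(eta)."""
    return math.exp(math.exp(eta + u) - math.exp(eta))

if __name__ == "__main__":
    u, eta = 0.3, 0.5
    print(mgf_direct(u, eta), mgf_from_A(u, eta))  # the two should agree closely
```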

2.4 CGF

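The cumulant-generating function (CGF) is the log of the MGF: $K_X(u)=\log M_X(u)$. So for an exponential family,
$$K_{T(X)}(u)=A(\eta+u)-A(\eta)\;\Longrightarrow\;\frac{\partial}{\partial\eta_j}K_{T(X)}(u)\Big|_{u=0}=0,$$
since $K_{T(X)}(0)=0$ for every $\eta$. Differentiating in $u$ instead recovers the cumulants; for example, $\frac{\partial}{\partial u_j}K_{T(X)}(u)\big|_{u=0}=\frac{\partial A}{\partial\eta_j}(\eta)=E_\eta[T_j(X)]$, in agreement with (2.2).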
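As a worked instance, take the Poisson family in natural form, with $T(x)=x$ and $A(\eta)=e^{\eta}$ (an illustrative choice, not fixed by the text above). Writing $\lambda=e^{\eta}$,

$$K_{T(X)}(u)=e^{\eta+u}-e^{\eta}=\lambda\,(e^{u}-1),\qquad \frac{d^m}{du^m}K_{T(X)}(u)\Big|_{u=0}=\lambda\quad\text{for all } m\ge 1.$$

So every cumulant of a Poisson variable equals $\lambda$; the first two are the mean and the variance.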

3 Other Parameterizations

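Instead of parameterizing $\mathcal{P}$ by $\eta$, we can parameterize the family by another parameter $\theta$ through a map $\eta=\eta(\theta)$, so that
$$p_\theta(x)=e^{\eta(\theta)^T T(x)-B(\theta)}\,h(x),\qquad B(\theta)=A(\eta(\theta)).$$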
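For example, the Bernoulli family is usually parameterized by its success probability $\theta\in(0,1)$ rather than by $\eta$ (this example is an illustration, not part of the original notes). With $T(x)=x$, $h\equiv 1$ and $x\in\{0,1\}$:

$$p_\theta(x)=\theta^{x}(1-\theta)^{1-x}=\exp\left\{x\log\frac{\theta}{1-\theta}+\log(1-\theta)\right\},$$

$$\eta(\theta)=\log\frac{\theta}{1-\theta},\qquad A(\eta)=\log(1+e^{\eta}),\qquad B(\theta)=A(\eta(\theta))=-\log(1-\theta).$$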

4 Interpretation: Exponential Tilting

We can think of $p_\eta(x)=e^{\eta^T T(x)-A(\eta)}\,h(x)$ as an exponential tilt of the carrier $h(x)$:

  1. Start with carrier h(x).
  2. Multiply by $e^{\eta^T T(x)}$.
  3. Re-normalize by dividing by $e^{A(\eta)}$.

$T(X)=(T_1(X),\dots,T_s(X))$ can be viewed as spanning a linear space of directions in which we can tilt $h(x)$. $\Xi_1$ is the set of all tilts after which normalization is still possible, i.e. for which the normalizing integral is finite.
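The three tilting steps can be sketched numerically. The example assumes a standard normal carrier $h(x)=\varphi(x)$ with $T(x)=x$ and Lebesgue measure, for which the tilt is the $N(\eta,1)$ density and $A(\eta)=\eta^2/2$. The grid bounds and the function names are hypothetical choices, and the integral is approximated by a simple Riemann sum.

```python
import math

def carrier(x):
    """Step 1: the carrier h(x), here the standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def tilt(eta, lo=-12.0, hi=12.0, n=4801):
    """Tilt the carrier by exp(eta * x) on a grid and re-normalize.

    Returns the grid, the tilted density on the grid, and the numerical
    estimate of A(eta) = log of the normalizing constant.
    """
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    unnorm = [math.exp(eta * x) * carrier(x) for x in xs]  # step 2: multiply
    Z = step * sum(unnorm)                                 # approximates exp(A(eta))
    dens = [v / Z for v in unnorm]                         # step 3: re-normalize
    return xs, dens, math.log(Z)

if __name__ == "__main__":
    eta = 1.5
    xs, dens, A_hat = tilt(eta)
    step = xs[1] - xs[0]
    mean = step * sum(x * d for x, d in zip(xs, dens))
    print(A_hat, eta ** 2 / 2)  # numerical vs closed-form A(eta)
    print(mean, eta)            # the tilted density has mean eta
```

For this carrier every real $\eta$ gives a finite normalizing constant, so $\Xi_1=\mathbb{R}$; a heavier-tailed carrier would restrict the admissible tilts.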

5 Repeated Sampling from Exponential Families

One of the most important properties of exponential families is that a large sample can be summarized by a low-dimensional statistic.
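Suppose $X=(X_1,\dots,X_n)$ is an i.i.d. sample from an exponential family, $X_1,\dots,X_n\overset{\text{i.i.d.}}{\sim}p^{(1)}_\eta(x)=e^{\eta^T T(x)-A(\eta)}\,h(x)$. Then the joint density is
$$p_\eta(x)=\prod_{i=1}^n e^{\eta^T T(x_i)-A(\eta)}\,h(x_i)=\exp\left\{\eta^T\sum_{i=1}^n T(x_i)-nA(\eta)\right\}\prod_{i=1}^n h(x_i).$$
This is again an exponential family, with sufficient statistic $\sum_{i=1}^n T(X_i)$, base density $\prod_{i=1}^n h(x_i)$, and log-partition function $nA(\eta)$.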
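The factorization can be illustrated with a short sketch, again assuming the Poisson family in natural form ($T(x)=x$, $h(x)=1/x!$, $A(\eta)=e^{\eta}$) as an illustrative choice; the function names are hypothetical. The joint log-density computed term by term matches the one computed from the summary $\sum_i T(x_i)$, and two samples with the same summary give the same log-likelihood ratio between any two values of $\eta$.

```python
import math

def log_joint(xs, eta):
    """Sum over i of eta*T(x_i) - A(eta) + log h(x_i), for the Poisson family."""
    return sum(eta * x - math.exp(eta) - math.lgamma(x + 1) for x in xs)

def log_joint_via_summary(t_sum, n, log_h_prod, eta):
    """Same quantity from the summary: eta * sum T(x_i) - n*A(eta) + log prod h(x_i)."""
    return eta * t_sum - n * math.exp(eta) + log_h_prod

def log_lik_ratio(xs, eta1, eta0):
    """log p_eta1(x) - log p_eta0(x); depends on xs only through sum(xs) and len(xs)."""
    return log_joint(xs, eta1) - log_joint(xs, eta0)

if __name__ == "__main__":
    a, b = [1, 2, 3], [0, 0, 6]  # different samples, same sum and same size
    log_h_a = -sum(math.lgamma(x + 1) for x in a)
    print(log_joint(a, 0.4), log_joint_via_summary(sum(a), len(a), log_h_a, 0.4))
    print(log_lik_ratio(a, 0.4, -0.2), log_lik_ratio(b, 0.4, -0.2))
```

The base density $\prod_i h(x_i)$ differs between the two samples, but it cancels in the ratio, which is why only the summary matters when comparing values of $\eta$.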