This note is a backup from Notion.
Contents
Lecture 8
Definition of Variance
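- Def: The variance of X, $\mathrm{Var}(X)$, is:
  - If X is discrete: $\sigma^2 = \mathrm{Var}(X) = E\big((X-\mu_X)^2\big) = \sum_x (x-\mu_X)^2 f(x)$
  - If X is continuous: $\sigma^2 = \mathrm{Var}(X) = E\big((X-\mu_X)^2\big) = \int_{-\infty}^{+\infty} (x-\mu_X)^2 f(x)\,dx$
  - $\sigma$ is called the standard deviation
  - $\sigma^2 = E\big((X-\mu_X)^2\big) = E(X^2 - 2\mu_X X + \mu_X^2) = E(X^2) - 2\mu_X E(X) + \mu_X^2 = E(X^2) - E^2(X)$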
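- Properties
  - For every X, $\mathrm{Var}(X) \ge 0$; for a constant C, $\mathrm{Var}(C) = 0$; and $\mathrm{Var}(X) = 0 \Leftrightarrow P(X = C) = 1$
  - $\mathrm{Var}(CX) = C^2\,\mathrm{Var}(X)$
  - If X and Y are independent, then $E(XY) = E(X)E(Y)$ and $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
  - If $X_1, X_2, \dots, X_n$ are mutually independent, $\mathrm{Var}\left(\sum_{i=1}^n C_i X_i + b\right) = \sum_{i=1}^n C_i^2\,\mathrm{Var}(X_i)$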
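The variance properties above can be checked exactly on a small discrete example. This is a minimal sketch; the fair die and the helper names `mean` and `var` are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    """Var(X) = E(X^2) - E(X)^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

# Hypothetical example: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Var(CX) = C^2 Var(X): scale every outcome by C = 3.
scaled = {3 * x: p for x, p in die.items()}

# Var(X + Y) for two independent dice: build the pmf of the sum.
two_dice = {}
for x, px in die.items():
    for y, py in die.items():
        two_dice[x + y] = two_dice.get(x + y, 0) + px * py

print(var(die))       # 35/12
print(var(scaled))    # 105/4  (= 9 * 35/12)
print(var(two_dice))  # 35/6   (= 2 * 35/12)
```

Exact fractions are used so the identities hold with equality rather than up to rounding.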
Covariance
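- Def: the covariance of X and Y is $\sigma_{XY} = \mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$
- Equivalently, $\sigma_{XY} = \mathrm{Cov}(X,Y) = E(XY) - E(X)E(Y)$
- If X and Y are independent, $\mathrm{Cov}(X,Y) = 0$
  - The converse may not be true!
- In general, $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) \pm 2\,\mathrm{Cov}(X,Y)$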
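A standard counterexample shows why zero covariance does not imply independence. The sketch below assumes a hypothetical X uniform on {-1, 0, 1} with Y = X^2; the choice of distribution is an illustration only.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1}, Y = X^2.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}  # joint pmf of (X, Y)

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

# Cov(X, Y) = E(XY) - E(X) E(Y)
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=0, Y=0) = P(X=0) P(Y=0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_joint = joint.get((0, 0), 0)

print(cov)                   # 0
print(p_joint, p_x0 * p_y0)  # 1/3 1/9
```

The covariance vanishes, yet the joint probability differs from the product of the marginals, so X and Y are dependent.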
Correlation coefficient
- Def: The correlation coefficient of X and Y is $\rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$
- If X and Y are independent, $\rho_{XY} = 0$
- Properties:
  - For all X, Y: $|\rho_{XY}| \le 1$. Pf: use $\mathrm{Var}(Y - tX) \ge 0$ for every t.
  - $|\rho_{XY}| = 1 \Leftrightarrow \exists\, a \ne 0,\ b$ s.t. $P(Y = aX + b) = 1$
  - If $|\rho_{XY}| = 1$, X and Y are said to be completely linearly correlated.
  - If $\rho_{XY} = 0$, X and Y are called uncorrelated, meaning there is no "linear correlation" between X and Y.
  - independent $\Rightarrow$ uncorrelated (the converse fails in general)
  - $|\rho_{XY}|$ measures the strength of the linear correlation between X and Y.
  - $\rho_{XY} > 0$ means there is a positive linear correlation between X and Y: if X becomes larger, then Y tends to become larger.
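To illustrate the properties of the correlation coefficient, the sketch below computes the sample analogue of $\rho_{XY}$ for exactly linear data and for a symmetric quadratic relation. The data points and the helper `corr` are hypothetical.

```python
import math

def corr(xs, ys):
    """Sample analogue of rho_XY = sigma_XY / (sigma_X * sigma_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
rho_up = corr(xs, [2 * x + 1 for x in xs])     # Y = aX + b with a > 0
rho_down = corr(xs, [-3 * x + 5 for x in xs])  # Y = aX + b with a < 0
rho_sq = corr([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # Y = X^2, symmetric X

print(rho_up, rho_down, rho_sq)
```

The linear cases give correlation of magnitude 1 with the sign of a, while the quadratic case gives 0 even though Y is a function of X.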
Lecture 9
Bernoulli Distribution
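- 0-1 distribution, $X \sim B(1,p)$, with cdf
  $F(x) = \begin{cases} 0, & x < 0 \\ 1-p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$
- $E(X) = p$, $\mathrm{Var}(X) = p - p^2 = pq$, where $q = 1-p$
- Indicator: for $A \subset S$, $I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A \\ 0, & \text{if } \omega \notin A \end{cases}$
- The indicator can be used everywhere: $I_A \sim B(1, P(A))$, so $E(I_A) = P(A)$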
Binomial Distribution
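- Def: X is the number of successes in n Bernoulli trials, $X \sim B(n,p)$
- If n=1, it becomes the Bernoulli distribution
- pmf: $f(x) = P(X=x) = b(x;n,p) = C_n^x p^x q^{n-x},\ x = 0,1,\dots,n$
- Binomial theorem: $\sum_{x=0}^n b(x;n,p) = \sum_{x=0}^n C_n^x p^x q^{n-x} = (p+q)^n = 1$
- $E(X) = np$, $\mathrm{Var}(X) = npq$
- Hint: let $X_i = \begin{cases} 1, & \text{the } i\text{-th trial succeeds} \\ 0, & \text{the } i\text{-th trial fails} \end{cases}$
- The $X_i$ are mutually independent with $X_i \sim B(1,p)$, and $X = \sum_{i=1}^n X_i$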
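The binomial facts above (the pmf sums to 1, mean np, variance npq) can be verified numerically. A minimal sketch, with n = 10 and p = 0.3 chosen arbitrarily for illustration:

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x q^(n-x) with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3  # hypothetical parameters
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                   # should be (p + q)^n = 1
mean = sum(x * f for x, f in enumerate(pmf))       # should be n p = 3
var = sum(x * x * f for x, f in enumerate(pmf)) - mean**2  # should be n p q = 2.1

print(total, mean, var)
```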
Multinomial Distribution
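- Def: A multinomial experiment consists of repeated independent trials, each with k possible outcomes
- Def: Multinomial distribution: the counts of each outcome in n trials
- Joint pmf: $f(x_1,x_2,\dots,x_k; p_1,p_2,\dots,p_k, n) = \dfrac{n!}{x_1!\,x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$, where $\sum_i x_i = n$ and $\sum_i p_i = 1$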
- Each marginal distribution is binomial
Lecture 10
Hypergeometric Distribution
- Motivation: Sampling without replacement
- Def: X is the number of successes when
  - n items are selected from N items without replacement;
  - of the N items, k are successes and $N-k$ are failures.
- $X \sim H(N,n,k)$, with pmf $f(x;N,n,k) = \dfrac{C_k^x\, C_{N-k}^{n-x}}{C_N^n},\ \max(0, n-(N-k)) \le x \le \min(n,k)$
- Relationship to Binomial
  - The binomial is the limiting case of the hypergeometric when N approaches infinity with k/N held fixed
  - When N is large enough (n/N is small): $f(x;N,n,k) \approx b(x; n, \frac{k}{N})$
- If X is hypergeometric with N, n and k, then $E(X) = n\frac{k}{N}$ and $\mathrm{Var}(X) = \frac{N-n}{N-1}\, n\, \frac{k}{N}\left(1 - \frac{k}{N}\right)$
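The hypergeometric mean and variance formulas, and the binomial approximation for small n/N, can be checked directly from the pmf. A minimal sketch; the values N = 1000, n = 10, k = 300 are hypothetical.

```python
from math import comb

def hyper_pmf(x, N, n, k):
    """f(x; N, n, k) = C(k, x) C(N-k, n-x) / C(N, n)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

N, n, k = 1000, 10, 300  # hypothetical: n/N = 0.01 is small
support = range(max(0, n - (N - k)), min(n, k) + 1)

mean = sum(x * hyper_pmf(x, N, n, k) for x in support)
var = sum(x * x * hyper_pmf(x, N, n, k) for x in support) - mean**2

# Largest pointwise difference between the two pmfs on the support.
max_gap = max(abs(hyper_pmf(x, N, n, k) - binom_pmf(x, n, k / N)) for x in support)

print(mean, var, max_gap)
```

Here the variance is slightly below the binomial value npq because of the finite-population factor (N-n)/(N-1).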
Multivariate Hypergeometric
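- N items are classified into k kinds, with $a_i$ items of kind i; n items are selected at random, and $x_i$ is the number selected of kind i:
  $f(x_1,x_2,\dots,x_k; a_1,a_2,\dots,a_k, N, n) = \dfrac{C_{a_1}^{x_1} C_{a_2}^{x_2} \cdots C_{a_k}^{x_k}}{C_N^n}$
- Each marginal is hypergeometric!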
Geometric Distribution
- Def: Repeat Bernoulli trials until the first success; X is the number of trials, $X \sim G(p)$
- pmf: $g(x;p) = q^{x-1} p,\ x = 1,2,3,\dots$
- Mean and variance: $E(X) = \frac{1}{p}$, $\mathrm{Var}(X) = \frac{q}{p^2}$
Negative Binomial Distribution
- Def: Repeat Bernoulli trials until the k-th success; X is the number of trials, $X \sim NB(k,p)$
- pmf: $b^*(x;k,p) = C_{x-1}^{k-1} q^{x-k} p^k,\ x = k, k+1, k+2, \dots$
- Mean and variance: $E(X) = \frac{k}{p}$, $\mathrm{Var}(X) = \frac{kq}{p^2}$
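The means and variances of the geometric and negative binomial distributions can be confirmed by summing the pmfs numerically. This is a sketch under assumed parameters p = 0.25 and k = 3; the infinite sums are truncated at 2000 terms, which is an approximation.

```python
from math import comb

def geom_pmf(x, p):
    """g(x; p) = q^(x-1) p for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def negbin_pmf(x, k, p):
    """b*(x; k, p) = C(x-1, k-1) q^(x-k) p^k for x = k, k+1, ..."""
    return comb(x - 1, k - 1) * (1 - p) ** (x - k) * p**k

def moments(pmf, start, terms=2000):
    """Mean and variance from a truncated sum over the support."""
    xs = range(start, start + terms)
    m1 = sum(x * pmf(x) for x in xs)
    m2 = sum(x * x * pmf(x) for x in xs)
    return m1, m2 - m1**2

p, k = 0.25, 3  # hypothetical parameters
g_mean, g_var = moments(lambda x: geom_pmf(x, p), start=1)
nb_mean, nb_var = moments(lambda x: negbin_pmf(x, k, p), start=k)

print(g_mean, g_var)    # close to 1/p = 4 and q/p^2 = 12
print(nb_mean, nb_var)  # close to k/p = 12 and kq/p^2 = 36
```

The negative binomial values are k times the geometric ones, as expected for a sum of k independent geometric waiting times.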
Poisson Distribution
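- Def: the number of occurrences in a Poisson process
- Derivation: Poisson theorem
  $\lim_{n\to\infty} C_n^x \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{\lambda^x e^{-\lambda}}{x!}$
- pmf: $p(x;\lambda) = \frac{\lambda^x e^{-\lambda}}{x!},\ x = 0,1,2,\dots$
- Expectation: if $X \sim P(\lambda)$, then $E(X) = \lambda$, $\mathrm{Var}(X) = \lambda$
- Relationship to Binomial
  - The Poisson distribution is the limiting case of the binomial when n approaches infinity while np is fixed
  - If n is large ($n \ge 50$) while p is small ($p \le 0.1$), then $B(n,p) \approx P(np)$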
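The quality of the Poisson approximation to the binomial can be inspected by comparing the two pmfs. A minimal sketch, assuming the hypothetical values n = 100 and p = 0.03, which satisfy the rule of thumb above:

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """p(x; lambda) = lambda^x e^(-lambda) / x!"""
    return lam**x * exp(-lam) / factorial(x)

n, p = 100, 0.03  # hypothetical: n >= 50 and p <= 0.1
lam = n * p

# Largest pointwise gap between B(n, p) and P(np) over x = 0..n.
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(n + 1))

# Mean of the Poisson pmf, truncated at x = n (the tail is negligible here).
poisson_mean = sum(x * poisson_pmf(x, lam) for x in range(n + 1))

print(max_gap, poisson_mean)
```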
Lecture 11
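Uniform Distribution
- Def: X is said to have the uniform distribution on [a,b], $X \sim U(a,b)$, if its density is
  $f(x) = \begin{cases} \frac{1}{b-a}, & x \in [a,b] \\ 0, & \text{elsewhere} \end{cases}$
- cdf and probability: $F(x) = \frac{x-a}{b-a}$ for $a \le x \le b$ (0 below a, 1 above b), so $P(c < X < d) = \frac{d-c}{b-a}$ for $a \le c < d \le b$
- Expectations: $E(X) = \frac{a+b}{2}$, $\mathrm{Var}(X) = \frac{(b-a)^2}{12}$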
Exponential Distribution
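- Def: X is said to have the exponential distribution, $X \sim e(\beta)$, if its density is
  $f(x) = \begin{cases} \frac{1}{\beta} e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases}$
- cdf: $F(x) = \begin{cases} 0, & x \le 0 \\ 1 - e^{-x/\beta}, & x > 0 \end{cases}$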
Gamma Distribution
Gamma Function
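- Def: Gamma function
  $\Gamma(\alpha) = \int_0^{+\infty} x^{\alpha-1} e^{-x}\,dx,\ \alpha > 0$
- Properties: $\Gamma(1) = 1$, $\Gamma(0.5) = \sqrt{\pi}$, $\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)$, $\Gamma(n) = (n-1)!$
- Def: the Gamma density is as follows, $X \sim \Gamma(\alpha,\beta)$:
  $f(x) = \begin{cases} \frac{1}{\beta^\alpha \Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, & x > 0 \\ 0, & x \le 0 \end{cases}$
- The exponential is a special case of the Gamma density: $X \sim e(\beta) = \Gamma(1,\beta)$
- Expectations: if $X \sim \Gamma(\alpha,\beta)$, then $E(X) = \alpha\beta$, $\mathrm{Var}(X) = \alpha\beta^2$; if $X \sim e(\beta)$, then $E(X) = \beta$, $\mathrm{Var}(X) = \beta^2$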
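The Gamma function properties and the Gamma moments can be sanity-checked numerically. The sketch below assumes the scale parameterization used in these notes (density proportional to $x^{\alpha-1} e^{-x/\beta}$), hypothetical values $\alpha = 2.5$, $\beta = 2$, and a crude midpoint rule for the integrals.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta**alpha * math.gamma(alpha))

def exp_pdf(x, beta):
    """Exponential density with mean beta."""
    return math.exp(-x / beta) / beta if x > 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint rule; a crude numerical integral for illustration only."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

alpha, beta = 2.5, 2.0  # hypothetical parameters
# The upper limit 100 truncates a negligible tail for these parameters.
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, 100.0)
second = integrate(lambda x: x * x * gamma_pdf(x, alpha, beta), 0.0, 100.0)
var = second - mean**2

print(math.gamma(0.5) ** 2, math.gamma(5))  # close to pi, and 4! = 24
print(mean, var)                            # close to alpha*beta = 5 and alpha*beta^2 = 10
```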
Normal Distribution
Standard Normal
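- Def: X is called standard normal if its density is
  $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},\ x \in (-\infty,+\infty)$
- The cdf can be found from tables:
  $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\,dt = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$
  $\Phi(0) = 0.5$, $\Phi(-x) = 1 - \Phi(x)$
- Expectations: if X is standard normal, $X \sim N(0,1)$, then $E(X) = 0$, $\mathrm{Var}(X) = 1$
- Def: X is normal with parameters $\mu, \sigma^2$ if
  $X \sim N(\mu,\sigma^2) \Leftrightarrow \frac{X-\mu}{\sigma} \sim N(0,1)$
- The cdf and density of $N(\mu,\sigma^2)$ are:
  $F(x) = P(X \le x) = P\left(\frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) = \Phi\left(\frac{x-\mu}{\sigma}\right)$
  $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},\ x \in (-\infty,+\infty)$
- Expectations: $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$
- pth quantile
  - Def: for p in (0,1), the pth quantile $x_p$ of X satisfies $P(X \le x_p) = p$
  - Def: for p in (0,1), the critical value $c_p$ of X satisfies $P(X \ge c_p) = p$
  - $x_p = c_{1-p}$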
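Instead of tables, the normal cdf and quantiles can be computed with Python's standard library. The sketch below checks the standardization $F(x) = \Phi((x-\mu)/\sigma)$ and the relation $x_p = c_{1-p}$; the values $\mu = 10$, $\sigma = 2$ and the helper `critical_value` are hypothetical.

```python
import math
from statistics import NormalDist

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def critical_value(dist, p):
    """c_p with P(X >= c_p) = p, i.e. the (1 - p)th quantile."""
    return dist.inv_cdf(1.0 - p)

mu, sigma = 10.0, 2.0  # hypothetical parameters
X = NormalDist(mu, sigma)

x = 13.0
cdf_direct = X.cdf(x)                     # F(x) for N(mu, sigma^2)
cdf_standardized = Phi((x - mu) / sigma)  # Phi((x - mu) / sigma)

p = 0.95
x_p = X.inv_cdf(p)                   # pth quantile
c_1mp = critical_value(X, 1.0 - p)   # critical value c_{1-p}

print(cdf_direct, cdf_standardized)
print(x_p, c_1mp)
```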
Lecture 12
Central Limit Theorem
Lecture 13
Estimation Methods
- Moment estimate
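  - Fundamental basis: $\{X_i\}$ iid with $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$; then
    $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (approximately, for large n) $\Rightarrow \bar{X} \xrightarrow{n\to\infty} \mu$
  - The distribution parameter $\theta$ is related to $\mu$
  - Estimation: $E(X) = \mu = g(\theta) \longrightarrow \theta = h(\mu) \approx h(\bar{X}) = \hat{\theta}$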
- The Method of Maximum Likelihood
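  - Suppose the population $X \sim f(x,\theta)$; for an observed sample,
    $P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = f(x_1,\theta) f(x_2,\theta) \cdots f(x_n,\theta) \equiv L(\theta)$
  - $L(\theta)$ is called the likelihood function
  - The maximum likelihood estimate (mle) $\hat{\theta}$ is chosen so that $L(\hat{\theta}) = \max_\theta L(\theta)$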
-
Solution of mle for uniform distribution
-
find the likelihood function for X∼U(a,b)
L(a,b)=∏i=1nf(xi)=(b−a1)n
-
find mle ∂a∂L(a,b)>0,∂b∂L(a,b)<0
∀i,a<Xi<b⇒a≤min{Xi},b≥max{Xi}
-
The likelihood function is strictly increasing with a but strictly decreasing with b, so the mle are:
a^=min{Xi},b^=max{Xi}
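The uniform mle argument can be illustrated on simulated data. This is a sketch only: the true endpoints 2 and 5, the seed, and the helper names are assumptions, and the log-likelihood $-n\log(b-a)$ is used in place of $L(a,b)$ to avoid floating-point underflow.

```python
import math
import random

def uniform_mle(sample):
    """MLE for U(a, b): the sample minimum and maximum."""
    return min(sample), max(sample)

def log_likelihood(a, b, sample):
    """log L(a, b) = -n log(b - a) if every x_i lies in [a, b], else -inf."""
    if a > min(sample) or b < max(sample):
        return -math.inf
    return -len(sample) * math.log(b - a)

rng = random.Random(0)       # fixed seed so the run is repeatable
a_true, b_true = 2.0, 5.0    # hypothetical true parameters
sample = [rng.uniform(a_true, b_true) for _ in range(1000)]

a_hat, b_hat = uniform_mle(sample)
at_mle = log_likelihood(a_hat, b_hat, sample)
wider = log_likelihood(a_hat - 0.1, b_hat + 0.1, sample)  # still feasible, lower L
narrower = log_likelihood(a_hat + 0.1, b_hat, sample)     # excludes a data point

print(a_hat, b_hat)
print(at_mle, wider, narrower)
```

Any wider interval lowers the likelihood, and any narrower interval excludes an observation and has likelihood zero, which is why the extremes of the sample are the maximizers.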
Lecture 14
Unbiasedness
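- Def: if $E(\hat{\theta}) = \theta$, $\hat{\theta}$ is called unbiased
- Def: $b(\hat{\theta}) = E(\hat{\theta}) - \theta$ is called the bias
- Def: if $b(\hat{\theta}) \ne 0$ but $\lim_{n\to+\infty} b(\hat{\theta}) = 0$, $\hat{\theta}$ is called asymptotically unbiased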
Efficiency
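- Def: if both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased, $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$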
Mean Squared Error (MSE)
- Def: the mean squared error is $M(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$
- The MSE can be computed as $M(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + b^2(\hat{\theta})$
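The decomposition of the MSE into variance plus squared bias can be seen empirically. The sketch below simulates a deliberately biased estimator of $\sigma^2$ (dividing by n instead of n-1) for a hypothetical normal population with $\sigma = 2$; the sample size, seed, and number of repetitions are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

def biased_var(sample):
    """Variance estimator that divides by n (biased for sigma^2)."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

rng = random.Random(1)  # fixed seed so the run is repeatable
theta = 4.0             # true value: sigma^2 for the hypothetical N(0, 2^2)
n, reps = 5, 20000

estimates = [biased_var([rng.gauss(0.0, 2.0) for _ in range(n)]) for _ in range(reps)]

bias = fmean(estimates) - theta                   # empirical b(theta_hat)
variance = pvariance(estimates)                   # empirical Var(theta_hat)
mse = fmean((e - theta) ** 2 for e in estimates)  # empirical M(theta_hat)

print(bias)                      # near -sigma^2 / n = -0.8
print(mse, variance + bias**2)   # the two agree up to rounding
```

On the simulated estimates the identity holds exactly (up to floating-point rounding), because it is an algebraic identity for any set of numbers.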
Lecture 15
Chi-Squared Distribution
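$X_1,\dots,X_n$ independent with $X_i \sim N(0,1)$, $X = \sum_{i=1}^n X_i^2 \sim \chi^2(n)$
- Derivation of the density: $\chi^2(n) = \Gamma\left(\frac{n}{2}, 2\right)$, so
  $f(x;n) = \begin{cases} \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2-1} e^{-x/2}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$
- Expectations: $X \sim \chi^2(n) \Rightarrow E(X) = n$, $\mathrm{Var}(X) = 2n$
- Chi-squared distributions are additive:
  $X \sim \chi^2(n)$, $Y \sim \chi^2(m)$, X, Y indep $\Rightarrow X + Y \sim \chi^2(n+m)$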
t-Distribution
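$X \sim N(0,1)$, $Y \sim \chi^2(n)$, X and Y independent $\Rightarrow T = \dfrac{X}{\sqrt{Y/n}} \sim t(n)$
- Density: $f(t) = \dfrac{\Gamma[(n+1)/2]}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2},\ -\infty < t < +\infty$
- The density is an even function
- The limit is the standard normal: $\lim_{n\to\infty} f(t) = \varphi(t)$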
F-Distribution
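$X \sim \chi^2(n_1)$, $Y \sim \chi^2(n_2)$, X and Y independent $\Rightarrow F = \dfrac{X/n_1}{Y/n_2} \sim F(n_1,n_2)$
- Property: $F \sim F(n_1,n_2) \Rightarrow 1/F \sim F(n_2,n_1)$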
- The limit case is Normal Distribution
Sampling Distribution Theorems
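- Suppose the population is Normal: $X \sim N(\mu,\sigma^2)$
- Th1: $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, or $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
- Th2: $\bar{X}$ and $S^2$ are independent, and
  $\dfrac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \dfrac{(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$
- Th3: $\dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$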
Lecture 16
CI under Normal Distribution
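- find $\mu$
  - $X \sim N(\mu,\sigma^2)$, and $\sigma^2$ is given
    - find the point estimate $\bar{X} \approx \mu$
    - construct $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
    - find $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$
    - solve $-z_{\alpha/2} < Z < z_{\alpha/2} \Leftrightarrow \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
  - $X \sim N(\mu,\sigma^2)$, and $\sigma^2$ is unknown
    - find the point estimate $\bar{X} \approx \mu$
    - construct $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$
    - find $P(-t_{\alpha/2} < T < t_{\alpha/2}) = 1 - \alpha$
    - solve $-t_{\alpha/2} < T < t_{\alpha/2} \Leftrightarrow \bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}$
- find $\sigma$
  - $X \sim N(\mu,\sigma^2)$, and $\mu$ is given
    - construct $W = \sum_{i=1}^n \dfrac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$
    - solve $P(\chi^2_{1-\alpha/2} < W < \chi^2_{\alpha/2}) = 1 - \alpha$
  - $X \sim N(\mu,\sigma^2)$, and $\mu$ is unknown
    - construct $W = \dfrac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^n \dfrac{(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2(n-1)$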
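The known-variance interval for $\mu$ can be sketched in code, together with a small simulation of its coverage. Only the z-interval is shown because the Python standard library has no t or chi-squared quantiles. The population parameters, seed, and function name `z_interval` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, alpha=0.05):
    """(1 - alpha) confidence interval for mu when sigma is known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, the upper critical value
    half = z * sigma / math.sqrt(n)
    xbar = fmean(sample)
    return xbar - half, xbar + half

# A single interval from a small hypothetical sample.
example = z_interval([48.0, 52.0, 50.0, 51.0], sigma=4.0)

# Coverage check: how often does the interval contain the true mu?
rng = random.Random(2)           # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 4.0, 25     # hypothetical population and sample size
reps = 4000
hits = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lo, hi = z_interval(sample, sigma)
    hits += lo < mu < hi
coverage = hits / reps

print(example)
print(coverage)  # should be near 1 - alpha = 0.95
```

The coverage is a property of the procedure over repeated samples; any single interval either contains $\mu$ or does not.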
Sampling Distribution under Two Populations
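- Suppose $X \sim N(\mu_1,\sigma_1^2)$, $Y \sim N(\mu_2,\sigma_2^2)$
- X, Y independent, with $n_1$, $n_2$ samples from X, Y
- Th1: variances known
  $\dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0,1)$
- Th2: variances unknown but equal
  $\dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t(n_1 + n_2 - 2)$, where $S_p^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$
- Th3: Sampling theorem for variance
  $\dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1, n_2-1)$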
Sample variance
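$S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1}$
For a population X of any distribution with $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$, the sample mean $\bar{X} = \frac{1}{n}\sum X_i$ has $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$ (and is approximately $N\left(\mu, \frac{\sigma^2}{n}\right)$ for large n).
$\mathrm{Var}(X) = E(X^2) - E^2(X)$ and $\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - E^2(\bar{X})$
$E(X^2) = \mu^2 + \sigma^2$ and $E(\bar{X}^2) = \mu^2 + \frac{\sigma^2}{n}$
$$
\begin{aligned}
E\left(\sum (X_i - \bar{X})^2\right) &= E\left(\sum (X_i^2 + \bar{X}^2 - 2X_i\bar{X})\right) \\
&= E\left(\sum X_i^2 + n\bar{X}^2 - 2\bar{X}\sum X_i\right) \\
&= E\left(\sum X_i^2 + n\bar{X}^2 - 2n\bar{X}^2\right) \\
&= \sum E(X_i^2) - E(n\bar{X}^2) \\
&= nE(X^2) - nE(\bar{X}^2) \\
&= n(\mu^2 + \sigma^2) - n\left(\mu^2 + \frac{\sigma^2}{n}\right) \\
&= n\sigma^2 - \sigma^2 = (n-1)\sigma^2
\end{aligned}
$$
$\Rightarrow E\left(\sum (X_i - \bar{X})^2\right) = (n-1)\sigma^2$, so $\dfrac{E\left(\sum (X_i - \bar{X})^2\right)}{n-1} = \sigma^2$, i.e. $E\left(\dfrac{\sum (X_i - \bar{X})^2}{n-1}\right) = \sigma^2$
$S^2 = \dfrac{\sum (X_i - \bar{X})^2}{n-1} \Rightarrow E(S^2) = \sigma^2$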
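The result $E(S^2) = \sigma^2$ can be confirmed exactly by enumerating every possible sample from a tiny population. The sketch assumes a hypothetical population that takes the values 1, 2, 3 with equal probability and iid samples of size n = 3.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3]  # hypothetical population, each value equally likely
n = 3               # sample size (iid draws, i.e. with replacement)

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)  # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# All 27 equally likely samples of size n.
samples = list(product(values, repeat=n))
expected_ss = sum(sum_sq_dev(s) for s in samples) / len(samples)

expected_s2 = expected_ss / (n - 1)  # E(S^2), dividing by n - 1
expected_biased = expected_ss / n    # dividing by n instead

print(sigma2, expected_ss)           # 2/3 4/3
print(expected_s2, expected_biased)  # 2/3 4/9
```

Dividing by n - 1 recovers $\sigma^2$ exactly, while dividing by n underestimates it by the factor (n-1)/n.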