Standard errors
Heteroskedasticity robust standard errors
HC0
HC1
HC2
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
生成模拟数据¶
Homoscedasticity standard errors
$$ y=X\beta+\epsilon $$
$y\in R^{n\times1}$
$X=[X_1, X_2,\dots,X_n]^T, X\in R^{n\times k}, X_i\in R^{k\times1}$
$\beta\in R^{k\times1}$
$\epsilon\in R^{n\times1}$
rng = np.random.default_rng(seed=42)
X = np.linspace(1, 10, 10)
X = np.repeat(X, 10)
y = 3 * X + rng.normal(0, 5, size=len(X))
X += rng.normal(0, 1, len(X))
X = X.reshape(-1, 1)
y = y.reshape(-1, 1)
n, k = X.shape[0], 1
plt.scatter(X, y, s=2)
plt.show()
print(y.shape, X.shape,
f'n={n} k={k}')
(100, 1) (100, 1) n=100 k=1
$$ \hat\beta = \arg\min_{\beta} ||y- X\hat\beta||^2 = (y-X\hat\beta)^T (y-X\hat\beta) $$ $$ \downarrow \\ $$ $$ \frac{\partial ||y-X\hat\beta||^2}{\partial \beta} = 2 (-X)^T (y- X\hat\beta ) = 0 $$ $$ \downarrow $$ $$ \hat\beta = (X^TX)^{-1} X^T y $$
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
epsilon_hat = y - X @ beta_hat
beta_hat.shape, epsilon_hat.shape, beta_hat,
((1, 1), (100, 1), array([[2.84726633]]))
$$ \hat\beta=(X^TX)^{-1}X^T(X\beta+\epsilon)=\beta+(X^TX)^{-1}X^T\epsilon $$
$$ \downarrow $$
$$ E[\hat\beta]=\beta + (X^TX)^{-1}X^T E[\epsilon]=\beta $$
- $E[\epsilon]=0$
Sandwich estimator¶
$$ \begin{aligned} V[\hat\beta\mid X]&=V[\hat\beta-\beta\mid X]\\ &=V[(X^TX)^{-1}X^T\epsilon\mid X]\\ &=(X^TX)^{-1}X^T V[\epsilon\mid X] X(X^TX)^{-1}\\ &=(X^TX)^{-1} \{X^T V[\epsilon\mid X] X\}(X^TX)^{-1}\\ &=(X^TX)^{-1} \{X^T E[\epsilon\epsilon^T\mid X] X\}(X^TX)^{-1} \end{aligned} $$
$V[\hat\beta\mid X]\in R^{k\times k}$
$X^T V[\epsilon\mid X] X \in R^{k\times k}$
Standard errors¶
假设球形扰动(同方差、无自相关)
$$ V[\epsilon\mid X]=\sigma^2 I = \begin{bmatrix} \sigma^2 & & \\ & \ddots & \\ & & \sigma^2 \\ \end{bmatrix} $$
$$ \downarrow $$
$$ X^T V[\epsilon\mid X] X = [X_1, X_2,\dots,X_n] \begin{bmatrix} \sigma^2 & & \\ & \ddots & \\ & & \sigma^2 \\ \end{bmatrix} [X_1,X_2,\dots,X_n]^T=\sigma^2 X^TX $$
$$ V[\hat\beta\mid X]=\sigma^2 (X^TX)^{-1} X^TX (X^TX)^{-1}=\sigma^2 (X^TX)^{-1} $$
对扰动项方差 $\sigma^2$ 的无偏估计
$$ s^2\equiv\frac1{n-k}\sum_{i=1}^n e_i^2 = \frac{n}{n-k} \frac1n \sum_{i=1}^n e_i^2 $$
$$ \hat V[\hat\beta\mid X]= s^2 (X^TX)^{-1} $$
# standard errors
s2 = np.sum(epsilon_hat**2) / (n-k)
v_hat_beta_hat = s2 * np.linalg.inv(X.T @ X)
se = v_hat_beta_hat**0.5
print(v_hat_beta_hat, se)
[[0.00520589]] [[0.07215188]]
Heteroskedasticity robust standard errors¶
存在异方差,但无自相关
$$ V[\epsilon\mid X] = \begin{bmatrix} E[\epsilon_1^2\mid X] & & \\ & \ddots & \\ & & E[\epsilon_n^2\mid X] \\ \end{bmatrix} $$
$$ \downarrow $$
$$ \begin{aligned} \{X^TV[\epsilon\mid X]X\}_{k\times k} &=[X_1,X_2,\dots,X_n] \begin{bmatrix} E[\epsilon_1^2\mid X] & & \\ & \ddots & \\ & & E[\epsilon_n^2\mid X] \\ \end{bmatrix} [X_1,X_2,\dots,X_n]^T\\ &=[X_1 E[\epsilon_1^2 \mid X] , X_2 E[\epsilon_2^2 \mid X], \dots, X_n E[\epsilon_n^2 \mid X]]\ [X_1,X_2,\dots,X_n]^T\\ &=\sum_{i=1}^n X_i E[\epsilon_i^2\mid X] X_i^T \end{aligned}\\ $$
$$ \begin{aligned} V[\hat\beta\mid X] & =(X^TX)^{-1} \{\sum_{i=1}^n X_i E[\epsilon_i^2\mid X]X_i^T\} (X^TX)^{-1}\\ & =(X^TX)^{-1} \{\sum_{i=1}^n E[\epsilon_i^2\mid X] X_i X_i^T\} (X^TX)^{-1}\\ \end{aligned} $$
HC0¶
异方差稳健标准误 (Eicker, 1963, 1967; Huber, 1967; White, 1980)
$$ \hat V[\hat\beta\mid X]^{HC0}=(X^TX)^{-1} \{\sum_{i=1}^n e_i^2 X_i X_i^T\} (X^TX)^{-1}\\ $$
v_hat_beta_hat_hc0 = (
np.linalg.inv(X.T @ X)
@ X.T @ np.diag(epsilon_hat.reshape(-1))**2 @ X
@ np.linalg.inv(X.T @ X)
)
se_hc0 = v_hat_beta_hat_hc0**0.5
print(v_hat_beta_hat_hc0, se_hc0)
[[0.0040926]] [[0.0639734]]
HC1¶
小样本校正的异方差稳健标准误(Stata的默认稳健标准误,选择项 robust)(Hinkley, 1977)
$$ \hat V[\hat\beta\mid X]^{HC1}=\frac{n}{n-k} (X^TX)^{-1} \{\sum_{i=1}^n e_i^2 X_i X_i^T\} (X^TX)^{-1}\\ $$
v_hat_beta_hat_hc1 = (
np.linalg.inv(X.T @ X)
@ X.T @ np.diag(epsilon_hat.reshape(-1))**2 @ X
@ np.linalg.inv(X.T @ X)
) * n / (n - k)
se_hc1 = v_hat_beta_hat_hc0**0.5
print(v_hat_beta_hat_hc1, se_hc1)
[[0.00413394]] [[0.0639734]]
HC2¶
在同方差的情况下,$E[e_i^2]=(1-h_i)\sigma^2$,故 $E[\frac{e_i^2}{1-h_i}=\sigma^2$
定义如下稳健标准误(Horn, Horn and Duncan, 1975)
$$ \hat V[\hat\beta\mid X]^{HC2} = (X^TX)^{-1} \{ \sum_{i=1}^n \frac{e_i^2}{1-h_i} X_i X_i^T \} (X^TX)^{-1} $$
{stata}
vce(hc2)
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
v_hat_beta_hat_hc2 = (
np.linalg.inv(X.T @ X)
@ X.T @ (np.diag(epsilon_hat.reshape(-1))**2 / (1 - np.diag(h))) @ X
@ np.linalg.inv(X.T @ X)
)
se_hc2 = v_hat_beta_hat_hc2**0.5
print(v_hat_beta_hat_hc2, se_hc2)
[[0.00415573]] [[0.06446497]]
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
results_hc0 = results.get_robustcov_results(cov_type='HC0')
print(results_hc0.summary())
results_hc1 = results.get_robustcov_results(cov_type='HC1')
print(results_hc1.summary())
results_hc2 = results.get_robustcov_results(cov_type='HC2')
print(results_hc2.summary())
OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.940 Model: OLS Adj. R-squared (uncentered): 0.940 Method: Least Squares F-statistic: 1557. Date: Thu, 17 Oct 2024 Prob (F-statistic): 2.26e-62 Time: 00:04:43 Log-Likelihood: -292.83 No. Observations: 100 AIC: 587.7 Df Residuals: 99 BIC: 590.3 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 2.8473 0.072 39.462 0.000 2.704 2.990 ============================================================================== Omnibus: 1.520 Durbin-Watson: 1.611 Prob(Omnibus): 0.468 Jarque-Bera (JB): 1.361 Skew: 0.142 Prob(JB): 0.506 Kurtosis: 2.505 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors assume that the covariance matrix of the errors is correctly specified. OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.940 Model: OLS Adj. R-squared (uncentered): 0.940 Method: Least Squares F-statistic: 1981. Date: Thu, 17 Oct 2024 Prob (F-statistic): 2.85e-67 Time: 00:04:43 Log-Likelihood: -292.83 No. Observations: 100 AIC: 587.7 Df Residuals: 99 BIC: 590.3 Df Model: 1 Covariance Type: HC0 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 2.8473 0.064 44.507 0.000 2.720 2.974 ============================================================================== Omnibus: 1.520 Durbin-Watson: 1.611 Prob(Omnibus): 0.468 Jarque-Bera (JB): 1.361 Skew: 0.142 Prob(JB): 0.506 Kurtosis: 2.505 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors are heteroscedasticity robust (HC0) OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.940 Model: OLS Adj. R-squared (uncentered): 0.940 Method: Least Squares F-statistic: 1961. Date: Thu, 17 Oct 2024 Prob (F-statistic): 4.57e-67 Time: 00:04:43 Log-Likelihood: -292.83 No. Observations: 100 AIC: 587.7 Df Residuals: 99 BIC: 590.3 Df Model: 1 Covariance Type: HC1 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 2.8473 0.064 44.284 0.000 2.720 2.975 ============================================================================== Omnibus: 1.520 Durbin-Watson: 1.611 Prob(Omnibus): 0.468 Jarque-Bera (JB): 1.361 Skew: 0.142 Prob(JB): 0.506 Kurtosis: 2.505 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors are heteroscedasticity robust (HC1) OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.940 Model: OLS Adj. R-squared (uncentered): 0.940 Method: Least Squares F-statistic: 1951. Date: Thu, 17 Oct 2024 Prob (F-statistic): 5.86e-67 Time: 00:04:43 Log-Likelihood: -292.83 No. Observations: 100 AIC: 587.7 Df Residuals: 99 BIC: 590.3 Df Model: 1 Covariance Type: HC2 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 2.8473 0.064 44.168 0.000 2.719 2.975 ============================================================================== Omnibus: 1.520 Durbin-Watson: 1.611 Prob(Omnibus): 0.468 Jarque-Bera (JB): 1.361 Skew: 0.142 Prob(JB): 0.506 Kurtosis: 2.505 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors are heteroscedasticity robust (HC2)
生成模拟数据¶
异方差
rng = np.random.default_rng(seed=42)
X = np.linspace(1, 10, 10)
X = np.repeat(X, 10)
y = 3 * X + rng.normal(0, X*7)
X += rng.normal(0, 1, len(X))
plt.scatter(X, y,
s=1, c='b')
<matplotlib.collections.PathCollection at 0x2d102301f70>
beta_hat = np.matmul(
np.matmul(
np.linalg.inv(
np.matmul(
X.reshape(-1, 1).T,
X.reshape(-1, 1)
)
),
X.reshape(-1, 1).T,
),
y
)
print(beta_hat, beta_hat.shape)
[[2.84726633]] (1, 1)
model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
results_hc0 = results.get_robustcov_results(cov_type='HC0')
print(results_hc0.summary())
results_hc1 = results.get_robustcov_results(cov_type='HC1')
print(results_hc1.summary())
results_hc2 = results.get_robustcov_results(cov_type='HC2')
print(results_hc2.summary())
OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.101 Model: OLS Adj. R-squared (uncentered): 0.092 Method: Least Squares F-statistic: 11.14 Date: Wed, 16 Oct 2024 Prob (F-statistic): 0.00119 Time: 00:50:05 Log-Likelihood: -490.56 No. Observations: 100 AIC: 983.1 Df Residuals: 99 BIC: 985.7 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 1.7398 0.521 3.338 0.001 0.706 2.774 ============================================================================== Omnibus: 11.728 Durbin-Watson: 1.554 Prob(Omnibus): 0.003 Jarque-Bera (JB): 12.448 Skew: -0.729 Prob(JB): 0.00198 Kurtosis: 3.930 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors assume that the covariance matrix of the errors is correctly specified. OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.101 Model: OLS Adj. R-squared (uncentered): 0.092 Method: Least Squares F-statistic: 7.078 Date: Wed, 16 Oct 2024 Prob (F-statistic): 0.00910 Time: 00:50:05 Log-Likelihood: -490.56 No. Observations: 100 AIC: 983.1 Df Residuals: 99 BIC: 985.7 Df Model: 1 Covariance Type: HC0 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 1.7398 0.654 2.660 0.009 0.442 3.037 ============================================================================== Omnibus: 11.728 Durbin-Watson: 1.554 Prob(Omnibus): 0.003 Jarque-Bera (JB): 12.448 Skew: -0.729 Prob(JB): 0.00198 Kurtosis: 3.930 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors are heteroscedasticity robust (HC0) OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.101 Model: OLS Adj. R-squared (uncentered): 0.092 Method: Least Squares F-statistic: 7.007 Date: Wed, 16 Oct 2024 Prob (F-statistic): 0.00945 Time: 00:50:05 Log-Likelihood: -490.56 No. Observations: 100 AIC: 983.1 Df Residuals: 99 BIC: 985.7 Df Model: 1 Covariance Type: HC1 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 1.7398 0.657 2.647 0.009 0.436 3.044 ============================================================================== Omnibus: 11.728 Durbin-Watson: 1.554 Prob(Omnibus): 0.003 Jarque-Bera (JB): 12.448 Skew: -0.729 Prob(JB): 0.00198 Kurtosis: 3.930 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors are heteroscedasticity robust (HC1) OLS Regression Results ======================================================================================= Dep. Variable: y R-squared (uncentered): 0.101 Model: OLS Adj. R-squared (uncentered): 0.092 Method: Least Squares F-statistic: 6.933 Date: Wed, 16 Oct 2024 Prob (F-statistic): 0.00982 Time: 00:50:05 Log-Likelihood: -490.56 No. Observations: 100 AIC: 983.1 Df Residuals: 99 BIC: 985.7 Df Model: 1 Covariance Type: HC2 ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ x1 1.7398 0.661 2.633 0.010 0.429 3.051 ============================================================================== Omnibus: 11.728 Durbin-Watson: 1.554 Prob(Omnibus): 0.003 Jarque-Bera (JB): 12.448 Skew: -0.729 Prob(JB): 0.00198 Kurtosis: 3.930 Cond. No. 1.00 ============================================================================== Notes: [1] R² is computed without centering (uncentered) since the model does not contain a constant. [2] Standard Errors are heteroscedasticity robust (HC2)