import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(seed=42)

X = np.linspace(1, 10, 10)
X = np.repeat(X, 10)

y = 3 * X + rng.normal(0, 5, size=len(X))
X += rng.normal(0, 1, len(X))

X = X.reshape(-1, 1)
y = y.reshape(-1, 1)

n, k = X.shape[0], 1

plt.scatter(X, y, s=2)
plt.show()

print(y.shape, X.shape, 
      f'n={n} k={k}')

(100, 1) (100, 1) n=100 k=1

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
epsilon_hat = y - X @ beta_hat

beta_hat.shape, epsilon_hat.shape, beta_hat,

((1, 1), (100, 1), array([[2.84726633]]))

# standard errors
s2 = np.sum(epsilon_hat**2) / (n-k)

v_hat_beta_hat = s2 * np.linalg.inv(X.T @ X)

se = v_hat_beta_hat**0.5

print(v_hat_beta_hat, se)

[[0.00520589]] [[0.07215188]]

v_hat_beta_hat_hc0 = (
    np.linalg.inv(X.T @ X)
    @ X.T @ np.diag(epsilon_hat.reshape(-1))**2 @ X
    @ np.linalg.inv(X.T @ X)
)

se_hc0 = v_hat_beta_hat_hc0**0.5

print(v_hat_beta_hat_hc0, se_hc0)

[[0.0040926]] [[0.0639734]]

v_hat_beta_hat_hc1 = (
    np.linalg.inv(X.T @ X)
    @ X.T @ np.diag(epsilon_hat.reshape(-1))**2 @ X
    @ np.linalg.inv(X.T @ X)
) * n / (n - k)

se_hc1 = v_hat_beta_hat_hc0**0.5

print(v_hat_beta_hat_hc1, se_hc1)

[[0.00413394]] [[0.0639734]]

h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

v_hat_beta_hat_hc2 = (
    np.linalg.inv(X.T @ X)
    @ X.T @ (np.diag(epsilon_hat.reshape(-1))**2 / (1 - np.diag(h))) @ X
    @ np.linalg.inv(X.T @ X)
)

se_hc2 = v_hat_beta_hat_hc2**0.5

print(v_hat_beta_hat_hc2, se_hc2)

[[0.00415573]] [[0.06446497]]

model = sm.OLS(y, X)
results = model.fit()
print(results.summary())

results_hc0 = results.get_robustcov_results(cov_type='HC0')
print(results_hc0.summary())

results_hc1 = results.get_robustcov_results(cov_type='HC1')
print(results_hc1.summary())

results_hc2 = results.get_robustcov_results(cov_type='HC2')
print(results_hc2.summary())

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.940
Model:                            OLS   Adj. R-squared (uncentered):              0.940
Method:                 Least Squares   F-statistic:                              1557.
Date:                Thu, 17 Oct 2024   Prob (F-statistic):                    2.26e-62
Time:                        00:04:43   Log-Likelihood:                         -292.83
No. Observations:                 100   AIC:                                      587.7
Df Residuals:                      99   BIC:                                      590.3
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             2.8473      0.072     39.462      0.000       2.704       2.990
==============================================================================
Omnibus:                        1.520   Durbin-Watson:                   1.611
Prob(Omnibus):                  0.468   Jarque-Bera (JB):                1.361
Skew:                           0.142   Prob(JB):                        0.506
Kurtosis:                       2.505   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.940
Model:                            OLS   Adj. R-squared (uncentered):              0.940
Method:                 Least Squares   F-statistic:                              1981.
Date:                Thu, 17 Oct 2024   Prob (F-statistic):                    2.85e-67
Time:                        00:04:43   Log-Likelihood:                         -292.83
No. Observations:                 100   AIC:                                      587.7
Df Residuals:                      99   BIC:                                      590.3
Df Model:                           1                                                  
Covariance Type:                  HC0                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             2.8473      0.064     44.507      0.000       2.720       2.974
==============================================================================
Omnibus:                        1.520   Durbin-Watson:                   1.611
Prob(Omnibus):                  0.468   Jarque-Bera (JB):                1.361
Skew:                           0.142   Prob(JB):                        0.506
Kurtosis:                       2.505   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors are heteroscedasticity robust (HC0)
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.940
Model:                            OLS   Adj. R-squared (uncentered):              0.940
Method:                 Least Squares   F-statistic:                              1961.
Date:                Thu, 17 Oct 2024   Prob (F-statistic):                    4.57e-67
Time:                        00:04:43   Log-Likelihood:                         -292.83
No. Observations:                 100   AIC:                                      587.7
Df Residuals:                      99   BIC:                                      590.3
Df Model:                           1                                                  
Covariance Type:                  HC1                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             2.8473      0.064     44.284      0.000       2.720       2.975
==============================================================================
Omnibus:                        1.520   Durbin-Watson:                   1.611
Prob(Omnibus):                  0.468   Jarque-Bera (JB):                1.361
Skew:                           0.142   Prob(JB):                        0.506
Kurtosis:                       2.505   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors are heteroscedasticity robust (HC1)
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.940
Model:                            OLS   Adj. R-squared (uncentered):              0.940
Method:                 Least Squares   F-statistic:                              1951.
Date:                Thu, 17 Oct 2024   Prob (F-statistic):                    5.86e-67
Time:                        00:04:43   Log-Likelihood:                         -292.83
No. Observations:                 100   AIC:                                      587.7
Df Residuals:                      99   BIC:                                      590.3
Df Model:                           1                                                  
Covariance Type:                  HC2                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             2.8473      0.064     44.168      0.000       2.719       2.975
==============================================================================
Omnibus:                        1.520   Durbin-Watson:                   1.611
Prob(Omnibus):                  0.468   Jarque-Bera (JB):                1.361
Skew:                           0.142   Prob(JB):                        0.506
Kurtosis:                       2.505   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors are heteroscedasticity robust (HC2)

rng = np.random.default_rng(seed=42)

X = np.linspace(1, 10, 10)
X = np.repeat(X, 10)

y = 3 * X + rng.normal(0, X*7)
X += rng.normal(0, 1, len(X))
plt.scatter(X, y, 
            s=1, c='b')

<matplotlib.collections.PathCollection at 0x2d102301f70>

beta_hat = np.matmul(
    np.matmul(
        np.linalg.inv(
            np.matmul(
                X.reshape(-1, 1).T,
                X.reshape(-1, 1)
            )
        ),
        X.reshape(-1, 1).T,
    ),
    y
)

print(beta_hat, beta_hat.shape)

[[2.84726633]] (1, 1)

model = sm.OLS(y, X)
results = model.fit()
print(results.summary())

results_hc0 = results.get_robustcov_results(cov_type='HC0')
print(results_hc0.summary())

results_hc1 = results.get_robustcov_results(cov_type='HC1')
print(results_hc1.summary())

results_hc2 = results.get_robustcov_results(cov_type='HC2')
print(results_hc2.summary())

                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.101
Model:                            OLS   Adj. R-squared (uncentered):              0.092
Method:                 Least Squares   F-statistic:                              11.14
Date:                Wed, 16 Oct 2024   Prob (F-statistic):                     0.00119
Time:                        00:50:05   Log-Likelihood:                         -490.56
No. Observations:                 100   AIC:                                      983.1
Df Residuals:                      99   BIC:                                      985.7
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.7398      0.521      3.338      0.001       0.706       2.774
==============================================================================
Omnibus:                       11.728   Durbin-Watson:                   1.554
Prob(Omnibus):                  0.003   Jarque-Bera (JB):               12.448
Skew:                          -0.729   Prob(JB):                      0.00198
Kurtosis:                       3.930   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.101
Model:                            OLS   Adj. R-squared (uncentered):              0.092
Method:                 Least Squares   F-statistic:                              7.078
Date:                Wed, 16 Oct 2024   Prob (F-statistic):                     0.00910
Time:                        00:50:05   Log-Likelihood:                         -490.56
No. Observations:                 100   AIC:                                      983.1
Df Residuals:                      99   BIC:                                      985.7
Df Model:                           1                                                  
Covariance Type:                  HC0                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.7398      0.654      2.660      0.009       0.442       3.037
==============================================================================
Omnibus:                       11.728   Durbin-Watson:                   1.554
Prob(Omnibus):                  0.003   Jarque-Bera (JB):               12.448
Skew:                          -0.729   Prob(JB):                      0.00198
Kurtosis:                       3.930   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors are heteroscedasticity robust (HC0)
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.101
Model:                            OLS   Adj. R-squared (uncentered):              0.092
Method:                 Least Squares   F-statistic:                              7.007
Date:                Wed, 16 Oct 2024   Prob (F-statistic):                     0.00945
Time:                        00:50:05   Log-Likelihood:                         -490.56
No. Observations:                 100   AIC:                                      983.1
Df Residuals:                      99   BIC:                                      985.7
Df Model:                           1                                                  
Covariance Type:                  HC1                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.7398      0.657      2.647      0.009       0.436       3.044
==============================================================================
Omnibus:                       11.728   Durbin-Watson:                   1.554
Prob(Omnibus):                  0.003   Jarque-Bera (JB):               12.448
Skew:                          -0.729   Prob(JB):                      0.00198
Kurtosis:                       3.930   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors are heteroscedasticity robust (HC1)
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.101
Model:                            OLS   Adj. R-squared (uncentered):              0.092
Method:                 Least Squares   F-statistic:                              6.933
Date:                Wed, 16 Oct 2024   Prob (F-statistic):                     0.00982
Time:                        00:50:05   Log-Likelihood:                         -490.56
No. Observations:                 100   AIC:                                      983.1
Df Residuals:                      99   BIC:                                      985.7
Df Model:                           1                                                  
Covariance Type:                  HC2                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.7398      0.661      2.633      0.010       0.429       3.051
==============================================================================
Omnibus:                       11.728   Durbin-Watson:                   1.554
Prob(Omnibus):                  0.003   Jarque-Bera (JB):               12.448
Skew:                          -0.729   Prob(JB):                      0.00198
Kurtosis:                       3.930   Cond. No.                         1.00
==============================================================================

Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors are heteroscedasticity robust (HC2)

生成模拟数据¶

Sandwich estimator¶

Standard errors¶

Heteroskedasticity robust standard errors¶

HC0¶

HC1¶

HC2¶

生成模拟数据¶

References¶