7 Multiple linear regression I
7.1 Multiple Regression Models
7.1.1 First-Order Model with Two Predictor Variables
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\]
- regression surface/response surface
- additive effects: the effects of the two predictor variables are additive, i.e., they do not interact
- partial regression coefficients
7.1.2 First-Order Model with More than Two Predictor Variables
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... + \beta_{p-1}X_{i,p-1}+\varepsilon_i\]
\[Y_i = \beta_0 + \displaystyle\sum_{k=1}^{p-1}\beta_{k}X_{ik} + \varepsilon_i\]
\[Y_i = \displaystyle\sum_{k=0}^{p-1}\beta_{k}X_{ik} + \varepsilon_i,\text{ where }X_{i0} \equiv 1\]
- hyperplane: the response function, the analogue of a plane in more than two dimensions.
7.1.3 General Linear Regression Model
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... + \beta_{p-1}X_{i,p-1}+\varepsilon_i\]
where:
- \(\beta_0, \beta_1, ..., \beta_{p-1}\) are parameters
- \(X_{i1}, ..., X_{i,p-1}\) are known constants
- \(\varepsilon_i\) are independent \(N(0,\sigma^2)\)
- \(i = 1, ..., n\)
- \(p-1\) predictor variables
- qualitative predictor variables
- polynomial regression:
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i1}^2 + \varepsilon_i\]
can be written as
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\text{ if }X_{i2} = X_{i1}^2\]
- transformed variables:
\[\log Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\]
can be treated as a general linear regression model:
\[Y_i^{'} = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\text{ if }Y_i^{'}=\log Y_i\]
- interaction effects
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i1}X_{i2} + \varepsilon_i\]
can be written as follows:
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i\text{ let }X_{i3} = X_{i1}X_{i2}\]
- combination of cases
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i1}^2 + \beta_3X_{i2} + \beta_4X_{i2}^2 + \beta_5X_{i1}X_{i2} + \varepsilon_i\]
can be written as
\[Y_i = \beta_0 + \beta_1Z_{i1} + \beta_2Z_{i2} + \beta_3Z_{i3} + \beta_4Z_{i4} + \beta_5Z_{i5} + \varepsilon_i\]
where we define \(Z_{i1}=X_{i1}\), \(Z_{i2}=X_{i1}^2\), \(Z_{i3}=X_{i2}\), \(Z_{i4}=X_{i2}^2\), and \(Z_{i5}=X_{i1}X_{i2}\) (see the sketch after this list)
- meaning of linear model: the term linear refers to the model being linear in the parameters; it need not be linear in the predictor variables
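A minimal sketch of the Z-variable construction referenced above, using NumPy and small made-up values for \(X_1\) and \(X_2\) (the data and variable names are illustrative assumptions, not part of these notes):

```python
import numpy as np

# Hypothetical observations of two predictor variables (n = 6 cases).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Z variables for the combination-of-cases model:
# Z1 = X1, Z2 = X1^2, Z3 = X2, Z4 = X2^2, Z5 = X1*X2
Z = np.column_stack([X1, X1**2, X2, X2**2, X1 * X2])

# With a leading column of ones for beta_0, this is just a first-order
# design matrix in the Z variables: the model is linear in the parameters
# even though it is quadratic (with interaction) in X1 and X2.
X_design = np.column_stack([np.ones_like(X1), Z])
print(X_design.shape)  # (6, 6): n x p with p = 6
```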
7.2 General Linear Regression Model in Matrix Terms
\[\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}}\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}\]
where
\[\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n\end{bmatrix}, \quad \underset{n \times p}{\mathbf{X}} = \begin{bmatrix}1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1}\end{bmatrix}, \quad \underset{p \times 1}{\boldsymbol{\beta}} = \begin{bmatrix}\beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}\text{ and }\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix}\varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n\end{bmatrix}\]
- \(\mathbf{Y}\) is a vector of responses
- \(\boldsymbol{\beta}\) is a vector of parameters
- \(\mathbf{X}\) is a matrix of constants
- \(\boldsymbol{\varepsilon}\) is a vector of independent normal random variables
- expectation \(E(\boldsymbol{\varepsilon}) = \mathbf{0}\)
- variance-covariance matrix: \(\sigma^2(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}\)
Consequently, \(E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}\) and \(\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}\).
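As a concrete, purely illustrative check of these assumptions, the sketch below simulates responses from \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\) with \(\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})\); the design, the parameter values, and the use of NumPy are assumptions added here, not part of the original notes. The same simulated data are reused in the later sketches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n = 20 cases, p - 1 = 2 predictors plus a column of ones.
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

# Assumed "true" parameters and error standard deviation (illustrative only).
beta = np.array([1.5, 2.0, -0.5])
sigma = 1.0

# Y = X beta + eps with eps ~ N(0, sigma^2 I),
# so E(Y) = X beta and sigma^2(Y) = sigma^2 I.
eps = rng.normal(0.0, sigma, n)
Y = X @ beta + eps
```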
7.3 Estimation of Regression Coefficients
The least squares criterion for the general linear regression model is:
\[Q = \displaystyle\sum_{i=1}^{n}(Y_i - \beta_{0} - \beta_1X_{i1} - ... - \beta_{p-1}X_{i,p-1})^2\]
The least squares estimators are the values of the regression coefficients that minimize \(Q\); in matrix terms, the normal equations and their solution are:
\[\mathbf{X}'\mathbf{Xb} = \mathbf{X}'\mathbf{Y}\text{ and }\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]
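A minimal sketch of the matrix solution, assuming the simulated \(\mathbf{X}\) and \(\mathbf{Y}\) from the previous sketch; solving the normal equations directly (rather than forming \((\mathbf{X}'\mathbf{X})^{-1}\) explicitly) is a numerical convenience, not something prescribed by the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

# Solve the normal equations X'X b = X'Y for the least squares estimates b.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)  # estimates of beta_0, beta_1, beta_2
```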
For the normal error regression model, the method of maximum likelihood leads to the same estimators; they are obtained by maximizing the likelihood function:
\[L(\boldsymbol{\beta},\sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\displaystyle\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1X_{i1} - ... - \beta_{p-1}X_{i,p-1})^2\right]\]
7.4 Fitted Values and Residuals
\[\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]
or
\[\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}\text{, with }\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\]
The matrix \(\mathbf{H}\) is called the hat matrix. It is symmetric and has the special property of idempotency:
\[\mathbf{HH} = \mathbf{H}\]
The residuals are:
\[\mathbf{e} = \mathbf{Y} - \mathbf{\hat{Y}} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I}-\mathbf{H})\mathbf{Y}\]
and the matrix \(\mathbf{I}-\mathbf{H}\), like the matrix \(\mathbf{H}\), is symmetric and idempotent.
The variance-covariance matrix of the residuals \(\mathbf{e}\) is:
\[\sigma^2(\mathbf{e}) = \sigma^2 \times (\mathbf{I}-\mathbf{H})\]
and is estimated by:
\[s^2(\mathbf{e}) = MSE \times (\mathbf{I}-\mathbf{H})\]
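A sketch of the hat-matrix relations above, again assuming the simulated data; the symmetry and idempotency checks simply confirm the stated properties numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

# Hat matrix H = X (X'X)^{-1} X' (explicit inverse is fine at this size).
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # symmetric, idempotent

Y_hat = H @ Y                    # fitted values
e = (np.eye(n) - H) @ Y          # residuals

MSE = (e @ e) / (n - p)          # SSE / (n - p)
s2_e = MSE * (np.eye(n) - H)     # estimated variance-covariance matrix of e
```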
7.5 Analysis of Variance Results
\[SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]
\[SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\]
\[SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]
\[MSR = \frac{SSR}{p-1}\]
\[MSE=\frac{SSE}{n-p}\]
Source of Variation | SS | df | MS |
---|---|---|---|
Regression | \(SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\) | p-1 | \(MSR = \frac{SSR}{p-1}\) |
Error | \(SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\) | n-p | \(MSE = \frac{SSE}{n-p}\) |
Total | \(SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\) | n-1 | |
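The sums of squares as quadratic forms, computed on the simulated data (a sketch, not part of the notes); the check confirms \(SSTO = SSR + SSE\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)
J = np.ones((n, n))              # n x n matrix of ones

SSTO = Y @ (I - J / n) @ Y
SSE = Y @ (I - H) @ Y
SSR = Y @ (H - J / n) @ Y
print(np.isclose(SSTO, SSR + SSE))  # the ANOVA decomposition holds

MSR = SSR / (p - 1)
MSE = SSE / (n - p)
```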
- F Test for Regression Relation
\[H_0: \beta_1=\beta_2=...=\beta_{p-1}=0\]
\[H_1: \text{not all }\beta_{k}\ (k = 1, ..., p-1)\text{ equal zero}\]
\[F^* = \frac{MSR}{MSE} \sim F(p-1,n-p)\text{ under }H_0\]
Large values of \(F^*\) lead to rejection of \(H_0\).
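A sketch of the F test on the simulated data, using scipy.stats for the F tail probability (an added dependency, assumed here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
SSR = Y @ (H - J / n) @ Y
SSE = Y @ (np.eye(n) - H) @ Y

F_star = (SSR / (p - 1)) / (SSE / (n - p))
p_value = stats.f.sf(F_star, p - 1, n - p)  # P(F(p-1, n-p) > F*)
print(F_star, p_value)                      # reject H0 when p_value < alpha
```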
- Coefficient of Multiple Determination
\[R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}\]
Adding more X variables to the regression model can only increase \(R^2\) and never reduce it, because \(SSE\) can never become larger when another X variable is included.
The adjusted coefficient of multiple determination, denoted by \(R^2_a\):
\[R^2_a = 1-\frac{\frac{SSE}{n-p}}{\frac{SSTO}{n-1}} = 1-(\frac{n-1}{n-p})\frac{SSE}{SSTO}\]
- Coefficient of Multiple Correlation
\[R = \sqrt{R^2}\]
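A short sketch computing \(R^2\), \(R^2_a\), and \(R\) on the simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
SSTO = Y @ (np.eye(n) - J / n) @ Y
SSE = Y @ (np.eye(n) - H) @ Y

R2 = 1 - SSE / SSTO
R2_a = 1 - ((n - 1) / (n - p)) * SSE / SSTO  # adjusts for the number of predictors
R = np.sqrt(R2)
print(R2, R2_a, R)
```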
7.6 Inferences about Regression Parameters
\[E(\mathbf{b}) = \boldsymbol{\beta}\]
\[\sigma^2(\mathbf{b}) = \sigma^2 \times (\mathbf{X}'\mathbf{X})^{-1} \]
\[s^2(\mathbf{b}) = MSE \times (\mathbf{X}'\mathbf{X})^{-1}\]
- Interval Estimation of \(\beta_k\)
\[\frac{b_k - \beta_k}{s(b_k)} \sim t(n-p),\quad k = 0, 1, ..., p-1\]
\[b_k \pm t(1-\frac{\alpha}{2};n-p)s(b_k)\]
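A sketch of the interval estimates on the simulated data, using scipy.stats for the t percentile (an assumption added here, not part of the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)

s2_b = MSE * XtX_inv          # estimated variance-covariance matrix of b
s_b = np.sqrt(np.diag(s2_b))  # standard errors s(b_k)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - p)
ci = np.column_stack([b - t_crit * s_b, b + t_crit * s_b])
print(ci)                     # one 95% interval per coefficient
```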
- Tests for \(\beta_k\)
\[H_0: \beta_k = 0\]
\[H_1: \beta_k \neq 0\]
\[t^* = \frac{b_k}{s(b_k)}\]
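The corresponding t tests on the simulated data (a sketch; two-sided P-values via scipy.stats):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)
s_b = np.sqrt(np.diag(MSE * XtX_inv))

t_star = b / s_b                                  # t* = b_k / s(b_k)
p_values = 2 * stats.t.sf(np.abs(t_star), n - p)  # two-sided P-values
print(t_star, p_values)                           # small p-value: reject H0: beta_k = 0
```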
- Joint Inferences
The Bonferroni joint confidence intervals can be used to estimate several regression coefficients simultaneously.
If \(g\) parameters are to be estimated jointly with family confidence coefficient \(1-\alpha\), the Bonferroni confidence limits are:
\[b_k \pm t(1-\frac{\alpha}{2g};n-p)s(b_k)\]
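A sketch of the Bonferroni joint intervals on the simulated data, here for the \(g = 2\) slope coefficients (the choice of \(g\) and the data are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)
s_b = np.sqrt(np.diag(MSE * XtX_inv))

# Joint intervals for beta_1 and beta_2 with family confidence 1 - alpha = 0.95.
alpha, g = 0.05, 2
B = stats.t.ppf(1 - alpha / (2 * g), n - p)  # Bonferroni multiplier
for k in (1, 2):
    print(k, (b[k] - B * s_b[k], b[k] + B * s_b[k]))
```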