7 Multiple linear regression I
7.1 Multiple Regression Models
7.1.1 First-Order Model with Two Predictor Variables
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\]
- regression surface/response surface
- additive effects: the effects of the two predictor variables are additive, i.e., they do not interact
- partial regression coefficients
7.1.2 First-Order Model with More than Two Predictor Variables
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... + \beta_{p-1}X_{i,p-1}+\varepsilon_i\]
\[Y_i = \beta_0 + \displaystyle\sum_{k=1}^{p-1}\beta_{k}X_{ik} + \varepsilon_i\]
\[Y_i = \displaystyle\sum_{k=0}^{p-1}\beta_{k}X_{ik} + \varepsilon_i,\text{ where }X_{i0} \equiv 1\]
- hyperplane: the response function, the analogue of a plane in more than two dimensions.
7.1.3 General Linear Regression Model
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... + \beta_{p-1}X_{i,p-1}+\varepsilon_i\]
where:
- \(\beta_0, \beta_1, ..., \beta_{p-1}\) are parameters
- \(X_{i1}, ..., X_{i,p-1}\) are known constants
- \(\varepsilon_i\) are independent \(N(0,\sigma^2)\)
- \(i = 1, ..., n\)
- \(p-1\) predictor variables
- qualitative predictor variables
- polynomial regression:
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i1}^2 + \varepsilon_i\]
can be written as
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\text{ if }X_{i2} = X_{i1}^2\]
- transformed variables:
\[\log Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\]
can be treated as a general linear regression model:
\[Y_i^{'} = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\text{ if }Y_i^{'}=\log Y_i\]
- interaction effects
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i1}X_{i2} + \varepsilon_i\]
can be written as follows:
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i\text{ let }X_{i3} = X_{i1}X_{i2}\]
- combination of cases
\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i1}^2 + \beta_3X_{i2} + \beta_4X_{i2}^2 + \beta_5X_{i1}X_{i2} + \varepsilon_i\]
can be written as
\[Y_i = \beta_0 + \beta_1Z_{i1} + \beta_2Z_{i2} + \beta_3Z_{i3} + \beta_4Z_{i4} + \beta_5Z_{i5} + \varepsilon_i\]
where we define \(Z_{i1}=X_{i1}\), \(Z_{i2}=X_{i1}^2\), \(Z_{i3}=X_{i2}\), \(Z_{i4}=X_{i2}^2\), and \(Z_{i5}=X_{i1}X_{i2}\) (see the sketch after this list)
- meaning of linear model: the term linear refers to the model being linear in the parameters; it need not be linear in the predictor variables
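A minimal sketch of the Z-variable construction referenced above, using NumPy and small made-up values for \(X_1\) and \(X_2\) (the data and variable names are illustrative assumptions, not part of these notes):

```python
import numpy as np

# Hypothetical observations of two predictor variables (n = 6 cases).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Z variables for the combination-of-cases model:
# Z1 = X1, Z2 = X1^2, Z3 = X2, Z4 = X2^2, Z5 = X1*X2
Z = np.column_stack([X1, X1**2, X2, X2**2, X1 * X2])

# With a leading column of ones for beta_0, this is just a first-order
# design matrix in the Z variables: the model is linear in the parameters
# even though it is quadratic (with interaction) in X1 and X2.
X_design = np.column_stack([np.ones_like(X1), Z])
print(X_design.shape)  # (6, 6): n x p with p = 6
```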
7.2 General Linear Regression Model in Matrix Terms
\[\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}}\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}\]
where
\[\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n\end{bmatrix}, \quad \underset{n \times p}{\mathbf{X}} = \begin{bmatrix}1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1}\end{bmatrix}, \quad \underset{p \times 1}{\boldsymbol{\beta}} = \begin{bmatrix}\beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}\text{ and }\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix}\varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n\end{bmatrix}\]
- \(\mathbf{Y}\) is a vector of responses
- \(\boldsymbol{\beta}\) is a vector of parameters
- \(\mathbf{X}\) is a matrix of constants
- \(\boldsymbol{\varepsilon}\) is a vector of independent normal random variables
- expectation \(E(\boldsymbol{\varepsilon}) = \mathbf{0}\)
- variance-covariance matrix: \(\sigma^2(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}\)
Consequently, \(E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}\) and \(\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}\).
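As a concrete, purely illustrative check of these assumptions, the sketch below simulates responses from \(\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\) with \(\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})\); the design, the parameter values, and the use of NumPy are assumptions added here, not part of the original notes. The same simulated data are reused in the later sketches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: n = 20 cases, p - 1 = 2 predictors plus a column of ones.
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

# Assumed "true" parameters and error standard deviation (illustrative only).
beta = np.array([1.5, 2.0, -0.5])
sigma = 1.0

# Y = X beta + eps with eps ~ N(0, sigma^2 I),
# so E(Y) = X beta and sigma^2(Y) = sigma^2 I.
eps = rng.normal(0.0, sigma, n)
Y = X @ beta + eps
```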
7.3 Estimation of Regression Coefficients
The least squares criterion for the general linear regression model is:
\[Q = \displaystyle\sum_{i=1}^{n}(Y_i - \beta_{0} - \beta_1X_{i1} - ... - \beta_{p-1}X_{i,p-1})^2\]
The least squares estimators are the values of the regression coefficients that minimize \(Q\); in matrix terms, the normal equations and their solution are:
\[\mathbf{X}'\mathbf{Xb} = \mathbf{X}'\mathbf{Y}\text{ and }\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]
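A minimal sketch of the matrix solution, assuming the simulated \(\mathbf{X}\) and \(\mathbf{Y}\) from the previous sketch; solving the normal equations directly (rather than forming \((\mathbf{X}'\mathbf{X})^{-1}\) explicitly) is a numerical convenience, not something prescribed by the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

# Solve the normal equations X'X b = X'Y for the least squares estimates b.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)  # estimates of beta_0, beta_1, beta_2
```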
For the normal error regression model, the method of maximum likelihood leads to the same estimators; they are obtained by maximizing the likelihood function:
\[L(\boldsymbol{\beta},\sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\displaystyle\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1X_{i1} - ... - \beta_{p-1}X_{i,p-1})^2\right]\]
7.4 Fitted Values and Residuals
\[\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]
or
\[\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}\text{, with }\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\]
The matrix \(\mathbf{H}\) is called the hat matrix. It is symmetric and has the special property of idempotency:
\[\mathbf{HH} = \mathbf{H}\]
The residuals are:
\[\mathbf{e} = \mathbf{Y} - \mathbf{\hat{Y}} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I}-\mathbf{H})\mathbf{Y}\]
and the matrix \(\mathbf{I}-\mathbf{H}\), like the matrix \(\mathbf{H}\), is symmetric and idempotent.
The variance-covariance matrix of the residuals \(\mathbf{e}\) is:
\[\sigma^2(\mathbf{e}) = \sigma^2 \times (\mathbf{I}-\mathbf{H})\]
and is estimated by:
\[s^2(\mathbf{e}) = MSE \times (\mathbf{I}-\mathbf{H})\]
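A sketch of the hat-matrix relations above, again assuming the simulated data; the symmetry and idempotency checks simply confirm the stated properties numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

# Hat matrix H = X (X'X)^{-1} X' (explicit inverse is fine at this size).
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # symmetric, idempotent

Y_hat = H @ Y                    # fitted values
e = (np.eye(n) - H) @ Y          # residuals

MSE = (e @ e) / (n - p)          # SSE / (n - p)
s2_e = MSE * (np.eye(n) - H)     # estimated variance-covariance matrix of e
```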
7.5 Analysis of Variance Results
\[SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]
\[SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\]
\[SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]
\[MSR = \frac{SSR}{p-1}\]
\[MSE=\frac{SSE}{n-p}\]
Source of Variation | SS | df | MS |
---|---|---|---|
Regression | \(SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\) | p-1 | \(MSR = \frac{SSR}{p-1}\) |
Error | \(SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\) | n-p | \(MSE = \frac{SSE}{n-p}\) |
Total | \(SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\) | n-1 | |
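The sums of squares as quadratic forms, computed on the simulated data (a sketch, not part of the notes); the check confirms \(SSTO = SSR + SSE\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
I = np.eye(n)
J = np.ones((n, n))              # n x n matrix of ones

SSTO = Y @ (I - J / n) @ Y
SSE = Y @ (I - H) @ Y
SSR = Y @ (H - J / n) @ Y
print(np.isclose(SSTO, SSR + SSE))  # the ANOVA decomposition holds

MSR = SSR / (p - 1)
MSE = SSE / (n - p)
```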
- F Test for Regression Relation
\[H_0: \beta_1=\beta_2=...=\beta_{p-1}=0\]
\[H_1: \text{not all }\beta_{k}\ (k = 1, ..., p-1)\text{ equal zero}\]
\[F^* = \frac{MSR}{MSE} \sim F(p-1,n-p)\text{ under }H_0\]
Large values of \(F^*\) lead to rejection of \(H_0\).
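A sketch of the F test on the simulated data, using scipy.stats for the F tail probability (an added dependency, assumed here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
SSR = Y @ (H - J / n) @ Y
SSE = Y @ (np.eye(n) - H) @ Y

F_star = (SSR / (p - 1)) / (SSE / (n - p))
p_value = stats.f.sf(F_star, p - 1, n - p)  # P(F(p-1, n-p) > F*)
print(F_star, p_value)                      # reject H0 when p_value < alpha
```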
- Coefficient of Multiple Determination
\[R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}\]
Adding more X variables to the regression model can only increase \(R^2\) and never reduce it, because \(SSE\) can never become larger when another X variable is included.
The adjusted coefficient of multiple determination, denoted by \(R^2_a\):
\[R^2_a = 1-\frac{\frac{SSE}{n-p}}{\frac{SSTO}{n-1}} = 1-(\frac{n-1}{n-p})\frac{SSE}{SSTO}\]
- Coefficient of Multiple Correlation
\[R = \sqrt{R^2}\]
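A short sketch computing \(R^2\), \(R^2_a\), and \(R\) on the simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
SSTO = Y @ (np.eye(n) - J / n) @ Y
SSE = Y @ (np.eye(n) - H) @ Y

R2 = 1 - SSE / SSTO
R2_a = 1 - ((n - 1) / (n - p)) * SSE / SSTO  # adjusts for the number of predictors
R = np.sqrt(R2)
print(R2, R2_a, R)
```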
7.6 Inferences about Regression Parameters
\[E(\mathbf{b}) = \boldsymbol{\beta}\]
\[\sigma^2(\mathbf{b}) = \sigma^2 \times (\mathbf{X}'\mathbf{X})^{-1} \]
\[s^2(\mathbf{b}) = MSE \times (\mathbf{X}'\mathbf{X})^{-1}\]
- Interval Estimation of \(\beta_k\)
\[\frac{b_k - \beta_k}{s(b_k)} \sim t(n-p),\quad k = 0, 1, ..., p-1\]
\[b_k \pm t(1-\frac{\alpha}{2};n-p)s(b_k)\]
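A sketch of the interval estimates on the simulated data, using scipy.stats for the t percentile (an assumption added here, not part of the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)

s2_b = MSE * XtX_inv          # estimated variance-covariance matrix of b
s_b = np.sqrt(np.diag(s2_b))  # standard errors s(b_k)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - p)
ci = np.column_stack([b - t_crit * s_b, b + t_crit * s_b])
print(ci)                     # one 95% interval per coefficient
```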
- Tests for \(\beta_k\)
\[H_0: \beta_k = 0\]
\[H_1: \beta_k \neq 0\]
\[t^* = \frac{b_k}{s(b_k)}\]
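The corresponding t tests on the simulated data (a sketch; two-sided P-values via scipy.stats):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)
s_b = np.sqrt(np.diag(MSE * XtX_inv))

t_star = b / s_b                                  # t* = b_k / s(b_k)
p_values = 2 * stats.t.sf(np.abs(t_star), n - p)  # two-sided P-values
print(t_star, p_values)                           # small p-value: reject H0: beta_k = 0
```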
- Joint Inferences
The Bonferroni joint confidence intervals can be used to estimate several regression coefficients simultaneously.
If \(g\) parameters are to be estimated jointly with family confidence coefficient \(1-\alpha\), the Bonferroni confidence limits are:
\[b_k \pm t(1-\frac{\alpha}{2g};n-p)s(b_k)\]
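A sketch of the Bonferroni joint intervals on the simulated data, here for the \(g = 2\) slope coefficients (the choice of \(g\) and the data are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
Y = X @ np.array([1.5, 2.0, -0.5]) + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)
s_b = np.sqrt(np.diag(MSE * XtX_inv))

# Joint intervals for beta_1 and beta_2 with family confidence 1 - alpha = 0.95.
alpha, g = 0.05, 2
B = stats.t.ppf(1 - alpha / (2 * g), n - p)  # Bonferroni multiplier
for k in (1, 2):
    print(k, (b[k] - B * s_b[k], b[k] + B * s_b[k]))
```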