7 Multiple linear regression I

7.1 Multiple Regression Models

7.1.1 First-Order Model with Two Predictor Variables

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\]

  • regression surface/response surface
  • additive effects: the two predictors do not interact
  • partial regression coefficients
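
Below is a minimal sketch (not from the text) of fitting this first-order model by ordinary least squares with numpy; the data, the true coefficient values, and the seed are all illustrative assumptions.

```python
# Sketch: simulate a first-order model with two predictors and recover the
# partial regression coefficients by least squares (illustrative data only).
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
eps = rng.normal(0, 1, n)                  # epsilon_i ~ N(0, sigma^2)
Y = 2.0 + 1.5 * X1 - 0.8 * X2 + eps        # assumed beta_0=2, beta_1=1.5, beta_2=-0.8

X = np.column_stack([np.ones(n), X1, X2])  # design matrix with intercept column
b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares estimates b0, b1, b2
print(b)                                   # close to (2.0, 1.5, -0.8)
```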

7.1.2 First-Order Model with More than Two Predictor Variables

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... + \beta_{p-1}X_{i,p-1}+\varepsilon_i\]

\[Y_i = \beta_0 + \displaystyle\sum_{k=1}^{p-1}\beta_{k}X_{ik} + \varepsilon_i\]

\[Y_i = \displaystyle\sum_{k=0}^{p-1}\beta_{k}X_{ik} + \varepsilon_i,\text{ where }X_{i0} \equiv 1\]

  • hyperplane: the response function, which is a plane in more than two dimensions.

7.1.3 General Linear Regression Model

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + ... + \beta_{p-1}X_{i,p-1}+\varepsilon_i\]

where:

  • \(\beta_0, \beta_1, ..., \beta_{p-1}\) are parameters
  • \(X_{i1}, ..., X_{i,p-1}\) are known constants
  • \(\varepsilon_i\) are independent \(N(0,\sigma^2)\)
  • \(i = 1, ..., n\)

  • p-1 predictor variables
  • qualitative predictor variables
  • polynomial regression:

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i1}^2 + \varepsilon_i\]

can be written as

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\text{ if }X_{i2} = X_{i1}^2\]

  • transformed variables:

\[\log Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\]

can be treated as a general linear regression model:

\[Y_i^{'} = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \varepsilon_i\text{ if }Y_i^{'}=\log Y_i\]

  • interaction effects

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i1}X_{i2} + \varepsilon_i\]

can be written as follows:

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \beta_3X_{i3} + \varepsilon_i\text{ let }X_{i3} = X_{i1}X_{i2}\]

  • combination of cases

\[Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i1}^2 + \beta_3X_{i2} + \beta_4X_{i2}^2 + \beta_5X_{i1}X_{i2} + \varepsilon_i\]

can be written as

\[Y_i = \beta_0 + \beta_1Z_{i1} + \beta_2Z_{i2} + \beta_3Z_{i3} + \beta_4Z_{i4} + \beta_5Z_{i5} + \varepsilon_i\]

where we define \(Z_{i1}=X_{i1},\ Z_{i2}=X_{i1}^2,\ Z_{i3}=X_{i2},\ Z_{i4}=X_{i2}^2\text{ and }Z_{i5}=X_{i1}X_{i2}\).
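
The sketch below illustrates this point: once the \(Z\) columns are built, the curvature and interaction terms are handled by the same least squares fit. All numbers and the true coefficient vector are made-up assumptions.

```python
# Sketch: the combined model above is linear in the betas once the Z columns
# are defined; fitting it is the same least squares problem (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = rng.uniform(-2, 2, n)
X2 = rng.uniform(-2, 2, n)

# Intercept column plus Z1=X1, Z2=X1^2, Z3=X2, Z4=X2^2, Z5=X1*X2
Z = np.column_stack([np.ones(n), X1, X1**2, X2, X2**2, X1 * X2])

beta = np.array([1.0, 0.5, -0.3, 0.8, 0.2, -0.6])  # assumed true coefficients
Y = Z @ beta + rng.normal(0, 0.5, n)

b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print(np.round(b, 2))                               # approximately beta
```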

  • meaning of linear model: refers to linear in the parameters

7.2 General Linear Regression Model in Matrix Terms

\[\underset{n \times 1}{\mathbf{Y}} = \underset{n \times p}{\mathbf{X}}\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}\]

where

\[\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix}Y_1 \\ Y_2 \\ \vdots \\ Y_n\end{bmatrix},\quad \underset{n \times p}{\mathbf{X}} = \begin{bmatrix}1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots &  & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1}\end{bmatrix},\quad \underset{p \times 1}{\boldsymbol{\beta}} = \begin{bmatrix}\beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}\text{ and } \underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix}\varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n\end{bmatrix}\]

  • \(\mathbf{Y}\) is a vector of responses
  • \(\boldsymbol{\beta}\) is a vector of parameters
  • \(\mathbf{X}\) is a matrix of constants
  • \(\boldsymbol{\varepsilon}\) is a vector of independent normal random variables
  • expectation \(E(\boldsymbol{\varepsilon}) = \mathbf{0}\)
  • variance-covariance matrix: \(\sigma^2(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}\)

Consequently, \(E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta}\) and \(\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}\).
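
A small sketch of the matrix formulation follows; the tiny data set and the parameter vector are hypothetical, chosen only to show the shapes involved.

```python
# Sketch: assemble Y (n x 1), X (n x p), beta (p x 1) and form E(Y) = X beta
# for a tiny illustrative data set with two predictors.
import numpy as np

X_raw = np.array([[4.0, 2.0],      # columns X1, X2 for n = 4 cases
                  [1.0, 5.0],
                  [3.0, 3.0],
                  [6.0, 1.0]])
n, _ = X_raw.shape
X = np.hstack([np.ones((n, 1)), X_raw])    # prepend the column of 1s, so p = 3
beta = np.array([[10.0], [2.0], [-1.0]])   # hypothetical parameter vector

EY = X @ beta                              # E(Y) = X beta, an n x 1 vector
print(X.shape, beta.shape, EY.shape)       # (4, 3) (3, 1) (4, 1)
```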

7.3 Estimation of Regression Coefficients

The least squares criterion for the general linear regression model is:

\[Q = \displaystyle\sum_{i=1}^{n}(Y_i - \beta_{0} - \beta_1X_{i1} - ... - \beta_{p-1}X_{i,p-1})^2\]

The least squares estimators \(\mathbf{b}\) are the values of the regression coefficients that minimize Q; they satisfy the normal equations and are given by:

\[\mathbf{X}'\mathbf{Xb} = \mathbf{X}'\mathbf{Y}\text{ and }\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]

For the normal error regression model, the method of maximum likelihood leads to the same estimators; they are obtained by maximizing the likelihood function:

\[L(\boldsymbol{\beta},\sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\displaystyle\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1X_{i1} - ... - \beta_{p-1}X_{i,p-1})^2\right]\]
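
The following sketch solves the normal equations numerically; the simulated data and coefficient values are assumptions for illustration. Solving the linear system directly is numerically preferable to forming \((\mathbf{X}'\mathbf{X})^{-1}\) explicitly.

```python
# Sketch: solve the normal equations X'X b = X'Y for illustrative data.
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = np.hstack([np.ones((n, 1)), rng.uniform(0, 10, (n, p - 1))])
Y = X @ np.array([5.0, 1.2, -0.7]) + rng.normal(0, 1, n)

XtX = X.T @ X
XtY = X.T @ Y
b = np.linalg.solve(XtX, XtY)   # b = (X'X)^{-1} X'Y
print(b)                        # close to (5.0, 1.2, -0.7)
```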

7.4 Fitted Values and Residuals

\[\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]

or

\[\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}\text{, with }\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\]

The matrix \(\mathbf{H}\) is called the hat matrix. It is symmetric and has the special property of idempotency:

\[\mathbf{HH} = \mathbf{H}\]

\[\mathbf{e} = \mathbf{Y} - \mathbf{\hat{Y}} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I}-\mathbf{H})\mathbf{Y}\]

and the matrix \(\mathbf{I}-\mathbf{H}\), like the matrix \(\mathbf{H}\), is symmetric and idempotent.

The variance-covariance matrix of the residuals \(\mathbf{e}\) is:

\[\sigma^2(\mathbf{e}) = \sigma^2 \times (\mathbf{I}-\mathbf{H})\]

and is estimated by:

\[s^2(\mathbf{e}) = MSE \times (\mathbf{I}-\mathbf{H})\]
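
A minimal sketch of these quantities, using simulated data as an assumption, is given below; it builds the hat matrix, checks idempotency, and forms the residuals and their estimated variance-covariance matrix.

```python
# Sketch: hat matrix, fitted values, residuals, and s^2(e) for illustrative data.
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 3
X = np.hstack([np.ones((n, 1)), rng.uniform(0, 5, (n, p - 1))])
Y = X @ np.array([2.0, 0.5, 1.0]) + rng.normal(0, 1, n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # H = X (X'X)^{-1} X'
print(np.allclose(H @ H, H))            # idempotency HH = H -> True

Y_hat = H @ Y                           # fitted values
e = (np.eye(n) - H) @ Y                 # residuals e = (I - H) Y
MSE = (e @ e) / (n - p)                 # SSE / (n - p)
s2_e = MSE * (np.eye(n) - H)            # s^2(e) = MSE (I - H)
```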

7.5 Analysis of Variance Results

\[SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]

\[SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\]

\[SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]

\[MSR = \frac{SSR}{p-1}\]

\[MSE=\frac{SSE}{n-p}\]

| Source of Variation | SS | df | MS |
|---|---|---|---|
| Regression | \(SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\) | \(p-1\) | \(MSR = \frac{SSR}{p-1}\) |
| Error | \(SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\) | \(n-p\) | \(MSE = \frac{SSE}{n-p}\) |
| Total | \(SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\) | \(n-1\) | |
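
The sketch below evaluates these sums of squares as quadratic forms in \(\mathbf{Y}\); the data are simulated and purely illustrative.

```python
# Sketch: ANOVA sums of squares as quadratic forms in Y (illustrative data).
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 3
X = np.hstack([np.ones((n, 1)), rng.uniform(0, 5, (n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(0, 1, n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
I, J = np.eye(n), np.ones((n, n))

SSTO = Y @ (I - J / n) @ Y
SSE  = Y @ (I - H) @ Y
SSR  = Y @ (H - J / n) @ Y
MSR, MSE = SSR / (p - 1), SSE / (n - p)
print(np.isclose(SSTO, SSR + SSE))      # decomposition SSTO = SSR + SSE -> True
```
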
  • F Test for Regression Relation

\[H_0: \beta_1=\beta_2=...=\beta_{p-1}=0\]

\[H_1: \text{not all } \beta_{k}\ (k=1,\ldots,p-1) \text{ equal zero}\]

\[F^* = \frac{MSR}{MSE} \sim F(p-1,n-p)\text{ under }H_0\]
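
A hedged sketch of the test statistic and its p-value follows (simulated data; scipy's F distribution is used for the tail probability).

```python
# Sketch: the overall F test for a regression relation (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 40, 3
X = np.hstack([np.ones((n, 1)), rng.uniform(0, 5, (n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(0, 1, n)

H = X @ np.linalg.solve(X.T @ X, X.T)
J = np.ones((n, n))
SSR = Y @ (H - J / n) @ Y
SSE = Y @ (np.eye(n) - H) @ Y

F_star = (SSR / (p - 1)) / (SSE / (n - p))
p_value = stats.f.sf(F_star, p - 1, n - p)   # P(F(p-1, n-p) > F*)
print(F_star, p_value)                       # tiny p-value -> reject H0
```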

  • Coefficient of Multiple Determination

\[R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}\]

Adding more X variables to the regression model can only increase \(R^2\) and never reduce it.

The adjusted coefficient of multiple determination, denoted by \(R^2_a\):

\[R^2_a = 1-\frac{\frac{SSE}{n-p}}{\frac{SSTO}{n-1}} = 1-(\frac{n-1}{n-p})\frac{SSE}{SSTO}\]

  • Coefficient of Multiple Correlation

\[R = \sqrt{R^2}\]
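
The short sketch below computes \(R^2\), \(R^2_a\), and \(R\) for an illustrative fit; the data are assumed, not taken from the text.

```python
# Sketch: R^2, adjusted R^2, and the coefficient of multiple correlation.
import numpy as np

rng = np.random.default_rng(6)
n, p = 40, 3
X = np.hstack([np.ones((n, 1)), rng.uniform(0, 5, (n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(0, 1, n)

H = X @ np.linalg.solve(X.T @ X, X.T)
J = np.ones((n, n))
SSTO = Y @ (np.eye(n) - J / n) @ Y
SSE  = Y @ (np.eye(n) - H) @ Y

R2   = 1 - SSE / SSTO
R2_a = 1 - (n - 1) / (n - p) * SSE / SSTO   # adjusted for degrees of freedom
R    = np.sqrt(R2)
print(R2, R2_a, R)
```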

7.6 Inferences about Regression Parameters

\[E(\mathbf{b}) = \boldsymbol{\beta}\]

\[\sigma^2(\mathbf{b}) = \sigma^2 \times (\mathbf{X}'\mathbf{X})^{-1} \]

\[s^2(\mathbf{b}) = MSE \times (\mathbf{X}'\mathbf{X})^{-1}\]

  • Interval Estimation of \(\beta_k\)

\[\frac{b_k - \beta_k}{s(b_k)} \sim t(n-p), k = 0,1, ...., p-1\]

\[b_k \pm t(1-\frac{\alpha}{2};n-p)s(b_k)\]

  • Tests for \(\beta_k\)

\[H_0: \beta_k = 0\]

\[H_1: \beta_k \neq 0\]

\[t^* = \frac{b_k}{s(b_k)}\]
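
A combined sketch of the interval estimate and the t test for a single \(\beta_k\) is given below; the data, the choice \(k = 1\), and \(\alpha = 0.05\) are illustrative assumptions.

```python
# Sketch: s(b_k), a (1 - alpha) confidence interval for beta_k, and the
# t statistic for H0: beta_k = 0 (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 40, 3
X = np.hstack([np.ones((n, 1)), rng.uniform(0, 5, (n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)
s_b = np.sqrt(MSE * np.diag(XtX_inv))        # s(b_k), k = 0, ..., p-1

alpha, k = 0.05, 1
t_crit = stats.t.ppf(1 - alpha / 2, n - p)
ci = (b[k] - t_crit * s_b[k], b[k] + t_crit * s_b[k])
t_star = b[k] / s_b[k]                       # test statistic for H0: beta_k = 0
p_value = 2 * stats.t.sf(abs(t_star), n - p)
print(ci, t_star, p_value)
```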

  • joint inferences

The Bonferroni joint confidence intervals can be used to estimate several regression coefficients simultaneously.

If g parameters are to be estimated jointly, then

\[b_k \pm t(1-\frac{\alpha}{2g};n-p)s(b_k)\]
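
A final sketch shows the Bonferroni intervals, which simply replace \(\alpha/2\) by \(\alpha/(2g)\) in the t quantile; the data and the choice \(g = 2\) are illustrative assumptions.

```python
# Sketch: Bonferroni joint confidence intervals for g = 2 coefficients.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, p = 40, 3
X = np.hstack([np.ones((n, 1)), rng.uniform(0, 5, (n, p - 1))])
Y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(0, 1, n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = (e @ e) / (n - p)
s_b = np.sqrt(MSE * np.diag(XtX_inv))

alpha, g = 0.05, 2                            # joint intervals for beta_1, beta_2
B = stats.t.ppf(1 - alpha / (2 * g), n - p)   # Bonferroni multiple
for k in (1, 2):
    print(k, (b[k] - B * s_b[k], b[k] + B * s_b[k]))
```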