6 Matrix Approach to Simple Linear Regression Analysis
6.1 Matrices
- Definition
- matrix
- elements
- dimension
- Notation: a boldface symbol, such as A, X or Z.
- Square Matrix: the number of rows equals the number of columns
- Vector
- column vector (or simply vector): a matrix with only one column
- row vector: a matrix with only one row
- Transpose: A’ (A prime) is the transpose of a matrix A
- Equality of Matrices: same dimension and all same corresponding elements
- design matrix: the matrix X of the regression model (see Section 6.9)
6.2 Matrix Addition and Subtraction
- same dimension
- the sum or difference of the corresponding elements of the two matrices
- A + B = B + A
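A quick check in R (a minimal sketch; the matrices A and B below are arbitrary examples):

A = matrix(c(1, 2, 3, 4), nrow = 2)   # a 2 x 2 matrix, filled column by column
B = matrix(c(5, 6, 7, 8), nrow = 2)   # another matrix of the same dimension
A + B                                 # element-wise sum
A - B                                 # element-wise difference
all.equal(A + B, B + A)               # addition is commutative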
6.3 Matrix Multiplication
- Multiplication of a Matrix by a scalar
- a scalar is an ordinary number or a symbol representing a number
- Multiplication of a Matrix by a Matrix
- in the product AB, we say that A is postmultiplied by B, or that B is premultiplied by A
- \(AB \neq BA\)
In general, if A has dimension \(r \times c\) and B has dimension \(c \times s\), the product AB is a matrix of dimension \(r \times s\), which is
\[AB_{r \times s} = \begin{bmatrix}\sum_{k=1}^{c} a_{ik}b_{kj}\end{bmatrix}\text{, where }i=1,...,r;j=1,...,s\]
- Regression Examples
\[Y'Y_{1 \times 1} = \begin{bmatrix} Y_{1} & Y_{2} & ... & Y_{n} \end{bmatrix}\begin{bmatrix}Y_{1} \\ Y_{2} \\ ... \\ Y_{n} \end{bmatrix} = Y_{1}^{2} + Y_{2}^{2} + ... + Y_{n}^{2} = \sum Y_{i}^{2}\]
\[X'X_{2 \times 2} = \begin{bmatrix} 1 & 1 & ... & 1 \\ X_{1} & X_{2} & ... & X_{n} \end{bmatrix}\begin{bmatrix} 1 & X_{1} \\ 1 & X_{2} \\ ... & ... \\ 1 & X_{n} \end{bmatrix} = \begin{bmatrix} n & \sum X_{i} \\ \sum X_{i} & \sum X_{i}^{2} \end{bmatrix}\]
\[X'Y_{2 \times 1} = \begin{bmatrix} 1 & 1 & ... & 1 \\ X_{1} & X_{2} & ... & X_{n} \end{bmatrix}\begin{bmatrix} Y_{1} \\ Y_{2} \\ ... \\ Y_{n} \end{bmatrix} = \begin{bmatrix} \sum Y_{i} \\ \sum X_{i}Y_{i} \end{bmatrix}\]
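These products can be verified numerically; a minimal R sketch, with X and Y values chosen purely for illustration:

X = cbind(1, c(2, 4, 6))   # a small n x 2 design matrix: a column of 1s and the X values
Y = c(3, 5, 8)             # the corresponding Y values
t(Y) %*% Y                 # Y'Y = sum of Y_i^2
t(X) %*% X                 # X'X = [n, sum X_i; sum X_i, sum X_i^2]
t(X) %*% Y                 # X'Y = [sum Y_i; sum X_i Y_i]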
6.4 Special Types of Matrices
- Symmetric Matrix: A = A’
- a symmetric matrix is necessarily square
- premultiplying a matrix by its transpose yields a symmetric matrix; for example, X'X is symmetric
- Diagonal Matrix: off-diagonal elements are all zeros
- Identity Matrix, denoted by I: a diagonal matrix whose elements on the main diagonal are all 1s.
- AI = IA = A, \(A, I \in \mathbb{R}^{r \times r}\)
- Scalar Matrix: a diagonal matrix whose main-diagonal elements are the same, can be expressed as kI
- Vector and matrix with all elements unity
- a column vector with all elements 1 will be denoted by 1
- a square matrix with all elements 1 will be denoted by J
- 1’1 = n
- 11’ = J
- Zero Vector: a vector containing only zeros, denoted by 0
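A minimal R sketch of these special matrices, assuming n = 4 for illustration:

n = 4
diag(n)             # identity matrix I
one = rep(1, n)     # the vector 1
J = one %*% t(one)  # J = 11', an n x n matrix of 1s
t(one) %*% one      # 1'1 = n
5 * diag(n)         # a scalar matrix kI with k = 5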
6.5 Linear Dependence and Rank of Matrix
- Linear dependence
We define the set of c column vectors \(C_{1}, ..., C_{c}\) in an r \(\times\) c matrix to be linearly dependent if one vector can be expressed as a linear combination of others. If no vector in the set can be so expressed, we define the set of vectors to be linearly independent.
More generally, when c scalars \(k_{1},...,k_{c}\), not all zero, can be found such that:
\[k_{1}\mathbf{C_{1}} + k_{2}\mathbf{C_{2}} + ... + k_{c}\mathbf{C_{c}} = \mathbf{0}\]
where 0 denotes the zero column vector, the c column vectors are linearly dependent. If the only set of scalars for which the equality holds is \(k_{1} = k_{2} = ... = k_{c} = 0\), the set of c column vectors is linearly independent.
- Rank of Matrix: the maximum number of linearly independent columns in the matrix
- the rank of an r \(\times\) c matrix cannot exceed min(r, c)
- when a matrix is the product of two matrices, its rank cannot exceed the smaller of the ranks of the matrices being multiplied
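In R, the rank can be obtained from a QR decomposition; a minimal sketch with a deliberately dependent column:

A = cbind(c(1, 2, 3), c(2, 4, 6), c(1, 0, 1))  # the second column is 2 times the first
qr(A)$rank                                     # rank is 2: the columns are linearly dependent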
6.6 Inverse of a Matrix
The inverse of a matrix \(\mathbf{A}\) is another matrix, denoted by \(\mathbf{A^{-1}}\), such that:
\[\mathbf{A}^{-1}\mathbf{A} = \mathbf{AA}^{-1} = \mathbf{I}\]
where I is the identity matrix.
- the inverse of a matrix is defined only for square matrices
- many square matrices do not have an inverse
- the inverse of a square matrix, if it exists, is unique
Finding the inverse
- An inverse of a square \(r \times r\) matrix exists if the rank of the matrix is r. Such a matrix is said to be nonsingular or of full rank.
- An \(r \times r\) matrix with rank less than r is said to be singular or not of full rank, and does not have an inverse.
- The inverse of an \(r \times r\) matrix of full rank also has rank r.
If:
\[\mathbf{A}_{2 \times 2} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\]
then:
\[\mathbf{A}_{2 \times 2}^{-1} = \begin{bmatrix} \frac{d}{D} & \frac{-b}{D} \\ \frac{-c}{D} & \frac{a}{D} \end{bmatrix}\]
where:
\[D = ad - bc\]
D is called the determinant of the matrix A. If A were singular, its determinant would equal zero and no inverse of A would exist.
Regression Example
\[\mathbf{X}'\mathbf{X}_{2 \times 2} = \begin{bmatrix} n & \sum X_{i} \\ \sum X_{i} & \sum X_{i}^{2} \end{bmatrix}\]
\[ a = n, b = c = \sum{X_{i}}, d = \sum{X_{i}^{2}} \]
\[ D = n\sum{X_{i}^{2}} - (\sum{X_{i}})^{2} = n\sum{(X_{i} - \bar{X})}^{2}\]
\[(\mathbf{X}'\mathbf{X})_{2 \times 2}^{-1} = \begin{bmatrix} \frac{1}{n} + \frac{\bar{X}^2}{\sum(X_{i} - \bar{X})^{2}} & \frac{-\bar{X}}{\sum{(X_{i} - \bar{X})^2}} \\ \frac{-\bar{X}}{\sum{(X_{i} - \bar{X})^2}} & \frac{1}{\sum{(X_{i} - \bar{X})^2}} \end{bmatrix}\]
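A minimal R sketch checking the 2 × 2 inverse formula against solve(); the entries of A are arbitrary:

A = matrix(c(4, 2, 7, 6), nrow = 2)      # a = 4, c = 2, b = 7, d = 6
D = det(A)                               # determinant D = ad - bc = 10
solve(A)                                 # inverse computed by R
matrix(c(6, -2, -7, 4), nrow = 2) / D    # inverse from the 2 x 2 formula above; same result
round(solve(A) %*% A)                    # recovers the identity matrix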
6.7 Some Basic Results for Matrices
- A + B = B + A
- (A + B) + C = A + (B + C)
- (AB)C = A(BC)
- C(A + B) = CA + CB
- k(A + B) = kA + kB
- (A’)’ = A
- (A + B)' = A' + B'
- (AB)' = B'A'
- (ABC)' = C'B'A'
- \((AB)^{-1} = B^{-1}A^{-1}\)
- \((ABC)^{-1} = C^{-1}B^{-1}A^{-1}\)
- \((A^{-1})^{-1} = A\)
- \((A')^{-1} = (A^{-1})'\)
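A few of these identities can be spot-checked numerically; a minimal R sketch with arbitrary invertible matrices:

A = matrix(c(2, 1, 0, 3), nrow = 2)              # an arbitrary invertible 2 x 2 matrix
B = matrix(c(1, 4, 2, 5), nrow = 2)              # another invertible 2 x 2 matrix
all.equal(t(A %*% B), t(B) %*% t(A))             # (AB)' = B'A'
all.equal(solve(A %*% B), solve(B) %*% solve(A)) # (AB)^{-1} = B^{-1}A^{-1}
all.equal(solve(t(A)), t(solve(A)))              # (A')^{-1} = (A^{-1})'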
6.8 Random Vectors and Matrices
- A random vector or matrix contains elements that are random variables.
- Expectation of random vector or matrix:
\[\mathbf{Y}_{3 \times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}\text{, and } \mathbf{E(Y)}_{3 \times 1 } = \begin{bmatrix} E(Y_1) \\ E(Y_2) \\ E(Y_3) \end{bmatrix}\]
- Variance-Covariance Matrix of Random Vector
\[\sigma^2(\mathbf{Y})_{n \times n} = \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1,Y_2) & ... & \sigma(Y_1, Y_n) \\ \sigma(Y_2,Y_1) & \sigma^2(Y_2) & ... & \sigma(Y_2, Y_n) \\ ... & ... & ... & ...\\ \sigma(Y_n, Y_1) & \sigma(Y_n, Y_2) & ... & \sigma^2(Y_n) \end{bmatrix}\]
which is a symmetric matrix.
- Some Basic Rules
\[\mathbf{W} = \mathbf{AY},\]
where W and Y are random vectors and A is a constant matrix; a numerical illustration follows this list. Then:
\[E(\mathbf{A}) = \mathbf{A}\]
\[E(\mathbf{W}) = E(\mathbf{AY}) = \mathbf{A}E(\mathbf{Y})\]
\[\sigma^2(\mathbf{W}) = \sigma^2(\mathbf{AY}) = \mathbf{A}\sigma^2(\mathbf{Y})\mathbf{A'}\]
- Multivariate Normal Distribution
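As noted under the basic rules above, a small simulation can illustrate \(\sigma^2(\mathbf{AY}) = \mathbf{A}\sigma^2(\mathbf{Y})\mathbf{A}'\); a minimal sketch, where the matrix A and the standard normal Y are arbitrary choices:

set.seed(1)
A = matrix(c(1, 0, 1, 1, 2, -1), nrow = 2)  # a 2 x 3 constant matrix
Y = matrix(rnorm(3 * 10000), nrow = 3)      # 10,000 draws of a 3 x 1 standard normal vector
W = A %*% Y                                 # each column is one realization of W = AY
cov(t(W))                                   # empirical variance-covariance matrix of W
A %*% diag(3) %*% t(A)                      # theoretical A sigma^2(Y) A', with sigma^2(Y) = I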
6.9 Simple Linear Regression Model in Matrix Terms
The normal error regression model in matrix terms is:
\[\underset{n \times 1}{\mathbf{Y}} = \underset{n \times 2}{\mathbf{X}}\underset{2 \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}}\text{, where}\]
\[\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix}Y_1 \\ Y_2 \\ ... \\ Y_n\end{bmatrix}, \quad \underset{n \times 2}{\mathbf{X}} = \begin{bmatrix}1 & X_1 \\ 1 & X_2 \\ ... & ... \\ 1 & X_n \end{bmatrix}, \quad \underset{2 \times 1}{\boldsymbol{\beta}} = \begin{bmatrix}\beta_0 \\ \beta_1 \end{bmatrix}\text{, and } \underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix}\varepsilon_1 \\ \varepsilon_2 \\ ... \\ \varepsilon_n\end{bmatrix}\]
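Since the error terms are assumed to be independent normal random variables with mean 0 and constant variance \(\sigma^2\), the model implies:
\[E(\boldsymbol{\varepsilon}) = \mathbf{0}, \quad \sigma^2(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}, \quad \text{so that } E(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta} \text{ and } \sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}\]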
6.10 Least Squares Estimation of Regression Parameters
\[\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},\]
where b is the vector of the least squares regression coefficients:
\[ \mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} \]
Note that the inverse is defined only for square matrices; \(\mathbf{X}'\mathbf{X}\) is always square (2 × 2 here), and its inverse exists as long as the columns of \(\mathbf{X}\) are linearly independent.
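This estimator is the solution of the normal equations, obtained by minimizing the sum of squared deviations \(Q = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b})\):
\[\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y} \quad\Rightarrow\quad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]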
6.11 Fitted Values and Residuals
\[\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]
or
\[\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}\text{, with }\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\]
The matrix H is called the hat matrix. It is symmetric and has the special property of idempotency:
\[\mathbf{HH} = \mathbf{H}\]
\[\mathbf{e} = \mathbf{Y} - \mathbf{\hat{Y}} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I}-\mathbf{H})\mathbf{Y}\]
and the matrix \(\mathbf{I}-\mathbf{H}\), like the matrix \(\mathbf{H}\), is symmetric and idempotent.
The variance-covariance matrix of residuals e:
\[\sigma^2(\mathbf{e}) = \sigma^2 \times (\mathbf{I}-\mathbf{H})\]
and is estimated by:
\[s^2(\mathbf{e}) = MSE \times (\mathbf{I}-\mathbf{H})\]
- Proof: since \(\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}\), \((\mathbf{I}-\mathbf{H})' = \mathbf{I}-\mathbf{H}\), and \((\mathbf{I}-\mathbf{H})(\mathbf{I}-\mathbf{H}) = \mathbf{I}-\mathbf{H}\), we have:
\[\sigma^2(\mathbf{e}) = \sigma^2((\mathbf{I}-\mathbf{H})\mathbf{Y}) = (\mathbf{I}-\mathbf{H})\,\sigma^2(\mathbf{Y})\,(\mathbf{I}-\mathbf{H})' = \sigma^2(\mathbf{I}-\mathbf{H})(\mathbf{I}-\mathbf{H}) = \sigma^2 \times (\mathbf{I}-\mathbf{H})\]
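Both properties are easy to verify numerically; a minimal R sketch, assuming an arbitrary small design matrix:

X = cbind(1, c(1, 3, 4, 7))                              # a toy n x 2 design matrix
H = X %*% solve(t(X) %*% X) %*% t(X)                     # hat matrix
all.equal(H, t(H))                                       # H is symmetric
all.equal(H %*% H, H)                                    # H is idempotent
all.equal((diag(4) - H) %*% (diag(4) - H), diag(4) - H)  # I - H is idempotent too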
6.12 Analysis of Variance Results
\[SSTO = \sum(Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} = \mathbf{Y}'\mathbf{Y} - (\frac{1}{n})\mathbf{Y}'\mathbf{JY}\]
\[SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y}-\mathbf{Xb})'(\mathbf{Y}-\mathbf{Xb})=\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}\]
\[SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - (\frac{1}{n})\mathbf{Y}'\mathbf{JY}\]
A quadratic form is defined as:
\[\underset{1 \times 1}{\mathbf{Y}'\mathbf{AY}} = \displaystyle\sum_{i=1}^{n}\displaystyle\sum_{j=1}^{n}a_{ij}Y_iY_j\text{, where }a_{ij} = a_{ji}\]
A is a symmetric \(n \times n\) matrix and is called the matrix of the quadratic form.
The sums of squares can therefore be expressed as quadratic forms:
\[SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]
\[SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\]
\[SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y} \]
Quadratic forms play an important role in statistics because all sums of squares in the analysis of variance for linear statistical models can be expressed as quadratic forms.
6.13 Inferences in Regression Analysis
- Regression Coefficients
The variance-covariance matrix of b:
\[\sigma^2(\mathbf{b}) = \begin{bmatrix} \sigma^2(b_0) & \sigma(b_0,b_1) \\ \sigma(b_0,b_1) & \sigma^2(b_1) \end{bmatrix} = \sigma^2 \times (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \frac{\sigma^2}{n} + \frac{\sigma^2\bar{X}^2}{\sum(X_i - \bar{X})^2} & \frac{-\bar{X}\sigma^2}{\sum(X_i - \bar{X})^2} \\ \frac{-\bar{X}\sigma^2}{\sum(X_i - \bar{X})^2} & \frac{\sigma^2}{\sum(X_i - \bar{X})^2} \end{bmatrix}\]
And the estimated variance-covariance matrix of b, denoted by \(s^2(\mathbf{b})\):
\[s^2(\mathbf{b}) = MSE \times (\mathbf{X}'\mathbf{X})^{-1}\]
- Mean Response, where \(\mathbf{X}_h' = \begin{bmatrix}1 & X_h\end{bmatrix}\):
\[s^2(\hat{Y}_h) = MSE \times (\mathbf{X}_{h}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) = MSE \times [\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}]\]
- Prediction of a new observation
\[s^2(pred) = MSE \times (1+\mathbf{X}_{h}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h)\]
- Proof:
\[\sigma^2(\mathbf{b}) = \sigma^2(\mathbf{(X'X)^{-1}X'Y}) = \mathbf{(X'X)^{-1}X'}\,\sigma^2(\mathbf{Y})\,(\mathbf{(X'X)^{-1}X'})' = \sigma^2 (\mathbf{X'X})^{-1}\mathbf{X'X}(\mathbf{X'X})^{-1} = \sigma^2 \times (\mathbf{X}'\mathbf{X})^{-1}\]
using \(\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}\) and the symmetry of \((\mathbf{X}'\mathbf{X})^{-1}\).
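A minimal R sketch of these variance estimates, using R's built-in trees data (also used in Section 6.14 below); the value X_h = 76 is an arbitrary, hypothetical height:

x = cbind(1, trees$Height)                     # design matrix from the built-in trees data
y = trees$Girth
b = solve(t(x) %*% x) %*% t(x) %*% y           # least squares coefficients
MSE = sum((y - x %*% b)^2) / (nrow(x) - 2)     # SSE / (n - 2)
MSE * solve(t(x) %*% x)                        # s^2(b); its diagonal is the squared std. errors from summary(fit_lm)
xh = c(1, 76)                                  # X_h for an assumed height of 76
MSE * t(xh) %*% solve(t(x) %*% x) %*% xh       # s^2(Yhat_h)
MSE * (1 + t(xh) %*% solve(t(x) %*% x) %*% xh) # s^2(pred)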
6.14 R code
- example data
head(trees)
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
fit_lm = lm(trees$"Girth" ~ trees$"Height")
summary(fit_lm)
##
## Call:
## lm(formula = trees$Girth ~ trees$Height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2386 -1.9205 -0.0714 2.7450 4.5384
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.18839 5.96020 -1.038 0.30772
## trees$Height 0.25575 0.07816 3.272 0.00276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.728 on 29 degrees of freedom
## Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
## F-statistic: 10.71 on 1 and 29 DF, p-value: 0.002758
fitted(fit_lm)
## 1 2 3 4 5 6 7
## 11.713904 10.435169 9.923674 12.225399 14.527123 15.038617 10.690916
## 8 9 10 11 12 13 14
## 12.992640 14.271376 12.992640 14.015628 13.248387 13.248387 11.458157
## 15 16 17 18 19 20 21
## 12.992640 12.736893 15.550111 15.805858 11.969651 10.179422 13.759881
## 22 23 24 25 26 27 28
## 14.271376 12.736893 12.225399 13.504134 14.527123 14.782870 14.271376
## 29 30 31
## 14.271376 14.271376 16.061605
anova(fit_lm)
## Analysis of Variance Table
##
## Response: trees$Girth
## Df Sum Sq Mean Sq F value Pr(>F)
## trees$Height 1 79.665 79.665 10.707 0.002758 **
## Residuals 29 215.772 7.440
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Least squares estimation
y = trees$"Girth"
x = cbind(1,trees$"Height")
b = solve(t(x) %*% x) %*% t(x) %*% y
b
## [,1]
## [1,] -6.1883945
## [2,] 0.2557471
- fitted values
h = x %*% solve(t(x) %*% x) %*% t(x)
dim(h); h[1:4,1:4]
## [1] 31 31
## [,1] [,2] [,3] [,4]
## [1,] 0.06181471 0.08644526 0.09629747 0.05196250
## [2,] 0.08644526 0.13160125 0.14966365 0.06838286
## [3,] 0.09629747 0.14966365 0.17101012 0.07495100
## [4,] 0.05196250 0.06838286 0.07495100 0.04539435
yhat = h %*% y
head(yhat)
## [,1]
## [1,] 11.713904
## [2,] 10.435169
## [3,] 9.923674
## [4,] 12.225399
## [5,] 14.527123
## [6,] 15.038617
- sum of squares
SSTO = t(y) %*% y - 1 / length(y) * t(y) %*% matrix(1, nrow=length(y), ncol=length(y)) %*% y
SSTO
## [,1]
## [1,] 295.4374
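SSE and SSR can be computed in the same way, reusing y, x, b, and SSTO from above (a minimal sketch following the formulas in Section 6.12):

SSE = t(y) %*% y - t(b) %*% t(x) %*% y   # SSE = Y'Y - b'X'Y
SSR = SSTO - SSE                         # equivalently Y'[H - (1/n)J]Y
SSE; SSR                                 # compare with the anova(fit_lm) table above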