6 Matrix Approach to Simple Linear Regression Analysis

6.1 Matrices

  • Definition
    • matrix
    • elements
    • dimension
  • Notation: a boldface symbol, such as A, X or Z.
  • Square Matrix: the number of rows equals the number of columns
  • Vector
    • column vector (or simply vector): a matrix with only one column
    • row vector: a matrix with only one row
  • Transpose: A’ (A prime) is the transpose of a matrix A
  • Equality of Matrices: same dimension and all same corresponding elements
  • design matrix

6.2 Matrix Addition and Subtraction

  • same dimension
  • the sum or difference of the corresponding elements of the two matrices
  • A + B = B + A

6.3 Matrix Multiplication

  • Multiplication of a Matrix by a scalar
    • a scalar is an ordinary number or a symbol representing a number
  • Multiplication of a Matrix by a Matrix
    • in the product AB, we say that A is postmultiplied by B, or that B is premultiplied by A
    • in general, \(AB \neq BA\)

In general, if A has dimension \(r \times c\) and B has dimension \(c \times s\), the product AB is a matrix of dimension \(r \times s\) given by

\[AB_{r \times s} = \begin{bmatrix}\sum_{k=1}^{c} a_{ik}b_{kj}\end{bmatrix}\text{, where }i=1,...,r;j=1,...,s\]

  • Regression Examples

\[Y'Y_{1 \times 1} = \begin{bmatrix} Y_{1} & Y_{2} & ... & Y_{n} \end{bmatrix}\begin{bmatrix}Y_{1} \\ Y_{2} \\ ... \\ Y_{n} \end{bmatrix} = Y_{1}^{2} + Y_{2}^{2} + ... + Y_{n}^{2} = \sum Y_{i}^{2}\]

\[X'X_{2 \times 2} = \begin{bmatrix} 1 & 1 & ... & 1 \\ X_{1} & X_{2} & ... & X_{n} \end{bmatrix}\begin{bmatrix} 1 & X_{1} \\ 1 & X_{2} \\ ... & ... \\ 1 & X_{n} \end{bmatrix} = \begin{bmatrix} n & \sum X_{i} \\ \sum X_{i} & \sum X_{i}^{2} \end{bmatrix}\]

\[X'Y_{2 \times 1} = \begin{bmatrix} 1 & 1 & ... & 1 \\ X_{1} & X_{2} & ... & X_{n} \end{bmatrix}\begin{bmatrix} Y_{1} \\ Y_{2} \\ ... \\ Y_{n} \end{bmatrix} = \begin{bmatrix} \sum Y_{i} \\ \sum X_{i}Y_{i} \end{bmatrix}\]
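The sketch below computes these three products with R's %*% operator, using the built-in trees data (Girth as Y, Height as X) that also appears in Section 6.14; it is only an illustration of the formulas above.

y = trees$Girth                  # response vector Y
x = cbind(1, trees$Height)       # design matrix X (first column all 1s)
t(y) %*% y                       # Y'Y = sum of Y_i^2, a 1 x 1 matrix
t(x) %*% x                       # X'X = [n, sum X_i; sum X_i, sum X_i^2]
t(x) %*% y                       # X'Y = [sum Y_i; sum X_i Y_i]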

6.4 Special Types of Matrices

  • Symmetric Matrix: A = A’
    • a symmetric matrix is necessarily square
    • premultiplying a matrix by its transpose, e.g., X’X, yields a symmetric matrix
  • Diagonal Matrix: off-diagonal elements are all zeros
  • Identity Matrix, denoted by I: a diagonal matrix whose elements on the main diagonal are all 1s.
    • AI = IA = A, \(A, I \in \mathbb{R}^{r \times r}\)
  • Scalar Matrix: a diagonal matrix whose main-diagonal elements are the same, can be expressed as kI
  • Vector and matrix with all elements unity
    • a column vector with all elements 1 will be denoted by 1
    • a square matrix with all elements 1 will be denoted by J
    • 1’1 = n
    • 11’ = J
  • Zero Vector: a vector containing only zeros, denoted by 0
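A small R sketch of these special matrices (the dimension n = 4 is chosen only for illustration):

n = 4
I = diag(n)                # identity matrix I
one = rep(1, n)            # column vector 1 with all elements unity
J = matrix(1, n, n)        # square matrix J with all elements unity
t(one) %*% one             # 1'1 = n
all(one %*% t(one) == J)   # 11' = J, so this is TRUE
5 * I                      # a scalar matrix kI with k = 5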

6.5 Linear Dependence and Rank of Matrix

  • Linear dependence

We define the set of c column vectors \(C_{1}, ..., C_{c}\) in an r \(\times\) c matrix to be linearly dependent if one vector can be expressed as a linear combination of the others. If no vector in the set can be so expressed, we define the set of vectors to be linearly independent.

More generally, when c scalars \(k_{1},...,k_{c}\), not all zero, can be found such that:

\[k_{1}\mathbf{C_{1}} + k_{2}\mathbf{C_{2}} + ... + k_{c}\mathbf{C_{c}} = \mathbf{0}\]

where 0 denotes the zero column vector, the c column vectors are linearly dependent. If the only set of scalars for which the equality holds is \(k_{1} = k_{2} = ... = k_{c} = 0\), the set of c column vectors is linearly independent.

  • Rank of Matrix: the maximum number of linearly independent columns in the matrix (see the sketch below)
    • the rank of an r \(\times\) c matrix cannot exceed min(r, c)
    • when a matrix is the product of two matrices, its rank cannot exceed the smaller of the ranks of the matrices being multiplied
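As a sketch of these ideas, qr() in R reports the rank of a matrix; the matrices below are made up for illustration:

A = cbind(c(1, 2, 3), c(2, 4, 6), c(0, 1, 1))  # second column = 2 * first column
qr(A)$rank                                     # rank 2: the columns are linearly dependent
B = diag(3)                                    # identity matrix
qr(B)$rank                                     # rank 3: full rank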

6.6 Inverse of a Matrix

The inverse of a matrix \(\mathbf{A}\) is another matrix, denoted by \(\mathbf{A^{-1}}\), such that:

\[\mathbf{A}^{-1}\mathbf{A} = \mathbf{AA}^{-1} = \mathbf{I}\]

where I is the identity matrix.

  • the inverse of a matrix is defined only for square matrices
  • many square matrices do not have an inverse
  • the inverse of a square matrix, if it exists, is unique

Finding the inverse

  • An inverse of a square \(r \times r\) matrix exists if the rank of the matrix is r. Such a matrix is said to be nonsingular or of full rank.
  • An \(r \times r\) matrix with rank less than r is said to be singular or not of full rank, and does not have an inverse.
  • The inverse of an \(r \times r\) matrix of full rank also has rank r.

If:

\[\mathbf{A}_{2 \times 2} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\]

then:

\[\mathbf{A}_{2 \times 2}^{-1} = \begin{bmatrix} \frac{d}{D} & \frac{-b}{D} \\ \frac{-c}{D} & \frac{a}{D} \end{bmatrix}\]

where:

\[D = ad - bc\]

D is called the determinant of the matrix A. If A is singular, its determinant equals zero and no inverse of A exists.
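A quick R sketch that checks this 2 × 2 formula against solve() and det(); the numbers in A are arbitrary:

A = matrix(c(2, 3, 4, 1), nrow = 2)   # a = 2, b = 4, c = 3, d = 1
D = det(A)                            # D = ad - bc = 2*1 - 4*3 = -10
A_inv = matrix(c(A[2, 2], -A[2, 1], -A[1, 2], A[1, 1]), nrow = 2) / D
all.equal(A_inv, solve(A))            # TRUE: the formula agrees with solve()
round(A_inv %*% A, 10)                # recovers the identity matrix I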

Regression Example

\[\mathbf{X}'\mathbf{X}_{2 \times 2} = \begin{bmatrix} n & \sum X_{i} \\ \sum X_{i} & \sum X_{i}^{2} \end{bmatrix}\]

\[ a = n, b = c = \sum{X_{i}}, d = \sum{X_{i}^{2}} \]

\[ D = n\sum{X_{i}^{2}} - (\sum{X_{i}})^{2} = n\sum{(X_{i} - \bar{X})}^{2}\]

\[(\mathbf{X}'\mathbf{X})_{2 \times 2}^{-1} = \begin{bmatrix} \frac{1}{n} + \frac{\bar{X}^2}{\sum(X_{i} - \bar{X})^{2}} & \frac{-\bar{X}}{\sum{(X_{i} - \bar{X})^2}} \\ \frac{-\bar{X}}{\sum{(X_{i} - \bar{X})^2}} & \frac{1}{\sum{(X_{i} - \bar{X})^2}} \end{bmatrix}\]
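The sketch below verifies this closed form numerically on the trees data (Height as X), anticipating the R code in Section 6.14:

X_col = trees$Height
n = length(X_col)
Sxx = sum((X_col - mean(X_col))^2)                 # sum of (X_i - Xbar)^2
x = cbind(1, X_col)                                # design matrix
closed_form = matrix(c(1 / n + mean(X_col)^2 / Sxx, -mean(X_col) / Sxx,
                       -mean(X_col) / Sxx,          1 / Sxx), nrow = 2)
all.equal(closed_form, solve(t(x) %*% x), check.attributes = FALSE)  # TRUE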

6.7 Some Basic Results for Matrices

  • A + B = B + A
  • (A + B) + C = A + (B + C)
  • (AB)C = A(BC)
  • C(A + B) = CA + CB
  • k(A + B) = kA + kB
  • (A’)’ = A
  • (A + B)’ = A’ + B’
  • (AB)’ = B’A’
  • (ABC)’ = C’B’A’
  • \((AB)^{-1} = B^{-1}A^{-1}\)
  • \((ABC)^{-1} = C^{-1}B^{-1}A^{-1}\)
  • \((A^{-1})^{-1} = A\)
  • \((A')^{-1} = (A^{-1})'\)
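A few of these identities can be checked numerically; the sketch below uses random 3 × 3 matrices, which are invertible with probability one:

set.seed(1)
A = matrix(rnorm(9), 3, 3)
B = matrix(rnorm(9), 3, 3)
all.equal(t(A %*% B), t(B) %*% t(A))              # (AB)' = B'A'
all.equal(solve(A %*% B), solve(B) %*% solve(A))  # (AB)^{-1} = B^{-1}A^{-1}
all.equal(solve(t(A)), t(solve(A)))               # (A')^{-1} = (A^{-1})'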

6.8 Random Vectors and Matrices

  • A random vector or matrix contains elements that are random variables.
  • Expectation of random vector or matrix:

\[\mathbf{Y}_{3 \times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}\text{, and } \mathbf{E(Y)}_{3 \times 1 } = \begin{bmatrix} E(Y_1) \\ E(Y_2) \\ E(Y_3) \end{bmatrix}\]

  • Variance-Covariance Matrix of Random Vector

\[\sigma^2(\mathbf{Y})_{n \times n} = \begin{bmatrix} \sigma^2(Y_1) & \sigma(Y_1,Y_2) & ... & \sigma(Y_1, Y_n) \\ \sigma(Y_2,Y_1) & \sigma^2(Y_2) & ... & \sigma(Y_2, Y_n) \\ ... & ... & ... & ...\\ \sigma(Y_n, Y_1) & \sigma(Y_n, Y_2) & ... & \sigma^2(Y_n) \end{bmatrix}\]

which is a symmetric matrix.

  • Some Basic Rules

\[\mathbf{W} = \mathbf{AY}\]

where W and Y are random vectors and A is a constant matrix. Then:

\[E(\mathbf{A}) = \mathbf{A}\]

\[E(\mathbf{W}) = E(\mathbf{AY}) = \mathbf{A}E(\mathbf{Y})\]

\[\sigma^2(\mathbf{W}) = \sigma^2(\mathbf{AY}) = \mathbf{A}\sigma^2(\mathbf{Y})\mathbf{A'}\]
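As a sketch of these rules, the code below takes an assumed constant matrix A, an assumed mean vector E(Y), and an assumed variance-covariance matrix \(\sigma^2(\mathbf{Y})\) (all values made up for illustration) and applies the formulas directly:

A = matrix(c(1, 0, 1, 1, 0, 1), nrow = 2)  # assumed 2 x 3 constant matrix
EY = c(1, 2, 3)                            # assumed E(Y)
VY = diag(c(4, 4, 4))                      # assumed sigma^2(Y) = 4 * I
A %*% EY                                   # E(W) = A E(Y)
A %*% VY %*% t(A)                          # sigma^2(W) = A sigma^2(Y) A'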

  • Multivariate Normal Distribution

6.9 Simple Linear Regression Model in Matrix Terms

The normal error regression model in matrix terms is:

\[\underset{n \times 1}{\mathbf{Y}} = \underset{n \times 2}{\mathbf{X}}\underset{2 \times 1}{\boldsymbol{\beta}} + \underset{n \times 1}{\boldsymbol{\varepsilon}},\]

where

\[\underset{n \times 1}{\mathbf{Y}} = \begin{bmatrix}Y_1 \\ Y_2 \\ ... \\ Y_n\end{bmatrix}, \underset{n \times 2}{\mathbf{X}} = \begin{bmatrix}1 & X_1 \\ 1 & X_2 \\ ... & ... \\ 1 & X_n \end{bmatrix}, \underset{2 \times 1}{\boldsymbol{\beta}} = \begin{bmatrix}\beta_0 \\ \beta_1 \end{bmatrix}\text{, and }\underset{n \times 1}{\boldsymbol{\varepsilon}} = \begin{bmatrix}\varepsilon_1 \\ \varepsilon_2 \\ ... \\ \varepsilon_n\end{bmatrix}\]

6.10 Least Squares Estimation of Regression Parameters

\[\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y},\]

where b is the vector of the least squares regression coefficients:

\[ \mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} \]

Note that the inverse is defined only for square matrices, and \(\mathbf{X}'\mathbf{X}\) is always a square matrix.

6.11 Fitted Values and Residuals

\[\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}\]

or

\[\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}\text{, with }\underset{n \times n}{\mathbf{H}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\]

The matrix H is called the hat matrix. It is symmetric and has the special property of idempotency:

\[\mathbf{HH} = \mathbf{H}\]

\[\mathbf{e} = \mathbf{Y} - \mathbf{\hat{Y}} = \mathbf{Y} - \mathbf{HY} = (\mathbf{I}-\mathbf{H})\mathbf{Y}\]

and the matrix \(\mathbf{I}-\mathbf{H}\), like the matrix \(\mathbf{H}\), is symmetric and idempotent.
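These properties are easy to verify numerically; the sketch below reuses the hat matrix computed from the trees data in Section 6.14:

x = cbind(1, trees$Height)
y = trees$Girth
h = x %*% solve(t(x) %*% x) %*% t(x)       # hat matrix H
all.equal(h, t(h))                         # H is symmetric
all.equal(h %*% h, h)                      # H is idempotent: HH = H
m = diag(nrow(h)) - h                      # I - H
all.equal(m %*% m, m)                      # I - H is idempotent as well
head(m %*% y)                              # residuals e = (I - H)Y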

The variance-covariance matrix of residuals e:

\[\sigma^2(\mathbf{e}) = \sigma^2 \times (\mathbf{I}-\mathbf{H})\]

and is estimated by:

\[s^2(\mathbf{e}) = MSE \times (\mathbf{I}-\mathbf{H})\]

  • Proof:

\[\sigma^2(\mathbf{e}) = \sigma^2((\mathbf{I}-\mathbf{H})\mathbf{Y}) = (\mathbf{I}-\mathbf{H})\times \sigma^2(\mathbf{Y}) \times (\mathbf{I}-\mathbf{H})'\]

\[\sigma^2(\mathbf{Y})= \sigma^2 \times \mathbf{I}\]

\[(\mathbf{I}-\mathbf{H})' = (\mathbf{I}-\mathbf{H})\]

\[(\mathbf{I}-\mathbf{H})(\mathbf{I}-\mathbf{H}) = \mathbf{I}-\mathbf{H}\]

\[\sigma^2(\mathbf{e}) = \sigma^2 \times (\mathbf{I}-\mathbf{H})\]

6.12 Analysis of Variance Results

\[SSTO = \sum(Y_i - \bar{Y})^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n} = \mathbf{Y}'\mathbf{Y} - (\frac{1}{n})\mathbf{Y}'\mathbf{JY}\]

\[SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y}-\mathbf{Xb})'(\mathbf{Y}-\mathbf{Xb})=\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}\]

\[SSR = \mathbf{b}'\mathbf{X}'\mathbf{Y} - (\frac{1}{n})\mathbf{Y}'\mathbf{JY}\]

A quadratic form is defined as:

\[\underset{1 \times 1}{\mathbf{Y}'\mathbf{AY}} = \displaystyle\sum_{i=1}^{n}\displaystyle\sum_{j=1}^{n}a_{ij}Y_iY_j\text{, where }a_{ij} = a_{ji}\]

A is a symmetric \(n \times n\) matrix and is called the matrix of the quadratic form.

So the sums of squares can be expressed as quadratic forms as follows:

\[SSTO = \mathbf{Y}'[\mathbf{I} - \frac{1}{n}\mathbf{J}]\mathbf{Y}\]

\[SSE = \mathbf{Y}'[\mathbf{I} - \mathbf{H}]\mathbf{Y}\]

\[SSR = \mathbf{Y}'[\mathbf{H} - \frac{1}{n}\mathbf{J}]\mathbf{Y} \]

Quadratic forms play an important role in statistics because all sums of squares in the analysis of variance for linear statistical models can be expressed as quadratic forms.
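The sketch below evaluates the three quadratic forms on the trees data; the values should match the anova() output in Section 6.14 (SSR = 79.665, SSE = 215.772) and sum to SSTO.

y = trees$Girth
x = cbind(1, trees$Height)
n = length(y)
I = diag(n)
J = matrix(1, n, n)
H = x %*% solve(t(x) %*% x) %*% t(x)
SSTO = t(y) %*% (I - J / n) %*% y    # total sum of squares
SSE  = t(y) %*% (I - H) %*% y        # error sum of squares
SSR  = t(y) %*% (H - J / n) %*% y    # regression sum of squares
c(SSTO, SSE, SSR)                    # SSTO = SSE + SSR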

6.13 Inferences in Regression Analysis

  • Regression Coefficients

The variance-covariance matrix of b:

\[\sigma^2(\mathbf{b}) = \begin{bmatrix} \sigma^2(b_0) & \sigma(b_0,b_1) \\ \sigma(b_0,b_1) & \sigma^2(b_1) \end{bmatrix} = \sigma^2 \times (\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \frac{\sigma^2}{n} + \frac{\sigma^2\bar{X}^2}{\sum(X_i - \bar{X})^2} & \frac{-\bar{X}\sigma^2}{\sum(X_i - \bar{X})^2} \\ \frac{-\bar{X}\sigma^2}{\sum(X_i - \bar{X})^2} & \frac{\sigma^2}{\sum(X_i - \bar{X})^2} \end{bmatrix}\]

And the estimated variance-covariance matrix of b, denoted by \(s^2(\mathbf{b})\):

\[s^2(\mathbf{b}) = MSE \times (\mathbf{X}'\mathbf{X})^{-1}\]
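A sketch of this estimate on the trees data; it should reproduce the variance-covariance matrix that vcov() reports for the lm fit of Section 6.14:

y = trees$Girth
x = cbind(1, trees$Height)
fit = lm(y ~ trees$Height)
MSE = sum(residuals(fit)^2) / (length(y) - 2)  # error mean square
MSE * solve(t(x) %*% x)                        # s^2(b) = MSE (X'X)^{-1}
vcov(fit)                                      # the same matrix from lm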

  • Mean Response

\[s^2(\hat{Y}_h) = MSE \times (\mathbf{X}_{h}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h) = MSE \times [\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum(X_i - \bar{X})^2}]\]

  • Prediction of new observation

\[s^2(pred) = MSE \times (1+\mathbf{X}_{h}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_h)\]
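The sketch below evaluates both variances for an assumed new level X_h = 75 of Height on the trees data (the value of X_h is chosen only for illustration):

y = trees$Girth
x = cbind(1, trees$Height)
MSE = sum(residuals(lm(y ~ trees$Height))^2) / (length(y) - 2)
Xh = c(1, 75)                                    # X_h, including the intercept 1
MSE * t(Xh) %*% solve(t(x) %*% x) %*% Xh         # s^2(Yhat_h): mean response
MSE * (1 + t(Xh) %*% solve(t(x) %*% x) %*% Xh)   # s^2(pred): new observation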

  • Proof:

\[\sigma^2(\mathbf{b}) = \sigma^2(\mathbf{(X'X)^{-1}X'Y}) = \mathbf{(X'X)^{-1}X'}\,\sigma^2(\mathbf{Y})\,(\mathbf{(X'X)^{-1}X'})' = \sigma^2\mathbf{(X'X)^{-1}X'X(X'X)^{-1}} = \sigma^2 \times (\mathbf{X}'\mathbf{X})^{-1}\]

since \(\sigma^2(\mathbf{Y}) = \sigma^2\mathbf{I}\) and \((\mathbf{X}'\mathbf{X})^{-1}\) is symmetric.

6.14 R code

  • example data
head(trees)
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7
fit_lm = lm(trees$"Girth" ~ trees$"Height")
summary(fit_lm)
## 
## Call:
## lm(formula = trees$Girth ~ trees$Height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.2386 -1.9205 -0.0714  2.7450  4.5384 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -6.18839    5.96020  -1.038  0.30772   
## trees$Height  0.25575    0.07816   3.272  0.00276 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.728 on 29 degrees of freedom
## Multiple R-squared:  0.2697, Adjusted R-squared:  0.2445 
## F-statistic: 10.71 on 1 and 29 DF,  p-value: 0.002758
fitted(fit_lm)
##         1         2         3         4         5         6         7 
## 11.713904 10.435169  9.923674 12.225399 14.527123 15.038617 10.690916 
##         8         9        10        11        12        13        14 
## 12.992640 14.271376 12.992640 14.015628 13.248387 13.248387 11.458157 
##        15        16        17        18        19        20        21 
## 12.992640 12.736893 15.550111 15.805858 11.969651 10.179422 13.759881 
##        22        23        24        25        26        27        28 
## 14.271376 12.736893 12.225399 13.504134 14.527123 14.782870 14.271376 
##        29        30        31 
## 14.271376 14.271376 16.061605
anova(fit_lm)
## Analysis of Variance Table
## 
## Response: trees$Girth
##              Df  Sum Sq Mean Sq F value   Pr(>F)   
## trees$Height  1  79.665  79.665  10.707 0.002758 **
## Residuals    29 215.772   7.440                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Least squares estimation
y = trees$"Girth"
x = cbind(1,trees$"Height")
b = solve(t(x) %*% x) %*% t(x) %*% y
b
##            [,1]
## [1,] -6.1883945
## [2,]  0.2557471
  • fitted values
h = x %*% solve(t(x) %*% x) %*% t(x)
dim(h); h[1:4,1:4]
## [1] 31 31
##            [,1]       [,2]       [,3]       [,4]
## [1,] 0.06181471 0.08644526 0.09629747 0.05196250
## [2,] 0.08644526 0.13160125 0.14966365 0.06838286
## [3,] 0.09629747 0.14966365 0.17101012 0.07495100
## [4,] 0.05196250 0.06838286 0.07495100 0.04539435
yhat = h %*% y
head(yhat)
##           [,1]
## [1,] 11.713904
## [2,] 10.435169
## [3,]  9.923674
## [4,] 12.225399
## [5,] 14.527123
## [6,] 15.038617
  • sum of squares
SSTO = t(y) %*% y - 1 / length(y) * t(y) %*% matrix(1, nrow=length(y), ncol=length(y)) %*% y
SSTO
##          [,1]
## [1,] 295.4374