2 Linear regression with one predictor variable
2.1 Relations between variables
- relation
- Functional relation: \(Y = f(X)\), e.g., total cost = number of products × cost per product
- Statistical relation: not a perfect relation, e.g., midyear and year-end performance ratings for 10 employees
- variable
- X: independent/explanatory/predictor variable
- Y: dependent/response variable
- plot
- scatter diagram/plot
- each point represents a trial or a case
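As a quick illustration, a minimal R sketch of a scatter diagram (the rating values below are made up for illustration):

```r
# Hypothetical midyear and year-end performance ratings for 10 employees
midyear <- c(90, 74, 81, 65, 92, 70, 85, 60, 78, 88)
yearend <- c(93, 78, 80, 69, 95, 73, 89, 64, 81, 91)

# Each plotted point represents one case (employee)
plot(midyear, yearend,
     xlab = "Midyear rating (X)", ylab = "Year-end rating (Y)",
     main = "Scatter diagram")
```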
2.2 Regression Models and Their Uses
- History
- Sir Francis Galton, in the latter part of the 19th century
- relation between heights of parents and children
- regression to the mean
- Basic Concepts
- A regression model
- two characters:
- there is a probability distribution of Y for each level of X
- The means of these probability distributions vary in some systematic fashion with X
- regression function: the systematic relationship
- linear, curvilinear, etc.
- regression curve: the graph of the regression function
- probability distributions: symmetrical, skewed etc.
- Regression models with more than one predictor variable
- Construction of Regression Models
- Selection of predictor variables
- Functional form of regression relation
- Scope of model
- Uses of regression analysis
- description
- control
- prediction
- overlap in practice
- Regression and causality
- Use of Computers
2.3 Simple linear regression model with distribution of error terms unspecified
- Formal statement of model
\[Y_{i} = \beta_{0} + \beta_{1}X_{i} + \varepsilon_{i}\]
where:
- \(Y_{i}\) is the value of the response variable in the ith trial
- \(\beta_{0}\) and \(\beta_{1}\) are parameters
- \(X_{i}\) is a known constant, namely, the value of the predictor variable in the ith trial
- \(\varepsilon_{i}\) is a random error term
- mean \(E(\varepsilon_{i}) = 0\)
- variance \(\sigma^{2}(\varepsilon_{i}) = \sigma^{2}\)
- covariance \(\sigma(\varepsilon_{i}, \varepsilon_{j}) = 0\), for all i, j; \(i \neq j\)
- Important features of model
- \(Y_{i}\) contains two components: the constant term \(\beta_{0} + \beta_{1}X_{i}\) and the random term \(\varepsilon_{i}\). Hence, \(Y_{i}\) is a random variable
- Since \(E(\varepsilon_{i}) = 0\), \(E(Y_{i}) = E(\beta_{0} + \beta_{1}X_{i} + \varepsilon_{i}) = \beta_{0} + \beta_{1}X_{i} + E(\varepsilon_{i}) = \beta_{0} + \beta_{1}X_{i}\)
- The response \(Y_{i}\) in the ith trial exceeds or falls short of the value of the regression function by the error term amount \(\varepsilon_{i}\)
- The error terms \(\varepsilon_{i}\) are assumed to have constant variance \(\sigma^{2}\), so \(\sigma^{2}(Y_{i}) = \sigma^{2}\)
- The error terms are assumed to be uncorrelated, and hence so are the responses \(Y_{i}\) and \(Y_{j}\) (these features are illustrated in the R sketch at the end of this subsection)
- Meaning of regression parameters
- regression coefficients: the parameters \(\beta_{0}\) and \(\beta_{1}\)
- the slope of the regression line: \(\beta_{1}\)
- Alternative versions of regression model
\[Y_{i} = \beta_{0}X_{0} + \beta_{1}X_{i} + \varepsilon_{i}\text{, where }X_{0} \equiv 1\]
\[Y_{i} = \beta_{0}^{*} + \beta_{1}(X_{i} - \bar{X}) + \varepsilon_{i}\text{, where }\beta_{0}^{*} = \beta_{0} + \beta_{1}\bar{X}\]
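A minimal R sketch simulating this model (the values \(\beta_{0} = 9\), \(\beta_{1} = 4\), \(\sigma = 2\) and the X levels are arbitrary, chosen only for illustration):

```r
set.seed(1)
beta0 <- 9; beta1 <- 4; sigma <- 2             # arbitrary illustrative parameters

X   <- rep(c(25, 50, 75, 100), each = 500)     # known constants
eps <- rnorm(length(X), mean = 0, sd = sigma)  # E(eps) = 0, constant variance
Y   <- beta0 + beta1 * X + eps                 # Y is a random variable

# At each level of X there is a distribution of Y whose mean is beta0 + beta1*X
tapply(Y, X, mean)  # close to 9 + 4 * c(25, 50, 75, 100)
tapply(Y, X, var)   # each close to sigma^2 = 4
```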
2.4 Data for regression analysis
- Observational Data
- Experimental Data
- treatment
- experimental units
- Completely randomized design
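A small R sketch of a completely randomized design, assigning hypothetical treatments to experimental units at random (the unit and treatment names are made up):

```r
set.seed(2)
units      <- paste0("unit", 1:12)             # experimental units
treatments <- rep(c("A", "B", "C"), each = 4)  # hypothetical treatments

# Complete randomization: each unit is equally likely to receive any treatment
data.frame(unit = units, treatment = sample(treatments))
```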
2.5 Overview of steps in regression analysis
2.6 Estimation of regression function
- Method of Least Squares
- To find estimates \(b_{0}\) and \(b_{1}\) for \(\beta_{0}\) and \(\beta_{1}\), respectively, for which Q is a minimum, where \(Q = \displaystyle\sum_{i=1}^{n}(Y_{i} - \beta_{0} - \beta_{1}X_{i})^2\).
- Least Squares Estimators \(b_{0}\) and \(b_{1}\) can be found in two ways:
- numerical search procedures
- analytical procedures
\[b_{1} = \frac{\sum(X_{i} - \bar{X})(Y_{i} - \bar{Y})}{\sum(X_{i} - \bar{X})^2}\]
\[b_{0} = \frac{1}{n}(\sum Y_{i} - b_{1} \sum X_{i}) = \bar{Y} - b_{1}\bar{X}\]
- Proof
The partial derivatives are
\[\frac{\partial Q}{\partial\beta_{0}} = -2\sum(Y_{i} - \beta_{0} - \beta_{1}X_{i})\]
\[\frac{\partial Q}{\partial\beta_{1}} = -2\sum X_{i}(Y_{i} - \beta_{0} - \beta_{1}X_{i})\]
We set them equal to zero, using \(b_{0}\) and \(b_{1}\) to denote the particular values of \(\beta_{0}\) and \(\beta_{1}\) that minimize Q:
\[-2\sum(Y_{i} - b_{0} - b_{1}X_{i}) = 0\]
\[-2\sum X_{i}(Y_{i} - b_{0} - b_{1}X_{i}) = 0\]
Solving these two normal equations simultaneously yields the estimators \(b_{0}\) and \(b_{1}\) given above (see the R sketch below).
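A minimal R sketch, on simulated data, computing \(b_{0}\) and \(b_{1}\) from the closed-form formulas above and checking them against R's lm() (the true parameter values are arbitrary):

```r
set.seed(3)
X <- runif(30, 0, 10)
Y <- 9 + 4 * X + rnorm(30, sd = 2)  # arbitrary true parameters

# Analytical least squares estimators
b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b0 <- mean(Y) - b1 * mean(X)
c(b0 = b0, b1 = b1)

coef(lm(Y ~ X))  # same estimates from R's least squares fit
```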
- Properties of Least Squares Estimators
Gauss-Markov theorem:
Under the conditions of the regression model, the least squares estimators \(b_{0}\) and \(b_{1}\) are unbiased and have minimum variance among all unbiased linear estimators
- Point Estimation of Mean Response
- estimate the regression function as follows:
\[\hat{Y} = b_{0} + b_{1}X\]
- Residuals
- residual: the difference between the observed value \(Y_{i}\) and the corresponding fitted value \(\hat{Y}_{i}\)
\[e_{i} = Y_{i} - \hat{Y}_{i}\]
- Properties of Fitted Regression Line
- \(\sum e_{i} = 0\)
- \(\sum e_{i}^{2}\) is a minimum
- \(\sum Y_{i} = \sum \hat{Y}_{i}\)
- \(\sum X_{i} e_{i} = 0\)
- \(\sum \hat{Y}_{i} e_{i} = 0\)
- the regression line always goes through the point \((\bar{X}, \bar{Y})\)
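These properties are easy to verify numerically. A short R sketch, continuing with the simulated X and Y from the previous sketch:

```r
fit  <- lm(Y ~ X)    # X, Y as in the previous sketch
e    <- resid(fit)   # residuals e_i = Y_i - Yhat_i
Yhat <- fitted(fit)  # fitted values

sum(e)              # ~0: residuals sum to zero
sum(Y) - sum(Yhat)  # ~0: observed and fitted totals agree
sum(X * e)          # ~0
sum(Yhat * e)       # ~0

# The fitted line passes through (Xbar, Ybar)
predict(fit, newdata = data.frame(X = mean(X)))  # equals mean(Y)
mean(Y)
```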
2.7 Estimation of Error Terms Variance \(\sigma^{2}\)
- Point Estimator of \(\sigma^{2}\)
- Single population
- sum of squares: \(\displaystyle\sum_{i=1}^{n}(Y_{i}-\bar{Y})^2\)
- degrees of freedom (df): n - 1, because one degree of freedom is lost by using \(\bar{Y}\) as an estimate of the unknown population mean \(\mu\)
- sample variance/mean square: \(s^2 = \frac{\displaystyle\sum(Y_{i}-\bar{Y})^2}{n-1}\)
- Regression model
- deviation/residual: \(Y_{i} - \hat{Y}_{i} = e_{i}\)
- error/residual sum of squares:
- \(SSE = \displaystyle\sum_{i=1}^{n}(Y_{i} - \hat{Y}_{i})^{2}\)
- \(SSE = \displaystyle\sum_{i=1}^{n}e_{i}^{2}\)
- degrees of freedom: n - 2, because two degrees of freedom are lost due to estimating \(\beta_{0}\) and \(\beta_{1}\) to get \(\hat{Y}_{i}\)
- MSE (error/residual mean square): \(s^{2} = MSE = \frac{SSE}{n-2}\)
- \(E(MSE) = \sigma^2\) (Note: \(SSE/\sigma^2 \sim \chi^2(n-2)\)), so MSE is an unbiased estimator of \(\sigma^2\)
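A short R sketch computing SSE and MSE for the fitted line from the earlier sketch and comparing with the residual standard error reported by summary():

```r
fit <- lm(Y ~ X)  # X, Y as in the earlier simulated-data sketch
n   <- length(Y)

SSE <- sum(resid(fit)^2)  # error (residual) sum of squares
MSE <- SSE / (n - 2)      # two df lost estimating beta0 and beta1

sqrt(MSE)           # point estimate s of sigma
summary(fit)$sigma  # the same quantity from summary()
```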
2.8 Normal Error Regression Model
- Model
- same as the simple linear regression model
- except that it assumes the error terms \(\varepsilon_{i}\) are normally distributed
- Estimation of Parameters by Method of Maximum Likelihood
Method of maximum likelihood chooses as estimates those values of the parameters that are most consistent with the sample data.
Example: a normal distribution with SD = 10 and unknown mean
A random sample of n = 3 yields the values 250, 265, and 259
The likelihood value L is the product of the densities of the normal distribution at the observed values
If we assume \(\mu = 230\), \(L(\mu = 230) = 0.279 \times 10^{-9}\)
R code: prod(dnorm(c(250, 265, 259), 230, 10))
If we assume \(\mu = 259\), \(L(\mu = 259) = 0.0000354\)
R code: prod(dnorm(c(250, 265, 259), 259, 10))
So \(L(\mu = 259) > L(\mu = 230)\): \(\mu = 259\) is more consistent with the sample data
The method of maximum likelihood estimates the parameters by choosing the values that maximize L
It can be shown that, for a normal population, the maximum likelihood estimator of \(\mu\) is the sample mean.
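A short R sketch evaluating this likelihood over a grid of candidate values of \(\mu\) confirms that it peaks at the sample mean:

```r
y  <- c(250, 265, 259)
mu <- seq(230, 290, by = 0.1)  # grid of candidate means

# Likelihood at each candidate: product of normal densities with SD = 10
L <- sapply(mu, function(m) prod(dnorm(y, m, 10)))

mu[which.max(L)]  # 258, the sample mean
mean(y)           # 258
```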
For the regression model, the likelihood function for the n observations is
\[L(\beta_{0}, \beta_{1}, \sigma^{2}) = \prod_{i=1}^{n}\frac{1}{(2\pi\sigma^{2})^{1/2}}\exp\left[-\frac{1}{2\sigma^{2}}(Y_{i} - \beta_{0} - \beta_{1}X_{i})^{2}\right]\]
\[= \frac{1}{(2\pi\sigma^{2})^{n/2}}\exp\left[-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(Y_{i} - \beta_{0} - \beta_{1}X_{i})^{2}\right]\]
So we can obtain the maximum likelihood estimators:

| Parameter | Maximum Likelihood Estimator |
|---|---|
| \(\beta_{0}\) | \(\hat{\beta}_{0} = b_{0}\) |
| \(\beta_{1}\) | \(\hat{\beta}_{1} = b_{1}\) |
| \(\sigma^{2}\) | \(\hat{\sigma}^{2} = \frac{\sum(Y_{i} - \hat{Y}_{i})^{2}}{n}\) |
The maximum likelihood estimator of \(\sigma^{2}\) is biased; the unbiased estimator is MSE (i.e., \(s^{2}\)):
\[s^{2} = MSE = \frac{n}{n-2}\hat{\sigma}^{2}\]
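A closing R sketch, on simulated data, contrasting the biased maximum likelihood estimator of \(\sigma^{2}\) with the unbiased MSE (the parameter values are arbitrary):

```r
set.seed(4)
X <- runif(50, 0, 10)
Y <- 9 + 4 * X + rnorm(50, sd = 2)  # arbitrary true parameters

fit <- lm(Y ~ X)  # least squares = maximum likelihood for beta0, beta1
n   <- length(Y)
SSE <- sum(resid(fit)^2)

SSE / n        # biased MLE of sigma^2
SSE / (n - 2)  # unbiased MSE = (n / (n - 2)) * MLE
```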