Descriptive statistics

one variable

| statistic | $X$ | $X+c$ | $c \times X$ | $scale(X) = \frac{X-mean(X)}{SD(X)}$ |
|---|---|---|---|---|
| mean | $\frac{\sum{x_i}}{n}$ | $mean(X) + c$ | $c \times mean(X)$ | 0 |
| SS (Sum of Squares) | $\sum{(x_i - \bar{x})^2} = \sum{x_i^2} - n\bar{x}^2$ | $SS_X$ | $c^2 \times SS_X$ | $n-1$ |
| Var (Variance) | $\frac{\sum{(x_i - \bar{x})^2}}{n-1}$ | $Var(X)$ | $c^2 \times Var(X)$ | 1 |
| SD (Standard Deviation) | $\sqrt{\frac{\sum{(x_i - \bar{x})^2}}{n-1}}$ | $SD(X)$ | $\lvert c \rvert \times SD(X)$ | 1 |
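The rules in this table can be checked numerically. A minimal sketch, assuming the built-in `trees` data set (also used in the R code example in this document); the helper `ss` and the constant `c0 = 3` are names introduced here only for illustration.

```r
# Helper (hypothetical name): sum of squared deviations from the mean
ss = function(v) sum((v - mean(v))^2)

x  = trees$Height
n  = length(x)
c0 = 3   # arbitrary constant, chosen only for illustration

# Shifting by a constant moves the mean but leaves SS, Var and SD unchanged
all.equal(mean(x + c0), mean(x) + c0)
all.equal(ss(x + c0), ss(x))
all.equal(var(x + c0), var(x))
all.equal(sd(x + c0), sd(x))

# Multiplying by a constant scales the mean by c, SS and Var by c^2, SD by |c|
all.equal(mean(c0 * x), c0 * mean(x))
all.equal(ss(c0 * x), c0^2 * ss(x))
all.equal(var(c0 * x), c0^2 * var(x))
all.equal(sd(-c0 * x), abs(-c0) * sd(x))

# Standardizing gives mean 0, SS = n - 1 and Var = SD = 1
z = as.numeric(scale(x))
abs(mean(z)) < 1e-12
all.equal(ss(z), n - 1)
all.equal(var(z), 1)
all.equal(sd(z), 1)
```

Every line should print `TRUE`. The negative multiplier in the SD check is there to show that SD scales with the absolute value of the constant.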

two variables

| statistic | $X,Y$ | $X+c,Y$ | $c \times X,Y$ | $scale(X), scale(Y)$ |
|---|---|---|---|---|
| SS (Sum of Squares) | $\sum(x_i-\bar{x})(y_i-\bar{y}) = \sum{x_iy_i} - n\bar{x}\bar{y}$ | $SS_{XY}$ | $c \times SS_{XY}$ | $(n-1) \times Cor(X,Y)$ |
| Cov (Covariance) | $\frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{n-1}$ | $Cov(X,Y)$ | $c \times Cov(X,Y)$ | $Cor(X,Y)$ |
| Cor (Correlation) | $\frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i-\bar{x})^2 \times \sum(y_i-\bar{y})^2}}$ | $Cor(X,Y)$ | $sign(c) \times Cor(X,Y)$ | $Cor(X,Y)$ |
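The two-variable rules can be verified the same way. A minimal sketch, again assuming the built-in `trees` data; `ss_xy` and `c0` are illustrative names, not part of any library.

```r
# Helper (hypothetical name): sum of cross-products of deviations
ss_xy = function(u, v) sum((u - mean(u)) * (v - mean(v)))

x  = trees$Height
y  = trees$Girth
n  = length(x)
c0 = 3   # arbitrary constant, chosen only for illustration

# Shortcut formula: SS_XY = sum(x_i * y_i) - n * xbar * ybar
all.equal(ss_xy(x, y), sum(x * y) - n * mean(x) * mean(y))

# Shifting X changes nothing
all.equal(ss_xy(x + c0, y), ss_xy(x, y))
all.equal(cov(x + c0, y), cov(x, y))
all.equal(cor(x + c0, y), cor(x, y))

# Multiplying X by c scales SS_XY and Cov by c; Cor only picks up the sign of c
all.equal(ss_xy(c0 * x, y), c0 * ss_xy(x, y))
all.equal(cov(c0 * x, y), c0 * cov(x, y))
all.equal(cor(c0 * x, y), cor(x, y))
all.equal(cor(-c0 * x, y), -cor(x, y))

# After standardizing both variables, Cov equals Cor and SS_XY = (n - 1) * Cor
zx = as.numeric(scale(x))
zy = as.numeric(scale(y))
all.equal(ss_xy(zx, zy), (n - 1) * cor(x, y))
all.equal(cov(zx, zy), cor(x, y))
```

Every line should print `TRUE`.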

Random Variables

expected value or mean:

\[E(X) = \sum x_if(x_i)\]

variance

\[Var(X) = \sum (x_i - \mu)^2 f(x_i)\]

\[Var(X)= E((X-E(X))^2) = E(X^2) - (E(X))^2\]
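A minimal sketch of these definitions for a discrete random variable, assuming a fair six-sided die as the distribution (chosen here only for illustration). It evaluates the expected value and both forms of the variance.

```r
# Assumed distribution: a fair die, values 1..6 with f(x_i) = 1/6
x = 1:6
f = rep(1/6, 6)

mu = sum(x * f)              # E(X) = 3.5
v1 = sum((x - mu)^2 * f)     # definition: sum of (x_i - mu)^2 f(x_i)
v2 = sum(x^2 * f) - mu^2     # shortcut: E(X^2) - (E(X))^2

mu
c(v1, v2)                    # both equal 35/12
```

Both variance formulas give 35/12, about 2.917. Note that this is the variance of the distribution, so there is no $n-1$ divisor as in the sample variance.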

covariance

\[Cov(X,Y) = \sum(x_i - \mu_x)(y_i - \mu_y)f(x_i, y_i)\]

\[Cov(X,Y) = E((X-E(X))(Y-E(Y))) = E(XY) - E(X)E(Y)\]
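A minimal sketch of the covariance formulas, assuming a made-up joint distribution of two binary variables (the probabilities below are illustrative, not from any data set).

```r
# Assumed joint pmf f(x, y): rows are x = 0, 1 and columns are y = 0, 1
xv = c(0, 1)
yv = c(0, 1)
f  = matrix(c(0.4, 0.1,
              0.1, 0.4), nrow = 2, byrow = TRUE)

# Marginal distributions and means
fx   = rowSums(f)
fy   = colSums(f)
mu_x = sum(xv * fx)   # E(X) = 0.5
mu_y = sum(yv * fy)   # E(Y) = 0.5

# Definition: sum over all cells of (x - mu_x)(y - mu_y) f(x, y)
cov1 = sum(outer(xv - mu_x, yv - mu_y) * f)

# Shortcut: E(XY) - E(X)E(Y)
cov2 = sum(outer(xv, yv) * f) - mu_x * mu_y

c(cov1, cov2)         # both equal 0.15
```

`outer()` builds the table of products for every $(x_i, y_j)$ pair, so multiplying it elementwise by `f` and summing reproduces the double sum.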

SLR (simple linear regression)

the normal error regression model:

\[Y_i = \beta_0 + \beta_1X_i + \epsilon_i\text{, with independent }\epsilon_i \sim N(0,\sigma^2)\]

the estimated regression function:

\[\hat{Y_i} = b_0 + b_1X_i\text{, with }b_1 = \frac{SS_{XY}}{SS_X}, b_0 = \bar{Y} - b_1\bar{X}\]
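The closed-form estimates can be computed directly and compared with `lm()`. A minimal sketch, assuming the built-in `trees` data with Height as $X$ and Girth as $Y$ (the same choice as in the R code example in this document).

```r
x = trees$Height
y = trees$Girth

# Sums of squares and cross-products
ss_x  = sum((x - mean(x))^2)
ss_xy = sum((x - mean(x)) * (y - mean(y)))

# Least squares estimates from the formulas above
b1 = ss_xy / ss_x
b0 = mean(y) - b1 * mean(x)

# Compare with the coefficients reported by lm()
fit = lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))

# Fitted values: b0 + b1 * x_i
yhat = b0 + b1 * x
all.equal(unname(fitted(fit)), yhat)
```

Both comparisons should print `TRUE`. Because $b_0 = \bar{Y} - b_1\bar{X}$, the fitted line always passes through $(\bar{X}, \bar{Y})$.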

the sampling distribution of $b_1$:

\[b_1 \sim N(\beta_1,\sigma^2(b_1))\text{, so } \frac{b_1 - \beta_1}{\sigma(b_1)} \sim N(0,1)\text{, with }\sigma^2(b_1) = \frac{\sigma^2}{SS_X}\]

\[\frac{b_1 - \beta_1}{s(b_1)} \sim t(n-2)\text{, with }s^2(b_1) = \frac{MSE}{SS_X}\]
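The $t(n-2)$ result is what gives the usual test and confidence interval for $\beta_1$. A minimal sketch, assuming the built-in `trees` data and a 95% confidence level (both are illustrative choices).

```r
x = trees$Height
y = trees$Girth
n = length(y)

fit  = lm(y ~ x)
ss_x = sum((x - mean(x))^2)
mse  = sum(resid(fit)^2) / (n - 2)    # estimate of sigma^2

b1   = unname(coef(fit)[2])
s_b1 = sqrt(mse / ss_x)               # s(b1) = sqrt(MSE / SS_X)

# Test of H0: beta_1 = 0 using the t(n - 2) distribution
t_stat = b1 / s_b1
p_val  = 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# 95% confidence interval: b1 +/- t(0.975; n - 2) * s(b1)
ci = b1 + c(-1, 1) * qt(0.975, df = n - 2) * s_b1

# Compare with the values reported by summary() and confint()
all.equal(s_b1,   summary(fit)$coefficients[2, 2])
all.equal(t_stat, summary(fit)$coefficients[2, 3])
all.equal(p_val,  summary(fit)$coefficients[2, 4])
all.equal(ci,     unname(confint(fit)[2, ]))
```

All four comparisons should print `TRUE`.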
effect of standardizing $X$ and/or $Y$:

| statistic | $X,Y$ | $X, scale(Y)$ | $scale(X), Y$ | $scale(X),scale(Y)$ |
|---|---|---|---|---|
| $b_1$ | $\frac{\sum(x_i-\bar{x})(y_i - \bar{y})}{\sum{(x_i-\bar{x})^2}}$ | $\frac{b_1}{SD(Y)}$ | $b_1SD(X)$ | $b_1\frac{SD(X)}{SD(Y)} = Cor(X,Y)$ |
| $\hat{Y_i}$ | $b_0+b_1X_i = \bar{y}+\frac{\sum(x_i-\bar{x})(y_i - \bar{y})}{\sum{(x_i-\bar{x})^2}}(x_i - \bar{x})$ | $\frac{\hat{y_i}-\bar{y}}{SD(y)}$ | $\hat{Y_i}$ | $\frac{\hat{y_i}-\bar{y}}{SD(y)}$ |
| SSTO ($SS_Y$) | $\sum(Y_i - \bar{Y})^2$ | $\frac{SSTO}{Var(Y)}$ | SSTO | $\frac{SSTO}{Var(Y)}$ |
| SSE | $\sum(Y_i - \hat{Y_{i}})^2$ | $\frac{SSE}{Var(Y)}$ | SSE | $\frac{SSE}{Var(Y)}$ |
| MSE | $\frac{\sum(Y_i - \hat{Y_i})^2}{n-2}$ | $\frac{MSE}{Var(Y)}$ | MSE | $\frac{MSE}{Var(Y)}$ |
| SSR | $\sum(\hat{Y_i}-\bar{Y})^2$ | $\frac{SSR}{Var(Y)}$ | SSR | $\frac{SSR}{Var(Y)}$ |
| $R^2$ | $1-\frac{SSE}{SSTO} = \frac{SSR}{SSTO} = Cor(X,Y)^2$ | $R^2$ | $R^2$ | $R^2$ |
| $s(b_1)$ | $\sqrt{\frac{MSE}{SS_X}}$ | $\frac{s(b_1)}{SD(Y)}$ | $s(b_1)SD(X)$ | $s(b_1)\frac{SD(X)}{SD(Y)}$ |
| $t$ | $\frac{b_1}{s(b_1)} = \frac{SS_{XY}}{\sqrt{MSE}\sqrt{SS_X}} = \pm\sqrt{(n-2)\frac{R^2}{1-R^2}}$ | $t$ | $t$ | $t$ |

Notes

$t^2$ and $R^2$

$t^2 = \frac{b_1^2}{s(b_1)^2} = \frac{\frac{SS_{XY}^2}{SS_X^2}}{\frac{MSE}{SS_X}} = \frac{SS_{XY}^2}{MSE \times SS_X} = (n-2)\frac{SS_Y}{SSE}R^2 = (n-2)\frac{R^2}{\frac{SSE}{SS_Y}} = (n-2)\frac{R^2}{1-R^2}$
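The identity can be confirmed numerically. A minimal sketch, assuming the built-in `trees` data (Girth regressed on Height, as in the R code example in this document).

```r
fit = lm(Girth ~ Height, data = trees)
n   = nrow(trees)

t_stat = summary(fit)$coefficients[2, 3]   # t value for the slope
r2     = summary(fit)$r.squared            # R^2 of the fit

# t^2 = (n - 2) * R^2 / (1 - R^2)
c(t_stat^2, (n - 2) * r2 / (1 - r2))
all.equal(t_stat^2, (n - 2) * r2 / (1 - r2))
```

With $t \approx 3.272$ and $R^2 \approx 0.2697$ for these data, both sides come out at about 10.71.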

When $n$ is large, $t(n-2) \to N(0,1)$, so $t^2 \to \chi^2_1$.
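One way to see this limit is to compare critical values. A minimal sketch, assuming a two-sided 5% test and a handful of arbitrary degrees of freedom: the squared $t$ critical value approaches the upper 5% point of $\chi^2_1$ as the degrees of freedom grow.

```r
# Degrees of freedom to compare (arbitrary illustrative values)
df = c(5, 30, 100, 1000)

# Squared two-sided 5% critical value of t(df)
t_crit_sq = qt(0.975, df)^2

# Upper 5% critical value of the chi-square distribution with 1 df
chi_crit = qchisq(0.95, df = 1)

# The squared t critical values decrease toward chi_crit
data.frame(df, t_crit_sq, chi_crit)

# In the limit (df = Inf, the standard normal) the two agree
all.equal(qt(0.975, Inf)^2, chi_crit)
```

The squared $t$ critical value is always larger than the $\chi^2_1$ value for finite degrees of freedom, which reflects the heavier tails of the $t$ distribution.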

terminology

  • SSTO: total sum of squares
  • SSE: error/residual sum of squares
  • SSR: regression sum of squares
  • MSE: mean squared error (error/residual mean square)
  • $b_1$: the point estimator of $\beta_1$ (the effect size)
  • $s(b_1)$: the standard error of $b_1$, i.e. the estimated standard deviation of $b_1$

R code example

lm_example = function(x, y){
  # scale() returns a one-column matrix; work with plain numeric vectors
  x = as.numeric(x); y = as.numeric(y)
  xmin = min(x); xmax = max(x)
  fit_lm = lm(y ~ x)
  coefs = summary(fit_lm)$coefficients  # columns: estimate, std. error, t value, p value
  b0 = unname(coef(fit_lm)[1])          # intercept
  b1 = unname(coef(fit_lm)[2])          # slope
  yhat = b0 + b1*x                      # fitted values
  yhat1 = yhat[1]                       # fitted value of the first observation
  SSTO = sum((y - mean(y))^2)           # total sum of squares
  SSE = sum(resid(fit_lm)^2)            # error sum of squares
  MSE = SSE/(length(y) - 2)             # mean squared error
  SSR = sum((yhat - mean(y))^2)         # regression sum of squares
  R = cor(x, y)
  Rsq = 1 - SSE/SSTO
  s_b1 = coefs[2, 2]                    # standard error of b1
  t_stat = coefs[2, 3]                  # t statistic for H0: beta_1 = 0
  p = coefs[2, 4]                       # p-value (not included in the returned table)
  #plot(x,y)
  #lines(c(xmin, xmax),c(b0+xmin*b1, b0+xmax*b1),col="red")
  data.frame(b1, yhat1, SSTO, SSE, MSE, SSR, R, Rsq, s_b1, t = t_stat)
}
x = trees$Height
y = trees$Girth
# one column per combination of raw/standardized x and y
data = cbind(t(lm_example(x,y)),t(lm_example(x,scale(y))),t(lm_example(scale(x),y)),t(lm_example(scale(x),scale(y))))
colnames(data) = c("x_y","x_scy","scx_y","scx_scy")
data

##               x_y       x_scy       scx_y    scx_scy
## b1      0.2557471  0.08149644   1.6295728  0.5192801
## yhat1  11.7139043 -0.48897864  11.7139043 -0.4889786
## SSTO  295.4374194 30.00000000 295.4374194 30.0000000
## SSE   215.7721895 21.91044621 215.7721895 21.9104462
## MSE     7.4404203  0.75553263   7.4404203  0.7555326
## SSR    79.6652299  8.08955379  79.6652299  8.0895538
## R       0.5192801  0.51928007   0.5192801  0.5192801
## Rsq     0.2696518  0.26965179   0.2696518  0.2696518
## s_b1    0.0781583  0.02490594   0.4980101  0.1586960
## t       3.2721686  3.27216859   3.2721686  3.2721686