Statistics One Study Notes

Bimodal distribution: a non-normal distribution consisting of two modes; two scores (or two ranges of scores) that share the greatest frequency; characterized by two peaks in a histogram.

Classical test theory (also known as true score theory): A theory of measurement that assumes that, for each subject, an observed score (raw score) consists of a true score and error, which may be due to bias and/or chance.

Confound variable: an unknown, unmeasured, or extraneous variable that is correlated with both the dependent variable and the independent variable in experimental research, thus compromising causal arguments.

Correlation (general): refers to any of a broad class of statistical relationships involving dependence between two variables.

Correlation (Pearson-product moment, r): a measure of the relationship between two continuous variables.
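
As a rough illustration, r can be computed from the sum of cross products (SP) and the sums of squares (SS) defined elsewhere in this glossary, using r = SP / SQRT(SS_X * SS_Y). The function name `pearson_r` and the data are made up for this sketch and do not come from the course.

```python
import math

def pearson_r(x, y):
    """Pearson r from the sum of cross products and the sums of squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross products
    ss_x = sum((a - mx) ** 2 for a in x)                 # sum of squares for X
    ss_y = sum((b - my) ** 2 for b in y)                 # sum of squares for Y
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and quiz score for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = pearson_r(hours, scores)
print(round(r, 4))  # 0.7746
```

The function divides by zero if either variable is constant, because r is undefined in that case.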

Correlation (Phi coefficient): a measure of the relationship between two dichotomous (two-category) variables.

Correlation (Point-biserial): a measure of the relationship between one continuous variable and one dichotomous (two-category) variable.

Correlation (Spearman rank order): a measure of the relationship between two ranked (ordinal) variables.
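
One common way to obtain the Spearman coefficient is to convert each variable to ranks and then compute an ordinary Pearson correlation on the ranks. The sketch below assumes tied values share the average of their ranks; the helper names `average_ranks` and `spearman_rho` and the data are illustrative only.

```python
import math

def average_ranks(values):
    """Rank values from 1 to N; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y rises with x, but not linearly.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]
rho = spearman_rho(x, y)
print(rho)  # 1.0
```

Because only the ordering matters, the coefficient is 1.0 here even though the raw relationship is far from a straight line.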

Construct: an idealized object of investigation that is not directly observable.

Covariance: a measure quantifying the degree to which two variables vary together. Also described as an unstandardized correlation.
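
To see why covariance is called an unstandardized correlation, the sketch below computes the covariance as SP/N (the descriptive form; many texts divide by N - 1 for a sample estimate) and then divides it by the two standard deviations to recover r. Function names and data are assumptions made for illustration.

```python
import math

def covariance(x, y):
    """Descriptive covariance: sum of cross products divided by N."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def standard_deviation(x):
    """SD is the square root of the variance, i.e. of cov(X, X)."""
    return math.sqrt(covariance(x, x))

# Hypothetical paired observations.
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

cov_xy = covariance(x, y)
r = cov_xy / (standard_deviation(x) * standard_deviation(y))
print(cov_xy)       # 3.5
print(round(r, 4))  # 0.8367
```

Rescaling X (for example, multiplying every value by 10) changes the covariance but leaves r unchanged, which is the sense in which r is standardized.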

Dependent variable: a variable that represents an aspect of the world that the experimenter predicts will be affected by the independent variable.

Descriptive statistics: procedures used to summarize, organize, and simplify data.

Double blind experiment: an experiment in which neither the experimenter nor the subject knows whether the treatment is experimental or control.

Homoscedasticity: an assumption underlying linear correlation and regression analysis, according to which the variance of the residuals is constant across all values of the predictor variable (X); that is, the spread of the residuals does not depend on X.

Independent variable: a variable manipulated by the experimenter.

Inferential statistics: procedures that allow for generalizations about population parameters based on sample statistics.

Intercept: In a regression analysis, the predicted score on the outcome variable (Y) when all predictors (X) equal zero. Also known as the regression constant.

Interval variable: a type of variable that is used to not only categorize cases but also to distinguish cases as greater than or less than. Furthermore, the distance between adjacent values is the same across the entire scale.

Leptokurtic distribution: a non-normal distribution with a high kurtosis value; characterized by one high peak in a histogram.

Linear Regression: A statistical procedure used to estimate the relationship between an outcome variable (Y) and one or more predictor variables (X). Simple regression refers to analyses with just one predictor variable whereas multiple regression refers to analyses with more than one predictor variable.

Mean (M = ΣX/N): A measure of central tendency used to describe the center point of a distribution and/or sample (also known as the average).

Mean Squares (MS = SS/N): A measure of variability, more commonly referred to as variance.

Median: A measure of central tendency equivalent to the 50th percentile rank of a distribution/sample. Often preferred when distributions are skewed because it is more resistant to extreme scores than the mean.

Mode: A measure of central tendency; the most frequent score in a distribution/sample.
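
The three measures of central tendency (mean, median, mode) can be compared with Python's standard `statistics` module. The data are invented to include one extreme score, which illustrates why the median is often preferred for skewed distributions.

```python
import statistics

# Hypothetical scores with one extreme value (a positively skewed sample).
incomes = [30, 32, 35, 35, 40, 300]

mean = statistics.mean(incomes)      # sum of scores divided by N
median = statistics.median(incomes)  # 50th percentile
mode = statistics.mode(incomes)      # most frequent score

print(round(mean, 2))  # 78.67
print(median)          # 35.0
print(mode)            # 35
```

The single score of 300 pulls the mean well above every other score, while the median and mode stay at 35.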

Negatively skewed distribution: a non-normal distribution consisting of a few or many extreme scores on the negative end of a scale (typically the left side of an x-axis).

Nominal variable: the most basic variable type, used to assign cases to categories.

Normal distribution: also known as a Gaussian distribution, or bell curve, due to greater frequency around the mean and symmetry.

Null Hypothesis Significance Testing (NHST): A form of hypothesis testing that controls the probability of incorrectly deciding that a default position (null hypothesis) is incorrect based on how likely it would be for a set of observations (data) to occur if the null hypothesis were true.

Ordinary Least Squares (OLS): In linear regression analysis, a method for estimating unknown regression coefficients (or parameters), in which the sum of the squared residuals is minimized.
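
For simple regression (one predictor), the OLS estimates have a closed form: slope = SP / SS_X and intercept = M_Y - slope * M_X. The sketch below applies those formulas to made-up data and checks that nudging either estimate increases the sum of squared residuals; `fit_simple_ols` and `sum_squared_residuals` are illustrative names.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross products
    ss_x = sum((a - mx) ** 2 for a in x)                 # sum of squares for X
    slope = sp / ss_x
    intercept = my - slope * mx
    return intercept, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared prediction errors for a given line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_ols(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

print(round(intercept, 2), round(slope, 2))  # 2.2 0.6
best = sum_squared_residuals(x, y, intercept, slope)
print(best < sum_squared_residuals(x, y, intercept + 0.1, slope))  # True
print(best < sum_squared_residuals(x, y, intercept, slope + 0.1))  # True
```

With an intercept in the model, the OLS residuals sum to zero (up to floating-point rounding).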

Ordinal variable: a type of variable that is used to not only categorize cases but also to distinguish cases as greater than or less than.

Parameter: a numerical measure that describes a characteristic of a population.

Population: the entire collection of cases to which one attempts to generalize.

Percentile rank: the percentage of scores that fall at or below a given score in a distribution.
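
A direct translation of this definition into code, using the "at or below" convention stated above (other conventions exist, for example counting only scores strictly below). The function name and the data are illustrative.

```python
def percentile_rank(scores, score):
    """Percentage of scores that fall at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100.0 * at_or_below / len(scores)

# Hypothetical exam scores.
exam = [55, 60, 65, 70, 70, 80, 85, 90, 95, 100]
print(percentile_rank(exam, 70))   # 50.0
print(percentile_rank(exam, 100))  # 100.0
```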

Platykurtic distribution: see uniform distribution.

Positively skewed distribution: a non-normal distribution consisting of a few or many extreme scores on the positive end of a scale (typically the right side of an x-axis).

Quasi-independent variable: a variable that resembles an independent variable but is not manipulated by the experimenter.

Ratio variable: a type of variable that has all the qualities of an interval variable and also has a true zero point.

Reliability estimate: a statistic that estimates the consistency of a measurement (methods include test/retest, parallel tests, and inter-item).

Residual: In a regression analysis, it is the prediction error, or the difference between an individual’s score on the outcome variable (Y) and their score predicted by the regression model.

Sample: a subset of the population.

Slope: In a regression analysis, it is the predicted change in Y associated with a one-unit increase in X. Also known as the regression coefficient.

Standard deviation (SD = SQRT(MS)): The square root of variance; or an estimate of the average deviation in a sample.

Statistic: a numerical measure that describes a characteristic of a sample.

Sum of cross products (SP = Σ[(X – M_X)*(Y – M_Y)]): used to calculate the correlation and covariance between two variables, X and Y.

Sum of squares (SS = Σ(X – M)²): The sum of squared deviation scores.
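
The variability formulas in this glossary chain together: deviations from the mean are squared and summed (SS), divided by N (MS, the variance), and square-rooted (SD). A small worked sketch with made-up scores, using the SS/N form given in this glossary (sample estimates in many texts divide by N - 1 instead):

```python
import math

# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n                     # M = ΣX / N
ss = sum((x - mean) ** 2 for x in scores)  # SS: sum of squared deviations
ms = ss / n                                # MS = SS / N (variance)
sd = math.sqrt(ms)                         # SD = SQRT(MS)

print(mean, ss, ms, sd)  # 5.0 32.0 4.0 2.0
```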

Type I error: In NHST, incorrectly rejecting the null hypothesis when it should have been retained.

Type II error: In NHST, incorrectly retaining the null hypothesis when it should have been rejected.

Uniform distribution: A non-normal distribution in which frequency is nearly equivalent across all possible values. Also known as platykurtic (low kurtosis).

Validity (construct): the notion that observations or measurement tools actually represent or measure the construct being investigated.

Validity (content): the notion that the items or devices used to obtain a score on a measure are representative of the underlying construct.

Validity (convergent): the notion that scores on a measure should correlate with scores on other measures used to define the same, or similar, constructs.

Validity (divergent): the notion that scores on a measure should not correlate, or weakly correlate, with scores on measures used to define unrelated constructs.

Validity (nomological): the notion that scores on a measure are consistent with more general theories, including theories from other disciplines of science.

Variance (MS or SD²): A measure of variability, also known as Mean-Squares (MS) and equal to standard deviation (SD) squared.

Z-scale: A universal metric in statistics used to standardize different scales, such that for any metric, M=0 and SD=1.

Z-score (Z = (X-M) / SD): A score on a Z-scale.
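
A short sketch of standardization: each raw score is converted with Z = (X - M) / SD, after which the converted scores have a mean of 0 and a standard deviation of 1. The SD here uses SS/N, matching the formulas in this glossary; the function name and data are illustrative.

```python
import math

def z_scores(x):
    """Convert raw scores to Z-scores using Z = (X - M) / SD."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / sd for v in x]

# Hypothetical raw scores (M = 5, SD = 2).
raw = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(raw)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# The standardized scores have mean 0 and SD 1.
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(z_mean, z_sd)  # 0.0 1.0
```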

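Note:

This blog post was originally located at: https://sites.google.com/site/zjuwhwsblog/glossary/statisticoneoncoursera
The content comes from Princeton University's "Statistics One" course on Coursera (no longer available on Coursera; the videos can be found on YouTube).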