1 Probability Theory

1.1 Set Theory

Definition 1.1.1:

Sample space: The set, S, of all possible outcomes of a particular experiment. A sample space may be either:

  • countable (finite or countably infinite)
  • uncountable

Definition 1.1.2:

Event: any collection of possible outcomes of an experiment, that is, any subset of S (including S itself).

  • Union: The union of A and B, written \(A \cup B\), is the set of elements that belong to either A or B or both. \(A \cup B = \{ x: x\in A\text{ or }x \in B \}\)
  • Intersection: The intersection of A and B, written \(A \cap B\), is the set of elements that belong to both A and B. \(A\cap B=\{x: x \in A\text{ and }x \in B\}\)
  • Complementation: The complement of A, written \(A^c\), is the set of all elements that are not in A. \(A^c = \{x: x\notin A\}\)

Theorem 1.1.4:

For any three events, A, B and C, defined on a sample space S,

  • Commutativity: \(A \cup B = B \cup A\); \(A \cap B = B \cap A\)
  • Associativity: \(A \cup (B \cup C) = (A \cup B) \cup C\); \(A \cap (B \cap C) = (A \cap B) \cap C\)
  • Distributive Laws: \(A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\); \(A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\)
  • DeMorgan’s Laws: \((A \cup B)^c = A^c \cap B^c\); \((A \cap B)^c = A^c \cup B^c\)
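
These identities are easy to sanity-check on concrete finite sets. Below is a minimal sketch using Python's built-in set type; the sample space S and the events A, B, C are arbitrary examples chosen for illustration, with complements taken relative to S.

```python
# Check the distributive and DeMorgan's laws on small example sets.
S = set(range(10))                  # example sample space {0, 1, ..., 9}
A, B, C = {1, 2, 3}, {3, 4, 5}, {5, 6, 7}

def complement(X):
    """Complement relative to the sample space S."""
    return S - X

# Distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# DeMorgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
```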

Definition 1.1.5:

Two events A and B are disjoint (or mutually exclusive) if \(A \cap B = \emptyset\). The events \(A_1, A_2, \ldots\) are pairwise disjoint (or mutually exclusive) if \(A_i \cap A_j = \emptyset\text{ for all }i \neq j\).

One example of pairwise disjoint sets: \(A_i = [i, i+1),\ i = 0, 1, 2, \ldots\)

Definition 1.1.6:

If \(A_1, A_2, \ldots\) are pairwise disjoint and \(\cup_{i=1}^{\infty} A_i = S\), then the collection \(A_1, A_2, \ldots\) forms a partition of S. For instance, the sets \(A_i = [i, i+1)\) above form a partition of \(S = [0, \infty)\).
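
As a quick finite analogue, the sketch below (with arbitrary illustrative blocks) checks both defining conditions: pairwise disjointness, and that the union recovers S.

```python
from itertools import combinations

S = set(range(9))
blocks = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]   # candidate partition of S

# Pairwise disjoint: A_i ∩ A_j = ∅ for all i ≠ j
assert all(Ai.isdisjoint(Aj) for Ai, Aj in combinations(blocks, 2))

# The union of the blocks is all of S
assert set().union(*blocks) == S
```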

1.2 Basics of Probability Theory

1.2.1 Axiomatic Foundations

Definition 1.2.1:

A collection of subsets of S is called a sigma algebra (or Borel field), denoted by \(\mathcal{B}\), if it satisfies the following three properties:

  • \(\emptyset \in \mathcal{B}\) (the empty set is an element of \(\mathcal{B}\))
  • If \(A \in \mathcal{B}\), then \(A^c \in \mathcal{B}\) (\(\mathcal{B}\) is closed under complementation)
  • If \(A_1, A_2, \ldots \in \mathcal{B}\), then \(\cup^{\infty}_{i=1}A_i \in \mathcal{B}\) (\(\mathcal{B}\) is closed under countable unions)

Example 1.2.2 (Sigma algebra-I):

If S is finite or countable, then these technicalities really do not arise, for we define, for a given sample space S,

\[\mathcal{B} = \{\text{all subsets of } S\text{, including } S \text{ itself}\}\]

If S has n elements, there are \(2^n\) sets in \(\mathcal{B}\). For example, if S = {1, 2, 3}, then \(\mathcal{B} = \{\{1\}, \{2\}, \{3\}, \{1,2\}, \{2,3\}, \{1,3\}, \{1,2,3\}, \emptyset\}\).
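
For a finite S, this \(\mathcal{B}\) is just the power set, which can be enumerated directly; a minimal sketch using the standard library:

```python
from itertools import chain, combinations

def power_set(S):
    """All subsets of S, each represented as a tuple."""
    items = list(S)
    return list(chain.from_iterable(combinations(items, r)
                                    for r in range(len(items) + 1)))

B = power_set({1, 2, 3})
assert len(B) == 2 ** 3      # 8 subsets, matching the example above
```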

Definition 1.2.4 (Kolmogorov Axioms):

Given a sample space S and an associated sigma algebra \(\mathcal{B}\), a probability function is a function P with domain \(\mathcal{B}\) that satisfies

  • \(P(A) \geq 0\text{ for all } A \in \mathcal{B}\)
  • P(S) = 1
  • If \(A_1, A_2, ... \in \mathcal{B}\) are pairwise disjoint, then \(P(\cup^{\infty}_{i=1}A_i) = \sum_{i=1}^{\infty}P(A_i)\)
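
A sketch of these axioms on a finite sample space, modeling a fair six-sided die (the fairness assumption is illustrative) and using exact arithmetic via fractions:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
weight = {s: Fraction(1, 6) for s in S}   # equal weights: a fair die

def P(A):
    """P(A) as the sum of the outcome probabilities in A."""
    return sum(weight[s] for s in A)

assert all(P({s}) >= 0 for s in S)   # Axiom 1 (nonnegativity)
assert P(S) == 1                     # Axiom 2
A, B = {1, 2}, {5, 6}                # disjoint events
assert P(A | B) == P(A) + P(B)       # Axiom 3, in its finite form
```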

1.2.2 The Calculus of Probabilities

Theorem 1.2.8

If P is a probability function and A is any set in \(\mathcal{B}\), then

  • \(P(\emptyset)=0\)
  • \(P(A) \leq 1\)
  • \(P(A^c) = 1- P(A)\)

Theorem 1.2.9

If P is a probability function and A and B are any sets in \(\mathcal{B}\), then

  • \(P(B \cap A^c) = P(B) -P(A\cap B)\)
  • \(P(A \cup B) = P(A) + P(B) - P(A \cap B )\)
  • If \(A \subset B\), then P(A) \(\leq\) P(B)

Bonferroni’s Inequality: From the second formula, we can get

\[P(A \cap B) \geq P(A) + P(B) - 1\]

Theorem 1.2.11

If P is a probability function, then

  • \(P(A) = \sum_{i=1}^\infty P(A \cap C_i)\) for any partition \(C_1, C_2, \ldots\) of S
  • \(P(\cup_{i=1}^\infty A_i) \leq \sum_{i=1}^\infty P(A_i)\) for any sets \(A_1, A_2, \ldots\) (Boole’s Inequality)
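
Both Bonferroni’s and Boole’s inequalities can be verified numerically on a small equally likely sample space; the events below are arbitrary illustrations for one roll of a fair die.

```python
from fractions import Fraction

S = set(range(1, 7))                       # one roll of a fair die
P = lambda A: Fraction(len(A), len(S))     # equally likely outcomes

# Bonferroni: P(A ∩ B) >= P(A) + P(B) - 1
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}
assert P(A & B) >= P(A) + P(B) - 1

# Boole: P(union of the A_i) <= sum of the P(A_i)
events = [{1, 2}, {2, 3}, {5}]
assert P(set().union(*events)) <= sum(P(E) for E in events)
```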

1.2.3 Counting

Theorem 1.2.14

If a job consists of k separate tasks, the ith of which can be done in \(n_i\) ways, i = 1, …, k, then the entire job can be done in \(n_1 \times n_2 \times \cdots \times n_k\) ways.
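
A small numeric check of this multiplication rule, assuming a hypothetical job with three tasks that can be done in 2, 3, and 4 ways: enumerating the combined choices directly agrees with the product.

```python
import math
from itertools import product

ways = [2, 3, 4]                       # n_1, n_2, n_3 (hypothetical)
n_by_product = math.prod(ways)         # 2 * 3 * 4 = 24

# Enumerate every way of doing (task 1, task 2, task 3) and count.
n_by_enumeration = len(list(product(*(range(n) for n in ways))))
assert n_by_product == n_by_enumeration == 24
```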

Definition 1.2.16

For a positive integer n, n! (read n factorial) is the product of all of the positive integers less than or equal to n. That is,

\[n! = n \times (n-1) \times (n-2) \times ... \times 1\]

Furthermore, we define 0! = 1.

Definition 1.2.17

For nonnegative integers n and r, where n \(\geq\) r, we define the symbol \(\binom{n}{r}\), read n choose r, as

\[\binom{n}{r} = \frac{n!}{r!(n-r)!}\]
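
In Python, the factorial formula can be checked against the standard library's math.comb (example values below):

```python
import math

n, r = 10, 3   # example values

# Definition 1.2.17, computed directly (integer division is exact here)
by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

assert by_formula == math.comb(n, r) == 120
```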

1.2.4 Enumerating Outcomes

If \(S = \{s_1, ..., s_N\}\) is a finite sample space whose N outcomes are equally likely, then for any event A,

\[P(A) = \displaystyle\sum_{s_i \in A} P(\{s_i\}) = \displaystyle\sum_{s_i \in A}\frac{1}{N} = \frac{\text{# of elements in A}}{\text{# of elements in S}}\]
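
A classic instance of counting outcomes: the probability that two fair dice sum to 7 (the choice of event is illustrative).

```python
from fractions import Fraction
from itertools import product

# S = all 36 equally likely ordered outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

A = [s for s in S if sum(s) == 7]   # event: the numbers sum to 7

print(Fraction(len(A), len(S)))     # 6/36 = 1/6
```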

1.3 Conditional Probability and Independence

When new information leads us to update the sample space, we want to be able to update probability calculations accordingly, that is, to calculate conditional probabilities.

Definition 1.3.2

If A and B are events in S, and P(B) > 0, then the conditional probability of A given B, written P(A|B), is

\[P(A|B) = \frac{P(A\cap B)}{P(B)}\]
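
A sketch of this definition on the two-dice sample space (events chosen for illustration): conditioning on "the first die shows 3" changes the probability that the sum is 8.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))      # two fair dice
P = lambda E: Fraction(len(E), len(S))

A = {s for s in S if sum(s) == 8}             # sum equals 8
B = {s for s in S if s[0] == 3}               # first die shows 3

P_A_given_B = P(A & B) / P(B)                 # (1/36) / (6/36)
print(P_A_given_B)                            # 1/6, versus P(A) = 5/36
```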

Theorem 1.3.5 (Bayes’ Rule)

Let \(A_1, A_2, ...\) be a partition of the sample space, and let B be any set. Then, for each i = 1,2,…,

\[P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{\infty} P(B|A_j)P(A_j)}\]
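
A minimal numeric sketch of Bayes' Rule with hypothetical priors and likelihoods (the numbers below are made up purely for illustration):

```python
# Hypothetical three-set partition A_1, A_2, A_3 of the sample space.
prior      = [0.5, 0.3, 0.2]   # P(A_i), summing to 1
likelihood = [0.1, 0.4, 0.8]   # P(B | A_i)

# Denominator: P(B), by the law of total probability.
P_B = sum(p * l for p, l in zip(prior, likelihood))

posterior = [p * l / P_B for p, l in zip(prior, likelihood)]
print(posterior)               # P(A_i | B); these also sum to 1
assert abs(sum(posterior) - 1) < 1e-12
```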

Definition 1.3.7

Two events, A and B, are statistically independent if

\[P(A\cap B) = P(A) P(B)\]

Theorem 1.3.9

If A and B are independent events, then the following pairs are also independent:

  • A and \(B^c\)
  • \(A^c\) and B
  • \(A^c\) and \(B^c\)

Definition 1.3.12

A collection of events \(A_1, A_2, ..., A_n\) is mutually independent if, for any subcollection \(A_{i_1}, ..., A_{i_k}\), we have

\[P(\cap^k_{j=1}A_{i_j}) = \prod_{j=1}^{k}P(A_{i_j})\]
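
A sketch verifying mutual independence exhaustively for three fair coin tosses, where \(A_i\) is the event "toss i lands heads" (a standard example); every subcollection is checked against the product condition.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations, product
from operator import mul

S = list(product((0, 1), repeat=3))        # three fair coin tosses
P = lambda E: Fraction(len(E), len(S))

A = [{s for s in S if s[i] == 1} for i in range(3)]   # A_i: toss i is heads

# Product condition for every subcollection of size 2 or more.
for k in range(2, len(A) + 1):
    for sub in combinations(A, k):
        inter = set.intersection(*sub)
        assert P(inter) == reduce(mul, (P(E) for E in sub))
```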

1.4 Random Variables

Definition 1.4.1

A random variable is a function from a sample space S into the real numbers.

Example 1.4.2 (Random Variables)

Some examples of random variables:

  • Toss two dice: X = sum of the numbers
  • Toss a coin 25 times: X = number of heads in 25 tosses
  • Apply different amounts of fertilizer to corn plants: X = yield/acre
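
Concretely, a random variable is just a function on S; a sketch for the two-dice example:

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))   # sample space: two dice

def X(s):
    """Random variable: maps each outcome to the sum of the numbers."""
    return s[0] + s[1]

print(sorted({X(s) for s in S}))           # range of X: 2 through 12
```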

1.5 Distribution Functions

Definition 1.5.1

The cumulative distribution function or cdf of a random variable X, denoted by \(F_X(x)\), is defined by

\[F_X(x) = P_X(X \leq x)\] for all x.

Theorem 1.5.3

The function F(x) is a cdf if and only if the following three conditions hold:

  • \(\lim_{x \to -\infty}F(x) = 0\) and \(\lim_{x \to \infty}F(x) = 1\)
  • F(x) is a nondecreasing function of x
  • F(x) is right-continuous, that is, for every number \(x_0\), \(\lim_{x \downarrow x_0}F(x) = F(x_0)\)
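
These conditions can be spot-checked numerically for a candidate cdf; the sketch below uses the logistic function \(F(x) = 1/(1+e^{-x})\) as an example (right-continuity holds automatically here, since this F is continuous everywhere).

```python
import math

F = lambda x: 1 / (1 + math.exp(-x))       # logistic cdf

# Limit behavior, checked at large |x|
assert F(-50) < 1e-12 and F(50) > 1 - 1e-12

# Nondecreasing on a grid of points
xs = [x / 10 for x in range(-100, 101)]
assert all(F(a) <= F(b) for a, b in zip(xs, xs[1:]))
```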

Definition 1.5.7

A random variable X is continuous if \(F_X(x)\) is a continuous function of x. A random variable X is discrete if \(F_X(x)\) is a step function of x.

Definition 1.5.8

The random variables X and Y are identically distributed if, for every set \(A \in \mathcal{B}^1\), \(P(X \in A) = P(Y \in A)\), where \(\mathcal{B}^1\) is the sigma algebra of Borel sets on the real line.

Theorem 1.5.10

The following two statements are equivalent:

  • The random variables X and Y are identically distributed
  • \(F_X(x) = F_Y(x)\) for every x

1.6 Density and Mass Functions

Definition 1.6.1

The probability mass function (pmf) of a discrete random variable X is given by

\[f_X(x) = P(X=x)\] for all x

Definition 1.6.3

The probability density function, or pdf, \(f_X(x)\), of a continuous random variable X is the function that satisfies

\[F_X(x) = \int_{-\infty}^{x} f_X(t) dt\] for all x
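
A numeric sketch of this relationship for the exponential distribution with rate 1, whose pdf is \(f(x) = e^{-x}\) and cdf is \(F(x) = 1 - e^{-x}\) for \(x \geq 0\); a simple midpoint-rule quadrature stands in for the integral.

```python
import math

f = lambda t: math.exp(-t)        # Exponential(1) pdf on [0, infinity)
F = lambda x: 1 - math.exp(-x)    # its closed-form cdf

def integrate(g, a, b, n=100_000):
    """Midpoint-rule quadrature; accurate enough for this check."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

x = 2.0
assert abs(integrate(f, 0.0, x) - F(x)) < 1e-6
```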

Theorem 1.6.5

A function \(f_X(x)\) is a pdf (or pmf) of a random variable X if and only if

  • \(f_X(x) \geq 0\) for all x
  • \(\sum_x f_X(x) = 1\) (pmf) or \(\int_{-\infty}^{\infty}f_X(x)\, dx = 1\) (pdf)
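
As a check of the pmf case, the Binomial(n, p) mass function (with example parameters below) is nonnegative and sums to 1 over its support:

```python
import math

n, p = 10, 0.3   # example parameters
f = lambda x: math.comb(n, x) * p**x * (1 - p)**(n - x)

assert all(f(x) >= 0 for x in range(n + 1))              # nonnegative
assert abs(sum(f(x) for x in range(n + 1)) - 1) < 1e-12  # sums to 1
```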

1.7 Miscellanea