Module 11

Probability theory

Introduction

  • Probability theory is a branch of applied mathematics.

  • Statistics is built on probability theory, and techniques in data analysis are built on statistics.

  • Therefore, the basics of probability theory are the building blocks of statistics and data analysis.

1. Probability

  • It is a number describing how likely an event is to happen: the higher the probability of an event, the more likely the event is to happen.

    • An event is defined with respect to a trial (or an experiment), which is a repeatable procedure that leads to possible outcomes that are well-defined.

    • E.g., Trial = flip a coin, outcome = the side that faces up when the coin settles, all possible outcomes: {HEAD, TAIL}.

  • The probability for an event A to happen is defined as the ratio between two numbers: the number of outcomes classified as A : the number of all possible outcomes (assuming that all outcomes are equally likely).

    • It can be written as a fraction (e.g., 1/2), as a percentage (e.g., 50%), or, most commonly, as a decimal (e.g., 0.5).

    • In this course, we treat probability as the same idea as proportion: “What is the probability for A to happen?” means the same thing as “Out of ALL possible outcomes, what proportion are classified as A?”.

  • A probability value is defined to be between 0 and 1 (inclusively).

    • p(A)=1 means that we are 100% certain that event A is going to happen.

    • p(A)=0 means that we are 100% certain that event A is NOT going to happen.

    • The probability describing a real-world, random event is between 0 and 1, but will never be equal to 0 or equal to 1.

  • For any event A, the probability of A NOT happening p(not A) can be computed by subtracting the probability of A happening p(A) from 1, i.e., p(not A)=1-p(A)

  • Two events A and B are independent events if and only if p(A and B) = p(A) × p(B).
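The complement rule and the independence condition above can be checked numerically. Below is a minimal sketch using two fair, independent coin flips; the choice of coins (and the event names A and B) is an illustrative assumption, not part of the notes.

```python
from fractions import Fraction

# A = first coin lands HEAD, B = second coin lands HEAD (illustrative events).
p_A = Fraction(1, 2)
p_B = Fraction(1, 2)

# Complement rule: p(not A) = 1 - p(A)
p_not_A = 1 - p_A
print(p_not_A)  # 1/2

# Enumerate all 4 equally likely outcomes to get p(A and B) directly.
outcomes = [(a, b) for a in ("H", "T") for b in ("H", "T")]
p_A_and_B = Fraction(sum(1 for a, b in outcomes if a == "H" and b == "H"),
                     len(outcomes))

# Independence: p(A and B) equals p(A) × p(B)
print(p_A_and_B == p_A * p_B)  # True
```

Using exact fractions (rather than floating-point decimals) avoids rounding issues when comparing probabilities for equality.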

2. Random variable

  • In traditional mathematics, variables can take different values, e.g., x = 3, gender = male, year = 3rd, but such values are either fixed or assigned.

  • In statistics, there is a special kind of variable called a random variable.

    • Definition: The value of a random variable depends on the outcome of some random event(s) in the world.

    • Example 1: Let C be a random variable whose value depends on the outcome of a coin-flipping experiment. If HEAD, then C=1; if TAIL, then C=0.

    • Example 2: Let X be a random variable whose value depends on the side of a fair die that faces up after a random rolling. In other words, X can take the following possible values: {1, 2, 3, 4, 5, 6}, each with an equal chance of being assigned to X.

    • Example 3: Let S be a random variable whose value depends on the sum of numbers that face up after randomly rolling two fair dice (e.g., in a Monopoly game).

  • Probability distribution function of a random variable:

    • To define a random variable, we need to state clearly every one of its possible values, and the probability associated to each possible value.

    • A full table of such "value-probability" mapping is called the probability distribution function of the random variable.

    • Very often, we use graphs to represent probability distributions. For example, for rolling a fair die, the probability distribution can be represented as a graph, as shown below.

  • Such random variables are called discrete random variables, i.e., the possible values are not continuous, and there is only a finite number of possible values.

    • The probability distribution function of a discrete random variable is called a probability mass function or p.m.f. or PMF.
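As a sketch of how a PMF can be tabulated in code, here is the full value-probability mapping for a single fair die and for S, the sum of two fair dice (Example 3 above):

```python
from fractions import Fraction
from collections import Counter

# PMF of a single fair die: each of the 6 faces has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# PMF of S = sum of two fair dice, built by enumerating all 36 equally
# likely (face1, face2) pairs and counting how often each sum occurs.
counts = Counter(f1 + f2 for f1 in range(1, 7) for f2 in range(1, 7))
sum_pmf = {s: Fraction(c, 36) for s, c in counts.items()}

print(sum_pmf[7])             # 1/6 -> 7 is the most likely sum
print(sum(sum_pmf.values()))  # 1   -> the probabilities must total 1
```

Note that a valid PMF must assign a probability to every possible value, and those probabilities must sum to exactly 1.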

  • A random variable is continuous if its possible values form a continuum (e.g., any real number within a range), so there are infinitely many possible values.

    • Example 1: X = The number of seconds it takes for a person to finish a 100m dash.

    • Example 2: H = The body height of a person when he/she is 20 years old.

    • Example 3: G = GPA of a student when he/she graduates

  • Because a continuous random variable has infinitely many possible values, we cannot list them all out. Therefore, we typically represent its probability distribution with the graph of a function, known as the probability density function (or p.d.f. or PDF).

  • In the figure below, the triangular graph represents the p.d.f. of X in Example 1 above (the no. of seconds that a person needs to finish a 100m dash).

  • Probabilities are represented as areas under the curve in a p.d.f. The red-shaded region in each of the example graphs below illustrates the probability for a person to finish a 100m dash within a specific range of numbers of seconds. E.g., in the bottom-right panel, the red-shaded area represents the probability of a person finishing a 100m dash within 15-18 seconds (i.e., P(15 ≤ X ≤ 18)).

[Figure: M11_100mdash.png — p.d.f. of the 100m-dash time, with red-shaded areas marking probabilities for different ranges]
  • IMPORTANT: Probability is defined as the area under the p.d.f., but NOT the value on the vertical axis. The vertical axis shows the probability DENSITY, and it has to be multiplied by a range of values of the random variable in order to obtain the desired probability!
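The "probability = area" idea can be made concrete with a numerical integration sketch. The notes do not specify the exact triangle in the figure, so the code below ASSUMES a symmetric triangular density on [10, 20] seconds peaking at 15 (these numbers are illustrative):

```python
# Assumed triangular p.d.f.: rises from 0 at x=10 to 0.2 at x=15,
# then falls back to 0 at x=20 (total area = 1).
def pdf(x):
    if 10 <= x <= 15:
        return (x - 10) / 25
    if 15 < x <= 20:
        return (20 - x) / 25
    return 0.0

def area(lo, hi, n=100_000):
    """Approximate the integral of pdf from lo to hi (midpoint rule)."""
    h = (hi - lo) / n
    return sum(pdf(lo + (i + 0.5) * h) for i in range(n)) * h

print(round(area(10, 20), 4))  # ~1.0  -> total area under a p.d.f. is 1
print(round(area(15, 18), 4))  # ~0.42 -> P(15 <= X <= 18) for this triangle
```

The key point: `pdf(15)` returns 0.2, a density, not a probability; only integrating the density over a range of values yields a probability.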

3. Expected value and variance of one random variable

  • Expected value

    • If you keep repeating the trial that generates a value for the random variable, the "average" of all these values will converge to the expected value of the random variable.

    • It is written as a function of the random variable, e.g., the expected value of the random variable X is written as E[X] (also written as E(X)).

    • Conceptually, it is the weighted sum of all possible values of the random variable, weighted by their respective associated probabilities.

    • Formally, it is the sum of products between the value and its associated probability across all possible values.

    • For example, let X be a random variable that represents your score in a die-rolling game, in which you score 0 points if the outcome is 1, 2, or 3; 9 points if the outcome is 4 or 5, and 18 points if the outcome is 6. Then, the probability mass function of X is as follows:

    •   Value x:    0           9           18
        P(X = x):   3/6 = 1/2   2/6 = 1/3   1/6

    • The expected value of X is given by E[X] = 0 × (1/2) + 9 × (1/3) + 18 × (1/6) = 0 + 3 + 3 = 6
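The computation above can be sketched in code directly from the PMF of the game:

```python
from fractions import Fraction

# PMF of the die-rolling game described above: value -> probability.
pmf = {0: Fraction(1, 2), 9: Fraction(1, 3), 18: Fraction(1, 6)}

# E[X] = sum of (value × probability) across all possible values.
expected = sum(value * p for value, p in pmf.items())
print(expected)  # 6
```

The same one-line sum works for any discrete random variable once its PMF is written down as a value-to-probability mapping.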

  • Variance

    • The variance of a random variable is defined as the expected value of the squared deviation from the mean (where the "mean" is the expected value).

    • It is written as VAR(X). Formally, VAR(X) = E[ (X - E[X])² ]

    • Variance captures how far away the possible values of a random variable deviate from its expected value. In other words, it measures the variability or dispersion of the possible values of the random variable.

  • IMPORTANT: Although the value of a random variable depends on the outcome of a trial, which is random and unpredictable, the expected value and the variance of a random variable are constants (i.e., once the random variable is defined, it has only one fixed expected value and one fixed variance)!
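Continuing the die-rolling game above, the variance can be computed directly from its definition VAR(X) = E[(X - E[X])²], reusing E[X] = 6:

```python
from fractions import Fraction

# Same PMF as in the die-rolling game above.
pmf = {0: Fraction(1, 2), 9: Fraction(1, 3), 18: Fraction(1, 6)}

mean = sum(v * p for v, p in pmf.items())                     # E[X] = 6
variance = sum((v - mean) ** 2 * p for v, p in pmf.items())   # E[(X - E[X])^2]
print(mean, variance)  # 6 45
```

Note that both `mean` and `variance` come out as fixed constants, even though the value of X itself is random, which is exactly the point of the IMPORTANT remark above.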

4. Expected value and variance of combined random variables

  • When we use random variables in the real world, we often have to consider combining them with one another and/or with some constants, and then look at the expected value and variance of the resulting random variable.

  • Let X and Y be any random variables, and a and b be any constants. Below are some useful rules for computing expected values and variances in such cases.

  • Rules for computing expected values of combined random variables

    1. E[X + Y] = E[X] + E[Y]
      (i.e., the expected value of the sum of two random variables is equal to the sum of their respective expected values)

    2. E[aX] = a E[X]
      (i.e., the expected value of the product between a constant and a random variable is equal to the product of the constant and the expected value of the random variable)

    3. E[a] = a
      (i.e., the expected value of a constant is equal to the constant itself)

    4. E[aX + bY] = aE[X] + bE[Y]

Exercise: Can you derive Rule 4 based on the Rules 1-3 above?
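Rules 1-4 can also be verified numerically (this does not replace the algebraic derivation asked for in the exercise). A sketch, using two fair dice as X and Y and arbitrary constants a = 2, b = 3 (both assumptions for illustration):

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of two fair dice.
outcomes = [(x, y) for x in range(1, 7) for y in range(1, 7)]
p = Fraction(1, 36)

def E(f):
    """Expected value of f(x, y) over all equally likely outcomes."""
    return sum(f(x, y) * p for x, y in outcomes)

a, b = 2, 3
lhs = E(lambda x, y: a * x + b * y)                   # E[aX + bY]
rhs = a * E(lambda x, y: x) + b * E(lambda x, y: y)   # aE[X] + bE[Y]
print(lhs == rhs, lhs)  # True 35/2
```

Since E[X] = E[Y] = 3.5 for a fair die, Rule 4 predicts 2(3.5) + 3(3.5) = 17.5, which matches the enumeration.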

  • Rules for computing variance of combined random variables

    1. VAR(a) = 0
      (i.e., the variance of a constant is equal to zero)

    2. VAR(aX) = a² VAR(X)
      (i.e., the variance of the product between a constant and a random variable is equal to the square of the constant multiplied by the variance of the random variable)

    3. In general, for any two random variables X and Y, VAR(aX + bY) = a² VAR(X) + b² VAR(Y) + 2ab COV(X, Y), where COV(X, Y) is the covariance between X and Y.

    4. If X and Y are independent, VAR(X + Y)= VAR(X) + VAR(Y) [note: COV(X, Y) = 0 if X and Y are independent]

Exercise: Can you derive Rule 4 using Rule 3 and the fact that X and Y are independent?
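As a numerical sanity check of Rule 4 (again, not a substitute for the derivation), the sketch below uses two independent fair coin flips coded as 1 = HEAD and 0 = TAIL, an illustrative choice; any pair of independent random variables would work:

```python
from fractions import Fraction

# Two independent fair coin flips, each coded 1 = HEAD, 0 = TAIL.
outcomes = [(x, y) for x in (0, 1) for y in (0, 1)]
p = Fraction(1, 4)

def E(f):
    """Expected value of f(x, y) over all equally likely outcomes."""
    return sum(f(x, y) * p for x, y in outcomes)

def VAR(f):
    """Variance of f(x, y): E[(f - E[f])^2]."""
    mu = E(f)
    return E(lambda x, y: (f(x, y) - mu) ** 2)

# Rule 4: for independent X and Y, VAR(X + Y) = VAR(X) + VAR(Y).
print(VAR(lambda x, y: x + y)
      == VAR(lambda x, y: x) + VAR(lambda x, y: y))  # True
```

Here each flip has variance 1/4, and the variance of the sum comes out to exactly 1/4 + 1/4 = 1/2.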

  • Exercise:
    Consider the dice roll for the game of Monopoly.
    Let X₁ be the random variable of the number that faces up after a fair die roll. Let X₂ be another random variable defined in the same way as X₁, but for another fair die.
    We can define a new random variable S, which is the sum of X₁ and X₂, i.e., S = X₁ + X₂, where X₁ and X₂ are independent.
    Because X₁ and X₂ are both about the rolling of a fair die, they have the same expected value and variance.
    Given that
    E[X₁] = E[X₂] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5, and
    VAR(X₁) = VAR(X₂) = (1-3.5)²(1/6) + (2-3.5)²(1/6) + (3-3.5)²(1/6) + (4-3.5)²(1/6) + (5-3.5)²(1/6) + (6-3.5)²(1/6) = 105/36, or around 2.9167,
    can you find the expected value and variance of S?

  • Real-life application in research:
    Many measurement instruments in psychology (e.g., depression scores, autistic traits, etc.) are multi-item scales, in which the respondent chooses an answer on a fixed Likert scale for every item.
    The final score measured by the scale is often defined as the sum of the score on each of the multiple, (supposedly) independent items.
    If we assume every item score to be a random variable, e.g., X₁ for item 1, X₂ for item 2, etc., we can understand the final scale score S = X₁ + X₂ + ... + Xₖ for a k-item scale as a random variable resulting from the combination of all the item-specific random variables.
    We can then obtain the expected value and variance of the scale score if we know the expected value and variance of each underlying item!
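This application can be sketched numerically. The scale below is a toy assumption (k = 10 items, each scored uniformly on a 1-5 Likert scale, items independent); real items would have their own PMFs, but the combination rules from Section 4 apply the same way:

```python
from fractions import Fraction

# Toy assumption: each item score X_i is uniform on {1, ..., 5}.
k = 10
values = range(1, 6)
p = Fraction(1, 5)

e_item = sum(v * p for v in values)                    # E[X_i] = 3
var_item = sum((v - e_item) ** 2 * p for v in values)  # VAR(X_i) = 2

# By the rules in Section 4 (with independent items), both the expected
# value and the variance of the sum are just k times the per-item values.
e_scale = k * e_item
var_scale = k * var_item
print(e_scale, var_scale)  # 30 20
```

So under these assumptions, the scale score S = X₁ + ... + X₁₀ has expected value 30 and variance 20, obtained without ever enumerating the 5¹⁰ possible response patterns.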