How to Calculate Conditional Probability in R?

Conditional Probability

Conditional probability is a measure of the likelihood of an event occurring, given that another event has already occurred. It allows us to update our beliefs about the probability of an event based on new information. Conditional probability is written as P(B | A), which is read as “the probability of event B given event A.” It can be calculated using the following formula:

P(B | A) = P(A and B) / P(A)

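where:

P(A and B) is the probability that both A and B occur.

P(A) is the probability that A occurs, which must be greater than 0.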

To better illustrate the concept, let’s consider an example:

Suppose we have a deck of 52 playing cards. We know that there are 13 hearts and 12 face cards in the deck. Let’s calculate the conditional probability of drawing a face card, given that the card is a heart.

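There are 13 hearts in the deck, so P(heart) = 13/52. Three of the 12 face cards are hearts (the jack, queen, and king of hearts), so P(face card and heart) = 3/52. Dividing the two gives P(face card | heart) = (3/52) / (13/52) = 3/13.

So, the probability of drawing a face card given that the card is a heart is 3/13 or approximately 0.2308.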
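The card calculation above can also be checked in R. The following is a minimal sketch, assuming a deck built with expand.grid(); the names deck, is_heart, and is_face are illustrative and not part of the original example.

```r
# Build a standard 52-card deck (suit and rank labels are illustrative).
suits <- c("hearts", "diamonds", "clubs", "spades")
ranks <- c("A", 2:10, "J", "Q", "K")
deck <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

# Define the two events as logical vectors over the deck.
is_heart <- deck$suit == "hearts"
is_face  <- deck$rank %in% c("J", "Q", "K")

# Relative frequencies: P(heart) and P(face card and heart).
p_heart          <- mean(is_heart)
p_face_and_heart <- mean(is_face & is_heart)

# P(face card | heart) = P(face card and heart) / P(heart)
p_face_given_heart <- p_face_and_heart / p_heart
p_face_given_heart
```

Because every card is equally likely, mean() over a logical vector gives the probability of the event, and the ratio should agree with the 3/13 worked out by hand.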

Calculate Conditional Probability in R

To calculate conditional probability in R, you can build a contingency table with table() and pass it to the prop.table() function. Let’s assume you have a data frame with two variables (or columns) named A and B, and you want to find the conditional probability P(B | A).

Here’s a step-by-step example:

# Sample data
data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)

# Create a contingency table
contingency_table <- table(data$A, data$B)

# Calculate the conditional probability table P(B | A)
conditional_probability_table <- prop.table(contingency_table, margin = 1)

# Print the conditional probability table
print(conditional_probability_table)

The conditional_probability_table variable will now contain the conditional probabilities P(B | A) for all combinations of A and B. The margin = 1 argument in the prop.table() function indicates that the probabilities should be calculated by dividing each cell by the row sums (i.e., the probabilities are conditioned on the first variable, A).
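To connect margin = 1 back to the formula P(B | A) = P(A and B) / P(A), the sketch below rebuilds the same table by hand from the joint and marginal probabilities. The names counts, joint, p_a, and manual are illustrative assumptions, not part of the original example.

```r
data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)
counts <- table(A = data$A, B = data$B)

# Joint probabilities P(A and B): each cell divided by the grand total.
joint <- prop.table(counts)

# Marginal probabilities P(A): row sums of the joint table.
p_a <- rowSums(joint)

# P(B | A) = P(A and B) / P(A), dividing each row by its marginal.
manual <- sweep(joint, 1, p_a, "/")

# Compare with prop.table(counts, margin = 1) and check the row sums.
all.equal(as.vector(manual), as.vector(prop.table(counts, margin = 1)))
rowSums(manual)

# margin = 2 conditions on the columns instead, giving P(A | B).
prop.table(counts, margin = 2)
```

Each row of a P(B | A) table sums to 1, which is a quick sanity check that the conditioning is on the intended variable.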

If you want to find a specific conditional probability, like P(B=b1 | A=a1), you can access the corresponding cell in the conditional probability table:

probability_b1_given_a1 <- conditional_probability_table["a1", "b1"]
print(probability_b1_given_a1)

Remember to replace the sample data with your own dataset and variable names.
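As a cross-check on the table lookup, P(B=b1 | A=a1) can also be computed directly from the raw rows by restricting the data to A = a1. This is a minimal sketch using the same sample data; the names b_when_a1 and p_b1_given_a1 are illustrative.

```r
data <- data.frame(
  A = c("a1", "a1", "a1", "a2", "a2", "a2"),
  B = c("b1", "b1", "b2", "b1", "b2", "b2")
)

# Keep only the values of B in rows where A is "a1".
b_when_a1 <- data$B[data$A == "a1"]

# The share of those rows with B equal to "b1" is P(B = b1 | A = a1).
p_b1_given_a1 <- mean(b_when_a1 == "b1")
p_b1_given_a1
```

Two of the three a1 rows have B = b1, so the result should match the corresponding cell of the conditional probability table.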

Example 2 – Cloudy Days

Let’s consider another example of calculating conditional probabilities using R. We’ll work with data related to the likelihood of rain given the presence of clouds.

First, let’s create a simple data frame with the information:

# Data frame with weather information
weather_data <- data.frame(
  Cloudy = c("Yes", "Yes", "No", "No"),
  Rain = c("Yes", "No", "Yes", "No"),
  Frequency = c(30, 20, 10, 40)
)

This table represents the frequency of different weather conditions in a particular region:

Cloudy  Rain  Frequency
Yes     Yes   30
Yes     No    20
No      Yes   10
No      No    40


Now, let’s calculate the conditional probability of rain given the presence of clouds (P(Rain | Cloudy)):

# Total frequency of cloudy days
total_cloudy <- sum(weather_data$Frequency[weather_data$Cloudy == "Yes"])

# Frequency of rainy days when it's cloudy
rainy_and_cloudy <- weather_data$Frequency[
  weather_data$Cloudy == "Yes" & weather_data$Rain == "Yes"
]

# Conditional probability of rain given clouds
P_rain_given_cloudy <- rainy_and_cloudy / total_cloudy
P_rain_given_cloudy

In this example, the total frequency of cloudy days is 50 (30 + 20), and the frequency of rainy days when it’s cloudy is 30. The conditional probability of rain given clouds is 30 / 50 = 0.6 or 60%.
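When the data already comes as frequencies, one alternative is to turn it into a contingency table with xtabs() and reuse prop.table() from the first example. The sketch below assumes the same weather data; weather_table and rain_given_cloudy are illustrative names.

```r
weather_data <- data.frame(
  Cloudy = c("Yes", "Yes", "No", "No"),
  Rain = c("Yes", "No", "Yes", "No"),
  Frequency = c(30, 20, 10, 40)
)

# xtabs() sums Frequency within each Cloudy/Rain combination.
weather_table <- xtabs(Frequency ~ Cloudy + Rain, data = weather_data)

# Row-wise proportions give P(Rain | Cloudy) for every combination.
rain_given_cloudy <- prop.table(weather_table, margin = 1)
rain_given_cloudy

# Individual cells: rows are Cloudy, columns are Rain.
rain_given_cloudy["Yes", "Yes"]  # P(Rain = Yes | Cloudy = Yes)
rain_given_cloudy["No", "Yes"]   # P(Rain = Yes | Cloudy = No)
```

This gives all four conditional probabilities at once, so the chance of rain on cloudy and non-cloudy days can be compared side by side.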

Example 3 – Student Information

Let’s consider one more example of conditional probabilities in R. This time, we’ll work with data on the likelihood of passing an exam given a student’s attendance in a course.

First, let’s create a simple data frame with the information:

# Data frame with student information
student_data <- data.frame(
  Attendance = c("High", "High", "Low", "Low"),
  Pass = c("Yes", "No", "Yes", "No"),
  Frequency = c(80, 20, 30, 70)
)

This table represents the frequency of different student outcomes in a particular course:

Attendance  Pass  Frequency
High        Yes   80
High        No    20
Low         Yes   30
Low         No    70



Now, let’s calculate the conditional probability of passing the exam given high attendance (P(Pass | High Attendance)):

# Total frequency of students with high attendance
total_high_attendance <- sum(student_data$Frequency[student_data$Attendance == "High"])

# Frequency of students who pass the exam with high attendance
pass_and_high_attendance <- student_data$Frequency[
  student_data$Attendance == "High" & student_data$Pass == "Yes"
]

# Conditional probability of passing the exam given high attendance
P_pass_given_high_attendance <- pass_and_high_attendance / total_high_attendance
P_pass_given_high_attendance


In this example, the total frequency of students with high attendance is 100 (80 + 20), and the frequency of students who pass the exam with high attendance is 80. The conditional probability of passing the exam given high attendance is 80 / 100 = 0.8 or 80%.
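Examples 2 and 3 follow the same pattern, so the calculation can be wrapped in a small function. The helper conditional_prob() below is hypothetical (it is not a base R function) and is only a sketch of one way to reuse the steps for frequency data.

```r
student_data <- data.frame(
  Attendance = c("High", "High", "Low", "Low"),
  Pass = c("Yes", "No", "Yes", "No"),
  Frequency = c(80, 20, 30, 70)
)

# Hypothetical helper: P(event | condition) from two logical vectors
# and a vector of frequencies (weights) of the same length.
conditional_prob <- function(event, condition, weights) {
  sum(weights[event & condition]) / sum(weights[condition])
}

# P(Pass | High Attendance)
p_pass_given_high <- with(
  student_data,
  conditional_prob(Pass == "Yes", Attendance == "High", Frequency)
)

# P(Pass | Low Attendance), for comparison
p_pass_given_low <- with(
  student_data,
  conditional_prob(Pass == "Yes", Attendance == "Low", Frequency)
)

c(high = p_pass_given_high, low = p_pass_given_low)
```

With this data the function should return 0.8 for high attendance, matching the result above, and 30 / 100 = 0.3 for low attendance.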