Monte Carlo helps us understand processes that we can describe but don’t yet have analytic solutions for. Here are two examples: the birthday problem and the tasting tea problem.

# Birthday Problem

If you are standing in a room with 25 other people, what is the probability that at least two people share the same birthday? This question has a mathematical solution, but if we don’t know it we can use Monte Carlo to help.

Select 25 people with random birthdays

group_birthdays <- sample(1:365, 15, replace = TRUE)

and then check whether two of them share a birthday.

shared_birthday <- length(group_birthdays[duplicated(group_birthdays)])

# Returns 1 if yes and 0 if no

Now place everything into a loop and evaluate 5000 times for the final Monte Carlo:

group_size <- 15
iterations <- 5000
shared_birthdays_counter <- 0

for(i in 1:iterations){

group_birthdays <- sample(1:365, 15, replace = TRUE)

shared_birthday <- length(group_birthdays[duplicated(group_birthdays)])

if(shared_birthday == 1){

shared_birthdays_counter <- shared_birthdays_counter + 1
}

}

The probability of a shared birthday among a group of 15 is…

shared_birthdays_counter / iterations
## [1] 0.2116

The probability of a shared birthday as we increase group size…

sizes <- 2:25
prob_store <- numeric(length(sizes))

for(j in 1:24){

group_size <- j
iterations <- 5000
shared_birthdays_counter <- 0

for(i in 1:iterations){

group_birthdays <- sample(1:365, group_size, replace = TRUE)

shared_birthday <- length(group_birthdays[duplicated(group_birthdays)])

if(shared_birthday == 1){

shared_birthdays_counter <- shared_birthdays_counter + 1
}

}

prob_store[j] <- shared_birthdays_counter / iterations

}

df <- data.frame(
'group_size' = c(2:25),
'probability' = c(prob_store)
)

library(ggplot2)

plot1 <- ggplot(df, aes(x = group_size, y = probability)) +
geom_bar(stat = 'identity', color = 'orange')

plot1

The equation to solve the birthday problem is

$$$n! / (n - k)!$$$

where $$n$$ is the number of possible birthdays and $$k$$ is group size. The beauty of Monte Carlo is that we didn’t need the above equation to learn about our shared birthday process.

# Tasting Tea

Imagine that I make one cup of tea with milk and then ask you the following: did I pour the tea or milk first? I repeat this for eight cups of tea. What is the probability that you guess correctly for 3 of the cups? For all 8 cups?

First, we generate truth. For each cup, ‘M’ means I poured milk first and ‘T’ means I poured tea first.

possible_pours <- c(rep('M', 4), rep('T', 4))
true_pours <- sample(possible_pours, size = 8)

# The true first pours

true_pours
## [1] "T" "M" "T" "M" "T" "T" "M" "M"

Then you make a guess for each cup.

guess <- c('M', 'T', 'T', 'M', 'T', 'T', 'M', 'M')

In this case, you guessed that I poured milk first for cup 1 and tea first for cup 2. How many of your guesses are correct?

correct <- sum(true_pours == guess)

correct
## [1] 6

Now we can put all of that into a Monte Carlo loop.

iterations <- 5000
correct_store <- numeric(iterations)

for(i in 1:iterations){

possible_pours <- c(rep('M', 4), rep('T', 4))
true_pours <- sample(possible_pours, size = 8)

guess <- c('M', 'T', 'T', 'M', 'T', 'T', 'M', 'M')

correct <- sum(true_pours == guess)

correct_store[i] <- correct

}

What is the probability of you guessing correctly for 2 cups…6?

prop.table(table(correct_store))
## correct_store
##      0      2      4      6      8
## 0.0154 0.2250 0.5194 0.2270 0.0132

Just like the birthday problem, there are equations that govern this “tea problem.” We don’t know what they are, but we can still learn about the process by using Monte Carlo approximation.

These examples can be found with greater discussion in Quantitative Social Science by Kosuke Imai.

Bo$$^2$$m =)