In this lab, we will make use of the lattice and tidyverse packages.

library(lattice)
library(tidyverse)

Most of us have heard about the Law of Large Numbers (LLN). In this lab we will study the LLN through the specific example of coin tossing. A common summary is that after a large number of tosses of a fair coin, the number of heads should be about half the number of tosses. There are two possible interpretations of this statement:

Interpretation I: the difference between the number of heads in the first $$n$$ tosses and $$n/2$$ will approach 0 as $$n$$ gets large.

Interpretation II: the proportion of heads in the first $$n$$ tosses will approach 0.5 as $$n$$ gets large.

It’s not hard to see that if Interpretation I is true, so is Interpretation II, but not vice versa. (Why?)

Let’s simulate coin tossing using R to test the validity of the two interpretations. We can simulate tossing a fair coin once with the following commands:

coin_outcomes = c("H", "T")
sample(coin_outcomes, size = 1, replace=TRUE)

The vector coin_outcomes can be thought of as a hat with two slips of paper in it: one slip says H and the other says T. The function sample draws one slip from the hat and tells us if it was a head or a tail.

Run the second command listed above several times. Just like when flipping a coin, sometimes you’ll get heads and sometimes you’ll get tails.
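Because sample draws at random, every run can give a different result. The following is a minimal sketch of how to make a run repeatable with set.seed, in case you want to reproduce a result later. The seed value 1 and the names first_run and second_run are arbitrary choices for illustration, not part of the lab.

```r
coin_outcomes = c("H", "T")

# Fix the state of the random number generator so the draws can be reproduced.
set.seed(1)  # arbitrary seed, chosen only for illustration
first_run = sample(coin_outcomes, size = 5, replace = TRUE)

# Resetting the same seed reproduces exactly the same five tosses.
set.seed(1)
second_run = sample(coin_outcomes, size = 5, replace = TRUE)

identical(first_run, second_run)
```

Without the calls to set.seed, the two runs would usually differ, which is the behavior the lab asks you to observe.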

Now let’s toss the coin many times, say N = 10000, and record the result.

N = 10000
tosses = sample(coin_outcomes, size = N, replace=TRUE)

You can look at the outcome of the first 20 tosses by the command

tosses[1:20]
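To summarize the whole sequence rather than just its first 20 entries, one option is to tally the outcomes with table. This is a small sketch that regenerates the tosses so that it runs on its own; the name toss_counts is introduced here for illustration.

```r
coin_outcomes = c("H", "T")
N = 10000
tosses = sample(coin_outcomes, size = N, replace = TRUE)

# Count how many of the N tosses landed on each side.
toss_counts = table(tosses)
toss_counts
```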

To compute the number of heads obtained up to the nth toss, for $$n = 1, 2, ..., N$$, we can do the following

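numheads = vector("integer", length=N)
# The first entry is 1 if the first toss is a head, and 0 otherwise
numheads[1] = ifelse(tosses[1]=="H", 1, 0)
for(n in 2:N){
  # Add 1 to the running count whenever the nth toss is a head
  numheads[n] = numheads[n-1] + ifelse(tosses[n]=="H", 1, 0)
}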

In the above, numheads is a vector whose nth element numheads[n] gives the number of heads obtained up to and including the nth toss. The values of numheads are computed iteratively: numheads[n] = numheads[n-1] + 1 if tosses[n] is H, and numheads[n] = numheads[n-1] otherwise. The expression ifelse(tosses[n]=="H", 1, 0) takes the value 1 if tosses[n]=="H" is true, and 0 otherwise.
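The running count described above can also be computed without a loop. The sketch below compares the iterative version with a vectorized one based on cumsum; the name numheads_vec is introduced here only for the comparison.

```r
coin_outcomes = c("H", "T")
N = 10000
tosses = sample(coin_outcomes, size = N, replace = TRUE)

# Loop version, following the iterative rule described in the text.
numheads = vector("integer", length = N)
numheads[1] = ifelse(tosses[1] == "H", 1, 0)
for (n in 2:N) {
  numheads[n] = numheads[n-1] + ifelse(tosses[n] == "H", 1, 0)
}

# Vectorized version: tosses == "H" is a logical vector, and cumsum
# treats TRUE as 1 and FALSE as 0, giving the running count of heads.
numheads_vec = cumsum(tosses == "H")

# Both approaches should give the same running counts.
all(numheads == numheads_vec)
```

The loop makes the recursion explicit, while cumsum is shorter and typically faster for long sequences.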

Now we can check the validity of Interpretation I by plotting the difference between the number of heads and half the number of tosses against the number of tosses. Does the difference seem to approach 0 as the number of tosses gets large?

num_tosses = 1:N
qplot(num_tosses, numheads - num_tosses/2,
      geom="line",
      xlab="Number of Tosses",
      ylab="Number of Heads Minus\n  Half the Number of Tosses")

The argument geom="line" in qplot asks R to make a line plot.
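As far as I am aware, qplot has been marked as deprecated in newer releases of ggplot2, so you may see a deprecation warning when you call it. The following is a sketch of the same line plot written with ggplot and geom_line. The data frame toss_data and its column diff_count are names made up for this example, and the tosses are regenerated so that the block runs on its own.

```r
library(ggplot2)

coin_outcomes = c("H", "T")
N = 10000
tosses = sample(coin_outcomes, size = N, replace = TRUE)
numheads = cumsum(tosses == "H")  # running count of heads

# Collect the quantities to plot in a data frame, as ggplot() expects.
toss_data = data.frame(
  num_tosses = 1:N,
  diff_count = numheads - (1:N) / 2
)

# geom_line() plays the role of geom="line" in qplot.
p_count = ggplot(toss_data, aes(x = num_tosses, y = diff_count)) +
  geom_line() +
  labs(x = "Number of Tosses",
       y = "Number of Heads Minus\n  Half the Number of Tosses")
p_count
```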

Next, let’s check Interpretation II by plotting the difference between the proportion of heads and 0.5 against the number of tosses. Does the difference seem to approach 0 as the number of tosses increases?

qplot(num_tosses, numheads/num_tosses - 0.5,
geom="line",
xlab="Number of Tosses",
ylim=c(-0.1,0.1))

In the plot above, the difference in proportion for the first toss is very large (either 0.5 or $$-0.5$$) compared to the differences for later tosses, making it difficult to judge visually whether the difference approaches 0. I restrict the y-range of the plot to between $$-0.1$$ and 0.1 with the argument ylim=c(-0.1,0.1) so that the plot is not dominated by the large differences in the first few tosses. You will get a warning message saying that a few data points were removed because they fall outside the range from $$-0.1$$ to 0.1.
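If you would rather zoom in without discarding any points, one alternative is coord_cartesian, which limits the visible range of the axes instead of filtering the data. This is a sketch under the assumption that you are building the plot with ggplot rather than qplot. The names prop_data, diff_prop and p_prop, and the y-axis label, are choices made for this example.

```r
library(ggplot2)

coin_outcomes = c("H", "T")
N = 10000
tosses = sample(coin_outcomes, size = N, replace = TRUE)
num_tosses = 1:N
numheads = cumsum(tosses == "H")  # running count of heads

prop_data = data.frame(
  num_tosses = num_tosses,
  diff_prop = numheads / num_tosses - 0.5
)

# coord_cartesian() zooms the view to the given y-range without dropping
# the points that fall outside it.
p_prop = ggplot(prop_data, aes(x = num_tosses, y = diff_prop)) +
  geom_line() +
  coord_cartesian(ylim = c(-0.1, 0.1)) +
  labs(x = "Number of Tosses", y = "Proportion of Heads Minus 0.5")
p_prop
```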

Exercise 1: Repeat the simulation above a couple of times. For every repetition, generate a new sequence of N = 10000 tosses, recompute numheads for the new sequence, and make the two plots for checking the validity of the two interpretations. What do you observe, and what do you conclude about the validity of the two interpretations?

## Tossing an unfair coin

In the simulation above, you should be able to see that:

Interpretation I of the Law of Large Numbers is WRONG, but Interpretation II is CORRECT.

What if the coin is unfair? What does the Law of Large Numbers say about tossing an unfair coin? We can simulate tossing an unfair coin that lands heads with probability only 0.2 as follows:
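A short calculation is consistent with what the plots show. It is offered here as a supplementary sketch. Write $$H_n$$ for the number of heads in the first $$n$$ tosses of a fair coin (notation introduced here, not used elsewhere in the lab). The tosses are independent and each contributes variance $$1/4$$, so

$$
\mathrm{SD}\left(H_n - \frac{n}{2}\right) = \frac{\sqrt{n}}{2},
\qquad
\mathrm{SD}\left(\frac{H_n}{n} - \frac{1}{2}\right) = \frac{1}{2\sqrt{n}}.
$$

The typical size of the difference in counts grows like $$\sqrt{n}$$, while the typical size of the difference in proportions shrinks like $$1/\sqrt{n}$$. For $$n = 10000$$ the two standard deviations are 50 and 0.005, respectively.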

N = 10000
tosses = sample(coin_outcomes,
size=N,
prob = c(0.2, 0.8),
replace=TRUE)
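The prob argument is matched to coin_outcomes by position, so 0.2 applies to "H" and 0.8 applies to "T". A quick way to confirm that the probabilities are attached to the intended outcomes is to look at the overall proportion of heads. The name prop_heads is introduced for this sketch.

```r
coin_outcomes = c("H", "T")
N = 10000

# prob is matched to coin_outcomes by position:
# 0.2 goes with "H" and 0.8 goes with "T".
tosses = sample(coin_outcomes, size = N, prob = c(0.2, 0.8), replace = TRUE)

# Overall proportion of heads; this should land near 0.2.
prop_heads = mean(tosses == "H")
prop_heads
```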

Exercise 2: Simulate tossing an unfair coin that lands heads with probability only 0.2, 10000 times. Make a plot showing how the difference between the number of heads obtained and 20% of the number of tosses changes as the number of tosses goes up. Make another plot showing how the proportion of heads obtained changes as the number of tosses goes up. Comment on what you see.