In this lab, we will make use of the lattice and tidyverse packages.
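Both packages have to be loaded before their functions are available. A minimal sketch of one way to do this is shown below. The check with requireNamespace() is an addition for illustration and is not part of the original lab; it only avoids an error when a package has not been installed yet.

```r
# Packages named in the lab text
pkgs = c("lattice", "tidyverse")

# requireNamespace() returns FALSE instead of stopping when a package is missing
installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs = pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Tell the user what to install rather than failing
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  library(lattice)
  library(tidyverse)
}
```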
Most of us have heard of the Law of Large Numbers (LLN). In this lab we will study the LLN through the specific example of coin tossing. After a large number of tosses, the number of heads should be about half the number of tosses. There are two possible interpretations of this statement:
Interpretation I: the difference between the number of heads in the first \(n\) tosses and \(n/2\) will approach 0 as \(n\) gets large.
Interpretation II: the proportion of heads in the first \(n\) tosses will approach 0.5 as \(n\) gets large.
It’s not hard to see that if Interpretation I is true, then so is Interpretation II, but not vice versa. (Why?)
Let’s simulate coin tossing in R to test the validity of the two interpretations. We can simulate tossing a fair coin once with two commands: the first defines a vector of possible outcomes, and the second draws one outcome from it.
The vector coin_outcomes can be thought of as a hat with two slips of paper in it: one slip says H and the other says T. The function sample draws one slip from the hat and tells us if it was a head or a tail.
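The two commands described here can be sketched as follows. The names coin_outcomes and sample come from the text; the exact arguments are an assumption based on that description.

```r
# The "hat": a character vector holding the two possible outcomes
coin_outcomes = c("H", "T")

# Draw one slip at random; by default each element is equally likely
sample(coin_outcomes, size = 1)
```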
Run the second command listed above several times. Just like when flipping a coin, sometimes you’ll get heads and sometimes you’ll get tails.
Now let’s toss the coin many times, say N = 10000, and record the result.
You can look at the outcome of the first 20 tosses by the command
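Tossing the coin N times and inspecting the first 20 outcomes might look like the sketch below. The names N and tosses are taken from the rest of the lab. The call to set.seed() is an assumption added only to make the run reproducible, and the seed value is arbitrary.

```r
coin_outcomes = c("H", "T")
N = 10000

# Optional: fix the random seed so the same sequence is produced each run
set.seed(220)

# replace = TRUE puts the slip back in the hat after each draw,
# so every toss is again a choice between H and T
tosses = sample(coin_outcomes, size = N, replace = TRUE)

# Outcomes of the first 20 tosses
tosses[1:20]
```

Without replace = TRUE, sample() would stop with an error, because a hat with two slips cannot supply 10000 draws.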
To compute the number of heads obtained up to the \(n\)th toss, for \(n = 1, 2, \ldots, N\), we can do the following:
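# numheads[n] will hold the number of heads in the first n tosses
numheads = vector("integer", length=N)
numheads[1] = ifelse(tosses[1]=="H", 1, 0)
for(n in 2:N){
  # add 1 to the running count if the nth toss is a head
  numheads[n] = numheads[n-1] + ifelse(tosses[n]=="H", 1, 0)
}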
In the above, numheads is a vector whose nth element numheads[n] gives the number of heads obtained up to and including the nth toss. The values of numheads are computed iteratively: numheads[n] = numheads[n-1] + 1 if tosses[n] is H, and numheads[n] = numheads[n-1] otherwise. The expression ifelse(tosses[n]=="H", 1, 0) takes the value 1 if tosses[n]=="H" is true, and 0 otherwise.
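The same running count can also be obtained without a loop. The sketch below compares the loop with a vectorized alternative based on cumsum(); the name numheads_fast is made up for this example, and the vectorized form is offered as an alternative rather than as the lab's own method.

```r
coin_outcomes = c("H", "T")
N = 10000
tosses = sample(coin_outcomes, size = N, replace = TRUE)

# Loop version, as in the lab
numheads = vector("integer", length = N)
numheads[1] = ifelse(tosses[1] == "H", 1, 0)
for (n in 2:N) {
  numheads[n] = numheads[n - 1] + ifelse(tosses[n] == "H", 1, 0)
}

# Vectorized version: tosses == "H" is TRUE/FALSE, which cumsum() treats as 1/0
numheads_fast = cumsum(tosses == "H")

# Both approaches should give the same running counts
all(numheads == numheads_fast)
```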
Now we can check the validity of Interpretation I by plotting the difference between the number of heads and half the number of tosses against the number of tosses. Does the difference seem to approach 0 as the number of tosses gets large?
num_tosses = 1:N
qplot(num_tosses, numheads - num_tosses/2,
      geom="line",
      xlab="Number of Tosses",
      ylab="Number of Heads Minus\n Half the Number of Tosses")
The argument geom="line" in qplot asks R to make a line plot.
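Alongside the plot, it can help to read off the difference at a few values of \(n\). The sketch below does this for a fresh sequence of tosses. The checkpoint values and the names checkpoints and diff_count are arbitrary choices for illustration, and cumsum() is used here as a shortcut for the running head count.

```r
coin_outcomes = c("H", "T")
N = 10000
tosses = sample(coin_outcomes, size = N, replace = TRUE)

# Running number of heads (TRUE counts as 1, FALSE as 0)
numheads = cumsum(tosses == "H")
num_tosses = 1:N

# Difference between the head count and half the number of tosses
diff_count = numheads - num_tosses / 2

# A few points along the sequence, chosen arbitrarily
checkpoints = c(10, 100, 1000, 10000)
data.frame(n = checkpoints, diff_count = diff_count[checkpoints])
```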
Next, let’s check Interpretation II by plotting the difference between the proportion of heads and 0.5 against the number of tosses. Does the difference seem to approach 0 as the number of tosses increases?
qplot(num_tosses, numheads/num_tosses - 0.5,
      geom="line",
      xlab="Number of Tosses",
      ylab="Proportion of Heads Minus 0.5",
      ylim=c(-0.1,0.1))
In the plot above, the difference in proportion for the first toss is very large (either 0.5 or \(-0.5\)) compared to the differences for later tosses, making it difficult to judge visually whether the difference approaches 0. The argument ylim=c(-0.1,0.1) restricts the y-range of the plot to between \(-0.1\) and 0.1 so that the plot is not dominated by the large differences in the first few tosses. You will get a warning message saying that a few data points were removed because they fall outside the range \(-0.1\) to 0.1.
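On recent installations, note that qplot() has been deprecated since ggplot2 3.4.0, so the same plot may need to be written with ggplot(). The sketch below is one possible equivalent, assuming ggplot2 is installed; the data frame name toss_df and the column name prop_diff are made up for this example. It uses coord_cartesian() to zoom the y-axis, which does not drop any points and so should not produce the warning about removed data.

```r
coin_outcomes = c("H", "T")
N = 10000
tosses = sample(coin_outcomes, size = N, replace = TRUE)

# toss_df is a hypothetical name; it has one row per toss
toss_df = data.frame(num_tosses = 1:N)
toss_df$prop_diff = cumsum(tosses == "H") / toss_df$num_tosses - 0.5

# Only draw the plot if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p = ggplot(toss_df, aes(x = num_tosses, y = prop_diff)) +
    geom_line() +
    coord_cartesian(ylim = c(-0.1, 0.1)) +  # zoom in without removing points
    labs(x = "Number of Tosses", y = "Proportion of Heads Minus 0.5")
  print(p)
}
```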
Exercise 1: Repeat the simulation above a couple of times. For every repetition, generate a new sequence of N = 10000 tosses, recompute numheads for the new sequence, and make the two plots for checking the validity of the two interpretations. What do you observe and conclude about the validity of the two interpretations?
In the simulation above, you should be able to see that:
Interpretation I of the Law of Large Numbers is WRONG, but Interpretation II is CORRECT.
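A short calculation suggests why the two interpretations behave so differently. As a sketch, assume the tosses are independent and fair, and write \(S_n\) for the number of heads in the first \(n\) tosses (a symbol introduced here for convenience). Each toss contributes variance \(1/4\), so:

```latex
\[
\operatorname{Var}(S_n) = \frac{n}{4}, \qquad
\operatorname{SD}\!\left(S_n - \frac{n}{2}\right) = \frac{\sqrt{n}}{2}, \qquad
\operatorname{SD}\!\left(\frac{S_n}{n} - \frac{1}{2}\right) = \frac{1}{2\sqrt{n}}.
\]
```

The typical size of the count difference grows like \(\sqrt{n}\), while the typical size of the proportion difference shrinks like \(1/\sqrt{n}\). For \(n = 10000\), these standard deviations are 50 and 0.005 respectively.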
What if the coin is unfair? What does the Law of Large Numbers say about tossing an unfair coin? We can simulate tossing an unfair coin that lands heads with probability only 0.2 as follows
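One way to sketch this is with the prob argument of sample(), which gives the probability of each element in the same order as the vector being sampled. The helper toss_unfair() is a hypothetical name introduced only for this example.

```r
coin_outcomes = c("H", "T")

# prob lists the chance of each element of coin_outcomes, in order:
# 0.2 for "H" and 0.8 for "T"
sample(coin_outcomes, size = 1, prob = c(0.2, 0.8))

# toss_unfair() is a hypothetical helper that tosses the same coin n times
toss_unfair = function(n) {
  sample(coin_outcomes, size = n, replace = TRUE, prob = c(0.2, 0.8))
}

# Ten tosses of the unfair coin; tails should usually dominate
toss_unfair(10)
```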
Exercise 2: Simulate tossing an unfair coin with only a 0.2 chance of landing heads 10000 times. Make a plot showing how the difference between the number of heads obtained and 20% of the number of tosses changes as the number of tosses goes up. Make another plot showing how the proportion of heads obtained changes as the number of tosses goes up. Comment on what you see.
This lab was expanded for STAT 220 by Yibi Huang from a lab released from OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel, which originally was based on a lab written by Mark Hansen of UCLA Statistics.
This lab can be shared or edited under a Creative Commons Attribution-ShareAlike 3.0 Unported licence.