Because of this bearing on sampling and for other reasons, I became many years ago much interested in the question, and gave to its solution perhaps more labour than it was worth. In books on Medical Statistics the answer to the question is stated in a mathematical formula, called Poisson's formula, which, in a modified form, I shall give further on. But this did not satisfy me, because I wanted to learn what a reasonably safe limit of error actually meant, and this could be best learnt by experiment; so with the help of some friends I went in for a thorough course of penny-tossing.

If a penny is tossed twenty times, the average result would be ten heads and ten tails. To find the deviations from this, we tossed two hundred sets of twenty, i.e., four thousand tosses in all. Of the two hundred, thirty-three gave the exact average, viz., 10 heads; sixty-four gave an error of one, viz., 9 or 11 heads; forty-nine, an error of two; twenty-six, an error of three; twenty, an error of four; eight gave an error of five, and this limit was not exceeded. From these we may say that six is a reasonably safe limit of error. Ninety-seven cases, say one-half, gave an error not exceeding one; and the mean error is 1.8.
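The figures quoted for the two hundred sets of twenty can be checked with a short sketch. The tallies are those given in the text; the helper names `summarise` and `expected_sets` are illustrative only, and the comparison column assumes a perfectly fair penny (a binomial model), which is an assumption of this sketch and not something the experiment itself establishes.

```python
from math import comb

# Observed tallies from the text:
# error (distance from 10 heads) -> number of sets of twenty showing it.
observed = {0: 33, 1: 64, 2: 49, 3: 26, 4: 20, 5: 8}


def summarise(tallies):
    """Return (number of sets, mean error, sets with an error of at most one)."""
    total = sum(tallies.values())
    mean_error = sum(err * n for err, n in tallies.items()) / total
    within_one = sum(n for err, n in tallies.items() if err <= 1)
    return total, mean_error, within_one


def expected_sets(error, tosses=20, sets=200):
    """Expected number of sets with the given error, assuming a fair coin."""
    half = tosses // 2
    ways = comb(tosses, half + error)
    if error > 0:
        ways *= 2  # an excess of heads and an equal excess of tails both count
    return sets * ways / 2 ** tosses


total, mean_error, within_one = summarise(observed)
print("sets:", total, "mean error:", round(mean_error, 2), "within one:", within_one)

# Observed count beside the count a fair coin would give on average.
for err, n in observed.items():
    print(err, n, round(expected_sets(err), 1))
```

The first line of output reproduces the totals stated above (200 sets, mean error 1.8, 97 sets within one of the average); the loop sets each observed count beside what a fair coin would be expected to give.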

In other words, in twenty tosses you will hardly ever get more than 16 or fewer than 4 heads; you are as likely as not to get 9, 10, or 11 heads; and lastly, if on every twenty throws you lost a penny for each head or tail in excess of 10, your average loss would be 1.8 pence, or say roughly 2d. on the twenty throws.

It was necessary to compare these with another series containing a larger average, say that of 100 heads in 200 throws. I confess the labour of tossing pennies two hundred at a time was little to our taste. So from a bag of pennies borrowed from the bank, we weighed out samples containing two hundred, and for an evening we were busy counting heads and tails in these. The heads in sixty samples ranged from 80 to 114. One hundred heads occurred seven times. The extent and frequency of the errors are shown in the table.

Error.   No. of Times.   Error.   No. of Times.   Error.   No. of Times.
  1           8            6           3            11          1
  2           5            7           3            14          3
  3           6            8           3            15          1
  4           3            9           7            18          2
  5           6           10           1            20          1
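The tallies in the table, together with the seven samples that gave exactly 100 heads, can be summarised in a few lines. This is only a sketch for checking the arithmetic; the variable names are illustrative and nothing beyond the tabulated counts is assumed.

```python
# Tallies from the table: error (distance from 100 heads) -> number of samples.
# The entry for error 0 is the seven samples of exactly 100 heads mentioned in the text.
tallies = {
    0: 7, 1: 8, 2: 5, 3: 6, 4: 3, 5: 6,
    6: 3, 7: 3, 8: 3, 9: 7, 10: 1,
    11: 1, 14: 3, 15: 1, 18: 2, 20: 1,
}

samples = sum(tallies.values())
mean_error = sum(err * n for err, n in tallies.items()) / samples
within_four = sum(n for err, n in tallies.items() if err <= 4)
largest_error = max(tallies)

print("samples:", samples)
print("mean error:", round(mean_error, 1))
print("samples with error of 4 or less:", within_four)
print("largest error observed:", largest_error)
```

The counts add up to the sixty samples, and the largest observed error of 20 corresponds to the lowest count of 80 heads.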

We may call the limit of error 21. Twenty-nine results out of sixty, say one-half, had an error not exceeding 4; and the mean error is 5.6. In comparing these with the series 10 in 20 we must, working by rule, divide not by 10 but by 3.16, the square root of 10; for if we multiply an average by any number[126] the error is also multiplied but only by the square root of the number. The error varies as the square root of the number. Now

21/3.16 = 6.6 = limit of error for 10 in 20.
5.6/3.16 = 1.8 = mean error for 10 in 20.
4/3.16 = 1.3 = probable error for 10 in 20.
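The square-root rule used in these three divisions can be written as a small helper. This is a minimal sketch: the function name `scale_error` and its parameters are illustrative, and it uses the exact square root of the ratio of the two averages in place of the rounded 3.16.

```python
from math import sqrt


def scale_error(error, from_average, to_average):
    """Rescale an error from one average to another by the square-root rule."""
    return error * sqrt(to_average / from_average)


# Errors observed for an average of 100 heads in 200 throws.
errors_at_100 = {"limit": 21, "mean": 5.6, "probable": 4}

# Reduce them to an average of 10 heads in 20 throws.
for name, err in errors_at_100.items():
    print(name, round(scale_error(err, 100, 10), 2))
```

Because the ratio of averages is passed in directly, the same helper serves for scaling up as well as down.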

It will be seen that these calculated results agree fairly well with those actually obtained. The rule by which these calculations are made is important and will bear further illustration. To judge how far the number of heads in 3200 throws may depart from the average, we have to find the limit of error on a true average of 1600 in 3200. This being 16 times the average of 100 in 200, the corresponding errors must be multiplied by 4, the square root of 16. This gives

21×4 = 84 = limit of error.
5.6×4 = 22.4 = mean error.
4×4 = 16 = probable error.

The results I have actually obtained with these large numbers are hardly enough to base much on, but have a value by way of confirmation. Expecting 1600 heads, the actual numbers were 1560, 1596, 1643, 1557, 1591, 1605, 1615, 1545.
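The eight results can be set against the errors calculated for an average of 1600. The sketch below takes the eight counts from the text and the three errors for 100 in 200 given earlier; the variable names are illustrative, and eight results are of course too few to prove much either way.

```python
from math import sqrt

expected = 1600
counts = [1560, 1596, 1643, 1557, 1591, 1605, 1615, 1545]

# Distance of each result from the expected 1600 heads.
errors = [abs(c - expected) for c in counts]
mean_error = sum(errors) / len(errors)
largest_error = max(errors)

# Errors for 100 in 200, scaled up by the square root of 1600/100.
scale = sqrt(expected / 100)
predicted = {"limit": 21 * scale, "mean": 5.6 * scale, "probable": 4 * scale}

# How many of the eight results fall within the predicted probable error.
within_probable = sum(1 for e in errors if e <= predicted["probable"])

print("errors:", errors)
print("mean error:", mean_error, "predicted:", round(predicted["mean"], 1))
print("largest error:", largest_error, "predicted limit:", round(predicted["limit"]))
print("within probable error:", within_probable, "of", len(errors))
```

None of the eight results exceeds the calculated limit, and half of them fall within the calculated probable error, which is what the term implies.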