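$$(p + q)^n = p^n + np^{n-1}q + \cdots + npq^{n-1} + q^n$$
The expansion has n + 1 terms, one for each possible value of r from n down to 0.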
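As a quick numerical check of the expansion, the minimal Python sketch below lists the n + 1 terms of $(p + q)^n$ and confirms that they sum to 1, since $p + q = 1$. The function name `binomial_terms` and the values p = 0.3, n = 5 are illustrative choices, not taken from the text.

```python
from math import comb


def binomial_terms(p, n):
    """Return the n + 1 terms of (p + q)**n, ordered from p**n down to q**n."""
    q = 1.0 - p
    # The term containing p**r * q**(n - r) has coefficient C(n, r); start at r = n.
    return [comb(n, r) * p**r * q ** (n - r) for r in range(n, -1, -1)]


terms = binomial_terms(0.3, 5)
print(len(terms))             # 6, i.e. n + 1 terms
print(round(sum(terms), 10))  # 1.0, because (p + q)**n = 1
```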
Some general properties of this distribution are used frequently and are worth knowing. It can be seen intuitively, and shown formally, that the mean m of the number r is given by np. It can also be shown that the variance of r is $\sigma^2 = npq$, and that the proportion r/n has a mean of p and a variance of pq/n.
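These properties can be confirmed numerically by summing over the whole distribution rather than using the closed forms. The sketch below assumes the standard binomial probability $\binom{n}{r} p^r q^{n-r}$ for exactly r individuals bearing the character; the helper names `binomial_pmf` and `mean_and_variance` are hypothetical, and n = 20, p = 0.7 are chosen to match the example discussed later.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r individuals with the character in a sample of n."""
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


def mean_and_variance(n, p):
    """Mean and variance of r, computed by summing over the whole distribution."""
    probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    var = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
    return mean, var


n, p = 20, 0.7
q = 1.0 - p
m, var = mean_and_variance(n, p)
print(f"mean of r:     {m:.4f}  vs np   = {n * p:.4f}")
print(f"variance of r: {var:.4f}  vs npq  = {n * p * q:.4f}")
# Dividing r by n scales the mean by 1/n and the variance by 1/n**2.
print(f"mean of r/n:   {m / n:.4f}  vs p    = {p:.4f}")
print(f"var of r/n:    {var / n**2:.4f}  vs pq/n = {p * q / n:.4f}")
```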
In practice, familiarity with the binomial distribution comes from performing numerical calculations of $P_r^n$. Suppose that the proportion p of individuals bearing a discrete character in a given population is known, and we wish to examine how the character is distributed across repeated samples of a given size drawn from that population. This is given directly by $P_r^n$ (Equation 5.6), which represents the long-run proportion of samples of n individuals in which exactly r individuals bear the character.
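The "long-run proportion" reading of $P_r^n$ can be illustrated by simulating many repeated samples and comparing observed frequencies with the formula. This sketch assumes that each individual in a sample independently bears the character with probability p (as when the population is very large or sampling is with replacement); the helper names, the seed, and the 50,000 repetitions are arbitrary illustrative choices.

```python
import random
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r individuals with the character in a sample of n."""
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


def simulate_counts(n, p, samples, seed=1):
    """Draw `samples` random samples of size n; count how often each r occurs."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(samples):
        # Each individual independently bears the character with probability p.
        r = sum(rng.random() < p for _ in range(n))
        counts[r] += 1
    return counts


n, p, samples = 20, 0.1, 50_000
counts = simulate_counts(n, p, samples)
for r in range(6):
    observed = counts[r] / samples
    print(f"r = {r}: simulated {observed:.3f}, P = {binomial_pmf(r, n, p):.3f}")
```

As the number of repeated samples grows, the simulated frequencies settle toward the computed probabilities.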
Table 5.7 shows values of $P_r^n$ computed for two values of p with n = 20, and the corresponding frequency distributions are shown in Figure 5.6. Readers less familiar with the binomial distribution may wish to verify some of the tabulated probabilities by computing $P_r^{20}$ for a few specific values of r.
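For readers who prefer to check the tabulated values by machine, the sketch below prints $P_r^{20}$ for every r at p = 0.1 and p = 0.7, again assuming the standard binomial probability formula. The names `binomial_pmf` and `binomial_table` are hypothetical, and the printed values may differ from Table 5.7 in the last digit depending on how the table was rounded.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r individuals with the character in a sample of n."""
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


def binomial_table(n, ps):
    """One row per r: (r, probability for each p in ps)."""
    return [(r, *(binomial_pmf(r, n, p) for p in ps)) for r in range(n + 1)]


ps = (0.1, 0.7)
print(f"{'r':>2}  " + "  ".join(f"{'p = ' + str(p):>8}" for p in ps))
for r, *probs in binomial_table(20, ps):
    print(f"{r:2d}  " + "  ".join(f"{pr:8.4f}" for pr in probs))
```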
The value of p has an important effect on the shape of the distribution. Consider p = 0.1. In repeated samples of 20, we would expect on average 10% of the individuals drawn, or 2 of the 20, to bear the character of interest, and 18 not to. However, the binomial distribution predicts that although samples containing the average number (r = 2) are the most probable, only about 29% of samples can be expected to contain exactly that number. About 12% of the samples are expected to contain no individuals bearing the character, and summing $P_r^{20}$ from r = 3 to r = 20 shows that about 32% of the samples will contain more than the average number.
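The three percentages quoted for p = 0.1 can be reproduced directly. This sketch assumes the standard binomial probability formula; the variable names are illustrative, and the printed values round to the 29%, 12%, and 32% given in the text.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r individuals with the character in a sample of n."""
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


n, p = 20, 0.1
p_at_mean = binomial_pmf(2, n, p)                              # exactly the average, r = 2
p_none = binomial_pmf(0, n, p)                                 # no individual has the character
p_above = sum(binomial_pmf(r, n, p) for r in range(3, n + 1))  # more than the average
most_likely = max(range(n + 1), key=lambda r: binomial_pmf(r, n, p))

print(f"most probable r = {most_likely}")  # 2
print(f"P(r = 2)  = {p_at_mean:.3f}")      # 0.285
print(f"P(r = 0)  = {p_none:.3f}")         # 0.122
print(f"P(r >= 3) = {p_above:.3f}")        # 0.323
```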
For p = 0.7 and n = 20 (Table 5.8 and Figure 5.6), the long-run average number of individuals bearing the character is m = np = 14, but, as noted above for p = 0.1, the expected variation about the mean is substantial. Here $\sigma^2 = npq = 20 \times 0.7 \times 0.3 = 4.2$. The standard deviation of r/n is $\sqrt{pq/n} = \sqrt{0.21/20} \approx 0.10$, so the sample proportion can be summarized as $p \pm \sqrt{pq/n} = 0.7 \pm 0.10$.
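The summary figures for p = 0.7 and n = 20 follow from the closed-form results m = np, $\sigma^2 = npq$, and $\sqrt{pq/n}$ stated earlier. The minimal sketch below simply evaluates them; the variable names are illustrative.

```python
from math import sqrt

n, p = 20, 0.7
q = 1.0 - p
mean_r = n * p             # m = np, long-run average number with the character
var_r = n * p * q          # sigma^2 = npq, variance of r
sd_prop = sqrt(p * q / n)  # standard deviation of the proportion r/n

print(f"m = {mean_r:.1f}")             # 14.0
print(f"variance of r = {var_r:.2f}")  # 4.20
print(f"r/n = {p} +/- {sd_prop:.2f}")  # 0.7 +/- 0.10
```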
It is evident that the application of statistical formulas to real situations involves some assumptions. In the cases given above, the most important assumption is that each sample is randomly drawn with respect to the character of interest from the population under investigation.