The Probable Error of a Mean

The Probable Error of a Mean (1908)
by William Sealy Gosset

Written under the pseudonym “Student,” and published in Biometrika 6(1). The origin of Student’s t-test.

Volume VI
March, 1908
No. 1

Biometrika.


The Probable Error of a Mean.

By Student.

Introduction.

Any experiment may be regarded as forming an individual of a “population” of experiments which might be performed under the same conditions. A series of experiments is a sample drawn from this population.

Now any series of experiments is only of value in so far as it enables us to form a judgment as to the statistical constants of the population to which the experiments belong. In a great number of cases the question finally turns on the value of a mean, either directly, or as the mean difference between the two quantities.

If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty:—(1) owing to the “error of random sampling” the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals. It is usual, however, to assume a normal distribution, because, in a very large number of cases, this gives an approximation so close that a small sample will give no real information as to the manner in which the population deviates from normality: since some law of distribution must be assumed it is better to work with a curve whose area and ordinates are tabled, and whose properties are well known. This assumption is accordingly made in the present paper, so that its conclusions are not strictly applicable to populations known not to be normally distributed; yet it appears probable that the deviation from normality must be very extreme to lead to serious error. We are concerned here solely with the first of these two sources of uncertainty.

The usual method of determining the probability that the mean of the population lies within a given distance of the mean of the sample, is to assume a normal distribution about the mean of the sample with a standard deviation equal to $s/\sqrt{n}$, where $s$ is the standard deviation of the sample, and to use the tables of the probability integral.

But, as we decrease the number of experiments, the value of the standard deviation found from the sample of experiments becomes itself subject to an increasing error, until judgments reached in this way may become altogether misleading.
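The difficulty can be illustrated by simulation. The following is a minimal sketch, not part of the paper: it draws small samples from a normal population and counts how often the population mean falls within $1.96\,s/\sqrt{n}$ of the sample mean, a limit which the tables of the probability integral would credit with a chance of about 95 per cent. The function name `normal_theory_coverage`, the multiplier 1.96, the seed and the number of trials are all assumptions made for illustration.

```python
import math
import random


def normal_theory_coverage(n, trials=5000, seed=1908):
    """Share of samples of size n for which the population mean (0 here)
    lies within 1.96 * s / sqrt(n) of the sample mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Standard deviation with divisor n, as used throughout the paper.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
        if abs(mean) <= 1.96 * s / math.sqrt(n):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (4, 10, 100):
        print(n, normal_theory_coverage(n))
```

For samples of 4 the observed share should fall well short of 95 per cent, while for samples of 100 it should come close to it, which is the behaviour the paragraph above describes.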

In routine work there are two ways of dealing with this difficulty: (1) an experiment may be repeated many times, until such a long series is obtained that the standard deviation is determined once and for all with sufficient accuracy. This value can then be used for subsequent shorter series of similar experiments. (2) Where experiments are done in duplicate in the natural course of the work, the mean square of the difference between corresponding pairs is equal to the standard deviation of the population multiplied by $\sqrt{2}$. We can thus combine together several series of experiments for the purpose of determining the standard deviation. Owing however to secular change, the value obtained is nearly always too low, successive experiments being positively correlated.

There are other experiments, however, which cannot easily be repeated very often; in such cases it is sometimes necessary to judge of the certainty of the results from a very small sample, which itself affords the only indication of the variability. Some chemical, many biological, and most agricultural and large scale experiments belong to this class, which has hitherto been almost outside the range of statistical enquiry.

Again, although it is well known that the method of using the normal curve is only trustworthy when the sample is “large,” no one has yet told us very clearly where the limit between “large” and “small” samples is to be drawn.

The aim of the present paper is to determine the point at which we may use the tables of the probability integral in judging of the significance of the mean of a series of experiments, and to furnish alternative tables for use when the number of experiments is too few.

The paper is divided into the following ten sections:

I. The equation is determined of the curve which represents the frequency distribution of standard deviations of samples drawn from a normal population.

II. There is shown to be no kind of correlation between the mean and the standard deviation of such a sample.

III. The equation is determined of the curve representing the frequency distribution of a quantity $z$, which is obtained by dividing the distance between the mean of a sample and the mean of the population by the standard deviation of the sample.

IV. The curve found in I. is discussed.

V. The curve found in III. is discussed.

VI. The two curves are compared with some actual distributions.

VII. Tables of the curves found in III. are given for samples of different size.

VIII and IX. The tables are explained and some instances are given of their use.

X. Conclusions.

Section I.

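Samples of $n$ individuals are drawn out of a population distributed normally, to find an equation which shall represent the frequency of the standard deviations of these samples.

If $s$ be the standard deviation found from a sample $x_1, x_2, \ldots, x_n$ (all these being measured from the mean of the population), then

$$s^2 = \frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2 = \frac{S(x_1^2)}{n} - \frac{S(x_1^2)}{n^2} - \frac{2S(x_1x_2)}{n^2}.$$

Summing for all samples and dividing by the number of samples we get the mean value of $s^2$ which we will write $\bar{s^2}$.

$$\bar{s^2} = \frac{n\mu_2}{n} - \frac{n\mu_2}{n^2} = \frac{\mu_2(n-1)}{n},$$

where $\mu_2$ is the second moment coefficient in the original normal distribution of $x$: since $x_1$, $x_2$, etc., are not correlated and the distribution is normal, products involving odd powers of $x_1$ vanish on summing, so that $\frac{2S(x_1x_2)}{n^2}$ is equal to $0$.

If $M_R'$ represent the $R$th moment coefficient of the distribution of $s^2$ about the end of the range where $s^2 = 0$,

$$M_1' = \mu_2\frac{n-1}{n}.$$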

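Again

$$\begin{aligned} s^4 &= \left\{\frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2\right\}^2 \\ &= \left(\frac{S(x_1^2)}{n}\right)^2 - \frac{2S(x_1^2)}{n}\left(\frac{S(x_1)}{n}\right)^2 + \left(\frac{S(x_1)}{n}\right)^4 \\ &= \frac{S(x_1^4)}{n^2} + \frac{2S(x_1^2x_2^2)}{n^2} - \frac{2S(x_1^4)}{n^3} - \frac{4S(x_1^2x_2^2)}{n^3} + \frac{S(x_1^4)}{n^4} + \frac{6S(x_1^2x_2^2)}{n^4} \end{aligned}$$

together with other terms involving odd powers of $x_1$, etc., which will vanish on summation.

Now $S(x_1^4)$ has $n$ terms but $S(x_1^2x_2^2)$ has $\frac{n(n-1)}{2}$, hence summing for all samples and dividing by the number of samples we get

$$\begin{aligned} M_2' &= \frac{\mu_4}{n} + \mu_2^2\frac{n-1}{n} - \frac{2\mu_4}{n^2} - 2\mu_2^2\frac{n-1}{n^2} + \frac{\mu_4}{n^3} + 3\mu_2^2\frac{n-1}{n^3} \\ &= \frac{\mu_4}{n^3}\{n^2 - 2n + 1\} + \frac{\mu_2^2}{n^3}(n-1)\{n^2 - 2n + 3\}. \end{aligned}$$

Now since the distribution of $x$ is normal, $\mu_4 = 3\mu_2^2$, hence

$$M_2' = \mu_2^2\frac{n-1}{n^3}\{3n - 3 + n^2 - 2n + 3\} = \mu_2^2\frac{(n-1)(n+1)}{n^2}.$$

In a similar tedious way I find:

$$M_3' = \mu_2^3\frac{(n-1)(n+1)(n+3)}{n^3},$$

and

$$M_4' = \mu_2^4\frac{(n-1)(n+1)(n+3)(n+5)}{n^4}.$$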

The law of formation of these moment coefficients appears to be a simple one, but I have not seen my way to a general proof.

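If now $M_R$ be the $R$th moment coefficient of $s^2$ about its mean, we have

$$M_2 = \mu_2^2\frac{n-1}{n^2}\{(n+1) - (n-1)\} = 2\mu_2^2\frac{n-1}{n^2},$$
$$M_3 = \mu_2^3\frac{n-1}{n^3}\{n^2 + 4n + 3 - 6n + 6 - n^2 + 2n - 1\} = 8\mu_2^3\frac{n-1}{n^3},$$
$$M_4 = \frac{\mu_2^4}{n^4}\{(n-1)(n+1)(n+3)(n+5) - 32(n-1)^2 - 12(n-1)^3 - (n-1)^4\} = 12\mu_2^4\frac{(n-1)(n+3)}{n^4}.$$

Hence

$$\beta_1 = \frac{M_3^2}{M_2^3} = \frac{8}{n-1}, \qquad \beta_2 = \frac{M_4}{M_2^2} = \frac{3(n+3)}{n-1},$$
$$\therefore\ 2\beta_2 - 3\beta_1 - 6 = \frac{1}{n-1}\{6(n+3) - 24 - 6(n-1)\} = 0.$$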
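The algebra leading to $2\beta_2 - 3\beta_1 - 6 = 0$ can be checked in exact arithmetic. The sketch below is an illustration only: it builds the first four moment coefficients of $s^2$ about zero from the product law found above, namely successive factors $(n-1)/n$, $(n+1)/n$, $(n+3)/n$, $(n+5)/n$ times $\mu_2$ (which the text states without general proof), converts them to moments about the mean, and forms $\beta_1$ and $\beta_2$. The function names are hypothetical and $\mu_2$ is taken as 1.

```python
from fractions import Fraction


def raw_moments_of_s2(n, mu2=Fraction(1)):
    """M'_1 .. M'_4 of s^2 about zero, built from the product law in the text."""
    moments = []
    product = Fraction(1)
    for r in range(1, 5):
        # Successive factors (n-1)/n, (n+1)/n, (n+3)/n, (n+5)/n, each times mu2.
        product *= Fraction(n + 2 * r - 3, n) * mu2
        moments.append(product)
    return moments


def betas_of_s2(n):
    """beta_1 and beta_2 of the distribution of s^2 for samples of n."""
    m1, m2, m3, m4 = raw_moments_of_s2(n)
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return c3 ** 2 / c2 ** 3, c4 / c2 ** 2


if __name__ == "__main__":
    for n in (4, 10):
        beta1, beta2 = betas_of_s2(n)
        print(n, beta1, beta2, 2 * beta2 - 3 * beta1 - 6)
```

Because `Fraction` arithmetic is exact, the Type III criterion comes out as exactly zero for every sample size tried, rather than zero to within rounding.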

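Consequently a curve of Professor Pearson’s type III. may be expected to fit the distribution of $s^2$.

The equation referred to an origin at the zero end of the curve will be

$$y = Cx^pe^{-\gamma x},$$

where

$$\gamma = \frac{2M_2}{M_3} = \frac{4\mu_2^2(n-1)n^3}{8n^2\mu_2^3(n-1)} = \frac{n}{2\mu_2},$$

and

$$p = \frac{4}{\beta_1} - 1 = \frac{n-1}{2} - 1 = \frac{n-3}{2}.$$

Consequently the equation becomes

$$y = Cx^{\frac{n-3}{2}}e^{-\frac{nx}{2\mu_2}},$$

which will give the distribution of $s^2$.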

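The area of this curve is $C\int_0^\infty x^{\frac{n-3}{2}}e^{-\frac{nx}{2\mu_2}}dx = I$ (say). The first moment coefficient about the end of the range will therefore be

$$\frac{C\int_0^\infty x^{\frac{n-1}{2}}e^{-\frac{nx}{2\mu_2}}dx}{I} = \frac{\left[C\,\frac{-2\mu_2}{n}\,x^{\frac{n-1}{2}}e^{-\frac{nx}{2\mu_2}}\right]_{x=0}^{x=\infty}}{I} + \frac{C\,\frac{n-1}{n}\mu_2\int_0^\infty x^{\frac{n-3}{2}}e^{-\frac{nx}{2\mu_2}}dx}{I}.$$

The first part vanishes at each limit and the second is equal to

$$\frac{\frac{n-1}{n}\mu_2I}{I} = \frac{n-1}{n}\mu_2,$$

and we see that the higher moment coefficients will be formed by multiplying successively by $\frac{n+1}{n}\mu_2$, $\frac{n+3}{n}\mu_2$, etc., just as appeared to be the law of formation of $M_2'$, $M_3'$, $M_4'$, etc.

Hence it is probable that the curve found represents the theoretical distribution of $s^2$; so that although we have no actual proof we shall assume it to do so in what follows.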

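The distribution of $s$ may be found from this, since the frequency of $s$ is equal to that of $s^2$ and all that we must do is to compress the base line suitably.

Now if

$y_1 = \phi(s^2)$ be the frequency curve of $s^2$

and

$y_2 = \psi(s)$ be the frequency curve of $s$,

then

$$y_1\,d(s^2) = y_2\,ds,$$

or

$$y_2\,ds = 2y_1s\,ds,$$
$$\therefore\ y_2 = 2sy_1.$$

Hence

$$y_2 = 2Cs(s^2)^{\frac{n-3}{2}}e^{-\frac{ns^2}{2\mu_2}}$$

is the distribution of $s$.

This reduces to

$$y_2 = 2Cs^{n-2}e^{-\frac{ns^2}{2\mu_2}}.$$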

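Hence $y = Ax^{n-2}e^{-\frac{nx^2}{2\sigma^2}}$ will give the frequency distribution of standard deviations of samples of $n$, taken out of a population distributed normally with standard deviation $\sigma$. The constant $A$ may be found by equating the area of the curve as follows:

Area $= A\int_0^\infty x^{n-2}e^{-\frac{nx^2}{2\sigma^2}}dx$. (Let $I_p$ represent $\int_0^\infty x^pe^{-\frac{nx^2}{2\sigma^2}}dx$.)

Then

$$\begin{aligned} I_p &= \frac{\sigma^2}{n}\int_0^\infty x^{p-1}\frac{d}{dx}\left(-e^{-\frac{nx^2}{2\sigma^2}}\right)dx \\ &= \frac{\sigma^2}{n}\left[-x^{p-1}e^{-\frac{nx^2}{2\sigma^2}}\right]_{x=0}^{x=\infty} + \frac{\sigma^2}{n}(p-1)\int_0^\infty x^{p-2}e^{-\frac{nx^2}{2\sigma^2}}dx \\ &= \frac{\sigma^2}{n}(p-1)I_{p-2}, \end{aligned}$$

since the first part vanishes at both limits.

By continuing this process we find

$$I_{n-2} = \left(\frac{\sigma^2}{n}\right)^{\frac{n-2}{2}}(n-3)(n-5)\cdots3\cdot1\,I_0$$

or

$$I_{n-2} = \left(\frac{\sigma^2}{n}\right)^{\frac{n-3}{2}}(n-3)(n-5)\cdots4\cdot2\,I_1$$

according as $n$ is even or odd.

But $I_0$ is

$$\int_0^\infty e^{-\frac{nx^2}{2\sigma^2}}dx = \sqrt{\frac{\pi}{2n}}\,\sigma,$$

and $I_1$ is

$$\int_0^\infty xe^{-\frac{nx^2}{2\sigma^2}}dx = \left[-\frac{\sigma^2}{n}e^{-\frac{nx^2}{2\sigma^2}}\right]_{x=0}^{x=\infty} = \frac{\sigma^2}{n}.$$

Hence if $n$ be even,

$$A = \frac{\text{Area}}{(n-3)(n-5)\cdots3\cdot1\,\sqrt{\frac{\pi}{2}}\left(\frac{\sigma^2}{n}\right)^{\frac{n-1}{2}}},$$

and if $n$ be odd

$$A = \frac{\text{Area}}{(n-3)(n-5)\cdots4\cdot2\left(\frac{\sigma^2}{n}\right)^{\frac{n-1}{2}}}.$$

Hence the equation may be written

$$y = \frac{N}{(n-3)(n-5)\cdots3\cdot1}\sqrt{\frac{2}{\pi}}\left(\frac{n}{\sigma^2}\right)^{\frac{n-1}{2}}x^{n-2}e^{-\frac{nx^2}{2\sigma^2}} \quad (n \text{ even})$$

or

$$y = \frac{N}{(n-3)(n-5)\cdots4\cdot2}\left(\frac{n}{\sigma^2}\right)^{\frac{n-1}{2}}x^{n-2}e^{-\frac{nx^2}{2\sigma^2}} \quad (n \text{ odd})$$

where $N$ as usual represents the total frequency.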
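The frequency curve of the standard deviation can be evaluated directly. The following sketch, offered only as an illustration, codes the curve with total frequency $N = 1$, taking the leading product as $(n-3)(n-5)\cdots3\cdot1$ with a factor $\sqrt{2/\pi}$ for even $n$ and $(n-3)(n-5)\cdots4\cdot2$ for odd $n$, and checks by a simple midpoint rule that its area is 1. The names `sd_density` and `integrate`, the upper limit of $8\sigma$ and the step count are assumptions of the sketch.

```python
import math


def sd_density(x, n, sigma=1.0):
    """Ordinate at x of the frequency curve of sample standard deviations
    (samples of n, population S.D. sigma, total frequency 1)."""
    # Product (n-3)(n-5)... ending at 1 when n is even and at 2 when n is odd.
    product = 1.0
    k = n - 3
    while k >= 1:
        product *= k
        k -= 2
    lead = math.sqrt(2.0 / math.pi) if n % 2 == 0 else 1.0
    scale = (n / sigma ** 2) ** ((n - 1) / 2)
    return lead / product * scale * x ** (n - 2) * math.exp(-n * x * x / (2 * sigma ** 2))


def integrate(f, a, b, steps=20000):
    """Midpoint rule; adequate here because the integrand is smooth."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, integrate(lambda x: sd_density(x, n), 0.0, 8.0))
    # Mean S.D. for samples of 4 from a population with sigma = 2.54.
    print(integrate(lambda x: x * sd_density(x, 4, 2.54), 0.0, 8 * 2.54))
```

The last line uses the population standard deviation of 2.54 inches quoted for the height data in Section VI, so its result can be compared with the calculated mean value of the standard deviations given there.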

Section II.

To show that there is no correlation between (a) the distance of the mean of a sample from the mean of the population and (b) the standard deviation of a sample with normal distribution.

(1) Clearly positive and negative positions of the mean of the sample are equally likely, and hence there cannot be correlation between the absolute value of the distance of the mean from the mean of the population and the standard deviation, but (2) there might be correlation between the square of the distance and the square of the standard deviation.

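Let

$u^2 = \left(\frac{S(x_1)}{n}\right)^2$ and $s^2 = \frac{S(x_1^2)}{n} - \left(\frac{S(x_1)}{n}\right)^2$.

Then if $m_1'$, $M_1'$ be the mean values of $u^2$ and $s^2$, we have by the preceding part $M_1' = \mu_2\frac{n-1}{n}$ and $m_1' = \frac{\mu_2}{n}$.

Now

$$\begin{aligned} u^2s^2 &= \frac{S(x_1^2)}{n}\left(\frac{S(x_1)}{n}\right)^2 - \left(\frac{S(x_1)}{n}\right)^4 \\ &= \frac{S(x_1^4)}{n^3} + \frac{2S(x_1^2x_2^2)}{n^3} - \frac{S(x_1^4)}{n^4} - \frac{6S(x_1^2x_2^2)}{n^4} \end{aligned}$$

together with other terms of odd order which will vanish on summation.

Summing for all values and dividing by the number of cases we get

$$R_{u^2s^2}\sigma_{u^2}\sigma_{s^2} + m_1'M_1' = \frac{\mu_4}{n^2} + \mu_2^2\frac{n-1}{n^2} - \frac{\mu_4}{n^3} - 3\mu_2^2\frac{n-1}{n^3},$$

where $R_{u^2s^2}$ is the correlation between $u^2$ and $s^2$.

$$R_{u^2s^2}\sigma_{u^2}\sigma_{s^2} + \mu_2^2\frac{n-1}{n^2} = \mu_2^2\frac{n-1}{n^3}\{3 + n - 3\} = \mu_2^2\frac{n-1}{n^2}.$$

Hence $R_{u^2s^2}\sigma_{u^2}\sigma_{s^2} = 0$ or there is no correlation between $u^2$ and $s^2$.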
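The absence of correlation can also be seen empirically. The sketch below is illustrative only: it draws repeated samples from a normal population with mean 0, records for each the square of the sample mean and the square of the sample standard deviation (divisor $n$), and computes the ordinary product-moment correlation between the two. The function names, seed and number of trials are assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def u2_s2_correlation(n, trials=20000, seed=2):
    """Correlation between squared mean and squared S.D. of normal samples of n."""
    rng = random.Random(seed)
    u2, s2 = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        u2.append(mean ** 2)  # squared distance of sample mean from population mean
        s2.append(sum(x * x for x in sample) / n - mean ** 2)
    return pearson_r(u2, s2)


if __name__ == "__main__":
    print(u2_s2_correlation(4))
```

With some thousands of samples the coefficient should differ from zero by no more than its own sampling error.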

Section III.

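To find the equation representing the frequency distribution of the means of samples of $n$ drawn from a normal population, the mean being expressed in terms of the standard deviation of the sample.

We have $y = \frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}$ as the equation representing the distribution of $s$, the standard deviation of a sample of $n$, when the samples are drawn from a normal population with standard deviation $\sigma$.

Now the means of these samples of $n$ are distributed according to the equation

$$y = \frac{\sqrt{n}\,N}{\sqrt{2\pi}\,\sigma}e^{-\frac{nx^2}{2\sigma^2}}$$ [1]

and we have shown that there is no correlation between $x$, the distance of the mean of the sample, and $s$, the standard deviation of the sample.

Now let us suppose $x$ measured in terms of $s$, i.e. let us find the distribution of $z = \frac{x}{s}$.

If we have $y_1 = \phi(x)$ and $y_2 = \psi(z)$ as the equations representing the frequency of $x$ and of $z$ respectively, then

$$y_1\,dx = y_2\,dz = y_2\frac{dx}{s},$$
$$\therefore\ y_2 = sy_1.$$

Hence

$$y = \frac{N\sqrt{n}\,s}{\sqrt{2\pi}\,\sigma}e^{-\frac{ns^2z^2}{2\sigma^2}}$$

is the equation representing the distribution of $z$ for samples of $n$ with standard deviation $s$.

Now the chance that $s$ lies between $s$ and $s + ds$ is:

$$\frac{\int_s^{s+ds}\frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}ds}{\int_0^\infty\frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}ds},$$

which represents the $N$ in the above equation.

Hence the distribution of $z$ due to values of $s$ which lie between $s$ and $s + ds$ is

$$y = \frac{\int_s^{s+ds}\frac{C}{\sigma^n}\sqrt{\frac{n}{2\pi}}\,s^{n-1}e^{-\frac{ns^2(1+z^2)}{2\sigma^2}}ds}{\int_0^\infty\frac{C}{\sigma^{n-1}}s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}ds},$$

and summing for all values of $s$ we have as an equation giving the distribution of $z$

$$y = \frac{1}{\sigma}\sqrt{\frac{n}{2\pi}}\,\frac{\int_0^\infty s^{n-1}e^{-\frac{ns^2(1+z^2)}{2\sigma^2}}ds}{\int_0^\infty s^{n-2}e^{-\frac{ns^2}{2\sigma^2}}ds}.$$

By what we have already proved this reduces to

$$y = \frac{1}{2}\cdot\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{5}{4}\cdot\frac{3}{2}\,(1+z^2)^{-\frac{n}{2}}, \text{ if } n \text{ be odd,}$$

and to

$$y = \frac{1}{\pi}\cdot\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{4}{3}\cdot\frac{2}{1}\,(1+z^2)^{-\frac{n}{2}}, \text{ if } n \text{ be even.}$$

Since this equation is independent of $\sigma$ it will give the distribution of the distance of the mean of a sample from the mean of the population expressed in terms of the standard deviation of the sample for any normal population.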

Section IV.

Some Properties of the Standard Deviation Frequency Curve.

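By a similar method to that adopted for finding the constant we may find the mean and moments: thus the mean is at $\frac{I_{n-1}}{I_{n-2}}$, which is equal to

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{2}{1}\sqrt{\frac{2}{\pi}}\,\frac{\sigma}{\sqrt{n}} \quad (\text{if } n \text{ be even}),$$

or

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\frac{3}{2}\sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}} \quad (\text{if } n \text{ be odd}).$$

The second moment about the end of the range is

$$\frac{I_n}{I_{n-2}} = \frac{(n-1)\sigma^2}{n}.$$

The third moment about the end of the range is equal to

$$\frac{I_{n+1}}{I_{n-2}} = \frac{I_{n+1}}{I_{n-1}}\cdot\frac{I_{n-1}}{I_{n-2}} = \sigma^2 \times \text{the mean}.$$

The fourth moment about the end of the range is equal to

$$\frac{I_{n+2}}{I_{n-2}} = \frac{(n-1)(n+1)}{n^2}\sigma^4.$$

If we write the distance of the mean from the end of the range $\frac{D\sigma}{\sqrt{n}}$ and the moments about the end of the range $\nu_1$, $\nu_2$, etc.

then

$$\nu_1 = \frac{D\sigma}{\sqrt{n}}, \quad \nu_2 = \frac{n-1}{n}\sigma^2, \quad \nu_3 = \frac{D\sigma^3}{\sqrt{n}}, \quad \nu_4 = \frac{n^2-1}{n^2}\sigma^4.$$

From this we get the moments about the mean

$$\mu_2 = \frac{\sigma^2}{n}(n - 1 - D^2),$$
$$\mu_3 = \frac{\sigma^3}{n\sqrt{n}}\{nD - 3(n-1)D + 2D^3\} = \frac{\sigma^3D}{n\sqrt{n}}\{2D^2 - 2n + 3\},$$
$$\mu_4 = \frac{\sigma^4}{n^2}\{n^2 - 1 - 4D^2n + 6(n-1)D^2 - 3D^4\} = \frac{\sigma^4}{n^2}\{n^2 - 1 - D^2(3D^2 - 2n + 6)\}.$$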

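It is of interest to find out what these become when $n$ is large.

In order to do this we must find out what is the value of $D$.

Now Wallis’s expression for $\pi$ derived from the infinite product value of $\sin x$ is

$$\frac{\pi}{2}(2n+1) = \frac{2^2\cdot4^2\cdot6^2\cdots(2n)^2}{1^2\cdot3^2\cdot5^2\cdots(2n-1)^2}.$$

If we assume a quantity $\theta$ which we may add to the $2n+1$ in order to make the expression approximate more rapidly to the truth, it is easy to show that $\theta = -\frac{1}{2} + \frac{1}{16n} -$ etc. and we get

$$\frac{\pi}{2}\left(2n + \frac{1}{2} + \frac{1}{16n}\right) = \frac{2^2\cdot4^2\cdot6^2\cdots(2n)^2}{1^2\cdot3^2\cdot5^2\cdots(2n-1)^2}.$$[2]

From this we find that whether $n$ be even or odd $D^2$ approximates to $n - \frac{3}{2} + \frac{1}{8n}$ when $n$ is large.

Substituting this value of $D$ we get

$$\mu_2 = \frac{\sigma^2}{2n}\left(1 - \frac{1}{4n}\right), \quad \mu_3 = \frac{\sigma^3\sqrt{1 - \frac{3}{2n} + \frac{1}{8n^2}}}{4n^2}, \quad \mu_4 = \frac{3\sigma^4}{4n^2}\left(1 + \frac{1}{2n} - \frac{1}{16n^2}\right).$$

Consequently the value of the standard deviation of a standard deviation which we have found $\left(\frac{\sigma}{\sqrt{2n}}\sqrt{1 - \frac{1}{4n}}\right)$ becomes the same as that found for the normal curve by Professor Pearson $\left(\frac{\sigma}{\sqrt{2n}}\right)$ when $n$ is large enough to neglect the $\frac{1}{4n}$ in comparison with $1$.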

Neglecting terms of lower order than $\frac{1}{n}$ we find

, .

Consequently as $n$ increases $\beta_2$ very soon approaches the value $3$ of the normal curve, but $\beta_1$ vanishes more slowly, so that the curve remains slightly skew.
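The large-sample value of $D$ can be compared with its exact value. In the sketch below, which is illustrative only, $D$ is computed from the product giving the mean of the standard deviation curve in units of $\sigma/\sqrt{n}$, that is $\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots$ ending in $\frac{2}{1}\sqrt{2/\pi}$ for even $n$ and in $\frac{3}{2}\sqrt{\pi/2}$ for odd $n$, and $D^2$ is set beside the approximation $n - \frac{3}{2} + \frac{1}{8n}$. The function names are hypothetical.

```python
import math


def D(n):
    """Mean of the S.D. curve in units of sigma / sqrt(n), from the exact product."""
    ratio = 1.0
    k = n - 2
    while k >= 2:
        ratio *= k / (k - 1)  # factors (n-2)/(n-3), (n-4)/(n-5), ...
        k -= 2
    tail = math.sqrt(2.0 / math.pi) if n % 2 == 0 else math.sqrt(math.pi / 2.0)
    return ratio * tail


def sd_of_sd(n, sigma=1.0):
    """Exact standard deviation of the S.D. curve: sqrt(mu_2)."""
    return sigma * math.sqrt((n - 1 - D(n) ** 2) / n)


if __name__ == "__main__":
    for n in (4, 10, 50):
        approx_d2 = n - 1.5 + 1.0 / (8 * n)
        approx_sd = math.sqrt((1 - 1 / (4 * n)) / (2 * n))
        print(n, D(n) ** 2, approx_d2, sd_of_sd(n), approx_sd)
```

Even for samples of 10 the two values of $D^2$ should agree to about two places of decimals, and the agreement improves rapidly as $n$ grows.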

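Diagram I. Frequency curve giving the distribution of Standard Deviations of samples of 10 taken from a normal population.

Equation $y = \frac{N}{7\cdot5\cdot3}\sqrt{\frac{2}{\pi}}\left(\frac{10}{\sigma^2}\right)^{\frac{9}{2}}x^8e^{-\frac{10x^2}{2\sigma^2}}$.

Diagram I shows the theoretical distribution of the S.D. found from samples of 10.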

Section V.

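Some properties of the curve $y = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\begin{cases}\frac{4}{3}\cdot\frac{2}{\pi} & \text{if } n \text{ be even}\\ \frac{5}{4}\cdot\frac{3}{2}\cdot\frac{1}{2} & \text{if } n \text{ be odd}\end{cases}\;(1+z^2)^{-\frac{n}{2}}$.

Writing $z = \tan\theta$ the equation becomes $y = \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\cos^n\theta$, which affords an easy way of drawing the curve. Also $dz = d\theta/\cos^2\theta$.

Hence to find the area of the curve between any limits we must find

$$\begin{aligned} &\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int\cos^{n-2}\theta\,d\theta \\ &= \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\left\{\frac{n-3}{n-2}\int\cos^{n-4}\theta\,d\theta + \left[\frac{\cos^{n-3}\theta\sin\theta}{n-2}\right]\right\} \\ &= \frac{n-4}{n-5}\cdots\text{etc.}\int\cos^{n-4}\theta\,d\theta + \frac{1}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\left[\cos^{n-3}\theta\sin\theta\right], \end{aligned}$$

and by continuing the process the integral may be evaluated.

For example, if we wish to find the area between $0$ and $\theta$ for $n = 8$ we have

$$\begin{aligned} \text{area} &= \frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{1}\cdot\frac{1}{\pi}\int_0^\theta\cos^6\theta\,d\theta \\ &= \frac{4}{3}\cdot\frac{2}{\pi}\int_0^\theta\cos^4\theta\,d\theta + \frac{1}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^5\theta\sin\theta \\ &= \frac{\theta}{\pi} + \frac{1}{\pi}\cos\theta\sin\theta + \frac{1}{3}\cdot\frac{2}{\pi}\cos^3\theta\sin\theta + \frac{1}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^5\theta\sin\theta, \end{aligned}$$

and it will be noticed that for $n = 10$ we shall merely have to add to this same expression the term $\frac{1}{7}\cdot\frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^7\theta\sin\theta$.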

The tables at the end of the paper give the area between $-\infty$ and $z$

(or $\theta = -\frac{\pi}{2}$ and $\theta = \tan^{-1}z$).

This is the same as $.5\,+$ the area between $\theta = 0$, and $\theta = \tan^{-1}z$, and as the whole area of the curve is equal to $1$, the tables give the probability that the mean of the sample does not differ by more than $z$ times the standard deviation of the sample from the mean of the population.
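The reduction of $\int\cos^{n-2}\theta\,d\theta$ lends itself to a short recursive routine. The sketch below is an illustration, not the method by which the tables were computed: it evaluates the integral of $\cos^m\theta$ from $-\frac{\pi}{2}$ to $\theta$ by the same step-down in the power, and divides by the integral over the whole range so that the total area is 1. The names `cos_power_integral` and `student_area` are assumptions.

```python
import math


def cos_power_integral(theta, m):
    """Integral of cos^m(t) dt from -pi/2 to theta, by the reduction formula."""
    if m == 0:
        return theta + math.pi / 2
    if m == 1:
        return math.sin(theta) + 1.0
    # The integrated part vanishes at the lower limit -pi/2 when m >= 2.
    boundary = math.cos(theta) ** (m - 1) * math.sin(theta) / m
    return boundary + (m - 1) / m * cos_power_integral(theta, m - 2)


def student_area(z, n):
    """Area of the z curve between -infinity and z for samples of n (n >= 2)."""
    theta = math.atan(z)
    whole = cos_power_integral(math.pi / 2, n - 2)
    return cos_power_integral(theta, n - 2) / whole


if __name__ == "__main__":
    for z, n in ((1.0, 6), (0.5, 10), (1.0, 5)):
        print(z, n, round(student_area(z, n), 5))
```

Values obtained in this way can be set beside the entries of Section VII, for instance the entry .9622 for $z = 1.0$ with samples of six.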

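The whole area of the curve is equal to

$$\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\cos^{n-2}\theta\,d\theta,$$

and since all the parts between the limits vanish at both limits this reduces to $1$.

Similarly the second moment coefficient is equal to

$$\begin{aligned} &\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\cos^{n-2}\theta\tan^2\theta\,d\theta \\ &= \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\left(\cos^{n-4}\theta - \cos^{n-2}\theta\right)d\theta \\ &= \frac{n-2}{n-3} - 1 = \frac{1}{n-3}. \end{aligned}$$

Hence the standard deviation of the curve is $\frac{1}{\sqrt{n-3}}$. The fourth moment coefficient is equal to

$$\begin{aligned} &\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\cos^{n-2}\theta\tan^4\theta\,d\theta \\ &= \frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\text{etc.}\times\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\left(\cos^{n-6}\theta - 2\cos^{n-4}\theta + \cos^{n-2}\theta\right)d\theta \\ &= \frac{n-2}{n-3}\cdot\frac{n-4}{n-5} - \frac{2(n-2)}{n-3} + 1 = \frac{3}{(n-3)(n-5)}. \end{aligned}$$

The odd moments are of course zero as the curve is symmetrical, so

$$\beta_1 = 0, \qquad \beta_2 = \frac{3(n-3)}{n-5} = 3 + \frac{6}{n-5}.$$

Hence as $n$ increases the curve approaches the normal curve whose standard deviation is $\frac{1}{\sqrt{n-3}}$.

$\beta_2$ however is always greater than $3$, indicating that large deviations are more common than in the normal curve.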

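Diagram II. Solid curve $y = \frac{N}{s}\times\frac{8}{7}\cdot\frac{6}{5}\cdot\frac{4}{3}\cdot\frac{2}{\pi}\cos^{10}\theta$, $x/s = \tan\theta$.

Broken line curve $y = \frac{\sqrt{7}\,N}{\sqrt{2\pi}\,s}e^{-\frac{7x^2}{2s^2}}$, the normal curve with the same S.D.

Distance of mean from mean of population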

I have tabled the area for the normal curve with standard deviation $\frac{1}{\sqrt{7}}$ so as to compare with my curve for $n = 10$[3]. It will be seen that odds laid according to either table would not seriously differ till we reach $z = .8$, where the odds are about 50 to 1 that the mean is within that limit: beyond that the normal curve gives a false feeling of security, for example, according to the normal curve it is 99,986 to 14 (say 7000 to 1) that the mean of the population lies between $-\infty$ and $+1.3s$ whereas the real odds are only 99,819 to 181 (about 550 to 1).

Now 50 to 1 corresponds to three times the probable error in the normal curve and for most purposes would be considered significant; for this reason I have only tabled my curves for values of $n$ not greater than $10$, but have given the $n = 9$ and $n = 10$ tables to one further place of decimals. They can be used as foundations for finding values for larger samples[4].

The table for $n = 2$ can be readily constructed by looking out $\theta = \tan^{-1}z$ in Chambers’ Tables and then $.5 + \theta/\pi$ gives the corresponding value.

Similarly $\frac{1}{2}\sin\theta + .5$ gives the values when $n = 3$.

There are two points of interest in the $n = 2$ curve. Here $s$ is equal to half the distance between the two observations, and $\tan^{-1}\frac{s}{s} = \frac{\pi}{4}$, so that between $+s$ and $-s$ lies $2\times\frac{\pi}{4}\times\frac{1}{\pi}$ or half the probability, i.e. if two observations have been made and we have no other information, it is an even chance that the mean of the (normal) population will lie between them. On the other hand the second moment coefficient is

$$\frac{1}{\pi}\int_{-\frac{\pi}{2}}^{+\frac{\pi}{2}}\tan^2\theta\,d\theta = \frac{1}{\pi}\Big[\tan\theta - \theta\Big]_{-\frac{\pi}{2}}^{+\frac{\pi}{2}} = \infty,$$

or the standard deviation is infinite while the probable error is finite.
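The even chance for two observations is easy to confirm. In the sketch below, given purely as an illustration, `area_n2` is the area of the curve for samples of two, $.5 + \tan^{-1}z/\pi$, and `chance_mean_between_two` draws pairs of observations from a normal population and counts how often the population mean lies between them. The population mean of 10, standard deviation of 3, seed and number of trials are arbitrary assumptions.

```python
import math
import random


def area_n2(z):
    """Area of the z curve from -infinity to z for samples of two."""
    return 0.5 + math.atan(z) / math.pi


def chance_mean_between_two(trials=20000, mu=10.0, sigma=3.0, seed=3):
    """Share of pairs of observations that enclose the population mean."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        a = rng.gauss(mu, sigma)
        b = rng.gauss(mu, sigma)
        if min(a, b) <= mu <= max(a, b):
            inside += 1
    return inside / trials


if __name__ == "__main__":
    print(area_n2(1.0) - area_n2(-1.0))  # probability between -s and +s
    print(chance_mean_between_two())
```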

Section VI. Practical Test of the foregoing Equations.

Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W. R. Macdonell (Biometrika, Vol. I. p. 219). The measurements were written out on 3000 pieces of cardboard, which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book which thus contains the measurements of 3000 criminals in a random order. Finally each consecutive set of 4 was taken as a sample (750 in all) and the mean, standard deviation, and correlation[5] of each sample determined. The difference between the mean of each sample and the mean of the population was then divided by the standard deviation of the sample, giving us the $z$ of Section III.

This provides us with two sets of 750 standard deviations and two sets of 750 $z$’s on which to test the theoretical results arrived at. The height and left middle finger correlation table was chosen because the distribution of both was approximately normal and the correlation was fairly high. Both frequency curves, however, deviate slightly from normality, the constants being for height , , and for left middle finger lengths , , and in consequence there is a tendency for a certain number of larger standard deviations to occur than if the distributions were normal. This, however, appears to make very little difference to the distribution of $z$.

Another thing which interferes with the comparison is the comparatively large groups in which the observations occur. The heights are arranged in 1 inch groups, the standard deviation being only 2.54 inches: while the finger lengths were originally grouped in millimetres, but unfortunately I did not at the time see the importance of having a smaller unit, and condensed them into two millimetre groups, in terms of which the standard deviation is 2.74.

Several curious results follow from taking samples of 4 from material disposed in such wide groups. The following points may be noticed:

(1) The means only occur as multiples of .25.

(2) The standard deviations occur as the square roots of the following types of numbers $n$, $n + .19$, $n + .25$, $n + .50$, $n + .69$, $2n + .75$.

(3) A standard deviation belonging to one of these groups can only be associated with a mean of a particular kind; thus a standard deviation of $\sqrt{2}$ can only occur if the mean differs by a whole number from the group we take as origin, while $\sqrt{1.69}$ will only occur when the mean is at $n \pm .25$.

(4) All the four individuals of the sample will occasionally come from the same group, giving a zero value for the standard deviation. Now this leads to an infinite value of $z$ and is clearly due to too wide a grouping, for although two men may have the same height when measured by inches, yet the finer the measurements the more seldom will they be identical, till finally the chance that four men will have exactly the same height is infinitely small. If we had smaller grouping the zero values of the standard deviation might be expected to increase, and a similar consideration will show that the smaller values of the standard deviation would also be likely to increase, such as .436, when 3 fall in one group and 1 in an adjacent group, or .50 when 2 fall in two adjacent groups. On the other hand when the individuals of the sample lie far apart, the argument of Sheppard’s correction will apply, the real value of the standard deviation being more likely to be smaller than that found owing to the frequency in any group being greater on the side nearer the mode.
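Points (1) to (3) can be verified by enumeration. The sketch below, an illustration only, takes every sample of 4 whose members fall on whole-number group values from 0 to 6 (an arbitrary range assumed for the purpose), computes the mean and the squared standard deviation (divisor 4) exactly, and lists the fractional parts that the squared standard deviation can take. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def sample_statistics(low=0, high=6):
    """All (squared S.D., mean) pairs for samples of 4 from groups low..high."""
    pairs = set()
    for sample in combinations_with_replacement(range(low, high + 1), 4):
        mean = Fraction(sum(sample), 4)
        variance = Fraction(sum(x * x for x in sample), 4) - mean ** 2
        pairs.add((variance, mean))
    return pairs


def fractional_part(q):
    """Part of a non-negative fraction lying beyond its whole number."""
    return q - (q.numerator // q.denominator)


pairs = sample_statistics()
variance_types = sorted({fractional_part(v) for v, _ in pairs})
print([float(t) for t in variance_types])
# -> [0.0, 0.1875, 0.25, 0.5, 0.6875, 0.75]
```

The fractions .1875 and .6875 are the .19 and .69 of point (2). The same set of pairs shows that a whole-number squared standard deviation occurs only with a whole-number mean, and that the fraction .6875 occurs only with a mean ending in .25 or .75.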

These two effects of grouping will tend to neutralise each other in their effect on the mean value of the standard deviation, but both will increase the variability.

Accordingly we find that the mean value of the standard deviation is quite close to that calculated, while in each case the variability is sensibly greater. The fit of the curve is not good, both for this reason and because the frequency is not evenly distributed owing to effects (2) and (3) of grouping. On the other hand the fit of the curve giving the frequency of $z$ is very good and as that is the only practical point the comparison may be considered satisfactory.

The following are the figures for height:

| | Calculated | Observed | Difference |
|---|---|---|---|
| Mean value of standard deviations | 2.027 ± .021 | 2.026 | −.001 |
| Standard deviation of standard deviations | .8556 ± .015 | .9066 | +.0510 |

Comparison of Fit. Theoretical Equation: $y = \frac{16\times750}{\sqrt{2\pi}\,\sigma^3}x^2e^{-\frac{2x^2}{\sigma^2}}$.

| Scale in terms of standard deviation of population | Calculated frequency | Observed frequency | Difference |
|---|---|---|---|
| 0 to .1 | 1½ | 3 | +1½ |
| .1 to .2 | 10½ | 14½ | +4 |
| .2 to .3 | 27 | 24½ | −2½ |
| .3 to .4 | 45½ | 37½ | −8 |
| .4 to .5 | 64½ | 107 | +42½ |
| .5 to .6 | 78½ | 67 | −11½ |
| .6 to .7 | 87 | 73 | −14 |
| .7 to .8 | 88 | 77 | −11 |
| .8 to .9 | 81½ | 77½ | −4 |
| .9 to 1.0 | 71 | 64 | −7 |
| 1.0 to 1.1 | 58 | 52½ | −5½ |
| 1.1 to 1.2 | 45 | 49½ | +4½ |
| 1.2 to 1.3 | 33 | 35 | +2 |
| 1.3 to 1.4 | 23 | 28 | +5 |
| 1.4 to 1.5 | 15 | 12½ | −2½ |
| 1.5 to 1.6 | 9½ | 9 | −½ |
| 1.6 to 1.7 | 5½ | 11½ | +6 |
| Greater than 1.7 | 7 | 7 | 0 |

whence $\chi^2 = 48.06$, $P = .00006$ (about).

In tabling the observed frequency, values between .0125 and .0875 were included in one group, while between .0875 and .1125 they were divided over the two groups. As an instance of the irregularity due to grouping I may mention that there were 31 cases of standard deviations 1.30 (in terms of the grouping) which is .5117 in terms of the standard deviation of the population, and they were therefore divided over the groups .4 to .5 and .5 to .6. Had they all been counted in groups .5 to .6 $\chi^2$ would have fallen to 29.85 and $P$ would have risen to .03. The $\chi^2$ test presupposes random sampling from a frequency following the given law, but this we have not got owing to the interference of the grouping.
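The test of fit for the height standard deviations can be recomputed from the tabled frequencies. The sketch below is illustrative: it forms the sum of (observed − calculated)² / calculated over the eighteen groups, and then repeats the calculation with the 31 cases of standard deviation 1.30 all counted in the group .5 to .6, which means moving 15½ from the group .4 to .5. Half frequencies are entered as decimals, and the function and variable names are assumptions.

```python
def chi_square(observed, calculated):
    """Sum of squared differences, each divided by the calculated frequency."""
    return sum((o - c) ** 2 / c for o, c in zip(observed, calculated))


# Height standard deviations, groups 0 to .1, .1 to .2, ..., greater than 1.7.
calculated = [1.5, 10.5, 27, 45.5, 64.5, 78.5, 87, 88, 81.5,
              71, 58, 45, 33, 23, 15, 9.5, 5.5, 7]
observed = [3, 14.5, 24.5, 37.5, 107, 67, 73, 77, 77.5,
            64, 52.5, 49.5, 35, 28, 12.5, 9, 11.5, 7]

chi_as_tabled = chi_square(observed, calculated)

# Count all 31 cases of S.D. 1.30 in the group .5 to .6 instead of halving them.
regrouped = list(observed)
regrouped[4] -= 15.5
regrouped[5] += 15.5
chi_regrouped = chi_square(regrouped, calculated)

print(round(chi_as_tabled, 2), round(chi_regrouped, 2))
```

The first figure should fall within a few hundredths of the 48.06 quoted (the tabled frequencies being rounded to the nearest half), and the second should reproduce the 29.85 mentioned above.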

When, however, we test the $z$’s where the grouping has not had so much effect we find a close correspondence between the theory and the actual result.

There were three cases of infinite values of $z$ which, for the reasons given above, were given the next largest values which occurred, namely $+6$ or $-6$. The rest were divided into groups of .1; .04, .05 and .06, being divided between the two groups on either side.

The calculated value for the standard deviation of the frequency curve was $1$ $(\pm .017)$ while the observed was 1.039. The value of the standard deviation is really infinite, as the fourth moment coefficient is infinite, but as we have arbitrarily limited the infinite cases we may take as an approximation $\frac{1}{\sqrt{1500}}$ from which the value of the probable error given above is obtained. The fit of the curve is as follows:

Comparison of Fit. Theoretical Equation: $y = \frac{2N}{\pi}\cos^4\theta$, $z = \tan\theta$.

| Scale of $z$ | Calculated frequency | Observed frequency | Difference |
|---|---|---|---|
| less than −3.05 | 5 | 9 | +4 |
| −3.05 to −2.05 | 9½ | 14½ | +5 |
| −2.05 to −1.55 | 13½ | 11½ | −2 |
| −1.55 to −1.05 | 34½ | 33 | −1½ |
| −1.05 to −.75 | 44½ | 43½ | −1 |
| −.75 to −.45 | 78½ | 70½ | −8 |
| −.45 to −.15 | 119 | 119½ | +½ |
| −.15 to +.15 | 141 | 151½ | +10½ |
| +.15 to +.45 | 119 | 122 | +3 |
| +.45 to +.75 | 78½ | 67½ | −11 |
| +.75 to +1.05 | 44½ | 49 | +4½ |
| +1.05 to +1.55 | 34½ | 26½ | −8 |
| +1.55 to +2.05 | 13½ | 16 | +2½ |
| +2.05 to +3.05 | 9½ | 10 | +½ |
| more than +3.05 | 5 | 6 | +1 |

whence $\chi^2 = 12.44$, $P = .56$.

This is very satisfactory, especially when we consider that as a rule observations are tested against curves fitted from the mean and one or more other moments of the observations, so that considerable correspondence is only to be expected; while this curve is exposed to the full errors of random sampling, its constants having been calculated quite apart from the observations.

Diagram III. Comparison of Calculated Standard Deviation Frequency Curve with 750 actual Standard Deviations. (Horizontal scale: standard deviation of the population.)

The left middle finger samples show much the same features as those of the height, but as the grouping is not so large compared to the variability the curves fit the observations more closely. Diagrams III.[6] and IV. give the standard deviations and the $z$’s for this set of samples. The results are as follows:

Diagram IV. Comparison of the theoretical frequency curve $y = \frac{2N}{\pi}\cos^4\theta$, $z = \tan\theta$, with an actual sample of 750 cases. (Horizontal scale: standard deviation of the sample.)

| | Calculated | Observed | Difference |
|---|---|---|---|
| Mean value of standard deviations | 2.186 ± .023 | 2.179 | −.007 |
| Standard deviation of standard deviations | .9224 ± .016 | .9802 | +.0578 |

Comparison of Fit. Theoretical Equation: $y = \frac{16\times750}{\sqrt{2\pi}\,\sigma^3}x^2e^{-\frac{2x^2}{\sigma^2}}$.

| Scale in terms of standard deviation of population | Calculated frequency | Observed frequency | Difference |
|---|---|---|---|
| 0 to .1 | 1½ | 2 | +½ |
| .1 to .2 | 10½ | 14 | +3½ |
| .2 to .3 | 27 | 27½ | +½ |
| .3 to .4 | 45½ | 51 | +5½ |
| .4 to .5 | 64½ | 64½ | 0 |
| .5 to .6 | 78½ | 91 | +12½ |
| .6 to .7 | 87 | 94½ | +7½ |
| .7 to .8 | 88 | 68½ | −19½ |
| .8 to .9 | 81½ | 65½ | −16 |
| .9 to 1.0 | 71 | 73 | +2 |
| 1.0 to 1.1 | 58 | 48½ | −9½ |
| 1.1 to 1.2 | 45 | 40½ | −4½ |
| 1.2 to 1.3 | 33 | 42½ | +9½ |
| 1.3 to 1.4 | 23 | 20 | −3 |
| 1.4 to 1.5 | 15 | 22½ | +7½ |
| 1.5 to 1.6 | 9½ | 12 | +2½ |
| 1.6 to 1.7 | 5½ | 5 | −½ |
| Greater than 1.7 | 7 | 7½ | +½ |

whence $\chi^2 = 21.80$, $P = .19$.

Calculated value of standard deviation 1 (±.017)
Observed value of standard deviation .982
Difference = −.018

Comparison of Fit. Theoretical Equation: $y = \frac{2N}{\pi}\cos^4\theta$, $z = \tan\theta$.

| Scale of $z$ | Calculated frequency | Observed frequency | Difference |
|---|---|---|---|
| less than −3.05 | 5 | 4 | −1 |
| −3.05 to −2.05 | 9½ | 15½ | +6 |
| −2.05 to −1.55 | 13½ | 18 | +4½ |
| −1.55 to −1.05 | 34½ | 33½ | −1 |
| −1.05 to −.75 | 44½ | 44 | −½ |
| −.75 to −.45 | 78½ | 75 | −3½ |
| −.45 to −.15 | 119 | 122 | +3 |
| −.15 to +.15 | 141 | 138 | −3 |
| +.15 to +.45 | 119 | 120½ | +1½ |
| +.45 to +.75 | 78½ | 71 | −7½ |
| +.75 to +1.05 | 44½ | 46½ | +2 |
| +1.05 to +1.55 | 34½ | 36 | +1½ |
| +1.55 to +2.05 | 13½ | 11 | −2½ |
| +2.05 to +3.05 | 9½ | 9 | −½ |
| more than +3.05 | 5 | 6 | +1 |

whence $\chi^2 = 7.39$, $P = .92$.

A very close fit.

We see then that if the distribution is approximately normal our theory gives us a satisfactory measure of the certainty to be derived from a small sample in both the cases we have tested; but we have an indication that a fine grouping is of advantage. If the distribution is not normal, the mean and the standard deviation of a sample will be positively correlated, so that although both will have greater variability, yet they will tend to counteract each other, a mean deviating largely from the general mean tending to be divided by a larger standard deviation. Consequently I believe that the tables at the end of the present paper may be used in estimating the degree of certainty arrived at by the mean of a few experiments, in the case of most laboratory or biological work where the distributions are as a rule of a ‘cocked hat’ type and so sufficiently nearly normal.

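Section VII. Tables of $\frac{n-2}{n-3}\cdot\frac{n-4}{n-5}\cdots\begin{cases}\frac{3}{2}\cdot\frac{1}{2} & n \text{ odd}\\ \frac{2}{1}\cdot\frac{1}{\pi} & n \text{ even}\end{cases}\int_{-\frac{\pi}{2}}^{\tan^{-1}z}\cos^{n-2}\theta\,d\theta$ for values of $n$ from $4$ to $10$ inclusive.

Together with $\frac{\sqrt{7}}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{7x^2}{2}}dx$ for comparison when $n = 10$.

| $z$ $(= x/s)$ | $n = 4$ | $n = 5$ | $n = 6$ | $n = 7$ | $n = 8$ | $n = 9$ | $n = 10$ | For comparison |
|---|---|---|---|---|---|---|---|---|
| .1 | .5633 | .5745 | .5841 | .5928 | .6006 | .60787 | .61462 | .60411 |
| .2 | .6241 | .6458 | .6634 | .6798 | .6936 | .70705 | .71846 | .70159 |
| .3 | .6804 | .7096 | .7340 | .7549 | .7733 | .78961 | .80423 | .78641 |
| .4 | .7309 | .7657 | .7939 | .8175 | .8376 | .85465 | .86970 | .85520 |
| .5 | .7749 | .8131 | .8428 | .8667 | .8863 | .90251 | .91609 | .90691 |
| .6 | .8125 | .8518 | .8813 | .9040 | .9218 | .93600 | .94732 | .94375 |
| .7 | .8440 | .8830 | .9109 | .9314 | .9468 | .95851 | .96747 | .96799 |
| .8 | .8701 | .9076 | .9332 | .9512 | .9640 | .97328 | .98007 | .98253 |
| .9 | .8915 | .9269 | .9498 | .9652 | .9756 | .98279 | .98780 | .99137 |
| 1.0 | .9092 | .9419 | .9622 | .9751 | .9834 | .98890 | .99252 | .99820 |
| 1.1 | .9236 | .9537 | .9714 | .9821 | .9887 | .99280 | .99539 | .99926 |
| 1.2 | .9354 | .9628 | .9782 | .9870 | .9922 | .99528 | .99713 | .99971 |
| 1.3 | .9451 | .9700 | .9832 | .9905 | .9946 | .99688 | .99819 | .99986 |
| 1.4 | .9531 | .9756 | .9870 | .9930 | .9962 | .99791 | .99885 | .99989 |
| 1.5 | .9598 | .9800 | .9899 | .9948 | .9973 | .99859 | .99926 | .99999 |
| 1.6 | .9653 | .9836 | .9920 | .9961 | .9981 | .99903 | .99951 | |
| 1.7 | .9699 | .9864 | .9937 | .9970 | .9986 | .99933 | .99968 | |
| 1.8 | .9737 | .9886 | .9950 | .9977 | .9990 | .99953 | .99978 | |
| 1.9 | .9770 | .9904 | .9959 | .9983 | .9992 | .99967 | .99985 | |
| 2.0 | .9797 | .9919 | .9967 | .9986 | .9994 | .99976 | .99990 | |
| 2.1 | .9821 | .9931 | .9973 | .9989 | .9996 | .99983 | .99993 | |
| 2.2 | .9841 | .9941 | .9978 | .9992 | .9997 | .99987 | .99995 | |
| 2.3 | .9858 | .9950 | .9982 | .9993 | .9998 | .99991 | .99996 | |
| 2.4 | .9873 | .9957 | .9985 | .9995 | .9998 | .99993 | .99997 | |
| 2.5 | .9886 | .9963 | .9987 | .9996 | .9998 | .99995 | .99998 | |
| 2.6 | .9898 | .9967 | .9989 | .9996 | .9999 | .99996 | .99999 | |
| 2.7 | .9908 | .9972 | .9991 | .9997 | .9999 | .99997 | .99999 | |
| 2.8 | .9916 | .9975 | .9992 | .9998 | .9999 | .99998 | .99999 | |
| 2.9 | .9924 | .9978 | .9993 | .9998 | .9999 | .99998 | .99999 | |
| 3.0 | .9931 | .9981 | .9994 | .9998 | | .99999 | | |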

Section VIII. Explanation of Tables.

The tables give the probability that the value of the mean, measured from the mean of the population, in terms of the standard deviation of the sample, will lie between $-\infty$ and $z$. Thus, to take the table for samples of six, the probability of the mean of the population lying between $-\infty$ and once the standard deviation of the sample is .9622 or the odds are about 24 to 1 that the mean of the population lies between these limits.

The probability is therefore .0378 that it is greater than once the standard deviation and .0756 that it lies outside ±1.0 times the standard deviation.

Section IX. Illustrations of Method.

Illustration I. As an instance of the kind of use which may be made of the tables, I take the following figures from a table by A. R. Cushny and A. R. Peebles in the Journal of Physiology for 1904, showing the different effects of the optical isomers of hyoscyamine hydrobromide in producing sleep. The sleep of 10 patients was measured without hypnotic and after treatment (1) with D. hyoscyamine hydrobromide, (2) with L. hyoscyamine hydrobromide. The average number of hours’ sleep gained by the use of the drug is tabulated below.

The conclusion arrived at was that in the usual dose 2 was, but 1 was not, of value as a soporific.

Additional hours’ sleep gained by the use of hyoscyamine hydrobromide.

| Patient | 1 (Dextro-) | 2 (Laevo-) | Difference (2 − 1) |
|---|---|---|---|
| 1 | +0.7 | +1.9 | +1.2 |
| 2 | −1.6 | +0.8 | +2.4 |
| 3 | −0.2 | +1.1 | +1.3 |
| 4 | −1.2 | +0.1 | +1.3 |
| 5 | −0.1 | −0.1 | 0.0 |
| 6 | +3.4 | +4.4 | +1.0 |
| 7 | +3.7 | +5.5 | +1.8 |
| 8 | +0.8 | +1.6 | +0.8 |
| 9 | 0.0 | +4.6 | +4.6 |
| 10 | +2.0 | +3.4 | +1.4 |
| Mean | +0.75 | +2.33 | +1.58 |
| S.D. | 1.70 | 1.90 | 1.17 |

First let us see what is the probability that 1 will on the average give increase of sleep; i.e. what is the chance that the mean of the population of which these experiments are a sample is positive. $\frac{+.75}{1.70} = .44$, and looking out in the table for ten experiments we find by interpolating between .8697 and .9161 that .44 corresponds to .8873, or the odds are .887 to .113 that the mean is positive.

That is about 8 to 1 and would correspond in the normal curve to about 1.8 times the probable error. It is then very likely that 1 gives an increase of sleep, but it would occasion no surprise if the results were reversed by further experiments.

If now we consider the chance that 2 is actually a soporific we have the mean increase of sleep $\frac{2.33}{1.90}$ or 1.23 times the S.D. From the table the probability corresponding to this is .9974, i.e. the odds are nearly 400 to 1 that such is the case. This corresponds to about 4.15 times the probable error in the normal curve. But I take it the real point of the authors was that 2 is better than 1. This we must test by making a new series, subtracting 1 from 2. The mean value of this series is +1.58 while the S.D. is 1.17, the mean value being +1.35 times the S.D. From the table the probability is .9985 or the odds are about 666 to 1 that 2 is the better soporific. The low value of the S.D. is probably due to the different drugs reacting similarly on the same patient, so that there is correlation between the results.

Of course odds of this kind make it almost certain that 2 is the better soporific, and in practical life such a high probability is in most matters considered as a certainty.
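The arithmetic of Illustration I can be retraced as follows. This is a sketch under stated assumptions rather than the author's procedure: the probabilities are computed directly from the $z$ curve for samples of ten by the cosine reduction formula instead of by interpolation in the table, so the first of them differs a little from the interpolated .8873. The entry of patient 5 under drug 1 is taken as −0.1, the value required by the stated mean of +0.75 and by the zero difference for that patient. All function names are hypothetical.

```python
import math


def cos_power_integral(theta, m):
    """Integral of cos^m(t) dt from -pi/2 to theta, by the reduction formula."""
    if m == 0:
        return theta + math.pi / 2
    if m == 1:
        return math.sin(theta) + 1.0
    boundary = math.cos(theta) ** (m - 1) * math.sin(theta) / m
    return boundary + (m - 1) / m * cos_power_integral(theta, m - 2)


def student_area(z, n):
    """Area of the z curve between -infinity and z for samples of n."""
    whole = cos_power_integral(math.pi / 2, n - 2)
    return cos_power_integral(math.atan(z), n - 2) / whole


def mean_and_sd(values):
    """Mean and standard deviation with divisor n, as in the paper."""
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in values) / n)


dextro = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
laevo = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]
difference = [b - a for a, b in zip(dextro, laevo)]

for label, series in (("1", dextro), ("2", laevo), ("2 - 1", difference)):
    mean, sd = mean_and_sd(series)
    z = mean / sd
    p = student_area(z, len(series))
    print(label, round(mean, 2), round(sd, 2), round(z, 2), round(p, 4), round(p / (1 - p)))
```

Each line printed gives the mean, the S.D., $z$, the probability, and the odds to 1 in favour of a positive population mean.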

Illustration II. Cases where the tables will be useful are not uncommon in agricultural work, and they would be more numerous if the advantages of being able to apply statistical reasoning were borne in mind when planning the experiments. I take the following instances from the accounts of the Woburn farming experiments published yearly by Dr Voelcker in the Journal of the Agricultural Society.

A short series of pot culture experiments were conducted in order to determine the causes which lead to the production of Hard (glutinous) wheat or Soft (starchy) wheat. In three successive years a bulk of seed corn of one variety was picked over by hand and two samples were selected, one consisting of “hard” grains and the other of “soft.” Some of each of these were planted in both heavy and light soil and the resulting crops were weighed and examined for hard and soft corn.

The conclusion drawn was that the effect of selecting the seed was negligible compared with the influence of the soil.

This conclusion was thoroughly justified, the heavy soil producing in each case nearly 100 per cent. of hard corn, but still the effect of selecting the seed could just be traced in each year.

But a curious point, to which Dr Voelcker draws attention in the 2nd year’s report, is that the soft seeds produced the higher yield of both corn and straw. In view of the well-known fact that the varieties which have a high yield tend to produce soft corn, it is interesting to see how much evidence the experiments afford as to the correlation between softness and fertility in the same variety.

Further, Mr Hooker[7] has shown that the yield of wheat in one year is largely determined by the weather during the preceding harvest. Dr Voelcker’s results may afford a clue as to the way in which the seed is affected, and would almost justify the selection of particular soils for growing seed wheat[8].

The figures are as follows, the yield being expressed in grammes per pot.


Year                                     1899              1900              1901
Soil                                 Light   Heavy     Light   Heavy     Light   Heavy    Average   Standard deviation     z
Yield of corn from soft seed          7.85    8.89     14.81   13.55      7.48   15.39     11.328
Yield of corn from hard seed          7.27    8.32     13.81   13.36      7.97   13.13     10.643
Difference                            +.58    +.57     +1.00    +.19      −.49   +2.26      +.685          .778           .88

Yield of straw from soft seed        12.81   12.87     22.22   20.21     13.97   22.57     17.442
Yield of straw from hard seed        10.71   12.48     21.64   20.26     11.71   18.96     15.927
Difference                           +2.10    +.39      +.78    −.05     +2.66   +3.61     +1.515         1.261          1.20

If we wish to find the odds that soft seed will give a better yield of corn on the average, we divide the average difference by the standard deviation, giving us

$$z = .88.$$

Looking this up in the table for $n = 6$ we find $p = .9465$, or the odds are .9465 : .0535, about 18:1.

Similarly for straw $z = 1.20$, $p = .9782$, and the odds about 45:1.
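
The table look-up for six differences can be approximated numerically. The sketch below assumes the frequency curve of $z$ derived earlier in the paper, $y \propto (1+z^2)^{-n/2}$, and integrates it by Simpson's rule; the integration method, the function name and the number of steps are choices made for this illustration, not Student's own procedure.

```python
import math

def student_z_probability(z, n, steps=2000):
    """Area of the curve y = C (1 + z^2)^(-n/2) to the left of z.

    z is the mean divided by the standard deviation (divisor n),
    n is the sample size, and steps must be even for Simpson's rule.
    """
    # Constant C that makes the total area under the curve equal to 1.
    c = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    h = z / steps
    total = 1.0 + (1.0 + z * z) ** (-n / 2)  # end ordinates at 0 and z
    for i in range(1, steps):
        x = i * h
        weight = 4 if i % 2 else 2
        total += weight * (1.0 + x * x) ** (-n / 2)
    return 0.5 + c * total * h / 3.0

p_corn = student_z_probability(0.88, 6)
p_straw = student_z_probability(1.20, 6)
print(p_corn, p_corn / (1 - p_corn))     # compare with odds of about 18:1
print(p_straw, p_straw / (1 - p_straw))  # compare with odds of about 45:1
```

Small differences in the last decimal place from the tabled values are to be expected, since the tables were computed by hand.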

In order to see whether such odds are sufficient for a practical man to draw a definite conclusion, I take another set of experiments in which Dr Voelcker compares the effects of different artificial manures used with potatoes on the large scale.

The figures represent the difference between the crops grown with the use of sulphate of potash and kainit respectively in both 1904 and 1905.

            cwt.  qr.  lb.          ton  cwt.  qr.  lb.
1904      +  10    3   20     :   +   1   10    1   26
1905      +   6    0    3     :   +       13    2    8
(two experiments in each year)

The average gain by the use of sulphate of potash was 15.25 cwt. and the S.D. 9 cwt., whence, if we want the odds that the conclusion given below is right, $z = 1.7$, corresponding, when $n = 4$, to $p = .9698$ or odds of 32:1; this is midway between the odds in the former example. Dr Voelcker says ‘It may now fairly be concluded that for the potato crop on light land 1 cwt. per acre of sulphate of potash is a better dressing than kainit.’
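
The average and S.D. quoted for the four potato experiments can be reproduced from the figures in tons, hundredweights, quarters and pounds. The sketch below assumes the usual imperial conversions (20 cwt. to the ton, 4 qr. to the cwt., 112 lb. to the cwt.) and a standard deviation taken with divisor $n$, as elsewhere in the paper; the helper name `to_cwt` is illustrative.

```python
from statistics import mean, pstdev

CWT_PER_TON = 20   # 1 ton = 20 cwt.
QR_PER_CWT = 4     # 1 cwt. = 4 qr.
LB_PER_CWT = 112   # 1 cwt. = 112 lb.

def to_cwt(tons, cwt, qr, lb):
    """Convert a weight in tons, cwt., qr. and lb. to hundredweights."""
    return tons * CWT_PER_TON + cwt + qr / QR_PER_CWT + lb / LB_PER_CWT

# Gain from sulphate of potash over kainit, two experiments in each year.
gains = [
    to_cwt(0, 10, 3, 20),  # 1904
    to_cwt(1, 10, 1, 26),  # 1904
    to_cwt(0, 6, 0, 3),    # 1905
    to_cwt(0, 13, 2, 8),   # 1905
]

average = mean(gains)
sd = pstdev(gains)     # standard deviation with divisor n
print(average, sd, average / sd)
```

Using the rounded S.D. of 9 cwt. gives 15.25/9, about 1.7; the unrounded figures give a somewhat smaller ratio.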

As an example of how the tables should be used with caution, I take the following pot culture experiments to test whether it made any difference whether large or small seeds were sown.

Illustration III. In 1899 and in 1903 “head corn” and “tail corn” were taken from the same bulks of barley and sown in pots. The yields in grammes were as follows:

              1899    1903
Large seed    13.9     7.3
Small seed    14.4     7.9
Difference     +.5     +.6

The average gain is thus .55 and the S.D. .05, giving $z = 11$. Now the table for $n = 2$ is not given, but if we look up the angle whose tangent is 11 in Chambers’ tables,

$$p = \frac{\tan^{-1} 11}{180^\circ} + .5 = .9711,$$

so that the odds are about 33:1 that small corn gives a better yield than large. These odds are those which would be laid, and laid rightly, by a man whose only knowledge of the matter was contained in the two experiments. Anyone conversant with pot culture would however know that the difference between the two results would generally be greater and would correspondingly moderate the certainty of his conclusion. In point of fact a large scale experiment confirmed the result, the small corn yielding about 15 per cent. more than the large.
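
For two observations the curve $y \propto (1+z^2)^{-n/2}$ reduces to $y \propto (1+z^2)^{-1}$, whose area to the left of $z$ is $\frac{1}{2} + \frac{\tan^{-1} z}{180^\circ}$, which is why a table of tangents suffices here. A minimal sketch of the arithmetic, using the two differences +.5 and +.6 from the table and a standard deviation with divisor $n$ (variable names are illustrative):

```python
import math
from statistics import mean, pstdev

differences = [0.5, 0.6]   # small seed minus large seed, 1899 and 1903

z = mean(differences) / pstdev(differences)   # .55 / .05
angle = math.degrees(math.atan(z))            # angle whose tangent is z
p = 0.5 + angle / 180.0
odds = p / (1.0 - p)
print(z, angle, p, odds)
```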

I will conclude with an example which comes beyond the range of the tables, there being eleven experiments.

To test whether it is of advantage to kiln-dry barley seed before sowing, seven varieties of barley were sown (both kiln-dried and not kiln-dried) in 1899 and four in 1900; the results are given in the table.

It will be noticed that the kiln-dried seed gave on an average the larger yield of corn and straw, but that the quality was almost always inferior. At first sight this might be supposed to be due to superior germinating power in the kiln-dried seed, but my farming friends tell me that the effect of this would be that the kiln-dried seed would produce the better quality barley. Dr Voelcker draws the conclusion “In such seasons as 1899 and 1900 there is no particular advantage in kiln-drying before sowing.” Our examination completely justifies this and adds “and the quality of the resulting barley is inferior though the yield may be greater.”

          lbs. head corn per acre      Price of head corn in        cwts. straw per acre       Value of crop per acre
                                       shillings per quarter                                   in shillings[*]
Year      N.K.D.    K.D.     Diff.     N.K.D.   K.D.    Diff.       N.K.D.   K.D.    Diff.     N.K.D.    K.D.     Diff.
1899      1903      2009     +106      26½      26½      0          19¼      25      +5¾       140½      152      +11½
          1935      1915      −20      28       26½     −1½         22¾      24      +1¼       152½      145       −7½
          1910      2011     +101      29½      28½     −1          23       24      +1        158½      161       +2½
          2496      2463      −33      30       29      −1          23       28      +5        204½      199½      −5
          2108      2180      +72      27½      27       −½         22½      22½      0        162       164       +2
          1961      1925      −36      26       26       0          19¾      19½      −¼       142       139½      −2½
          2060      2122      +62      29       26      −3          24½      22¼     −2¼       168       155      −13
1900      1444      1482      +38      29½      28½     −1          15½      16       +½       118       117½       −½
          1612      1542      −70      28½      28       −½         18       17¼      −¾       128½      121       −7½
          1316      1443     +127      30       29      −1          14¼      15¾     +1½       109½      116½      +7
          1511      1535      +24      28½      28       −½         17       17¼      +¼       120       120½       +½

Average   1841.5    1875.2   +33.7     28.45    27.55    −.91       19.95    21.05   +1.10     145.82    144.68   −1.14
Standard deviation            63.1                        .79                         2.25                         6.67
Standard deviation ÷ √8       22.3                        .28                         0.80                         2.40

 * Straw being valued at 15s. per ton.

In this case I propose to use the approximation given by the normal curve with standard deviation $\frac{s}{\sqrt{n-3}}$ and therefore use Sheppard’s tables, looking up the difference divided by $\frac{s}{\sqrt{8}}$. The probability in the case of yield of corn per acre is given by looking up $\frac{33.7}{22.3} = 1.51$ in Sheppard’s tables. This gives $p = .934$, or the odds are about 14:1 that kiln-dried corn gives the higher yield.

Similarly $\frac{.91}{.28} = 3.25$, corresponding to $p = .9994$,[9] so that the odds are very great that kiln-dried seed gives barley of a worse quality than seed which has not been kiln-dried.

Similarly it is about 11 to 1 that kiln-dried seed gives more straw and about 2:1 that the total value of the crop is less with kiln-dried seed.
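
The normal-curve approximation used for the eleven kiln-drying experiments can be sketched as follows. It assumes a standard deviation with divisor $n$, rescaled by $\sqrt{n-3}$ (here $\sqrt{8}$, as in the last row of the table), and replaces Sheppard’s tables with the normal integral from the Python standard library; the function name is illustrative. The differences are those of the first two "Diff." columns of the table.

```python
import math
from statistics import NormalDist, mean, pstdev

# Kiln-dried minus not kiln-dried, eleven experiments (1899 and 1900).
corn_diffs = [106, -20, 101, -33, 72, -36, 62, 38, -70, 127, 24]
price_diffs = [0, -1.5, -1, -1, -0.5, 0, -3, -1, -0.5, -1, -0.5]

def normal_approx_probability(diffs):
    """Probability that the true mean difference has the sign of the observed mean."""
    n = len(diffs)
    average = mean(diffs)
    s = pstdev(diffs)             # standard deviation with divisor n
    scale = s / math.sqrt(n - 3)  # S.D. of the approximating normal curve
    return NormalDist().cdf(abs(average) / scale)

p_corn = normal_approx_probability(corn_diffs)
p_price = normal_approx_probability(price_diffs)
print(p_corn, p_corn / (1 - p_corn))  # compare with odds of about 14:1
print(p_price)                        # compare with the value from Sheppard's tables
```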

Section X.

Conclusions.

I. A curve has been found representing the frequency distribution of standard deviations of samples drawn from a normal population.

II. A curve has been found representing the frequency distribution of values of the means of such samples, when these values are measured from the mean of the population in terms of the standard deviation of the sample.

III. It has been shown that this curve represents the facts fairly well even when the distribution of the population is not strictly normal.

IV. Tables are given by which it can be judged whether a series of experiments, however short, have given a result which conforms to any required standard of accuracy or whether it is necessary to continue the investigation.

Finally I should like to express my thanks to Professor Karl Pearson, without whose constant advice and criticism this paper could not have been written.

  1. Airy, Theory of Errors of Observations, Part II. § 6.
  2. This expression will be found to give a much closer approximation to than Wallis’s.
  3. See p. 19.
  4. E.g. if , to the corresponding value for we add : if we add as well and so on.
  5. I hope to publish the results of the correlation work shortly.
  6. There are three small mistakes in plotting the observed values in Diagram III., which make the fit appear worse than it really is.
  7. Journal of Royal Statistical Society, 1907.
  8. And perhaps a few experiments to see whether there is a correlation between yield and ‘mellowness’ in barley.
  9. As pointed out in Section V. the normal curve gives too large a value for $p$ when the probability is large. I find the true value in this case to be $p = .9976$. It matters little however to a conclusion of this kind whether the odds in its favour are 1,660:1 or merely 416:1.

This work is in the public domain in the United States because it was published before January 1, 1929.


The longest-living author of this work died in 1937, so this work is in the public domain in countries and areas where the copyright term is the author's life plus 86 years or less. This work may be in the public domain in countries and areas with longer native copyright terms that apply the rule of the shorter term to foreign works.
