aggregate is given by a corresponding term in the expansion of
$(q + p)^n$, and by a well-known theorem[1] this term is approximately
equal to $\dfrac{1}{\sqrt{2\pi npq}}\,e^{-\nu^2/2npq}$; where ν is the number of integers
by which the term is distant from np (or an integer close to np);
provided that ν is of (or <) the order √n. Graphically, let the
sortition made for each element be represented by the taking or
not taking with respective frequency p and q a step of length i.
If a body starting from zero takes successively n such steps, the
point at which it will most probably come to a stop is at npi
(measured from zero); the probability of its stopping at any neighbouring
point within a range of ± √ni is given by the
above-written law of frequency, νi being the distance of the stopping-point
from npi. Put $\nu i = x$ and $2npq\,i^2 = c^2$; then the probability
may be written $\dfrac{i}{c\sqrt{\pi}}\,e^{-x^2/c^2}$.
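The approximation just stated lends itself to a numerical check. The following sketch (a modern illustration, not part of the original argument; the values of n, p and ν are chosen arbitrarily) compares the exact binomial term with the expression $\frac{1}{\sqrt{2\pi npq}}\,e^{-\nu^2/2npq}$.

```python
# Illustrative check of the binomial-to-normal approximation described above.
# n, p and the deviations nu are arbitrary choices; nu is kept of the order
# sqrt(n), as the text requires.
import math

n, p = 1000, 0.3          # number of elements and probability of "taking a step"
q = 1 - p
npq = n * p * q

for nu in (0, 5, 10, 20):                 # deviations of the order sqrt(n) or less
    k = round(n * p) + nu                 # term distant nu integers from np
    exact = math.comb(n, k) * p**k * q**(n - k)
    approx = math.exp(-nu**2 / (2 * npq)) / math.sqrt(2 * math.pi * npq)
    print(f"nu = {nu:3d}   exact term = {exact:.6e}   approximation = {approx:.6e}")
```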
104. It is a short step, but a difficult one, from this case, in which the element is binomial—heads or tails—to the general case, in which the element has several values, according to the law of frequency—consists, for instance, of the number of points presented by a randomly-thrown die. According to the general theorem, if Q is the sum[2] of numerous elements, each of which assumes different magnitudes according to a law of frequency, $z = f_r(x)$, the function f being in general different for different elements, the number of times that Q assumes magnitudes between x and $x + \Delta x$ in the course of N trials is $Nz\Delta x$, if $z = \dfrac{1}{\sqrt{2\pi k}}\,e^{-(x-a)^2/2k}$; where a is the sum of the arithmetic means of all the elements, any one of which is $a_r = [\int x f_r(x)\,dx]$, the square brackets denoting that the integrations extend between the extreme limits of the element's range, if the frequency-locus for each element is continuous, it being understood that $[\int f_r(x)\,dx] = 1$; and k is the sum of the mean squares of error for the several elements, $= \sum[\int \xi^2 f_r(a_r + \xi)\,d\xi]$, if the frequency-locus for each element is continuous, where $a_r$ is the arithmetic mean of one of the elements, and ξ the deviation of any value assumed by that element from $a_r$, $\sum$ denoting summation over all the elements. When the frequency-locus for the element is not continuous, the integrations which give the arithmetic mean and mean square of error for the element must be replaced by summations. For example, in the case of the dice above instanced, the law of frequency for each element is that it assumes equally often each of the values 1, 2, 3, 4, 5, 6. Thus the arithmetic mean for each element is 3.5, and the mean square of error {(3.5 − 1)² + (3.5 − 2)² + &c.}/6 = 2.916. Accordingly, the sum of the points obtained by tossing a large number, n, of dice at random will assume a particular value x with a frequency which is approximately assigned by the equation
$$z = \frac{1}{\sqrt{2\pi \times 2.916\,n}}\; e^{-(x - 3.5n)^2/(2 \times 2.916\,n)}.$$
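By way of illustration (again a modern addition, not in the original), this dice formula may be set against a simulation; the number of dice n and the number of trials below are arbitrary choices.

```python
# Simulated frequencies for the sum of n dice, set against the normal curve
# with mean 3.5*n and mean square of error (35/12)*n = 2.916...*n.
import math
import random

n, trials = 50, 200_000                   # arbitrary illustrative values
mean, k = 3.5 * n, (35 / 12) * n

counts = {}
for _ in range(trials):
    s = sum(random.randint(1, 6) for _ in range(n))
    counts[s] = counts.get(s, 0) + 1

for x in range(int(mean) - 10, int(mean) + 11, 5):
    observed = counts.get(x, 0) / trials
    predicted = math.exp(-(x - mean) ** 2 / (2 * k)) / math.sqrt(2 * math.pi * k)
    print(f"x = {x}   observed = {observed:.5f}   predicted = {predicted:.5f}")
```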
The rule equally applies to the case in which the elements are not similar; one might be the number of points on a die, another the number of points on a domino, and so on. Graphically, each element is no longer represented by a step which is either null or i, but by a step which may be, with an assigned probability, one or other of several degrees between those limits, the law of frequency and the range of i being different for the different elements.
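The rule for dissimilar elements may likewise be sketched numerically. In the following illustration the particular elements chosen (a die, one half of a domino, and a binomial coin-like step) are merely hypothetical; a is taken as the sum of the arithmetic means, k as the sum of the mean squares of error, and the observed frequency of the compound at its mean is set against $1/\sqrt{2\pi k}$.

```python
# A mixture of dissimilar discrete elements, each with its own law of
# frequency.  a is the sum of the arithmetic means, k the sum of the mean
# squares of error; the frequency of the compound at x = a is compared with
# the normal-law ordinate 1/sqrt(2*pi*k).
import math
import random

die    = [1, 2, 3, 4, 5, 6]
domino = [0, 1, 2, 3, 4, 5, 6]            # points on one half of a domino
coin   = [0, 1]                           # the heads-or-tails element, p = 1/2
elements = 20 * [die] + 20 * [domino] + 20 * [coin]

def mean_and_mean_square_error(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

a = sum(mean_and_mean_square_error(e)[0] for e in elements)   # = 140 here
k = sum(mean_and_mean_square_error(e)[1] for e in elements)

trials = 100_000
hits = sum(1 for _ in range(trials)
           if sum(random.choice(e) for e in elements) == round(a))
print("observed frequency at x = a :", hits / trials)
print("normal-law prediction       :", 1 / math.sqrt(2 * math.pi * k))
```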
105. Variant Proofs.—The evidence of these statements can only be indicated here. All the proofs which have been offered involve some postulate as to the deviation of the elements from their respective centres of gravity, their “errors.” If these errors extended to infinity, it might well happen that the law of error would not be fulfilled by a sum of such elements.[3] The necessary and sufficient postulate appears to be that the mean powers of deviation for the elements, the second (above written) and the similarly formed third, fourth, &c., powers (up to some assigned power), should be finite.[4]
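The need of some such postulate may be seen numerically. The sketch below (a modern illustration; the ill-behaved element is that of the footnote, with locus $1/\pi(1 + x^2)$) contrasts an element whose mean powers of deviation are finite with one for which they are not: the average of n well-behaved elements grows steadily less dispersed, while the average of n of the others does not.

```python
# The average of n elements is simulated for two laws of frequency: a fair
# die (finite mean powers of deviation) and the locus 1/(pi*(1 + x**2)) of
# the footnote (no finite mean powers).  The interquartile range of the die
# averages shrinks roughly as 1/sqrt(n); that of the Cauchy-like averages
# does not shrink at all.
import math
import random

def cauchy_like():
    # the tangent of a uniformly distributed angle has frequency 1/(pi*(1 + x**2))
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_mean(sampler, n, trials=20_000):
    means = sorted(sum(sampler() for _ in range(n)) / n for _ in range(trials))
    return means[3 * trials // 4] - means[trials // 4]

for n in (1, 10, 100):
    print(f"n = {n:3d}   die IQR = {iqr_of_mean(lambda: random.randint(1, 6), n):.3f}"
          f"   Cauchy-like IQR = {iqr_of_mean(cauchy_like, n):.3f}")
```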
106. (1) The proof which seems to flow most directly from this postulate proceeds thus. It is deduced that the mean powers of deviation for the proposed representative curve, the law of error (up to a certain power), differ from the corresponding powers of the actual locus by quantities which are negligible when the number of the elements is large.[5] But loci which have their mean powers of deviation (up to some certain power) approximately equal may be considered as approximately coincident.[6]
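The force of this moment argument may be illustrated numerically (again a modern sketch, with dice as an arbitrary choice of element): the mean fourth power of deviation of a sum of n dice is compared with $3k^2$, its value under the normal law having the same mean square k, and the discrepancy dwindles as n increases.

```python
# For a sum of n fair dice, the mean fourth power of deviation is estimated
# by simulation and compared with 3*k**2, the corresponding mean power of
# the normal law of error with the same mean square of deviation k.
import random

def fourth_power_ratio(n, trials=100_000):
    k = n * 35 / 12                       # mean square of error of the sum
    mean = 3.5 * n
    fourth = sum((sum(random.randint(1, 6) for _ in range(n)) - mean) ** 4
                 for _ in range(trials)) / trials
    return fourth / (3 * k * k)           # tends to 1 as n grows

for n in (2, 10, 50):
    print(f"n = {n:3d}   ratio of fourth powers = {fourth_power_ratio(n):.3f}")
```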
107. (2) The earliest and best-known proof is that which was originated by Laplace and generalized by Poisson.[7] Some idea of this celebrated theory may be obtained from the following free version, applied to a simple case. The case is that in which all the elements have one and the same locus of frequency, and that locus is symmetrical about the centre of gravity. Let the locus be represented by the equation $\eta = \phi(\xi)$, where the centre of gravity is the origin, and $\phi(+\xi) = \phi(-\xi)$; the construction signifying that the probability of the element having a value ξ (between, say, $\xi - \tfrac{1}{2}\Delta\xi$ and $\xi + \tfrac{1}{2}\Delta\xi$) is $\phi(\xi)\Delta\xi$. Square brackets denoting summation between extreme limits, put $\chi(a)$ for $[\mathrm{S}\,\phi(\xi)\,e^{\sqrt{-1}\,a\xi}\,\Delta\xi]$, where ξ is an integer multiple of Δξ (or Δx), $= \rho\Delta x$, say. Form the mth power of $\chi(a)$. The coefficient of $e^{\sqrt{-1}\,ar\Delta x}$ in $(\chi(a))^m$ is the probability that the sum of the values of the m elements should be equal to $r\Delta x$; a probability which is equal to $\Delta x\,y_r$, where y is the ordinate of the locus representing the frequency of the compound quantity (formed by the sum of the elements). Owing to the symmetry of the function φ the value of $y_r$ will not be altered if we substitute for $e^{\sqrt{-1}\,ar\Delta x}$, $e^{-\sqrt{-1}\,ar\Delta x}$, nor if we substitute $\tfrac{1}{2}(e^{+\sqrt{-1}\,ar\Delta x} + e^{-\sqrt{-1}\,ar\Delta x})$, that is $\cos ar\Delta x$. Thus $(\chi(a))^m$ becomes a sum of terms of the form $\Delta x\,y_r \cos ar\Delta x$, where $y_{-r} = y_{+r}$. Now multiply $(\chi(a))^m$ thus expressed by $\cos t\Delta x\,a$, where, t being an integer, $t\Delta x = x$, the abscissa of the “error” the probability of whose occurrence is to be determined. The product will consist of a sum of terms of the form $\Delta x\,y_r\,\tfrac{1}{2}\{\cos a(r + t)\Delta x + \cos a(r - t)\Delta x\}$. As every value of r − t (except zero) is matched by a value equal in absolute magnitude, −r + t, and likewise every value of r + t is matched by a value −r − t, the series takes the form $\sum \Delta x\,y_r \cos qa\Delta x + \Delta x\,y_t$, where q has all possible integer values from 1 to the largest value of |r|[8] increased by |t|; and the term free from circular functions is the equivalent of $\tfrac{1}{2}\Delta x\,y_r \cos a(r + t)\Delta x$, when r = −t, together with $\tfrac{1}{2}\Delta x\,y_r \cos a(r - t)\Delta x$, when r = +t. Now substitute for $a\Delta x$ a new symbol β; and integrate, with respect to β, the thus transformed $(\chi(a))^m \cos t\Delta x\,a$ between the limits β = 0 and β = π. The integrals of all the terms which are of the form $\Delta x\,y_r \cos q\beta$ will vanish, and there will be left surviving only $\pi\Delta x\,y_t$. We thus obtain, as equal to $\pi\Delta x\,y_t$, $\int_0^{\pi} (\chi(a))^m \cos t\Delta x\,a \; d\beta$. Now change the independent variable to a; then, as $d\beta = \Delta x\,da$,
$$\pi\,\Delta x\,y_t = \Delta x \int_0^{\pi/\Delta x} (\chi(a))^m \cos t\Delta x\,a \; da.$$
Replacing t∆x by x, and dividing both sides by ∆x, we have
$$\pi\,y_t = \int_0^{\pi/\Delta x} (\chi(a))^m \cos xa \; da.$$
Now expanding the $\cos a\xi$ which enters into the expression for $\chi(a)$, we obtain
$$\chi(a) = [\mathrm{S}\,\phi(\xi)\,\Delta\xi] - \frac{a^2}{2!}\,[\mathrm{S}\,\phi(\xi)\,\xi^2\,\Delta\xi] + \frac{a^4}{4!}\,[\mathrm{S}\,\phi(\xi)\,\xi^4\,\Delta\xi] - \ldots$$
Performing the summations indicated, we express $\chi(a)$ in terms of the mean powers of deviation for an element. Whence $(\chi(a))^m$ is expressible in terms of the mean powers of the compound locus. First and chief is the mean second power of deviation for the compound, which is the sum of the mean second powers of deviation for the elements, say k. It is found that the sought probability may be equated to $\dfrac{1}{\pi}\int_0^{\pi/\Delta x} e^{-\frac{1}{2}ka^2}\cos xa \; da - \ldots$, where $k_2$ is the coefficient defined below.[9] Here $\pi/\Delta x$ may be replaced by ∞, since the finite difference Δx is small with respect to unity when the number of the elements is large;[10] and thus the integrals involved become equatable to known definite integrals. If it were allowable to neglect all the terms of the series but the first, the expression would reduce to $\dfrac{1}{\sqrt{2\pi k}}\,e^{-x^2/2k}$, the normal law of error. But it is allowable to neglect the terms after the first, in a first approximation, for values of x not exceeding a certain range, the number of the elements being large, and if the postulate above enunciated is satisfied.[11] With these reservations it is proved that the sum of a number of similar and symmetrical elements conforms to the normal law of error. The proof is by parity extended to the case in which the elements have different but still symmetrical frequency functions; and, by a bolder use of imaginary quantities, to the case of unsymmetrical functions.
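The inversion at the heart of this proof may also be rendered numerically. In the sketch below (a modern illustration; the symmetrical element taking the values −1, 0, +1 with equal frequency is a hypothetical choice, for which Δx = 1 and χ(a) = (1 + 2 cos a)/3) the probability that the sum of m elements equals t is recovered from $\frac{1}{\pi}\int_0^{\pi}(\chi(a))^m\cos ta\,da$ and set against the normal law with k = 2m/3.

```python
# Laplace-Poisson inversion for a symmetrical element taking the values
# -1, 0, +1 with equal frequency, so that chi(a) = (1 + 2*cos a)/3 and
# Delta-x = 1.  The probability that the sum of m elements equals t is
# (1/pi) * integral_0^pi chi(a)**m * cos(t*a) da, here evaluated by the
# trapezoidal rule and compared with the normal law having k = 2*m/3.
import math

def chi(a):
    return (1 + 2 * math.cos(a)) / 3

def probability_by_inversion(m, t, steps=4000):
    h = math.pi / steps
    total = 0.5 * (chi(0.0) ** m + chi(math.pi) ** m * math.cos(t * math.pi))
    for j in range(1, steps):
        a = j * h
        total += chi(a) ** m * math.cos(t * a)
    return total * h / math.pi

m = 30                                    # number of elements (arbitrary)
k = 2 * m / 3                             # sum of the mean squares of error
for t in (0, 2, 5, 10):
    inverted = probability_by_inversion(m, t)
    normal = math.exp(-t * t / (2 * k)) / math.sqrt(2 * math.pi * k)
    print(f"t = {t:3d}   by inversion = {inverted:.6f}   normal law = {normal:.6f}")
```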
- ↑ By the use of Stirling's and Bernoulli's theorems, Todhunter, History . . . of Probability.
- ↑ The statement includes the case of a linear function, since an element multiplied by a constant is still an element.
- ↑ E.g. if the frequency-locus of each element were $1/\pi(1 + x^2)$, extending to infinity in both directions. But extension to infinity would not be fatal, if the form of the element's locus were normal.
- ↑ For a fuller exposition and a justification of many of the statements which follow, see the writer's paper on “The Law of Error” in the Camb. Phil. Trans. (1905).
- ↑ Loc. cit. pt. i. § 1.
- ↑ On this criterion of coincidence see Karl Pearson's paper “On the Systematic Fitting of Curves,” Biometrika, vols. i. and ii.
- ↑ Laplace, Théorie analytique des probabilités, bk. ii. ch. iv.; Poisson, Recherches sur la probabilité des jugements. Good restatements of this proof are given by Todhunter, History . . . of Probability, art. 1004, and by Czuber, Theorie der Beobachtungsfehler, art. 38 and Th. 2, § 4.
- ↑ The symbol || is used to denote absolute magnitude, abstraction being made of sign.
- ↑ Below, pars. 159, 160.
- ↑ Loc. cit. app. I.
- ↑ Loc. cit. p. 53 and context.