Translation:Theoria combinationis observationum erroribus minimis obnoxiae

Theory of the combination of observations which is subject to the least error (1821–1823)
by Carl Friedrich Gauss, translated from French by Wikisource

Based on the 1855 French translation by Joseph Bertrand.

Part One

1.

No matter how careful one is with observations concerning the measurement of physical quantities, they are inevitably subject to errors of varying degrees. These errors, in most cases, are not simple but arise from several distinct sources that it is best to distinguish into two classes.

Some causes of errors depend, for each observation, on variable circumstances independent of the result obtained: the errors arising from these are called "irregular" or "random," and like the circumstances that produce them, their value is not amenable to calculation. Such are the errors that arise from the imperfection of our senses and all those due to irregular external causes, e.g. vibrations of the air that blur our vision. Some of the errors due to the inevitable imperfection of even the best instruments, e.g. the roughness of the inner part of a level, its lack of absolute rigidity, etc., belong to this same category.

On the other hand, there are other causes that produce an identical error in all observations of the same kind, or one whose magnitude depends only on circumstances that can be viewed as essentially connected to the observation. We will call errors of this category "constant" or "regular" errors.

Moreover, one can see that this distinction is to a certain extent relative, and has a broader or narrower sense depending on the meaning one attaches to the idea of observations of the same nature. E.g. if one indefinitely repeats the measurement of the same angle, the errors arising from imperfect division of the instrument belong to the class of constant errors. If, on the other hand, one successively measures several different angles, the errors due to imperfect division will be considered random until a table of errors relative to each division has been formed.

2.

We exclude the consideration of regular errors from our discussion. It is up to the observer to carefully investigate the causes that can produce a constant error, to eliminate them if possible, or at least assess their effect in order to correct it for each observation, which will then give the same result as if the constant cause had not existed. It is quite different for irregular errors: by their nature, they resist any calculation, and they must be tolerated in observations. However, by skillfully combining results, their influence can be minimized as much as possible. The following investigation is devoted to this most important topic.

3.

Errors arising from a simple and determinate cause in observations of the same kind are confined within certain limits that could undoubtedly be assigned if the nature of this cause were perfectly known. In most cases, all errors between these extreme limits must be considered possible. A thorough knowledge of each cause would reveal whether all these errors have equal or unequal likelihood, and in the latter case, what the relative probability of each of them is. The same remark applies to the total error resulting from the combination of several simple errors. This error will also be confined between two limits, one being the sum of the upper limits, the other the sum of the lower limits corresponding to the simple errors. All errors between these limits will be possible, and each can result, in an infinite number of ways, from suitable values attributed to the partial errors. Nevertheless, it is possible to assign a larger or smaller likelihood for each result, from which the law of relative probability can be derived, provided that the laws of each of the simple errors are assumed to be known, and ignoring the analytical difficulties involved in collecting all of the combinations.

Of course, certain sources of error produce errors that cannot vary according to a continuous law, but are instead capable of a finite number of values, such as errors arising from the imperfect division of instruments (if indeed one wants to classify them among random errors), because the number of divisions in a given instrument is essentially finite. Nevertheless, if it is assumed that not all sources of error are of this type, then it is clear that the complex of all possible total errors will form a series subject to the law of continuity, or, at least, several distinct series, if it so happens that, upon arranging all possible values of the discontinuous errors in order of magnitude, the difference between a pair of consecutive terms is greater than the difference between the extreme limits of the errors subject to the law of continuity. In practice, such a case will almost never occur, unless the instrument is subject to gross defects.

4.

Let $\varphi(x)$ denote the relative likelihood of an error $x$; this means, due to the continuity of the errors, that $\varphi(x)\,dx$ is the probability that the error lies between the limits $x$ and $x + dx$. In practice it is hardly possible, or perhaps impossible, to assign a form to the function $\varphi$ a priori. Nevertheless, several general characteristics that it must necessarily present can be established: $\varphi(x)$ is obviously a discontinuous function; it vanishes for all values of $x$ not between the extreme errors. For any value between these limits, the function is positive (excluding the case indicated at the end of the previous article); in most cases, errors of opposite signs will be equally possible, and thus we will have $\varphi(-x) = \varphi(x)$. Finally, since small errors are more easily made than large ones, $\varphi(x)$ will generally have a maximum when $x = 0$ and will continually decrease as $x$ increases.

In general, the integral $\int_a^b \varphi(x)\,dx$ expresses the probability that the unknown error falls between the limits $a$ and $b$. It follows that the value of this integral taken between the extreme limits of the possible errors will always be $1$. And since $\varphi(x)$ is zero for values not between these limits, it is clear that in all cases the value of the integral

$$\int_{-\infty}^{+\infty} \varphi(x)\,dx$$

will always be $1$.

5.

Let us consider the integral $\int_{-\infty}^{+\infty} x\varphi(x)\,dx$ and denote its value by $k$. If the sources of error are such that there is no reason for two equal errors of opposite signs to have unequal likelihood, we will have $\varphi(-x) = \varphi(x)$ and consequently, $k = 0$. We conclude that if $k$ does not vanish and has e.g. a positive value, then there necessarily exists an error source that produces only positive errors or, at least, produces them more easily than negative errors. This quantity $k$, which is the average of all possible errors, or the average value of $x$, can conveniently be referred to as the "constant part of the error". Moreover, it is easily proven that the constant part of the total error is the sum of the constant parts of the simple errors of which it is composed.

If the quantity $k$ is assumed to be known and subtracted from the result of each observation, then, denoting the error of the corrected observation by $x'$ and the corresponding probability by $\varphi'(x')$, we will have $x' = x - k$, $\varphi'(x') = \varphi(x)$ and consequently, $\int x'\varphi'(x')\,dx' = \int x\varphi(x)\,dx - k\int \varphi(x)\,dx = k - k = 0$, i.e. the errors of the corrected observations will have no constant part, which is clear in and of itself.

6.

The value of the integral $\int_{-\infty}^{+\infty} x\varphi(x)\,dx$, which is the average value of $x$, reveals the presence or absence of a constant error, as well as the value of this error. Similarly, the integral $\int_{-\infty}^{+\infty} x^2\varphi(x)\,dx$, which is the average value of $x^2$, seems very suitable for defining and measuring, in a general manner, the uncertainty of a system of observations. Therefore, between two systems of observations of unequal precision, the one giving a smaller value to the integral $\int_{-\infty}^{+\infty} x^2\varphi(x)\,dx$ should be considered preferable. If it is argued that this convention is arbitrary and seemingly unnecessary, then we readily agree. The question at hand is inherently vague and can only be delimited by a somewhat arbitrary principle. Determining a quantity through observation can be likened, somewhat accurately, to a game in which there is a loss to be feared and no gain to be expected; each error being likened to a loss incurred, the relative apprehension about such a game should be expressed by the probable loss, i.e., by the sum of the products of the various possible losses by their respective probabilities. But what loss should be likened to a specific error? This is not clear in itself; its determination depends partly on our whim. It is evident, first of all, that the loss should not be regarded as proportional to the error committed; for, in this hypothesis, a positive error representing a loss, the negative error should be regarded as a gain: on the contrary, the magnitude of the loss should be evaluated by a function of the error whose value is always positive. Among the infinite number of functions that fulfill this condition, it seems natural to choose the simplest one, which is undoubtedly the square of the error, and thus we are led to the principle proposed above.

Laplace considered the question in a similar manner, but adopted as a measure of loss the error itself, taken positively. This assumption, if we do not deceive ourselves, is no less arbitrary than ours: should we, indeed, consider a double error as more or less regrettable than a simple error repeated twice, and should we, consequently, assign it a double or more than double importance? This is a question that is not clear, and on which mathematical arguments have no bearing; each must resolve it according to their preference. Nevertheless, it cannot be denied that Laplace's assumption deviates from the law of continuity and is therefore less suitable for analytical study; ours, on the other hand, is recommended by the generality and simplicity of its consequences.

7.

Let us define $m^2 = \int_{-\infty}^{+\infty} x^2\varphi(x)\,dx$; we will call $m$ the "mean error to be feared" or simply the "mean error" of the observation whose indefinite errors $x$ have a relative probability of $\varphi(x)$. We do not limit this designation to the immediate result of the observations, but rather extend it to any quantity that can be derived from them in any way. It is important not to confuse this mean error with the arithmetic mean of the errors, which is discussed in art. 5.

When comparing several systems of observations or several quantities resulting from observations that are not given the same precision, we will consider their relative "weight" to be inversely proportional to $m^2$, and their "precision" to be inversely proportional to $m$. In order to represent the weights by numbers, we should take, as the unit, the weight of a certain arbitrarily chosen system of observations.

8.

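If the errors of the observations have a constant part, subtracting it from each obtained result reduces the mean error and increases the weight and precision. Retaining the notation of art. 5, and letting $m'$ denote the mean error of the corrected observations, we have

$$m'^2 = \int x'^2\varphi'(x')\,dx' = \int (x-k)^2\varphi(x)\,dx = \int x^2\varphi(x)\,dx - 2k\int x\varphi(x)\,dx + k^2\int \varphi(x)\,dx = m^2 - k^2.$$

If, instead of the constant part $k$, another number $l$ were subtracted from each observation, the square of the mean error would become

$$m^2 - 2kl + l^2 = m'^2 + (l-k)^2.$$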
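
The two relations of this article can be checked exactly on a toy example. The sketch below is only an illustration: the three-point error law and the names `errors` and `mean_square_about` are assumptions introduced here, not part of the memoir. It verifies that subtracting the constant part (called `k` here) lowers the mean square error by `k**2`, and that subtracting any other number `l` raises it again by `(l - k)**2`.

```python
from fractions import Fraction as F

# Hypothetical discrete error law: (error value, probability) pairs.
errors = [(F(-1), F(1, 4)), (F(1), F(1, 4)), (F(2), F(1, 2))]

def mean_square_about(c, law):
    """Average value of (x - c)^2 under the given error law."""
    return sum(p * (x - c) ** 2 for x, p in law)

k = sum(p * x for x, p in errors)             # constant part (average error)
m2 = mean_square_about(0, errors)             # squared mean error, uncorrected
m2_corrected = mean_square_about(k, errors)   # squared mean error after subtracting k

l = F(3)                                      # some other number subtracted instead of k
m2_shifted = mean_square_about(l, errors)

print(k, m2, m2_corrected, m2_shifted)
```

Exact fractions are used so that the identities hold without rounding.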

9.

Let $\lambda$ be a determined coefficient and let $\mu$ be the value of the integral

$$\int_{-\lambda m}^{+\lambda m} \varphi(x)\,dx.$$

Then $\mu$ will be the probability that the error of a certain observation is less than $\lambda m$ in absolute value, and $1 - \mu$ will be the probability that this error exceeds $\lambda m$. If, for $\mu = \tfrac12$, $\lambda m$ has the value $\rho$, it will be equally likely for the error to be smaller or larger than $\rho$; thus, $\rho$ can be called the probable error. The relationship between $\lambda$ and $\mu$ depends on the nature of the function $\varphi$, which is unknown in most cases. However, it is interesting to study this relationship in some particular cases.

I. If the extreme limits of the possible errors are $-a$ and $+a$, and if, between these limits, all errors are equally probable, the function $\varphi(x)$ will be constant between these same limits, and, consequently, equal to $\tfrac{1}{2a}$. Hence, we have $m = a\sqrt{\tfrac13}$ and $\mu = \lambda\sqrt{\tfrac13}$, so long as $\lambda$ is less than or equal to $\sqrt3$; finally $\rho = m\sqrt{\tfrac34} = 0.8660254\,m$, and the probability that the error does not exceed the mean error is $\sqrt{\tfrac13} = 0.5773503$.

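II. If as before $-a$ and $+a$ are the limits of possible errors, and if we assume that the probability of these errors decreases from the error $0$ onwards like the terms of an arithmetic progression, then we will have

$\varphi(x) = \dfrac{a - x}{a^2}$ for values of $x$ between $0$ and $+a$,
$\varphi(x) = \dfrac{a + x}{a^2}$ for values of $x$ between $-a$ and $0$.

From this, we deduce that $m = a\sqrt{\tfrac16}$ and $\mu = \lambda\sqrt{\tfrac23} - \tfrac16\lambda^2$, as long as $\lambda$ is between $0$ and $\sqrt6$; $\lambda = \sqrt6 - \sqrt{6 - 6\mu}$, as long as $\mu$ is between 0 and 1; and finally,

$$\rho = m\left(\sqrt6 - \sqrt3\right) = 0.7174389\,m.$$

In this case, the probability that the error remains below the mean error is $\sqrt{\tfrac23} - \tfrac16 = 0.6498299$.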
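
The first two cases can be explored numerically. The following is a minimal sketch, assuming the closed forms that follow from a constant error law and from a linearly decreasing (triangular) one on a symmetric interval; the function names and the convention of measuring the error limit in units of the mean error are choices made here for illustration.

```python
import math

def mu_uniform(lam):
    """P(|error| < lam * m) for errors uniform on [-a, a], where m = a / sqrt(3)."""
    return min(lam / math.sqrt(3), 1.0)

def mu_triangular(lam):
    """Same probability for the triangular law on [-a, a], where m = a / sqrt(6)."""
    lam = min(lam, math.sqrt(6))          # beyond sqrt(6) * m no error is possible
    return lam * math.sqrt(2 / 3) - lam ** 2 / 6

# Probable error in units of m: the multiple of m exceeded half of the time.
rho_uniform = math.sqrt(3) / 2
rho_triangular = math.sqrt(6) - math.sqrt(3)

print(round(rho_uniform, 7), round(mu_uniform(1.0), 7))
print(round(rho_triangular, 7), round(mu_triangular(1.0), 7))
```

Evaluating either function at `1.0` gives the probability that an error stays below the mean error itself.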

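III. If we assume the function $\varphi(x)$ to be proportional to $e^{-x^2/h^2}$, then it must be equal to

$$\frac{e^{-x^2/h^2}}{h\sqrt{\pi}},$$

where $\pi$ denotes the semiperimeter of a circle of radius $1$, from which we deduce

$$m = h\sqrt{\tfrac12}$$

(see Disquisitiones generales circa seriem infinitam, art. 28). If we let $\Theta(z)$ denote the value of the integral

$$\frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2}\,dt,$$

then we have

$$\mu = \Theta\!\left(\lambda\sqrt{\tfrac12}\right).$$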

The following table gives some values of this quantity:
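
Values of this kind can be recomputed with the standard error function. The sketch below is an illustration under the assumption that the error law is the normal one of case III, so that the probability of an error smaller than a given multiple of the mean error is `erf(multiple / sqrt(2))`; the helper names and the bisection search are choices made here, and the printed rows are computed values, not a transcription of the memoir's table.

```python
import math

def mu_normal(lam):
    """P(|error| < lam * m) under the normal error law with mean error m."""
    return math.erf(lam / math.sqrt(2))

def lam_for(mu, lo=0.0, hi=10.0):
    """Invert mu_normal by bisection (mu_normal is increasing in lam)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if mu_normal(mid) < mu:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for mu in (0.5, 0.6, 0.7, 0.8, 0.9, 0.99):
    print(f"mu = {mu:<5} lambda = {lam_for(mu):.7f}")
```

The row for `mu = 0.5` gives the probable error as a multiple of the mean error.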

10.

Although the relationship between $\lambda$ and $\mu$ depends on the nature of the function $\varphi$, some general results can be established that apply to all cases where this function does not increase with the absolute value of the variable $x$. We then have the following theorems:

$\lambda$ will not exceed $\mu\sqrt3$ whenever $\mu$ is less than $\tfrac23$;
$\lambda$ will not exceed $\dfrac{2}{3\sqrt{1-\mu}}$ whenever $\mu$ exceeds $\tfrac23$.

When $\mu = \tfrac23$, the two limits coincide and $\lambda$ cannot exceed $\sqrt{\tfrac43}$.

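To prove this remarkable theorem, let $y$ be the value of the integral $\int_{-x}^{+x} \varphi(z)\,dz$. Then $y$ will be the probability that an error is between $-x$ and $+x$. Let us set

$$x = \psi(y), \quad d\psi(y) = \psi'(y)\,dy, \quad d\psi'(y) = \psi''(y)\,dy;$$

then we have $\psi(0) = 0$ and

$$\psi'(y) = \frac{1}{\varphi(x) + \varphi(-x)},$$

and by hypothesis $\psi'(y)$ is always increasing between $y = 0$ and $y = 1$, or at least is not decreasing, or equivalently $\psi''(y)$ is always positive, or at least not negative. Now we have

$$d\bigl(y\psi'(y)\bigr) = \psi'(y)\,dy + y\psi''(y)\,dy,$$

thus,

$$y\psi'(y) - \psi(y) = \int_0^y y\psi''(y)\,dy.$$

Therefore, $y\psi'(y) - \psi(y)$ always has positive value, or at least this expression is never negative, and therefore

$$1 - \frac{\psi(y)}{y\psi'(y)}$$

will always be positive and less than unity. Let $f$ be the value of this difference for $y = \mu$; since $\psi(\mu) = \lambda m$, we have

$$f = 1 - \frac{\lambda m}{\mu\psi'(\mu)}$$

or

$$\psi'(\mu) = \frac{\lambda m}{(1-f)\mu}.$$

This being prepared, let's consider the function

$$\frac{\lambda m}{(1-f)\mu}\,(y - \mu f),$$

which we set $= F(y)$, and also $dF(y) = F'(y)\,dy$. Then it is clear that

$$F(\mu) = \lambda m = \psi(\mu), \qquad F'(y) = \frac{\lambda m}{(1-f)\mu} = \psi'(\mu).$$

Since $\psi'(y)$ is continually increasing with $y$ (or at least does not decrease, which should always be understood), and at the same time $F'(y)$ is constant, the difference

$$\psi'(y) - F'(y) = \frac{d\bigl(\psi(y) - F(y)\bigr)}{dy}$$

will be positive for all values of $y$ greater than $\mu$ and negative for all values of $y$ smaller than $\mu$. It follows that the difference $\psi(y) - F(y)$ is always positive, and consequently, $\psi(y)$ will certainly be greater than $F(y)$ in absolute value, as long as the function $F(y)$ is positive, i.e. between $y = \mu f$ and $y = 1$. The value of the integral

$$\int_{\mu f}^{1} F(y)^2\,dy$$

will therefore be less than that of the integral

$$\int_{\mu f}^{1} \psi(y)^2\,dy,$$

and a fortiori less than

$$\int_{0}^{1} \psi(y)^2\,dy,$$

i.e., less than $m^2$. Now the value of the first of these integrals is found to be

$$\frac{\lambda^2 m^2 (1 - \mu f)^3}{3\mu^2 (1-f)^2};$$

and therefore $\lambda^2$ is less than $\dfrac{3\mu^2(1-f)^2}{(1-\mu f)^3}$, with $f$ being a number between $0$ and $1$. If we consider $f$ as a variable, then this fraction, whose differential is

$$-\frac{3\mu^2 (1-f)(2 - 3\mu + \mu f)}{(1 - \mu f)^4}\,df,$$

will be continually decreasing as $f$ increases from $0$ to $1$, so long as $\mu$ is less than $\tfrac23$, and therefore its maximum value will be found when $f = 0$ and will be $3\mu^2$, so that in this case, the coefficient $\lambda$ will certainly be less, or at least not greater than $\mu\sqrt3$. Q.E.P. On the other hand, when $\mu$ is greater than $\tfrac23$, the maximum value of the function will be found when $2 - 3\mu + \mu f = 0$, i.e. for $f = 3 - \tfrac{2}{\mu}$, and this maximum value will be $\dfrac{4}{9(1-\mu)}$, so in this case, the coefficient $\lambda$ will not be greater than $\dfrac{2}{3\sqrt{1-\mu}}$. Q.E.S.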

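Thus e.g. for $\mu = \tfrac12$ it is certain that $\lambda$ will not exceed $\sqrt{\tfrac34}$, which means that the probable error cannot exceed $0.8660254\,m$, to which it was found to be equal in the first example in art. 9. Furthermore, it is easily concluded from our theorem that $\mu$ is not less than $\lambda\sqrt{\tfrac13}$ when $\lambda$ is less than $\sqrt{\tfrac43}$, and on the other hand, it is not less than $1 - \dfrac{4}{9\lambda^2}$ when $\lambda$ is greater than $\sqrt{\tfrac43}$.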
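
The bound of this article is easy to tabulate. The sketch below is an illustration only: `lambda_bound` is a hypothetical helper name, it encodes the two-branch limit for error laws that do not increase with the absolute value of the error, and it compares that limit at probability one half with the probable errors of the three laws of art. 9 (the normal-law value 0.6744897 is the standard quantile, quoted here as an assumption of the example).

```python
import math

def lambda_bound(mu):
    """Largest possible multiple of the mean error that an error stays below
    with probability mu, for any error law that never increases with |x|."""
    if not 0 <= mu < 1:
        raise ValueError("mu must lie in [0, 1)")
    if mu <= 2 / 3:
        return mu * math.sqrt(3)
    return 2 / (3 * math.sqrt(1 - mu))

# Probable errors (in units of the mean error) for the three example laws.
probable = {
    "uniform": math.sqrt(3) / 2,
    "triangular": math.sqrt(6) - math.sqrt(3),
    "normal": 0.6744897,
}

limit = lambda_bound(0.5)
for name, lam in probable.items():
    print(name, round(lam, 7), lam <= limit + 1e-12)
```

The uniform law attains the limit, which shows that the bound cannot be improved for probability one half.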

11.

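Since several of the problems discussed below involve the integral $\int x^4\varphi(x)\,dx$, it will be worthwhile for us to evaluate it in some special cases. Let us denote the value of the integral

$$\int_{-\infty}^{+\infty} x^4\varphi(x)\,dx$$

by $n^4$. I. When $\varphi(x) = \tfrac{1}{2a}$ for values of $x$ between $-a$ and $+a$, we have $n^4 = \tfrac15 a^4 = \tfrac95 m^4$.

II. In the second case of art. 9, where $\varphi(x) = \dfrac{a \mp x}{a^2}$ with $x$ still between $0$ and $\pm a$, we have $n^4 = \tfrac{1}{15}a^4 = \tfrac{12}{5}m^4$.

III. In the third case, where

$$\varphi(x) = \frac{e^{-x^2/h^2}}{h\sqrt{\pi}},$$

we find, as explained in the commentary cited above, that $n^4 = \tfrac34 h^4 = 3m^4$.

It can also be demonstrated, with only the assumptions of the previous article, that the ratio $\dfrac{n^4}{m^4}$ is never less than $\tfrac95$.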
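
The ratio of the average fourth power of the error to the square of its average square can be verified exactly for the first two laws. The sketch below is an illustration; it assumes the interval has been scaled to $[-1, 1]$ (the ratio does not depend on the scale), and the function names are introduced here.

```python
from fractions import Fraction as F

def uniform_moment(k):
    """Average of x^k (k even) for errors uniform on [-1, 1]: 1 / (k + 1)."""
    return F(1, k + 1)

def triangular_moment(k):
    """Average of x^k (k even) for the triangular law on [-1, 1]:
    2 * integral_0^1 x^k (1 - x) dx = 2 / ((k + 1)(k + 2))."""
    return F(2, (k + 1) * (k + 2))

def fourth_to_squared_second(moment):
    """Ratio of the fourth moment to the square of the second moment."""
    return moment(4) / moment(2) ** 2

print(fourth_to_squared_second(uniform_moment))      # uniform law
print(fourth_to_squared_second(triangular_moment))   # triangular law
```

For the normal law the corresponding ratio is 3, which is the standard result for its fourth moment.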

12.

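Let $x$, $x'$, $x''$, etc. denote the errors made in observations of the same kind, and suppose that these errors are independent of each other. Let $\varphi(x)$ be the relative probability of error $x$, and let $y$ be a rational function of variables $x$, $x'$, $x''$, etc. Then the multiple integral

$$\int \varphi(x)\,\varphi(x')\,\varphi(x'')\cdots\,dx\,dx'\,dx''\cdots \tag{I}$$

extended to all values of the variables $x$, $x'$, $x''$, etc. for which the value of $y$ falls between the given limits $0$ and $\eta$, represents the probability that the value of $y$ is between $0$ and $\eta$. This integral is evidently a function of $\eta$, whose differential we set $= \psi(\eta)\,d\eta$, so that the integral in question is equal to $\int_0^{\eta} \psi(\eta)\,d\eta$, and therefore, $\psi(\eta)$ represents the relative probability of an arbitrary value of $y$. Since $x$ can be regarded as a function of the variables $y$, $x'$, $x''$, etc., which we set

$$x = f(y, x', x'', \ldots),$$

the integral (I) will be

$$\int \varphi\bigl(f(y, x', x'', \ldots)\bigr)\,\frac{\partial f(y, x', x'', \ldots)}{\partial y}\,\varphi(x')\,\varphi(x'')\cdots\,dy\,dx'\,dx''\cdots$$

where $y$ takes values between $0$ and $\eta$, and the other variables take all values for which $f(y, x', x'', \ldots)$ is real. Hence we have

$$\psi(y) = \int \varphi\bigl(f(y, x', x'', \ldots)\bigr)\,\frac{\partial f(y, x', x'', \ldots)}{\partial y}\,\varphi(x')\,\varphi(x'')\cdots\,dx'\,dx''\cdots$$

the integration, where $y$ is to be regarded as a constant, being extended to all values of the variables $x'$, $x''$, etc. for which $f(y, x', x'', \ldots)$ takes a real value.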

13.

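The previous integration would require knowledge of the function $\varphi$, which is unknown in most cases. Even if this function were known, the calculation would often exceed the capabilities of analysis. Therefore, it will be impossible to obtain the probability of each value of $y$; but it is different if one asks only for the average value of $y$, which will be given by the integral $\int y\,\psi(y)\,dy$ extended to all possible values of $y$. And since it is evident that for all values which $y$ cannot attain, either due to the nature of the function (e.g. for negative values, if $y = x^2 + x'^2 + x''^2 + {}$ etc.), or because of the limits imposed on $x$, $x'$, $x''$, etc., one can assume that $\psi(y) = 0$, it is clear that the integration can be extended to all real values of $y$, from $y = -\infty$ to $y = +\infty$.

But the integral $\int y\,\psi(y)\,dy$ taken between determinate limits $\eta$ and $\eta'$ is equal to the integral

$$\int y\,\varphi\bigl(f(y, x', x'', \ldots)\bigr)\,\frac{\partial f(y, x', x'', \ldots)}{\partial y}\,\varphi(x')\,\varphi(x'')\cdots\,dy\,dx'\,dx''\cdots,$$

taken from $y = \eta$ to $y = \eta'$, and extended to all values of the variables $x'$, $x''$, etc. for which $f(y, x', x'', \ldots)$ is real. This integral is therefore equal to the integral

$$\int y\,\varphi(x)\,\varphi(x')\,\varphi(x'')\cdots\,dx\,dx'\,dx''\cdots$$

in which $y$ is expressed as a function of $x$, $x'$, $x''$, etc., and the integration is extended to all values of the variables that leave $y$ between $\eta$ and $\eta'$. Thus, the integral

$$\int_{-\infty}^{+\infty} y\,\psi(y)\,dy$$

can be obtained from the integral

$$\int y\,\varphi(x)\,\varphi(x')\,\varphi(x'')\cdots\,dx\,dx'\,dx''\cdots$$

where the integration is extended to all real values of $x$, $x'$, $x''$, etc., that is, from $x = -\infty$ to $x = +\infty$, from $x' = -\infty$ to $x' = +\infty$, etc.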

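14.

If the function $y$ reduces to a sum of terms of the form

$$N x^{\alpha} x'^{\beta} x''^{\gamma} \cdots$$

then the value of the integral

$$\int y\,\psi(y)\,dy$$

extended to all values of $y$, or equivalently the average value of $y$, will be equal to a sum of terms of the form

$$N \int x^{\alpha}\varphi(x)\,dx \int x'^{\beta}\varphi(x')\,dx' \int x''^{\gamma}\varphi(x'')\,dx'' \cdots$$

that is, the average value of $y$ is equal to a sum of terms derived from those that make up $y$ by replacing $x^{\alpha}$, $x'^{\beta}$, $x''^{\gamma}$, etc. with their average values. The proof of this important theorem could easily be derived from other considerations.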
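
The theorem can be illustrated on independent errors with finitely many values, where the average of a product of powers can be computed both directly and from the separate averages. The sketch below is an assumption-laden toy: the two discrete error laws, the single term with coefficient 5, and all names are hypothetical choices made for this example.

```python
from fractions import Fraction as F
from itertools import product

# Two hypothetical independent error laws: (value, probability) pairs.
law_x = [(F(-1), F(1, 2)), (F(1), F(1, 2))]
law_x1 = [(F(0), F(1, 3)), (F(3), F(2, 3))]

def moment(law, power):
    """Average value of the given power of the error."""
    return sum(p * v ** power for v, p in law)

def average_of_term(coeff, alpha, beta):
    """Direct average of coeff * x^alpha * x1^beta over the joint law."""
    return sum(px * p1 * coeff * vx ** alpha * v1 ** beta
               for (vx, px), (v1, p1) in product(law_x, law_x1))

direct = average_of_term(5, 2, 1)
via_moments = 5 * moment(law_x, 2) * moment(law_x1, 1)
print(direct, via_moments)
```

Independence of the two errors is what allows the joint probabilities to be written as products in `average_of_term`.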

15.

Let us apply the theorem of the previous article to the case where

$$y = \frac{x^2 + x'^2 + x''^2 + \text{etc.}}{\sigma}$$

and $\sigma$ denotes the number of terms in the numerator.

We immediately find that the average value of $y$ is equal to $m^2$, the letter $m$ having the same meaning as above. The true value of $y$ may be lower or higher than its average, just as the true value of $x^2$ may, in each case, be lower or higher than $m^2$; but the probability that by chance, the value of $y$ differs by a small amount from $m^2$ will approach certainty as $\sigma$ becomes larger. In order to clarify this, since it is not possible to determine this probability exactly, let us investigate the mean error to be feared when taking $y = m^2$. It is clear from the principles of art. 6 that this error will be the square root of the average value of the function

$$\left(\frac{x^2 + x'^2 + x''^2 + \text{etc.}}{\sigma} - m^2\right)^2.$$

To find it, it suffices to observe that the average value of a term such as $\dfrac{x^4}{\sigma^2}$ is equal to $\dfrac{n^4}{\sigma^2}$ ($n$ having the same meaning as in art. 11), and that the average value of a term such as $\dfrac{2x^2x'^2}{\sigma^2}$ is equal to $\dfrac{2m^4}{\sigma^2}$; therefore, the average value of this function will be

$$\frac{n^4 - m^4}{\sigma}.$$

Since this last formula contains the quantity $n$, if we only want to get an idea of the precision of this determination, it will suffice to adopt a certain hypothesis about the function $\varphi$. E.g. if we take the third assumption of arts. 9 and 11, this error will be equal to $m^2\sqrt{\tfrac{2}{\sigma}}$. Alternatively, we can obtain an approximate value of $n^4$ by means of the errors themselves, using the formula

$$n^4 = \frac{x^4 + x'^4 + x''^4 + \text{etc.}}{\sigma}.$$

In general, it can be stated that a precision twice as great in this determination will require a quadruple number of errors, meaning that the weight of the determination is proportional to the number $\sigma$.

Similarly, if the errors of the observations contain a constant part, we will deduce from their arithmetic mean a value of the constant part, and this value will be approached as the number of errors increases. In this determination, the mean error to be feared will be represented by $\sqrt{\dfrac{m^2 - k^2}{\sigma}}$, where $k$ denotes the constant part, and $m$ denotes the mean error of the observations uncorrected for their constant error. It will be simply represented by $\dfrac{m'}{\sqrt{\sigma}}$ if $m'$ represents the mean error of the observations corrected for the constant part (see art. 8).
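
The dependence of this precision on the number of errors can be made concrete. The following is a minimal sketch, assuming the mean error of the estimated mean square is the square root of (average fourth power minus squared average square) divided by the number of errors; the function names are hypothetical, and the normal-law value 3 for the fourth moment (with unit mean error) is an assumption of the example.

```python
import math

def estimate_m2(errors):
    """Estimate the squared mean error as the average of the squared errors."""
    return sum(e * e for e in errors) / len(errors)

def mean_error_of_m2_estimate(m, n4, sigma):
    """Mean error to be feared in that estimate: sqrt((n^4 - m^4) / sigma)."""
    return math.sqrt((n4 - m ** 4) / sigma)

m = 1.0
n4_normal = 3 * m ** 4       # assumption: normal error law, fourth moment 3 m^4
for sigma in (25, 100, 400):
    print(sigma, mean_error_of_m2_estimate(m, n4_normal, sigma))
```

Each fourfold increase in the number of errors halves the printed value, in line with the weight of the determination being proportional to that number.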

16.

In arts. 12-15, we assumed that the errors $x$, $x'$, $x''$, etc. belonged to the same type of observation, so that the probability of each of these errors was represented by the same function. However, it is clear that the general principles outlined in arts. 12-14 can be applied with equal ease in the more general case where the probabilities of the errors $x$, $x'$, $x''$, etc., are represented by different functions $\varphi(x)$, $\varphi'(x')$, $\varphi''(x'')$, etc., i.e. when these errors belong to observations of varying precision or uncertainty. Let $x$ denote the error of an observation with a mean error to be feared of $m$, and let $x'$, $x''$, etc. denote the errors of other observations with mean errors to be feared of $m'$, $m''$, etc. Then the average value of the sum $x^2 + x'^2 + x''^2 + {}$ etc. will be $m^2 + m'^2 + m''^2 + {}$ etc. Now, if it is also known that the quantities $m$, $m'$, $m''$, etc. are respectively proportional to the numbers $1$, $\mu'$, $\mu''$, etc., then the average value of the expression

$$\frac{x^2 + x'^2 + x''^2 + \text{etc.}}{1 + \mu'^2 + \mu''^2 + \text{etc.}}$$

will be $m^2$. However, if we adopt for $m^2$ the value that this expression will take, by substituting the errors $x$, $x'$, $x''$, etc., as chance offers them, then the mean error affecting this determination will become, just as in the preceding article,

$$\frac{\sqrt{n^4 + n'^4 + n''^4 + \text{etc.} - m^4 - m'^4 - m''^4 - \text{etc.}}}{1 + \mu'^2 + \mu''^2 + \text{etc.}}$$

where $n'$, $n''$, etc., have the same meaning with respect to the second and third observation, as does $n$ with respect to the first; and if we can assume the numbers $n$, $n'$, $n''$, etc., proportional to $m$, $m'$, $m''$, etc., this mean error to be feared will be equal to

$$\frac{\sqrt{(n^4 - m^4)(1 + \mu'^4 + \mu''^4 + \text{etc.})}}{1 + \mu'^2 + \mu''^2 + \text{etc.}}.$$

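But this method of determining an approximate value for $m^2$ is not the most advantageous. Consider the more general expression

$$y = \frac{x^2 + \alpha' x'^2 + \alpha'' x''^2 + \text{etc.}}{1 + \alpha'\mu'^2 + \alpha''\mu''^2 + \text{etc.}}$$

whose average value will also be $m^2$, regardless of the coefficients $\alpha'$, $\alpha''$, etc. The mean error to be feared when substituting the value $m^2$ for a value of $y$, as determined by the likelihoods of $x$, $x'$, $x''$, etc., will, according to the principles above, be given by the formula

$$\frac{\sqrt{n^4 - m^4 + \alpha'^2(n'^4 - m'^4) + \alpha''^2(n''^4 - m''^4) + \text{etc.}}}{1 + \alpha'\mu'^2 + \alpha''\mu''^2 + \text{etc.}}$$

To minimize this error, we must set

$$\alpha' = \frac{n^4 - m^4}{n'^4 - m'^4}\,\mu'^2, \qquad \alpha'' = \frac{n^4 - m^4}{n''^4 - m''^4}\,\mu''^2, \quad \text{etc.}$$

These values cannot be evaluated until the exact ratios $\dfrac{n}{m}$, $\dfrac{n'}{m'}$, etc. are known. In the absence of exact knowledge[1], it is safest to assume them equal to each other (see art. 11), in which case

$$\alpha' = \frac{1}{\mu'^2}, \qquad \alpha'' = \frac{1}{\mu''^2}, \quad \text{etc.}$$

i.e. the coefficients $\alpha'$, $\alpha''$, etc., should be assumed equal to the relative weights of the various observations, taking the weight of the one corresponding to the error $x$ as the unit. With this assumption, let $\sigma$ denote, as above, the number of proposed errors. Then the average value of the expression

$$\frac{x^2 + \alpha' x'^2 + \alpha'' x''^2 + \text{etc.}}{\sigma}$$

will be $m^2$, and when we take, for the true value of $m^2$, the randomly determined value of this expression, the mean error to be feared will be

$$\frac{\sqrt{n^4 + \alpha'^2 n'^4 + \alpha''^2 n''^4 + \text{etc.} - \sigma m^4}}{\sigma}$$

and, finally, if we are allowed to assume that the quantities $n$, $n'$, $n''$, etc., are proportional to $m$, $m'$, $m''$, etc., this expression reduces to

$$\sqrt{\frac{n^4 - m^4}{\sigma}}$$

which is identical to what we found in the case where all observations were of the same type.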
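
The weighted estimate described in this article can be sketched in a few lines. The code below is an illustration under stated assumptions: each error is paired with the ratio of its mean error to that of the first observation (so the first ratio is 1), each squared error is multiplied by its relative weight, the reciprocal of the squared ratio, and the products are averaged. The function name and the sample numbers are hypothetical.

```python
def estimate_m2_weighted(errors, rel_mean_errors):
    """Estimate the squared mean error of the reference observation from
    errors of unequal precision.

    rel_mean_errors[i] is the mean error of observation i divided by the mean
    error of the reference observation; the weight of observation i is
    1 / rel_mean_errors[i] ** 2.
    """
    if len(errors) != len(rel_mean_errors):
        raise ValueError("one ratio is needed per error")
    weighted_squares = [(x / r) ** 2 for x, r in zip(errors, rel_mean_errors)]
    return sum(weighted_squares) / len(weighted_squares)

# Hypothetical errors from observations that are 1x, 2x and 3x as uncertain.
print(estimate_m2_weighted([1.0, -2.0, 3.0], [1, 2, 3]))
```

With all ratios equal to 1 the function reduces to the plain average of squared errors used for observations of the same type.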

17.

When the value of a quantity, which depends on an unknown magnitude, is determined by an observation whose precision is not absolute, the result of this observation may provide an erroneous value for the unknown, but there is no room for discretion in this determination. But if several functions of the same unknown have been found by imperfect observations, we can obtain the value of the unknown either by any one of these observations, or by a combination of several observations, which can be carried out in infinitely many ways. The result will be subject, in all cases, to a possible error, and depending on the combination chosen, the mean error to be feared may be greater or smaller. The same applies if several observed quantities depend on multiple unknowns. Depending on whether the number of observations equals the number of unknowns, or is smaller or larger than this number, the problem will be determined, undetermined, or more than determined (at least in general), and in this third case, the observations can be combined in infinitely many ways to provide values for the unknowns. Among these combinations, the most advantageous ones must be chosen, i.e., those that provide values for which the mean error to be feared is as small as possible. This problem is certainly the most important one presented by the application of mathematics to natural philosophy.

In Theoria motus corporum coelestium we have shown how to find the most probable values of unknowns when the probability law of the observational errors is known, and since, in almost all cases, this law remains hypothetical by its nature, we have applied this theory to the highly plausible hypothesis that the probability of error $x$ is proportional to $e^{-h^2x^2}$. Hence this method that I have followed, especially in astronomical calculations, and which most calculators now use under the name of Method of Least Squares.

Laplace later considered the question from another point of view, and showed that this principle is preferable to all others, regardless of the probability law of the errors, provided that the number of observations is very large. But when this number is limited, the question remains open; so that, if we reject our hypothetical law, the method of least squares would be preferable to others, for the sole reason that it leads to simpler calculations.

We therefore hope to please geometers by demonstrating in this Memoir that the method of least squares provides the most advantageous combination of observations, not only approximately, but also absolutely, regardless of the probability law of errors and regardless of the number of observations, provided that we adopt for the mean error, not Laplace's definition, but the one which we have given in arts. 5 and 6.

It is necessary to warn here that in the following investigations, only random errors reduced by their constant part will be considered. It is up to the observer to carefully eliminate the causes of constant errors. We reserve for another occasion the examination of the case where observations are affected by an unknown constant error, and we will address this issue in another Memoir.

18.

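Problem. Let $U$ be a given function of the unknowns $V$, $V'$, $V''$, etc.; we ask for the mean error $M$ to be feared in determining the value of $U$ when, instead of the true values of $V$, $V'$, $V''$, etc., we take the values derived from independent observations; $m$, $m'$, $m''$, etc., being the mean errors corresponding to these various observations.

Solution. Let $e$, $e'$, $e''$, etc. denote the errors of the observed values $V$, $V'$, $V''$, etc.; the resulting error for the value of the function $U$ can be expressed by the linear function

$$\lambda e + \lambda' e' + \lambda'' e'' + \text{etc.} = E,$$

where $\lambda$, $\lambda'$, $\lambda''$, etc., represent the derivatives $\dfrac{\partial U}{\partial V}$, $\dfrac{\partial U}{\partial V'}$, $\dfrac{\partial U}{\partial V''}$, etc., when $V$, $V'$, $V''$, etc., are replaced by their true values.

This value of $E$ is evident if we assume the observations to be accurate enough so that the squares and products of the errors are negligible. It follows that the average value of $E$ is zero, since we assume that the errors of the observations have no constant part. Now the mean error $M$ to be feared in the value of $U$ will be the square root of the average value of $E^2$, or equivalently $M^2$ will be the average value of the sum

$$\lambda^2 e^2 + \lambda'^2 e'^2 + \lambda''^2 e''^2 + \text{etc.} + 2\lambda\lambda' e e' + 2\lambda\lambda'' e e'' + 2\lambda'\lambda'' e' e'' + \text{etc.}$$

but the average value of $\lambda^2 e^2$ is $\lambda^2 m^2$, that of $\lambda'^2 e'^2$ is $\lambda'^2 m'^2$, etc., and finally the average values of the products $2\lambda\lambda' e e'$ are all zero. Hence we find that

$$M = \sqrt{\lambda^2 m^2 + \lambda'^2 m'^2 + \lambda''^2 m''^2 + \text{etc.}}$$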

It is good to add several remarks to this solution.

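I. Since we neglect powers of errors higher than the first, we can, in our formula, take for $\lambda$, $\lambda'$, $\lambda''$, etc., the values of the differential coefficients $\dfrac{\partial U}{\partial V}$, etc., derived from the observed values $V$, $V'$, $V''$, etc. Whenever $U$ is a linear function, this substitution is rigorously exact.

II. If instead of mean errors, one prefers to introduce weights $p$, $p'$, $p''$, etc. for the respective observations, with the unit being arbitrary, and if $P$ is the weight of the value of $U$, then we will have

$$P = \frac{1}{\dfrac{\lambda^2}{p} + \dfrac{\lambda'^2}{p'} + \dfrac{\lambda''^2}{p''} + \text{etc.}}$$

III. Let $U'$ be another function of $V$, $V'$, $V''$, etc., and let

$$\frac{\partial U'}{\partial V} = \kappa, \quad \frac{\partial U'}{\partial V'} = \kappa', \quad \frac{\partial U'}{\partial V''} = \kappa'', \quad \text{etc.}$$

The error in the determination of $U'$ from the observed values $V$, $V'$, $V''$, etc., will be

$$E' = \kappa e + \kappa' e' + \kappa'' e'' + \text{etc.}$$

and the mean error to be feared in this determination will be

$$\sqrt{\kappa^2 m^2 + \kappa'^2 m'^2 + \kappa''^2 m''^2 + \text{etc.}}$$

It is obvious that the errors $E$ and $E'$ will not be independent of each other, and the mean value of the product $EE'$ will not be $0$ like the mean value of $ee'$, but instead it will be equal to

$$\kappa\lambda m^2 + \kappa'\lambda' m'^2 + \kappa''\lambda'' m''^2 + \text{etc.}$$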

IV. The problem includes the case where the values of the quantities $V$, $V'$, $V''$, etc., are not immediately given by observation, but are deduced from any combinations of direct observations. For this extension to be legitimate, the determinations of these quantities must be independent, i.e., they must be provided by different observations. If this condition of independence is not fulfilled, the formula giving the value of $M$ would no longer be accurate. For example, if the same observation were used both in determining $V$ and in determining $V'$, the errors $e$ and $e'$ would no longer be independent, and the mean value of the product $ee'$ would no longer be zero. If, in this case, the relationship between $V$ and $V'$ and the results of the simple observations from which they derive is known, we can calculate the mean value of the product $ee'$ as indicated in remark III, and consequently correct the formula which gives $M$.
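
The propagation rule of this article and its remarks II and III can be written as three small functions. This is a sketch under the article's own assumptions (independent observations, errors small enough to linearize); the function names, and the example of a product of two observed quantities, are hypothetical.

```python
import math

def propagated_mean_error(partials, mean_errors):
    """Mean error of a function of independent observations:
    sqrt(sum of (partial derivative * mean error)^2)."""
    return math.sqrt(sum((d * m) ** 2 for d, m in zip(partials, mean_errors)))

def propagated_weight(partials, weights):
    """Weight of the function when each observation has a known weight:
    1 / sum(partial^2 / weight)."""
    return 1 / sum(d * d / p for d, p in zip(partials, weights))

def mean_product_of_errors(partials_u, partials_u2, mean_errors):
    """Average product of the errors of two functions of the same
    observations: sum(partial_u * partial_u2 * mean_error^2)."""
    return sum(a * b * m * m
               for a, b, m in zip(partials_u, partials_u2, mean_errors))

# Hypothetical example: U = V * V1 observed at V = 3, V1 = 4,
# so the partial derivatives are (4, 3); mean errors 0.1 and 0.2.
print(propagated_mean_error([4.0, 3.0], [0.1, 0.2]))
```

A nonzero result from `mean_product_of_errors` signals that two derived quantities share observations and cannot be treated as independent in a further application of the rule.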

19.

Let $V$, $V'$, $V''$, etc., be functions of the unknowns $x$, $y$, $z$, etc. Let $\pi$ be the number of these functions, and let $\rho$ be the number of unknowns. Suppose that observations have given, immediately or indirectly, $V = L$, $V' = L'$, $V'' = L''$, etc., and that these determinations are absolutely independent of each other. If $\rho$ is greater than $\pi$, then the determination of the unknowns is an indeterminate problem. If $\rho$ is equal to $\pi$, then each of the unknowns $x$, $y$, $z$, etc., can be reduced to a function of $V$, $V'$, $V''$, etc., so that the values of the former can be deduced from the observed values of the latter, and the previous article will allow us to calculate the relative accuracy of these various determinations. If $\rho$ is less than $\pi$, then each unknown $x$, $y$, $z$, etc., can be expressed in infinitely many ways as a function of $V$, $V'$, $V''$, etc., and, in general, these values will be different; they should coincide if the observations were, contrary to our assumptions, rigorously accurate. It is clear, moreover, that the various combinations will provide results whose accuracy will generally be different.

Moreover, if, in the second and third cases, the quantities $V$, $V'$, $V''$, etc., are such that $\pi - \rho + 1$ of them, or more, can be regarded as functions of the others, the problem is more than determined relative to these latter functions and indeterminate relative to the unknowns $x$, $y$, $z$, etc.; and we could not even determine these latter unknowns, even if the functions $V$, $V'$, $V''$, etc., were exactly known: but we exclude this case from our investigations.

If $V$, $V'$, $V''$, etc., are not linear functions of the unknowns, we can always assign them this form, by replacing the primitive unknowns with their difference from their approximate values, which we assume known; the mean errors to be feared in the determinations

$$V = L, \quad V' = L', \quad V'' = L'', \quad \text{etc.}$$

being respectively denoted by $m$, $m'$, $m''$, etc., and the weights of these determinations by $p$, $p'$, $p''$, etc., so that

$$p m^2 = p' m'^2 = p'' m''^2 = \text{etc.}$$

We will assume that both the ratios of the mean errors and the weights are known, one of which will be arbitrarily chosen. Finally, if we set

$$(V - L)\sqrt{p} = v, \quad (V' - L')\sqrt{p'} = v', \quad (V'' - L'')\sqrt{p''} = v'', \quad \text{etc.}$$

then things will proceed as if immediate observations, equally precise and with mean error $m\sqrt{p} = m'\sqrt{p'} = m''\sqrt{p''} = {}$ etc., had given

$$v = 0, \quad v' = 0, \quad v'' = 0, \quad \text{etc.}$$

20.

Problem. Let etc., be the following linear functions of the unknowns etc.,

(1)

Among all systems of coefficients etc., that identically satisfy

being independent of etc., find the one for which obtains its minimum value.

Solution. — Let us set

(2)

are linear functions of and we have

(3)

where denotes the sum and similarly for the other sums.

The number of quantities etc., is equal to the number of unknowns etc., namely . Thus, by elimination, one can obtain an equation of the following form,[2]

which will be identically satisfied if we replace with their values from (3). Consequently, if we set

(4)

then we will have identically

(5)

This equation shows that among the different systems of coefficients etc., we must consider the system

Moreover, for any system, we will have identically

and this equation, being identical, leads to the following:

Adding these equations after multiplying them, respectively, by etc., we will have, by virtue of the system (4),

which is the same as

thus, the sum

will have its minimum value when etc. Q.E.I.

Moreover, this minimum value will be obtained as follows. Equation (5) shows that we have

Let us multiply these equations, respectively, by etc., and add them; considering the relations (4), we find
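The minimum property established in this article can be checked numerically. The sketch below is an illustration under assumed data, not part of the original text: it takes four hypothetical unit-weight equations in two unknowns with coefficient lists `a` and `b`, builds the multipliers `alpha` obtained by elimination, and compares them with another admissible system `kappa`. All names and numbers are assumptions chosen for the example.

```python
from fractions import Fraction as F

# Hypothetical coefficients of two unknowns in four unit-weight equations.
a = [F(1), F(1), F(1), F(1)]
b = [F(0), F(1), F(2), F(3)]

def dot(u, v):
    """Sum of products of corresponding terms."""
    return sum(p * q for p, q in zip(u, v))

aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
det = aa * bb - ab * ab

# Elimination for the first unknown gives multipliers of the form
# alpha_i = c_xx * a_i + c_xy * b_i.
c_xx, c_xy = bb / det, -ab / det
alpha = [c_xx * ai + c_xy * bi for ai, bi in zip(a, b)]

# Any other admissible system differs from alpha by a vector orthogonal
# to both a and b; n is one such vector for this data.
n = [F(1), F(-2), F(1), F(0)]
kappa = [al + F(1, 2) * ni for al, ni in zip(alpha, n)]
diff = [k - al for k, al in zip(kappa, alpha)]

print(dot(alpha, a), dot(alpha, b))          # 1 0
print(dot(kappa, a), dot(kappa, b))          # 1 0
print(dot(alpha, alpha), dot(kappa, kappa))  # 7/10 11/5
```

Both systems satisfy the required conditions, but the sum of squares for `kappa` exceeds that for `alpha` by exactly the sum of the squared differences, which is the identity used in the solution above.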

21.

When the observations have provided approximate equations etc., it will be necessary, to determine the unknown to choose a combination of the form

such that the unknown acquires a coefficient equal to , and that the other unknowns are eliminated.

According to art. 18, the weight of this determination will be given by

According to the previous article, the most suitable determination will be obtained by taking etc. Then will have the value and it is clear the same value would be obtained (without knowing the multipliers etc.), by performing elimination on the equations etc. The weight of this determination will be given and the mean error to be feared will be

A similar approach would lead to the most suitable values of the other unknowns etc., which would be those obtained by performing elimination on the equations etc.

If we denote the sum or equivalently

by , then it is clear that etc. will be the partial differential quotients of the function i.e.

Therefore, the values of the unknowns that are deduced from the most suitable combination, and which we can call the most plausible values, are precisely those that minimize . Now represents the difference between the observed value and the computed value. Thus, the most plausible values of the unknowns are those that minimize the sum of the squares of the differences between the calculated and observed values of the quantities etc., these squares being respectively multiplied by the weight of the observations. I had established this principle a long time ago through other considerations, in Theoria Motus Corporum Coelestium.
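The principle just stated can be illustrated with a small exact computation. The following sketch is only an example under assumed data: the coefficient lists `a`, `b`, the constants `l` (the negatives of four hypothetical observed values) and the function names are all assumptions, and the weights are taken as already reduced to unity. It solves the two equations obtained by setting half the partial derivatives of the sum of squares to zero, then confirms that any displacement from the solution increases that sum.

```python
from fractions import Fraction as F

# Hypothetical unit-weight equations v_i = a_i*x + b_i*y + l_i,
# with l_i equal to minus the observed value.
a = [F(1), F(1), F(1), F(1)]
b = [F(0), F(1), F(2), F(3)]
l = [F(-1), F(-3), F(-4), F(-8)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def omega(x, y):
    """Sum of squared differences between computed and observed values."""
    return sum((ai * x + bi * y + li) ** 2 for ai, bi, li in zip(a, b, l))

def half_gradient(x, y):
    """Half the partial derivatives of omega with respect to x and y."""
    xi = dot(a, a) * x + dot(a, b) * y + dot(a, l)
    eta = dot(a, b) * x + dot(b, b) * y + dot(b, l)
    return xi, eta

# Solve xi = 0, eta = 0 by elimination (Cramer's rule for two unknowns).
aa, ab, bb, al, bl = dot(a, a), dot(a, b), dot(b, b), dot(a, l), dot(b, l)
det = aa * bb - ab * ab
x_hat = (-al * bb + bl * ab) / det
y_hat = (-bl * aa + al * ab) / det

print(x_hat, y_hat, omega(x_hat, y_hat))  # 7/10 11/5 9/5
```

Exact rational arithmetic is used so that the vanishing of the derivatives can be verified without rounding.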

If one wants to assign the relative precision of each determination, it is necessary to deduce the values of etc. from the equations (3), which gives them in the following form:

(7)

Accordingly, the most plausible values of the unknowns etc., will be etc. The weights of these determinations will be etc. and the mean errors to be feared will be

for
for
for

in agreement with the results obtained in Theoria Motus Corporum Coelestium.

22.

The case where there is only one unknown is the most frequent and simplest of all. In this case we have etc. We will then have etc., etc., and consequently,

Hence

Therefore, if by several observations that do not have the same precision and whose respective weights are etc., we have found, for the same quantity, a first value a second a third etc., then the most plausible value will be

and the weight of this determination will be If all observations are equally plausible, then the most probable value will be

i.e. the arithmetic mean of the observed values; taking the weight of an individual observation as the unit, the weight of the average will be
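For the single-unknown case described in this article, the combination rule is the weighted mean, whose weight is the sum of the individual weights. A minimal sketch follows, assuming three hypothetical observed values and weights; the function name `combine` and the numbers are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction as F

def combine(values, weights):
    """Return the weighted mean of the values and the weight of that mean."""
    total_weight = sum(weights)
    mean = sum(p * v for p, v in zip(weights, values)) / total_weight
    return mean, total_weight

values = [F(10), F(12), F(11)]

# Unequal precision: weights 1, 3, 2 give mean 34/3 with weight 6.
unequal = combine(values, [F(1), F(3), F(2)])

# Equal precision: the arithmetic mean 11, with weight 3 (the number of observations).
equal = combine(values, [F(1), F(1), F(1)])
```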

Part Two

23.

A number of investigations still remain to be discussed, through which the preceding theory will be clarified and extended.

Let us first investigate whether the elimination used to express the variables etc., in terms of etc., is always possible. Since the number of equations is equal to the number of unknowns, we know that this elimination will be possible if etc. are independent of each other; otherwise, it is impossible.

Suppose, for a moment, that etc. are not independent, but rather there exists between these quantities an identical equation

We will then have

Let us set

(1)

from which it follows that

Multiplying the equations (1) resp. by etc., and adding, we obtain

and this equation leads to etc. From this we conclude, first of all, Secondly, the equations (1) show that the functions etc., are such that their values do not change when the variables etc., increase or decrease proportionally to etc. respectively. It is clear that the same holds for the functions etc.: but this can only happen in the case where it would be impossible to determine etc. using the values of etc., even if these were exactly known; but then the problem would be indeterminate by its nature, and we will exclude this case from our investigations.

24.

If etc. denote multipliers playing the same role relative to the unknown as the multipliers etc. relative to the unknown i.e. so that we have

then we will identically have

Let etc. be the analogous multipliers relative to the variable so that we have:

and consequently,

In the same way as we found in art. 20 that

we will find here

and so on.

We will also have, as in art. 20

If we multiply the values etc. (art. 20. (4)), respectively, by etc., and add; we obtain

or

If we multiply etc., respectively, by etc., and add, we will find

and thus

In the same manner, we find

25.

Let etc. denote the values taken by the functions etc., when etc. are replaced by their most plausible values, etc., i.e.

If we set

so that is the value of the function corresponding to the most plausible values of the variables, and therefore, as was shown in art. 20, the minimum value of Then the value of will be corresponding to etc., and this value is zero, according to the way etc. have been obtained. Thus, we have

and similarly we would obtain

and

Finally, multiplying the values of etc. respectively by and adding, we get or

26.

Replacing etc., with the expressions (7) from art. 21 in the equation we find, through the same reductions as before,

Multiplying either these equations or the equations (1) of art. 20, by etc., and then adding, we obtain the identity

27.

The function can take several forms, which are worth developing.

Let us square the equations (1) art. 20, and add them. Then we find

this is the first form.

Next let us multiply the same equations by etc. respectively, and add. Then we obtain and replacing etc., with the values indicated in the previous article, we find that or this is the second form.

Finally, replacing, in this second form, etc. by the expressions (7) art. 21, we obtain the third form:

We can also give a fourth form which results automatically from the third form and the formulas of the previous article:

or

From this last form we clearly see that is the minimum value of

28.

Let etc., be the errors made in the observations that gave etc. Then the true values of the functions etc., will be etc. respectively, and the true values of etc., will be etc. respectively. Therefore, the true value of will be

and the error made in the most suitable determination of the unknown which we will denote by will be

Similarly, the error made in the most suitable determination of the value of will be

The average value of the square will be

The average value of will similarly be as shown above. We can also determine the average value of the product which will be

These results can be stated more briefly as follows:

The average values of the squares etc., are respectively equal to the products of with the second-order partial differential quotients

and the average value of a product such as is the product of with where is regarded as a function of etc.
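These statements about average values can be verified exactly on a small example. The sketch below is an assumption-laden illustration: it uses four hypothetical unit-weight observations of two unknowns, lets each observation error take the values -1 and +1 with equal probability (so that the mean square error of an observation is 1), and averages over all sixteen sign patterns. The names `a`, `b`, `most_plausible` and the chosen true values are assumptions made for the example.

```python
from fractions import Fraction as F
from itertools import product

a = [F(1), F(1), F(1), F(1)]
b = [F(0), F(1), F(2), F(3)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
det = aa * bb - ab * ab

def most_plausible(obs):
    """Least-squares values of the two unknowns for the given observations."""
    ra, rb = dot(a, obs), dot(b, obs)
    return (bb * ra - ab * rb) / det, (aa * rb - ab * ra) / det

x_true, y_true = F(2), F(-1)
patterns = list(product([F(-1), F(1)], repeat=len(a)))

sum_xx = sum_yy = sum_xy = F(0)
for e in patterns:
    obs = [ai * x_true + bi * y_true + ei for ai, bi, ei in zip(a, b, e)]
    x_hat, y_hat = most_plausible(obs)
    dx, dy = x_hat - x_true, y_hat - y_true
    sum_xx += dx * dx
    sum_yy += dy * dy
    sum_xy += dx * dy

count = len(patterns)
mean_xx, mean_yy, mean_xy = sum_xx / count, sum_yy / count, sum_xy / count
print(mean_xx, mean_yy, mean_xy)  # 7/10 1/5 -3/10
```

The three averages coincide with the coefficients that elimination produces from the two summed equations, namely `bb/det`, `aa/det` and `-ab/det`, each multiplied by the unit mean square error.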

29.

Let be a given linear function of the quantities etc., i.e.

the value of deduced from the most plausible values of etc., will then be and we denote this by The error thus committed will be

which we denote by The average value of this error will obviously be zero, meaning the error will not contain a constant part, but the average value of i.e., the sum

will, according to the preceding article, be equal to the product of with the sum

i.e., the product of with the value produced by the function when we substitute

If we let denote this value of then the mean error to be feared when we take will be and the weight of this determination will be .

Since we have identically

will be equal to the value of the expression or the value produced by when we substitute for etc. the values corresponding to etc.

Finally, observing that expressed as a function of the quantities etc., will have as its constant part, if we suppose that

then we will have

30.

We have seen that the function attains its absolute minimum when we substitute etc. or, equivalently, etc. If we assign another value to one of the unknowns, e.g. while the other unknowns remain variable, may acquire a relative minimum value, which can be obtained from the equations

Therefore, we must have etc., and since

we have

Likewise, we have

and the relative minimum value of will be

Reciprocally, we conclude that if is not to exceed then the value of must necessarily be between the limits and It is important to note that becomes equal to the mean error to be feared in the most plausible value of if we set i.e., if is the mean error of observations whose weights are .
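The relative minimum described above can be checked on a small example. In the sketch below, which rests entirely on assumed data, one unknown is held at a prescribed value while the other is chosen to minimize the sum of squares; the result is compared with the absolute minimum increased by the squared displacement divided by the reciprocal weight `c_xx` of the first unknown. The names and numbers are assumptions for illustration.

```python
from fractions import Fraction as F

a = [F(1), F(1), F(1), F(1)]
b = [F(0), F(1), F(2), F(3)]
obs = [F(1), F(3), F(4), F(8)]   # hypothetical unit-weight observations

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def omega(x, y):
    return sum((ai * x + bi * y - oi) ** 2 for ai, bi, oi in zip(a, b, obs))

aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
ra, rb = dot(a, obs), dot(b, obs)
det = aa * bb - ab * ab

x_hat = (bb * ra - ab * rb) / det
y_hat = (aa * rb - ab * ra) / det
m_min = omega(x_hat, y_hat)      # absolute minimum, 9/5 here
c_xx = bb / det                  # reciprocal of the weight of x_hat, 7/10 here

def relative_minimum(x):
    """Smallest omega attainable when x is held fixed and y remains free."""
    y = (rb - ab * x) / bb       # makes the derivative with respect to y vanish
    return omega(x, y)

x_fixed = x_hat + 1
print(relative_minimum(x_fixed))                 # 113/35
print(m_min + (x_fixed - x_hat) ** 2 / c_xx)     # 113/35
```

Read in reverse, the same relation bounds the displacement of the first unknown once a ceiling is placed on the sum of squares.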

More generally, let us find the smallest value of the function that can correspond to a given value of where denotes, as in the previous article, a linear expression whose most plausible value is . Let us denote by the prescribed value of by According to the theory of maxima and minima, the solution to the problem will be given by the equations

or etc., where denotes an as yet undetermined multiplier. If, as in the previous article, we identically set,

then we will have

or

where has the same meaning as in the previous article.

Since is a homogeneous function of the second degree with respect to the variables etc., its value when etc. will evidently be and thus the minimum value of when will be Reciprocally, if must remain less than a given value the value of will necessarily be between the limits and will be the mean error to be feared in the most plausible value of if represents the mean error of observations whose weights are .

31.

When the number of unknowns etc. is quite large, the determination of the numerical values of etc. by ordinary elimination is quite tedious. For this reason we have indicated, in Theoria Motus Corporum Coelestium art. 182, and later developed, in Disquisitione de elementis ellipticis Palladis (Comm. recent. Soc. Gotting Vol. I), a method that simplifies this work as much as possible. Namely, the function must be reduced to the following form:

where the divisors etc., are determined quantities; etc., are linear functions of etc., such that the second does not contain the third contains neither nor the fourth contains neither nor nor and so on, so that the last contains only the last of the unknowns etc.; and finally, the coefficients of etc., in etc., are respectively equal to etc. Then we set etc. and we will easily obtain the values of etc. by solving these equations, starting with the last one. I do not believe it necessary to repeat the algorithm that leads to the transformation of the function .

However, the elimination required to find the weights of these determinations requires even longer calculations. We have shown in the Theoria Motus Corporum Coelestium that the weight of the last unknown (which appears by itself in the last of these linear functions) is equal to the last term in the series of divisors etc. This is easily found; hence, several calculators, wanting to avoid cumbersome elimination, have had the idea, in the absence of another method, to repeat the indicated transformation by successively considering each unknown as the last one. Therefore, I hope that geometers will appreciate my indication of a new method for calculating the weights of determinations, which seems to leave nothing more to be desired on this point.
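The reduction described in this article amounts to successive elimination on the coefficients of the quadratic function, followed by back substitution. The sketch below is one possible rendering under stated assumptions: the function is written as x^T N x + 2 c^T x + ll with hypothetical coefficients `N`, `c`, `ll`, and the names `reduce_quadratic` and `back_substitute` are invented for the example. It is not a transcription of the algorithm referred to in the text.

```python
from fractions import Fraction as F

def reduce_quadratic(N, c, ll):
    """Reduce x^T N x + 2 c^T x + ll to sum_k u_k^2 / d_k + m.

    Returns upper-triangular rows U, constants t, divisors d and the
    constant m, where u_k = sum_j U[k][j] * x_j + t[k] and d_k = U[k][k].
    """
    p = len(N)
    U = [row[:] for row in N]
    t = c[:]
    for k in range(p):
        for i in range(k + 1, p):
            f = U[i][k] / U[k][k]
            for j in range(k, p):
                U[i][j] -= f * U[k][j]
            t[i] -= f * t[k]
    d = [U[k][k] for k in range(p)]
    m = ll - sum(t[k] ** 2 / d[k] for k in range(p))
    return U, t, d, m

def back_substitute(U, t):
    """Solve u_k = 0 for every k, starting with the last equation."""
    p = len(U)
    x = [F(0)] * p
    for k in reversed(range(p)):
        s = t[k] + sum(U[k][j] * x[j] for j in range(k + 1, p))
        x[k] = -s / U[k][k]
    return x

# Hypothetical two-unknown example.
N = [[F(4), F(6)], [F(6), F(14)]]
c = [F(-16), F(-35)]
ll = F(90)

U, t, d, m = reduce_quadratic(N, c, ll)
x = back_substitute(U, t)
print(x, d, m)
```

For this data the most plausible values are 7/10 and 11/5, the divisors are 4 and 5, and the minimum is 9/5; the weight of the last unknown is the last divisor, 5.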

32.

Setting

(1)

we have identically

and from this we deduce:

(2)

The values of etc. deduced from these equations will be presented in the following form:

(3)

By taking the complete differential of the equation

we obtain

and thus

This expression must be equivalent to the one obtained from the equations (3),

and therefore we have

(4)

By substituting in these expressions the values of and etc. obtained from the equations (3), we will have performed the elimination. For the determination of the weights, we have

(5)

The simplicity of these formulas leaves nothing to be desired. Equally simple formulas could be found to express the other coefficients and etc.; however, as their use is less frequent, we will refrain from presenting them.
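The idea of obtaining every weight from a single reduction can be sketched as follows. This is an illustration under assumptions, not a transcription of formulas (5): it writes the coefficient matrix as N = L diag(d) L^T with unit lower-triangular `L`, inverts `L`, and reads each reciprocal weight as a sum of squares divided by the divisors `d`. The matrix `N` and all function names are hypothetical.

```python
from fractions import Fraction as F

def divisors_and_unit_factor(N):
    """Return divisors d and unit lower-triangular L with N = L diag(d) L^T."""
    p = len(N)
    U = [row[:] for row in N]
    for k in range(p):
        for i in range(k + 1, p):
            f = U[i][k] / U[k][k]
            for j in range(k, p):
                U[i][j] -= f * U[k][j]
    d = [U[k][k] for k in range(p)]
    L = [[U[k][i] / d[k] if k <= i else F(0) for k in range(p)] for i in range(p)]
    return d, L

def invert_unit_lower(L):
    """Invert a unit lower-triangular matrix by forward substitution."""
    p = len(L)
    T = [[F(int(i == j)) for j in range(p)] for i in range(p)]
    for j in range(p):
        for i in range(j + 1, p):
            T[i][j] = -sum(L[i][k] * T[k][j] for k in range(j, i))
    return T

def reciprocal_weights(N):
    """Diagonal of the inverse of N, obtained from one reduction."""
    d, L = divisors_and_unit_factor(N)
    T = invert_unit_lower(L)
    p = len(N)
    return [sum(T[k][j] ** 2 / d[k] for k in range(j, p)) for j in range(p)]

# Hypothetical coefficients for three unknowns.
N = [[F(2), F(1), F(0)],
     [F(1), F(2), F(1)],
     [F(0), F(1), F(2)]]

weights = [1 / r for r in reciprocal_weights(N)]
print(weights)   # weights 4/3, 1, 4/3
```

The last weight equals the last divisor, in agreement with the remark of the previous article, while the earlier weights require the additional squared terms.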

33.

The importance of the subject has prompted us to prepare everything for the calculation and to form explicit expressions for the coefficients etc., etc. etc. This calculation can be approached in two ways. The first involves substituting the values of and so forth, deduced from the equations (3) into the equations (2), and the second involves substituting the values from the equations (2) into the equations (3). The first method leads to the following formulas:

These formulas will determine and so on.

We will then have,

which will determine and so forth; then

which will determine etc., and so on.

The second method yields the following system:

from which we deduce

from which we deduce and

from which we deduce and so on.

Both systems of formulas offer nearly equal advantages when seeking the weights of the determinations of all unknowns and so forth; however, if only one of the quantities and so forth is required, the first system is much preferable.

Moreover, the combination of equations (1) and (4) yields the same formulas, and provides, in addition, a second way to obtain the most plausible values and so forth, which are

The other calculation is identical to the ordinary calculation in which it is assumed etc.

34.

The results obtained in art. 32 are only particular cases of a more general theorem which can be stated as follows:

Theorem. If represents the following linear function of the unknowns etc.,

whose expression in terms of the variables etc., is

then will be the most plausible value of and the weight of this determination will be

Proof. The first part of the theorem is obvious, since the most plausible value of must correspond to the values etc.

To demonstrate the second part, let us note that we have

and consequently, when

we have

whatever the differentials etc. Hence, assuming always, we obtain

Now it is easily seen that if the differentials etc. are independent of each other, so will be etc., therefore, we will have,

Hence, the value of corresponding to the same assumptions, will be

which, by art. 29, demonstrates the truth of our theorem.

Moreover, if we wish to perform the transformation of the function without resorting to formulas (4) of art. 32, we immediately have the relations

which will allow us to determine etc., and we will finally have

35.

We will particularly address the following problem, both because of its practical utility and the simplicity of the solution:

Find the changes that the most plausible values of the unknowns undergo by adding a new equation, and assign the weights of these new determinations.

Let us keep the previous notations. The primitive equations, reduced to have a weight of unity, will be we will have and etc., will be the partial derivatives

Finally, by elimination, we will have

(1)

Now suppose we have a new approximate equation (which we assume to have a weight equal to unity), and we seek the changes undergone by the most plausible values of etc., and of the coefficients etc.

Let us set

and let

be the result of the elimination. Finally, let

which, taking into account the equations (1), becomes

and let

It is clear that will be the most plausible value of the function as resulting from the primitive equations, without considering the value provided by the new observation, and will be the weight of this determination.

Now we have

and consequently,

or

Furthermore,

From this, we deduce,

which will be the most plausible value of deduced from all observations.

We will also have

thus

will be the weight of this determination.

Similarly, for the most plausible value of deduced from all observations, we find

the weight of this determination will be

and so on. Q.E.I.
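The solution of this problem can be illustrated by a rank-one update. The sketch below uses the standard update from linear algebra for two unknowns; whether each intermediate quantity lines up symbol for symbol with the formulas above is an assumption, and the names `add_equation`, `Fx`, `Gy`, as well as the numerical state, are hypothetical. The starting values are those of four unit-weight observations 1, 3, 4, 8 of x + t*y at t = 0, 1, 2, 3, and the new equation adds the observation 9 at t = 4.

```python
from fractions import Fraction as F

# Most plausible values and elimination coefficients from the primitive
# equations (hypothetical example with four unit-weight observations).
A, B = F(7, 10), F(11, 5)
C = [[F(7, 10), F(-3, 10)], [F(-3, 10), F(1, 5)]]

def add_equation(A, B, C, f, g, k):
    """Fold in one new unit-weight equation f*x + g*y + k = 0."""
    Fx = C[0][0] * f + C[0][1] * g
    Gy = C[1][0] * f + C[1][1] * g
    omega = f * Fx + g * Gy          # reciprocal weight of f*x + g*y beforehand
    K = f * A + g * B + k            # value of the new function at the old solution
    scale = 1 / (1 + omega)
    new_A = A - Fx * K * scale
    new_B = B - Gy * K * scale
    new_C = [[C[0][0] - Fx * Fx * scale, C[0][1] - Fx * Gy * scale],
             [C[1][0] - Gy * Fx * scale, C[1][1] - Gy * Gy * scale]]
    return new_A, new_B, new_C

new_A, new_B, new_C = add_equation(A, B, C, F(1), F(4), F(-9))
new_weights = (1 / new_C[0][0], 1 / new_C[1][1])
print(new_A, new_B, new_weights)
```

The updated values 4/5 and 21/10, with weights 5/3 and 10, agree with what a fresh solution of all five equations gives for this data.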

Let us add some remarks.

I. After substituting the new values etc., the function will obtain the most plausible value

and since we have, identically,

the weight of this determination, according to art. 29, will be

These results could be deduced immediately from the rules explained at the end of art. 21. The original equations had, indeed, provided the determination whose weight was A new observation gives another determination independent of the first, whose weight is and their combination produces the determination with a weight of

II. It follows from the above that, for etc. we must have etc., and consequently,

Furthermore, since

we must have

and

III. Comparing these results with those of art. 30, we see that here the function has the smallest value it can obtain when subjected to the condition

36.

We will give here the solution to the following problem, which is analogous to the previous one, but we will refrain from indicating the demonstration, which can be easily found, as in the previous article.

Find the changes in the most plausible values of the unknowns and the weights of the new determinations when changing the weight of one of the primitive observations.

Suppose that after completing the calculation, it is noticed that the weight which has been assigned to an observation is too strong or too weak, e.g. the first one which gave and that it would be more accurate to assign it the weight instead of the weight It is not necessary to then restart the calculation. Instead it is convenient to form the corrections using the following formulas.

The most plausible values of the unknowns will be corrected as follows:

and the weights of these determinations will be found upon dividing unity by

respectively.

This solution applies in the case where, after completing the calculation, it is necessary to completely reject one of the observations, since this amounts to making ; similarly, will be suitable for the case where the equation which in the calculation had been regarded as approximate, is in fact absolutely precise.
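A correction of this kind can be sketched with the same rank-one device as in the previous article. The code below is a hedged illustration, not the formulas of the text: `reweight`, `row`, `observed` and `delta` (new weight minus old weight) are assumed names, `row` holds the unscaled coefficients of the observation in question, and the numerical state comes from five hypothetical unit-weight observations 1, 3, 4, 8, 9 of x + t*y at t = 0, ..., 4. Setting `delta = -1` rejects the fifth observation entirely.

```python
from fractions import Fraction as F

# Solution and elimination coefficients from all five observations.
x = [F(4, 5), F(21, 10)]
C = [[F(3, 5), F(-1, 5)], [F(-1, 5), F(1, 10)]]

def reweight(x, C, row, observed, delta):
    """Correct the solution when one observation's weight changes by delta."""
    w = [C[0][0] * row[0] + C[0][1] * row[1],
         C[1][0] * row[0] + C[1][1] * row[1]]
    s = row[0] * w[0] + row[1] * w[1]
    lam = row[0] * x[0] + row[1] * x[1] - observed   # computed minus observed
    scale = delta / (1 + delta * s)
    new_x = [x[0] - w[0] * lam * scale, x[1] - w[1] * lam * scale]
    new_C = [[C[i][j] - w[i] * w[j] * scale for j in range(2)] for i in range(2)]
    return new_x, new_C

# Reject the fifth observation (coefficients 1 and 4, observed value 9).
new_x, new_C = reweight(x, C, [F(1), F(4)], F(9), F(-1))
new_weights = [1 / new_C[0][0], 1 / new_C[1][1]]
print(new_x, new_weights)
```

The result, 7/10 and 11/5 with weights 10/7 and 5, is what the first four observations alone would give. The denominator `1 + delta * s` vanishes when the rejected observation is indispensable for determining the unknowns, in which case the problem becomes indeterminate.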

If, after completing the calculation, several new equations were to be added to those proposed, or if the weights assigned to several of them were incorrect, the calculation of the corrections becomes too complicated, and it is preferable to start over.

37.

In arts. 15 and 16, we have given a method to approximate the accuracy of a system of observations; but this method assumes that the real errors encountered in a large number of observations are known exactly; however, this condition is rarely fulfilled, if ever.

If the quantities for which the observation provides approximate values depend on one or more unknowns, according to a given law, then the method of least squares allows us to find the most plausible values of these unknowns. If we then calculate the corresponding values of the observed quantities, they can be regarded as differing little from the true values, so that their differences with the observed values will represent the errors committed, with a certainty that will increase with the number of observations. This is the procedure followed in practice by calculators, who have attempted, in complicated cases, to retrospectively evaluate the precision of the observations. Although sufficient in many cases, this method is theoretically inaccurate and can sometimes lead to serious errors; therefore, it is very important to treat the issue with more care.

In the following discussion, we retain the notation used in art. 19. The method in question consists of considering etc., as the true values of the unknowns etc., and etc., as those of the functions etc. If all observations have equal precision and their common weight is taken to be unity, these same quantities, changed in sign, represent, under this assumption, the errors of the observations. Consequently, according to art. 15,

will be the mean error of the observations. If the observations do not have the same precision, then etc., represent the errors of the observations, respectively multiplied by the square roots of the weights, and the rules of art. 16 lead to the same formula,

which already expresses the mean error of these observations, when their weight is . However, it is clear that an exact calculation would require replacing etc. with the values of etc., deduced from the true values of the unknowns etc., and replacing the quantity by the corresponding value of Although we cannot assign this latter value, we are nonetheless certain that it is greater than (which is its minimum possible value), and it would only reach this limit in the infinitely unlikely case where the true values of the unknowns coincide with the most plausible ones. We can therefore affirm, in general, that the mean error calculated by ordinary practice is smaller than the exact mean error, and consequently, that too much precision is attributed to the observations. Now let us see what a rigorous theory yields.

38.

First of all, we need to determine how the quantity depends on the true errors of the observations. As in art. 28, let us denote these errors by etc., and let us set, for simplicity,

and

Let etc., be the true values of the unknowns etc., for which etc., are, respectively, etc. The corresponding values of etc., will obviously be so that we will have

Finally,

will be the value of the function corresponding to the true values of the etc. Since we also have identically

we will also have

From this, it is clear that is a homogeneous function of the second degree of the errors etc.; for various values of the errors this function may become greater or smaller. However, the extent of the errors remains unknown to us, so it is good to carefully examine the function , and to first calculate its average value according to the elementary calculus of probability. We will obtain this average value by replacing the squares etc. with etc., and omitting the terms in etc., whose average value is zero; or equivalently, by replacing each square , by and neglecting . Accordingly, the term will provide ; the term will produce

each of the other terms will also give so that the total average value will be where denotes the number of observations, and denotes the number of unknowns. Due to errors offered by chance, the true value of may be greater or smaller than this average value, but the difference decreases as the number of observations increases, so that

can be regarded as an approximate value of Consequently, the value of provided by the erroneous method we discussed in the previous article, must be increased by the ratio of to
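The average value found in this article can be confirmed exactly on a small case. The sketch below is an illustration under assumptions: four hypothetical unit-weight observations of two unknowns, with each error taking the values -1 and +1 with equal probability, so that the mean square error is 1. It averages the minimum sum of squares over all sixteen sign patterns and compares the divisor equal to the number of observations with the divisor reduced by the number of unknowns. All names and data are invented for the example.

```python
from fractions import Fraction as F
from itertools import product

a = [F(1), F(1), F(1), F(1)]
b = [F(0), F(1), F(2), F(3)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
det = aa * bb - ab * ab

def min_sum_of_squares(obs):
    """Sum of squared residuals at the most plausible values."""
    ra, rb = dot(a, obs), dot(b, obs)
    x = (bb * ra - ab * rb) / det
    y = (aa * rb - ab * ra) / det
    return sum((ai * x + bi * y - oi) ** 2 for ai, bi, oi in zip(a, b, obs))

x_true, y_true = F(2), F(-1)
n_obs, n_unknowns = len(a), 2
patterns = list(product([F(-1), F(1)], repeat=n_obs))

total = F(0)
for e in patterns:
    obs = [ai * x_true + bi * y_true + ei for ai, bi, ei in zip(a, b, e)]
    total += min_sum_of_squares(obs)

average_m = total / len(patterns)
naive = average_m / n_obs                       # ordinary practice
corrected = average_m / (n_obs - n_unknowns)    # rigorous divisor
print(average_m, naive, corrected)              # 2 1/2 1
```

Dividing by the number of observations understates the mean square error by half in this example, whereas the reduced divisor recovers it exactly.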

39.

To clearly understand the extent to which it is permissible to consider the value of provided by the observations as equal to the exact value, we must seek the mean error to be feared when This mean error is the square root of the average value of the quantity

which we will write as:

and since the average value of the second term is evidently zero, the question reduces to finding the average value of the function

If we denote this average value by then the mean error we seek will be

Expanding the function we see that it is a homogeneous function of the errors etc., or equivalently, of the quantities etc.; therefore, we will find the average value by:

1. Replacing the fourth powers etc., by their average values;

2. Replacing the products etc., by their average values, that is, by etc.;

3. Neglecting products such as etc. We will assume (see art. 16) that the average values of etc., are proportional to etc., so that the ratios of one to another are where denotes the average value of the fourth powers of the errors for observations whose weight is . Thus the previous rules could also be expressed as follows: Replace each fourth power etc., by each product etc., by and neglect all terms such as or

These principles being understood, it is easy to see that:

I. The average value of is

II. The average value of the product is

because

Similarly, the average value of is

the average value of is

and so on. Thus the average value of the product

or

will be

The products or etc., will have the same average value. Thus the product

will have an average value of

III. To shorten the following developments, we will adopt the following notation. We give the character a more extended meaning than we have done so far, by making it designate the sum of similar but not identical terms arising from all permutations of the observations. According to this notation, we will have

Calculating the average value of term by term, we first have, for the average value of the product

Similarly, the average value of the product is

and so on. Therefore, the average value of the product

is

Now the average value of is

The average value of is

and so on. Hence, we easily conclude that the average value of the product

is

Thus, for the average value of the product we have

IV. Similarly, for the average value of the product we find

Now, we have

so this average value will be

V. By a similar calculation, we find that the average value of is

and so on. Adding up, we obtain the average value of the product

this value is

VI. Similarly, we find that

is the average value of the product

and

is the average value of the product

and so on.

Hence by addition we find the average value of the square

which is

VII. Finally, from all these preliminaries, we conclude that

Therefore, the mean error to be feared when

will be

40.

The quantity

which occurs in the expression above, generally cannot be reduced to a simpler form. However, we can assign two limits between which its value must necessarily lie. First, it is easily deduced from the previous relations that

from which we conclude that

is a positive quantity smaller than unity, or at least not larger. The same will be true for the quantity

which is equal to the sum

Similarly,

will be smaller than unity; and so on. Therefore,

must be smaller than Second, we have

since

from which it is easily deduced that

is greater, or at least not smaller, than Therefore, the term

must necessarily lie between the limits

and

or, between the broader limits

and

Thus, the square of the mean error to be feared for the value

lies between the limits

and

so that a degree of precision as great as desired can be achieved, provided the number of observations is sufficiently large.

It is very remarkable that in hypothesis III of art. 9, on which we had formerly relied to establish the theory of least squares, the second term of the square of the average error completely disappears (since ); and because, to find the approximate value of the average error of the observations, it is always necessary to treat the sum

as if it were equal to the sum of the squares of random errors, it follows that, in this hypothesis, the precision of this determination becomes equal to that which we found, in art. 15, for the determination from true errors.

  1. The exact determination of etc., is conceivable only in the case where, by the nature of the matter, the errors etc. proportional to etc., are considered equally probable, or rather in the case where

  2. We will later explain the reasoning that led us to denote the coefficients of this formula by the notation etc.