§ 1. In the last chapter we were occupied with the Average mainly under its qualitative rather than its quantitative aspect. That is, we discussed its general nature, its principal varieties, and the main uses to which it could be put in ordinary life or in reasoning processes which did not claim to be very exact. It is now time to enter more minutely into the specific question of the employment of the average in the way peculiarly appropriate to Probability. That is, we must be supposed to have a certain number of measurements,—in the widest sense of that term,—placed before us, and to be prepared to answer such questions as: Why do we take their average? With what degree of confidence? Must we in all cases take the average, and, if so, one always of the same kind?
The subject upon which we are thus entering is one which, under its most general theoretic treatment, has perhaps given rise to more profound investigation, to a greater variety of opinion, and in consequence to a more extensive history and literature, than any other single problem within the range of mathematics.[1] But, in spite of this, the main logical principles underlying the methods and processes in question are not, I apprehend, particularly difficult to grasp: though, owing to the extremely technical style of treatment adopted even in comparatively elementary discussions of the subject, it is far from easy for those who have but a moderate command of mathematical resources to disentangle these principles from the symbols in which they are clothed. The present chapter contains an attempt to remove these difficulties, so far as a general comprehension of the subject is concerned. As the treatment thus adopted involves a considerable number of subdivisions, the reader will probably find it convenient to refer back occasionally to the table of contents at the commencement of this volume.
§ 2. The subject, in the form in which we shall discuss it, will be narrowed to the consideration of the average, on account of the comparative simplicity and very wide prevalence of this aspect of the problem. The problem is however very commonly referred to, even in non-mathematical treatises, as the Rule or Method of Least Squares; the fact being that, in such cases as we shall be concerned with, the Rule of Least Squares resolves itself into the simpler and more familiar process of taking the arithmetical average. A very simple example,—one given by Herschel,—will explain the general nature of the task under a slightly wider treatment, and will serve to justify the familiar designation.
Suppose that a man had been firing for some time with a pistol at a small mark, say a wafer on a wall. We may take it for granted that the shot-marks would tend to group themselves about the wafer as a centre, with a density varying in some way inversely with the distance from the centre. But now suppose that the wafer which marked the centre was removed, so that we could see nothing but the surface of the wall spotted with the shot-marks; and that we were asked to guess the position of the wafer. Had there been only one shot, common sense would suggest our assuming (of course very precariously) that this marked the real centre. Had there been two, common sense would suggest our taking the mid-point between them. But if three or more were involved, common sense would be at a loss. It would feel that some intermediate point ought to be selected, but would not see its way to a more precise determination, because its familiar reliance,—the arithmetical average,—does not seem at hand here. The rule in question tells us how to proceed. It directs us to select that point which will render the sum of the squares of all the distances of the various shot-marks from it the least possible.[2]
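The rule admits of a brief computational verification (the figures, and the use of Python, are of course illustrative additions and no part of the text): the point which renders the sum of the squares of the distances least is simply the centroid, that is, the point whose coordinates are the averages of the coordinates of the shot-marks taken separately.

```python
import random

# Illustrative sketch (figures hypothetical): shot-marks scattered about a
# hidden centre, with a density falling off with distance (here Gaussian).
random.seed(1)
hidden_centre = (3.0, 5.0)
shots = [(hidden_centre[0] + random.gauss(0.0, 1.0),
          hidden_centre[1] + random.gauss(0.0, 1.0)) for _ in range(50)]

def sum_of_squares(p):
    """Sum of the squared distances from point p to every shot-mark."""
    return sum((x - p[0]) ** 2 + (y - p[1]) ** 2 for x, y in shots)

# The point required by the rule is the centroid: the averages of the
# coordinates taken separately.
centroid = (sum(x for x, _ in shots) / len(shots),
            sum(y for _, y in shots) / len(shots))

# Any displaced candidate gives a strictly larger sum of squares.
for dx, dy in [(0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]:
    assert sum_of_squares((centroid[0] + dx, centroid[1] + dy)) \
        > sum_of_squares(centroid)
```

With fifty shots the centroid also falls, as common sense would hope, very near the hidden wafer.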
This is merely by way of illustration, and to justify the familiar designation of the rule. The sort of cases with which we shall be exclusively occupied are those comparatively simple ones in which only linear magnitude, or some quality which can be adequately represented by linear magnitude, is the object under consideration. In respect of these the Rule of Least Squares reduces itself to the process of taking the average, in the most familiar sense of that term, viz. the arithmetical mean; and a single Law of Error, or its graphical equivalent, a Curve of Facility, will suffice accurately to indicate the comparative frequency of the different amounts of the one variable magnitude involved.
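That the Rule of Least Squares does reduce, in the one-dimensional case, to the arithmetical mean may be checked numerically (the measurements below are hypothetical): the sum of squared deviations from a trial centre is least when that centre is the mean.

```python
# Hypothetical measurements of one linear magnitude.
xs = [9.8, 10.1, 10.0, 9.7, 10.4]

def S(c):
    """Sum of squared deviations of the measurements from a trial centre c."""
    return sum((x - c) ** 2 for x in xs)

arithmetic_mean = sum(xs) / len(xs)

# Scanning trial centres from 9.00 to 10.99 in steps of 0.01 confirms that
# the least sum of squares is attained at the arithmetical mean.
best_S, best_c = min((S(k / 100), k / 100) for k in range(900, 1100))
assert abs(best_c - arithmetic_mean) < 0.005
```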
§ 3. We may conveniently here again call attention to a misconception or confusion which has been already noticed in a former chapter. It is that of confounding the Law of Error with the Method of Least Squares. These are things of an entirely distinct kind. The former is of the nature of a physical fact, and its production is one which in many cases is entirely beyond our control. The latter,—or any simplified application of it, such as the arithmetical average,—is no law whatever in the physical sense. It is rather a precept or rule for our guidance. The Law states, in any given case, how the errors tend to occur in respect of their magnitude and frequency. The Method directs us how to treat these errors when any number of them are presented to us. No doubt there is a relation between the two, as will be pointed out in the course of the following pages; but there is nothing really to prevent us from using the same method for different laws of error, or different methods for the same law. In so doing, the question of distinct right and wrong would seldom be involved, but rather one of more or less propriety.
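The independence of Method from Law, and the sense in which the choice between methods is "one of more or less propriety," may be illustrated by a sketch of my own devising (all figures hypothetical): the same method, taking the mean, can be applied under either of two laws of error, but under one law the median recovers the known centre more closely.

```python
import random
import statistics

# Illustration (not from the text): the *method* of taking the mean may be
# applied whatever the law of error; how well it behaves is a question of
# "more or less propriety". We compare mean and median as ways of
# recovering a known true value under two different laws of error.
random.seed(0)
TRUE, TRIALS, N = 0.0, 2000, 15

def avg_abs_error(estimate, sample):
    """Average absolute error of an estimator over repeated sets of N errors."""
    errs = [abs(estimate([sample() for _ in range(N)]) - TRUE)
            for _ in range(TRIALS)]
    return sum(errs) / TRIALS

gauss = lambda: random.gauss(0.0, 1.0)
# A double-exponential (Laplace) law, built from an exponential with random sign.
laplace = lambda: random.choice((-1.0, 1.0)) * random.expovariate(1.0)
mean = lambda xs: sum(xs) / len(xs)

gauss_mean = avg_abs_error(mean, gauss)
gauss_median = avg_abs_error(statistics.median, gauss)
laplace_mean = avg_abs_error(mean, laplace)
laplace_median = avg_abs_error(statistics.median, laplace)
```

Under the Gaussian law the mean is the more proper choice; under the double-exponential law the median does better; yet neither pairing is a matter of "distinct right and wrong."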
§ 4. The reader must understand,—as was implied in the illustration about the pistol shots,—that the ultimate problem before us is an inverse one. That is, we are supposed to have a moderate number of ‘errors’ before us and we are to undertake to say whereabouts is the centre from which they diverge. This resembles the determination of a cause from the observation of an effect. But, as mostly happens in inverse problems, we must commence with the consideration of the direct problem. In other words, so far as concerns the case before us, we shall have to begin by supposing that the ultimate object of our aim,—that is, the true centre of our curve of frequency,—is already known to us: in which case all that remains to be done is to study the consequences of taking averages of the magnitudes which constitute the errors.
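The direct problem here described may itself be sketched in miniature (all numbers hypothetical): the true centre being supposed known, we simply observe how averages of the errors behave about it.

```python
import random
import statistics

# Direct-problem sketch (figures hypothetical): the true centre is taken
# as known, and we study the consequences of averaging the measurements.
random.seed(2)
centre = 100.0
singles = [random.gauss(centre, 3.0) for _ in range(5000)]
averages = [sum(random.gauss(centre, 3.0) for _ in range(10)) / 10
            for _ in range(5000)]

# Averages of ten measurements lie far more closely about the known centre
# than single measurements do.
spread_single = statistics.pstdev(singles)
spread_average = statistics.pstdev(averages)
assert spread_average < spread_single
```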
§ 5. We shall, for the present, confine our remarks to what must be regarded as the typical case where considerations of Probability are concerned; viz. that in which the law of arrangement or development is of the Binomial kind. The nature of this law was explained in Chap. II., where it was shown that the frequency of the respective numbers of occurrences was regulated in accordance with the magnitude of the successive terms of the expansion of the binomial (1 + 1)^n. It was also pointed out that when n becomes very great, that is, when the number of influencing circumstances is very large, and their relative individual influence correspondingly small, the form assumed by a curve drawn through the summits of ordinates representing these successive terms of the binomial tends towards that assigned by the equation