The principles of science - William Stanley Jevons

It will of course be understood that the probable error has regard only to those causes of errors which in the long run act as much in one direction as another; it takes no account of constant errors. The true result accordingly will often fall far beyond the limits of probable error, owing to some considerable constant error or errors, of the existence of which we are unaware.
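The point may be illustrated numerically. The following is only a minimal sketch: the readings are hypothetical, the function name `probable_error_of_mean` is an illustrative choice, and the errors are assumed to follow the normal law, so that the probable error of the mean is 0.6745 times the standard deviation divided by the square root of the number of observations. Adding a constant error to every reading displaces the mean but leaves the probable error exactly as it was.

```python
import math
import statistics

def probable_error_of_mean(values):
    """Probable error of the mean, assuming normally distributed accidental errors."""
    # 0.6745 is the factor for which +/- one probable error covers half
    # of a normal distribution.
    return 0.6745 * statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical readings of a quantity whose true value is taken to be 10.00.
readings = [10.02, 9.97, 10.01, 9.99, 10.03, 9.98, 10.00, 10.00]
# The same readings from an instrument assumed to read 0.50 too high throughout.
biased = [r + 0.50 for r in readings]

for label, data in (("unbiased", readings), ("biased", biased)):
    mean = statistics.fmean(data)
    print(label, round(mean, 3), round(probable_error_of_mean(data), 4))
```

In the second series the true value lies many probable errors away from the mean, yet nothing in the readings themselves betrays the fact.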

Rejection of the Mean Result.

We ought always to bear in mind that the mean of any series of observations is the best, that is, the most probable approximation to the truth, only in the absence of knowledge to the contrary. The selection of the mean rests entirely upon the probability that unknown causes of error will in the long run fall as often in one direction as the opposite, so that in drawing the mean they will balance each other. If we have any reason to suppose that there exists a tendency to error in one direction rather than the other, then to choose the mean would be to ignore that tendency. We may certainly approximate to the length of the circumference of a circle, by taking the mean of the perimeters of inscribed and circumscribed polygons of an equal and large number of sides. The length of the circular line undoubtedly lies between the lengths of the two perimeters, but it does not follow that the mean is the best approximation. It may in fact be shown that the circumference of the circle is very nearly equal to the perimeter of the inscribed polygon, together with one-third part of the difference between the inscribed and circumscribed polygons of the same number of sides. Having this knowledge, we ought of course to act upon it, instead of trusting to probability.
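The case of the polygons can be checked by direct calculation. The sketch below assumes regular polygons of n sides about a circle of unit radius, so that the inscribed perimeter is 2n sin(π/n) and the circumscribed perimeter 2n tan(π/n); the function names are illustrative only. It compares the simple mean with the inscribed perimeter increased by one-third of the difference.

```python
import math

def polygon_perimeters(n, radius=1.0):
    """Perimeters of the regular n-gons inscribed in and circumscribed about a circle."""
    inscribed = 2 * n * radius * math.sin(math.pi / n)
    circumscribed = 2 * n * radius * math.tan(math.pi / n)
    return inscribed, circumscribed

def estimates(n, radius=1.0):
    """Return (simple mean, inscribed plus one-third of the difference)."""
    inscribed, circumscribed = polygon_perimeters(n, radius)
    mean = (inscribed + circumscribed) / 2
    weighted = inscribed + (circumscribed - inscribed) / 3
    return mean, weighted

circumference = 2 * math.pi  # unit radius
for n in (6, 12, 96):
    mean, weighted = estimates(n)
    print(n, abs(mean - circumference), abs(weighted - circumference))
```

For each number of sides tried, the second estimate falls much nearer the circumference than the mean does, as the rule stated above would lead us to expect.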

We may often perceive that a series of measurements tends towards an extreme limit rather than towards a mean. In endeavouring to obtain a correct estimate of the apparent diameter of the brightest fixed stars, we find a continuous diminution in the estimates as the powers of observation have increased. Kepler assigned to Sirius an apparent diameter of 240 seconds; Tycho Brahe made it 126; Gassendi 10 seconds; Galileo, Hevelius, and J. Cassini, 5 or 6 seconds. Halley, Michell, and subsequently Sir W. Herschel came to the conclusion that the brightest stars in the heavens could not have real discs of a second, and were probably much less in diameter. It would of course be absurd to take the mean of quantities of which the greatest is more than 240 times the least; and as the tendency has always been to smaller estimates, there is a considerable presumption in favour of the smallest.[291]

In many experiments and measurements we know that there is a preponderating tendency to error in one direction. The readings of a thermometer tend to rise as the age of the instrument increases, and no drawing of means will correct this result. Barometers, on the other hand, are likely to read too low instead of too high, owing to the imperfection of the vacuum and the action of capillary attraction. If the mercury be perfectly pure and no appreciable error be due to the measuring apparatus, the best barometer will be that which gives the highest result. In determining the specific gravity of a solid body the chief danger of error arises from bubbles of air adhering to the body, which would tend to make the specific gravity too small. Much attention must always be given to one-sided errors of this kind, since the multiplication of experiments does not remove the error. In such cases one very careful experiment is better than any number of careless ones.
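A small simulation may make the contrast plainer. Everything in the sketch below is assumed for the purpose of illustration: the true specific gravity, the size of the loss due to adhering bubbles, and the amounts of accidental scatter are hypothetical figures, and the function names are not drawn from any authority. The careless readings err only downwards, so that their mean remains too small however many are taken.

```python
import random
import statistics

TRUE_VALUE = 2.700  # hypothetical true specific gravity

def careless_reading(rng):
    """A reading spoiled by adhering air bubbles, which can only lower the result."""
    bubble_loss = rng.uniform(0.0, 0.10)  # one-sided error (assumed range)
    noise = rng.gauss(0.0, 0.02)          # accidental error, equally likely either way
    return TRUE_VALUE - bubble_loss + noise

def careful_reading(rng):
    """A reading free from bubbles and with small accidental error (assumed)."""
    return TRUE_VALUE + rng.gauss(0.0, 0.005)

def mean_of_careless(n, seed=0):
    """Mean of n careless readings drawn from a seeded generator."""
    rng = random.Random(seed)
    return statistics.fmean(careless_reading(rng) for _ in range(n))

for n in (10, 100, 10000):
    print(n, round(TRUE_VALUE - mean_of_careless(n), 4))
print("careful:", round(careful_reading(random.Random(1)), 4))
```

Multiplying the careless readings reduces the scatter of their mean but not its deficiency, which settles near the average loss due to the bubbles.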

When we have reasonable grounds for supposing that certain experimental results are liable to grave errors, we should exclude them in drawing a mean. If we want to find the most probable approximation to the velocity of sound in air, it would be absurd to go back to the old experiments which made the velocity from 1200 to 1474 feet per second; for we know that the old observers did not guard against errors arising from wind and other causes. Old chemical experiments are valueless as regards quantitative results. The old chemists found the atmosphere in different places to differ in composition nearly ten per cent., whereas modern accurate experimenters find very slight variations. Any method of measurement which we know to avoid a source of error is far to be preferred to others which trust to probabilities for the elimination of the error. As Flamsteed says,‍[292] “One good instrument is of as much worth as a hundred indifferent ones.” But an instrument is good or bad only in a comparative sense, and no instrument gives invariable and truthful results. Hence we must always ultimately fall back upon probabilities for the selection of the final mean, when other precautions are exhausted.

Legendre, the discoverer of the method of Least Squares, recommended that observations differing very much from the results of his method should be rejected. The subject has been carefully investigated by Professor Peirce, who has proposed a criterion for the rejection of doubtful observations based on the following principle:[293] “observations should be rejected when the probability of the system of errors obtained by retaining them is less than that of the system of errors obtained by their rejection multiplied by the probability of making so many and no more abnormal observations.” Professor Peirce’s investigation is given nearly in his own words in Professor W. Chauvenet’s “Manual of Spherical and Practical Astronomy,” which contains a full and excellent discussion of the methods of treating numerical observations.[294]

Very difficult questions sometimes arise when one or more results of a method of experiment diverge widely from the mean of the rest. Are we or are we not to exclude them in adopting the supposed true mean result of the method? The drawing of a mean result rests, as I have frequently explained, upon the assumption that every error acting in one direction will probably be balanced by other errors acting in an opposite direction. If then we know or can possibly discover any causes of error not agreeing with this assumption, we shall be justified in excluding results which seem to be affected by such causes.

In reducing large series of astronomical observations, it is not uncommon to meet with numbers differing from others by a whole degree or half a degree, or some considerable integral quantity. These are errors which could hardly arise in the act of observation or in instrumental irregularity; but they might readily be accounted for by misreading of figures or mistaking of division marks. It would be absurd to trust to chance that such mistakes would balance each other in the long run, and it is therefore better to correct arbitrarily the supposed mistake, or better still, if new observations can be made, to strike out the divergent numbers altogether. When results come sometimes too great or too small in a regular manner, we should suspect that some part of the instrument slips through a definite space, or that a definite cause of error enters at times, and not at others. We should then make it a point of prime importance to discover the exact nature and amount of such an error, and either prevent its occurrence for the future or else introduce a corresponding correction. In many researches the whole difficulty will consist in this detection and avoidance of sources of error. Professor Roscoe found that the presence of phosphorus caused serious and almost unavoidable errors in the determination of the atomic weight of vanadium.‍[295] Herschel, in reducing his observations of double stars at the Cape of Good Hope, was perplexed by an unaccountable difference of the angles of position as measured by the seven-feet equatorial and the twenty-feet reflector telescopes, and after a careful investigation was obliged to be contented with introducing a correction experimentally determined.‍[296]
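The search for mistakes of a whole or half degree can be reduced to a simple rule, sketched below under stated assumptions. The readings are hypothetical, the median is taken as a provisional centre because it is little disturbed by a few gross mistakes, and the step of half a degree and the tolerance are arbitrary choices; the function name `suspected_misreadings` is illustrative only. The sketch merely points out the suspected numbers; whether to correct them or strike them out remains a matter of judgment.

```python
import statistics

def suspected_misreadings(observations_deg, step=0.5, tolerance=0.02):
    """Flag readings lying close to a non-zero multiple of `step` from the median.

    Returns a list of (index, value, suspected_offset) tuples.
    """
    centre = statistics.median(observations_deg)
    flagged = []
    for index, value in enumerate(observations_deg):
        deviation = value - centre
        steps = round(deviation / step)  # nearest whole number of steps
        if steps != 0 and abs(deviation - steps * step) <= tolerance:
            flagged.append((index, value, steps * step))
    return flagged

# Hypothetical series: two readings appear to be misread by 1 and by 1/2 degree.
readings = [41.203, 41.207, 41.199, 42.205, 41.201, 40.704, 41.204]
for index, value, offset in suspected_misreadings(readings):
    print(index, value, offset)
```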

When observations are sufficiently numerous it seems desirable to project the apparent errors into a curve, and then to observe whether this curve exhibits the symmetrical and characteristic form of the curve of error. If so, it may be inferred that the errors arise from many minute independent sources, and probably compensate each other in the mean result. Any considerable irregularity will indicate the existence of one-sided or large causes of error, which should be made the subject of investigation.
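A rough numerical counterpart of this inspection is sketched below. It is not a formal test: the two series of apparent errors are hypothetical, the number of classes is arbitrary, and the coefficient of skewness is here assumed to serve as a crude measure of want of symmetry. The function names are illustrative, and the values are assumed not to be all equal.

```python
import statistics

def skewness(values):
    """Mean cubed standardised deviation; near zero for a symmetrical series."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the values are not all equal
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

def error_counts(values, bins=7):
    """Count the apparent errors (deviations from the mean) in equal classes."""
    mean = statistics.fmean(values)
    residuals = [v - mean for v in values]
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for r in residuals:
        k = min(int((r - lo) / width), bins - 1)  # put the largest error in the last class
        counts[k] += 1
    return counts

symmetric = [-3, -2, -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
one_sided = [0, 0, 0, 0, 1, 1, 2, 5, 9]

for label, data in (("symmetric", symmetric), ("one-sided", one_sided)):
    print(label, error_counts(data), round(skewness(data), 2))
```

A series whose counts rise and fall evenly about the middle class, with skewness near zero, agrees with the curve of error; a long tail on one side suggests a one-sided cause deserving investigation.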