The sciences tend, in general, to become more and more quantitative. All phenomena "exist in space and involve molecular movements, measurable in velocity and extent." The ideal of all sciences is thus to reduce all phenomena to measurements of mass and motion. This ideal is obviously far from being attained. Especially in the social sciences are quantitative measurements difficult, and in these sciences we must remain therefore at best in the region of shrewd guesses or fairly reliable probability.
Statistics and probability. While exact quantitative measurements are difficult in the social sciences, they are to an extent possible, and to that extent we can arrive at fairly accurate generalizations as to the probable occurrence of phenomena. There are many phenomena whose elements are so complex that they cannot be analyzed, nor can invariable causal relations among them be established.
In a study of the phenomena of the weather, for example, the phenomena are so exceedingly complex that anything approaching a complete statement of their elements is quite out of the question. The fallibility of most popular generalizations in these fields is evidence of the difficulty of dealing with such facts. Must we be content then simply to guess at such phenomena? ... In instances of this sort, another method ... becomes important: The Method of Statistics. In statistics we have an exact enumeration of cases. If a small number of cases does not enable us to detect the causal relations of a phenomenon, it sometimes happens that a large number, accurately counted, and taken from a field widely extended in time and space, will lead to a solution of the problem.[1]
[Footnote 1: Jones; Logic, Inductive and Deductive, p. 190.]
If we find, in a wide variety of instances, two phenomena occurring in a certain constant correlation, we infer a causal relation. If the variations in the frequency of one correspond to variations in the frequency of the other, there is a probability of more than a merely coincidental connection.
The correlation between phenomena may be measured mathematically; it is possible to express in figures the exact relations between the occurrence of one phenomenon and the occurrence of another. The number which expresses this relation is called the coefficient of correlation. This coefficient expresses relationship in terms of the mean values of the two series of phenomena by measuring the amount each individual phenomenon varies from its respective mean. Suppose, for example, that in correlating crime and unemployment, the coefficient of correlation were found to be .47. If in every case of unemployment crime were found and in every case of crime, unemployment, the coefficient of correlation would be +1. If crime were never found in unemployment, and unemployment never in crime, the coefficient of correlation would be -1, indicating a perfect inverse relationship. A coefficient of 0 would indicate that there is no linear relationship between the two series, though it would not by itself exclude a relationship of some other form. The coefficient of .47 would accordingly indicate a significant but not a "high" correlation between crime and unemployment.
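The computation described above can be sketched in a few lines. The sketch assumes the coefficient meant is the familiar product-moment (Pearson) coefficient, which the text does not name; the function name `pearson_r` and the small series of cases are hypothetical and serve only to show the limiting values of +1 and -1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two cases")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Amount by which each case varies from the mean of its own series
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("coefficient is undefined when a series does not vary")
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical cases (not real data): 1 = present, 0 = absent
unemployment = [1, 1, 0, 0, 1, 0]
crime_always_with = [1, 1, 0, 0, 1, 0]   # found in every case of unemployment
crime_never_with = [0, 0, 1, 1, 0, 1]    # found only where unemployment is absent

always_together = pearson_r(unemployment, crime_always_with)  # upper limit, +1
never_together = pearson_r(unemployment, crime_never_with)    # lower limit, -1
```

Real series would fall somewhere between these limits, as the figure of .47 in the example does.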
We cannot consider here all the details of statistical methods, but attention may be called to a few of the more significant features of the process. Statistics is a science, and consists in much more than the mere counting of cases.
With the collection of statistical data, only the first step has been taken. The statistics in that condition are only raw material showing nothing. They are not an instrument of investigation any more than a kiln of bricks is a monument of architecture. They need to be arranged, classified, tabulated, and brought into connection with other statistics by the statistician. Then only do they become an instrument of investigation, just as a tool is nothing more than a mass of wood or metal, except in the hands of a skilled workman.[1]
[Footnote 1: Mayo-Smith: Statistics and Sociology, p. 18.]
The essential steps in a statistical investigation are: (1) the collection of material, (2) its tabulation, (3) the summary, and (4) a critical examination of the results. The terms are almost self-explanatory. There are, however, several general points of method to be noted.
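The four steps may be illustrated by a minimal sketch. The records, the district names, and the threshold `MIN_CASES` are all hypothetical, and the "critical examination" shown is only one simple check among many that a statistician might apply.

```python
from statistics import mean

# (1) Collection: hypothetical raw records of (district, persons in household)
records = [
    ("north", 4), ("south", 6), ("north", 3),
    ("south", 5), ("north", 5), ("east", 40),
]

# (2) Tabulation: arrange the cases by class
table = {}
for district, size in records:
    table.setdefault(district, []).append(size)

# (3) Summary: reduce each class to a count and a mean
summary = {
    district: {"cases": len(sizes), "mean": mean(sizes)}
    for district, sizes in table.items()
}

# (4) Critical examination: flag classes that rest on too few cases
MIN_CASES = 2  # assumed threshold, chosen only for illustration
doubtful = [d for d, s in summary.items() if s["cases"] < MIN_CASES]
```

Here the mean for the "east" district rests on a single case, and the last step marks it as doubtful instead of letting it pass into the results unexamined.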