In productive labour, especially where payment is based upon the number of standard articles produced in a day, or upon the number of standard operations performed in a given time, the records of actual performance are probably the best measures of success available as a standard against which to judge the reliability of a test. The record for one day or for one week would usually be less reliable than the record for a month or a longer period.
In many business organizations and industries there is no such satisfactory standard of success as individual production records, and in such cases it is necessary to make use of the judgments of foremen, supervisors, or superintendents. These are far less satisfactory records of efficiency and are subject to gross errors and prejudices, but they are the only measures available for many workers. If the rating as to ability is the consensus of the judgments of two or more supervisors, each making his rating without any reference to that made by any other person, the result is much more reliable than the rating of any single supervisor would be.
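The pooling of independent judgments described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a procedure prescribed by the text: it assumes that every supervisor has independently ranked the same workers (1 = most able) and it takes the mean rank as the consensus. The worker labels, the rankings, and the function name `consensus_order` are all hypothetical.

```python
from statistics import mean

def consensus_order(rankings):
    """Combine independent rankings of the same workers into one order.

    `rankings` is a list of dicts, one per supervisor, mapping each worker
    to the rank that supervisor assigned (1 = most able). Averaging the
    ranks is an assumption of this sketch; the text only asks for a consensus.
    """
    workers = set(rankings[0])
    # Every supervisor must have rated exactly the same people.
    if any(set(r) != workers for r in rankings):
        raise ValueError("all supervisors must rank the same workers")
    mean_rank = {w: mean(r[w] for r in rankings) for w in workers}
    # Sort by mean rank; break ties alphabetically so the result is stable.
    return sorted(workers, key=lambda w: (mean_rank[w], w))

# Hypothetical rankings made by three supervisors working independently.
supervisor_1 = {"A": 1, "B": 2, "C": 3, "D": 4}
supervisor_2 = {"A": 2, "B": 1, "C": 4, "D": 3}
supervisor_3 = {"A": 1, "B": 3, "C": 2, "D": 4}

pooled = consensus_order([supervisor_1, supervisor_2, supervisor_3])
print(pooled)
```

The check that all supervisors rated the same workers is deliberate: a pooled order is meaningful only for individuals who were judged by the same people on the same scale.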
Very grave errors creep into a rating of efficiency where the ratings are made by different supervisors, each supervisor rating only a few men. Even where a detailed schedule of qualities is listed, each to be given a definite weight or importance in making up the total rating, as in the Army Rating Scale, the degree of ability which one man’s experience leads him to call “Average” will call forth a rating of “Superior” from another equally able supervisor whose experience has been with slightly different people. If individuals A, B, and C are rated by the first supervisor and individuals D, E, and F by the second, it is not at all safe to assume that C is rated fairly in relation to D. Only when two individuals are rated by the same supervisors upon the same scale and under the same conditions is it legitimate or safe to assume that their relative abilities are well indicated by the ratings.
Assuming that the reader has obtained a reliable order of merit for the individuals he is using as a check upon the value of the Mentimeter tests, no test should be considered useful which does not result in approximately this same order of merit. The tests are, of course, so short and so crude that it is not to be expected that any test will, except by chance, show exactly the same order of ability as the production records or supervisor’s ratings furnish, but some tests will show much closer correspondence than others. Those tests which correspond most closely should be employed, while those tests which do not correspond at all should not be employed, regardless of any statement of the authors or any preconceived ideas of the reader as to what tests ought to foretell ability in any particular line of work. The proof of a test or of any method of prognostication lies in the degree to which it actually arranges people in the order of their relative efficiency in the tasks for which one seeks to foretell success.
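One common way to express in a single number how closely the order given by a test agrees with the order of merit is Spearman's rank-difference coefficient. It is offered here as an illustration and is not necessarily the calculation used elsewhere in this book. In the formula, n is the number of individuals and d_i is the difference between the rank of individual i in the test and his rank in the order of merit; the formula in this form assumes that no two individuals are tied in either ranking.

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
```

A value of +1 means that the two orders are identical, a value of -1 means that one order is exactly the reverse of the other, and a value near 0 means that the test order tells almost nothing about the order of merit.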
A mere glance at a record such as that shown below for twenty-eight sixth-grade pupils would show that there was a real relationship between the scholarship marks, the teacher’s estimate of intelligence, and the results of educational measurements taken by an outsider.
SCORES AND RATINGS OF SIXTH-GRADE CLASS

| NAME OF PUPIL | EDUCATIONAL MEASUREMENTS SCORE (NO. OF ERRORS) | TEACHER’S RANKING OF INTELLIGENCE (1 IS BRIGHTEST) | SUMMARY OF TEACHER’S MARKS IN SCHOLARSHIP |
|---|---|---|---|
| Adelaide | 36.0 | 19 | 85 |
| Ruth | 16.5 | 15 | 90 |
| Alexander | 25.5 | 7 | 93 |
| LaMonte | 46.5 | 6 | 93 |
| Earl | 76.5 | 18 | 77 |
| Joseph | 20.5 | 20 | 85 |
| Amadeo | 75.0 | 14 | 85 |
| Leo | 48.0 | 3 | 93 |
| William | 53.5 | 9 | 82 |
| Isabel | 25.0 | 21 | 76 |
| Ida | 36.5 | 4 | 94 |
| Hazel | 15.0 | 10 | 90 |
| Frederick | 65.0 | 26 | 86 |
| Charles | 58.5 | 13 | 85 |
| Edward | 30.0 | 1 | 95 |
| Benjamin | 62.5 | 24 | 76 |
| Bruce | 56.0 | 22 | 87 |
| Alden | 55.0 | 12 | 87 |
| George | 60.5 | 17 | 87 |
| Alice | 29.0 | 11 | 88 |
| Almira | 15.5 | 5 | 96 |
| Helen | 16.5 | 2 | 90 |
| Elizabeth | 65.5 | 23 | 75 |
| Amelia | 24.5 | 8 | 92 |
| Edwin | 19.0 | 16 | 89 |
| Robert | 67.0 | 28 | 71 |
| Edna | 47.0 | 27 | 78 |
| Samuel | 72.0 | 25 | 80 |
The things which are not so evident at a glance are the degrees of relationship between these three types of measures. Is the relation of educational measurements to the teacher’s estimates greater than the relation of the measurements to the marks in scholarship given by the teacher? In order to measure precisely the relative degrees of correspondence between various measures and estimates of the abilities of individuals, it is quite evident that something more accurate and exact than mere inspection is necessary.
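The question just raised can be put to the figures in the table above. The sketch below is one possible way of doing so and rests on assumptions that are not in the text: it uses Spearman's rank correlation (the ordinary correlation of the ranks, with tied values sharing their mean rank), and every function and variable name is illustrative. Because fewer errors and a lower rank number both mean a better pupil while a higher mark means a better pupil, the marks are negated so that all three measures point the same way.

```python
from math import sqrt

# (errors on educational measurements, teacher's rank, scholarship mark),
# copied from the sixth-grade table above, in the same pupil order.
RECORDS = [
    (36.0, 19, 85), (16.5, 15, 90), (25.5, 7, 93), (46.5, 6, 93),
    (76.5, 18, 77), (20.5, 20, 85), (75.0, 14, 85), (48.0, 3, 93),
    (53.5, 9, 82), (25.0, 21, 76), (36.5, 4, 94), (15.0, 10, 90),
    (65.0, 26, 86), (58.5, 13, 85), (30.0, 1, 95), (62.5, 24, 76),
    (56.0, 22, 87), (55.0, 12, 87), (60.5, 17, 87), (29.0, 11, 88),
    (15.5, 5, 96), (16.5, 2, 90), (65.5, 23, 75), (24.5, 8, 92),
    (19.0, 16, 89), (67.0, 28, 71), (47.0, 27, 78), (72.0, 25, 80),
]

def average_ranks(values):
    """Rank values from smallest (1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation of two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rank_correlation(x, y):
    """Spearman's coefficient: the Pearson correlation of the averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

errors = [r[0] for r in RECORDS]
teacher_rank = [r[1] for r in RECORDS]
neg_marks = [-r[2] for r in RECORDS]  # negated so that "smaller is better"

errors_vs_estimate = rank_correlation(errors, teacher_rank)
errors_vs_marks = rank_correlation(errors, neg_marks)
estimate_vs_marks = rank_correlation(teacher_rank, neg_marks)

print(f"measurements vs. teacher's estimate: {errors_vs_estimate:.2f}")
print(f"measurements vs. scholarship marks:  {errors_vs_marks:.2f}")
print(f"teacher's estimate vs. marks:        {estimate_vs_marks:.2f}")
```

Comparing the first two printed coefficients answers the question of which relation is the closer; inspection of the table alone cannot settle it.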
For an explanation of the method by which the exact relationship may be worked out mathematically between the results of a test and the true abilities of the individuals tested, the reader is referred to pages 326–331 in the appendix. The discussion which will be found there of the method of calculating a coefficient of coördination will not be difficult to understand, nor will the method be difficult to apply for anyone who wishes to measure the exact reliability of any of the Mentimeter tests or of any other test. For many purposes such a record as is shown in the table above, giving the score of the individual in each test used, will reveal the essential facts regarding the correspondence between test results and demonstrated ability. The reader should be cautious, however, about accepting a conclusion drawn from casual observation of such a table without checking the accuracy of this conclusion by actually working out the coefficient of coördination according to the method shown in the appendix.
When the reader has tried out, upon a fairly large group of persons of known ability, the Mentimeter tests which seem to him to promise greatest usefulness, and when he has made his calculations and discovered which tests actually do classify his people most accurately, it will then be possible for him to make an intelligent scientific selection of tests for practical use. Let us suppose, for example, that an employer wishes to have a set of tests whereby he may select intelligent sales-girls. By giving the ten or twelve tests which seem most hopeful for the purpose to fifty or sixty saleswomen, who have been in his employ long enough to demonstrate their relative degrees of ability and intelligence, the five or six tests may be chosen whose results show the closest relation to their demonstrated ability for intelligent salesmanship.
The results obtained by the separate tests chosen should also be compared, for two tests may measure practically the same mental trait and have a very high coördination with each other. In such a case, it would be wasteful to retain in the group two tests which measured the same phase of ability. The one of the pair which showed the less close relationship to the true ranking might be dropped from the list without much loss to the total effectiveness of the group of tests. A group of tests thus carefully selected would prove very helpful and effective in the selection of untrained material for training or in the classification of experienced employees according to their intellectual qualifications for the type of position held by the people on whom the validity of the tests had been proved.
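The two steps just described, choosing the tests that agree best with demonstrated ability and then dropping any test that largely duplicates a better one, might be sketched as follows. Everything specific here is an assumption made for illustration: the eight employees, the four test rankings, the names `test_1` to `test_4`, and the two thresholds `min_validity` and `max_overlap` are hypothetical, and the rank-difference coefficient stands in for whatever measure of correspondence the reader prefers.

```python
def rank_difference_rho(ranks_x, ranks_y):
    """Spearman's rank-difference coefficient for two rankings without ties."""
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

def select_tests(criterion, test_ranks, min_validity=0.5, max_overlap=0.9):
    """Keep tests that agree with the criterion and do not duplicate each other.

    Tests are considered from the closest agreement with the criterion
    downward. A test is kept only if its agreement reaches `min_validity`
    and its agreement with every test already kept is below `max_overlap`.
    Both thresholds are illustrative, not values taken from the text.
    """
    validity = {
        name: rank_difference_rho(ranks, criterion)
        for name, ranks in test_ranks.items()
    }
    kept = []
    for name in sorted(validity, key=validity.get, reverse=True):
        if validity[name] < min_validity:
            continue  # too little relation to demonstrated ability
        duplicates_a_kept_test = any(
            rank_difference_rho(test_ranks[name], test_ranks[other]) >= max_overlap
            for other in kept
        )
        if not duplicates_a_kept_test:
            kept.append(name)
    return kept

# Hypothetical data: eight employees ranked by demonstrated ability
# (the criterion) and by each of four trial tests (1 = best).
criterion = [1, 2, 3, 4, 5, 6, 7, 8]
trial_tests = {
    "test_1": [1, 2, 3, 4, 5, 6, 8, 7],
    "test_2": [2, 1, 3, 4, 5, 6, 8, 7],
    "test_3": [3, 1, 2, 6, 4, 5, 8, 7],
    "test_4": [5, 8, 2, 7, 1, 4, 3, 6],
}

chosen = select_tests(criterion, trial_tests)
print(chosen)
```

In this made-up example `test_2` agrees well with the criterion but agrees even more closely with `test_1`, so it adds little and is dropped, while `test_4` is rejected because it bears almost no relation to demonstrated ability.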