THEORY AND PRINCIPLE OF PSYCHOLOGICAL TESTS AS APPLIED TO VOCATIONAL ANALYSIS
The more general questions of the theory of tests, their selection, evaluation, and technique of application and record, need not be considered here. The reader unfamiliar with these matters will find them fully treated in the various standard manuals of tests, and in numerous special articles and monographs referred to in the bibliography.
There are, however, certain particular aspects of the theory and use of mental tests which have special importance for vocational psychology. These are:
1. The question of the degree to which proficiency in one respect, ability, or test implies proficiency in others.
2. The degree to which these intercorrelations are revealed by preliminary trials and modified by continued practice.
3. The question of the significance of preliminary trials in revealing the relative abilities of individuals as these would be shown after all the individuals had acquired their maximum skill or practice level of proficiency; that is, the relation between momentary capacity and ultimate achievement.
Attempts to intercorrelate mental or motor abilities as measured by laboratory tests have usually produced more or less irregular results. Some of the coefficients have been positive, some negative, but in only a few cases have many of them been large when the individuals tested have been chosen at random or with no deliberate intention of measuring only the extremes of the curve of distribution. Thus in a recent report of the correlations of abilities among several hundred adult individuals it is remarked that a certain test for logical memory is "one of the very best tests," partly because of "its high correlation with other tests" (an average correlation of .29).
Two reasons are largely responsible for these low coefficients. The first is the fact that the measures correlated have usually been initial trials, or at most averages of a very few trials. This means great individual variability and considerable consequent unreliability of the data. A more important factor, perhaps, is the fact that these preliminary trials do not necessarily represent the final capacities of the individuals. They are determined by a host of incidental or accidental influences and reveal only momentary ability, not ultimate capacity. There is every reason for expecting to find positive correlation of "desirable" traits, and we may well expect to find this increasingly true the more our measures test the final limits of capacity in the various tests. In other words, the only real correction for unreliable measures is to be made by continuing the test until the individual has reached the limit of practice in it.
Only occasional attempts have been made to determine the influence of practice on the correlation of abilities, and those that have been reported have been based on so few practice trials that no review of them need be given. In the present chapter I shall present the results of an experiment in which a group of observers were repeatedly tested until in each test a practice limit was approximated, a limit which, in most cases, one hundred further trials failed to improve. The results have a real interest for vocational psychology.
The experiment consisted in putting each of thirteen individuals through 205 repetitions of seven different mental tests. The trials were controlled as thoroughly as possible with respect to such factors as interim occupation, exercise, food, rotation of tests, temperature, illumination, and incentive and interest. The subjects, four women and nine men, ranging from eighteen to thirty-nine years in age, were mature, zealous, and faithful. Competition was stimulated by the award of desirable prizes, and each worker received a daily wage. Records were announced to the subjects only after each thirty-five trials. So far as previous practice in these particular tests is concerned, all the subjects were naïve. Five trials were made daily, these trials being distributed through the day at about two-hour intervals. The tests themselves occupied about forty minutes at each sitting.
The tests used were the following familiar laboratory forms:
1. Adding. Adding seventeen mentally to each of fifty two-place numbers and reciting aloud the correct answer. Order of numbers random at each trial. Record with stop watch, time required for perfect score.
2. Naming Opposites. Correctly naming opposites of each of fifty adjectives which occurred each time in random order. Record, time required for a perfect score.
3. Color Naming. The Columbia laboratory form of this test, with ten repetitions of each of twelve colors. Position of card changed at each trial. Record, time required for perfect score.
4. Discrimination Reaction. Discriminating between red and blue, and reacting correctly with appropriate hand. Record, average time, in sigma, and number of false reactions.
5. Cancellation. Crossing out digits from the Woodworth-Wells form of this test. Record, time required for 75 correct cancellations of equally difficult digits.
6. Coördination. The familiar three-hole test, for accuracy of aim. Record, time required for one hundred correct strokes.
7. Tapping. Executing four hundred taps at maximal speed, with hand stylus, right hand, elbow support. Record, time required.
Each test has been correlated[16] with all the remaining tests at various points in the curve of practice. Correlations were made at each of the following points:
| 1. | Preliminary trial | designated | 1st trial |
| 2. | Median of first 5 trials | designated | 5th trial |
| 3. | Median of trials 21 to 25 | designated | 25th trial |
| 4. | Median of trials 76 to 80 | designated | 80th trial |
| 5. | Median of trials 201 to 205 | designated | 205th trial |
At each of these points the thirteen individuals were arranged in an order of relative ability for each of the tests, and these orders were correlated with each other. Table 23 gives, for each test, at each point, the average correlation with all the other tests, and also the grand average correlations of all tests.
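The procedure of arranging the individuals in order of ability and correlating the orders can be sketched as follows. This is a minimal illustration, assuming the rank-difference (Spearman) coefficient; the text does not state which formula was used, and the function names and the five sample times are hypothetical.

```python
def ranks(scores):
    """Rank scores from 1 (shortest time) upward; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(x, y):
    """Correlate two orders of ability (Spearman: Pearson applied to the ranks)."""
    return pearson(ranks(x), ranks(y))


# Hypothetical times in seconds for five subjects on two tests (not data from the experiment).
adding_times = [62, 55, 71, 48, 66]
opposites_times = [40, 38, 45, 35, 50]
print(round(rank_correlation(adding_times, opposites_times), 2))
```

For each test, the figure entered in the table would then be the mean of such coefficients taken against every other test.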
TABLE 23
Showing the Average Correlation of Each Test with All Others, at Various Points in the Curve of Practice
| Trial | Adding | Opposites | Color Naming | Discrimination | Coördination | Tapping | Final Average |
| 1 | .19 | .10 | .15 | -.07 | -.15 | .17 | .065 |
| 5 | .41 | .26 | .15 | .35 | .21 | .32 | .280 |
| 25 | .50 | .35 | .43 | .27 | .03 | .35 | .320 |
| 80 | .55 | .43 | .53 | .31 | .18 | .34 | .390 |
| 205 | .48 | .62 | .61 | .35 | .34 | .52 | .490 |
Except in the case of discrimination the effect of practice is to increase to a marked degree the intercorrelations of the various tests. Adding increases steadily up to the eightieth trial. Opposites and color naming gain even more steadily to the very end of the experiment, the increase in the coefficients being four to six fold. Tapping increases more slowly but no less certainly. In coördination the increase is very irregular, but the coefficients show, on the whole, a change from -.15 at the first trial to .34 at the finish. Only in the case of discrimination is there failure to increase after the fifth trial. In no case, after the preliminary trial, is there a negative coefficient among the average correlations, and indeed in only one case is there a coefficient smaller than .15. The final averages show steady increase from .065 at the preliminary to .28 at the fifth, .32 at the twenty-fifth, .39 at the eightieth, and .49 at the two-hundred-and-fifth trials. With practice, then, the average correlations of all tests become positive, and the coefficients become greater the longer the practice is continued.
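The summary figures quoted above can be checked directly against the rows of Table 23. The sketch below recomputes the final average of each row from the six test columns printed in the table and the fold increase for opposites and color naming; the variable names are illustrative only, and the tolerance of .005 is an assumption allowing for the rounding of the printed averages.

```python
# Rows of Table 23: Adding, Opposites, Color Naming, Discrimination, Coordination, Tapping.
table_23 = {
    1:   [.19, .10, .15, -.07, -.15, .17],
    5:   [.41, .26, .15, .35, .21, .32],
    25:  [.50, .35, .43, .27, .03, .35],
    80:  [.55, .43, .53, .31, .18, .34],
    205: [.48, .62, .61, .35, .34, .52],
}
printed_average = {1: .065, 5: .280, 25: .320, 80: .390, 205: .490}

# Mean of the six coefficients in each row.
row_means = {trial: sum(values) / len(values) for trial, values in table_23.items()}
for trial, mean in row_means.items():
    print(f"trial {trial:>3}: computed {mean:.3f}, printed {printed_average[trial]:.3f}")

# Fold increase from the first to the 205th trial (column 1 = Opposites, column 2 = Color Naming).
opposites_fold = table_23[205][1] / table_23[1][1]
color_fold = table_23[205][2] / table_23[1][2]
print(f"opposites: {opposites_fold:.1f}-fold, color naming: {color_fold:.1f}-fold")
```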
In producing this increase in the intercorrelation of specific abilities through the medium of practice, at least three different factors probably coöperate. These factors have not an equal significance for vocational psychology and its interests in tests.
One of the least important of these factors is the variability of individual performance. In the beginning of the experiment each individual is more variable than at later points in the curve. This momentary variability need not be supposed to affect all the tests in the same way nor all individuals in the same direction. This fact may then tend somewhat to reduce the correlation of the preliminary trials and may in some cases materially affect the first five or ten trials. Beyond the twenty-fifth trial the variability in these tests is much reduced, and particularly so in the measures here used, which are in all cases, after the preliminary trial, the medians of five successive trials.
Another factor that deserves mention is the possibility of change in the character of the tests themselves, through practice with them. It is quite probable, for example, that the opposites test comes, after many repetitions, to resemble more and more that type of process or function involved in color-naming. The responses become more and more intimately associated with the stimulus words, the suggested responses to each word become more and more limited in number and in most cases reduced to a single word for each stimulus. This state of affairs is true of color-naming at the very beginning of the experiment. As the order of the stimulus words is changed at each trial, the test may come to involve more and more the simple task of giving merely the quickest possible association of the right response, and the overcoming of inhibitions and interferences of a more or less general sort, with less and less emphasis on the element of selection. Much the same may also be true of the addition test. It is in these three tests that the increase in correlation is most marked, and the actual coefficients highest at the end of the experiment. Careful analysis of what takes place as one improves in these simple tests would no doubt yield interesting material.
But these two factors—decrease in variability and change in the character of the tests—seem to be far from sufficient to account for the results. The tapping test remains much the same type of process throughout, the only apparent modifications consisting of slight changes in method and perhaps some gradual changes in the muscles. There is certainly no reason for suspecting that tapping and opposites or tapping and discrimination become, as tests, more alike because of frequent repetition. But the increase in correlation is clear in both these cases. Again, it is well established that the discrimination reaction, in the form here used, also tends to become reflex through practice, the conscious discrimination coming only after the correct reaction is made. These experiments called for between 3,075 and 4,100 single discrimination reactions on the part of each observer, which would afford ample time for such a change to show itself. Mere change in the character of the test would then lead us to expect color-naming, opposites, and adding to come more and more to resemble discrimination reaction. But they do not, if the coefficients may be taken as evidence. The coefficients of these tests with discrimination show no tendency to increase, even by the end of the experiment. The assumption of increasing similarity in the character of these pairs of tests would seem gratuitous. Moreover, if there were such increase in similarity, and this be also supposed to account for the higher correlation of color-naming and opposites with adding, coördination and adding should show the same increase in correlation. Just the reverse is actually the case, the correlation of coördination and adding decreasing consistently.
Some further factor must then be responsible for the general increase in correlation, aside from decrease in variability (which affects only the first few trials) and progressive qualitative approximation of the tests (which is seen to be inadequate). The doctrine of "general ability" or "general intelligence" at once suggests itself in this connection. If there is such a thing as "general ability" or "general intelligence," we should expect all samplings of that ability to correlate more and more as the measures came to be truer samples. We might indeed expect to find evidences of this general ability only when measuring the "ultimate capacity" of the individuals concerned. The momentary ability revealed in initial trials, or even in the first half-dozen trials, in a given set of tests might well be expected to show only low degrees of correlation. These trials would not be measures of ultimate capacity, but would be largely determined by previous practice, chance variability, momentary attitude and initial method of attack. They would, in short, be samplings only of momentary ability, not of final capacity.
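The reasoning of this paragraph can be illustrated with a toy simulation. It is not the analysis used in the experiment; it assumes, purely for illustration, that each test score is the sum of one common "general ability" and an independent disturbance standing for chance variability and momentary attitude, and that practice shrinks the disturbance. All names and parameter values are hypothetical.

```python
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulated_correlation(noise_sd, n_people=2000, seed=0):
    """Correlation between two tests that share one common factor plus independent noise."""
    rng = random.Random(seed)
    general = [rng.gauss(0, 1) for _ in range(n_people)]
    test_a = [g + rng.gauss(0, noise_sd) for g in general]
    test_b = [g + rng.gauss(0, noise_sd) for g in general]
    return pearson(test_a, test_b)


# Large disturbance stands for initial trials; small disturbance for trials near the practice limit.
early = simulated_correlation(noise_sd=3.0)
late = simulated_correlation(noise_sd=0.5)
print(f"initial trials: {early:.2f}   near practice limit: {late:.2f}")
```

Under this assumed model the expected correlation is 1 / (1 + noise_sd squared), so it rises toward unity as the disturbance shrinks. The model says nothing about why discrimination fails to share in the increase.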
Or if the assumption of a common factor be rejected, the present evidence tends strongly to support our earlier conclusion concerning the positive correlation between desirable mental functions. Some form of the doctrine of "general ability," at any rate, seems to be supported. But the conclusion seems to call for the qualification that "general ability" shall have reference to final capacity rather than to momentary performance, if the correlations are to be high. If each individual be given the opportunity to attain his limit of efficiency, his highest level of performance, then, when these final limits are reached, individuals who excel their fellows in one type of work will also tend to excel in other types of work.
The theory and practice of tests has in the past been too content to rest its claims on the meager results of a few preliminary samplings of an individual's ability. The fact that, even when a great variety of such samplings of a given individual are aggregated and balanced off against one another, few results of real diagnostic value are achieved should be sufficient warning against this tendency. My conviction is that for this purpose we shall find it necessary to determine the individual's "limit of practice" in the various tests before we shall secure diagnostic results which will be verified by the individual's subsequent achievement in daily life. We should know much more than we now know concerning the tendency and meaning of such correlations as show close relation between initial performance and ultimate capacity. This is particularly true if we wish to extend the method of tests beyond educational diagnosis and to use them as a means of vocational guidance or of industrial selection. For educational diagnosis we wish primarily to know what kind of practice the individual most needs. For vocational and industrial purposes we need rather to know what limits the individual can eventually reach, in given kinds of performance, as the result of practice, and to what degree his present equipment of incentive renders probable the actual achievement of this limit.
On the question of the significance of preliminary trials and the effects of practice on the relative standing of individuals in their group, there are important facts to be considered. In the direct application of mental tests it has too often been assumed that the actual performance of an individual, in one or a dozen trials at a given task, is in some way or other significant of that individual's final capacity in such work. It is true that several investigators have studied the effects of practice on individual differences. These workers were interested above all in questions as to relative rate of improvement, or amount or permanence of gain. Such studies have produced suggestive results, although they have been based, for the most part, on records of only a few subjects or on relatively few practice trials.
To what degree are individual differences after a given number of trials indicative of the final maximum capacity of the individuals concerned? At what various rates do the determining factors enter into the practice curves of a group of workers? What manner and amount of displacement in their relative order of ability are thus produced? At what point or points in the curves do the individuals assume their final order of relative capacity after training? How do the replies to these questions vary with the character of the task?
In the case of the experiments already described, record has been here taken of the following points in the curves of practice:
| Preliminary trial | called | initial trial |
| Median of trials 1 to 5 | called | 5th trial |
| Median of trials 21 to 25 | called | 25th trial |
| Median of trials 46 to 50 | called | 50th trial |
| Median of trials 76 to 80 | called | 80th trial |
| Median of trials 126 to 130 | called | 130th trial |
| Median of trials 171 to 175 | called | 175th trial |
At each of these points the thirteen subjects were arranged in order of relative ability for the test at the given stage of practice. Each of these orders, or cross sections, of the group of practice curves was then correlated with the final order of position as shown in trials one hundred and seventy-one to one hundred and seventy-five. Table 24 gives the coefficients of correlation derived in this way. A careful study of this table will prove instructive.
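The method of cross sections may be sketched as follows. The practice curves are invented for four hypothetical subjects and are not data from the experiment; the exponential form of the curves, the use of the rank-difference formula, and the treatment of every point after the preliminary trial as the median of the five trials ending at the named trial are all assumptions made for illustration.

```python
import math
from statistics import median

# Hypothetical curves: (practice limit, initial excess over the limit, rate constant), in seconds.
SUBJECTS = {
    "A": (30.0, 10.0, 20.0),
    "B": (25.0, 30.0, 40.0),
    "C": (35.0, 2.0, 10.0),
    "D": (28.0, 20.0, 30.0),
}


def time_at(subject, trial):
    """Time required at a given trial under the assumed exponential practice curve."""
    limit, excess, rate = SUBJECTS[subject]
    return limit + excess * math.exp(-trial / rate)


def window_median(subject, last_trial, width=5):
    """Median time over the `width` trials ending at `last_trial`."""
    first = max(1, last_trial - width + 1)
    return median(time_at(subject, t) for t in range(first, last_trial + 1))


def ranks(scores):
    """Rank from 1 (shortest time) upward; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Rank-difference correlation, valid when there are no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


names = sorted(SUBJECTS)
# label -> (last trial of the window, number of trials in the window)
CHECKPOINTS = {
    "preliminary": (1, 1), "5th": (5, 5), "25th": (25, 5), "50th": (50, 5),
    "80th": (80, 5), "130th": (130, 5), "175th": (175, 5),
}
final_order = [window_median(s, 175) for s in names]
correlation_with_final = {
    label: spearman([window_median(s, last, width) for s in names], final_order)
    for label, (last, width) in CHECKPOINTS.items()
}
for label, rho in correlation_with_final.items():
    print(f"{label:>11}: {rho:+.2f}")
```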
TABLE 24
Showing the Correlation of Ultimate Capacity with Capacity at Different Points in the Curve of Learning
(See Text for Explanation)
| The Test | Preliminary | 5th Trial | 25th Trial | 50th Trial | 80th Trial | 130th Trial | Final Trial 175th |
| Adding | .15 | .19 | .87 | .87 | .97 | .96 | 1.00 |
| Opposites | -.08 | .62 | .49 | .83 | .94 | .98 | 1.00 |
| Color Naming | .68 | .89 | .86 | .91 | .97 | .97 | 1.00 |
| Discrimination | .68 | .62 | .60 | .50 | .50 | .79 | 1.00 |
| Cancellation | .67 | .68 | .88 | .69 | .93 | (1.00) | — |
| Coördination | .52 | .79 | .77 | .90 | .95 | (1.00) | — |
| Tapping | .23 | .48 | .63 | .68 | .69 | .89 | 1.00 |
| Averages | .41 | .61 | .73 | .77 | .85 | .92 | 1.00 |
It is at once evident that the preliminary trial is by no means always a measure of the final relative capacities of the individuals tested. The average coefficient increases from .41 at the preliminary trial, where all seven tests are included, to .92 at the one hundred and thirtieth trial, where the average covers only the five tests for which that trial is not itself the final order. As the trials proceed, then, the relative positions of the thirteen individuals become more and more definitely fixed, but in the beginning the indication is obscure. The rate of this process, however, varies with the test, and to a considerable degree. Adding shows changes in position such that a correlation of .87 is reached only at the twenty-fifth trial. Beyond this point there is little change, the eightieth and one hundred and thirtieth trials correlating about equally well (.97 and .96), and practically perfectly, with the final order. After twenty-five trials, then, the final capacities of the individuals in the adding test may be said to be indicated fairly accurately. Opposites, in the fiftieth trial, yields a coefficient (.83) nearly equal to that of addition in the twenty-fifth trial (.87), and by the eightieth trial the correlation may be said to be practically complete. Only after fifty trials, then, can the test be said to yield comparative measures which reflect the individual's final capacity in this form of controlled association. In the case of tapping it is only at the one hundred and thirtieth trial that the correlation with final position exceeds .69.
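The readings taken from Table 24 in this paragraph can be reproduced mechanically. The sketch below recomputes the row of averages and finds, for each test, the earliest point from which the correlation with the final order stays at or above a chosen level. The level of .85 is an arbitrary threshold chosen for illustration, not one proposed in the text, and the entries given as (1.00) in the table are treated as missing at the 130th trial.

```python
CHECKPOINTS = ["preliminary", "5th", "25th", "50th", "80th", "130th"]

# Coefficients from Table 24; None marks tests whose final order is the 130th trial itself.
table_24 = {
    "Adding":         [.15, .19, .87, .87, .97, .96],
    "Opposites":      [-.08, .62, .49, .83, .94, .98],
    "Color Naming":   [.68, .89, .86, .91, .97, .97],
    "Discrimination": [.68, .62, .60, .50, .50, .79],
    "Cancellation":   [.67, .68, .88, .69, .93, None],
    "Coördination":   [.52, .79, .77, .90, .95, None],
    "Tapping":        [.23, .48, .63, .68, .69, .89],
}
printed_averages = [.41, .61, .73, .77, .85, .92]


def column_average(index):
    """Mean of the recorded coefficients at one checkpoint, skipping missing entries."""
    values = [row[index] for row in table_24.values() if row[index] is not None]
    return sum(values) / len(values)


def stable_from(values, threshold):
    """Earliest checkpoint from which every recorded coefficient is at least `threshold`."""
    answer = None
    for label, value in reversed(list(zip(CHECKPOINTS, values))):
        if value is None:
            continue
        if value < threshold:
            break
        answer = label
    return answer


averages = [column_average(i) for i in range(len(CHECKPOINTS))]
settled = {test: stable_from(values, 0.85) for test, values in table_24.items()}
for test, label in settled.items():
    print(f"{test:<15} settled from: {label}")
```

A result of `None` means the test never reaches the chosen level before its final order, as with discrimination here.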
These results may be easily comprehended by thinking of each test (as for instance the tapping test) as a prolonged race, consisting of a large number of heats (205 separate trials). All individuals begin with a running start, their respective initial speeds depending on the momentum they have acquired through a certain amount of previous practice, and on such momentary ability and zeal as they possess at the time. But as the succeeding "heats" or trials occur some individuals who were originally in the lead begin to lose ground in relation to others who, though initially slower, are now speeding up and overtaking the leaders. Still others may retain their original relative positions to the end of the race. In the table of coefficients, a correlation of 1.00 indicates that at that point the ultimate relative positions of the contestants have at last become established. The nearer the figure approaches zero the more uncertain are the relative positions at the particular trial. To terminate the race at a point where the correlation is low and to reward the contestants according to the position they had reached at that point would be manifestly unfair to those who were still speeding up and partial to those who were losing ground.
Color-naming, discrimination, cancellation, and coördination show up to much greater advantage. Even the preliminary trials in these tests show fairly high correlations with the final orders. The first two of these show little change as practice proceeds. In the case of the latter two tests, although the initial correlations are fairly high, there is nevertheless considerable increase as the trials proceed.
The meaning of these results seems to be that before one attempts to interpret individual differences as disclosed by performance in such a series of simple tests, he should have clearly in mind the distinction between temporary proficiency and ultimate capacity. If he is interested, for example, in determining the vocational prospects of a youth, or the relative merits of candidates or culprits, it is important that he realize that relative abilities in many of these laboratory tests may be changed quite beyond recognition by continued work. It is highly desirable to know more than we now know concerning the degree to which initial and intermediate trials in these tests reflect final capacity. In the past the question seems hardly to have been asked. Individual differences in early trials, in some tests, are fairly significant of the working level to which the performer may be brought later. In other tests this is not the case. On the significance of these early trials may depend, in many cases, the vocational value of the particular test.
Changes in the nature of the tests, variations of methods of attack, and specific improvement in the directness, independence and rapidity of the special nervous connections concerned—these three factors would all declare themselves in the form of "changes in ability." A useful piece of work in the case of all tests will be the analysis of the nature of the changes resulting from practice. But in any case the presence of these changes in correlation shows that we are not, in early trials, measuring the same tendency or capacity in all performers. The concrete tasks of daily life doubtless show just such qualitative changes, during practice, as we may suppose to be present in some of these tests. Just as it is ultimate capacity in daily life that is, with a given set of incentives, most important, so in the laboratory the measurement of "ability after practice" ought to be more emphasized than it is at present.
If it is true that with practice all tests correlate with one another, so that an individual who is good in one type of work is also, when his practice level has been reached, good in other types of work, the task of vocational psychology is at once enormously simplified. In place of further search for special occupational tests adapted in some peculiar way to particular types of work, our task is rather that of extending the general intelligence scales until they represent higher and higher degrees of general ability.
It is quite probable that further advance in this direction will come, not from the elaboration or invention of more tests, but by the selection of a very few tests, and the examination of the final limits of practice with respect to them. The problem will then be the selection of sets of tests in which initial performance shows high correlation with ultimate capacity in the tests themselves, or else the laborious and undramatic, but perhaps preferable, alternative of continuing every test until the practice limit is reached by the individual. In the latter case it would be well to learn more about the nature and range of these limits than we know at present.
In so far as particular tasks are actually found to call for highly specialized aptitudes, for the detection of which tests are sought, there will be the further problem of correlating these various tests with the particular aptnesses or fitnesses toward the detection of which diagnosis is directed.
There will also be the problem of the alignment of the various types of work along the general intelligence scales, as rapidly as these are extended and elaborated. In so far as this method is followed, the task of selecting from candidates those best fitted for the accomplishment of special types of work will be easily handled. Vocational selection will readily find methods suited to its purposes. But vocational guidance, as distinguished from vocational selection, must for some time to come depend largely on the determination of interests, incentives, satisfactions, emotional values and preferences, and the discovery and direction of these through general channels of information and through the methods of industrial and pre-vocational education.
This is an arduous program. It calls for strenuous work on the part of investigators, patience and faithfulness on the part of observers, and wide coöperation of investigators with each other. From the immediately practical point of view it also offers an inviting opportunity to those foundations and individuals who are interested in supporting the further development of "the arts of social control over human nature."