THE DEVELOPMENT OF PSYCHOLOGICAL TESTS

ORIGIN AND HISTORY OF TESTS

Barren as phrenology and physiognomics were of formulable and useful results, they nevertheless served the purpose of directing attention toward the study of individual differences in mental characteristics as a distinct branch of inquiry. The next step consisted in the semi-experimental plan of observing the individual's behavior under a variety of uncontrolled circumstances or on more carefully planned occasions, in the endeavor to secure more or less exact quantitative expressions of the degree to which he displayed certain types of ability. Underlying the various abilities and involved in them there were assumed to lie a limited number of faculties or powers of the mind. Each individual was conceived to possess much the same faculties, but in varying degrees or amounts or forms. Attention, memory, apperception, reasoning, will, feeling, etc., were the fundamental "faculties"; and differences in character were thought of as depending upon the varying amounts and interrelations of these fundamental faculties. In the endeavor to discover types of experiment which would measure these "faculties" it was found, in time, that a given "faculty" did not appear, on close examination, to be as unitary as it was formerly supposed to be. It was seen that to have a good memory for one kind of material did not at once signify a good memory for every sort of thing. Determination in one direction did not imply the general quality of resoluteness. It began to be realized that attention, memory, discrimination, and the other "faculties" are very much more highly specialized than these general names indicate.

The unitary soul had early been split up into the list of "faculties" or categories, and now these in turn came each to be split up into finer and finer aptitudes and tendencies, until, in the radical reaction of recent years, we find the human mind described as made up of an infinite number of independent connections or bonds between more or less specific stimulus and more or less definite response. The old "faculties" came now to be looked on as descriptive terms for certain rather general and abstracted characteristics of these multitudinous and detailed reaction tendencies, rather than as in themselves agents or powers or forces, as they were formerly conceived.

During this change in theoretical description and continuing into our present era of compromise and revision, methods were developed of measuring the amount and quality, or, more simply conceived, the speed, strength and regularity of mental and motor ability. Beginning in the form of experiments on sensory discrimination, reaction time and imagery type, and combined with physiological measurements of motor strength, rapidity and fatigue, these experiments developed, in certain hands, into what are now known as "mental tests." Since the principle and method of mental and physical tests is the chief characteristic of the present status of vocational psychology, and since the work of the immediate future seems destined to develop mainly in this same direction, we may profitably consider at this point the history and development of the mental test. We may later take up the general principle and theory of the test as an instrument of psychological analysis and diagnosis, with special reference to the requirements and implications of such tests as may be of service in vocational psychology. We shall then be in position to review the special vocational tests that have as yet been proposed, to evaluate their outstanding results, and to point to some of the more immediate prospects and problems under consideration by those interested in the application of psychological tests in vocational analysis and guidance.

We may begin with an account of the first definite attempt to explore systematically the personality of individuals by the method of tests. The "Columbia Freshman Tests" are of especial interest in the history of vocational psychology, since in their formulation and plan explicit thought was given to the practical use to which the results of tests might be put by the individuals examined, and by the statistical study of the results by students of the subject. In 1894, under the guidance of Professor Cattell, there was instituted the plan of testing the students of Columbia College during their first and fourth academic years. A description of the tests employed was published by Cattell and Farrand in 1896, and a statistical study of results was published by Wissler in 1901.

The motive back of these tests is well expressed in the following paragraph, which was also used as material for a test of logical memory:

"Tests such as we are now making are of value both for the advancement of science and for the information of the student who is tested. It is of importance for science to learn how people differ and on what factors these differences depend. If we can disentangle the complex influences of heredity and environment we may be able to apply our knowledge to guide human development. Then it is well for each of us to know in what way he differs from others. We may thus in some cases correct defects and develop aptitudes which we might otherwise neglect."

The nature of these Columbia tests and the method of recording and reporting them are indicated in the forms which were printed and used for this special purpose. (Samples of these are given in the Appendix.) They are given here not so much for the sake of the enumeration of the tests, since many of these are no longer in common use, but because of their historic interest for vocational psychology and because of the general plan outlined in them. In general this plan is that of accumulating measurements of a large number of individuals and thus showing each one how he compares with the normal or average, or where he stands in the general curve of distribution of the members of the group. These tests were applied to the same individuals on their entrance to and their graduation from college, in order to indicate changes that might have been made during the intervening period.

Especially interesting also are other blanks containing additional data, such as age, health, physical characteristics, physiognomic features, enumeration of stigmata, etc. In addition to the tests and measurements, the examiner, both before and after the interview, recorded his general impression of the individual, in the terms indicated on the blank form. We shall have occasion to refer to these judgments of general impression in more detail when we come to consider the use of the interview and the testimonial in vocational psychology. Account was also taken of the gymnasium records of the student, as to nationality, birth, parentage, habits, health, etc.

The Columbia tests may be thought of as representative of several similar projects developed in this country and in Germany, France and England by many workers. The names of Galton, Cattell, Kraepelin, Binet, Henri, and Jastrow stand out conspicuously in the early history of mental tests. The first step was thus the invention, description and trial of a great number of miscellaneous tests, with little analysis of the tests themselves, the nature of the functions tested by them, or their relation to each other. Aside from the strictly motor and physical tests those devised were mainly of so-called intellectual character: measurements of speed and accuracy with which certain definite tasks could be accomplished. They were, moreover, very simple in character, not necessarily related to the work of daily life, with only a single or but a few trials made on each individual. Tests of affective and volitional factors were slower in developing. Little account was taken of interests, instinctive and emotional characteristics, attitudes, adaptation, methods of attack, limits of ability after practice, or many other aspects of individuality which later work has shown to be important.

The next step in the development of tests consisted in the coöperative effort to standardize the nature and methods, the conditions and mode of record. Many hands had part in this process, until in recent years, through publication, comparison and discussion of the subject, fairly uniform principles of technique, record, and treatment of measures have been agreed upon. This made possible the comparison of results secured by different investigators, and facilitated the statistical treatment of the data, so that later work might profit by what had already been tried or accomplished by earlier workers. After many years of this sort of coöperative work, another series of studies was inaugurated to attempt what has come to be known as "testing the tests." These studies proceeded by examining into the degree to which the various tests correlate with each other, with other indications of the individual's ability, with age, sex, health, education, school standing, special training, etc. Such questions as the following will suggest the problems involved in "testing the tests."

1. Which of the various tests correlate with each other?

2. What correlation exists between mental and motor abilities?

3. Do the tests measure fundamental qualities or general powers of the individual, or specialized capacities, or perhaps mainly the effect of general or special training?

4. If they measure general qualities, which of the existing tests are the best for this purpose?

5. How many trials are needed to afford a reliable index of the individual's ability?

6. What are the principal incidental factors that influence the result of tests?

7. Which tests are most easily influenced or disturbed by extraneous factors?

8. Can tests of the simpler laboratory type be used to indicate the individual's ability as shown in his daily work and play?

9. How simple or complex should the various tests be in order to give the best results?

10. How many tests, and which, are required to give a fairly correct picture of the individual's psychological make-up?

11. To what degree do preliminary trials indicate the final capacity of an individual?

12. Does the intercorrelation of tests change in any way with practice, repetition, and familiarity with the material?

13. Just what mental functions may the particular tests be said to measure?

14. How important are these functions in practical, educational and vocational life?

15. By what amounts and in what various ways do individuals differ among themselves in such abilities as the tests measure?

16. Are there other important aspects of psychological constitution and equipment for which there now exist no adequate tests?

The investigation of these numerous problems has resulted in the accumulation of a considerable literature of mental tests. Many of the earlier forms of tests were abandoned because of their unsatisfactory or meaningless character. Others have been retained and improved in form, and many new ones are constantly being devised and elaborated, described and standardized. The precautions to be observed, the instructions to be given, and the methods of record and interpretation have been presented in various books and manuals. The tests have been developed for more and more complex functions, and now relate not only to relatively simple capacities but to highly elaborate and subtle forms of achievement. As rapidly as is consistent with accuracy, norms and standards of performance for different ages, school grades, vocational requirements, etc., are being accumulated and reported. Typical charts of age norms in selected tests are given in the Appendix.

As the tests have thus developed they have been organized for a variety of special purposes, such as for school measurement, educational diagnosis, clinical examination, laboratory experiment, and more recently for the purposes of vocational guidance and selection. Among the first of these to develop systematically, and also the ones with the most immediate vocational application, are the graded intelligence scales, which shall be our next concern.

GRADED INTELLIGENCE SCALES AND NORMS

An important step in the history of general tests is represented by the accumulation of norms and standards of performance for the different selected tests, and the arrangement of scales of tests with increasing difficulty, as further aids in fixing the individual's status.

After a standardized and tested form of test has been selected, norms of performance are accumulated by applying the test to large numbers of persons of the same general type. The classification may be on the basis of age, school grade, occupation, nationality, etc. In this way it becomes possible to determine for a given individual how he compares with other members of his group; whether he is above or below the average, and how far; whether he would belong among the best ten, or the poorest ten, or the third ten, etc., of one hundred selected at random. Such norms also reveal to what degree the tested ability varies with the other factors, on the basis of which the group was selected, as age, sex, education, size, health, race, etc.
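The comparison of an individual with his norm group may be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the function names are hypothetical, higher scores are assumed to mean better performance, and no published norms are reproduced.

```python
from bisect import bisect_left


def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    if not norm_scores:
        raise ValueError("the norm group must contain at least one score")
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores strictly lower
    return 100.0 * below / len(ordered)


def tenth_from_top(score, norm_scores):
    """Which 'ten in a hundred' the score falls in: 1 = best ten, 10 = poorest ten.

    Assumption: higher scores are better. For a timed test, where a lower
    time is better, negate the values before calling.
    """
    rank = percentile_rank(score, norm_scores)
    return max(1, 10 - int(rank // 10))
```

With a hypothetical norm group of one hundred persons, `tenth_from_top` reports whether a given record belongs among the best ten, the third ten, the poorest ten, and so on.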

As rapidly as reliable norms are established, it becomes possible to select for each age, school grade, occupation, etc., a set of tests which the average person of that age, schooling or calling should be able to perform to a certain known degree of proficiency. Failure to accomplish this indicates performance lower than that expected and, in so far as success is dependent solely on mental ability, indicates inferior capacity. Similarly, ability to do more than the average or normal record requires indicates a capacity that is precocious, rare, and superior.

In this way are derived standard graded scales which represent a decided advance in the science of psychological diagnosis. There are three rather different forms in which attempts have been made to secure such scales. In one form the scale consists of a series of steps, each step consisting of different sorts of performance; that is, different tests or tasks are used. These tasks are arranged in groups, each group representing tests which should be passed acceptably by individuals of the given age, school grade, etc. In another form of scale the type of task is the same throughout, but the different points on the scale are represented by increasingly difficult specimens of material. The scale thus presents graded steps of difficulty in doing the same general sort of thing. In the third form the task remains precisely the same throughout, and performance is measured in terms of the time in which the task can be completed and the accuracy which is displayed. Sometimes, in scales of this type, although the instructions are always the same, the test is performed with varying degrees of approximation to a qualitative standard, and the steps may then consist of these graded qualitative achievements.

As representative of the first form of scale we may refer to the widely used Binet-Simon scale for the determination of mental age. Whatever we mean by intelligence, it is a characteristic which is essential to vocational activity. It is furthermore a characteristic which normally tends to increase in its degree or manifestation from infancy up to at least ten or twelve years of age. Beyond that point there are, to be sure, striking individual differences in that characteristic which we call intelligence, but beyond this point it does not seem so dependent on the physical age of the organism. Five-year-old children tend to be pretty much alike in intelligence. At least, the change from five years to seven years is commonly attended by very apparent growth in this respect, and a five-year-old is more like other five-year-olds in the things he can do than he is like seven-year-olds.

Experiment and observation show that the ages up to ten or twelve tend to indicate rather definite mental status, in the long run, although, to be sure, children of a given age vary considerably from one another. But beyond this point the age of an individual is not by any means an indication of the sort or degree of ability to be expected of him. The further we go beyond this point, the less significant becomes the mere statement of the individual's age. We may thus indicate the mental attainment of a child of less than twelve years by stating the average age of children who can do the things, know the facts, display the abilities that he can. This figure we will use to indicate his mental age as distinguished from his chronological or physical or actual age. A record-blank which enumerates the tests comprising the Binet-Simon scale is given in the Appendix. Those who may be interested in using this or similar scales should familiarize themselves with some of the many books and manuals that have been written concerning them, the methods of using them, their characteristic results and their evaluation. These scales will be again considered in a later section, when we discuss the measures of general intelligence as they relate to vocational guidance and selection.
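The notion of mental age just defined may be illustrated by a minimal sketch. It is not the Binet-Simon scoring procedure, which the manuals describe. It assumes a hypothetical table giving the average score of children at each age, assumes that these averages rise strictly with age, and interpolates between adjacent ages. All names and figures are invented for illustration.

```python
def mental_age(score, age_norms):
    """Estimate the age whose average performance matches `score`.

    `age_norms` maps chronological age (years) to the average score of
    children of that age. Assumption: the averages increase strictly
    with age. Scores outside the table are clamped to its end points.
    """
    ages = sorted(age_norms)
    if score <= age_norms[ages[0]]:
        return float(ages[0])
    if score >= age_norms[ages[-1]]:
        return float(ages[-1])
    for younger, older in zip(ages, ages[1:]):
        low, high = age_norms[younger], age_norms[older]
        if low <= score < high:
            # Linear interpolation between the two neighboring ages.
            return younger + (older - younger) * (score - low) / (high - low)
    raise ValueError("age_norms must increase strictly with age")


# Hypothetical averages, not taken from any published scale.
EXAMPLE_NORMS = {5: 10, 6: 14, 7: 18, 8: 22}
```

A child of any chronological age who scores midway between the six-year and seven-year averages would, on this sketch, be assigned a mental age of six and one-half years.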

Other scales than the Binet-Simon series have been proposed, and this series has itself undergone modifications at the hands of later investigators—changes calculated to render it more reliable and adaptable. Much work is now being done in the attempt to develop scales or sets of tests which will reveal characteristic differences among people whose mentality has gone beyond the point which the juvenile scales reach.

The work of Trabue in standardizing the "completion test" so that individuals may be quantitatively compared on the basis of it may serve as an example of the second form of scale. This particular test consists in requiring the individual to supply meaningful words or phrases in the blank spaces formed by mutilating logical text. It is similar to the simple exercise sometimes found in elementary text books of grammar and spelling. It seems that the ability to supply the missing words or phrases quickly in such mutilated material calls for the exercise of a type of ability which correlates to a high degree with most other measures of intelligence. Individual differences as shown by school grades, age, opinion of teachers, estimates of associates, results of other mental tests, etc., are readily and with considerable reliability revealed in the individual's ability to perform this type of test. This investigator has, after much preliminary labor, constructed a form of this test in which the material gradually increases in difficulty from beginning to end. Efficiency in the test may be measured by the point one can reach in the text in a given time. This test has been standardized, not on the basis of physical age, as in the case of the Binet-Simon scale, but on the basis of school grade, from the second grade through the high school, some four or six years beyond the point where the Binet-Simon scale ceases to be useful. A copy of this test is also given in the Appendix. Those who wish to use it should consult the original description of it, for technique, precautions, norms, and interpretation.

A good example of the third form of scale is to be found in Sylvester's standardization of the "form-board" test. The "form-board" is one of the most useful tests in detecting intellectual defect that is so pronounced as to constitute the individual a "mental defective." Out of a solid base board are cut various geometrical forms, such as diamonds, stars, squares, triangular blocks, etc. These blocks are placed alongside the base from which they have been cut. The task is that of replacing all the blocks in their appropriate places, with the greatest possible speed. The test tends to reveal characteristic defects in understanding instructions, perceiving the general and specific situations, profiting by experience, recognizing form and size and other space relations, etc. The individual may work blind-folded or may use his eyes.

In the standardized form the sizes, shapes and positions are uniformly adopted and the technique of instruction and procedure is specified. Under these conditions the time required to complete the task by normal children of the ages five to fourteen years has been recorded. Sylvester presents a curve based on the examination of 1,537 normal children. The curve shows the average time of performance for each age and also indicates the range of performance for each age. In the case of a given individual it is thus easy, by referring to the standard table of norms, to determine whether he is up to the normal record for his age, whether he is within the normal range of variation for this age, and how deficient or precocious he may be in this respect. Tables of this type are now being accumulated for a great variety of single standard tests.
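Reference to such a table of norms may be sketched as follows. The structure and names are assumptions made for illustration, and the figures in the example are invented; they are not Sylvester's published values.

```python
from typing import NamedTuple


class AgeNorm(NamedTuple):
    """Norm for one age on a timed test (all values hypothetical)."""
    mean_seconds: float
    fastest_seconds: float
    slowest_seconds: float


def compare_to_norm(age, seconds, norms):
    """Compare one timed performance with the norm for the child's age.

    Returns whether the time lies within the normal range for that age,
    and the difference from the age average (positive = slower).
    """
    if age not in norms:
        raise KeyError(f"no norm recorded for age {age}")
    norm = norms[age]
    return {
        "within_normal_range": norm.fastest_seconds <= seconds <= norm.slowest_seconds,
        "seconds_slower_than_average": seconds - norm.mean_seconds,
    }


# Invented figures for a single age, for illustration only.
EXAMPLE_NORMS = {7: AgeNorm(mean_seconds=30.0, fastest_seconds=20.0, slowest_seconds=45.0)}
```

A time slower than the slowest normal record for the age would suggest deficiency in this respect, and a time faster than the fastest would suggest precocity; how large a departure is significant is a matter for the published norms, not for this sketch.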

In addition to scales of this type, which proceed by setting for the individual a graded series of tasks and determining his success in their accomplishment, there is a further type of graded scale which is now represented by several standard specimens. This is the type of scale which is designed to afford an instrument for the measurement of such products as the actual work of the individual incidentally yields. Thorndike's "Scale for the Measurement of Handwriting" is the model on which many of the later scales of this type have been based. In this scale actual specimens of handwriting are arranged in a graduated series in such a way that the steps from specimen to specimen are equally appreciable or noticeable, and in this sense uniform. When such a scale extends from an actual zero point, it is possible to "measure" the quality of handwriting in quite the same way as that in which one measures the height of an individual or the length of a table. The quantitative measure consists in the statement of the number of stages which intervene between that quality of product represented by the specimen and the zero point of the scale. The position assigned to the specimen being measured is determined by moving the specimen along the graded series of standards until a point is reached where the specimen seems, on the basis of direct inspection, to belong. Such scales have been formulated for various special forms of school work, such as handwriting, drawing, arithmetic, literary composition, mechanical construction, etc. By such means it is possible not only to measure the "general intelligence" of the worker, but also his actual ability in creating a definite type of product. There seems to be no limit to the possibilities of scales of this form, and their value in determining the more definite and particular capacities, whether from the point of view of original endowment or from the point of view of the effects of training, is obvious.

These various scales for measuring general intelligence have been used chiefly for the purposes of educational diagnosis, in determining the degree of backwardness of children in the grades, their need for special educational attention, or the hopelessness of further pedagogical effort with them. But it is obvious at once that tests of this type are of great use to an employer in eliminating, from among the candidates for work, those who are hopelessly mentally defective, feeble-minded, and irresponsible. There are many sorts of work in which the employment of feeble-minded persons, unrecognizable as such by their physical traits or by a casual inspection, not only entails loss and annoyance but may constitute a positive danger and constant menace to those who rely on the defective individual. Such work as that of delivery boys, messengers, domestic servants, nurses, elevator operators, drivers, motormen, etc., may be cited as instances of work into which the feeble-minded easily slip, unless there is some standardized means of recognizing them.

The importance of detecting these incompetents and keeping them from work in which their irresponsibility means economic waste and personal and social danger is of distinct vocational interest. Studies of cases brought to the Clearing House for Mental Defectives in New York City show that of the first two hundred and eighty-one feeble-minded women of child-bearing age, about two-thirds had been engaged in some form of economic labor in which their incompetence was distinctly dangerous to those associated with them. The following table shows how these two hundred and eighty-one feeble-minded women had been employed:

Occupation                                     Number
Living at home and assisting at simple tasks       94
Domestic service (families, bars, hotels, etc.)    67
Engaged in factory operations                      21
Living in institutions, reformatories, asylums     20
Prostitutes                                        30
Laundresses                                         5
Working in stores, clerking, errands, etc.          5
Nursemaids                                          9
Odd jobs                                            6
Married and keeping house                          11
Housework, with relatives                          13
Total                                             281

The investigators originally reporting these data write as follows: "These defective women had borne eighty-nine illegitimate children, which were acknowledged and could be somewhat definitely located, and sixteen women were illegitimately pregnant at the time of their examination at the Clearing House. Twenty-four of the two hundred and eighty-one had married and these had borne forty-six legitimate children. The average mental age of the illegitimate mothers was nine years."

The employment of feeble-minded women as domestics, factory operatives, laundresses, clerks, and nursemaids constitutes not only a nuisance to the general public, but a real source of inefficiency and danger to the community. Graded scales for the measurement of intelligence will have amply repaid the labor devoted to their formulation if they aid us in the proper segregation and vocational supervision of the mentally defective. The feeble-minded boy is more likely to be observed in the natural course of things, because of the more strictly competitive types of work into which boys customarily go, but it is far from realized how much loss of property, life, and general happiness is entailed upon the community by the indiscriminate employment of untested boys and men as floating employees.

But the vocational value of the graded intelligence scales and norms is not limited to the work of detecting and eliminating the feeble-minded. Many of the tests as now standardized yield measures of intelligence, capacity and comprehension ranging far above the level which constitutes the borderline of mental defect. Some of them reach somewhat higher than the average intelligence and capacity of the college freshman. It is thus possible, through the use of the graded scales, to measure in quantitative terms the general intelligence as well as various more special capacities of applicants and candidates for positions for which general intelligence is the chief requisite. Such tests are now used in many places in the selection of clerical workers, telephone operators, stenographers, waitresses, motormen, salesmen, office help, inspectors, watchmen, soldiers, and special types of factory workers. Thus Trabue reports a study in which Professor Scott tested thirty efficiency experts employed by a large industrial concern in New England. Ten psychological tests were used, including a completion test. The men were also judged on the basis of their relative abilities by the members of the firm. The combined tests correlated with the combined judgments, giving the very high coefficient of .87. The completion test alone yielded a coefficient of .64. From the point of view of vocational selection we may expect the principle of the graded intelligence scale to become increasingly valuable as more and more norms are established. The first definite contribution of vocational psychology is thus not so much toward the guidance of the individual worker as for the guidance of the employer who may be required to select from a number of applicants those whose general intellectual equipment is most adequate.

But we shall later have occasion to point out a further contribution which this makes possible, in so far as it may enable us to classify the operations involved in various types of work and to align these operations and tasks along the general intelligence scale. Such alignment will enable us to specify the approximate degree of general intelligence which a given position demands, and thus, in the case of the simpler tasks, afford a means of vocational guidance as well as vocational selection.
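The coefficients of .87 and .64 cited above from Scott's study express the degree of agreement between test results and the judgments of the firm. The report does not say which coefficient was computed. The sketch below assumes the ordinary product-moment (Pearson) coefficient and uses a hypothetical function name. It is offered only to show what such a figure summarizes.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of measures.

    Assumption: this is the coefficient intended; a rank-order coefficient
    would be computed differently.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list does not vary")
    return sxy / sqrt(sxx * syy)
```

A value near 1 means that the men ranked high by the tests were also ranked high by the judges; a value near 0 means that the tests tell us little about the judges' opinion.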