The Psychology of Arithmetic - Edward L. Thorndike

5. A teacher weighed all the children in a certain grade. One girl weighed 70 pounds. Her older sister was 49 pounds heavier. How many pounds did the sister weigh?

Revision. Mary weighs 70 lb. Jane weighs 49 pounds more than Mary. Jane weighs .... pounds.

A problem described as clearly and simply as possible should be distinguished from the same problem put awkwardly, in unfamiliar words, or willfully obscured; as a rule, measurements of ability to apply arithmetic should avoid all needless obscurity and purely linguistic difficulty. For example,

A boy bought a two-cent stamp. He gave the man in the store 10 cents. The right change was .... cents.

is better as a test than

If a boy, purchasing a two-cent stamp, gave a ten-cent stamp in payment, what change should he be expected to receive in return?

The description of a bona fide problem that a human being might be called on to solve out of school should likewise be distinguished from the description of imaginary possibilities or puzzles. Nos. 3 and 9 of Stone are bad because to frame the problems one must first know the answers, so that in reality there could never be any point in solving them. It is probably safe to say that nobody in the world ever did, ever will, or ever should find the number of apples in a box by the task of No. 4 of the Courtis Test 8.

This attaches no blame to Dr. Stone or to Mr. Courtis. Until very recently we were all so used to the artificial problems of the traditional sort that we did not expect anything better; and so blind to the language demands of described problems that we did not see their very great influence. Courtis himself has been active in reform and has pointed out ('13, p. 4 f.) the defects in his Tests 6 and 8.

"Tests Nos. 6 and 8, the so-called reasoning tests, have proved the least satisfactory of the series. The judgments of various teachers and superintendents as to the inequalities of the units in any one test, and of the differences between the different editions of the same test, have proved the need of investigating these questions. Tests of adults in many lines of commercial work have yielded in many cases lower scores than those of the average eighth grade children. At the same time the scores of certain individuals of marked ability have been high, and there appears to be a general relation between ability in these tests and accuracy in the abstract work. The most significant facts, however, have been the difficulties experienced by teachers in attempting to remedy the defects in reasoning. It is certain that the tests measure abilities of value but the abilities are probably not what they seem to be. In an attempt to measure the value of different units, for instance, as many problems as possible were constructed based upon a single situation. Twenty-one varieties were secured by varying the relative form of the question and the relative position of the different phrases. One of these proved nineteen times as hard as another as measured by the number of mistakes made by the children; yet the cause of the difference was merely the changes in the phrasing. This and other facts of the same kind seem to show that Tests 6 and 8 measure mainly the ability to read."

The scientific measurement of the abilities and achievements concerned with applied arithmetic or problem-solving is thus a matter for the future. In the case of described problems a beginning has been made in the series which form a part of the National Intelligence Tests ['20], one of which is shown on page 49 f. In the case of problems with real situations, nothing in systematic form is yet available.