L. H. MacDANIELS and S. S. ATWOOD, Cornell University

All its members would agree that the Northern Nut Growers Association should have an officially accepted schedule for judging black walnuts and the other kinds of nuts with which it is concerned. Members of the Association need a common yardstick as a basis for comparing varieties. Persons familiar with nut varieties are frequently asked about the best varieties to plant. Of course, there is no simple answer to such a question, since many factors besides the nuts themselves determine the value of a variety. The quality and value of the nuts are, however, the most important initial consideration in selecting a variety on its merits, and some objective test should be adopted to aid in evaluating nut samples.

During the many years that the Northern Nut Growers Association has been operating, more than a hundred and fifty varieties of black walnuts have been named. Yet at present we are not certain, except in a very general way, which are the better varieties. No judging schedule is in wide use, as is evident from the tables published by Seward Berhow in his paper in the 1945 Proceedings (2). Scores are given in these tables, but they come from several sources, are not comparable, and hence are of little value for comparing varieties.

Many schedules for judging black walnuts have been presented in the past. One of the first was proposed by the late Willard G. Bixby (3, 4). It was complicated and never came into general use, although the testing done by Mr. Bixby was a valuable contribution to our knowledge of varieties. The late N. F. Drake tested many varieties through the years according to a schedule of his own devising (5, 6). Professor Drake's schedule was related to his concept of a perfect walnut, and the various values were related to this ideal on a percentage basis. This schedule never gained wider acceptance, chiefly because it was too complicated and required too much figuring.

Mr. C. A. Reed has probably tested more varieties of nuts, and is more familiar with varieties, than any other person, but he does not use a definite scoring schedule. Kline and Chase (7) summarized the results of the testing work that had been done, and Kline (8) compared varieties by a system of his own in which they were rated in terms of return per hour of labor spent in cracking and extracting the kernels. Mr. C. C. Lounsberry has proposed a method of scoring related to kernel cavity measurement (9).

In 1935 a Committee on Varieties and Standards endeavored to formulate a working schedule that could be adopted as official. This committee set up a score that represented the best thinking of the group at that time (1). Samples of twenty-five nuts were used. The score was the sum of the weight of an individual nut in grams, plus twice the percentage of kernel (on the weight of the nuts) recovered in the first crack, plus the total percentage of kernel, plus 1/10 of a point for each quarter kernel recovered. Penalties were proposed for shrunken kernels and empty nuts. Through the years a large number of samples have been tested according to this scoring schedule (11). In 1943, MacDaniels and Wilde (12) summarized the previous work, added many tests and evaluated the scoring system, which was not considered altogether satisfactory. In the first place, it was somewhat cumbersome; it had never been adopted by the Association, nor had it been used much by others. The figuring of percentages and penalties made the score too involved for wide acceptance. A very serious difficulty was the problem of shrunken kernels and empty nuts. Because the score was related to the weight of the sample before cracking, the inclusion of a number of empty nuts made it impossible to correct accurately the percentages used in the score. Penalties did not solve the problem. The initial weight of the sample also varied with the amount of husk clinging to the shells. From this work it was evident that an acceptable score would have to be formulated on some other basis.
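The 1935 score can be sketched as a small function, which also shows why empty nuts were troublesome: every term except the quarter count is divided by the weight of the uncracked sample. This is an interpretation, not the Committee's own worksheet. It assumes "the weight of an individual nut" means the sample weight divided by the number of nuts, that both kernel percentages are taken on the uncracked sample weight, and it omits the penalties because their values are not given here. The function name and the sample figures are hypothetical.

```python
def score_1935(sample_weight_g: float, n_nuts: int, first_crack_kernel_g: float,
               total_kernel_g: float, quarters: int) -> float:
    """Hypothetical sketch of the 1935 Committee score (penalties omitted).

    Assumes the nut weight term is the mean weight per nut and that both
    kernel percentages are computed on the uncracked sample weight.
    """
    mean_nut_weight = sample_weight_g / n_nuts
    pct_first_crack = 100.0 * first_crack_kernel_g / sample_weight_g
    pct_total_kernel = 100.0 * total_kernel_g / sample_weight_g
    return mean_nut_weight + 2 * pct_first_crack + pct_total_kernel + 0.1 * quarters


if __name__ == "__main__":
    # Hypothetical 25-nut sample: 500 g of nuts, 60 g of kernel on first crack,
    # 100 g of kernel in all, 40 quarters recovered.
    clean = score_1935(500.0, 25, 60.0, 100.0, 40)
    # Same kernel recovery, but three empty nuts (assumed 15 g of shell each)
    # are included in the weighed sample.
    with_empties = score_1935(545.0, 28, 60.0, 100.0, 40)
    print(round(clean, 1), round(with_empties, 1))  # 68.0 63.8
```

With identical kernels, the three empty shells lower the score from 68.0 to about 63.8 because they inflate the denominator of both percentages; clinging husk has the same effect.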

The next approach was to analyze data of this type statistically in an attempt to devise a better scoring system (1). The results of this study proved valuable in answering such questions as 1) the size of sample necessary to detect significant differences between samples; 2) the significance of small differences in measurements or in scores; and 3) the amount of variation that is normal and without significance in comparing varieties.

The following qualifications were considered essential to a workable schedule:

1) The schedule must be easy to use.

2) The schedule must concern itself with objective qualities or characters which can be weighed or measured. It cannot be concerned with flavor and other characters upon which there may be disagreement and which depend upon personal preference.

3) Characters that vary with the treatment of the samples themselves, such as color of kernels, must be avoided.

4) It must give a score that will separate samples on small differences.

Considering the problem from these angles and scrutinizing the older schedules led to a number of ideas. First of all, why include the shells? Discarding them would solve a number of problems, such as the cleaning of the nuts and adjustments for shrivelled and empty nuts. Also, why reduce any of the weights or measures to percentages, which only add to the complexity of the score? The actual amount of kernels recovered reflects both the size of the nuts and the yield of kernels. Plumpness of the kernels is reflected in the total weight of kernels and does not need to be considered separately.

The important elements in a score were considered to be:

1) The crackability of the nuts of the variety. This is measured by the weight of kernels obtained in the first crack.

2) The yield of the variety. This is measured in the total weight of kernels.

3) The marketability of the product. This can be measured by the number of pieces in the sample. In general, the smaller the number and the larger the size of the pieces the better the marketability.

With this general background in mind, many samples were tested and the results published in the 1945 report[1]. To secure the data needed, the kernels of the individual nuts in the samples were weighed separately.

NOTE: All samples were cracked with the (John W.) Hershey nut cracker.

Some of the conclusions drawn from these tests were as follows:

1) Using kernel weights only gives a rapid and accurate test of differences between varieties.

2) Ten nuts are adequate for a single sample.

3) The location of the tree with reference to climate and soil is probably the most important single factor influencing kernel yield. No evidence was obtained, however, that the varieties ranked significantly differently at different locations.

4) If reasonable care is used in cracking, the differences due to different operators tend to be non-significant.

Two of these findings are especially important in setting up a schedule: the statistical proof that a ten-nut sample is adequate, and the proof that differences between operators are not significant.

During the past year further testing has been done, with scores computed from ten-nut samples.[A] The samples were first given cool, dry storage to assure comparable moisture content. Enough nuts were cracked in each sample to secure ten that were well filled, and the empty nuts were recorded. The following data were kept for each sample:

1) The weight in grams of the kernels recovered in the first crack.

2) The total weight of the kernels in grams.

3) The number of quarters and number of halves recovered.
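The sampling procedure just described (crack until ten well-filled nuts are secured, tally the empties, keep three totals) can be sketched as follows. This is a hypothetical record-keeping sketch, not the authors' form: the per-nut record, the condition labels, and the rule that partly filled nuts are set aside without being scored are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CrackedNut:
    """One cracked nut (hypothetical per-nut record; the paper keeps sample totals)."""
    condition: str               # assumed labels: "well_filled", "poorly_filled", "empty"
    first_crack_g: float = 0.0   # kernel freed by the first crack
    total_kernel_g: float = 0.0  # all kernel recovered from the nut
    quarters: int = 0
    halves: int = 0


@dataclass
class SampleRecord:
    """The data kept for each sample, plus the tally of empty nuts."""
    first_crack_g: float
    total_kernel_g: float
    quarters: int
    halves: int
    empty_nuts: int
    nuts_cracked: int


def collect_sample(nuts: Iterable[CrackedNut], size: int = 10) -> SampleRecord:
    """Crack nuts in order until `size` well-filled ones have been recorded."""
    first_crack = total = 0.0
    quarters = halves = empties = cracked = kept = 0
    for nut in nuts:
        cracked += 1
        if nut.condition == "empty":
            empties += 1
            continue
        if nut.condition != "well_filled":
            continue  # assumption: partly filled nuts are set aside, not scored
        first_crack += nut.first_crack_g
        total += nut.total_kernel_g
        quarters += nut.quarters
        halves += nut.halves
        kept += 1
        if kept == size:
            return SampleRecord(first_crack, total, quarters, halves, empties, cracked)
    raise ValueError(f"only {kept} well-filled nuts available; {size} needed")
```

Because only well-filled nuts enter the totals, an empty nut changes the empty-nut tally but not the kernel weights, which is the point of dropping the shells from the score.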

Scores were computed as the sum of 1) the weight of the first crack in grams, 2) half the total weight of the kernels recovered in grams, 3) the number of quarters divided by four and 4) the number of halves divided by two. In this score, the crackability of the sample was considered to be measured by the weight of the first crack; the yield, by the total weight of kernels secured from the sample; and the marketability, by the number of quarters and halves. Under this schedule, scores ranged from 83.9 for the variety Thomas grown in Maryland to 37.4 for the variety Huen, a small nut giving a relatively small kernel yield.
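Schedule I is simple enough to write as one line of arithmetic. The sketch below is a direct transcription of the four terms; the function name and the sample figures are hypothetical, not measurements from the paper.

```python
def score_schedule_i(first_crack_g: float, total_kernel_g: float,
                     quarters: int, halves: int) -> float:
    """Schedule I for one ten-nut sample.

    first crack (g) + total kernel (g) / 2 + quarters / 4 + halves / 2
    """
    return first_crack_g + total_kernel_g / 2 + quarters / 4 + halves / 2


if __name__ == "__main__":
    # Hypothetical ten-nut sample, not a variety from the paper.
    print(score_schedule_i(45.0, 60.0, 30, 2))  # 83.5
```

Because the first crack enters at full weight and the total at half weight, 4 g more kernel freed on the first crack raises the score by 4 points, while 4 g more total kernel raises it by only 2.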

Analysis of the data to determine the percentage of the score derived from each component showed the following. Crackability, measured by the weight of kernels recovered in the first crack, contributed an average of 54% of the score, with a range of 49 to 58% among samples. Yield, measured by the total weight of kernels divided by two, contributed 31%, with a range of 27 to 34%. Marketability, measured by the number of quarters divided by four, contributed 14%, with a range of 10 to 22%, and the number of halves divided by two contributed 1%. The contribution from the number of halves was so small as to be negligible. It therefore seemed better to base the score on only three elements, namely the weight of the first crack, the total yield of kernels and the number of quarters recovered from the sample.
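The percentages above are each component's share of a sample's total score, summarized as an average and a range over samples. A minimal sketch of that bookkeeping follows; the function names and the two samples in the usage block are hypothetical, and the sketch assumes the average is a simple mean of per-sample shares.

```python
from statistics import mean


def component_shares(components: dict[str, float]) -> dict[str, float]:
    """Percentage of one sample's total score contributed by each component."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


def summarize_shares(samples: list[dict[str, float]]) -> dict[str, tuple[float, float, float]]:
    """Mean, minimum and maximum share of each component across samples."""
    per_sample = [component_shares(s) for s in samples]
    summary = {}
    for name in per_sample[0]:
        values = [shares[name] for shares in per_sample]
        summary[name] = (mean(values), min(values), max(values))
    return summary


if __name__ == "__main__":
    # Two hypothetical samples scored by schedule I, split into their four terms.
    samples = [
        {"first crack": 45.0, "total/2": 30.0, "quarters/4": 7.5, "halves/2": 1.0},
        {"first crack": 20.0, "total/2": 12.0, "quarters/4": 5.0, "halves/2": 0.5},
    ]
    for name, (avg, lo, hi) in summarize_shares(samples).items():
        print(f"{name}: {avg:.0f}% (range {lo:.0f} to {hi:.0f}%)")
```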

On this basis, the problem becomes one of deciding the weights to be given to these three components. The score as set up emphasizes the crackability of the variety much more than its marketability. This seems logical, because the value of a variety depends in large part upon the ease of recovering the kernels on first cracking. Several different weightings of the three components were considered, and the most logical was judged to be: 1) the weight of the first crack in grams, plus 2) the total weight of the kernels divided by two, plus 3) the number of quarters recovered divided by two. If there are halves, each half counts as two quarters.
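Schedule II, as just defined, can be sketched the same way, with halves converted to quarter equivalents before dividing by two. The function name and sample figures are hypothetical.

```python
def score_schedule_ii(first_crack_g: float, total_kernel_g: float,
                      quarters: int, halves: int = 0) -> float:
    """Schedule II for one ten-nut sample.

    first crack (g) + total kernel (g) / 2 + quarter equivalents / 2,
    where each half counts as two quarters.
    """
    quarter_equivalents = quarters + 2 * halves
    return first_crack_g + total_kernel_g / 2 + quarter_equivalents / 2


if __name__ == "__main__":
    # Same hypothetical sample as before: 45 g first crack, 60 g total,
    # 30 quarters and 2 halves.
    print(score_schedule_ii(45.0, 60.0, 30, 2))  # 92.0
```

Expanding the terms, schedule II equals first crack + total/2 + quarters/2 + halves, so it always exceeds schedule I by exactly quarters/4 + halves/2. That is why every sample scores higher under system II in the table that follows.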

Table I. Average scores from 18 black walnut samples cracked by three operators and computed by two scoring systems.

Variety	Source	Year	Scoring system (points)[3]
			I	II
Thomas	Maryland	'46	83.9	93.1
Snyder	Ithaca, N. Y. (A)	'46	81.8	89.2
Ohio	Maryland	'46	79.5	88.9
Thomas	Ithaca, N. Y. (A)	'46	76.4	85.5
Norris	Tennessee	'45	76.1	83.9
Stambaugh	Ithaca, N. Y. (A)	'46	75.9	81.0
Stambaugh	Ithaca, N. Y. (A)	'46	74.0	83.2
Thomas	Tennessee	'45	71.5	79.6
Thomas	Ithaca, N. Y. (B)	'46	65.7	74.6
Cornell	Ithaca, N. Y. (C)	'46	59.3	67.6
Stabler	Maryland	'45	56.9	64.5
Cresco	Ithaca, N. Y. (A)	'46	55.8	65.2
Seedling No. 1	Geneva, N. Y.	'46	52.7	62.2
Seedling No. 3	Geneva, N. Y.	'46	50.6	59.0
Brown	Ohio	'45	49.7	59.4
Stabler	Tennessee	'45	47.5	51.4
Seedling No. 2	Geneva, N. Y.	'46	44.4	52.2
Huen	Iowa	'46	37.4	44.9
Least significant difference (5%)			6.3	6.6

Calculated on this basis, the components contribute the following percentages of the total score: crackability 48%, yield 27% and marketability 25%. This schedule gives marketability relatively more weight, as against the other two components, than the first schedule did. The average scores of 18 samples cracked by three operators and calculated by both of the schedules described above are given in Table I.

The table shows that using only the three components did not materially change the rank of the different samples, except in a few cases in which there was an appreciable number of halves. The Stabler has many one-lobed nuts, which increase the number of halves recovered. Note that with both schedules the least significant difference at the 5% level is about 6 score points.
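As a rough check on the statement that ranks did not change materially, the averages in Table I can be re-ranked under each system and compared with a Spearman rank correlation. This is a later convenience, not a statistic the authors report; the function names are illustrative, and the 5% least significant differences (6.3 and 6.6) are copied from the table.

```python
from math import sqrt

# Table I averages: (variety, source, year, system I, system II).
TABLE_I = [
    ("Thomas", "Maryland", 1946, 83.9, 93.1),
    ("Snyder", "Ithaca, N. Y. (A)", 1946, 81.8, 89.2),
    ("Ohio", "Maryland", 1946, 79.5, 88.9),
    ("Thomas", "Ithaca, N. Y. (A)", 1946, 76.4, 85.5),
    ("Norris", "Tennessee", 1945, 76.1, 83.9),
    ("Stambaugh", "Ithaca, N. Y. (A)", 1946, 75.9, 81.0),
    ("Stambaugh", "Ithaca, N. Y. (A)", 1946, 74.0, 83.2),
    ("Thomas", "Tennessee", 1945, 71.5, 79.6),
    ("Thomas", "Ithaca, N. Y. (B)", 1946, 65.7, 74.6),
    ("Cornell", "Ithaca, N. Y. (C)", 1946, 59.3, 67.6),
    ("Stabler", "Maryland", 1945, 56.9, 64.5),
    ("Cresco", "Ithaca, N. Y. (A)", 1946, 55.8, 65.2),
    ("Seedling No. 1", "Geneva, N. Y.", 1946, 52.7, 62.2),
    ("Seedling No. 3", "Geneva, N. Y.", 1946, 50.6, 59.0),
    ("Brown", "Ohio", 1945, 49.7, 59.4),
    ("Stabler", "Tennessee", 1945, 47.5, 51.4),
    ("Seedling No. 2", "Geneva, N. Y.", 1946, 44.4, 52.2),
    ("Huen", "Iowa", 1946, 37.4, 44.9),
]
LSD_5PCT = {"I": 6.3, "II": 6.6}


def ranks(values):
    """Rank 1 = highest score; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rank_correlation(x, y):
    """Spearman's rho, computed as the Pearson correlation of the two rank lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def rank_moves(table):
    """Rows whose rank differs between systems: (row number, variety, rank I, rank II)."""
    r1 = ranks([row[3] for row in table])
    r2 = ranks([row[4] for row in table])
    return [(i + 1, row[0], r1[i], r2[i]) for i, row in enumerate(table) if r1[i] != r2[i]]


def significantly_different(score_a, score_b, lsd):
    """True when two average scores differ by more than the least significant difference."""
    return abs(score_a - score_b) > lsd


if __name__ == "__main__":
    rho = rank_correlation([r[3] for r in TABLE_I], [r[4] for r in TABLE_I])
    print(round(rho, 3))  # 0.992
    for move in rank_moves(TABLE_I):
        print(move)
```

On the table's averages the rank correlation is about 0.992: eight samples move, each by one place, as four adjacent pairs trade positions. Differences smaller than the least significant difference, such as Thomas (Maryland) against Snyder under system I (83.9 against 81.8), should not be read as real differences.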

Table II gives the score calculated by schedule II for five samples, each cracked by six operators. The difference between operators is not significant but the difference between varieties is highly significant.
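The paper does not show how the significance of operator and variety differences, or the least significant differences, were computed. A sketch, assuming the standard two-way analysis of variance with one score for each combination of $v$ varieties (or samples) and $o$ operators, is given below; the symbols are introduced here and are not the authors' notation.

```latex
\begin{aligned}
\mathrm{SS}_{\text{total}} &= \sum_{i=1}^{v}\sum_{j=1}^{o}\left(y_{ij}-\bar y_{\cdot\cdot}\right)^2
  = \underbrace{o\sum_{i=1}^{v}\left(\bar y_{i\cdot}-\bar y_{\cdot\cdot}\right)^2}_{\mathrm{SS}_V}
  + \underbrace{v\sum_{j=1}^{o}\left(\bar y_{\cdot j}-\bar y_{\cdot\cdot}\right)^2}_{\mathrm{SS}_O}
  + \mathrm{SS}_E,\\[4pt]
s^2 &= \frac{\mathrm{SS}_E}{(v-1)(o-1)},\qquad
F_V = \frac{\mathrm{SS}_V/(v-1)}{s^2},\qquad
F_O = \frac{\mathrm{SS}_O/(o-1)}{s^2},\\[4pt]
\mathrm{LSD}_{5\%} &= t_{0.05,\,(v-1)(o-1)}\,\sqrt{\frac{2s^2}{o}},
\end{aligned}
```

Here $y_{ij}$ is the score of variety $i$ cracked by operator $j$, and $t_{0.05,\,(v-1)(o-1)}$ is the two-tailed 5% point of Student's $t$. Under this layout, Table II ($v = 5$, $o = 6$) would have 20 error degrees of freedom, and a non-significant $F_O$ alongside a highly significant $F_V$ corresponds to the result stated above.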