COMPUTER SIMULATION RESULTS

A computer simulation of some of the network features previously described has been made on an IBM 7090. Networks with an excess of elements and with only positive interconnecting weights were used. However, in place of the variable-bias method, the element whose sum was closest to, and on the wrong side of, zero was simply chosen, without regard to the effectiveness of that element in correcting the final output. No fatigue factors were used.
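This selection rule can be made concrete with a short sketch. The Python fragment below is illustrative only: the +/-1 signal convention, the unit increment, and the names (weighted_sum, adapt) are assumptions rather than details of the original program, and the restriction to positive interconnecting weights is not enforced here.

    def weighted_sum(w, x):
        """Sum of weights times +/-1 inputs for one threshold element."""
        return sum(wi * xi for wi, xi in zip(w, x))

    def adapt(candidates, desired):
        """One adaptation step after a final-output error.

        candidates: (weights, inputs) pairs for the adaptive elements,
                    with the inputs each element saw on the erroneous
                    presentation.
        desired:    the desired final output, +1 or -1.
        """
        # "Wrong side of zero" is read here as: the element's sum disagrees
        # in sign with the desired output.  With all-positive interconnecting
        # weights, raising any element's output raises the final sum, which
        # makes this reading plausible; it is nevertheless an assumption.
        wrong = [(w, x) for (w, x) in candidates
                 if (1 if weighted_sum(w, x) >= 0 else -1) != desired]
        if not wrong:
            return
        # Pick the element whose sum is closest to zero -- the cheapest one
        # to push across the threshold -- without regard to how much
        # flipping it actually helps the final output.
        w, x = min(wrong, key=lambda wx: abs(weighted_sum(*wx)))
        for i, xi in enumerate(x):
            w[i] += desired * xi    # unit increment toward the desired side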

The results of these simulations are very encouraging, but at the same time indicate the need for the more sophisticated methods. No attempt will be made here to describe the results completely.

In one series of learning experiments, a 22-element network was used which had three layers: 10 elements in the first, 11 in the second, and 1 in the third. The single element in the third layer was the final output, and was a fixed majority function of the 11 elements in the second layer. These in turn each received inputs from each of the 10 elements in the first layer and from each of the 6 basic inputs. The 10 elements in the first layer each received only the 6 basic inputs. A set of four logical functions, A, B, C, and D, was used. Function A was actually a linear threshold function which could be generated by the weights 8, 7, 6, 5, 4, 3, 2; functions B and C were chosen by randomly filling in a truth table; and D was the parity function.
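For concreteness, this structure can be sketched as follows, using the weighted_sum helper above. Only the topology (10 first-layer elements on the 6 basic inputs, 11 second-layer elements on the 10 first-layer outputs plus the 6 basic inputs, and a fixed majority vote of the 11) is taken from the description; the initial weight ranges and +/-1 signal levels are assumptions.

    import random

    def make_net(seed=0):
        """Topology from the text; initial weight ranges are assumptions."""
        rng = random.Random(seed)
        return {
            # 10 first-layer elements, each seeing only the 6 basic inputs.
            "layer1": [[rng.choice([-2, -1, 1, 2]) for _ in range(6)]
                       for _ in range(10)],
            # 11 second-layer elements: 10 positive interconnecting weights
            # from the first layer, then 6 weights on the basic inputs.
            "layer2": [[rng.randint(1, 3) for _ in range(10)] +
                       [rng.choice([-2, -1, 1, 2]) for _ in range(6)]
                       for _ in range(11)],
        }

    def forward(net, basic):
        """Returns the fixed-majority final output together with the
        (weights, inputs) pairs of the 21 adaptive elements."""
        candidates = []
        out1 = []
        for w in net["layer1"]:
            candidates.append((w, list(basic)))
            out1.append(1 if weighted_sum(w, basic) >= 0 else -1)
        out2 = []
        for w in net["layer2"]:
            x = out1 + list(basic)
            candidates.append((w, x))
            out2.append(1 if weighted_sum(w, x) >= 0 else -1)
        # Fixed majority function of the 11 second-layer outputs
        # (an odd count, so there are no ties).
        return (1 if sum(out2) > 0 else -1), candidates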

TABLE I
  A         B         C         D
 r   e     r   e     r   e     r   e
 5  54     8 100    11 101     4  52
 4  37     9  85     4  60     5  62
 4  44     6  72     9  85     6  56

Table I gives the results of one series of runs with these functions and this network, starting with various random initial weights. The quantity r is the number of complete passes through the 64-entry truth table before the function was completely learned, while e is the total number of errors made; a sketch of this run procedure is given after Table II. In evaluating the results it should be noted that an ideal learning device would make an average of 32 errors altogether on each run, since on the first presentation of each of the 64 entries even an ideal device can do no better than guess, erring half the time on average. The totals recorded in these runs are agreeably close to this ideal. As expected, the linear threshold function is the easiest to learn, but it is surprising that the parity function was substantially easier than the two randomly chosen functions.

Table II gives a chastening result: the same experiment with all interconnecting weights removed, except that the final element was a fixed majority function of the other 21 elements. Thus there was adaptation on one layer only. As can be seen, Table I is hardly better than Table II, so the value of variable interconnecting weights was not being fully realized. In a later experiment the number of elements was reduced to 12 and the same functions were used. In this case the presence of extra interconnecting weights actually proved to be a hindrance! However, a close examination of the incrementing process showed that the troublesome behavior was due to the greater chance of having only a few elements (often only one) do nearly all the incrementing. It is expected that the use of the additional refinements discussed herein will produce a considerable improvement in bringing out the full power of adaptation in multiple layers of a network.

TABLE II
  A         B         C         D
 r   e     r   e     r   e     r   e
 7  47    18 192     8 110     4  48
 3  40     7  69    10  98     6  68
 4  43     7  82     4  47     6  46
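As a rough illustration of how the quantities r and e in Tables I and II can be tallied, the driver below repeats passes through the 64-entry truth table, adapting on each error, until a pass is error-free. It builds on the forward and adapt sketches above; the pass limit, the exact stopping convention, and the demonstration function are assumptions, giving one plausible reading of the procedure rather than the original tally.

    from itertools import product
    from math import prod

    def run(net, target, max_passes=200):
        """Returns (r, e): complete passes containing errors before an
        error-free pass, and total errors made along the way."""
        table = [tuple(1 if b else -1 for b in bits)
                 for bits in product((0, 1), repeat=6)]   # 64 entries
        e = 0
        for r in range(1, max_passes + 1):
            pass_errors = 0
            for x in table:
                desired = target(x)
                final, candidates = forward(net, x)
                if final != desired:
                    pass_errors += 1
                    adapt(candidates, desired)   # adapt after each error
            e += pass_errors
            if pass_errors == 0:
                return r - 1, e    # learned: count the preceding passes
        return max_passes, e       # pass limit reached without learning

    # Example: function D, six-input parity (the product of +/-1 inputs).
    net = make_net()
    print(run(net, lambda x: prod(x)))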