NETWORKS OF ELEMENTS

It can be demonstrated that if a sufficiently large number of linear threshold elements is used, with the outputs of some being the inputs of others, then a final output can be produced which is any desired logical function of the inputs. The difficulty in such a network lies in the fact that we are no longer provided with knowledge of the correct output for each element, but only of the correct final output. If the final output is incorrect, there is no obvious way to determine which sets of weights should be altered.
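
As a concrete illustration (a minimal sketch in modern notation, not drawn from the original report), the following Python fragment builds a linear threshold element and wires three of them into a small network computing the exclusive-or of two ±1 inputs, a function no single element can generate. Note that only the final output can be judged right or wrong; the two hidden elements receive no such signal.

    import numpy as np

    def lte(weights, bias, x):
        # Linear threshold element: +1 if the weighted sum plus bias exceeds zero.
        return 1 if np.dot(weights, x) + bias > 0 else -1

    def xor_net(x1, x2):
        h1 = lte([1, -1], -1, [x1, x2])   # responds only to (+1, -1)
        h2 = lte([-1, 1], -1, [x1, x2])   # responds only to (-1, +1)
        return lte([1, 1], 1, [h1, h2])   # OR of the two hidden elements

    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print(x1, x2, '->', xor_net(x1, x2))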

As a result of considerable study and experimentation at Aeronutronic, a network model has been evolved which, it is felt, will get around these difficulties. It consists of four basic features which will now be described.

Positive Interconnecting Weights

It is proposed that all weights in elements attached to inputs which come from other elements in the network be restricted to positive values. (Weights attached to the original inputs to the network, of course, must be allowed to be of either sign.) The reason for such a restriction is this. If element 1 is an input to element 2 with weight c₁₂, element 2 to element 3 with weight c₂₃, etc., then the sign of the product, c₁₂c₂₃ ..., gives the sense of the effect of a change in the output of element 1 on the final element in the chain (assuming this is the only such chain between the two elements). If these various weights were of either possible sign, then a decision as to whether or not to change the output in element 1 to help correct an error in the final element would involve all weights in the chain. Moreover, since there would in general be a multiplicity of such chains, the decision is rendered impossibly difficult.

The above restriction removes this difficulty. If the output of any element in the network is changed, say, from -1 to +1, the effect on the final element, if it is affected at all, is in the same direction.
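
The point can be checked directly. In the toy fragment below (a sketch with illustrative numbers), the interconnecting weights into a final element are all positive; flipping any contributing element from -1 to +1 raises that element's contribution by twice its weight, so the final sum can only move upward.

    c = [0.4, 0.7, 0.2]      # positive interconnecting weights into the final element
    h = [-1, +1, -1]         # current outputs of three contributing elements
    before = sum(ci * hi for ci, hi in zip(c, h))
    h[0] = +1                # change element 0 from -1 to +1
    after = sum(ci * hi for ci, hi in zip(c, h))
    assert after > before    # the effect, if any, is always in the +1 direction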

It should be noted that this restriction does not seriously affect the logical capabilities of a network. In fact, if a certain logical function can be achieved in a network with the use of weights of unrestricted sign, then the same function can be generated in another network with only positive interconnecting weights and, at worst, twice the number of elements. In the worst case this is done by generating in the restricted network both the output and its complement for each element of the unrestricted network. (It is assumed that there are no loops in the network.)
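
The construction can be sketched as follows (an illustrative reading of the doubling argument, with hypothetical names). A negative interconnecting weight w on a signal h contributes wh to a sum; since the complement of h in the ±1 convention is simply -h, the positive weight |w| applied to the complement contributes |w|(-h) = wh, the same amount. Carrying both each output and its complement therefore lets every interconnecting weight be positive:

    def split_weights(weights):
        # Positive weights to apply to the (signal, complement) input pairs.
        pos = [w if w > 0 else 0.0 for w in weights]    # applied to h_i
        neg = [-w if w < 0 else 0.0 for w in weights]   # applied to -h_i
        return pos, neg

    # An element with mixed-sign weights [2, -3] over signals h1, h2:
    pos, neg = split_weights([2.0, -3.0])
    h = [+1, +1]
    comp = [-v for v in h]                              # the complements
    original = 2.0 * h[0] + (-3.0) * h[1]
    restricted = sum(p * v for p, v in zip(pos, h)) + \
                 sum(n * v for n, v in zip(neg, comp))
    assert original == restricted                       # identical sums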

A Variable Bias

The central problem in network learning is that of determining, for a given input, the set of elements whose outputs can be altered so as to correct the final output while doing the least damage to previous adaptations to other inputs. Once this set has been determined, the incrementing rule given for a single element applies here as well (subject to the restriction that interconnecting weights remain positive), since the desired final output coincides with the output desired of each element to be changed (again because of the positive interconnecting weights).

In arriving at such a decision, three factors need to be considered. First, the elements selected for change should tend to be those whose outputs would thereby be affected for a minimum number of other possible inputs. Second, it should be ascertained that a change in each of the elements in question does indeed contribute significantly toward correcting the final output. Finally, a minimum number of such elements should be used.

It would appear at first that this kind of decision is impossible to achieve if, as mentioned earlier, the complexity of the decision apparatus is to be kept comparable to that of the basic input-output network. However, in the method to be described, it is felt that a reasonable approximation to these requirements can be achieved without an undue increase in complexity.

It is assumed that in addition to its normal inputs, each element receives a variable input bias which we can call b. The output of every element should then be determined by the sign of the usual weighted sum of its inputs plus this bias quantity. This bias is to be the same for each element of the network. If b = 0 the network will behave as before. However, if b is increased gradually, various elements throughout the network will commence changing from -1 to +1, with one or a few changing at any one time as a rule. If b is decreased, the opposite will occur.
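
In symbols, each element now outputs +1 when its weighted sum plus b is positive, and -1 otherwise. The fragment below (a sketch with illustrative sums) shows the effect of sweeping b upward: elements turn from -1 to +1 in order of their weighted sums, one or a few at a time.

    def biased_output(weighted_sum, b):
        # Output of an element under the common bias b.
        return 1 if weighted_sum + b > 0 else -1

    sums = [-2.3, -0.4, 0.7, -1.1]   # weighted sums of four elements at b = 0
    for b in (0.0, 0.5, 1.5, 2.5):
        print(b, [biased_output(s, b) for s in sums])
    # At b = 0 the outputs are [-1, -1, 1, -1]; as b rises, the element
    # whose sum is nearest zero flips first, then the others in turn.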

Now suppose that for a given input the final output ought to be +1 but is actually -1. Assume that b is then raised until this final output is corrected, and then allowed to decline gradually. Various elements may revert to -1, but no weights are changed until the final output does. When the final output reverts to -1, it is because some element's sum (weighted sum plus bias) has just passed down through zero; this sets off a chain of changing elements reaching the final element, but presumably the originating element is the only one whose sum is at zero. This, then, is the signal for an element's weights to change: a change of the final output from right to wrong accompanied simultaneously by a zero sum in the element itself.

After such a weight change, the final output will be correct once more and the bias can again proceed to fall. Before it reaches zero, this process may occur a number of times throughout the network. When the bias finally stands at zero with the final output correct, the network is ready for the next input. Of course if -1 is desired, the bias will change in the opposite direction.
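
The whole cycle can be sketched in code. The fragment below is an illustrative reconstruction, not the original simulation: the network shape, the step size, the perceptron-style increment standing in for the single-element rule, and the choice of the wrong-output element with sum nearest zero as the one whose sum "just passed through zero" are all assumptions. The fixed final element, here a majority vote with positive interconnecting weights, receives no bias, a point returned to below.

    import numpy as np

    rng = np.random.default_rng(1)
    N_IN, N_HID = 4, 6
    W_in = rng.normal(size=(N_HID, N_IN))   # adjustable weights, either sign
    w_out = np.ones(N_HID)                  # fixed positive interconnections

    def forward(x, b):
        # Outputs of all elements under a common bias b on the adjustable layer.
        h_sum = W_in @ x + b
        h = np.where(h_sum > 0, 1.0, -1.0)
        y = 1.0 if w_out @ h > 0 else -1.0  # fixed final element: no bias
        return h, h_sum, y

    def present(x, desired, step=0.1, lr=0.5):
        # One presentation: drive the bias out until the final output is
        # correct, then let it fall back to zero, incrementing weights each
        # time the final output reverts together with a near-zero element sum.
        b = 0.0
        while forward(x, b)[2] != desired:
            b += desired * step
        while desired * b > 0:
            b -= desired * step
            h, h_sum, y = forward(x, b)
            while y != desired:
                wrong = np.where(h != desired)[0]
                k = wrong[np.argmin(np.abs(h_sum[wrong]))]
                W_in[k] += lr * desired * x   # perceptron-style increment (assumed)
                h, h_sum, y = forward(x, b)
        return forward(x, 0.0)[2]

    x = np.array([1.0, -1.0, 1.0, -1.0])
    print('output after presentation:', present(x, desired=+1.0))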

It is possible that extending the weight-change process a little past the zero-bias level may have beneficial results: it might increase the life expectancy of each learned input-output combination and thereby reduce the total number of errors. The reason is that the method described above can stop the weight-correction process at a point where, even though the final output is correct, some elements whose outputs are essential to the final output still have sums close to zero, and these are easily disturbed by subsequent weight changes.

It will be noted that this method conforms to all three considerations mentioned previously. First, by furnishing each element the same bias, and by not changing weights until the final output becomes incorrect under the falling bias, there is a strong tendency to select elements which, with b = 0, would have sums close to zero; and the size of an element's sum is a good measure of how much damage changing its current output would do for other inputs. Second, each element changed has had a demonstrable effect on the final output. Finally, there is a clear tendency to change only a minimum of elements, because changes never occur until the final output clearly requires one.

On the other hand, this method adds little complexity to the network beyond what it already possesses. Each element requires the bias, an error signal, and the desired final output, and these are uniform over all elements in the network. Some external device must manipulate the bias properly, but its behavior is simple, depending only on the error signal and the desired final output, not on the state of individual elements in the network. What one has, then, is a network of elements that are nearly autonomous in their decisions to change weights. Such a scheme appears to be the only way to avoid constructing a central weight-change decision apparatus of great complexity; the rather sophisticated decision is made possible by exploiting the computational capabilities the network already possesses in producing outputs from inputs.

It should be noted here that this method requires the variable bias to be furnished to just those elements which have variable weights and to no others. Any fixed portion of the network, such as preliminary layers or a final majority function, must operate independently of the variable bias. Otherwise the final output may go from right to wrong as the bias moves toward zero with no variable-weight element to blame, and the network would be hung up.

Logical Redundancy in the Network

A third aspect of the network model is that, for all the care taken in the previous steps, they will not suffice to settle the network quickly on a set of weights generating the required logical function unless there is a great multiplicity of ways in which this can be done. That is to say, a learning network needs an excess margin of weights and elements beyond the minimum required to generate the functions which are to be learned.

This is analogous to the situation that prevails for a single element as regards the allowed range of values of its weights. It can be shown, for example, that any function of n = 6 inputs that can be generated by a single element can be obtained with each weight restricted to the integer values -9, -8, ..., +9. Yet no modification of the stated weight-change rule is known which restricts the weights to these values and still has any chance of learning most such functions.

Fatigued Elements

It would appear from some preliminary results of network simulations that it may be useful to have elements become “fatigued” after undergoing an excessive number of weight changes. Experiments performed on simplifications of the model described so far occasionally found that a small number of elements came to receive most of the weight increments, much to the detriment of the learning process; in such cases the network behaves as if it were composed of many fewer adjustable elements. In a sense, fatigue asks each element to maintain a record of the data it is being asked to store, so that it does not attempt to exceed its own information capacity.

It is not certain just how this fatigue factor should enter into the element’s actions, but if it is to be compatible with the variable-bias method, it must enter into the element’s response to the changing bias. Once an element changes state with zero sum at the same time that the final output becomes wrong, incrementing must occur if the method is to work. Hence a “fatigued” element must respond less energetically to a change of bias, perhaps through a variable factor multiplying the bias term.
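
One possible form (an assumed illustration; the report leaves the mechanism open) is to let each element multiply the common bias by a factor that shrinks with every weight change it undergoes, so that heavily modified elements respond sluggishly to the bias sweep:

    def effective_sum(weighted_sum, b, n_changes, decay=0.9):
        # Fatigue factor: 1.0 for a fresh element, approaching 0 as changes mount.
        fatigue = decay ** n_changes
        return weighted_sum + fatigue * b

    print(effective_sum(-1.0, 1.5, 0))    #  0.5        -> a fresh element flips to +1
    print(effective_sum(-1.0, 1.5, 20))   # about -0.82 -> a tired element resists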