INTRODUCTION
This paper is concerned with the problem of designing a network of linear threshold elements capable of efficiently adapting its various sets of weights so as to produce a prescribed input-output relation. It is to accomplish this adaptation by being repetitively presented with the various inputs along with the corresponding desired outputs. We will not be concerned here with the further requirement of various kinds of ability to “generalize”—i.e., to tend to give correct outputs for inputs that have not previously occurred when they are similar in some transformed sense to other inputs that have occurred.
In putting forth a model for such an adapting or “learning” network, a requirement is laid down that the complexity of the adaption process in terms of interconnections among elements needed for producing appropriate weight changes, should not greatly exceed that already required to produce outputs from inputs with a static set of weights. In fact, it has been found possible to use the output-from-input computing capacity of the network to help choose proper weight changes by observing the effect on the output of a variety of possible weight changes.
No attempt is made here to defend the proposed network model on theoretical grounds since no effective theory is known at present. Instead, the plausibility of the various aspects of the network model, combined with empirical results must suffice.
SINGLE ELEMENTS
To simplify the problem it is assumed that the network receives a set of two-valued inputs, x₁, x₂, ..., xₙ, and is required to produce only a single two-valued output, y. It is convenient to assign the numerical quantities +1 and -1 to the two values of each variable.
The simplest network would consist of a single linear threshold element with a set of weights, c₀, c₁, c₂, ..., cₙ. These determine the output-input relation or function so that y is +1 or -1 according as the quantity, c₀ + c₁x₁ + c₂x₂ + ... + cₙxₙ, is positive or not, respectively. It is possible for such a single element to exhibit an adaptive behavior as follows. If, for a given set, x₁, x₂, ..., xₙ, the output, y, is correct, then make no changes to the weights. Otherwise change the weights according to the equations
- Δc₀ = y*
- Δcᵢ = y*xᵢ, i = 1,2, ...,n
- where y* is the desired output.
It has been shown by a number of people that the weights of such an element are assured of arriving at a set of values which produce the correct output-input relation after a sufficient number of errors, provided that such a set exists. An upper bound on the number of possible errors can be given which depends only on the initial weight values and the logical function to be learned. This does not, however, solve our network problem for two reasons.
First, as the number, n, of inputs gets large, the number of errors to be expected for most functions which can be learned increases to unreasonable values. For example, for n = 6, most such functions result in 500 to 1000 errors compared to an average of 32 errors to be expected in a perfect learning device.