Given a representation of one kind of entity (the input), generate
a representation of another (the output).
Both representations often take the form of patterns,
that is, vectors of numbers.
In this case, the inputs and outputs are represented in a distributed
way.
Training (supervised): expose the learner to a number of input patterns
and their correct associated output patterns.
Generalization: given an unfamiliar input pattern, respond with an
appropriate output pattern.
Examples
Perceptual input, category output
State (perceptual) input, action Q-value output
Word input, meaning output
Meaning input, word output
English input, Spanish output
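As a rough illustration of patterns as vectors of numbers, the sketch below pairs input patterns with their correct output patterns, in the style of the perceptual input, category output example. The variable names and all of the numbers are hypothetical and chosen only for illustration.

```python
# A minimal sketch of a supervised training set for pattern association.
# Each item pairs an input pattern with its correct output pattern.
# All values are made up for illustration.
training_set = [
    # (input pattern: 4 perceptual features, output pattern: 2 categories)
    ([0.9, 0.1, 0.8, 0.0], [1.0, 0.0]),
    ([0.1, 0.9, 0.0, 0.7], [0.0, 1.0]),
    ([0.8, 0.2, 0.9, 0.1], [1.0, 0.0]),
]

# Every input pattern has the same length, and so does every output pattern.
input_size = len(training_set[0][0])
output_size = len(training_set[0][1])

for input_pattern, target_pattern in training_set:
    print(input_pattern, "->", target_pattern)
```

Generalization would then mean responding appropriately to an input pattern that does not appear in this list.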
Implementation of feedforward neural networks
Neural network basics
The elements of a neural network
Network of interconnected processing units, each
representing an element of "meaning" in the broadest sense
At any given time, each unit has an associated
activation, usually a number between 0 and 1
or between -1 and +1.
The activation of a unit represents the degree to which the meaning
associated with that unit is currently "active" (being considered,
treated as relevant, etc.).
Activations vary relatively rapidly.
We will use x for activations, with subscripts representing
the index of the unit.
The units are joined in a network by directed
connections.
The pattern of activation over the units in the network (or some
subset of the network) represents the system's short-term memory,
what it is currently "thinking about".
Each of the connections has an associated strength, its
weight.
The weight on the connection from unit i to unit j
(w_ji in our notation)
represents the tendency for the meanings of the units to be associated,
the tendency for "thinking about" the meaning of i to lead to
"thinking about" the meaning of j.
In neural network terms, the weight represents the extent to which
the activation of i affects the activation of j.
Together the weights in a network represent the system's long-term memory.
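One possible way to hold activations and weights in code is sketched below. It assumes units are numbered from 0 and stores weights so that the receiving unit is looked up first, mirroring the order of the subscripts in the notation above. The helper name weight and all of the values are hypothetical.

```python
# A minimal sketch of short-term memory (activations) and long-term memory (weights).
# Activations: x[i] is the current activation of unit i (hypothetical values).
x = [1.0, 0.0, 0.5, 0.0]

# Weights: w[j][i] is the weight on the connection from unit i to unit j.
# In this sketch only unit 3 receives connections, from units 0, 1 and 2.
w = {
    3: {0: 0.8, 1: -0.4, 2: 0.2},
}

def weight(w, i, j):
    """Return the weight from unit i to unit j, or 0.0 if they are not connected."""
    return w.get(j, {}).get(i, 0.0)

print(weight(w, 0, 3))  # connection from unit 0 to unit 3
print(weight(w, 3, 0))  # no connection in the reverse direction
```

Because the connections are directed, the weight from i to j and the weight from j to i are stored separately and need not be equal.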
Feedforward networks and pattern association
In a feedforward neural network, the units
can be thought of as arranged in separate groups, or
layers.
The connections joining units in two layers all have the same direction.
Some of the units are designated input units.
These are clamped
to particular activations
when the network is presented with an input pattern.
The activation of a clamped unit does not change.
Some of the units are designated output
units; their activations
represent the network's response to the current input, the pattern
that the network "believes" should be associated with the input
pattern.
In neural networks in general, an output unit may update its activation
repeatedly while the network is "running".
In a feedforward network, each output unit updates its activation
once in response to each input pattern.
How neural network units update
Calculating the input to a unit
When a unit updates, it first calculates the input (I) that
it receives from the other units that connect to it.
The input to a unit is the weighted sum of the activations of the connected units
(usually all of the units in the input layer).
The input to unit j from unit i is the product of
i's activation and the weight connecting the units, that is,
x_i w_ji
The total input to unit j is the sum of the inputs from
all of the other connected units.
If we represent the activations of the units connected to j
as a vector x_I and the weights from these units
into j as a vector w_j, then the input to
j is just the dot product of these two
vectors.
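The weighted sum can be written directly as a dot product. The sketch below uses plain Python lists for the activation vector and the weight vector. The function name net_input and the example values are assumptions made for illustration.

```python
def net_input(activations, weights_into_j):
    """Total input to unit j: the sum over connected units i of x_i * w_ji."""
    if len(activations) != len(weights_into_j):
        raise ValueError("need one weight per connected unit")
    return sum(x_i * w_ji for x_i, w_ji in zip(activations, weights_into_j))

# Hypothetical example: three units feeding into unit j.
x_I = [1.0, 0.0, 1.0]      # activations of the connected units
w_j = [0.5, -0.3, 0.25]    # weights from those units into unit j

I_j = net_input(x_I, w_j)  # 1.0*0.5 + 0.0*(-0.3) + 1.0*0.25
print(I_j)                 # prints 0.75
```

A unit whose activation is 0 contributes nothing to the total, whatever its weight.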
Calculating the activation of a unit
Once an output unit has calculated the input it receives from other units,
it usually "squashes" that value into some pre-specified range.
This is because units represent their meanings by being more or less "on" or "off", so
values outside the on/off range would be meaningless.
A squashing function is a function that takes an unbounded input and
compresses it to some range.
The simplest squashing function is the threshold function.
(We will look at another later on.)
The threshold function sets the unit's activation to 1 if its input is over some
threshold value, and to 0 (or -1) otherwise.
The threshold can be an independent property of each unit or of the output
units as a whole.
Say the threshold for the output unit in the figure (θ_4) is 0.0; then its
activation would be set to 1.
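Putting the two steps together, the sketch below updates each output unit once: it computes the weighted sum of the clamped input activations and then applies a threshold function. The function names, the per-unit list of thresholds, and every number here are hypothetical. They are not the values from the figure.

```python
def threshold(net, theta=0.0, off=0.0):
    """Return 1.0 if the input is over the threshold theta, otherwise `off` (0 or -1)."""
    return 1.0 if net > theta else off

def update_output_units(input_pattern, weights, thetas):
    """Update each output unit once: weighted sum of the inputs, then threshold.

    weights[j] holds the weights from every input unit into output unit j,
    and thetas[j] is that output unit's own threshold.
    """
    activations = []
    for w_j, theta_j in zip(weights, thetas):
        net = sum(x_i * w_ji for x_i, w_ji in zip(input_pattern, w_j))
        activations.append(threshold(net, theta_j))
    return activations

# Hypothetical network: three clamped input units, two output units.
x_in = [1.0, 0.0, 1.0]
W = [
    [0.5, -0.3, 0.25],   # weights into output unit 0 (input 0.75)
    [-1.0, 0.5, 0.5],    # weights into output unit 1 (input -0.5)
]
outputs = update_output_units(x_in, W, thetas=[0.0, 0.0])
print(outputs)  # prints [1.0, 0.0]
```

Here "over" is read as strictly greater than, so an input exactly equal to the threshold leaves the unit off. Passing off=-1.0 to threshold gives the -1/+1 variant.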