In a local representational scheme, each thing being represented
is assigned a single representational element, that is, in a neural
network, a single unit in the input or output layer of the network.
Conversely, each unit is associated with only one represented thing.
So each time a local input is presented to a network, one unit is turned on,
and all the others are off.
In a distributed representation, each thing being represented involves
more than one representational element (unit in a neural network).
Conversely, each unit is associated with more than one represented thing.
Instead of representing things, the units represent features of things.
So each time a distributed input is presented to a neural network, multiple
units are turned on.
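To make the contrast concrete, here is a minimal sketch that encodes the same few things both ways. The items and features (`cat`, `dog`, `sparrow`, `has_fur`, and so on) and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical items and features, used only to illustrate the two schemes.
ITEMS = ["cat", "dog", "sparrow"]
FEATURES = ["has_fur", "barks", "flies", "has_legs"]
ITEM_FEATURES = {
    "cat": {"has_fur", "has_legs"},
    "dog": {"has_fur", "barks", "has_legs"},
    "sparrow": {"flies", "has_legs"},
}


def local_encode(item):
    """Local scheme: one unit per item, so exactly one unit is on."""
    return [1 if other == item else 0 for other in ITEMS]


def distributed_encode(item):
    """Distributed scheme: one unit per feature; every feature the item has is on."""
    return [1 if feature in ITEM_FEATURES[item] else 0 for feature in FEATURES]


for item in ITEMS:
    print(item, local_encode(item), distributed_encode(item))
```

In the distributed patterns the `has_legs` unit is on for all three items, so a single unit takes part in representing several things.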
What they're good for
In a local representation scheme, there is no sharing between different
represented things.
The things are represented completely independently,
so there can be no generalization on the basis of the similarity
between things: there is no similarity, or, seen another way,
all things are equally different from one another.
In a distributed representation scheme, the representations of the
different things overlap.
If the inputs to a neural network
are distributed, the network responds on the basis
of the similarity of a new input to all of the inputs it has been trained on.
Thus generalization to new inputs is possible.
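One simple way to see the difference is to count how many units a new input shares with each trained input. The sketch below assumes that "similarity" means the number of units on in both patterns; the items, the novel input `fox`, and the unit names are hypothetical.

```python
# Hypothetical trained items; each pattern is the set of units that are on.
LOCAL = {"cat": {"unit_cat"}, "dog": {"unit_dog"}, "sparrow": {"unit_sparrow"}}
DISTRIBUTED = {
    "cat": {"has_fur", "has_legs"},
    "dog": {"has_fur", "barks", "has_legs"},
    "sparrow": {"flies", "has_legs"},
}


def overlap(a, b):
    """Count the units that are on in both patterns."""
    return len(a & b)


# A novel input: under the local scheme it needs a fresh unit of its own,
# while under the distributed scheme it reuses existing feature units.
novel_local = {"unit_fox"}
novel_distributed = {"has_fur", "has_legs"}

for name in LOCAL:
    print(name, overlap(novel_local, LOCAL[name]),
          overlap(novel_distributed, DISTRIBUTED[name]))
```

The local column is all zeros, so nothing learned about the trained items bears on the new one; the distributed column ranks the trained items by how much they have in common with it.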
If the outputs from a neural network
are distributed, the network is being trained not
just to treat different inputs as belonging to the same category
but also to treat them as having the same features.
Designing representations
Example 1: visual input with 3 pixels,
5 colors (red, green, blue, yellow, black)
A local representation of the inputs would assign a single
input unit to each possible visual input, that is,
5^3 = 125 units in all.
No generalization to novel inputs would be possible.
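A minimal sketch of this local scheme, assuming each of the 5^3 possible inputs is simply numbered and given its own unit (the names `ALL_INPUTS`, `UNIT_OF`, and `local_encode` are hypothetical):

```python
from itertools import product

COLORS = ["RED", "GREEN", "BLUE", "YELLOW", "BLACK"]
N_PIXELS = 3

# Every possible visual input gets its own unit.
ALL_INPUTS = list(product(COLORS, repeat=N_PIXELS))
UNIT_OF = {pattern: i for i, pattern in enumerate(ALL_INPUTS)}


def local_encode(pattern):
    """Turn on the single unit assigned to this whole 3-pixel pattern."""
    vec = [0] * len(ALL_INPUTS)
    vec[UNIT_OF[tuple(pattern)]] = 1
    return vec


print(len(ALL_INPUTS))  # 125
```

Any two different patterns, however alike they look, share no active unit, which is why nothing learned about one carries over to another.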
One distributed input scheme would assign a single input unit
to each combination of pixel and color.
This would require 5 X 3 = 15 input units.
The network could generalize to novel inputs on the basis of
their similarity to trained inputs.
For example, if the network were presented for the first time
with the pattern RED - RED - BLACK, it might be able to respond
appropriately if it had been trained on patterns such as RED - GREEN - BLACK
and BLUE - RED - BLACK, assuming that generalizations on the basis
of similarities such as these are the right generalizations
to make.
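The 15-unit pixel-by-color scheme can be sketched as follows, assuming similarity is measured as the number of units on in both patterns. The function names are hypothetical.

```python
COLORS = ["RED", "GREEN", "BLUE", "YELLOW", "BLACK"]
N_PIXELS = 3


def pixel_color_encode(pattern):
    """One unit per (pixel, color) pair: 3 X 5 = 15 units, one on per pixel."""
    vec = [0] * (N_PIXELS * len(COLORS))
    for pixel, color in enumerate(pattern):
        vec[pixel * len(COLORS) + COLORS.index(color)] = 1
    return vec


def overlap(a, b):
    """Number of units on in both patterns."""
    return sum(x * y for x, y in zip(a, b))


novel = pixel_color_encode(["RED", "RED", "BLACK"])
for trained in (["RED", "GREEN", "BLACK"],
                ["BLUE", "RED", "BLACK"],
                ["GREEN", "BLUE", "YELLOW"]):
    print(trained, overlap(novel, pixel_color_encode(trained)))
```

The novel pattern shares two units with each of the two trained patterns from the example and none with an unrelated pattern, which is the basis on which the network could generalize.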
In the previous scheme, color is still represented locally within
each pixel.
We could go further by using only 3 units to represent the color of each pixel,
with the units signifying red, green, and blue.
This would require only 3 X 3 = 9 input units.
Yellow in a given pixel would then be represented by turning on the
red and green units among the three for that pixel.
But note that yellow is always treated as similar to red and green
in this scheme.
For example, YELLOW - YELLOW - BLACK will be more similar to GREEN - RED - BLACK
than it is to BLUE - BLUE - BLACK.
This may or may not be the right kind of generalization to make.
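Here is a sketch of the 9-unit scheme that checks the YELLOW - YELLOW - BLACK comparison. The notes do not say how black is coded; this sketch assumes black means all three units are off. Two hypothetical similarity measures are shown: units shared (`overlap`) and units that differ (`hamming`).

```python
# Assumed RGB-style mapping; black is taken to be "all three units off".
RGB = {
    "RED": (1, 0, 0),
    "GREEN": (0, 1, 0),
    "BLUE": (0, 0, 1),
    "YELLOW": (1, 1, 0),
    "BLACK": (0, 0, 0),
}


def rgb_encode(pattern):
    """Three units (red, green, blue) per pixel: 3 X 3 = 9 units in all."""
    return [bit for color in pattern for bit in RGB[color]]


def overlap(a, b):
    """Units on in both patterns."""
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    """Units on in one pattern but off in the other."""
    return sum(x != y for x, y in zip(a, b))


yyb = rgb_encode(["YELLOW", "YELLOW", "BLACK"])
grb = rgb_encode(["GREEN", "RED", "BLACK"])
bbb = rgb_encode(["BLUE", "BLUE", "BLACK"])
print(overlap(yyb, grb), hamming(yyb, grb))  # 2 2
print(overlap(yyb, bbb), hamming(yyb, bbb))  # 0 6
```

By either measure YELLOW - YELLOW - BLACK comes out closer to GREEN - RED - BLACK. In the 15-unit scheme, by contrast, it shares exactly one unit (black in the third pixel) with each of the other two patterns, so they would be equally similar to it.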
Example 2: network for counting objects in a visual scene
An obvious way to represent the scenes (assuming color is not relevant)
is by treating the input space as a direct representation of the
visual space and turning on all of the units that correspond to objects
in the scene.
This is a distributed representation of scenes because for each scene
multiple units are turned on.
With this first kind of representation, however, it will be difficult
to learn to count because of the invariance problem.
Different scenes with the same number of objects will be very different from
each other.
In particular they will have very different numbers of units turned on
(the number of units turned on depends on the size of the objects as
well as how many there are).
Therefore some sort of preprocessing can help.
One possibility is to transform the raw inputs into patterns in which
a single unit, perhaps at the center of mass, is turned on for each
input object.
This would still be distributed, but would be more local than the
raw input.
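The notes do not say how the objects are found, so the following sketch assumes a scene is a grid of 0/1 units and that an object is a group of "on" cells connected through their four neighbors (found by flood fill). All function names and the two scenes are hypothetical.

```python
def find_objects(grid):
    """Group 'on' cells into objects, treating 4-neighbors as connected."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or (r, c) in seen:
                continue
            # Flood fill from (r, c) to collect one whole object.
            stack, cells = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append(cells)
    return objects


def center_of_mass_map(grid):
    """Turn on one unit per object, at its center of mass (rounded to a cell)."""
    out = [[0] * len(grid[0]) for _ in grid]
    for cells in find_objects(grid):
        cy = round(sum(y for y, _ in cells) / len(cells))
        cx = round(sum(x for _, x in cells) / len(cells))
        out[cy][cx] = 1
    return out


def units_on(grid):
    return sum(map(sum, grid))


# Two hypothetical scenes, each with two objects of different sizes.
big = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
small = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
print(units_on(big), units_on(small))  # 8 2
print(units_on(center_of_mass_map(big)), units_on(center_of_mass_map(small)))  # 2 2
```

The raw scenes turn on very different numbers of units, while the preprocessed ones turn on one unit per object. One limitation of this sketch: if two objects' rounded centers fell on the same cell, they would collapse into a single unit.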
We also need a way to represent the numbers that are the output of
the network.
A local scheme would assign a single unit to each number: a unit for one,
a unit for two, and so on.
One problem with this scheme is that the set of numbers that can be
represented is limited by the number of units.
It is also possible to represent the numbers in a distributed fashion.
One possibility is to represent them in a binary fashion, with each
output unit representing a power of 2.
For example, with 4 output units, FOUR could be represented by 0 1 0 0
and SEVEN by 0 1 1 1.
There would still be a limit on the numbers that could be represented,
but more numbers could be represented with fewer units.
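A minimal sketch of the binary output code, assuming the most significant unit comes first so that it matches the notes' example (FOUR = 0 1 0 0). The function name is hypothetical.

```python
def binary_encode(n, n_units=4):
    """Binary output code, most significant unit first (8, 4, 2, 1 for 4 units)."""
    if not 0 <= n < 2 ** n_units:
        raise ValueError(f"{n} cannot be represented with {n_units} units")
    return [(n >> bit) & 1 for bit in reversed(range(n_units))]


print(binary_encode(4))  # [0, 1, 0, 0]
print(binary_encode(7))  # [0, 1, 1, 1]

# Capacity: n units give 2**n binary patterns, versus n numbers in a local code.
for n_units in (4, 8):
    print(n_units, 2 ** n_units)
```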
Note, however, that this scheme would train the network to associate
input patterns with features of numbers.
For example, with the four-unit scheme, it would train inputs with
two, three, six, seven, ten, eleven, fourteen, and fifteen objects
to turn on the third unit and other inputs to turn that unit off.
We probably wouldn't want these numbers to be treated as similar to
one another.
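The following sketch lists which counts from 1 to 15 turn on the third unit under the 4-unit binary code, and shows how unrelated the binary overlap is to numerical closeness. It assumes, as before, the most significant unit first; the names are hypothetical.

```python
def binary_encode(n, n_units=4):
    """Binary output code, most significant unit first (8, 4, 2, 1 for 4 units)."""
    return [(n >> bit) & 1 for bit in reversed(range(n_units))]


def overlap(a, b):
    """Number of units on in both patterns."""
    return sum(x * y for x, y in zip(a, b))


THIRD_UNIT = 2  # zero-based index of the third unit from the left (the "2s" unit)
shares_third = [n for n in range(1, 16) if binary_encode(n)[THIRD_UNIT] == 1]
print(shares_third)  # [2, 3, 6, 7, 10, 11, 14, 15]

# Neighboring counts can share nothing, while distant ones share a lot.
print(overlap(binary_encode(7), binary_encode(8)))   # 0
print(overlap(binary_encode(7), binary_encode(15)))  # 3
```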
But there is another, more sensible, distributed way of
representing outputs: a "thermometer" encoding in which each
output unit turns on for all numbers equal to or greater than some value.
This makes larger numbers resemble each other more than they
resemble smaller numbers; in general, the closer two numbers are,
the fewer output units they differ on.
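A minimal sketch of a thermometer code, assuming 10 output units where unit i (counting from 0) turns on for every number of at least i + 1, so zero is all units off. The unit count and the function names are hypothetical.

```python
def thermometer_encode(n, n_units=10):
    """Unit i (0-based) is on for every number >= i + 1."""
    if not 0 <= n <= n_units:
        raise ValueError(f"{n} cannot be represented with {n_units} units")
    return [1 if n >= i + 1 else 0 for i in range(n_units)]


def overlap(a, b):
    """Units on in both patterns."""
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    """Units on in one pattern but off in the other."""
    return sum(x != y for x, y in zip(a, b))


print(thermometer_encode(3))  # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
seven, eight, two = (thermometer_encode(k) for k in (7, 8, 2))
print(overlap(seven, eight), hamming(seven, eight))  # 7 1
print(overlap(seven, two), hamming(seven, two))      # 2 5
```

In this code the number of differing units between two numbers is just the difference between them. Note the trade-off: like the local scheme, n units can represent only n numbers (plus zero), so the gain over binary coding is in the similarity structure, not in compactness.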
Cognitive Science at Indiana University | Fall 2004