# Local and distributed representations

## What they are

• In a local representational scheme, each thing being represented is assigned a single representational element: in a neural network, a single unit in the input or output layer of the network. Conversely, each unit is associated with only one represented thing. So each time a local input is presented to a network, exactly one unit is turned on and all the others are off.
• In a distributed representation, each thing being represented involves more than one representational element (unit in a neural network). Conversely, each unit is associated with more than one represented thing. Rather than representing whole things, the units represent features of things. So each time a distributed input is presented to a neural network, multiple units are turned on.
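The contrast above can be sketched in a few lines of Python. The items and features here (animals and properties like "barks") are illustrative assumptions, not from the notes; the point is only the shape of the two schemes.

```python
# Sketch (assumed example items, not from the notes): local vs. distributed
# representations of the items "cat", "dog", "cow".

ITEMS = ["cat", "dog", "cow"]

def local_encode(item):
    """Local scheme: exactly one unit on, all others off (one-hot)."""
    return [1 if item == x else 0 for x in ITEMS]

# Distributed scheme: units stand for features, so each item turns on
# several units and each unit participates in several items.
FEATURES = ["has_fur", "barks", "purrs", "gives_milk"]
FEATURE_TABLE = {
    "cat": {"has_fur", "purrs"},
    "dog": {"has_fur", "barks"},
    "cow": {"has_fur", "gives_milk"},
}

def distributed_encode(item):
    return [1 if f in FEATURE_TABLE[item] else 0 for f in FEATURES]

print(local_encode("dog"))        # [0, 1, 0]
print(distributed_encode("dog"))  # [1, 1, 0, 0]
```

Note that in the local scheme the vectors for any two items are equally dissimilar, while in the distributed scheme "cat" and "dog" share the `has_fur` unit.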

## What they're good for

• In a local representation scheme, there is no sharing between different represented things: each thing is represented completely independently of the others. There can be no generalization on the basis of similarity between things because there is no similarity, or, seen another way, because all things are equally different from one another.
• In a distributed representation scheme, the representations of different things overlap. If the inputs to a neural network are distributed, the network responds on the basis of the similarity of a new input to all of the inputs it has been trained on, so generalization to new inputs is possible. If the outputs from a neural network are distributed, the network is being trained not just to treat different inputs as belonging to the same category but also to treat them as having the same features.

## Designing representations

• Example 1: visual input with 3 pixels, 5 colors (red, green, blue, yellow, black)
• A local representation of the inputs would assign a single input unit to each possible visual input, that is, 5^3 = 125 units in all. No generalization to novel inputs would be possible.
• One distributed input scheme would assign a single input unit to each combination of pixel and color. This would require 5 X 3 = 15 input units. The network could generalize to novel inputs on the basis of their similarity to trained inputs. For example, if the network were presented for the first time with the pattern RED - RED - BLACK, it might be able to respond appropriately if it had been trained on patterns such as RED - GREEN - BLACK and BLUE - RED - BLACK, assuming that generalizations on the basis of similarities such as these are the right generalizations to make.
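The 15-unit pixel-by-color scheme and its generalization behavior can be made concrete as follows. The helper names and the raw-overlap similarity measure are my own choices; the encoding itself follows the text.

```python
# Sketch of the 5-colors x 3-pixels scheme from the text: one input unit
# per (pixel, color) combination, 5 x 3 = 15 units in all.

COLORS = ["red", "green", "blue", "yellow", "black"]

def encode_scene(pixels):
    """Concatenate a one-hot color vector for each of the 3 pixels."""
    units = []
    for color in pixels:
        units += [1 if color == c else 0 for c in COLORS]
    return units

def overlap(a, b):
    """Count units on in both patterns -- a crude similarity measure."""
    return sum(x & y for x, y in zip(a, b))

novel  = encode_scene(["red", "red", "black"])
train1 = encode_scene(["red", "green", "black"])
train2 = encode_scene(["blue", "red", "black"])

print(len(novel))              # 15 units
print(overlap(novel, train1))  # shares pixel-1 red and pixel-3 black -> 2
print(overlap(novel, train2))  # shares pixel-2 red and pixel-3 black -> 2
```

The novel RED - RED - BLACK pattern overlaps with both training patterns, which is the basis on which a trained network could generalize to it.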
• In the previous scheme, color is still represented locally within each pixel. We could go further by using only 3 units to represent each color, with the units signifying red, green, and blue. This would require only 3 X 3 = 9 input units. Yellow in a given pixel would then be represented by turning on the red and green units among the three for that pixel. But note that yellow is always treated as similar to red and green in this scheme. For example, YELLOW - YELLOW - BLACK will be more similar to GREEN - RED - BLACK than it is to BLUE - BLUE - BLACK. This may or may not be the right kind of generalization to make.
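The 9-unit scheme and the YELLOW - YELLOW - BLACK comparison can be checked directly. The color-to-RGB table below is an assumption consistent with the notes (yellow as red + green, black as all units off).

```python
# Sketch of the 3-units-per-pixel scheme from the text: each pixel gets
# red, green, and blue units, 3 x 3 = 9 units in all.

RGB = {
    "red":    (1, 0, 0),
    "green":  (0, 1, 0),
    "blue":   (0, 0, 1),
    "yellow": (1, 1, 0),  # yellow turns on both the red and green units
    "black":  (0, 0, 0),  # black turns on no units (assumed)
}

def encode_scene(pixels):
    units = []
    for color in pixels:
        units += list(RGB[color])
    return units

def overlap(a, b):
    return sum(x & y for x, y in zip(a, b))

yy_black = encode_scene(["yellow", "yellow", "black"])
gr_black = encode_scene(["green", "red", "black"])
bb_black = encode_scene(["blue", "blue", "black"])

# Yellow-yellow-black shares units with green-red-black but none with
# blue-blue-black, as the text predicts.
print(overlap(yy_black, gr_black))  # 2
print(overlap(yy_black, bb_black))  # 0
```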
• Example 2: network for counting objects in a visual scene
• An obvious way to represent the scenes (assuming color is not relevant) is by treating the input space as a direct representation of the visual space and turning on all of the units that correspond to objects in the scene. This is a distributed representation of scenes because for each scene multiple units are turned on.
• On the basis of this first kind of representation, however, it will be difficult to learn to count because of the invariance problem. Different scenes with the same number of objects will be very different from each other. In particular, they will have very different numbers of units turned on, since the number of units turned on depends on the size of the objects as well as on how many there are. Some sort of preprocessing can therefore help. One possibility is to transform the raw inputs into patterns in which a single unit, perhaps at the center of mass, is turned on for each input object. This would still be distributed, but would be more local than the raw input.
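A minimal sketch of this preprocessing idea, assuming binary scenes and 4-connected objects: find each connected blob of "on" pixels and replace it with a single unit at (roughly) its center of mass. The flood-fill labeling and all names here are my own.

```python
# Turn each object in a raw binary scene into one "on" unit at its
# (rounded) center of mass, so the number of on-units equals the count.

def find_objects(scene):
    """Return a list of pixel-coordinate lists, one per connected object."""
    rows, cols = len(scene), len(scene[0])
    seen, objects = set(), []
    for r in range(rows):
        for c in range(cols):
            if scene[r][c] and (r, c) not in seen:
                stack, pixels = [(r, c)], []
                seen.add((r, c))
                while stack:  # flood fill over 4-connected neighbors
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and scene[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                objects.append(pixels)
    return objects

def preprocess(scene):
    """One unit on per object, placed at the rounded center of mass."""
    out = [[0] * len(scene[0]) for _ in scene]
    for pixels in find_objects(scene):
        cy = round(sum(p[0] for p in pixels) / len(pixels))
        cx = round(sum(p[1] for p in pixels) / len(pixels))
        out[cy][cx] = 1
    return out

scene = [[1, 1, 0, 0],
         [1, 1, 0, 1],
         [0, 0, 0, 1]]   # two objects of different sizes
processed = preprocess(scene)
print(sum(map(sum, processed)))  # 2 objects -> 2 units on
```

After preprocessing, every scene with two objects has exactly two units on, regardless of object size, which is what makes counting learnable.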
• We also need a way to represent the numbers that are the output of the network. A local scheme would assign a single unit to each, a unit for one, a unit for two, etc. One problem with this scheme is that the number of possible numbers is limited by the number of units.
• It is also possible to represent the numbers in a distributed fashion. One possibility is to represent them in a binary fashion, with each output unit representing a power of 2. For example, with 4 output units, FOUR could be represented by 0 1 0 0 and SEVEN by 0 1 1 1. There would still be a limit on the numbers that could be represented, but more numbers could be represented with fewer units. Note, however, that this scheme would train the network to associate input patterns with features of numbers. For example, with the four-unit scheme, it would train inputs with two, three, six, seven, ten, eleven, fourteen, and fifteen objects to turn on the third unit and other inputs to turn that unit off. We probably wouldn't want these numbers to be treated as similar to one another.
• But there is another, more sensible, distributed way of representing outputs: a "thermometer" encoding in which each output unit turns on for all numbers equal to or greater than some value. This makes larger numbers resemble each other more than they resemble smaller numbers.