Schedules of Reinforcement
Laboratory experiments on instrumental conditioning often use rats or pigeons. In most experiments, their food or water is restricted to make sure that a bit of food or a sip of water will be an effective reinforcer. At the start of training, every response is reinforced. This is called "continuous" reinforcement. On this schedule, the rats or pigeons quickly become satiated ("full"; "have had enough") and stop responding for the food or water. Schedules of intermittent (or partial) reinforcement prevent satiation because they reinforce only a small fraction of the subject's (correct) responses.
Intermittent reinforcement uses a schedule that specifies which responses will be reinforced. The two most common kinds of schedules are ratio schedules and interval schedules. Ratio schedules reinforce every nth response (every 2nd, 5th, 8th, 20th, etc.). Interval schedules reinforce the first response after a specified amount of time has passed since the last reinforcement (the first response after 5 sec., 10 sec., 28 sec., etc. has passed).
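The two rules can be stated precisely as simple counters. The sketch below is our own illustration (the class names and parameters are invented for this page, not taken from any laboratory software): a fixed-ratio schedule counts responses, and a fixed-interval schedule watches the clock.

    # Hypothetical sketch of the two basic schedule rules (names are ours).
    class FixedRatio:
        """Reinforce every nth response (e.g., FR 20 = every 20th response)."""
        def __init__(self, n):
            self.n = n
            self.count = 0                 # responses since the last reinforcement
        def respond(self):
            self.count += 1
            if self.count == self.n:
                self.count = 0             # reinforcement resets the count
                return True                # this response is reinforced
            return False

    class FixedInterval:
        """Reinforce the first response after `seconds` have passed
        since the last reinforcement (e.g., FI 28 sec)."""
        def __init__(self, seconds):
            self.seconds = seconds
            self.last = 0.0                # time of the last reinforcement
        def respond(self, t):              # t = time of this response, in seconds
            if t - self.last >= self.seconds:
                self.last = t
                return True                # first response after the interval
            return False                   # earlier responses earn nothing

    fr = FixedRatio(5)
    print([fr.respond() for _ in range(10)])
    # [False, False, False, False, True, False, False, False, False, True]

Note that on the ratio schedule reinforcement depends only on counting responses, while on the interval schedule responses made before the interval has elapsed are simply wasted.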
Intermittent reinforcement makes an instrumental conditioned response much steadier and more resistant to extinction. This is true for humans as well as animals. When subjects are switched from "continuous" reinforcement to extinction (no response reinforced), they extinguish quickly, often after fewer than 50 unreinforced responses. Extinction following intermittent reinforcement can take thousands, even tens of thousands, of responses.
Two processes contribute to the strong resistance to extinction that intermittent schedules of reinforcement produce. First, after intermittent reinforcement the change from reinforcement to non-reinforcement is hard to detect. The switch from continuous reinforcement (every response followed by a reinforcer) to extinction (no response followed by a reinforcer) is quite obvious. But the switch from intermittent reinforcement (only a few responses followed by a reinforcer) to extinction is hard to notice. Subjects do not expect reinforcement after most responses, so they don't notice when extinction has started.

Second, extinction is frustrating, and frustration is aversive (see asgn4x). When you learn to make a response for a reinforcement, you learn to expect that reinforcement. In extinction, the expected reinforcement fails to occur, which fits the proper definition of frustration (failing to receive an expected goal or reward). Therefore, each response during extinction is punished, which acts to suppress it. On an intermittent schedule of reinforcement, you learn not to expect reinforcement after most responses. Therefore, responses that don't produce reinforcement are not punished, so responding continues.

B. F. Skinner (1956) discovered this (then) surprising effect quite by accident. Early in his career, Skinner began his study of lever pressing by rats for food. One Saturday, he found that he had not made enough food pellets to last the weekend. He wanted to keep up the daily training of the rats, but making pellets was a long, tedious task. (In those days researchers made almost everything for themselves. Now they buy most of their supplies and equipment from commercial manufacturers.) So he decided to spread out the remaining pellets by reinforcing only the first response that occurred after one minute had passed since the last reinforcer had been delivered.

To his surprise, not only did his
supply of pellets last, but the pressing behavior became stronger, steadier,
and much more resistant to extinction, not weaker and less regular as one
might expect. This accidental finding became a powerful research tool in
behavioral analysis because it generates predictable behavior, which remains
quite stable over long periods of time. This stability permits detailed
study of individual subjects rather than averaging data from several subjects.
Pharmaceutical companies have used this procedure to help identify potentially useful psychologically active medications. For example, a signal that predicts a brief shock normally suppresses rats' otherwise steady responding for food reinforcement until after the shock has been delivered. Potential antianxiety drugs decrease or abolish this suppression.

Figure 1 shows the response rates (responses/minute) on fixed and variable forms of the two
basic schedules. The X-axis shows time. The Y-axis shows cumulative responses (~added up, as in a cumulative grade point average: GPA is cumulative grade points divided by cumulative [total] hours). The steeper the slope, the faster the responding; the straighter the line, the steadier the responding.
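To make the cumulative record concrete, here is a small sketch (again ours, not part of the original exercise) that turns a list of response times into the points of a cumulative curve; the nth response plots at (time, n), so each response steps the line up by one.

    def cumulative_record(response_times):
        """Convert response times into (time, cumulative count) points.
        A fast, steady responder yields a steep, straight line."""
        return [(t, n) for n, t in enumerate(sorted(response_times), start=1)]

    # One response per second for 5 seconds: slope = 1 response/sec.
    print(cumulative_record([1.0, 2.0, 3.0, 4.0, 5.0]))
    # [(1.0, 1), (2.0, 2), (3.0, 3), (4.0, 4), (5.0, 5)]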
Fixed schedules deliver reinforcement on a constant (fixed) ratio or interval, e.g., for the 20th response or for the first response after 28 seconds have passed since the last reinforcement. Variable schedules deliver reinforcement on a schedule that varies around an average value (after every 5th, 10th, 20th, or 50th response with an average of 20 responses, or after every 4th, 12th, 28th, or 54th second with an average of 28 seconds).
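Continuing the earlier sketch (hypothetical code, with the pools of values taken from the examples just given), a variable schedule can be implemented by drawing a fresh requirement after every reinforcement:

    import random

    class VariableRatio:
        """After each reinforcement, draw the next response requirement
        from a pool that varies around an average (the text describes
        the example values 5, 10, 20, 50 as averaging about 20)."""
        def __init__(self, pool=(5, 10, 20, 50)):
            self.pool = pool
            self.required = random.choice(self.pool)
            self.count = 0
        def respond(self):
            self.count += 1
            if self.count >= self.required:
                self.count = 0
                self.required = random.choice(self.pool)  # next requirement varies
                return True
            return False

    class VariableInterval:
        """After each reinforcement, draw the next waiting time from a
        pool that varies around an average (here about 28 seconds)."""
        def __init__(self, pool=(4.0, 12.0, 28.0, 54.0)):
            self.pool = pool
            self.wait = random.choice(self.pool)
            self.last = 0.0
        def respond(self, t):
            if t - self.last >= self.wait:
                self.last = t
                self.wait = random.choice(self.pool)      # next wait varies
                return True
            return False

Because the next requirement is unpredictable, there is no moment right after reinforcement when responding is guaranteed to be wasted, which is one common explanation for the steady responding that variable schedules produce.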
Some schedules produce very steady rates of responding for many hours (the variable interval schedule in Figure 1). Others produce very high rates of responding (the ratio schedules in Figure 1). Others produce highly predictable variations in response rate (the fixed interval schedule in Figure 1).
Each pattern reflects the reinforcement schedule. Ratio schedules generate high rates of responding (a steep slope on the cumulative record) because the faster the subject works, the sooner it earns the next reinforcer. Interval schedules produce shallower slopes, because the rate of reinforcement is limited by the scheduled time interval between reinforcements. A fixed interval schedule produces a scalloped cumulative record (see Figure 1), because each reinforcement signals that no response will be reinforced for several seconds.
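The difference between ratio and interval slopes can be seen with simple arithmetic, sketched below under the example parameters used above (FR 20 and FI 28 sec) and the simplifying assumption that responses are spread evenly in time:

    # Approximate reinforcers earned per minute at several response rates.
    for rate in (10, 30, 60, 120):            # responses per minute
        fr20 = rate / 20                       # every 20th response pays off
        fi28 = min(rate, 60 / 28)              # at most ~one per 28-sec interval
        print(f"{rate:>4} resp/min -> FR 20: {fr20:4.1f}/min, FI 28: {fi28:4.2f}/min")

On the ratio schedule, doubling the response rate doubles the reinforcement rate; on the interval schedule, payoff is capped near 60/28, about 2.1 reinforcers per minute, no matter how fast the subject responds.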
Figure 1. Cumulative response curves generated by the four basic schedules of reinforcement. The Y-axis is cumulative responses (each response moves the line up one step). The blue slashes below each curve show when reinforcement occurred. Note that the curves show many more responses than reinforcements. Ratio schedules: the number of responses made determines when the next reinforcement occurs. Interval schedules: the passage of time determines when the next response will be reinforced. Fixed schedules: the time or ratio remains the same. Variable schedules: the time or ratio varies around an average value.