
Schedules of Reinforcement

Laboratory experiments on instrumental conditioning often use rats or pigeons. In most experiments, the animals' food or water is restricted to make sure that a bit of food or a sip of water will be an effective reinforcer. At the start of training, every response is reinforced. This is called "continuous" reinforcement. On this schedule, the rats or pigeons quickly become satiated ("full"; "have had enough") and stop responding for the food or water.

Schedules of intermittent (or partial) reinforcement prevent satiation because they reinforce only a small fraction of the subject's (correct) responses. An intermittent schedule specifies which responses will be reinforced. The two most common kinds of schedules are ratio schedules and interval schedules. Ratio schedules reinforce every nth response (every 2nd, 5th, 8th, 20th, etc. response). Interval schedules reinforce the first response after a specified amount of time has passed since the last reinforcement (the first response after 5 sec., 10 sec., 28 sec., etc. has passed).
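Both rules can be written as short decision procedures. Here is a minimal sketch in Python (the class and method names are invented for illustration, not part of the original exercise) of how a fixed-ratio and a fixed-interval schedule decide whether a given response earns reinforcement:

    import time

    class FixedRatio:
        """Reinforce every nth response (e.g., FR 20)."""
        def __init__(self, n):
            self.n = n
            self.count = 0  # responses since the last reinforcement

        def respond(self):
            """Record one response; return True if it earns reinforcement."""
            self.count += 1
            if self.count >= self.n:
                self.count = 0
                return True
            return False

    class FixedInterval:
        """Reinforce the first response made at least t seconds after the
        last reinforcement (e.g., FI 28 sec.)."""
        def __init__(self, t):
            self.t = t
            self.last = time.monotonic()

        def respond(self):
            """Record one response; reinforce only if the interval has elapsed."""
            now = time.monotonic()
            if now - self.last >= self.t:
                self.last = now
                return True
            return False

Notice the asymmetry: under FixedRatio, responding faster brings reinforcement sooner; under FixedInterval, extra responses before the interval elapses earn nothing. This difference produces the contrasting response rates discussed below.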

Intermittent reinforcement makes an instrumentally conditioned response much steadier and more resistant to extinction. This is true for humans as well as for animals. When subjects are switched from "continuous" reinforcement to extinction (no response reinforced), they extinguish quickly, often after fewer than 50 unreinforced responses. Extinction following intermittent reinforcement can take thousands, even tens of thousands, of responses.

Two processes contribute to the strong resistance to extinction that intermittent schedules of reinforcement produce. First, it is hard to detect the change from intermittent reinforcement to non-reinforcement. The switch from continuous reinforcement (every response followed by a reinforcer) to extinction (no response followed by a reinforcer) is quite obvious. But the switch from intermittent reinforcement (only a few responses followed by a reinforcer) to extinction is hard to notice. Subjects do not expect reinforcement after most responses, so they don't notice when extinction has started.
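A rough way to put numbers on this first process: treat an intermittent schedule as if each response were reinforced with a fixed probability (a random-ratio simplification, assumed here purely for illustration) and ask how long a run of unreinforced responses must be before such a run would have been improbable under the old schedule:

    import math

    def responses_to_notice_extinction(p_reinforce, alpha=0.05):
        """Consecutive unreinforced responses needed before a run that long
        would have had probability < alpha under the old schedule -- a crude
        stand-in for 'noticing' that reinforcement has stopped."""
        if p_reinforce >= 1.0:
            return 1  # under continuous reinforcement, one miss is already impossible
        return math.ceil(math.log(alpha) / math.log(1.0 - p_reinforce))

    print(responses_to_notice_extinction(1.0))      # continuous reinforcement -> 1
    print(responses_to_notice_extinction(1 / 20))   # ~1 reinforcement per 20 responses -> 59
    print(responses_to_notice_extinction(1 / 100))  # ~1 per 100 -> 299

The leaner the schedule, the longer extinction looks just like ordinary training, which is consistent with the thousands of unreinforced responses mentioned above.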

Second, extinction is frustrating, and frustration is aversive (see asgn4x). When you learn to make a response for a reinforcer, you learn to expect that reinforcer. In extinction, the expected reinforcement fails to occur, which fits the formal definition of frustration (failing to receive an expected goal or reward). Because frustration is aversive, each response during extinction is effectively punished, which acts to suppress it. On an intermittent schedule of reinforcement, you learn not to expect reinforcement after most responses. Responses that don't produce reinforcement are therefore not punished, so responding continues.

B. F. Skinner (1956) discovered this (then) surprising effect quite by accident. Early in his career, Skinner began his study of lever pressing by rats for food. One Saturday, he found he did not have enough food pellets on hand to last the weekend. He wanted to keep up the daily training of the rats, but making pellets was a long, tedious task. (In those days researchers made almost everything for themselves. Now they buy most of their supplies and equipment from commercial manufacturers.) So he decided to stretch the remaining pellets by reinforcing only the first response that occurred after one minute had passed since the last reinforcer had been delivered.

To his surprise, not only did his supply of pellets last, but the pressing behavior became stronger, steadier, and much more resistant to extinction, not weaker and less regular as one might expect. This accidental finding became a powerful research tool in behavioral analysis because it generates predictable behavior that remains quite stable over long periods of time. That stability permits detailed study of individual subjects rather than averaging data from several subjects. Pharmaceutical companies have used this procedure to help identify potentially useful psychoactive medications. For example, a signal that predicts a brief shock suppresses a rat's normally steady responding for food reinforcement until after the shock has been delivered. Potential antianxiety drugs decrease or abolish this suppression.

Figure 1   Cumulative response curves generated by the four basic schedules of reinforcement. The Y-axis is cumulative responses (each response moves the line up one step). The blue slashes below each curve show when reinforcement occurred. Note that the curves show many more responses than reinforcements. Ratio schedules: the number of responses made determines when the next reinforcement occurs. Interval schedules: the passage of time determines when the next reinforcement occurs. Fixed schedules: the time or ratio remains the same. Variable schedules: the time or ratio varies around an average value.

Figure 1 shows the response rates (responses/minute) on fixed and variable forms of the two basic schedules. The X-axis shows time. The Y-axis shows cumulative responses (~added up, as in a cumulative grade point average: GPA is cumulative grade points divided by cumulative [total] credit hours). The steeper the slope, the faster the responding; the straighter the line, the steadier the responding.
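As a sketch of how such a cumulative record can be drawn (assuming matplotlib is available; the function name is invented), each response timestamp steps the curve up one unit, and reinforcement times are marked below the curve:

    import matplotlib.pyplot as plt

    def plot_cumulative_record(response_times, reinforcement_times):
        """Draw a cumulative record: each response steps the line up one unit."""
        counts = list(range(1, len(response_times) + 1))
        plt.step(response_times, counts, where="post")
        # mark reinforcement deliveries as slashes at the bottom of the plot
        plt.plot(reinforcement_times, [0] * len(reinforcement_times),
                 marker="|", linestyle="none", color="blue")
        plt.xlabel("Time (sec.)")
        plt.ylabel("Cumulative responses")
        plt.show()

A steep, straight segment in the resulting plot means fast, steady responding; a flat segment means a pause.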

Fixed schedules deliver reinforcement on a constant (fixed) ratio or interval, e.g., for every 20th response, or for the first response after 28 seconds have passed since the last reinforcement. Variable schedules deliver reinforcement on a schedule that varies around an average value (e.g., after the 5th, 10th, 20th, or 45th response, averaging 20 responses; or after the 4th, 12th, 28th, or 68th second, averaging 28 seconds).
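One common way to program a variable schedule (a sketch under assumptions; the helper name is invented) is to shuffle and recycle a fixed list of requirements whose mean equals the programmed value:

    import random

    def variable_schedule(values, rng=random.Random(0)):
        """Yield requirements from a shuffled, recycled list; the long-run
        average equals the mean of the list."""
        while True:
            order = list(values)
            rng.shuffle(order)
            yield from order

    vr20 = variable_schedule([5, 10, 20, 45])  # response counts, mean = 20 (VR 20)
    vi28 = variable_schedule([4, 12, 28, 68])  # seconds, mean = 28 (VI 28)
    print([next(vr20) for _ in range(8)])      # first eight VR requirements

Because the list is recycled, the subject experiences the programmed average over any long stretch of training, while the individual requirements remain unpredictable.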

Some schedules produce very steady rates of responding for many hours (the variable interval schedule in Figure 1). Others produce very high rates of responding (the ratio schedules in Figure 1). Still others produce highly predictable variations in response rate (the fixed interval schedule in Figure 1).

Each pattern reflects its reinforcement schedule. Ratio schedules generate high rates of responding (a steep slope on the cumulative record) because the faster the subject works, the sooner it earns the next reinforcer. Interval schedules have shallower slopes, because the rate of reinforcement is limited by the scheduled time interval between reinforcements. A fixed interval schedule produces a scalloped cumulative record (see Figure 1), because each reinforcement signals that no response will be reinforced for several seconds: responding pauses after each reinforcement and then accelerates as the interval runs out.