05 - Learning

ENDURING ISSUES IN LEARNING

This chapter addresses how humans and other animals acquire new behaviors as a result of their experiences. Thus, it bears directly on the enduring issue of Stability versus Change (the extent to which organisms change over the course of their lives). The events that shape learning not only vary among different individuals (diversity-universality) but also are influenced by an organism's inborn characteristics (nature-nurture). Finally, some types of learning can affect our physical health by influencing how our body responds to disease (mind-body).

CLASSICAL CONDITIONING

Russian physiologist Ivan Pavlov (1849-1936) discovered classical (or Pavlovian) conditioning almost by accident. Classical conditioning is a form of learning in which a response elicited by one stimulus comes to be elicited by a previously neutral stimulus.

Classical conditioning is made up of four basic elements: the unconditioned stimulus, the unconditioned response, the conditioned stimulus, and the conditioned response. The unconditioned stimulus (US) is an event that automatically elicits a certain reflex reaction, which is the unconditioned response (UR). In Pavlov's studies, food in the mouth was the unconditioned stimulus, and salivation to it was the unconditioned response. The third element in classical conditioning, the conditioned stimulus (CS), is an event that is repeatedly paired with the unconditioned stimulus. At first, the conditioned stimulus does not elicit the desired response. But eventually, after repeatedly being paired with the unconditioned stimulus, the conditioned stimulus alone comes to trigger a reaction similar to the unconditioned response. This learned reaction is the conditioned response (CR).

Classical conditioning has been demonstrated in virtually every animal species, even cockroaches, bees, and sheep. You yourself may have inadvertently classically conditioned one of your pets. For instance, you may have noticed that your cat begins to purr when it hears the sound of the electric can opener running. For a cat, the taste and smell of food are unconditioned stimuli for a purring response. By repeatedly pairing the can opener whirring with the delivery of food, you have turned this sound into a conditioned stimulus that triggers a conditioned response.

Establishing a Classically Conditioned Response

It generally takes repeated pairings of an unconditioned stimulus (US) and a cue before the cue alone comes to elicit a conditioned response (CR). The likelihood or strength of the conditioned response increases each time these two stimuli are paired. This learning, however, eventually reaches a point of diminishing returns: the amount of each increase gradually becomes smaller, until finally no further learning occurs. The conditioned response is then fully established.
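This negatively accelerated pattern can be pictured with a simple incremental model in which each pairing adds a fixed fraction of whatever learning remains to be gained. The following is only an illustrative sketch; the learning rate, asymptote, and number of pairings are assumed values, not figures from Pavlov's experiments.

    # Minimal sketch of a diminishing-returns acquisition curve.
    # Each CS-US pairing raises associative strength V by a fixed fraction
    # (the learning rate) of the distance remaining to the maximum strength
    # the US can support (the asymptote). All values are illustrative.

    learning_rate = 0.3   # assumed fraction of remaining learning gained per pairing
    asymptote = 1.0       # assumed maximum strength of the conditioned response
    V = 0.0               # associative strength before any pairings

    for pairing in range(1, 11):
        V += learning_rate * (asymptote - V)   # increment shrinks as V nears the asymptote
        print(f"Pairing {pairing:2d}: strength of CR = {V:.3f}")

Early pairings produce large gains (0.300, then 0.510, then 0.657), while later pairings add almost nothing, which is the point of diminishing returns described above.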

The spacing of pairings is also important in establishing a classically conditioned response. If pairings of the CS and US follow each other very rapidly, or if they are very far apart, learning the association is slower. If the spacing of pairings is moderate - neither too far apart nor too close together - learning occurs more quickly. It is also important that the CS and US rarely, if ever, occur alone. Pairing the CS and US only once in a while, called intermittent pairing, reduces both the rate of learning and the final strength of the learned response.

Classical Conditioning in Humans

Classical conditioning is as common in humans as it is in other animals. For example, some people learn fears through classical conditioning, as in the famous case of Little Albert, an infant who came to fear a white rat after the rat was repeatedly paired with a sudden loud sound.

Perhaps you have noticed that the background music in films and TV shows becomes dark and "scary" when something frightening is about to happen. When the event actually occurs, your anxiety is already heightened. The scary music is the CS and your anxiety is the CR. A classic example of this is the film "Jaws," in which distinctive music (CS) was paired repeatedly with startling, gruesome shark attacks (US) that produced fear (UR). The conditioning worked much better than expected: the music (CS) did indeed come to elicit fear (CR), but many viewers also learned to associate simply being in the water (CS) with fear (O'Connor, 2013). As another example, many TV ads use humor to put you in a good mood. Then, at just the right moment, the ads show the sponsor's logo or name or product. By doing that repeatedly, the advertisers condition you to associate the sponsor's product (the CS) with positive emotions (the CR).

Several years after Little Albert was conditioned, psychologist Mary Cover Jones demonstrated a way that fears can be unlearned by means of classical conditioning (M. C. Jones, 1924). Her subject was a 3-year-old boy named Peter who, like Albert, had a fear of white rats. Jones paired the sight of a rat with an intrinsically pleasant experience - eating candy. While Peter sat alone in a room, a caged white rat was brought in and placed far enough away so that the boy would not be frightened. At this point, Peter was given candy to eat. On each successive day, the cage was moved closer, after which Peter was given candy. Eventually, he showed no fear of the rat, even without any candy. In more recent times, research has shown that repeated exposure to violent video games desensitizes some players to violence, and this, in turn, can lead to an increase in aggressive behavior (Anderson et al., 2010; Greitemeyer, 2014).

Psychiatrist Joseph Wolpe (1915-1997) adapted Jones's method to the treatment of certain kinds of anxiety (Wolpe, 1973, 1990). Wolpe reasoned that because irrational fears are learned (conditioned), they could also be unlearned through conditioning. He noted that it is not possible to be both fearful and relaxed at the same time. Therefore, if people could be taught to relax in fearful or anxious situations, their anxiety should disappear. Wolpe's desensitization therapy begins by teaching a system of deep-muscle relaxation. Then the person constructs a list of situations that prompt various degrees of fear or anxiety, from intensely frightening to only mildly so. A person with a fear of heights, for example, might construct a list that begins with standing on the edge of the Grand Canyon and ends with climbing two rungs on a ladder. While deeply relaxed, the person imagines the least distressing situation on the list first. If he or she succeeds in remaining relaxed, the person proceeds to the next item on the list, and so on until no anxiety is felt. In this way, classical conditioning is used to change an undesired reaction.

Classical Conditioning Is Selective

If people can develop phobias, or intense fears, through classical conditioning, why don't we acquire phobias of virtually everything that is paired with harm? For example, many people get shocks from electric sockets, but almost no one develops a socket phobia. Why should this be the case?

Psychologist Martin Seligman (1971) has offered an answer: The key, he says, lies in the concept of preparedness. Some things readily become conditioned stimuli for fear responses because we are biologically prepared to learn those associations. Among the common objects of phobias are heights, snakes, and the dark. In our evolutionary past, fear of these potential dangers probably offered a survival advantage, and so a readiness to perceive such threats and to respond quickly with fear may have become "wired into" our species (Cole & Wilkins, 2013; LoBue, Rakison & DeLoache, 2010).

Preparedness also underlies conditioned taste aversion, a learned association between the taste of a certain food and a feeling of nausea and revulsion. Conditioned taste aversions are acquired very quickly. It usually takes only one pairing of a distinctive flavor and subsequent illness to develop a learned aversion to the taste of that food. Readily learning connections between distinctive flavors and illness has clear benefits. If we can quickly learn which foods are poisonous and avoid those foods in the future, we greatly increase our chances of survival. Other animals with a well-developed sense of taste, such as rats and mice, also readily develop conditioned taste aversions, just as humans do (Anderson, Varlinskaya, & Spear, 2010; Mita et al., 2014).

OPERANT CONDITIONING

Around the turn of the 20th century, while Pavlov was busy with his dogs, the American psychologist Edward Lee Thorndike (1874-1949) was using a "puzzle box," or simple wooden cage, to study how cats learn. Thorndike confined a hungry cat in the puzzle box, with food just outside where the cat could see and smell it. To get to the food, the cat had to figure out how to open the latch on the box door, a process that Thorndike timed. In the beginning, it took the cat quite a while to discover how to open the door. But on each trial, it took the cat less time, until eventually it could escape from the box in almost no time at all. Thorndike was a pioneer in studying the kind of learning that involves making a certain response due to the consequences it brings. This form of learning has come to be called operant or instrumental conditioning.

Elements of Operant Conditioning

One essential element in operant conditioning is emitted behavior. This is one way in which operant conditioning differs from classical conditioning. In classical conditioning, a response is automatically triggered by some stimulus, such as a loud noise automatically triggering fear. In this sense, classical conditioning is passive in that the behaviors are elicited by stimuli. In contrast, Thorndike's cats spontaneously tried to undo the latch on the door of the box. You spontaneously wave your hand to signal a taxi to stop. You voluntarily put money into machines to obtain food. These and similar actions are called operant behaviors because they involve "operating" on the environment.

A second essential element in operant conditioning is a consequence following a behavior. Thorndike's cats gained freedom and a piece of fish for escaping from the puzzle boxes. Consequences like this one, which increase the likelihood that a behavior will be repeated, are called reinforcers. In contrast, consequences that decrease the chances that a behavior will be repeated are called punishers. Imagine how Thorndike's cats might have acted had they been greeted by a large, snarling dog when they escaped from the puzzle boxes. Thorndike summarized the influence of consequences in his law of effect: Behavior that brings about a satisfying effect (reinforcement) is likely to be performed again, whereas behavior that brings about a negative effect (punishment) is likely to be suppressed. Contemporary psychologists often refer to the principle of reinforcement, rather than the law of effect, but the two terms mean the same thing.

Establishing an Operantly Conditioned Response

Because the behaviors involved in operant conditioning are voluntary, it is not always easy to establish an operantly conditioned response. The desired behavior must first be performed spontaneously in order for it to be rewarded and strengthened. Sometimes you can simply wait for this action to happen. Thorndike, for example, waited for his cats to trip the latch that opened the door to his puzzle boxes. Then he rewarded them with fish.

But when there are many opportunities for making irrelevant responses, waiting can be slow and tedious. If you were an animal trainer for a circus, imagine how long you would have to wait for a tiger to jump through a flaming hoop spontaneously so you could reward it. One way to speed up the process is to increase motivation. Even without food in sight, a hungry animal is more active than a well-fed one and so is more likely, just by chance, to make the response you're looking for. Another strategy is to reduce opportunities for irrelevant responses, as Thorndike did by making his puzzle boxes small and bare. Many researchers do the same thing by training small animals in Skinner boxes. A Skinner box (named after B. F. Skinner, another pioneer in the study of operant conditioning) is a small cage with solid walls that is relatively empty, except for a food cup and an activating device, such as a bar or a button. In this simple environment, it doesn't take long for an animal to press the bar or button that releases food into the cup, thereby reinforcing the behavior.

Usually, however, the environment cannot be controlled so easily; hence a different approach is called for. Another way to speed up operant conditioning is to reinforce the successive approximations of the desired behavior. This approach is called shaping. To teach a tiger to jump through a flaming hoop, the trainer might first reinforce the animal simply for jumping on a pedestal. After that behavior has been learned, the tiger might be reinforced only for leaping from that pedestal to another. Next, the tiger might be required to jump through a hoop between the pedestals to gain a reward. And finally, the hoop is set on fire, and the tiger must leap through it to be rewarded.

As in classical conditioning, the learning of an operantly conditioned response eventually reaches a point of diminishing returns. After 25 trials, for instance, Thorndike's cats were escaping from the box no more quickly than they had been after 15 trials. The operantly conditioned response had then been fully established.

A Closer Look at Reinforcement

We have been talking about reinforcement as if all reinforcers are alike, but in fact this is not the case. Think about the kinds of consequences that would encourage you to perform some behavior. Certainly these include consequences that would give you something positive, like praise, recognition, or money. But the removal of some negative stimulus is also a good reinforcer of behavior. When new parents discover that rocking a baby will stop the infant's persistent crying, they sit down and rock the baby deep into the night; the removal of the infant's crying is a powerful reinforcer.

These examples show that there are two kinds of reinforcers. Positive reinforcers, such as praise, add something rewarding to a situation, whereas negative reinforcers, such as stopping an aversive noise, subtract something unpleasant. Animals will learn to press bars and open doors not only to obtain food and water (positive reinforcement), but also to turn off a loud buzzer or an electric shock (negative reinforcement).

Both positive and negative reinforcement result in the learning of new behaviors or the strengthening of existing ones. Remember, in everyday conversation when we say that we have "reinforced" something, we mean that we have strengthened it. Similarly, in operant conditioning, reinforcement - whether positive or negative - always strengthens or encourages a behavior. A child might practice the piano because she or he receives praise for practicing (positive reinforcement) or because it gives her or him a break from doing tedious homework (negative reinforcement), but in either case the end result is a higher incidence of piano playing.

But what if a particular behavior is just accidentally reinforced because it happens by chance to be followed by some rewarding incident? Will the behavior still be more likely to occur again? B. F. Skinner (1948) showed that the answer is yes. He put a pigeon in a Skinner box and at fixed intervals dropped a few grains of food into the cup. The pigeon began repeating whatever it had been doing just before the food was given, such as standing on one foot. This action had nothing to do with getting the food, of course. But still the bird repeated it over and over again. Skinner called the bird's behavior superstitious, because it was learned in a way that is similar to how some human superstitions are learned. If you happen to be wearing an Albert Einstein T-shirt when you get your first A on an exam, you may come to believe that wearing this shirt was a factor. Even though the connection at first was pure coincidence, you may keep on wearing your "lucky" shirt to every test thereafter. Interestingly, there is evidence that such a superstition may actually improve your performance in the future by increasing your expectation that your efforts will be successful (Damisch, Stoberock, & Mussweiler, 2010; Michael, Garry, & Kirsch, 2012). In turn, the improved performance provides some positive reinforcement for continuing to engage in the superstitious behavior.

In the case of forming superstitions, reinforcement has an illogical effect on behavior, but that effect is generally harmless. Some psychologists believe that reinforcement can also lead inadvertently to negative results. They believe that offering certain kinds of reinforcers (candy, money, play time) for a task that could be intrinsically rewarding (that is, reinforcing in and of itself) can undermine the intrinsic motivation to perform it. People may begin to think that they are working only for the reward and lose enthusiasm for what they are doing. They may no longer see their work as an intrinsically interesting challenge in which to invest creative effort and strive for excellence. Instead, they may see work as a chore that must be done to earn some tangible payoff. This warning can be applied to many situations, such as offering tangible rewards to students for their work in the classroom, or giving employees a "pay for performance" incentive to meet company goals (Kohn, 1993; Rynes, Gerhart, & Parks, 2005).

Other psychologists, however, suggest that this concern about tangible reinforcers may be exaggerated. Although the use of rewards may sometimes produce negative outcomes, this is not always the case (Cameron, Banko, & Pierce, 2001). For example, children who were rewarded with stickers or praise for eating healthful vegetables that they initially disliked reported liking those vegetables more three months later. They also ate more of those vegetables when given a chance to eat as much or as little as they wished (Cooke et al., 2011). In fact, one extensive review of more than 100 studies showed that when used appropriately, rewards do not compromise intrinsic motivation, and under some circumstances, they may even help to encourage creativity (Eisenberger & Cameron, 1996; Selart, Nordström, Kuvaas, & Takemura, 2008). For example, research has shown that rewarding highly creative behavior on one task often enhances subsequent creativity on other tasks (Eisenberger & Rhoades, 2001).

Punishment

Although we hate to be subjected to it, punishment is a powerful controller of behavior. After receiving a heavy fine for failing to report extra income to the IRS, we are less likely to make that mistake again. In this case, an unpleasant consequence reduces the likelihood that we will repeat a behavior. This is the definition of punishment.

Punishment is different from negative reinforcement. Reinforcement of whatever kind strengthens (reinforces) behavior. Negative reinforcement strengthens behavior by removing something unpleasant from the environment. In contrast, punishment adds something unpleasant to the environment; and as a result, it tends to weaken the behavior that caused it. If going skiing during the weekend rather than studying for a test results in getting an F, the F is an unpleasant consequence (a punisher) that makes you less likely to skip studying for ski time again.

The saying "spare the rod and spoil the child" suggests that physical punishment is an effective way of changing behavior. However, we can all think of instances when it doesn't seem to work. Children often continue to misbehave even after they have been punished repeatedly for a particular misbehavior.

Learned Helplessness

Have you ever met someone who has decided he will never be good at science? We have said that through avoidance training, people learn to prevent themselves from being punished, but what happens when such avoidance of punishment isn't possible? The answer is often a "giving-up" response that can generalize to other situations. This response is known as learned helplessness.

Martin Seligman and his colleagues first studied learned helplessness in experiments with dogs (Seligman & Maier, 1967). They placed two groups of dogs in chambers that delivered a series of electric shocks to the dogs' feet at random intervals. The dogs in the control group could turn off (escape) the shock by pushing a panel with their nose. The dogs in the experimental group could not turn off the shock - they were, in effect, helpless. Next, both the experimental and the control animals were placed in a different situation, one in which they could escape shock by jumping over a hurdle. A warning light always came on 10 seconds before each 50-second shock was given. The dogs in the control group quickly learned to jump the hurdle as soon as the warning light flashed, but the dogs in the experimental group didn't. These dogs, which had previously experienced unavoidable shocks, didn't even jump the hurdle after the shock started. They just lay there and accepted the shocks. Also, many of these dogs were generally listless, suffered loss of appetite, and displayed other symptoms associated with depression.

Many subsequent studies have shown that learned helplessness can occur both in animals and in humans. Once established, the condition generalizes to new situations and can be very persistent, even given evidence that an unpleasant circumstance can now be avoided. For example, when faced with a series of unsolvable problems, a college student may eventually give up trying and make only halfhearted efforts to solve new problems, even when the new problems are solvable. Moreover, success in solving new problems has little effect on the person's behavior. He or she continues to make only halfhearted tries, as if never expecting any success at all. Similarly, children raised in an abusive family, where punishment is unrelated to behavior, often develop a feeling of helplessness (C. Peterson & Bossio, 1989). Even in relatively normal settings outside their home, they often appear listless, passive, and indifferent. They make little attempt either to seek rewards or to avoid discomfort.

Shaping Behavioral Change Through Biofeedback

Patrick, an 8-year-old third grader, was diagnosed with attention-deficit disorder (ADD). He was unable to attend to what was going on around him, was restless, and was unable to concentrate. An EEG showed increased numbers of slow brain waves. After a course of 40 training sessions using special computer equipment that allowed Patrick to monitor his brain-wave activities, he learned how to produce more of the fast waves that are associated with being calm and alert. As a result, Patrick became much more "clued in" to what was going on around him and much less likely to become frustrated when things didn't go his way (Fitzgerald, 1999; Fuchs, Birbaumer, Lutzenberger, Gruzelier, & Kaiser, 2003; Monastra, 2008).

When operant conditioning is used to control certain biological functions, such as blood pressure, skin temperature, or heart rate, it is referred to as biofeedback. Instruments are used to measure particular biological responses - muscle contractions, blood pressure, and heart rate. Variations in the strength of the response are reflected in signals, such as lights or tones. By using these signals, the person can learn to control the response through shaping. For example, Patrick learned to control his brain waves by controlling the movement of a Superman icon on a computer screen. When biofeedback is used to monitor and control brain waves, as in Patrick's case, it is referred to as neurofeedback (Hammond, 2011).
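The logic of a biofeedback session can be sketched as a simple measurement-and-feedback loop. In the sketch below, read_heart_rate and play_tone are hypothetical placeholders standing in for a physiological sensor and a feedback signal, and the target value is illustrative; this is not the procedure used in any particular study.

    import time

    # Minimal sketch of a biofeedback loop: measure a biological response,
    # convert it into a signal the person can perceive, and repeat, so the
    # person can gradually learn (through shaping) to move the response in
    # the desired direction. The sensor and tone functions are placeholders.

    def read_heart_rate():
        # Placeholder: a real setup would read this value from a sensor.
        return 72.0

    def play_tone(pitch_hz):
        # Placeholder: the feedback signal the person hears.
        print(f"feedback tone: {pitch_hz:.0f} Hz")

    target = 65.0   # assumed goal: a lower, calmer heart rate
    for _ in range(10):
        rate = read_heart_rate()
        # The tone's pitch tracks the measurement, letting the person "hear"
        # small changes in a response they normally cannot perceive.
        play_tone(200 + 10 * (rate - target))
        time.sleep(1)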

Biofeedback and neurofeedback have become well-established treatments for a number of medical problems, including migraine headaches (Kropp, Siniatchkin, & Gerber, 2005), hypertension (Reineke, 2008), and panic attacks (Meuret, Wilhelm, & Roth, 2004). Biofeedback has also been used by athletes, musicians, and other performers to control the anxiety that can interfere with their performance.

Biofeedback treatment does have some drawbacks. Learning the technique takes considerable time, effort, patience, and discipline. And it does not work for everyone. But it gives many patients control of their treatment, a major advantage over other treatment options, and it has achieved impressive results in alleviating certain medical problems.

FACTORS SHARED BY CLASSICAL AND OPERANT CONDITIONING

Despite the differences between classical and operant conditioning, these two forms of learning have many things in common. First, they both involve the learning of associations. In classical conditioning, it is a learned association between one stimulus and another, whereas in operant conditioning, it is a learned association between some action and a consequence. Second, the responses in both classical and operant conditioning are under control of stimuli in the environment. A classically conditioned fear might be triggered by the sight of a white rat; an operantly conditioned jump might be cued by the flash of a red light. In both cases, moreover, the learned responses to a cue can generalize to similar stimuli. Third, neither classically nor operantly conditioned responses will last forever if they aren't periodically renewed. This doesn't necessarily mean that they are totally forgotten, however. Even after you think that these responses have long vanished, either one can suddenly reappear in the right situation. And fourth, in both kinds of learning - classical and operant conditioning - new behaviors can build on previously established ones.

The Importance of Contingencies

Because classical and operant conditioning are both forms of associative learning, they both involve perceived contingencies. A contingency is a relationship in which one event depends on another. Graduating from college is contingent on passing a certain number of courses. In both classical and operant conditioning, perceived contingencies are very important.

Contingencies in Classical Conditioning

In classical conditioning, a contingency is perceived between the CS and the US. The CS comes to be viewed as a signal that the US is about to happen. This is why, in classical conditioning, the CS not only must occur in close proximity to the US, but also should precede the US and provide predictive information about it (Rescorla, 1966, 1967, 1988).

Scientists once believed that no conditioning would occur if the CS followed the US; this belief, however, turns out to be not entirely true. The explanation again lies in contingency learning. Imagine a situation in which a tone (the CS) always follows a shock (the US). This process is called backward conditioning. After a while, when the tone is sounded alone, the learner will not show a conditioned fear response to it. After all, the tone has never predicted that a shock is about to be given. But what the learner does show is a conditioned relaxation response to the sound of the tone, because the tone has served as a signal that the shock is over and will not occur again for some time. Again, we see the importance of contingency learning: The learner responds to the tone on the basis of the information that it gives about what will happen next.

Other studies similarly show that predictive information is crucial in establishing a classically conditioned response. In one experiment with rats, for instance, a noise was repeatedly paired with a brief electric shock until the noise became a conditioned stimulus for a conditioned fear response (Kamin, 1969). Then a second stimulus - a light - was added right before the noise. You might expect that the rat would come to show a fear of the light as well, because it, too, preceded the shock. But this is not what happened. Apparently, the noise-shock contingency that the rat had already learned had a blocking effect on learning that the light also predicted shock. Once the rat had learned that the noise signaled the onset of shock, adding yet another cue (a light) provided no new predictive information about the shock's arrival, and so the rat learned to ignore the light. Classical conditioning, then, occurs only when a stimulus tells the learner something new or additional about the likelihood that a US will occur.
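The blocking effect can be pictured by extending the earlier acquisition sketch so that a cue gains strength only to the extent that the US is not already predicted by all the cues present, in the spirit of contingency-based models such as the Rescorla-Wagner model. The phases and parameter values below are illustrative, not Kamin's actual procedure.

    # Rough sketch of blocking: a cue gains strength only in proportion to
    # how surprising the US is, given every cue that is present. Values are
    # illustrative.
    learning_rate = 0.3
    asymptote = 1.0
    V_noise, V_light = 0.0, 0.0

    # Phase 1: noise alone is paired with shock until it predicts the shock well.
    for _ in range(10):
        V_noise += learning_rate * (asymptote - V_noise)

    # Phase 2: a light is added, and the light-noise compound is paired with shock.
    for _ in range(10):
        surprise = asymptote - (V_noise + V_light)   # shock is already well predicted
        V_noise += learning_rate * surprise
        V_light += learning_rate * surprise

    print(f"noise: {V_noise:.2f}, light: {V_light:.2f}")
    # The light ends up with almost no strength, because it never supplied
    # new information about the shock's arrival.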

Contingencies in Operant Conditioning

Contingencies also figure prominently in operant conditioning. The learner must come to perceive a connection between performing a certain voluntary action and receiving a certain reward or punishment. If no contingency is perceived, there is no reason to increase or decrease the behavior.

But once a contingency is perceived, does it matter how often a consequence is actually delivered? When it comes to rewards, the answer is yes. Fewer rewards are often better than more. In the language of operant conditioning, partial or intermittent reinforcement results in behavior that will persist longer than behavior learned by continuous reinforcement. Why would this be the case? The answer has to do with expectations. When people receive only occasional reinforcement, they learn not to expect reinforcement with every response, so they continue responding in the hopes that eventually they will gain the desired reward. Vending machines and slot machines illustrate these different effects of continuous versus partial reinforcement. A vending machine offers continuous reinforcement. Each time you put in the right amount of money, you get something desired in return (reinforcement). If a vending machine is broken and you receive nothing for your coins, you are unlikely to put more money in it. In contrast, a casino slot machine pays off intermittently; only occasionally do you get something back for your investment. This intermittent payoff has a compelling effect on behavior. You might continue putting coins into a slot machine for a very long time even though you are getting nothing in return.

Psychologists refer to a pattern of reward payoffs as a schedule of reinforcement. Partial or intermittent reinforcement schedules are either fixed or variable, and they may be based on either the number of correct responses or the time elapsed between correct responses.

On a fixed-interval schedule, performance tends to fall off immediately after each reinforcement and then tends to pick up again as the time for the next reinforcement draws near. For example, when exams are given at fixed intervals - like a weekly quiz - students tend to decrease their studying right after one test is over and then increase studying as the next test approaches. On a variable-interval schedule the learner typically gives a slow, steady pattern of responses, being careful not to be so slow as to miss all the rewards. For example, if quizzes are given during a semester at unpredictable intervals, students have to keep studying at a steady rate, because on any given day there might be a test. On a fixed-ratio schedule, a brief pause after reinforcement is followed by a rapid and steady response rate until the next reinforcement. Finally, learners on a variable-ratio schedule tend not to pause after reinforcement and have a high rate of response over a long period of time. Because they never know when reinforcement may come, they keep on testing for a reward.
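The four schedules differ only in the rule used to decide whether a given response earns reinforcement. The sketch below states those rules in code; the particular numbers (five responses, 60 seconds) are examples, not values drawn from any specific experiment.

    import random

    # Illustrative decision rules for the four partial reinforcement schedules.

    def fixed_ratio(responses_since_reward, ratio=5):
        # Fixed-ratio: the Nth response after the last reward is reinforced.
        return responses_since_reward >= ratio

    def variable_ratio(responses_since_reward, required):
        # Variable-ratio: the required number of responses is unpredictable.
        return responses_since_reward >= required

    def fixed_interval(seconds_since_reward, interval=60):
        # Fixed-interval: the first response after a fixed time is reinforced.
        return seconds_since_reward >= interval

    def variable_interval(seconds_since_reward, required):
        # Variable-interval: the required waiting time is unpredictable.
        return seconds_since_reward >= required

    # For the variable schedules, a new requirement is drawn after each reward,
    # averaging (for example) about 5 responses or about 60 seconds:
    next_ratio = random.randint(1, 9)
    next_wait = random.uniform(10, 110)
    print(variable_ratio(7, next_ratio), variable_interval(45.0, next_wait))

Because a learner on a variable schedule cannot tell which response or which moment will pay off, responding stays steady and persistent, which is the pattern described above.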

Extinction and Spontaneous Recovery

Another factor shared by classical and operant conditioning is that learned responses sometimes weaken and may even disappear. If a CS and a US are never paired again or if a consequence never follows a learned behavior, the learned association will begin to fade until eventually the effects of prior learning are no longer seen. This outcome is called extinction of a conditioned response.

Extinction and Spontaneous Recovery in Classical Conditioning

For an example of extinction in classical conditioning, let's go back to Pavlov's dogs. What would you predict happened over time when the dogs heard the bell (the CS), but food (the US) was no longer given? The conditioned response to the bell - salivation - gradually decreased until eventually it stopped altogether. The dogs no longer salivated when they heard the bell. Extinction had taken place.
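The trial-by-trial weakening of the dogs' salivation can be pictured with the same incremental sketch used earlier for acquisition, now run with the food withheld. This is only a simplified picture of the declining response strength (the interference account of why extinction occurs is discussed below), and the values are illustrative.

    # Minimal sketch of extinction: the CS is presented without the US, so
    # associative strength decays toward zero a little more on each trial.
    learning_rate = 0.3
    V = 1.0   # assume the CR was fully established before extinction began

    for trial in range(1, 11):
        V += learning_rate * (0.0 - V)   # US withheld: strength moves toward zero
        print(f"Extinction trial {trial:2d}: strength of CR = {V:.3f}")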

Once such a response has been extinguished, is the learning gone forever? Pavlov trained his dogs to salivate when they heard a bell, then extinguished this conditioned response. A few days later, the dogs were exposed to the bell again in the laboratory setting. As soon as they heard it, their mouths began to water. The response that had been learned and then extinguished reappeared on its own with no retraining. This phenomenon is known as spontaneous recovery. The dogs' response was now only about half as strong as it had been before the extinction, and it was very easy to extinguish a second time. Nevertheless, the fact that the response occurred at all indicated that the original learning was not completely forgotten.

How can extinguished behavior disappear and then reappear later? The explanation is that extinction occurs because new learning interferes with a previously learned response. New stimuli in other settings come to be paired with the conditioned stimulus; and these new stimuli may elicit responses different from (and sometimes incompatible with) the original conditioned response. For example, if you take a break from watching the latest horror movies in theaters and instead watch reruns of classic horror films on television, these classic films may seem so amateurish that they make you laugh rather than scare you. Here you are learning to associate the scary music in such films with laughter, which in effect opposes your original fear response. The result is interference and extinction. Spontaneous recovery consists of overcoming this interference. For instance, if you return to the theater to see the latest horror movie, the conditioned response of fear to the scary music may suddenly reappear. It is as if the unconditioned stimulus of watching "up-to-date" horror acts as a reminder of your earlier learning and renews your previous classically conditioned response. Such "reminder" stimuli work particularly well when presented in the original conditioning setting.

Extinction and Spontaneous Recovery in Operant Conditioning

Extinction and spontaneous recovery also occur in operant conditioning. In operant conditioning, extinction happens as a result of withholding reinforcement. The effect usually isn't immediate. In fact, when reinforcement is first discontinued, there is often a brief increase in the strength or frequency of responding before a decline sets in. For instance, if you put coins in a vending machine and it fails to deliver the goods, you may push the button more forcefully and in rapid succession before you finally give up.

Just as in classical conditioning, extinction in operant conditioning doesn't completely erase what has been learned. Even though much time has passed since a behavior was last rewarded and the behavior seems extinguished, it may suddenly reappear. This spontaneous recovery may again be understood in terms of interference from new behaviors. If a rat is no longer reinforced for pressing a lever, it will start to engage in other behaviors - turning away from the lever, attempting to escape, and so on. These new behaviors will interfere with the operant response of lever pressing, causing it to extinguish. Spontaneous recovery is a brief victory of the original learning over interfering responses. The rat decides to give the previous "reward" lever one more try, as if testing again for a reward.

The difficulty of extinguishing an operantly conditioned response depends on a number of factors:

    • Strength of the original learning. The stronger the original learning, the longer it takes the response to extinguish. If you spend many hours training a puppy to sit on command, you will not need to reinforce this behavior very often once the dog grows up.

    • Pattern of reinforcement. As you learned earlier, responses that were reinforced only occasionally when acquired are usually more resistant to extinction than responses that were reinforced every time they occurred.

    • Variety of settings in which the original learning took place. The greater the variety of settings, the harder it is to extinguish the response. Rats trained to run several different types of alleys in order to reach a food reward will keep running longer after food is withdrawn than will rats trained in a single alley.

    • Complexity of the behavior. Complex behavior is much more difficult to extinguish than simple behavior is. Complex behavior consists of many actions put together, and each of those actions must be extinguished in order for the whole to be extinguished.

    • Learning through punishment versus reinforcement. Behaviors learned through punishment rather than reinforcement are especially hard to extinguish. If you avoid jogging down a particular street because a vicious dog there attacked you, you may never venture down that street again, so your avoidance of the street may never extinguish.

One way to speed up the extinction of an operantly conditioned response is to put the learner in a situation that is different from the one in which the response was originally learned. The response is likely to be weaker in the new situation, and therefore it will extinguish more quickly. Of course, when the learner is returned to the original learning setting after extinction has occurred elsewhere, the response may undergo spontaneous recovery, just as in classical conditioning. But now the response is likely to be weaker than it was initially, and it should be relatively easy to extinguish once and for all. You may have experienced this phenomenon yourself when you returned home for the holidays after your first semester in college. A habit that you thought you had outgrown at school may have suddenly reappeared. The home setting worked as a "reminder" stimulus, encouraging the response, just as we mentioned when discussing classical conditioning. Because you have already extinguished the habit in another setting, however, extinguishing it at home shouldn't be difficult.

Stimulus Control, Generalization, and Discrimination

The home setting acting as a "reminder" stimulus is just one example of how conditioned responses are influenced by surrounding cues in the environment. This outcome is called stimulus control, and it occurs in both classical and operant conditioning. In classical conditioning, the conditioned response (CR) is under the control of the conditioned stimulus (CS) that triggers it. Salivation, for example, might be controlled by the sound of a bell. In operant conditioning, the learned response is under the control of whatever stimuli come to be associated with the delivery of reward or punishment. A leap to avoid electric shock might come under the control of a flashing light, for instance. In both classical and operant conditioning, moreover, the learner may respond to cues that are merely similar (but not identical) to the ones that prevailed during the original learning. This tendency to respond to similar cues is known as stimulus generalization.

Generalization and Discrimination in Classical Conditioning

There are many examples of stimulus generalization in classical conditioning. One example is the case of Little Albert, who was conditioned to fear white rats. When the experimenters later showed him a white rabbit, he cried and tried to crawl away, even though he had not been taught to fear rabbits. He also showed fear of other white, furry objects like cotton balls, a fur coat, and even a Santa Claus mask. Little Albert had generalized his learned reactions from rats to similar stimuli. In much the same way, a person who learned to feel anxious over math tests in grade school might come to feel anxious about any task involving numbers, even balancing a checkbook.

Stimulus generalization is not inevitable, however. Through a process called stimulus discrimination, learners can be trained not to generalize, but rather to make a conditioned response only to a single specific stimulus. This process involves presenting several similar stimuli, only one of which is followed by the unconditioned stimulus. For instance, Albert might have been shown a rat and other white, furry objects, but only the rat would be followed by a loud noise (the US). Given this procedure, Albert would have learned to discriminate the white rat from the other objects, and the fear response would not have generalized as it did.

Learning to discriminate is essential in everyday life. We prefer for children to learn not to fear every loud noise and every insect, but only those that are potentially harmful. Through stimulus discrimination, behavior becomes more finely tuned to the demands of our environment.

Generalization and Discrimination in Operant Conditioning

Stimulus generalization also occurs in operant conditioning. A baby who is hugged and kissed for saying "Mama" when he sees his mother may begin to call everyone "Mama." Although the person whom the baby sees - the stimulus - changes, he responds with the same word.

In operant conditioning, responses, too, can be generalized, not just stimuli. For example, the baby who calls everyone "Mama" may also call people "Nana." His learning has generalized to other sounds that are similar to the correct response. This is called response generalization. Response generalization doesn't occur in classical conditioning. If a dog is taught to salivate when it hears a high-pitched tone, it will salivate less when it hears a low-pitched tone, but the response is still salivation.

Just as discrimination is useful in classical conditioning, it is also useful in operant conditioning. Learning what to do has little value if you do not know when to do it, and knowing when to respond is of little use if you do not know which response is right. Discrimination training in operant conditioning consists of reinforcing only a specific, desired response and only in the presence of a specific stimulus. With this procedure, pigeons have been trained to peck at a red disc, but not at a green one. First they are taught to peck at a disc. Then they are presented with two discs, one red and one green. They get food when they peck at the red one, but not when they peck at the green one. Eventually, they learn to discriminate between the two colors, pecking only at the red disc.

New Learning Based on Original Learning

There are other ways, besides stimulus generalization and discrimination, that original learning can serve as the basis for new learning. In classical conditioning, an existing conditioned stimulus can be paired with a new stimulus to produce a new conditioned response. This is called higher-order conditioning. In operant conditioning, objects that have no intrinsic value can nevertheless become reinforcers because of their association with other, more basic reinforcers. These learned reinforcers are called secondary reinforcers.

Higher-Order Conditioning

Pavlov demonstrated higher-order conditioning with his dogs. After the dogs had learned to salivate when they heard a bell, Pavlov used the bell (without food) to teach the dogs to salivate at the sight of a black square. Instead of showing them the square and following it with food, he showed them the square and followed it with the bell until the dogs learned to salivate when they saw the square alone. In effect, the bell served as a substitute unconditioned stimulus, and the black square became a new conditioned stimulus. This procedure is known as higher-order conditioning not because it is more complex than other types of conditioning or because it incorporates any new principles, but simply because it is conditioning based on previous learning.

Higher-order conditioning is difficult to achieve because it is battling against extinction of the original conditioned response. The unconditioned stimulus no longer follows the original conditioned stimulus and that is precisely the way to extinguish a classically conditioned response. During higher-order conditioning, Pavlov's dogs were exposed to the square followed by the bell, but no food was given. Thus, the square became a signal that the bell would not precede food, and soon all salivation stopped. For higher-order conditioning to succeed, the unconditioned stimulus must be occasionally reintroduced. Food must be given once in a while after the bell sounds so that the dogs will continue to salivate when they hear the bell.

Secondary Reinforcers

Some reinforcers, such as food, water, and sex, are intrinsically rewarding in and of themselves. These are called primary reinforcers. No prior learning is required to make them reinforcing. Other reinforcers have no intrinsic value. They have acquired value only through association with primary reinforcers. These are the secondary reinforcers we mentioned earlier. They are called secondary not because they are less important, but because prior learning is needed before they will function as reinforcers.

For humans, money is one of the best examples of a secondary reinforcer. Although money is just paper or metal, through its exchange value for primary reinforcers, it becomes a powerful reinforcer. Children come to value money only after they learn that it will buy such things as candy (a primary reinforcer). Then the money becomes a secondary reinforcer. And through the principle of higher-order conditioning, stimuli paired with a secondary reinforcer can acquire reinforcing properties. Checks and credit cards, for example, are one step removed from money, but they also can be highly reinforcing.

Summing Up

Classical and operant conditioning both entail forming associations between stimuli and responses, and perceiving contingencies between one event and another. Both are subject to extinction and spontaneous recovery, as well as to stimulus control, generalization, and discrimination. The main difference between the two is that in classical conditioning, the learner is passive and the behavior involved is usually involuntary, whereas in operant conditioning, the learner is active and the behavior involved is usually voluntary.

COGNITIVE LEARNING

Some psychologists insist that because classical and operant conditioning can be observed and measured, they are the only legitimate kinds of learning to study scientifically. But others contend that mental activities are crucial to learning and so can't be ignored. How do you grasp the layout of a building from someone else's description of it? How do you enter into memory abstract concepts like conditioning and reinforcement? You do all these things and many others through cognitive learning - the mental processes that go on inside us when we learn. Cognitive learning is impossible to observe and measure directly, but it can be inferred from behavior, and so it is also a legitimate topic for scientific study.

Latent Learning and Cognitive Maps

Interest in cognitive learning began shortly after the earliest work in classical and operant conditioning. In the 1930s, Edward Chace Tolman, one of the pioneers in the study of cognitive learning, argued that we do not need to show our learning in order for learning to have occurred. Tolman called learning that isn't apparent because it is not yet demonstrated latent learning. Tolman studied latent learning in a famous experiment (Tolman & Honzik, 1930).

Since Tolman's time, much work has been done on the nature of latent learning regarding spatial layouts and relationships. From studies of how animals or humans find their way around a maze, a building, or a neighborhood with many available routes, psychologists have proposed that this kind of learning is stored in the form of a mental image, or cognitive map. When the proper time comes, the learner can call up the stored image and put it to use.

In response to Tolman's theory of latent learning, Thorndike proposed an experiment to test whether a rat could learn to run a maze and store a cognitive image of the maze without experiencing the maze firsthand. He envisioned researchers carrying each rat through the maze in a small wire-mesh container and then rewarding the rat at the end of each trial as if it had run the maze itself. He predicted that the rat would show little or no evidence of learning as compared with rats that had learned the same maze on their own through trial and error. Neither he nor Tolman ever conducted the experiment.

Two decades later, however, researchers at the University of Kansas did carry out Thorndike's idea (McNamara, Long, & Wike, 1956). But instead of taking the passive rats through the "correct" path, they carried them over the same path that a free-running rat had taken in that maze. Contrary to Thorndike's prediction, the passenger rats learned the maze just as well as the free-running rats. They did, however, need visual cues to learn the maze's layout. If carried through the maze only in the dark, they later showed little latent learning.

More recent research confirms this picture of cognitive spatial learning. Animals show a great deal more flexibility in solving problems than can be explained by simple conditioning (Collett & Graham, 2004; Zentall, 2013). In experiments using rats in a radial maze, rats are able to recall which arms of the maze contain food, even when scent cues are removed (Grandchamp & Schenk, 2006). Moreover, when the configuration of the maze is repeatedly changed, the rats not only quickly adapt but also remember previous maze configurations (J. Tremblay & Cohen, 2005). Studies such as these suggest that the rats develop a cognitive map of the maze's layout (Save & Poucet, 2005). Even in rats, learning involves more than just a new behavior "stamped in" through reinforcement. It also involves the formation of new mental images and constructs that may be reflected in future behavior.

Insight and Learning Sets

During World War I, the German Gestalt psychologist Wolfgang Köhler conducted a classic series of studies into another aspect of cognitive learning: sudden insight into a problem's solution. Outside a chimpanzee's cage, Köhler placed a banana on the ground, not quite within the animal's reach. When the chimp realized that it couldn't reach the banana, it reacted with frustration. But then it started looking at what was in the cage, including a stick left there by Köhler. Sometimes quite suddenly the chimp would grab the stick, poke it through the bars of the cage, and drag the banana within reach. The same kind of sudden insight occurred when the banana was hung from the roof of the cage, too high for the chimp to grasp. This time the cage contained some boxes, which the chimp quickly learned to stack up under the banana so that it could climb up to pull the fruit down. Subsequent studies have shown that even pigeons under certain conditions can display insight (Aust & Huber, 2006; Stephan & Bugnyar, 2013).

Previous learning can often be used to help solve problems through insight. This was demonstrated by Harry Harlow in a series of studies with rhesus monkeys (Harlow, 1949). Harlow presented each monkey with two boxes - say, a round green box on the left side of a tray and a square red box on the right side. A morsel of food was put under one of the boxes. The monkey was permitted to lift just one box; if it chose the correct box, it got the food. On the next trial the food was put under the same box (which had been moved to a new position), and the monkey again got to choose just one box. Each monkey had six trials to figure out that the same box covered the food no matter where that box was located. Then the monkeys were given a new set of choices - say, between a blue triangular box and an orange oval one - and another six trials, and so on with other shapes and colors of boxes. The solution was always the same: The food was invariably under only one of the boxes. Initially, the monkeys chose boxes randomly, sometimes finding the food, sometimes not. After a while, however, their behavior changed: In just one or two trials, they would find the correct box, which they chose consistently thereafter until the experimenter changed the boxes. They seemed to have learned the underlying principle - that the food would always be under the same box - and they used that learning to solve almost instantly each new set of choices given.

Harlow concluded that the monkeys "learned how to learn"; that is, they had established a learning set regarding this problem: Within the limited range of choices available to them, they had discovered how to tell which box would give the reward. Similarly, Köhler's chimps could be said to have established a learning set regarding how to get the food that was just out of reach. When presented with a new version of the problem, they simply called upon past learning in a slightly different situation (reaching a banana on the ground versus reaching one hanging from the ceiling). In both Harlow's and Köhler's studies, the animals seemed to have learned more than just specific behaviors. They had apparently learned how to learn. More recent studies confirm that learning sets can be formed by other species of primates, such as capuchin and rhesus monkeys (Beran, 2008), and even by rats (Bailey, 2006).

Learning by Observing

The first time you drove a car you successfully turned the key in the ignition, put the car in gear, and pressed the gas pedal without having ever done any of those things before. How were you able to do that without step-by-step shaping of the correct behaviors? The answer is that you had often watched other people driving, a practice that made all the difference. There are countless things we learn by watching other people and listening to what they say. This process is called observational or vicarious learning, because although we are learning, we don't have to do the learned behaviors firsthand; we merely view or hear the modeled behavior. Observational learning is a form of "social learning," in that it involves interaction with other people. Psychologists who study it are known as social learning theorists.

Observational learning is very common. In fact, recent evidence shows that young children often "over-imitate" - slavishly following what they are shown to do, even when that is not the most effective way to behave (Horner & Whiten, 2005; Zimmer, 2005). By watching other people who model new behavior, we can learn such things as how to start a lawn mower and how to saw wood. Research has shown that we can even learn bad habits, such as smoking, by watching actors smoke in a movie (Dal Cin, Gibson, Zanna, Shumate, & Fong, 2007; Heatherton & Sargent, 2009). When the Federal Communications Commission (FCC) banned cigarette commercials on television, it was acting on the belief that providing models of smokers would prompt people to imitate smoking. It is hard for deaf children to learn spoken language because they have no auditory model of correct speech.

Of course, we do not imitate everything that other people do. Why are we selective in our imitation? There are several reasons (Bandura, 1977, 1986). First, we can't pay attention to everything going on around us. The behaviors we are most likely to imitate are those that are modeled by someone who commands our attention (as does a famous or attractive person, or an expert). Second, we must remember what a model does in order to imitate it. If a behavior isn't memorable, it won't be learned. Third, we must make an effort to convert what we see into action. If we have no motivation to perform an observed behavior, we probably won't show what we've learned. This is a distinction between learning and performance, which is crucial to social learning theorists: We can learn without any change in overt behavior that demonstrates our learning. Whether or not we act depends on our motivation.

One important motivation for acting is the kind of consequences associated with an observed behavior - that is, the rewards or punishments it appears to bring. These consequences do not necessarily have to happen to the observer. They may happen simply to other people whom the observer is watching. This is called vicarious reinforcement or vicarious punishment, because the consequences aren't experienced firsthand by the learner: They are experienced through other people. If a young teenager sees adults drinking and they seem to be having a great deal of fun, the teenager is experiencing vicarious reinforcement of drinking and is much more likely to imitate it.

The foremost proponent of social learning theory is Albert Bandura, who refers to his perspective as a social cognitive theory (Bandura, 1986, 2004). In a classic experiment, Bandura (1965) showed that people can learn a behavior without being reinforced directly for it and that learning a behavior and performing it are not the same thing. Three groups of nursery schoolchildren watched a film in which an adult model walked up to an adult-size plastic inflated doll and ordered it to move out of the way. When the doll failed to obey, the model became aggressive, pushing the doll on its side, punching it in the nose, hitting it with a rubber mallet, kicking it around the room, and throwing rubber balls at it. However, each group of children saw a film with a different ending. Those in the model-rewarded condition saw the model showered with candies, soft drinks, and praise by a second adult (vicarious reinforcement). Those in the model-punished condition saw the second adult shaking a finger at the model, scolding and spanking him (vicarious punishment). And those in the no-consequence condition saw nothing happen to the model as a result of his aggressive behavior.

Immediately after seeing the film, the children were individually escorted into another room where they found the same large inflated doll, rubber balls, and mallet, as well as many other toys. Each child played alone for 10 minutes, while observers behind a one-way mirror recorded the number of imitated aggressive behaviors that the child spontaneously performed in the absence of any direct reinforcement for those actions. After 10 minutes, an experimenter entered the room and offered the child treats in return for imitating things the model had done. This was a measure of how much the child had previously learned from watching the model, but perhaps hadn't yet displayed.

All the children had learned aggressive actions from watching the model, even though they were not overtly reinforced for that learning. When later offered treats to copy the model's actions, they all did so quite accurately. The children tended to suppress their inclination to spontaneously imitate an aggressive model when they had seen that model punished for aggression. This result was especially true of girls. Apparently, vicarious punishment provided the children with information about what might happen to them if they copied the "bad" behavior. Vicarious reinforcement similarly provides information about likely consequences, but in this study, its effects were not large. For children this age (at least those not worried about punishment), imitating aggressive behavior toward a doll seems to have been considered "fun" in its own right, even without being associated with praise and candy. This outcome was especially true for boys.

This study has important implications regarding how not to teach aggression unintentionally to children. Suppose that you want to get a child to stop hitting other children. You might think that slapping the child as punishment would change the behavior, and it probably would suppress it to some extent. But slapping the child also demonstrates that hitting is an effective means of getting one's way. So slapping not only provides a model of aggression; it also provides a model associated with vicarious reinforcement. Perhaps this is why children who experience corporal punishment are more likely to imitate the violent behavior of their parents when they become adults (Barry, 2007). You and the child would both be better off if the punishment given for hitting was not a similar form of aggression and if the child could also be rewarded for showing appropriate interactions with others (Gershoff & Bitensky, 2007).

Social learning theory's emphasis on expectations, insights, and information broadens our understanding of how people learn. According to social learning theory, humans use their powers of observation and thought to interpret their own experiences and those of others when deciding how to act. Moreover, human beings are capable of setting performance standards for themselves and then rewarding (or punishing) themselves for achieving or failing to achieve those standards as a way to regulate their own behavior. This important perspective can be applied to the learning of many different things, from skills and behavioral tendencies to attitudes, values, and ideas.

Cognitive Learning in Nonhumans

We have seen that classical and operant conditioning are no longer viewed as purely mechanical processes that can proceed without at least some cognitive activity. Moreover, animals are capable of latent learning, learning cognitive maps, and insight, all of which involve cognitive processes. Do nonhuman animals also exhibit other evidence of cognitive learning? The answer seems to be a qualified yes.

For example, in the wild, chimpanzees learn to use long sticks to fish for termites by watching their mothers (Lonsdorf, 2005; O'Malley, Wallauer, Murray, & Goodall, 2012). Capuchin monkeys have shown they can benefit from watching the mistakes of other monkeys that made unsuccessful attempts at opening a container (Kuroshima, Kuwahata, & Fujita, 2008). Some female dolphins in Australia cover their sensitive beaks with sponges when foraging for food on the sea floor, a skill they apparently learn by imitating their mothers (Krützen et al., 2005). Meerkats have been observed teaching their young how to hunt and handle difficult prey (A. Thornton, 2008). And even rats that watch other rats try a novel or unfamiliar food without negative consequences show an increased tendency to eat the new food (Galef & Whiskin, 2004; Galef, Dudley, & Whiskin, 2008). These results, along with reports that animals as diverse as chickens and octopi, whales and bumblebees learn by watching others, further support the notion that nonhuman animals do indeed learn in ways that support the cognitive theory of learning.