An Introduction to Operant Conditioning (Instrumental Conditioning)
Operant conditioning, also known as instrumental conditioning, describes how behaviour is shaped by its interaction with environmental stimuli. The term ‘operant’ stems from the idea that the individual learns by responding to, or operating on, the environment. The basic premise of instrumental conditioning is that when a particular action results in a positive outcome, it is likely to be repeated. Conversely, when a behaviour results in a negative outcome, it is less likely to be repeated.
Thorndike… Cats, Puzzles and the Law of Effect
Operant conditioning was discovered by Edward L. Thorndike (1874–1949), who placed a hungry cat inside a ‘puzzle box’. The cat could escape and eat some food once it opened the door by operating a latch. When the cat was first introduced to the puzzle box, its behaviour was erratic and random. Eventually, however, the cat would accidentally activate the latch and free itself. Over successive trials, the cat became more and more efficient at triggering the latch in order to escape the box and eat its dinner. Thorndike called this process ‘learning by trial and accidental success’.
In subsequent writings on the matter, Thorndike proposed the Law of Effect – a relation between a behaviour and its consequences. This is one of Thorndike’s most influential contributions to the science of psychology and has acted as a catalyst for a vast literature of research over the last century. One of the main proponents of the Law of Effect was the eminent behaviourist B. F. Skinner (1904–1990).
It was, indeed, Skinner who coined the term Operant Conditioning, whilst at Harvard University. Where Thorndike worked with his cat, Skinner worked with rats.
B. F. Skinner’s Operant Chamber
To study operant conditioning in a laboratory setting, B. F. Skinner invented the operant chamber (or Skinner box). The chamber used for rats was a box fitted with a speaker, lights, a lever and a food chute. An alternative chamber used for pigeons featured a plastic disk for the pigeon to peck at, instead of the lever the rats were required to pull.
Pulling the lever activates the food chute and delivers a pellet of food to the rat inside the box. For Skinner’s studies, the rats were fed just once a day, in advance of the study, to ensure they were hungry and therefore motivated to seek food.
Skinner proposed a three-part instrumental learning process, which he called the three-term contingency. The three stages are:
- Discriminative Stimulus – the preceding event that sets the occasion for responding (e.g. the telephone rings)
- Response – the discriminative stimulus demands a response (e.g. you pick up the phone)
- Favourable Consequences – the response to the discriminative stimulus results in a favourable outcome, which strengthens the relation between the stimulus and the response (e.g. you enjoy talking to your friend on the phone)
Shaping Behaviour and Operant Conditioning
As Thorndike initially observed, before a behaviour has been associated with a response, animals (and indeed humans) act randomly until accidentally discovering an appropriate behaviour for the desired outcome. Skinner developed a process for shaping a rat’s behaviour – essentially luring it towards the desired behaviour. The end goal is for the rat to learn that pressing the lever will trigger the delivery of food. In order to shape this behaviour, Skinner first provided food whenever the rat looked in the direction of the lever. Then food was delivered when the rat made a move towards the lever. Following this stage, the rat had to touch the lever for food to be delivered. And finally the rat would learn to pull the lever in order to receive the food. This is known as shaping behaviour and can be observed in human behaviour as well.
Consider the example of a primary school classroom. A four-year-old child might be praised for a fairly crude drawing of a person (a stick figure with a smiling face). By the time the child is six, the drawing must be of a higher standard in order to receive the same outcome (praise). And at the age of eleven the child’s drawing must be more competent still. In art education, this continues right the way through to professional standards. Step by step, we guide the behaviour we desire by providing reinforcement for attempts at the behaviour along the development path.
You don’t always get the outcome you expect. For example, you don’t catch a fish every time you go fishing, but sometimes you do, and that is enough to keep you trying. This suggests that instrumental conditioning is sensitive to the probability of a reward, not just its certainty. Reinforcement that follows only some responses is known as intermittent reinforcement, and it can be sub-divided into four types of schedule:
- Fixed-ratio schedule – the reward is delivered after a fixed number of responses; for example, every tenth time the behaviour is performed.
- Variable-ratio schedule – the reward is delivered after a variable number of responses, averaging ten; sometimes it will take 5 attempts, other times 15.
- Fixed-interval schedule – the reward is delivered for the first response after a fixed interval, for example ten minutes, however many times the behaviour is performed in between.
- Variable-interval schedule – the reward is delivered for the first response after a variable interval, averaging ten minutes; sometimes the wait will be longer, sometimes shorter.
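The four schedules above can be sketched as a single decision rule: given how many responses and how much time have passed since the last reward, does the current response earn one? The Python sketch below is purely illustrative – the function name, parameters and default values (a ratio of 10, an interval of 600 seconds) are invented for this example, and the variable-interval branch is simplified by redrawing the required wait on every check rather than once per reward cycle.

```python
import random

def should_reinforce(schedule, responses_since_reward, seconds_since_reward,
                     ratio=10, interval=600, rng=random):
    """Decide whether the current response earns a reward under a given
    intermittent-reinforcement schedule (illustrative sketch)."""
    if schedule == "fixed-ratio":
        # Reward exactly every `ratio` responses.
        return responses_since_reward >= ratio
    if schedule == "variable-ratio":
        # Reward each response with probability 1/ratio, so rewards
        # arrive every `ratio` responses on average.
        return rng.random() < 1 / ratio
    if schedule == "fixed-interval":
        # Reward the first response after `interval` seconds have elapsed,
        # no matter how many responses were made in between.
        return seconds_since_reward >= interval
    if schedule == "variable-interval":
        # Simplification: draw a wait that averages `interval` seconds.
        # A real schedule would fix the wait once per reward cycle.
        return seconds_since_reward >= rng.uniform(0, 2 * interval)
    raise ValueError(f"unknown schedule: {schedule}")
```

Under this sketch, a fixed-ratio rat is rewarded on its tenth lever press regardless of timing, while a fixed-interval rat can press as often as it likes but is only rewarded for the first press after the ten-minute mark.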
Reinforcement, Punishment and Extinction
Positive reinforcement is the provision of a positive outcome as a result of a behaviour. For example, Skinner’s rats received positive reinforcement because they received food when they pressed the lever. For humans, food is also an effective positive reinforcer (it led to the theory of cupboard love in developmental psychology). Money and other social rewards are also effective positive reinforcers for humans.
Negative reinforcement refers to the removal of a negative stimulus. For example, if a loud noise is being played (causing distress) and pressing a lever can mute the noise for 60 seconds, the behaviour of pressing the lever will be reinforced due to the removal of a negative stimulus (the noise).
It’s important not to confuse negative reinforcement with punishment. Negative reinforcement removes something negative, whereas punishment adds something negative. For example, if a parent wants to teach their child not to touch a hot stove, they might shout at the child when they reach for it. The parent is therefore adding an unpleasant consequence – a punishment – in order to discourage the behaviour.
Extinction is a decrease in the frequency of a previously reinforced behaviour because reinforcement has ceased to follow it. As a behaviour that is no longer reinforced decreases in frequency, it is said to extinguish; once the person or animal stops performing the behaviour altogether, it has become extinct. Extinction is different from forgetting: forgetting results from a lack of rehearsal, whereas extinction is due to a lack of reinforcement. A person might stop answering the phone if, every time they pick it up, the person at the other end hangs up – the lack of positive reinforcement (an enjoyable conversation with a friend) leads to the extinction of the behaviour (answering the phone when it rings).