Prisoners' dilemma and Nash equilibrium

The "prisoner's dilemma" is a concept that describes a situation in which two people have competing incentives that lead them to choose a suboptimal outcome. In the classic example, two prisoners can each choose to confess or not to a crime, and their decisions will determine the length of their sentences. The best outcome for both is to stay silent, but the possibility that the other will confess leads them both to incriminate themselves. Created by Sal Khan.

Video transcript

On the same day, police have made two arrests that at first seem unrelated. They arrest a gentleman named Al. And they catch him red handed selling drugs. So it's an open and shut case. And the same day, they catch a gentleman named Bill. And he is also caught red handed, selling drugs. And they bring them separately to the police station. And they tell them, look, this is an open and shut case. You're going to get convicted for drug dealing and you're going to get two years. And they tell this to each of them individually. They were selling the same type of drugs; it just happened to be that way. But they were doing it completely independently. Two years for drugs is what's going to happen, assuming nothing else.

But then the district attorney has a chance to chat with each of these gentlemen separately, and while he's chatting with them he reinforces the idea that this is an open and shut case for the drug dealing. They're each going to get two years, if nothing else happens. But then he starts to have a suspicion, for whatever reason, that these were the two characters that actually committed a much more serious offense. That they had committed a major armed robbery a few weeks ago. And all the district attorney has to go on is his hunch, his suspicion. He has no hard evidence. So what he wants to do is try to get a deal with each of these guys so that they have an incentive to essentially snitch on each other.

So what he tells each of them is, look, you're going to get two years for drug dealing. That's kind of guaranteed. But he says, look, if you confess and the other doesn't, then you will get 1 year. And the other guy will get 10 years. So he's telling Al, look, we caught Bill, too, just randomly today. If you confess that it was you and Bill who performed that armed robbery, your term is actually going to go down from two years to one year. But Bill is obviously going to have to spend a lot more time in jail.
Especially because he is not cooperating with us. He is not confessing. But then, the other statement is also true. If you deny and the other confesses, now it switches around. You will get 10 years, because you're not cooperating. And the other, your co-conspirator, will get a reduced sentence, will get the one year. So this is like telling Al, look, if you deny that you were the armed robber and Bill snitches you out, then you're going to get 10 years in prison. And Bill's only going to get one year in prison. And if both of you essentially confess, you will both get three years.

So this scenario is called the prisoner's dilemma. Because we'll see in a second there is a globally optimal scenario for them where they both deny and they both get two years. But we'll see, based on their incentives, assuming they don't have any unusual loyalty to each other (and these are hardened criminals here; they're not brothers or related to each other in any way, and they don't have any kind of loyalty pact), that they will rationally pick, or they might rationally pick, a non-optimal scenario.

And to understand that, I'm going to draw something called a payoff matrix. So let me do it right here for Bill. So Bill has two options. He can confess to the armed robbery, or he can deny that he knows anything about the armed robbery. And Al has the same two options. Al can confess and Al can deny. And since it's called a payoff matrix, let me draw some grids here, and let's think about all of the different scenarios and what the payoffs would be. If Al confesses and Bill confesses, then we're in scenario four. They both get three years in jail. So that's three for Al and three for Bill. Now, if Al confesses and Bill denies, then we are in scenario two from Al's point of view. Al is only going to get one year. But Bill is going to get 10 years.
Now, if the opposite thing happens, if Bill confesses and Al denies, then it goes the other way around. Al's going to get 10 years for not cooperating. And Bill's going to have a reduced sentence of one year for cooperating. And then if they both deny, they're in scenario one, where they're both just going to get their time for the drug dealing. So Al will get two years, and Bill will get two years.

Now, I alluded to this earlier in the video. What is the globally optimal scenario for them? Well, it's this scenario, where they both deny having anything to do with the armed robbery. Then they both get two years. But what we'll see is that it's actually somewhat rational, assuming that they don't have any strong loyalties to each other, or a strong level of trust with the other party, to not go there. And it's actually rational for both of them to confess. And both of them confessing is actually a Nash equilibrium. And we'll talk more about this, but a Nash equilibrium is where each party has picked its choice given the choice of the other party. Put another way, each party has picked the optimal choice for itself, given the choice the other party picks.

And so from Al's point of view, he says, well, look, I don't know whether Bill is confessing or denying. So let's say he confesses. What's better for me to do? If he confesses, and I confess, then I get three years. If he confesses and I deny, I get 10 years. So if he confesses, it's better for me to confess as well. So this is a preferable scenario to this one down here. Now, I don't know that Bill confessed. He might deny. If I assume Bill denied, is it better for me to confess and get one year or deny and get two years? Well, once again, it's better for me to confess. And so regardless of whether Bill confesses or denies, the optimal choice for Al to pick, taking into account Bill's choices, is to confess. If Bill confesses, Al is better off confessing. And if Bill denies, Al is better off confessing.
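Al's reasoning can be written out as a small sketch in Python. This is only an illustration of the payoff matrix described above: the names `YEARS`, `CHOICES`, and `al_best_response` are made up for this example, and the numbers are the prison terms from the video (fewer years is better).

```python
# Years in prison as (Al's years, Bill's years),
# indexed by (Al's choice, Bill's choice). Values come from the video.
YEARS = {
    ("confess", "confess"): (3, 3),
    ("confess", "deny"): (1, 10),
    ("deny", "confess"): (10, 1),
    ("deny", "deny"): (2, 2),
}
CHOICES = ("confess", "deny")


def al_best_response(bill_choice):
    """Return the choice that gives Al the fewest years, holding Bill's choice fixed."""
    return min(CHOICES, key=lambda al_choice: YEARS[(al_choice, bill_choice)][0])


# Walk through both things Bill might do, just as Al does in his head.
for bill_choice in CHOICES:
    best = al_best_response(bill_choice)
    print(f"If Bill chooses {bill_choice}, Al does best to {best}")
```

In both cases the answer comes out as confess, which is why confessing is the better choice for Al no matter what Bill does.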
Now, we look at it from Bill's point of view. And it's completely symmetric. Bill says, well, I don't know if Al is confessing or denying. If Al confesses, I can confess and get three years or I can deny and get 10 years. Well, three years in prison is better than 10. So I would go for the three years if I know Al is confessing. But I don't know that Al is definitely confessing. He might deny. If Al is denying, I could confess and get one year or I could deny and get two years. Well, once again, I would want to confess and get the one year. So for Bill, taking into account each of the choices that Al might make, it's always better for him to confess.

And so this is interesting. They are rationally deducing that they should get to this scenario, this Nash equilibrium state, as opposed to this globally optimal state. They're both getting three years by both confessing, as opposed to both of them getting two years by both denying. The problem with this one is that it is an unstable state. If one of them assumes that they're somehow in that state temporarily, they say, well, I can always improve my scenario by changing what I want to do. If Al thought that Bill was definitely denying, Al could improve his circumstance by moving out of that state and confessing and only getting one year. Likewise, if Bill thought that maybe Al is likely to deny, he realizes that he can optimize by moving in this direction. Instead of denying and getting two and two, he could move in that direction right over there. So this is an unstable optimal scenario.

But this Nash equilibrium, this state right over here, is actually very, very, very stable. It's better for each of them to confess regardless of what the other one does. And assuming all of the other actors have chosen their strategy, there's no incentive for Bill to change. So assuming everyone else has already chosen their strategy, you can only move in that direction.
If you're Bill, you can go from the Nash equilibrium of confessing to denying, but you're worse off. So you won't want to do that. Or you could move in this direction, which would be Al changing his decision. But once again, that gives a worse outcome for Al. You're going from three years to 10 years. So this is the equilibrium state, the stable state, where both people pick something that is not optimal globally.
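The stability argument can also be checked mechanically. The following is a minimal sketch, not anything from the video itself: `is_nash_equilibrium` is a hypothetical helper that asks, for each cell of the payoff matrix, whether either prisoner could cut his own sentence by changing only his own choice. "Globally optimal" is taken here to mean the fewest total years, which is an assumption about how to read the phrase.

```python
from itertools import product

# Years in prison as (Al's years, Bill's years),
# indexed by (Al's choice, Bill's choice). Values come from the video.
YEARS = {
    ("confess", "confess"): (3, 3),
    ("confess", "deny"): (1, 10),
    ("deny", "confess"): (10, 1),
    ("deny", "deny"): (2, 2),
}
CHOICES = ("confess", "deny")


def is_nash_equilibrium(al_choice, bill_choice):
    """True if neither prisoner can get fewer years by switching on his own."""
    al_years, bill_years = YEARS[(al_choice, bill_choice)]
    # Al deviates while Bill's choice stays fixed.
    al_can_improve = any(
        YEARS[(other, bill_choice)][0] < al_years for other in CHOICES
    )
    # Bill deviates while Al's choice stays fixed.
    bill_can_improve = any(
        YEARS[(al_choice, other)][1] < bill_years for other in CHOICES
    )
    return not (al_can_improve or bill_can_improve)


equilibria = [cell for cell in product(CHOICES, repeat=2) if is_nash_equilibrium(*cell)]
fewest_total_years = min(YEARS, key=lambda cell: sum(YEARS[cell]))

print("Nash equilibria:", equilibria)
print("Fewest total years:", fewest_total_years)
```

Under these numbers the only stable cell is both confessing, while the cell with the fewest total years is both denying, which is exactly the tension the video describes.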