## Nash equilibrium

# Prisoners' dilemma and Nash equilibrium

## Video transcript

On the same day,
police have made two at first unrelated arrests. They arrest a
gentleman named Al. And they caught him red
handed selling drugs. So it's an open and shut case. And the same day, they catch
a gentleman named Bill. And he is also caught red
handed, selling drugs. And they bring them separately
to the police station. And they tell them, look,
this is an open and shut case. You're going to get
convicted for drug dealing and you're going
to get two years. And they tell this to
each of them individually. They were selling the
same type of drugs, just happened to be that. But they were doing it
completely independently. Two years for drugs
is what's going to happen assuming nothing else. But then the district
attorney has a chance to chat with each of
these gentlemen separately, and while he's
chatting with them he reinforces the idea this
is an open and shut case for the drug dealing. They're each going to get two
years, if nothing else happens. But then he starts to realize
that these two characters look like-- he starts to have a
suspicion, for whatever reason, that these were the two
characters that actually committed a much
more serious offense. That they had committed a major
armed robbery a few weeks ago. And all the district
attorney has to go on is his hunch, his suspicion. He has no hard evidence. So what he wants to do is
try to get a deal with each of these guys so that
they have an incentive to essentially
snitch on each other. So what he tells each
of them is, look, you're going to get two
years for drug dealing. That's kind of guaranteed. But he says, look,
if you confess and the other doesn't
then you will get one year. And the other guy
will get 10 years. So he's telling Al, look,
we caught Bill, too, just randomly today. If you confess that
it was you and Bill who performed that
armed robbery your term is actually going to go down
from two years to one year. But Bill is obviously
going to have to spend a lot
more time in jail. Especially because he is
not cooperating with us. He is not confessing. But then, the other
statement is also true. If you deny and the
other confesses now it switches around. You will get 10 years, because
you're not cooperating. And the other, your
co-conspirator, will get a reduced sentence--
will get the one year. So this is like
telling Al, look, if you deny that you
were the armed robber and Bill snitches
you out, then you're going to get 10 years in prison. And Bill's only going to
get one year in prison. And if both of you
essentially confess, you will both get three years. So this scenario is called
the prisoner's dilemma. Because we'll see
in a second there is a globally optimal scenario
for them where they both deny and they both get two years. But we'll see, based
on their incentives, assuming they don't have any
unusual loyalty to each other-- and these are hardened
criminals here. They're not brothers or related
to each other in any way. They don't have any
kind of loyalty pact. We'll see that they
will rationally pick, or they might rationally
pick, a non-optimal scenario. And to understand that I'm
going to draw something called a payoff matrix. So let me do it
right here for Bill. So Bill has two options. He can confess to
the armed robbery or he can deny that
he had anything-- that he knows anything
about the armed robbery. And Al has the same two options. Al can confess and Al can deny. And since it's called
a payoff matrix, let me draw some grids here. Let me draw some grids
and let's think about all of the different scenarios
and what the payoffs would be. If Al confesses and
Bill confesses then we're in scenario four. They both get three
years in jail. So they both will get three
for Al and three for Bill. Now, if Al confesses
and Bill denies, then we are in scenario two
from Al's point of view. Al is only going
to get one year. But Bill is going
to get 10 years. Now, if the opposite
thing happens, if Bill confesses and
Al denies, then it goes the other way around. Al's going to get 10
years for not cooperating. And Bill's going
to have a reduced sentence of one year
for cooperating. And then if they both deny,
they're in scenario one, where they're both just going
to get their time for the drug dealing. So Al will get two years,
and Bill will get two years. Now, I alluded to this
earlier in the video. What is the globally
optimal scenario for them? Well, it's this scenario,
where they both deny having anything to do
with the armed robbery. Then they both get two years. But what we'll see is
that it's actually somewhat rational, assuming that they don't
have any strong loyalties to each other, or a strong level
of trust in the other party, not to go there. And it's actually rational
for both of them to confess. And both confessing is
actually a Nash equilibrium. And we'll talk more about
this, but a Nash equilibrium is where each party has
picked the optimal choice given the choice of the other party.
And here, confessing is the optimal
choice given whatever choice the other party picks. And so from Al's point of
view, he says, well, look, I don't know whether Bill
is confessing or denying. So let's say he confesses. What's better for me to do? If he confesses, and I confess,
then I get three years. If he confesses and I
deny I get 10 years. So if he confesses, it's better
for me to confess as well. So this is a preferable
scenario to this one down here. Now, I don't know
that Bill confessed. He might deny. If I assume Bill
denied, is it better for me to confess and get one
year or deny and get two years? Well, once again, it's
better for me to confess. And so regardless of whether
Bill confesses or denies, the
optimal choice for Al to pick, taking into account Bill's
choices, is to confess. If Bill confesses, Al is
better off confessing. And if Bill denies, Al
is better off confessing. Now, we look at it from
Bill's point of view. And it's completely symmetric. Bill says, well, I don't know
if Al is confessing or denying. If Al confesses, I can
confess and get three years or I can deny and get 10 years. Well, three years in
prison is better than 10. So I will go-- I would
go for the three years if I know Al is confessing. But I don't know that Al
is definitely confessing. He might deny. If Al is denying, I could
confess and get one year or I could deny
and get two years. Well, once again, I
would want to confess and get the one year. So for Bill, taking into account
each of the choices that Al might make, it's always
better to confess. And so this is interesting. They are rationally
deducing that they should get to this scenario,
this Nash equilibrium state, as opposed to this
globally optimal state. They're both getting three years
by both confessing as opposed to both of them getting
two years by both denying. The problem with this one is
that this is an unstable state. If one of them
assumes that they're somehow
in that state temporarily, they say, well, I can
always improve my situation by changing what I do. If Al thought that Bill
was definitely denying, Al could improve
his circumstance by moving out of that state
and confessing and only getting one year. Likewise, if Bill thought that
maybe Al is likely to deny, he realizes that he can optimize
by moving in this direction. Instead of denying
and getting two and two, he could move in that
direction right over there. So this is an unstable
optimal scenario. But this Nash equilibrium,
this state right over here, is actually very,
very, very stable. It's
better for each of them to confess regardless of
what the other one does. And assuming all
of the other actors have chosen their strategy,
there's no incentive for Bill to change his. So assuming everyone else
has chosen their strategy, you can only move
in that direction. If you're Bill, you can go
from the Nash equilibrium of confessing to denying,
but you're worse off. So you won't want to do that. Or you could move in
this direction, which would be Al changing
his decision. But once again, that gives
a worse outcome for Al. You're going from three
years to 10 years. So this is the equilibrium
state, the stable state, where both people pick
something that is not optimal globally.
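
The reasoning in the transcript can be checked mechanically. What follows is a minimal sketch, not something shown in the video: it writes the payoff matrix as a Python dictionary of prison years and tests every cell for the Nash equilibrium condition. The names `YEARS`, `best_response`, `is_nash_equilibrium`, `nash_equilibria`, and `lowest_total_years` are hypothetical, chosen only for this illustration, and the sketch assumes each player simply wants to minimize his own years in prison.

```python
# Payoff matrix from the transcript, as years in prison (fewer is better).
# Keys are (Al's choice, Bill's choice); values are (Al's years, Bill's years).
CONFESS, DENY = "confess", "deny"
CHOICES = (CONFESS, DENY)

YEARS = {
    (CONFESS, CONFESS): (3, 3),   # both confess
    (CONFESS, DENY): (1, 10),     # Al confesses, Bill denies
    (DENY, CONFESS): (10, 1),     # Al denies, Bill confesses
    (DENY, DENY): (2, 2),         # both deny: drug dealing sentence only
}

AL, BILL = 0, 1  # index of each player inside a payoff tuple


def best_response(player, other_choice):
    """Choice that minimizes `player`'s own years, holding the other's choice fixed."""
    def own_years(own_choice):
        if player == AL:
            profile = (own_choice, other_choice)
        else:
            profile = (other_choice, own_choice)
        return YEARS[profile][player]
    return min(CHOICES, key=own_years)


def is_nash_equilibrium(al_choice, bill_choice):
    """True if neither player can do better by changing only his own choice."""
    return (best_response(AL, bill_choice) == al_choice
            and best_response(BILL, al_choice) == bill_choice)


def nash_equilibria():
    """All cells of the matrix that satisfy the Nash equilibrium condition."""
    return [(a, b) for a in CHOICES for b in CHOICES if is_nash_equilibrium(a, b)]


def lowest_total_years():
    """Cell with the fewest combined years: the 'globally optimal' scenario."""
    return min(YEARS, key=lambda profile: sum(YEARS[profile]))


for other in CHOICES:
    print(f"If Bill chooses {other}, Al's best response is {best_response(AL, other)}")
print("Nash equilibria:", nash_equilibria())
print("Lowest combined years:", lowest_total_years())
```

Under these assumptions, confessing comes out as the best response for both players whatever the other does, so both confessing is the only cell that passes the equilibrium check, while both denying is the cell with the fewest combined years.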