Last week I wrote about the prisoner’s dilemma, and a centralized, Hobbesian solution to that—essentially, to get people to cooperate you have to bring in an outside authority, like a monarch. This is the decentralized solution.
The decentralized solution to the prisoner’s dilemma has three elements:
- The game is repeated an unknown number of times
- The strategy is reciprocity—if A cooperates, B does too. If A defects, B does too.
- The shadow of the future is sufficiently long.
That “unknown” bit is important. If people know the game’s gonna end, and they know when, there’s no reason to develop trust with the other person. “Unknown” can mean infinite iterations, or just a percentage chance each time that the game will be replayed.
So if you’re going to play forever (or potentially forever), two basic strategies are always defecting (“All-D”) and always cooperating (“All-C”). These strategies are useful as reference points, but they aren’t actually practical, because they aren’t reciprocal strategies—they’re not based on what the other person is doing. And in a normal-form game, what the other person does, combined with what you do, determines your payoff.
One reciprocal strategy is “Tit-for-Tat”—whatever the player did last turn, do that this turn.
We’re going to focus on the “Grim Trigger” strategy. God, that sounds badass. With Grim Trigger, you start out by always cooperating, but if the other player ever defects, you switch to All-D.
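As a rough sketch (the function names and the history-based interface are my own framing, not anything official), those two strategies might look like this:

```python
# Each strategy looks at the opponent's full history of moves ("C" or "D")
# and returns this turn's move.

def tit_for_tat(opponent_history):
    """Cooperate on the first turn; after that, copy the opponent's last move."""
    if not opponent_history:
        return "C"
    return opponent_history[-1]

def grim_trigger(opponent_history):
    """Cooperate until the opponent defects even once; then defect forever."""
    return "D" if "D" in opponent_history else "C"
```

Note the difference in memory: Tit-for-Tat only cares about the last move, so it can forgive; Grim Trigger remembers a single defection forever.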
So if you know the other player is playing Grim Trigger, whether or not you should screw them over depends on your payoffs, and your cooperation threshold, which is calculated like so:
Cooperation threshold = (Temptation – Reward)/(Temptation – Penalty)
CT = (T-R)/(T-P)
Okay, don’t panic. I’ll try to explain.
You can think of the cooperation threshold as a measure of the willpower required to cooperate. If the CT is high, it takes a lot of willpower to cooperate instead of defecting.

So what the equation is saying is: the greater the difference between the Temptation outcome (achieved by screwing the other guy over) and the Reward outcome (achieved when both players cooperate), the higher the CT is—the more willpower it’ll take to cooperate. And the greater the difference between the Temptation outcome and the Penalty outcome (achieved when both players defect, which is the risk you run if you try to screw someone over), the lower the cooperation threshold.

And that makes sense—if the temptation is way higher than the reward for cooperation, then of course it’s hard not to pull one over on the other person. But if the payoff when both players defect is incredibly low (mutually assured destruction, say), then it doesn’t take much willpower for people to decide that cooperation is best.
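To make that concrete, here’s a tiny sketch of the calculation (the function name is mine; the first set of payoffs is the one this post uses later, and the second is my own exaggerated stand-in for a mutually-assured-destruction-sized penalty):

```python
def cooperation_threshold(T, R, P):
    """CT = (T - R) / (T - P): the 'willpower' needed to cooperate."""
    return (T - R) / (T - P)

# Ordinary payoffs (used later in this post): T=4, R=3, P=2.
print(cooperation_threshold(4, 3, 2))     # (4-3)/(4-2) = 0.5

# Make mutual defection catastrophic (P = -100) and the threshold collapses:
print(cooperation_threshold(4, 3, -100))  # about 0.0096 -- cooperation is easy
```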
So how do we apply this? We can use it to figure out which strategy is best for a repeated game with no known end. The question we’re asking here is: if you know the other player will play GT, would you do strictly better by playing GT or by playing some other strategy? Well, in an infinite game there are infinitely many strategies, so how do we possibly compare them all?
“Calm yourselves, students!”
There are really only two kinds of strategy if you’re playing against someone using Grim Trigger.
- Don’t defect first—in this case, both players will get the reward forever.
- Defect first—and in that case, you may as well defect at the very beginning, since waiting only pushes the Temptation payoff further into the future. And once you’ve defected, Grim Trigger will punish you forever, so you should defect forever after.
So against Grim Trigger, the two strategies are All-C and All-D.
And how do you compare these two strategies? In the canonical prisoner’s dilemma, your payoffs for All-C are 3 + 3 + 3 … forever. For All-D they’re 4 + 2 + 2 … forever. Add ‘em up, they both equal infinity—and as Professor Dion would say, “Do not F around with infinity!” Really, he wrote that on the board. So which is better? All-D is better for the first round of course, but it’s eventually worse. So really, it depends on how much you care about the future.
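A few lines of arithmetic show the flip (the running totals here are my own illustration, using the payoff streams above):

```python
# Payoff streams against Grim Trigger: All-C earns 3 every round;
# All-D earns 4 once, then 2 forever after.
R, T, P = 3, 4, 2

all_c_total = all_d_total = 0
for rnd in range(1, 6):
    all_c_total += R
    all_d_total += T if rnd == 1 else P
    print(rnd, all_c_total, all_d_total)
# Round 1: All-D leads 4 to 3. Round 2: tied 6-6.
# Round 3 onward: All-C pulls ahead and never looks back.
```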
If your present value, or weight, for the payoff received in the next round is 0, you’re impatient, and you should play All-D. Numerically it means you value next round’s payoff as payoff * 0—so, nothing. If you weight future payoffs at 1, you value them as payoff * 1—essentially, the same value as the payoff for the current round. This weight can be represented by δ, where 0 ≤ δ ≤ 1. It can also represent the likelihood that the players will play again, on the same scale. So again, at δ = 0 (you’ll never play again), you should play All-D. At δ = 1 (you’ll play forever), you’d prefer to play All-C.
And these deltas compound. If your δ is 0.5, then from this round’s perspective you value the round after next at 0.25 (0.5 * 0.5). We could keep adding up discounted payoffs forever, or we can use the closed forms: R/(1 − δ) for All-C, and T + δP/(1 − δ) for All-D. Set the two equal and solve for δ (trust me on this, I don’t want to write it all out) …
It’s our cooperation threshold!
So if the δ of each player is at least as high as the cooperation threshold, both players will cooperate.
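Those closed forms are easy to sanity-check numerically (function names are mine; the payoffs are this post’s T = 4, R = 3, P = 2):

```python
T, R, P = 4, 3, 2

def all_c_value(delta):
    """Cooperating forever: R + R*d + R*d^2 + ... = R / (1 - d)."""
    return R / (1 - delta)

def all_d_value(delta):
    """Defect now for T, then mutual defection (P) forever, discounted."""
    return T + delta * P / (1 - delta)

threshold = (T - R) / (T - P)  # 0.5 for these payoffs

# A patient player (delta above the threshold) does better cooperating;
# an impatient one does better defecting; at delta = CT you're indifferent.
print(all_c_value(0.9), all_d_value(0.9))  # 30.0 vs 22.0
print(all_c_value(0.1), all_d_value(0.1))  # ~3.33 vs ~4.22
print(all_c_value(0.5), all_d_value(0.5))  # 6.0 vs 6.0
```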
Oh, right, the decentralized solution. Basically: hope that people value the future as much as, or close to as much as, the present, and have them play forever. They should cooperate. We made it through the Cold War without needing Klaatu to come down as the Leviathan and tell us what’s what, right?
“I’m sorry, I almost introduced discussion.”