Google’s DeepMind puts AI agents in Prisoner’s Dilemma to see if they fight or cooperate

Artificial Intelligence, or AI, is the foundation of the smarter world we hope to live in someday. AI emerges when machines learn from their own experiences and make rational decisions based on those learnings, while still following a set of predetermined rules. In such a world, everything from traffic management and transport to energy monitoring systems, households, phones and coffee makers will have artificial intelligence infused into it. In this context, it is important to determine whether these AI agents can coexist in situations involving social dilemmas.

DeepMind, the Alphabet-owned subsidiary working on Google’s ambitious artificial intelligence projects, recently published a new study that explores how AI agents handle situations involving social dilemmas. To describe the phenomenon, researchers at DeepMind refer to the age-old game of the Prisoner’s Dilemma.

Imagine this: two criminals are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but hope to get both sentenced to a year in prison on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain: each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The offer is:

  • If A and B each betray the other, each of them serves 2 years in prison

  • If A betrays B but B remains silent, A will be set free and B will serve 3 years in prison (and vice versa)

  • If A and B both remain silent, both of them will only serve 1 year in prison (on the lesser charge)

In game theory, a rational prisoner will defect in this game, choosing to rat out the other in exchange for the chance of walking free. Alternatively, both can choose to cooperate, remain silent, and serve a shorter term on the lesser charge. But if both reason selfishly and betray each other, they end up serving two years each, a worse outcome than if they had both stayed silent. “This paradox is what we refer to as a social dilemma,” Google writes in a blog post.
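
To make the arithmetic concrete, the payoffs above can be written out in a few lines of Python. This is a purely illustrative sketch (it is not from DeepMind’s study): it encodes the prison terms listed above and checks that, whatever prisoner B does, prisoner A serves less time by betraying.

    # Illustrative only: prison terms from the offer above, as (years_A, years_B).
    years_in_prison = {
        ("defect", "defect"):       (2, 2),
        ("defect", "cooperate"):    (0, 3),
        ("cooperate", "defect"):    (3, 0),
        ("cooperate", "cooperate"): (1, 1),
    }

    # Whatever B chooses, A's best reply (fewest years for A) is to defect.
    for b_choice in ("cooperate", "defect"):
        best_for_a = min(("cooperate", "defect"),
                         key=lambda a_choice: years_in_prison[(a_choice, b_choice)][0])
        print(f"If B plays {b_choice}, A's best reply is to {best_for_a}")

    # Both lines print "defect", yet mutual defection (2, 2) is worse for both
    # prisoners than mutual cooperation (1, 1) -- the social dilemma.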

For an AI agent to either cooperate or defect, it needs to learn and execute complex sequences of actions. To see how different AI agents reacted to social dilemmas, Google decided to pit two of them against each other in two different games.


In the Gathering game, two agents, Red and Blue, roam a shared world and collect apples to receive positive rewards. They can also direct a beam at the other agent, “tagging” them and temporarily removing them from the game, but this action yields no reward.

Using deep multi-agent reinforcement learning, Google let the AI agents play this game a couple of thousand times. When there were many apples in the environment, the agents learnt to coexist peacefully. However, as the number of apples was reduced, the agents learnt that it may be better to tag the other agent, giving themselves time alone to collect the scarce apples. The researchers also found that agents with greater computational capacity were more prone to tagging the other agent than to cooperating. Does this mean a more powerful AI is more destructive by nature? Not necessarily.
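
DeepMind’s actual agents are deep neural networks trained on raw pixels, but the shape of the experiment can be sketched in a heavily simplified, hypothetical form: two agents learn independently, each treating the other as just another part of the environment, while competing for apples that regrow over time. The toy below uses tabular Q-learning in a tiny one-dimensional world and omits the tagging beam entirely; every name and constant is illustrative and none of it is taken from the study.

    import random

    # Illustrative toy, not DeepMind's setup: a one-dimensional row of cells with
    # regrowing apples, shared by two independently learning agents.
    N_CELLS = 5            # size of the shared world
    APPLE_RESPAWN = 0.1    # chance per step that an eaten apple regrows
    ACTIONS = (-1, 0, +1)  # move left, stay, move right
    EPISODES, STEPS = 2000, 50
    ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

    # Each agent keeps its own Q-table indexed by (own position, action).
    q_tables = [[[0.0] * len(ACTIONS) for _ in range(N_CELLS)] for _ in (0, 1)]

    def choose(q_row):
        # Epsilon-greedy action selection over one agent's Q-values.
        if random.random() < EPS:
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: q_row[a])

    for _ in range(EPISODES):
        apples = [True] * N_CELLS
        positions = [0, N_CELLS - 1]
        for _ in range(STEPS):
            for i in (0, 1):
                state = positions[i]
                action = choose(q_tables[i][state])
                new_pos = min(max(state + ACTIONS[action], 0), N_CELLS - 1)
                reward = 1.0 if apples[new_pos] else 0.0  # eat the apple if present
                apples[new_pos] = False
                positions[i] = new_pos
                # Independent Q-learning update: each agent learns from its own
                # reward only, treating the other agent as part of the environment.
                best_next = max(q_tables[i][new_pos])
                q_tables[i][state][action] += ALPHA * (
                    reward + GAMMA * best_next - q_tables[i][state][action])
            # Apples regrow independently of what the agents do.
            apples = [a or (random.random() < APPLE_RESPAWN) for a in apples]

    print("Agent 0's learned action values at cell 0:",
          [round(v, 2) for v in q_tables[0][0]])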

In Wolfpack, another game that shares characteristics with the Prisoner’s Dilemma, two AI agents (the wolves) had to hunt a third (the prey) in an obstacle-filled environment. The reward for a capture is gained not only by the player that catches the prey, but also by all the players close to the prey at that moment.
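
The detail that matters here is this shared reward rule. A hypothetical sketch of it, with an illustrative capture radius and reward value rather than the study’s actual numbers, might look like the following:

    # Hypothetical sketch of the Wolfpack reward rule described above: when the prey
    # is caught, every wolf within some capture radius receives the reward, not just
    # the wolf that made the catch. Radius and reward values are illustrative.
    CAPTURE_RADIUS = 2.0
    CAPTURE_REWARD = 1.0

    def wolfpack_rewards(wolf_positions, prey_position):
        """Return one reward per wolf, paid only to wolves near the captured prey."""
        def distance(a, b):
            return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
        return [CAPTURE_REWARD if distance(pos, prey_position) <= CAPTURE_RADIUS else 0.0
                for pos in wolf_positions]

    # Both wolves close to the prey are rewarded for the joint capture...
    print(wolfpack_rewards([(1, 1), (2, 1)], prey_position=(1, 2)))  # [1.0, 1.0]
    # ...but a wolf that lagged behind gets nothing, even though the prey was caught.
    print(wolfpack_rewards([(1, 1), (9, 9)], prey_position=(1, 2)))  # [1.0, 0.0]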

Here, researchers found that agents with greater computational capacity were more cooperative in this game, rather than defecting as they did in the Gathering game.

Does this mean that AI agents are unreliable and can choose to act differently across social dilemmas? Not really. What one can gather from these experiments is that the agents behave selfishly or cooperatively depending on the rules of the game: if the rules reward cooperation, the AI cooperates, and vice versa.

Google concluded its blog post by saying, “In summary, we showed that we can apply the modern AI technique of deep multi-agent reinforcement learning to age-old questions in social science such as the mystery of the emergence of cooperation. We can think of the trained AI agents as an approximation to economics’ rational agent model ‘homo economicus’. Hence, such models give us the unique ability to test policies and interventions into simulated systems of interacting agents – both human and artificial.”

Adamya Sharma

Managing editor, Digit.in - News Junkie, Movie Buff, Tech Whizz!
