How Artificial Intelligence Libratus Beat Poker Pros
Carnegie Mellon University developed an artificial intelligence called Libratus. It made history by defeating four poker pros in a 20-day poker event. The event, aptly called “Brains Vs. Artificial Intelligence: Upping the Ante,” was held at the Rivers Casino in Pittsburgh. Libratus beat each player heads-up, and after a total of 120,000 hands it led the pros by a collective $1,766,250 in chips. The pros (Dong Kim, Jimmy Chou, Daniel McAulay and Jason Les) split a $200,000 prize based on their respective performances during the event. Measured in milli-big blinds per hand (mbb/hand), a standard used by imperfect-information game AI researchers, Libratus defeated the humans by 147 mbb/hand, which is 0.147 big blinds per hand, or 14.7 big blinds per 100 hands.
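For readers who want to check the win rate, here is a minimal Python sketch of the conversion from chips to mbb/hand. The chip margin and hand count come from the figures above. The big blind of $100 in chips is an assumption, since the article does not state the stakes, and the function name is only illustrative.

```python
# Convert a chip margin into milli-big blinds per hand (mbb/hand).
# ASSUMPTION: the big blind was $100 in chips; the article does not say.

def mbb_per_hand(chips_won: float, hands: int, big_blind: float) -> float:
    """Return the win rate in thousandths of a big blind per hand."""
    big_blinds_won = chips_won / big_blind
    return 1000.0 * big_blinds_won / hands

rate = mbb_per_hand(chips_won=1_766_250, hands=120_000, big_blind=100)

print(f"{rate:.1f} mbb/hand")                       # 147.2 mbb/hand
print(f"{rate / 1000:.3f} big blinds per hand")     # 0.147
print(f"{rate / 10:.1f} big blinds per 100 hands")  # 14.7
```

Under that assumption the result rounds to the 147 mbb/hand quoted above, which is a little under 15 big blinds for every 100 hands played.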
Libratus computed its strategy using the Pittsburgh Supercomputing Center’s Bridges computer. It was developed by Tuomas Sandholm, professor of computer science, and Noam Brown, a Ph.D. student in computer science. “The techniques in Libratus do not use expert domain knowledge or human data and are not specific to poker,” Sandholm and Brown write in their paper. “Thus, they apply to a host of imperfect-information games.” Libratus achieved this feat by breaking the game into smaller parts and by adjusting its strategy throughout the tournament. The developers stated that it was not a matter of luck that their AI performed so well in a game as complex as no-limit Texas hold’em (NLHE), which has more than 10 raised to the power of 161 (1 followed by 161 zeroes) information sets. To give some perspective, that is far more than the estimated number of atoms in the observable universe.
To date, AI programs have defeated the best human players in chess, Jeopardy!, checkers, and Go. Board games such as chess, checkers, and Go have an immense number of possible positions, but at any given point both players know the exact state of the game. Poker, however, is different: there is hidden information as well as a bluff factor. “The best AI’s ability to do strategic reasoning with imperfect information has now surpassed that of the best humans,” Sandholm said.
“The computer can’t win at poker if it can’t bluff,” said Frank Pfenning, head of the Computer Science Department in CMU’s School of Computer Science. He added, “This new milestone in artificial intelligence has implications for any realm in which information is incomplete and opponents sow misinformation. Business negotiation, military strategy, cybersecurity and medical treatment planning could all benefit from automated decision-making using a Libratus-like AI.” Pfenning continued, “Developing an AI that can do that successfully is a tremendous step forward scientifically and has numerous applications. Imagine that your smartphone will someday be able to negotiate the best price on a new car for you. That’s just the beginning.”
Libratus’ strategy comes from three main modules. The first computes an abstraction of the game that is smaller and easier to solve than the full game, which, as mentioned earlier, has more than 10 to the power of 161 information sets. It then creates a detailed strategy for the early streets of the hand and a rudimentary strategy for later streets. This is called the blueprint strategy. “Intuitively, there is little difference between a king-high flush and a queen-high flush,” Brown says. “Treating those hands as identical reduces the complexity of the game and, thus, makes it computationally easier.” Libratus also groups similar bet sizes.
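The idea of treating similar hands and similar bet sizes as identical can be illustrated with a toy Python sketch. This is not Libratus’ actual abstraction algorithm, which the article does not describe. The bucketing rule, the list of bet sizes and the function names are all hypothetical, chosen only to show how grouping shrinks the number of distinct situations a solver has to consider.

```python
from typing import Tuple

# HYPOTHETICAL bet sizes kept in the abstraction, as fractions of the pot.
ABSTRACT_BETS = [0.5, 1.0, 2.0]

def hand_bucket(category: str, high_card: int) -> Tuple[str, int]:
    """Group hands of one category whose top cards are adjacent in rank.

    Ranks run from 2 to 14 (ace = 14). HYPOTHETICAL rule: integer-divide
    the rank by 2, so 12 (queen) and 13 (king) land in the same bucket.
    """
    return (category, high_card // 2)

def nearest_abstract_bet(bet_fraction: float) -> float:
    """Map any bet size to the closest size the abstraction knows about."""
    return min(ABSTRACT_BETS, key=lambda size: abs(size - bet_fraction))

# A king-high flush and a queen-high flush fall into one bucket.
print(hand_bucket("flush", 13) == hand_bucket("flush", 12))  # True

# A 60%-pot bet is treated like a half-pot bet in this toy abstraction.
print(nearest_abstract_bet(0.6))  # 0.5
```

The trade-off is that a coarser grouping is cheaper to solve but loses detail, which is why the blueprint is only rudimentary on later streets.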
In later streets, a second module constructs a new, more detailed abstraction based on the state of play. It also computes a strategy for this subgame in real time, balancing strategies across different subgames and using the blueprint strategy for guidance.
The third module is designed to improve upon the existing blueprint strategy as the competition progresses. Sandholm said, “AIs use machine learning to find mistakes in the opponent’s strategy and exploit them. But that also opens the AI to exploitation if the opponent shifts strategy. Instead, Libratus’ self-improver module analyzes opponents’ bet sizes to detect potential holes in Libratus’ blueprint strategy. Libratus then adds these missing decision branches, computes strategies for them, and adds them to the blueprint.”
Libratus also updates its strategy for each hand in such a way that any late changes only improve the strategy. “After play ended each day, a meta-algorithm analyzed what holes the pros had identified and exploited in Libratus’ strategy,” Sandholm said. “It then prioritized the holes and algorithmically patched the top three using the supercomputer each night. This is very different than how learning has been used in the past in poker. Typically, researchers develop algorithms that try to exploit the opponent’s weaknesses. In contrast, here the daily improvement is about algorithmically fixing holes in our own strategy.” Sandholm also said, “The end-game solver has a perfect analysis of the cards.”
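The nightly routine Sandholm describes (find the holes, prioritize them, patch the top three) might be sketched in Python as follows. This is only an illustration under stated assumptions. The article does not say how holes were scored, so the rule used here (how often a bet size was seen, multiplied by its distance from the nearest size already in the blueprint) is hypothetical, as are the function name and the sample data. Computing an actual strategy for each new branch, which is the expensive step that ran on the supercomputer, is left out.

```python
from collections import Counter

def top_holes(observed_bets, abstract_bets, k=3):
    """Rank opponent bet sizes that the abstraction does not contain.

    HYPOTHETICAL score: frequency of the size times its distance to the
    nearest size already in the abstraction. Ties go to the smaller size.
    """
    counts = Counter(round(bet, 2) for bet in observed_bets)
    scores = {}
    for size, seen in counts.items():
        gap = min(abs(size - known) for known in abstract_bets)
        if gap > 0:  # sizes already in the abstraction are not holes
            scores[size] = seen * gap
    ranked = sorted(scores, key=lambda size: (-scores[size], size))
    return ranked[:k]

# Made-up data: bet sizes as fractions of the pot.
blueprint_bets = [0.5, 1.0, 2.0]
todays_bets = [0.75, 0.75, 0.75, 1.5, 1.5, 0.55, 3.0, 1.0, 0.5]

holes = top_holes(todays_bets, blueprint_bets)
print(holes)  # [1.5, 3.0, 0.75]

# "Patching" here just adds the new sizes as branches to be solved.
patched = sorted(set(blueprint_bets) | set(holes))
print(patched)  # [0.5, 0.75, 1.0, 1.5, 2.0, 3.0]
```

Note that nothing in this loop models the opponents’ tendencies in order to exploit them. It only widens the AI’s own strategy, which matches the contrast Sandholm draws.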
Libratus used approximately 600 of Bridges’ 846 compute nodes. Bridges’ total speed is 1.35 petaflops, about 7,250 times as fast as a high-end laptop, and its memory is 274 terabytes, about 17,000 times the 16 GB of a typical high-end laptop.
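The hardware comparison can be checked with a few lines of Python arithmetic. The inputs are the figures quoted above. The use of decimal units (1 terabyte = 1,000 gigabytes) is an assumption, and the laptop speed is simply what the 7,250 ratio implies, not a measured number.

```python
PETA = 10**15
TERABYTE = 10**12  # ASSUMPTION: decimal units
GIGABYTE = 10**9

bridges_flops = 1.35 * PETA

# Laptop speed implied by "7,250 times as fast", in gigaflops.
implied_laptop_gflops = bridges_flops / 7250 / 10**9

# How many 16 GB laptops match 274 terabytes of memory.
memory_ratio = (274 * TERABYTE) / (16 * GIGABYTE)

# Share of Bridges' compute nodes that Libratus used.
node_share = 600 / 846

print(f"{implied_laptop_gflops:.0f} gigaflops")  # 186 gigaflops
print(f"{memory_ratio:,.0f}x the memory")        # 17,125x the memory
print(f"{node_share:.1%} of the nodes")          # 70.9% of the nodes
```

In other words, Libratus ran on roughly 71 percent of the machine.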
“The techniques that we developed are largely domain independent and can thus be applied to other strategic imperfect-information interactions, including nonrecreational applications,” Sandholm and Brown conclude. “Due to the ubiquity of hidden information in real-world strategic interactions, we believe the paradigm introduced in Libratus will be critical to the future growth and widespread application of AI.”