By experimenting, computers are figuring out how to do things that no programmer could teach them.
Inside a simple computer simulation, a group of self-driving cars are performing a crazy-looking maneuver on a four-lane virtual highway. Half are trying to move from the right-hand lanes just as the other half try to merge from the left. It seems like just the sort of tricky thing that might flummox a robot vehicle, but they manage it with precision.
I’m watching the driving simulation at the biggest artificial-intelligence conference of the year, held in Barcelona this past December. What’s most amazing is that the software governing the cars’ behavior wasn’t programmed in the conventional sense at all. It learned how to merge, slickly and safely, simply by practicing. During training, the control software performed the maneuver over and over, altering its instructions a little with each attempt. Most of the time the merging happened way too slowly and cars interfered with each other. But whenever the merge went smoothly, the system would learn to favor the behavior that led up to it.
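The demo's exact algorithm isn't specified here, but the loop just described (perform the maneuver, alter the instructions a little, keep whatever scores better) can be sketched in a few lines of Python. Everything in the sketch is a hypothetical stand-in: the simulator, the IDEAL parameters, and the reward are invented for illustration, not taken from the demo.

```python
import random

# Toy stand-in for a traffic simulator: it scores a "policy" (here just
# a list of numbers) by how close it is to some unknown ideal merging
# behavior, with a little noise thrown in, like real traffic.
IDEAL = [0.7, -0.2, 0.5, 0.1]  # hypothetical "perfect merge" parameters

def simulate_merge(policy):
    """Return a reward: higher means a smoother, faster merge."""
    error = sum((p - t) ** 2 for p, t in zip(policy, IDEAL))
    return -error + random.gauss(0, 0.01)

policy = [0.0, 0.0, 0.0, 0.0]        # start with no idea how to merge
best_reward = simulate_merge(policy)

for attempt in range(5000):
    # Alter the instructions a little with each attempt...
    candidate = [p + random.gauss(0, 0.05) for p in policy]
    reward = simulate_merge(candidate)
    # ...and favor whatever behavior led to a smoother merge.
    if reward > best_reward:
        policy, best_reward = candidate, reward

print(policy)  # ends up close to IDEAL
```

Real reinforcement-learning systems use far more sophisticated updates than this keep-the-best rule, but the core idea, favoring behavior that led to reward, is the same.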
This approach, known as reinforcement learning, is largely how AlphaGo, a program developed by DeepMind, a subsidiary of Alphabet, mastered the impossibly complex board game Go and beat one of the best human players in the world in a high-profile match last year. Now reinforcement learning may soon inject greater intelligence into much more than games. In addition to improving self-driving cars, the technology can get a robot to grasp objects it has never seen before, and it can figure out the optimal configuration for the equipment in a data center.
Reinforcement learning copies a very simple principle from nature, one the psychologist Edward Thorndike documented more than 100 years ago. Thorndike placed cats inside boxes from which they could escape only by pressing a lever. After a considerable amount of pacing around and meowing, the animals would eventually step on the lever by chance. Once they learned to associate this behavior with the desired outcome, they escaped with increasing speed.
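Thorndike's observation, later called the law of effect, is simple enough to mimic in a few lines. The toy below is purely illustrative: the action names and the reinforcement numbers are invented, but the pattern matches his cats, with random flailing at first and faster escapes as the successful behavior is strengthened.

```python
import random

ACTIONS = ["pace", "meow", "scratch", "press_lever"]
weights = {a: 1.0 for a in ACTIONS}  # every behavior equally likely at first

for trial in range(1, 11):
    steps = 0
    while True:
        action = random.choices(ACTIONS, weights=list(weights.values()))[0]
        steps += 1
        if action == "press_lever":   # the lever opens the box
            weights[action] += 2.0    # strengthen the behavior that worked
            break
    print(f"trial {trial:2d}: escaped after {steps} actions")
```

Run it a few times and the later trials escape, on average, in fewer steps, echoing the learning curves Thorndike recorded.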
Some of the very earliest artificial-intelligence researchers believed that this process might be usefully reproduced in machines. In 1951, Marvin Minsky, a student at Harvard who would become one of the founding fathers of AI as a professor at MIT, built a machine that used a simple form of reinforcement learning to mimic a rat learning to navigate a maze. Minsky’s Stochastic Neural Analogy Reinforcement Computer, or SNARC, consisted of dozens of tubes, motors, and clutches that simulated the behavior of 40 neurons and synapses. As a simulated rat made its way out of a virtual maze, the strength of some synaptic connections would increase, thereby reinforcing the underlying behavior.
There were few successes over the next few decades. In 1992, Gerald Tesauro, a researcher at IBM, demonstrated a program that used the technique to play backgammon. It became skilled enough to rival the best human players, a landmark achievement in AI. But reinforcement learning proved difficult to scale to more complex problems. “People thought it was a cool idea that didn’t really work,” says David Silver, a researcher at DeepMind in the U.K. and a leading proponent of reinforcement learning today.
That view changed dramatically in March 2016, however. That’s when AlphaGo, a program trained using reinforcement learning, destroyed one of the best Go players of all time, South Korea’s Lee Sedol. The feat was astonishing, because it is virtually impossible to build a good Go-playing program with conventional programming. Not only is the game extremely complex, but even accomplished Go players may struggle to say why certain moves are good or bad, so the principles of the game are difficult to write into code. Most AI researchers had expected that it would take a decade for a computer to play the game as well as an expert human.