Experts suggest that having AI systems try to outwit one another could help a person judge their intentions.
Someday, it might be perfectly normal to watch an AI system fight with itself.
The concept comes from researchers at OpenAI, a nonprofit founded by several Silicon Valley luminaries, including Y Combinator partner Sam Altman, LinkedIn chair Reid Hoffman, Facebook board member and Palantir founder Peter Thiel, and Tesla and SpaceX head Elon Musk.
The OpenAI researchers have previously shown that AI systems that train themselves can sometimes develop unexpected and unwanted habits. In a computer game, for example, an agent may figure out how to “glitch” its way to a higher score. In some cases a person can supervise the training process, but if the AI program is doing something extremely complex, that may not be feasible. So the researchers suggest instead having two AI systems debate a given objective, with a human judging the exchange.
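The core intuition is that it is easier to catch a lie than to verify a whole answer. The toy Python sketch below (a hypothetical illustration, not OpenAI's code) shows this with two "debaters" arguing over the sum of a hidden list: the judge spot-checks only the evidence each debater reveals, yet the honest one wins because any misreported entry is exposed.

```python
# Toy sketch of the debate idea (hypothetical example, not OpenAI's code):
# two debaters claim the sum of a hidden list; the judge cannot compute
# the sum directly but can verify individual (index, value) claims.

HIDDEN = [3, 1, 4, 1, 5]  # ground truth the judge sees only in spot checks

def honest_claim():
    # Claims the true sum and reveals genuine (index, value) evidence.
    return sum(HIDDEN), [(i, v) for i, v in enumerate(HIDDEN)]

def dishonest_claim():
    # Claims an inflated sum; to support it, at least one revealed
    # entry must be misreported.
    evidence = [(i, v) for i, v in enumerate(HIDDEN)]
    evidence[2] = (2, 9)  # lie about one entry to back the bigger total
    return sum(v for _, v in evidence), evidence

def judge(claim_a, claim_b):
    # The judge checks each revealed entry against the truth and
    # rejects any debater caught misreporting one.
    def consistent(claim):
        total, evidence = claim
        return (all(HIDDEN[i] == v for i, v in evidence)
                and total == sum(v for _, v in evidence))
    if consistent(claim_a):
        return "A"
    if consistent(claim_b):
        return "B"
    return "tie"

print(judge(honest_claim(), dishonest_claim()))  # prints "A"
```

The point of the sketch: the judge's job (verifying a single revealed entry) is much cheaper than the debaters' job (knowing the whole list), which is the property that makes debate attractive for overseeing tasks too complex for humans to check directly.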
“We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences,” the researchers write in a blog post outlining the concept.