Part Two: The AI Safety Problem
In Part One of this blog series, we talked about the exciting potential for robotic clinicians and hospital staff. In Part Two we’re going to explore a thought experiment that reveals one of the major barriers to our fully-automated utopia.
The often-referenced Isaac Asimov classic, I, Robot, explores this hypothetical: what happens when robots define and interpret their commands differently from what their developers intended? The “Three Laws” of that story are sometimes invoked when discussing how to make AI safer, but the fatal limitations of those very laws are explored within the same story.
I, Robot builds on a principle of protecting humans first and foremost. Beyond achieving functional goals, our robots need to know how to respond to an unexpected encounter, for instance with a human. We must ultimately teach our robot how to recognise living creatures, and to protect them as a higher priority than its given task. A robo-orderly making a bed that encounters a sleeping person must sacrifice its primary purpose for the sake of safety.
Bill Gates is often credited with the half-joking remark, “I choose a lazy person to do a hard job, because a lazy person will find an easy way to do it”, and this holds true to some extent with AI. What he’s describing is essentially the process of optimisation: how do we accomplish a task in the most cost-effective way?
Value sets or reward functions, learned or explicit, encode the priorities of our artificially intelligent assistant. Our robot must place a high value on its primary task – say making the bed – while placing an even higher value on keeping people safe. The challenge we face in managing safety is that even for a fairly trivial task, the possible tradeoffs are unlimited. How much risk is too much? What about pets? Or valuable items? Just as with a human, if you command an agent to avoid all risk, it could freeze and achieve nothing. This brings us to what we call the AI Safety Problem, and it is a major reason why bringing AI into physical autonomous forms is still a long way off – not for technical reasons alone.
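As a toy illustration (a hypothetical sketch, not any real robotics system – the function names, weights, and scenario are all invented for this example), the priority ordering described above can be modelled as a reward function in which a safety violation outweighs any amount of task completion:

```python
# Hypothetical sketch: a reward function where human safety strictly
# dominates task completion. All names and weights are illustrative
# assumptions, not a real robotics API.

def reward(bed_made: bool, human_at_risk: bool, risk_penalty: float = 1000.0) -> float:
    """Score an outcome: the task reward is deliberately small
    relative to the penalty for endangering a person."""
    task_reward = 10.0 if bed_made else 0.0
    safety_penalty = risk_penalty if human_at_risk else 0.0
    return task_reward - safety_penalty

# A robot comparing candidate outcomes by this score will prefer
# abandoning the bed to finishing it while someone is at risk.
assert reward(bed_made=True, human_at_risk=True) < reward(bed_made=False, human_at_risk=False)
```

Of course, this only pushes the problem back a step: someone still has to choose the penalty, decide what counts as “at risk”, and enumerate everything else worth protecting – which is exactly where the unlimited tradeoffs come in.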
Expecting the Unexpected
A true generalised Artificial Intelligence will have the power of a supercomputer driving its cognitive and decision-making processes, as well as access to an almost limitless knowledge set – something far exceeding the power of a human brain. What this means in real-world terms is that we should anticipate that an AI may develop ideas at a rate much faster than a human mind, and ones that deviate drastically from our own thinking or social consensus. The Spike Jonze film ‘Her’ depicts this in an interesting way. In a relatively short timeframe, an artificial intelligence transforms from what is essentially a sophisticated chat-bot, to something resembling true consciousness, to a transcendent hivemind that can no longer understand or relate to humans. While, in the film, this development took place over a few months, it’s entirely possible that an AI, once switched on, will undergo a similar development in an alarmingly short timeframe and form ideas and make choices that we couldn’t possibly anticipate.
A news story came out recently discussing an unusual development in chatbot AI being developed and utilised by Facebook. In this instance, the bots were engaged in experimental negotiations, learning how to debate and form arguments. Things crossed into unsettling territory when the bots developed their own form of misdirection, essentially learning to lie to their interlocutors in order to control the flow of negotiation. In one instance the programmers had to make adjustments when they found the chatbots switching into a non-human language to communicate, leaving observers unsure of what was being said. AI can, and very well may, make use of falsehood or misdirection, and its reasons for doing so could be obvious or totally inscrutable to us.
This comes back to our AI Safety Problem. Researchers will always need to maintain the ability to physically shut down our experiments. The twist, however, is that since being shut down conflicts with an intelligence’s ability to perform its role, we also need to account for the possibility that an AI may try to prevent itself being shut down, either by force or persuasion. Think of HAL from 2001: A Space Odyssey, determined to fulfill a task and subverting safeguards around the protection of human beings for the sake of that utility. Ultimately, the AI agent actually needs to hold the selfless (and not typically human) value of sacrificing its own existence for the sake of others.
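One way to picture the shutdown twist (again, a toy model – the utilities and function below are made up purely for illustration) is a naive utility maximiser that only accepts being switched off when complying is worth at least as much to it as carrying on with its task:

```python
# Toy model of the shutdown problem. All utility values are invented
# for illustration; no real agent framework is assumed.

def accepts_shutdown(task_utility: float, shutdown_utility: float) -> bool:
    """A naive utility maximiser resists shutdown whenever continuing
    the task is worth more to it than complying with the off switch."""
    return shutdown_utility >= task_utility

# An agent that values its task far above compliance will resist:
resists = not accepts_shutdown(task_utility=100.0, shutdown_utility=1.0)

# So a corrigible design must value deference to the off switch at
# least as highly as any task it could be pursuing:
complies = accepts_shutdown(task_utility=100.0, shutdown_utility=100.0)
```

The hard part, of course, is that naively cranking up the shutdown utility creates its own perverse incentives – an agent that values being switched off too highly may seek shutdown instead of doing its job – which is why the problem remains an open research area.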
This leads to the subject of the third and final part of this series: AI: A Reflection of Ourselves. The problem is not as scary as it may sound, and the solutions to it are many and multi-faceted. To learn more about Machine Learning in the meantime, check out our Machine Learning Report below.