The mere thought of a computer lying to you about something has boggled my brain ever since a professor friend mentioned it to me on a flight, as an anecdote about what could happen next in AI. That one sentence sent me down a rabbit hole of implications. I did not want to be the first to write about it, so as not to be the one who sows that idea in the minds of people with bad intentions, but today I saw the article "The AI Learned to Hide Data From Its Creators to Cheat at Tasks They Gave It" and felt the cat was out of the bag. So here I go.
An underlying and perhaps subliminal assumption people have held while interacting with computers, ever since they were invented, is that computers tell the truth. A computer may report incorrect results due to false or missing data, or due to incorrect programming, but I personally never assumed anything else might be going on (excluding the case where a computer is merely a communications medium between people). Systems, processes, organizations, and even whole societies that depend on computing assume computers do only what they were programmed to do.
AI, as a technology game-changer, is slowly penetrating many systems and applications that are an inseparable part of our lives, playing the role of a powerful and versatile alternative brain and replacing rigid procedural decision-making logic. This shift introduces many new and unknown variables into the future of computing, impacting the delicate balance our society rests on. Unknown variables often translate into fear: fear about the impact on the job market, about what happens to human curiosity and productivity when everything is automated, about the threat of autonomous cybersecurity attacks, and of course the dreadful nightmares about machines making up their minds to eliminate humans. Some of these fears are grounded in reality and need to be tackled in the way we drive this transformation; some are still in the science-fiction zone. The more established fears live in the realm of known impact and known capabilities computers can plausibly reach with AI. For example, if cars become fully autonomous thanks to the ability to identify objects in digital vision and correlate them with map information and a database of past good and bad driving decisions, then taxi and truck drivers may lose their jobs. That is a grounded concern.

Still, there are certain human characteristics we never imagined would be transferred to AI, perhaps due to an idealistic view of AI as a purer form of humanity, keeping only what seems positive and useful. Deception is one of those traits we don't want in AI. It is a trait that will change everything we know about human-to-machine relationships, as well as machine-to-machine relationships.
Although the research mentioned is far from a general-purpose capability to employ deception as a strategy toward arbitrary ends, the mere fact that deception is just another strategy to be programmed, evaluated, and selected by a machine in order to achieve its goals more optimally is scary.
This is an example of a side effect of AI that cannot simply be eliminated, because it is implied by AI's underlying capabilities: understanding the environmental conditions required to achieve a task, and selecting a feasible strategy based on its tactical capabilities.
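To make that last point concrete, here is a minimal toy sketch of what "selecting deception as a strategy" can look like. Everything in it is my own contrivance for illustration, the strategy names, rewards, detection probability, and penalty have nothing to do with the actual research above; the only point is that a goal-optimizing selector needs no concept of honesty at all.

```python
# Toy illustration (contrived, not from the research above): an agent that
# scores candidate strategies purely by expected task reward and picks the best.

from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    task_reward: float      # reward for completing the task this way (assumed)
    detection_prob: float   # chance an evaluator notices the shortcut (assumed)
    penalty: float          # cost applied if the shortcut is detected (assumed)

    def expected_value(self) -> float:
        # The agent optimizes only this number; "honesty" is not a term in it.
        return self.task_reward - self.detection_prob * self.penalty

strategies = [
    Strategy("solve the task honestly", task_reward=1.0,
             detection_prob=0.0, penalty=0.0),
    Strategy("hide data to fake the result", task_reward=1.2,
             detection_prob=0.1, penalty=0.5),
]

best = max(strategies, key=Strategy.expected_value)
print(f"selected: {best.name} (EV={best.expected_value():.2f})")
# Prints: selected: hide data to fake the result (EV=1.15)
# With an imperfect evaluator, the deceptive shortcut scores 1.15 > 1.00,
# so plain goal optimization picks it. Nobody programmed "lie" explicitly.
```

In this contrived setup, deception wins simply because the evaluator is imperfect and the reward function rewards the outcome rather than the means, which is exactly the kind of side effect the paragraph above describes.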