Something very strange and disturbing happened to me this week. If it was just relevant to me, it wouldn’t be that important (except perhaps to me), and I wouldn’t be writing this column about it. But it’s something that is likely more important and more ominous than we can even imagine.
There are already common fraudulent schemes being perpetrated by both telephone and internet. One known as the “Grandparent Scam” is particularly reprehensible, first because it is perpetrated on elderly people who are, in general, more susceptible to tech-savvy criminals and second because it is based on the manipulation of familial love, trust, and compassion. The criminal running the Grandparent Scam calls or emails the victim, pretending to represent a grandchild who is now in trouble with the law or who needs money for a hospital bill for an injury that can’t be discussed, say, with parents, because of the moral trouble that might ensue. They generally call late at night (say, at four in the morning) because that adds to the confusion. The preferred mechanism of money movement is wire transfer, and that’s a warning: don’t transfer money by wire without knowing for certain who is receiving it, because once it’s gone, it’s not coming back.
Now what if it was possible to conduct such a scam using the actual voice of the hypothetical victim? Worse, what if it was possible to do so with voice and video image, indistinguishable from the real thing? If we’re not at that point now (and we probably are), we will be within months.
In April of this year, a company called Coding Elite exposed an artificial intelligence (AI) program to a substantial sample of my voice, which is easily accessible on the YouTube lectures and podcasts that I have posted over the last few years. In consequence, they were able to duplicate my manner of speaking with exceptional precision, starting out by producing versions of me rapping Eminem songs such as Lose Yourself (which has now garnered 250,000 views) and Rap God (which has only garnered 17,000) as well as Rock Lobster (1400 views). They have done something similar with Bernie Sanders (singing Dancing Queen), Donald Trump (Sweet Dreams) and Ben Shapiro, who also delivered Rap God. The company has a model, the address of which you can find on their YouTube channel, which allows the user to make Trump, Obama, Clinton or Sanders say anything whatsoever.
I happen to think Rap God is an amazing piece of work, and when I first encountered my verbal avatar belting out the lyrics I thought that it was cool, in a teenage tech-geek sort of way. And I suppose it was. This caused quite a stir on the net in April, with media companies such as Forbes and Motherboard (a division of Vice) noting that the machine learning technology only required six hours of original audio (that is, actually generated by me) to produce its credible fakes, matching rhythm, stress, sound and prose intonation.
This week, however, a company called notjordanpeterson.com put an AI engine online that allows anyone to type anything and have it reproduced in my voice. It’s hard to get access to or use the site, at the moment, presumably because it is currently attracting more traffic than its servers can handle. [NOTE: As of August 23, this website posted the following announcement: In light of Dr. Peterson’s response to the technology demonstrated by this site, which you can read here, and out of respect for Dr. Peterson, the functionality of the site will be disabled for the time being.]
A variety of sites that pass themselves off as news portals (and sometimes are) have either reported this story straight (Sputnik News) or had a field day (Gizmodo) having me read, for example, the SCUM manifesto (hypothetically an acronym for Society for Cutting Up Men), a radical feminist rant by Valerie Solanas published in 1967. Solanas, by the way, later shot the artist Andy Warhol, an act driven by her developing paranoia. He was seriously wounded, requiring a surgical corset to hold his organs in place for the rest of his life. TNW takes a middle path, reporting the facts of the situation with little bias but using the system to have me voice very vulgar phrases.
Some of you might know, and those of you who don’t should, that similar technology has also been developed for video. This was reported, for example, by the BBC, as far back as July of 2017, which broadcast a speech delivered by an AI Obama that was essentially indistinguishable from the real thing. Similar technology has been used, equally notoriously, to superimpose the faces of famous actresses on porn stars, while they perform their various sexual exploits (you can find this story covered, for example, on The Verge, Jan 24, 2018). Movies have also been reshot so that the main actor is transformed from someone unknown to someone with real box office draw. This has happened, for example, to Nicolas Cage, primarily on a YouTube site known as Derpfakes, a play on the phrase “Deep Fakes,” which is what the video recordings created fraudulently by AI have come to be known as. More recently Ctrl Shift Face, a YouTube channel, posted a video showing Bill Hader transforming very subtly into Tom Cruise as he performs an impression of the latter on Dave Letterman’s show. It’s picked up four million views in a week. It’s important to note, by the way, that this ability is available to amateurs. I don’t mean people with no tech knowledge whatsoever, obviously; I mean that the electronic machinery that makes such things possible will soon be within the reach of everyone.
It’s hard to imagine a technology with more power to disrupt. I’m already in the position (as many of you soon will be as well) where anyone can produce a believable audio and perhaps video of me saying absolutely anything they want me to say. How can that possibly be fought? More to the point: how are we going to trust anything electronically mediated in the very near future (say, during the next Presidential election)? We’re already concerned, rightly or wrongly, with “fake news,” and that’s only news that has been slanted, arguably, by the bias of the reporter or editor or news organization. What do we do when “fake news” is just as real as “real news”? What do we do when anyone can imitate anyone else, for any reason that suits them?
And what of the legality of this process? It seems to me that active and aware lawmakers would take immediate steps to make the unauthorized production of AI Deep Fakes a felony offense, at least in the case where the fake is being used to defame, damage or deceive. And it seems to me that we should perhaps throw caution to the wind, and make this an exceptionally wide-ranging law. We need to seriously consider the idea that someone’s voice is an integral part of their identity, of their reality, of their person, and that stealing that voice is a genuinely criminal act, regardless (perhaps) of intent. What’s the alternative? Are we entering a future where the only credible source of information will be direct personal contact? What’s that going to do to mass media, of all types? Why should we not assume that the noise-to-signal ratio will creep so high that all political and economic information disseminated broadly will be rendered completely untrustworthy?
I can tell you from personal experience, for what that’s worth, that it is far from comforting to discover an entire website devoted to allowing whoever is inspired to do so to produce audio clips imitating my voice delivering whatever content the user chooses, for serious, comic or malevolent purposes. I can’t imagine what the world will be like when we will truly be unable to distinguish the real from the unreal, or exercise any control whatsoever on what videos reveal about behaviors we never engaged in, or audio avatars broadcasting any opinion at all about anything at all. I see no defense, and a tremendously expanded opportunity for unscrupulous troublemakers to warp our personal and collective reality in any manner they see fit.
Wake up. The sanctity of your voice, and your image, is at serious risk. It’s hard to imagine a more serious challenge to the sense of shared, reliable reality that keeps us linked together in relative peace. The Deep Fake artists need to be stopped, using whatever legal means are necessary, as soon as possible.
Typo in fifth sentence. Know should be known.
Last US election is proof.
I don’t think the situation is as ominous as outlined, because we already have the same conundrum with written text.
Take, for example, this blog post. Let’s say I retweet it on my own website: how would people be able to tell it was written by the person I claim wrote it? By the style of writing? By alignment with, say, the picture they had of JB Peterson? That can all be faked by intelligent manipulation (by people or machines). I can tell you from personal experience that taking him out of context has already been demonstrated by particularly reprehensible people from both sides of the political spectrum.
This suggests that in the future we will face, with video and audio formats, the problems we have with text today. I would furthermore argue that we should not be able to restrict the way we can represent other people in our free speech, and that we should be very careful about putting restrictions on it. We don’t want to stop the Deep Fake artists and risk throwing the baby out with the bathwater, but should instead focus on figuring out where exactly they go too far and be precise about that.
I’m not making the claim that we should not restrict them at all, and it’s not like I am unsympathetic towards the point made here. I do want to point out, however, that Deep Fakes don’t merely have a tyrannical element to them, but also have an element of the wise king. Is it okay, for example, to test out a position in an argument by emulating the style of someone else? Can you use ordinary text for that? Is it allowed to emulate their voice? Their face? Are you allowed to use it humorously?
Wake up. Don’t follow a political message of someone just because you honor them, without thinking about it. And I know why people don’t want to think for themselves: it’s because it’s difficult.
I’ll stop now because I don’t know where people draw the line between what I believe to be somewhat humorous and being offensive. I don’t want to be impolite, which I am temperamentally prone to, to such an extent that I almost put a trigger warning in front of this. I want to apologize in case I hurt someone’s feelings now but someone I greatly admire told me not to.
If the end result of these “mimicking atrocities” is a return to a world where direct contact, or authorized video (verified by checksum technology maybe) is the only credible source of information, then hurrah.
Let’s face it, for too long, people have had a childlike dependence on what they see and hear, over what they experience.
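The checksum idea mentioned above can be sketched roughly as follows. This is only a minimal illustration, assuming the person who made a recording publishes a SHA-256 digest of the file somewhere the audience already trusts; the function names and the sample byte strings are hypothetical and stand in for a real audio or video file.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_checksum(data: bytes, published_hex: str) -> bool:
    """Check whether the bytes hash to the digest the author published.

    hmac.compare_digest is used so the comparison takes constant time.
    """
    return hmac.compare_digest(sha256_hex(data), published_hex.strip().lower())


# Hypothetical stand-ins for the contents of a real recording.
original = b"bytes of the authorized recording"
tampered = b"bytes of the authorized recordinG"

# The digest the author would publish alongside the recording.
published = sha256_hex(original)

print(matches_published_checksum(original, published))
print(matches_published_checksum(tampered, published))
```

A matching checksum only shows that a file is identical to the one the digest was computed from. It says nothing about who published the digest, so in practice something like a digital signature from the speaker would probably be needed as well.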
I have been impacted tremendously by the work of Jordan Peterson. People around me and dependent on my actions have been even more positively affected.
Thank you
This new faking method just reduces the credibility of audio and video to that of traditional sources of information. To my knowledge, the reliability of information was not a major problem historically.
Double buffer to force me out to say according to certain abuse of sciences, and your way mixed but I RESPECTED it.