Since OpenAI’s ChatGPT launched in November 2022, the world has gone AI mad. While AI is certainly nothing new, suddenly it is everywhere. This is particularly true for generative AI, which can create images, written text and even videos from simple prompts.
Every big tech business scrambled to release competing chatbot products to rival OpenAI. A plethora of AI-written content began to flood social media, companies, and schools. Self-proclaimed AI gurus cropped up trying to sell people courses on how to best harness these new tools.
The list goes on.
One of the big conversations around AI has been ethics. As we have seen in the art world, many of the large language models (LLMs) behind these tools may not always be learning from consenting sources.
It’s something large businesses have tried to address, with Adobe explaining that its Firefly AI platform is trained only on openly licensed images and content from consenting Adobe customers. It is taking this further by eventually introducing a compensation initiative for those users.
Sarah Silverman sues OpenAI and Meta for copyright infringement
This conversation has come back to the forefront this week as news broke about OpenAI and Meta being sued by authors such as Sarah Silverman over alleged copyright infringement.
Silverman, along with Christopher Golden and Richard Kadrey, has filed dual lawsuits claiming that the companies’ large language models (LLMs) used their material without consent to train their respective AIs.
The suits allege that OpenAI’s ChatGPT and Meta’s LLaMA were fed datasets that included illegal copies of the authors’ work. According to the authors, the datasets were scraped from shadow libraries such as Z-Library and Bibliotik. The issue with shadow libraries is that they can be used to distribute content that otherwise needs to be paid for, such as books or paywalled articles and academic papers.
The authors say in their claims that they did not consent for their books to be used as training material and are seeking statutory damages and restitution of profits across six counts of alleged copyright infringement.
“It’s part of a wider trend that is emerging and obviously AI is proliferating and there’s so much happening seemingly every day in this space,” Ben Hamilton, head of IP Practice and a partner at Hall & Wilcox, said in a call with SmartCompany.
“Getty has taken action in relation to its photographs in the United States. I’m also aware of a group of artists in California taking action against an AI platform in relation to their artistic work. And this one, of course, is to do with what we call literary works or written material.”
According to Hamilton, there’s a common thread between the copyright cases being brought against various AI-related businesses: how the AIs are trained.
“The essence of copyright is where you have what we call an unauthorised reproduction… Just because it’s from the internet doesn’t mean it’s free to reproduce. Whether or not it is from a shadow library, or in the case of Getty where I think it is saying it’s from its own authorised library.”
Hamilton goes on to say that authors are within their rights to legally go after the shadow libraries that are offering their works to people without authorisation.
“Any shadow platform is potentially there to be attacked from a copyright infringement perspective. But seemingly what is probably happening here is these AI platforms are really the ones that are representing a bigger issue for these copyright owners.”
Generative AI is the next edition of an old problem for authors online
Shadow libraries and torrent sites are certainly not a new issue for authors. But the rise in popularity and accessibility of generative AI has made it an even bigger issue.
“I hate it. It terrifies me and infuriates me,” Jenna Guillaume, author of You Were Made For Me and The Deep End, said in a phone call with SmartCompany.
“It really hurts authors. We basically pick over scraps as it is. Every reader counts and in Australia if they can’t afford our books they can go to the library and get them and we still get payment from that, which is really important. The lending rights payments that come through at the end of the financial year — I know a lot of authors who really rely on them.
“Anything that undermines that or damages that is damaging. We deserve to be paid for our work and this inhibits our ability to be able to work.”
Jodi McAlister is an Australian romantic fiction author and academic. Like many Australian authors, she doesn’t make a full-time living from her books.
“I think the mean [author] wage in Australia is $18,000 a year and a lot of people are earning much, much less than that. I’m not sure whether or not, say, Tim Winton or Trent Dalton responded to that survey, but if they did, it would have skewed the data,” McAlister said to SmartCompany.
The Can I Steal You For A Second? author also details how deeply book piracy was already impacting authors.
“There was an author many years ago who deliberately leaked an early copy of her manuscript where the first four chapters were perfect. I can’t remember what she did but after that, it was wrong, bad and weird,” McAlister laughed.
“And she did it because she was losing so many sales from piracy.”
With the rise of generative AI, authors feel that their livelihoods could be further impacted.
The Australian Society of Authors (ASA) recently published findings from a survey it conducted on AI. Of the 208 Australian authors who responded, 74% expressed “significant concern” that generative AI tools were a threat to writing and illustrating professionals. 51% also thought that AI would have a negative impact on their future income.
The data also revealed that 72% of the authors fear publishers themselves will start using AI to create books. Unsurprisingly, 92% felt there should be a code of conduct for AI in the publishing industry, and that when it has been used, it should be disclosed to readers.
It’s worth noting that copyright does work slightly differently in Australia versus the US.
“Under the US jurisdiction, they have a concept that they call ‘fair use’. If it can be established that the unauthorised use is a ‘fair use’ then there won’t be an infringement,” Hamilton said.
“In Australia, the rough equivalent is referred to as ‘fair dealing’. It’s a similar concept but way more rigid and codified. Much more black letter law. There’s far less flexibility to come under the fair dealing exemption as there is in the US fair use exemption.”
There’s some humour to be had, and some calls for regulation
But while writers are in a tricky spot with AI right now, some inadvertent revenge has at least been enacted, and it also reveals just how far LLM scraping may be going.
Back in May it was revealed that Sudowrite, which is built on OpenAI’s GPT-3 model, recognised words and phrases such as ‘knotting’ that are uniquely specific to the Omegaverse, or A/B/O, which stands for Alpha, Beta and Omega.
This is a deeply specific sub-genre of erotic fiction and world-building. There’s a lot more nuance, explanations and language to the Omegaverse, none of which we suggest googling on your work laptop.
The only way that Sudowrite would be able to generate Omegaverse responses is by learning from the platforms that host this fiction, such as Archive Of Our Own.
While this may not help with copyright infringement or loss in sales, it is at least a very funny example of what can happen when LLMs are able to scrape the internet without any regulation in place. After all, school-aged kids can readily access these generative AI tools.
Unsurprisingly, similar to artists we spoke to last year, authors do see regulation as an important step to protect their intellectual property from AI.
“It would have to be opt-in rather than opt-out. Your work shouldn’t be able to be chewed up without your permission,” McAlister said.
“And if work is going to be used like this, it has to be consensually and with compensation.”