Large language models are disrupting the publishing industry, from spam submissions to garbage books.
AI is, in theory, poised to disrupt work as we know it now. But it’s still facing the same problem every buzzy new tech product before it has faced: The VC funding is there, but the long-term business model is not, particularly for individuals. What do you do with a large language model at this stage, when all you know for sure is that it will produce text to order, in varying degrees of accuracy?
One fairly straightforward response is to try to sell that text. Preferably, you would want to sell it someplace where it doesn’t matter whether it’s accurate or not, or even where inaccuracy could become fiction and hence valuable: the book market. The book market is also, conveniently, the last textual medium where users are still in the habit of paying directly (even just a tiny bit). Publishing is currently the weak point that bad-faith AI users are trying to infiltrate.
Legally speaking, you can’t copyright AI-generated text, because text generated by machines is not subject to copyright protection (with some exceptions). Nevertheless, the scammers and grifters who circulate along publishing’s underbelly are integrating AI into their existing scams and grifts. Publishers are reportedly investigating ways of using AI in discreet, closed-door meetings. And authors are on the alert for anything that looks like a smoking gun to take down what many of them believe to be an existential threat to their craft.
It started in January, when science fiction magazines reported that they were being flooded with AI-generated submissions. Editors believed “side hustle” influencers were recommending that their followers use AI to generate short stories and then sell them, apparently under the belief that short story writers pull in big bucks. In December 2022, explained Clarkesworld editor Neil Clarke, the magazine received 50 fraudulent submissions; in the first half of February 2023, it received almost 350.
By July, the Authors Guild was becoming concerned. Large language models are trained on large bodies of text. A Meta white paper named one popular corpus used to train large language models; that corpus includes text scraped from so-called “shadow libraries,” large collections of pirated books. How was that not copyright infringement?
“We understand that many of the books used to develop AI systems originated from notorious piracy websites,” the Authors Guild wrote in an open letter to the CEOs of various AI companies. “It is only fair that you compensate us for using our writings, without which AI would be banal and extremely limited.”
The letter went on to call for the CEOs to obtain permission for their use of copyrighted material in AI programs, compensate writers for past and ongoing use of their work in training AI, and compensate them further for the use of their work in AI output.
The Guild had reason to be concerned. The same kind of side-hustle influencer who advised their audience to send AI-generated stories to literary magazines has also begun advising them to sell AI-generated ebooks on Amazon.
“Making money with Amazon KDP is a numbers game,” advises one such post. “Clever side hustlers can target a particular niche, and leverage AI to produce multiple books quickly while slowly racking in those sweet royalties.”
“Targeting a particular niche” can sometimes get very specific — as specific as, say, “targeting the niche of people interested in a particular author’s books by pretending to be that author.” In August, the writer Jane Friedman reported that “garbage books” she’d never seen before were being sold on Amazon under her name and had been added to her Goodreads profile. The books read, she said, exactly like what ChatGPT spits out when prompted with her name. If so, that would mean an AI trained on Friedman’s corpus (without compensating her) was now generating new text to be sold under her name (again without compensating her).
“Whoever’s doing this is obviously preying on writers who trust my name and think I’ve actually written these books,” Friedman wrote.
Neither of these schemes is precisely new. There have been “garbage books” for sale on Amazon for a long time: plagiarized books and books with stolen text run through Google Translate a few times and books with straight gobbledygook as the text. It’s not unheard of for those books to have the byline of a legitimate author, all the better to trick unsuspecting readers into buying them. Likewise, people have sent plagiarized submissions to literary magazines for a long time.
What’s new right now is the scale of the operation. AI makes it easy for scammers and side hustlers to do their work in massive quantities.
In July, authors Christopher Golden and Richard Kadrey joined Sarah Silverman in filing a class action lawsuit against OpenAI and Meta, alleging that the companies used multiple books, including Silverman’s memoir, as part of their training sets.
Authors, Geraldine Brooks declared at the Martha’s Vineyard Book Festival this month, “are the ones who should be going on strike.” She was increasingly concerned that none of her contracts contained any language about AI.
It was in the midst of this increasingly agitated atmosphere that the website Prosecraft emerged into the spotlight in early August. A product of software company Shaxpir that went live in 2019, Prosecraft ranks books based on how many words they have, how often they use passive voice, how often they use adjectives, and the vividness of their language. Its database includes analytics for many books already under copyright, although it does not include their text.
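For a sense of how simple such book analytics can be without any generative AI involved, here is a toy sketch of the kind of surface-level metrics described above. The heuristics (the regexes, the suffix list) are illustrative guesses for this example only, not Prosecraft’s actual methodology.

```python
import re

# Crude passive-voice hint: a "to be" verb followed by an -ed form.
PASSIVE_HINT = re.compile(
    r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE
)

def prose_metrics(text: str) -> dict:
    """Return toy word-count, passive-voice, and adjective-density metrics."""
    words = re.findall(r"[A-Za-z']+", text)
    word_count = len(words)
    passive_hits = len(PASSIVE_HINT.findall(text))
    # Crude adjective estimate: words ending in common adjectival suffixes.
    adjectivish = sum(
        1 for w in words if w.lower().endswith(("ous", "ful", "ive", "able"))
    )
    return {
        "word_count": word_count,
        "passive_per_100_words": round(100 * passive_hits / max(word_count, 1), 2),
        "adjectivish_per_100_words": round(100 * adjectivish / max(word_count, 1), 2),
    }

sample = "The door was opened quietly. A wondrous, delightful scene was revealed."
print(prose_metrics(sample))
```

Nothing here generates text; it only counts patterns in text it is handed, which is the distinction authors’ critics of Prosecraft largely missed.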
“This company Prosecraft appears to have stolen a lot of books, trained an AI, and are now offering a service based on that data,” wrote novelist Hari Kunzru on Twitter.
Prosecraft doesn’t use generative AI; its rankings come from a conventional algorithm with no generative properties. It’s also not particularly profitable. According to creator Benji Smith, it “has never generated any income.” Still, authors en masse saw it as just more of the same urgent threat they were already facing: a slick tech interface no one asked for, all its value scraped from their own work, without their permission. Facing a virulent outcry on social media, Smith took Prosecraft down.
Meanwhile, the New York Times reports that about 50 companies that actually do use AI to create, package, edit, and market books have launched over the past year. An irony here is that publishing is a business of notoriously low margins, and those margins are getting smaller. A 2018 Authors Guild survey found that the median annual income for authors was $6,080, down from $12,850 in 2007. It also found that only 21 percent of full-time published authors derived 100 percent of their individual income from book-related income, and for those who did, the median income was $20,300.
The people who tell our stories are already stretched very, very thin. As a culture, we have spent decades undervaluing their labor, treating writing as a passion project that does not deserve remuneration rather than skilled labor that ought to come with a paycheck.
Now, AI has become a powerful tool for grifters to use to try to vacuum up the little money we do award to writers. The side hustle hustles on.