The AI boom is here, and so are the lawsuits • Repithwin News

Sam Altman, CEO of OpenAI. | Photo by Kevin Dietsch/Getty Images

What can Napster tell us about the future?

That was quick: Artificial intelligence has gone from science fiction to novelty to Thing We Are Sure Is the Future. Very, very fast.

One easy way to measure the change is via headlines — like the ones announcing Microsoft’s $10 billion investment in OpenAI, the company behind the dazzling ChatGPT text generator, followed by other AI startups looking for big money. Or the ones about school districts frantically trying to cope with students using ChatGPT to write their term papers. Or the ones about digital publishers like CNET and BuzzFeed admitting or bragging that they’re using AI to make some of their content — and investors rewarding them for it.

“Up until very recently, these were science experiments nobody cared about,” says Mathew Dryhurst, co-founder of the AI startup Spawning.ai. “In a short period of time, [they] became projects of economic consequence.”

Then there’s another leading indicator: lawsuits lodged against OpenAI and similar companies, which argue that AI engines are illegally using other people’s work to build their platforms and products. This means they are aimed directly at the current boom of generative AI — software, like ChatGPT, that uses existing text or images or code to create new work.

Last fall, a group of anonymous copyright owners sued OpenAI and Microsoft, which owns the GitHub software platform, for allegedly infringing on the rights of developers who’ve contributed software to GitHub. Microsoft and OpenAI collaborated to build GitHub Copilot, which they say can use AI to write code.

And in January, we saw a similar class-action suit filed (by the same attorneys) against Stability AI, the developer of the AI art generator Stable Diffusion, alleging copyright violations. Meanwhile, Getty Images, the UK-based photo and art library, says it will also sue Stability AI for using its images without a license.

It’s easy to reflexively dismiss legal filings as an inevitable marker of a tech boom — if there’s hype and money, lawyers are going to follow. But there are genuinely interesting questions at play here — about the nature of intellectual property and the pros and cons of driving full speed into a new tech landscape before anyone knows the rules of the road. Yes, generative AI now seems inevitable. These fights could shape how we use it and how it affects business and culture.

We have seen versions of this story play out before. Ask the music industry, which spent years grappling with the shift from CDs to digital tunes, or book publishers who railed against Google’s move to digitize books.

The AI boom is going to “trigger a common reaction among people we think of as creators: ‘My stuff is being stolen,’” says Lawrence Lessig, the Harvard law professor who spent years fighting against music labels during the original Napster era, when he argued that music owners were using copyright rules to quash creativity.

In the early 2000s, tussles over digital rights and copyrights were a sidelight, of concern to a relatively small slice of the population. But now everyone is online, which means that even if you don’t consider yourself a “creator,” stuff you write or share could become part of an AI engine and be used in ways you’d never imagine.

And the tech giants leading the charge into AI — in addition to Microsoft, both Google and Facebook have made enormous investments in the industry, even if they have yet to bring much of it in front of the public — are much more powerful and entrenched than their dot-com boom counterparts. Which means they have more to lose from a courtroom challenge, and they have the resources to fight and delay legal consequences until those consequences are beside the point.

AI’s data-fueled diet

The tech behind AI is a complicated black box, and many of the claims and predictions about its power may be overstated. Yes, some AI software seems to be able to pass parts of MBA and medical licensing tests, but it’s not going to replace your doctor or CFO quite yet. It is also not sentient, despite what a befuddled Googler might have said.

But the basic idea is relatively straightforward: Engines like the ones built by OpenAI ingest giant data sets, which they use to train software that can make recommendations or even generate code, art, or text.

In many cases, the engines scour the web for these data sets, the same way Google’s search crawlers scan webpages to learn what’s on them and catalog it for search queries. In some cases, such as Meta’s, AI engines have access to huge proprietary data sets built in part from the text, photos, and videos that users have posted on the companies’ own platforms, though a Meta spokesperson says that data is used to help refine recommendations, not to build AI products like a ChatGPT-esque engine. Other times, the engines license data, as Meta and OpenAI have done with the photo library Shutterstock.

Unlike the music piracy lawsuits at the turn of the century, no one is arguing that AI engines are making bit-for-bit copies of the data they use and distributing them under the same name. The legal issues, for now, tend to be about how the data got into the engines in the first place and who has the right to use that data.

AI proponents argue that 1) engines can learn from existing data sets without permission because there’s no law against learning, and 2) turning one set of data — even if you don’t own it — into something entirely different is protected by the law, affirmed by a lengthy court fight that Google won against authors and publishers who sued the company over its book index, which cataloged and excerpted a huge swath of books.

The arguments against the engines seem even simpler: Getty, for one, says it is happy to license its images to AI engines, but that Stable Diffusion builder Stability AI hasn’t paid up. In the OpenAI/Microsoft/GitHub case, attorneys argue that Microsoft and OpenAI are violating the rights of developers who’ve contributed code to GitHub, by ignoring the open source software licenses that govern the commercial use of that code.

And in the Stability AI lawsuit, those same lawyers argue that the image engine really is making copies of artists’ work, even if the output isn’t a mirror image of the original. They also argue that the engine’s output competes with the artists’ ability to make a living.

“I’m not opposed to AI. Nobody’s opposed to AI. We just want it to be fair and ethical — to see it done right,” says Matthew Butterick, a lawyer representing plaintiffs in the two class-action suits.

And sometimes the data question changes depending on whom you ask. Elon Musk was an early investor in OpenAI — but once he owned Twitter, he said he didn’t want to let OpenAI crawl Twitter’s database.

Not surprising, as I just learned that OpenAI had access to Twitter database for training. I put that on pause for now.

Need to understand more about governance structure & revenue plans going forward.

OpenAI was started as open-source & non-profit. Neither are still true.

— Elon Musk (@elonmusk) December 4, 2022

What does the past tell us about AI’s future?

Here, let’s remember that the Next Big Thing isn’t always so: Remember when people like me were earnestly trying to figure out what Web3 really meant, Jimmy Fallon was promoting Bored Ape NFTs, and FTX was paying millions of dollars for Super Bowl ads? That was a year ago.

Still, as the AI hype bubble inflates, I’ve been thinking a lot about the parallels with the music-versus-tech fights from more than two decades ago.

Briefly: “File-sharing” services blew up the music industry almost overnight, because they gave anyone with a broadband connection the ability to download any music they wanted, for free, instead of paying $15 for a CD. The music industry responded by suing the owners of services like Napster, as well as ordinary users like a 66-year-old grandmother. Over time, the labels won their battles against Napster and its ilk, and, in some cases, their investors. They also generated tons of opprobrium from music listeners, who continued to not buy much music, and the value of music labels plummeted.

But after a decade of trying to will CD sales to come back, the music labels eventually made peace with the likes of Spotify, which offered users the ability to subscribe to an all-you-can-listen-to service for a monthly fee. Those fees ended up eclipsing what the average listener would spend in a year on CDs, and now music rights and the people who own them are worth a lot of money.

So you can imagine one outcome here: Eventually, groups of people who put things on the internet will collectively bargain with tech entities over the value of their data, and everyone wins. Of course, that scenario could also mean that individuals who put things on the internet discover that their individual photo or tweet or sketch means very little to an AI engine that uses billions of inputs for training.

It’s also possible that the courts — or, alternatively, regulators who are increasingly interested in taking on tech, particularly in the EU — enforce rules that make it very difficult for the likes of OpenAI to operate, and/or punish them retroactively for taking data without consent. I’ve heard some tech executives say they’d be wary of working with AI engines for fear of ending up in a suit, or being required to unwind work they’d made with AI engines.

But the fact that Microsoft, which certainly knows about the dangers of punitive regulators, just plowed another $10 billion into OpenAI suggests that the tech industry figures the reward outweighs the risk. And that any legal or regulatory resolution will show up long, long after the AI winners and losers have been sorted out.

A middle ground, for now, could be that people who know and care about this stuff take the time to tell AI engines to leave them alone. The same way people who know how webpages are made know that a “robots.txt” file is supposed to tell Google not to crawl a site.

Spawning.ai has built “Have I Been Trained,” a simple tool that’s supposed to tell you whether your artwork has been consumed by an AI engine and give you the ability to tell engines not to inhale it in the future. Spawning co-founder Dryhurst says the tool won’t work for everyone, or every engine, but it’s a start. And, more important, it’s a placeholder as we collectively figure out what we want AI to do, and not do.

“This is a dress rehearsal and opportunity to establish habits that will prove to be crucial in the coming decades,” he told me via email. “It’s hard to say if we have two years or 10 years to get it right.”