drivebyhooting 2 hours ago

Let’s play out what the successful story looks like, for example in music generation:

* thousands of composers and musicians “contribute” their works for free to OpenAI and their ilk.

* models are trained and can produce movie scores, personalized soundtracks, etc.

* the market for composers dwindles to a small sliver. Few if any choose to pursue it.

* OpenAI et al have a de facto monopoly on music creation.

* Soon the art of composition is lost. There are a few who can make a living as composers by selling their persona rather than their music. They actually just use OpenAI to write it for them.

Is that the future we want? Is it inevitable?

freejazz 6 hours ago

How do they get to the conclusion that AI uses are protected under the fair use doctrine and that anything otherwise would be an "expansion" of copyright? Fairly telling, IMO.

  • zrm 5 hours ago

    AI training and the thing search engines do to make a search index are essentially the same thing. Hasn't the latter generally been regarded as fair use, or else how do search engines exist?

    • justapassenger 4 hours ago

      The most important part of fair use is whether it harms the market for the original work. Search helps bring more eyes to the original work; LLMs don't.

      • tpmoney 4 hours ago

        The fair use test (in US copyright law) is a four-part test, under which impact on the market for the original work is one of the four parts. Notably, just because a use massively harms a work's market does not in and of itself constitute a copyright violation. And it couldn't be any other way. Imagine if you could be sued for copyright infringement for using a work to criticize that work or its author, if the author could prove that your criticism hurt their sales. Imagine if you could be sued for copyright infringement because you wrote a better song or book on the same themes as a previous creator after seeing their work and deciding you could do it better.

        Perhaps most famously, emulators very clearly and objectively impact the market for game consoles and computers, and yet they are also considered fair use under US copyright law.

        No one part of the 4 part test is more important than the others. And so far in the US, training and using an LLM has been ruled by the courts to be fair use so long as the materials used in the training were obtained legally.

        • willis936 3 hours ago

          1. Character of the use. Commercial. Unfavorable.

          2. Nature of the work. Imaginative or creative. Unfavorable.

          3. Quantity of use. All of it. Unfavorable.

          4. Impact on original market. Direct competition. Royalty avoidance. Unfavorable.

          Just because the courts have not done their job properly does not mean something illegal is not happening.

          • tpmoney 2 hours ago

            All of these apply to emulators.

            * The use is commercial (a number of emulators are paid access, and the emulator case that carved out the biggest fair use space for them was Connectix's Virtual Game Station, a very explicitly commercial product).

            * The nature of the work is imaginative and creative. No one can argue games and game consoles aren't imaginative and creative works.

            * Quantity of use. A perfect emulator must replicate 100% of the functionality of the system being emulated, often including BIOS functionality.

            * Impact on market. Emulators are very clearly in direct competition with the products they emulate. This was one of Sony's big arguments against VGS. But also just look around at the officially licensed mini-retro consoles like the ones put out by Nintendo, Sony and Atari. Those retro consoles are very clearly competing with emulators in the retro space, and their sales were unquestionably affected by the existence of those emulators. Royalty avoidance is also in play here, since no emulator that I know of pays licensing fees to Nintendo or Sony.

            So are emulators a violation of copyright? If not, what is the substantial difference here? An emulator can duplicate a copyrighted work exactly, and in fact is explicitly intended to do so (yes, you can claim it's about the homebrew scene, but you can look at any tutorial on setting up these systems on YouTube to see that's clearly not what people want to do with them). Most of the AI systems are specifically programmed not to output copyrighted works exactly. Imagine a world where emulators had hash codes for all the known retail ROMs and refused to play them. That's what AI systems try to do.
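
            For concreteness, a minimal sketch of that hypothetical hash blocklist (the digest, names, and file handling here are made up for illustration, not taken from any real emulator):

            ```typescript
            // Hypothetical sketch: fingerprint the file being loaded and refuse
            // anything matching a list of known retail ROM hashes.
            import { createHash } from "node:crypto";
            import { readFileSync } from "node:fs";

            // Placeholder digest; a real list would enumerate known retail dumps.
            const KNOWN_RETAIL_ROM_HASHES = new Set<string>([
              "3f786850e387550fdab836ed7e6dc881de23001b",
            ]);

            function canLoadRom(path: string): boolean {
              const digest = createHash("sha1").update(readFileSync(path)).digest("hex");
              return !KNOWN_RETAIL_ROM_HASHES.has(digest); // refuse known retail ROMs
            }
            ```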

            Just because you have enumerated the 4 points and given one-word pithy arguments for something illegal happening does not mean that it is. Judge Alsup laid out a pretty clear line of reasoning for why he reached the decision he did, with a number of supporting examples [1]. It's only 32 pages, and a relatively easy read. He's also the same judge that presided over the Oracle v. Google cases, which found Google's use of the Java APIs to be fair use despite that also meeting all four of your descriptions. Given that, you'll forgive me if I find his reasoning a bit more persuasive than your 52-word assertion that something illegal is happening.

            [1]: https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/...

      • AnthonyMouse 3 hours ago

        It seems like you're responding to a question about training by talking about inference. If you train an LLM because you want to use it to do sentiment analysis to flag social media posts for human review, or Facebook trains one and publishes it and others use it for something like that, how is that doing anything to the market for the original work? For that matter, if you trained an LLM and then ran out of money without ever using it for anything, how would that? It should be pretty obvious that the training isn't the part that's doing anything there.

        And then for inference, wouldn't it depend on what you're actually using it for? If you're doing sentiment analysis, that's very different than if you're creating an unlicensed Harry Potter sequel that you expect to run in theaters and sell tickets. But conversely, just because it can produce a character from Harry Potter doesn't mean that couldn't be fair use either. What if it's being used for criticism or parody or any of the other typical instances of fair use?

        The trouble is there's no automated way to make a fair use determination; it really depends on what the user is doing with it. But the media companies are looking for some hook to go after the AI companies, who are providing a general-purpose tool, instead of the subset of their "can't get blood from a stone" customers who are using that tool for some infringing purpose.

    • asdefghyk 4 hours ago

      re ".....AI training and the thing search engines do to make a search index are essentially the same thing. ...."

      Well, AI training has annoyed LOTS of people: overloaded websites, done things just because they can, e.g. Facebook sucking up the content of lots of pirated books.

      Since this AI race started, our small website has been constantly overrun by bots and is not usable by humans because of the load. We NEVER HAD this problem before AI, when access was just search engine indexing.

      • AnthonyMouse 3 hours ago

        This is largely because search engines are a concentrated market and AI training is getting done by everybody with a GPU.

        If Google, Bing, Baidu and Yandex each come by and index your website, they each want to visit every page, but there aren't that many such companies. Also, they've been running their indexes for years so most of the pages are already in them and then a refresh is usually 304 Not Modified instead of them downloading the content again.
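
        A minimal sketch of that conditional-refresh exchange, assuming a runtime with a global `fetch` (the URL and timestamp below are placeholders):

        ```typescript
        // Send back the Last-Modified value from a previous crawl; the server
        // can answer "304 Not Modified" with no body if nothing changed.
        async function refreshPage(url: string, lastSeen: string): Promise<string | null> {
          const res = await fetch(url, { headers: { "If-Modified-Since": lastSeen } });
          if (res.status === 304) return null; // cached copy still valid, nothing re-downloaded
          return res.text(); // page changed, fetch the new content
        }

        // e.g. refreshPage("https://example.com/post", "Tue, 01 Aug 2023 00:00:00 GMT")
        ```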

        But now there are suddenly a thousand AI companies and every one of them wants a full copy of your site going back to the beginning of time while starting off with zero of them already cached.

        Ironically copyright is actually making this worse, because otherwise someone could put "index of the whole web as of some date in 2023" out there as a torrent and then publish diffs against it each month and they could all go download it from each other instead of each trying to get it directly from you. Which would also make it easier to start a new search engine.

    • Kye 5 hours ago

      There was a relatively tiny but otherwise identical uproar over Google even before they added infoboxes that reduced the number of people who clicked through.

      • AnthonyMouse 2 hours ago

        > There was a relatively tiny but otherwise identical uproar over Google even before they added infoboxes that reduced the number of people who clicked through.

        But is that because it isn't fair use or because of the virulent rabies epidemic among media company lawyers?

        • Kye 2 hours ago

          This was normal people, as much as bloggers on the pre-social media early web could be considered normal.

          • AnthonyMouse an hour ago

            Normal people that aren't media companies were objecting to search engines indexing websites? That seems more likely to have been media companies using the fact that they're media companies to get people riled up over a thing the company is grumpy about.

      • tpmoney 4 hours ago

        There was also the lawsuit against Google over the Google Scholar project, which not only ingested copyrighted material in a way very similar to how AIs do, but, even more than AI, actually reproduced word-for-word snippets of those works (intentionally so). Google Scholar is also fair use.

  • Kim_Bruning 2 hours ago

    I think your question was supposed to be rhetorical, but I think it's safe to assume that the answer is that they're lawyers. They've read the law, and read through a large number of cases to see how judges have interpreted it over the past century or so.

  • turtletontine 4 hours ago

    Basically, it’s an open question that courts have yet to decide. But the idea is that it’s fair use until courts decide otherwise (or laws decide otherwise, but that doesn’t seem likely). That’s my understanding, but I could be wrong. I expect we’ll see more and more cases about this, which is exactly why the EFF wants to take a position now.

    They do link to a (very long) article by a law professor arguing that data mining is fair use. If you want to get into the weeds there, knock yourself out.

    https://lawreview.law.ucdavis.edu/sites/g/files/dgvnsk15026/...

    • dragonwriter 3 hours ago

      > Basically, it’s an open question that courts have yet to decide.

      While it hasn't either been ruled on or turned away at the Supreme Court yet, a number of federal trial courts have found training AI models on legally acquired materials to be fair use (even while finding, in some of those and other cases, that pirating copies to then use in training is not, and that using models as a tool to produce verbatim or modified similar-medium copies of works from the training material is also not).

      I’m not aware of any US case going the other way, so, while the cases may not strictly be precedential (I think they are all trial court decisions so far), they are something of a consistent indicator.

      • AnthonyMouse 3 hours ago

        > even while finding, in some of those and other cases, that pirating to get copies to then use in training is not

        I still don't get this one. It seems like they've made a ruling with a dependency on itself without noticing.

        Suppose you and some of your friends each have some books, all legally acquired. You get them together and scan them and do your training. This is the thing they're saying is fair use, right? You're getting together for the common enterprise of training this AI model on your lawfully acquired books.

        Now suppose one of your friends is in Texas and you're in California, so you do it over the internet. Making a fair use copy is not piracy, right? So you're not making a "pirated copy", you're making a fair use copy.

        They recognize that one being fair use has something to do with the other one being fair use, but then ignore the symmetry. It's like they hear the words "file sharing" and refuse to allow them to be associated with something lawful.

    • 1gn15 2 hours ago

      > Basically, it’s an open question that courts have yet to decide.

      This is often repeated, but not true. Multiple US and UK courts have repeatedly ruled that it is fair use.

      Anthropic won. Meta won. Just yesterday, Stability won against Getty.

      At this point, it's pretty obvious that it's legal, considering that media companies have lost every single lawsuit so far.

      https://fixvx.com/technollama/status/1985653634390995178

  • tpmoney 4 hours ago

    They probably get to that conclusion because the courts have ruled that AI uses are protected under fair use, and so yes, changing that would be an expansion of copyright.

  • amelius 4 hours ago

    Not the EFF I once knew. Are they now pro-bigtech?

    • bigbadfeline 4 hours ago

      > Not the EFF I once knew. Are they now pro-bigtech?

      There's nothing pro-bigtech in this proposal. Big tech can afford the license fees and lawsuits... and corner the market. The smaller providers will be locked out if an extended version of the already super-stretched copyright law becomes the norm.

    • dragonwriter 3 hours ago

      They’ve always been anti-expansive-copyright, which has historically aligned with much (but not all) of big tech, and against big content/media.

      A lot of the people that were anti-expansive-copyright only because it was anti-big-media have shifted to being pro-expansive-copyright because it is perceived as being anti-big-tech (and specifically anti-AI).

    • slyall 4 hours ago

      They have always been anti-bigcontent. Maybe you are the one who has changed

mwkaufma 6 hours ago

"Here's What to Do Instead" misleading title, no alternatives suggested. Just hand-wavey GenAI agitprop.

  • rapjr9 5 hours ago

    These are their alternatives:

    What neither Big Tech nor Big Media will say is that stronger antitrust rules and enforcement would be a much better solution. What’s more, looking beyond copyright future-proofs the protections. Stronger environmental protections, comprehensive privacy laws, worker protections, and media literacy will create an ecosystem where we will have defenses against any new technology that might cause harm in those areas, not just generative AI.

    • bgwalter 5 hours ago

      None of their alternatives will work or solve the problems that creatives face or the problem that people cannot think for themselves any longer (as seen by the downvoting in this submission).

      • doormatt 4 hours ago

        >the problem that people cannot think for themselves any longer (as seen by the downvoting in this submission).

        Quite an interesting take to assume that everyone who disagrees with you cannot think for themselves.

        • AnthonyMouse 4 hours ago

          I read that as the problem that people are relying on LLMs to do things for them rather than actually learning the thing themselves. Which is real but it's unclear what it has to do with copyright.

bgwalter 5 hours ago

EFF is bought and paid for. Not once does this piece mention that "AI" and humans are different and that a digital combine harvester mowing down and ingesting the commons does not need the same rights as a human.

It is not fair use when the entire output is made of chopped up quotes from all of humanity. It is not fair use when only a couple of oligarchs have the money and grifting ability to build the required data centers.

This is another in the long list of institutions that have been subverted. The ACLU and OSI are other examples.

  • OkayPhysicist 5 hours ago

    What definition of "sufficiently transformative" doesn't include "a book about wizards" being used, by some process, to make "a machine that spits out text"? A magazine publisher has a more legitimate claim against the person making a ransom letter: at least the fonts are copied verbatim.

    There are legitimate arguments to be made about whether or not AI training should be allowed, but they should take the form of new legislation, not wild reinterpretations of copyright law. Copyright law is already overreaching; just imagine how godawful companies could be if they're given more power to screw you for ever having interacted with their "creative works".

    • bgwalter 5 hours ago

      We did have that. In some EU countries, during the cassette tape and Sony Walkman era, private individuals were allowed to make around 5 copies for friends from a legitimate source.

      Companies were not allowed to make 5 trillion copies.

      • warkdarrior 3 hours ago

        I am pretty sure companies keep one copy of each item.

  • tpmoney 4 hours ago

    > It is not fair use when only a couple of oligarchs have the money and grifting ability to build the required data centers.

    Seems like a good argument not to lock down the ability to create and use AI models to only those with vast sums of money able to pay extortionate prices to copyright holders. And let's be clear, copyright holders will happily extort the hell out of things if they can; for an example of this, we can look at the number of shows and movies that have had to be re-edited in the modern era because there are no streaming rights to the music they used.

  • nickthegreek 4 hours ago

    > EFF is bought and paid for.

    by whom?

free_bip 5 hours ago

One of the few times I vehemently disagree with the EFF.

The problem is that this article makes absolutely no effort to differentiate legitimate uses of GenAI (things like scientific and medical research) from the completely illegitimate uses of GenAI (things like stealing the work of every single artist, living and dead, for the sole purpose of making a profit).

One of those is fair use. The other is clearly not.

  • Calavar 5 hours ago

    What happens when a researcher makes a generative art model and publicly releases the weights? Anyone can download the weights and use it to turn a quick profit.

    Should the original research use be considered legitimate fair use? Does the legitimacy get 'poisoned' along the way when a third party uses the same model for profit?

    Is there any difference between a mom-and-pop restaurant who uses the model to make a design for their menu versus a multi-billion dollar corp that's planning on laying off all their in house graphic designers? If so, where in between those two extremes should the line be drawn?

    • free_bip 4 hours ago

      I'm not a copyright attorney in any country, so the answer (assuming you're asking me personally) is "I don't know and it probably depends heavily on the specific facts of the case."

      If you're asking for my personal opinion, I can weigh in on my personal take for some fair use factors.

      - Research into generative art models (the kind which is done by e.g. OpenAI, Stable Diffusion) is only possible due to funding. That funding mainly comes from VC firms who are looking to get ROI by replacing artists with AI[0], and then debt financing from major banks on top of that. This drives both the market effect factor and the purpose/character of use factor, and not in their favor. If the research has limited market impact and is not done for the express purpose of replacing artists, then I think it would likely be fair use (an example could be background removal/replacement).

      - I don't know if there are any legal implications of a large vs. small corporation using a product of copyright infringement to produce profit. Maybe it violates some other law, maybe it doesn't. All I know is that the end product of a GenAI model is not copyrightable, which to my understanding means their profit potential is limited as literally anyone else can use it for free.

      [0]: https://harlem.capital/generative-ai-the-vc-landscape/

  • tpmoney 4 hours ago

    At what point do you cross the line from "legitimate use of a work" to illegitimate use?

    If I take my legally purchased epub of a book and pipe it through `wc` and release the outputs, is that a violation of copyright? What about 10 books? 100? How many books would I have to pipe through `wc` before the outputs become a violation of copyright?

    What if I take those same books and generate a spreadsheet of all the words and how frequently they're used? Again, same question, where is the line between "fine" and "copyright violation"?

    What if I take that spreadsheet, load it into a website, and make a JavaScript program that weights every word by count and then generates random text strings based on those weights? Is that not essentially an LLM in all but usefulness? Is that a violation of copyright now that I'm generating new content based on statistical information about copyrighted content? If I let such a program run long enough and on enough machines, I'm sure those programs would generate strings of text from the works that went into the models. Is that what makes this a copyright violation?
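
    For concreteness, a minimal sketch of that word-weighting toy (the counts are made-up stand-ins for the spreadsheet):

    ```typescript
    // Toy "LLM": sample words in proportion to how often they appeared in
    // the ingested books, then string the samples together.
    const counts: Record<string, number> = { the: 120, wizard: 7, said: 40, spell: 12 };

    function sampleWord(weights: Record<string, number>): string {
      const total = Object.values(weights).reduce((a, b) => a + b, 0);
      let r = Math.random() * total;
      for (const [word, count] of Object.entries(weights)) {
        r -= count;
        if (r <= 0) return word;
      }
      return Object.keys(weights)[0]; // floating-point edge-case fallback
    }

    // Generate a random 10-word string weighted by frequency.
    console.log(Array.from({ length: 10 }, () => sampleWord(counts)).join(" "));
    ```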

    If that's not a violation, how many other statistical transformation and weighting models would I have to add to my javascript program before it's a violation of copyright? I don't think it's reasonable to say any part of this is "clearly not" fair use, no matter how many books I pump into that original set of statistics. And at least so far, the US courts agree with that.

    • free_bip 4 hours ago

      I think your analogy is a massive stretch. `wc` is neither generative nor capable of having market effect.

      Your second construction is generative, but likely worse than a Markov chain model, which also did not have any market effect.

      We're talking about the models that have convinced every VC it can make a trillion dollars from replacing millions of creative jobs.

      • tpmoney 3 hours ago

        It's not a stretch, because I'm not claiming they're the same thing; I'm incrementally walking the tech stack to try and find where we would want to draw the line. If something has to be generative in order to be a violation, that (for all but the most insane definitions of generative) clears `wc`, but what about publishing the DVD or BluRay encryption keys? Most of the "hacker" communities pretty clearly believe that isn't a violation of copyright. But is it a violation of copyright to distribute that key along with software that can use that key to make a copy of a DVD? If not, why? Is it because the user has to combine the key with the software and specifically direct that software to make a copy, where the copy is a violation of copyright but the software-and-key combination is not?

        If the combination of the decryption key and the software that can use that key to make a copy of a DVD is not a violation of copyright, does that imply that separately distributing a model and a piece of software that can use that model is also not a copyright violation? If it is a violation, what makes it different from the key + copy software combo?

        If we decide that being generative is a necessary component, is the line just wherever the generative model becomes useful? That seems arbitrary and unnecessarily restrictive. Google Scholar is an instructive example here: a search database that scanned many thousands of copyrighted materials, digitized them, and then made that material searchable to anyone, even (intentionally) displaying verbatim copies (or images) of parts of the work in question. This is unquestionably useful for people, and also very clearly reproduces portions of copyrighted works. Should the court cases be revisited and Google Scholar shut down for being useful?

        If market effect is the key thing, how do we square that with the fact that a number of unquestionably market-impacting things are also considered fair use? Emulators are the classic example here, and certainly modern retro gaming OSes like Recalbox or RetroPie have measurable impacts on the market for things like nostalgia-bait mini SNES and Atari consoles. And yet the emulators and their OSes remain fair use. Or again, let's go back to the combination of the DVD encryption keys and something like HandBrake. Everyone knows exactly what sort of copyright infringement most people do with those things. And there are whole businesses dedicated to making a profit off of people doing just that (just try to tell anyone with a straight face that Plex servers are only being used to connect to legitimate streaming services and stream people's digitized home movies).

        My point is that AI models touch on all of these sorts of areas that we have previously carved out as fair use, and AI models are useful tools that don't (despite claims to the contrary) clearly fall afoul of copyright law. So any argument that they do needs to think about where we draw the lines and what factors make up that decision. So far the courts have found training an AI model with legally obtained materials and distributing that model to be fair use, and they've explained how they got to that conclusion. So an argument to the contrary needs to draw a different line and explain why the line belongs there.