OpenAI declares AI race “over” if training on copyrighted works isn’t fair use

Good. I hope this is what happens.

LLM algorithms can be maintained and sold to corpos to scrape their own data so they can use them for in house tools, or re-sell them to their own clients.
Open Source LLMs can be made available for end users to do the same with their own data, or scrape whats available in the public domain for whatever they want so long as they don't re-sell
Altman can go fuck himself

No amigo, it's not fair if you're profiting from it in the long run.

These fuckers are the first one to send tons of lawyers whenever you republish or use any IP of them. Fuck these idiots.

Good. If I ever published anything, I would absolutely not want it to be pirated by AI so some asshole can plagiarize it later down the line and not even cite their sources.

This is basically a veiled admission that OpenAI are falling behind in the very arms race they started. Good, fuck Altman. We need less ultra-corpo tech bro bullshit in prevailing technology.

Do you promise?!?!

I have conflicting feelings about this whole thing. If you are selling the result of training like OpenAI does (and every other company), then I feel like it’s absolutely and clearly not fair use. It’s just theft with extra steps.

On the other hand, what about open source projects and individuals who aren’t selling or competing with the owners of the training material? I feel like that would be fair use.

What keeps me up at night is if training is never fair use, then the natural result is that AI becomes monopolized by big companies with deep pockets who can pay for an infinite amount of random content licensing, and then we are all forever at their mercy for this entire branch of technology.

The practical, socioeconomic, and ethical considerations are really complex, but all I ever see discussed are these hard-line binary stances that would only have awful corporate-empowering consequences, either because they can steal content freely or because they are the only ones that will have the resources to control the technology.

Oh no, not the plagiarizing machine! How are rich hacks going to feign talent now? Pay an artist for it?! Crazy!

Open can suck some dick.

Fuck these psychos. They should pay the copyright they stole with the billions they already made. Governments should protect people, MDF

TLDR: "we should be able to steal other people's work, or we'll go crying to daddy Trump. But DeepSeek shouldn't be able to steal from the stuff we stole, because China and open source"

At the end of the day the fact that openai lost their collective shit when a Chinese company used their data and model to make their own more efficient model is all the proof I need they don't care about being fair or equitable when they get mad at people doing the exact thing they did and would aggressively oppose others using their own work to advance their own.

Good

Sounds fair, shut it down.

Why training openai with literally millions of copyrighted works is fair use, but me downloading an episode of a series not available in any platform means years of prison?

Fuck OpenAI for stealing the hard work of millions of people

Good. Fuck AI

If giant megacorporations can benefit by ignoring copyright, us mortals should be able to as well.

Until then, you have the public domain to train on. If you don't want AI to talk like the 1920s, you shouldn't have extended copyright and robbed society of a robust public domain.

To be fair, they’re not wrong. We need to find a legal comprise that satisfies everyone

Sounds good, fuck em

Vote pirate party.

What's wrong with the sentiment expressed in the headline? AI training is not and should not be considered fair use. Also, copyright laws are broken in the west, more so in the east.

We need a global reform of copyright. Where copyrights can (and must) be shared among all creators credited on a work. The copyright must be held by actual people, not corporations (or any other collective entity), and the copyright ends after 30 years or when the all rights holders die, whichever happens first. That copyright should start at the date of initial publication. The copyright should be nontransferable but it should be able to be licensed to any other entity only with a majority consent of all rights holders. At the expiration of the copyright the work in question should immediately enter the public domain.

And fair use should be treated similarly to how it is in the west, where it's decided on a case-by-case basis, but context and profit motive matter.

Musk has an AI project. Techbros have deliberately been sucking up to Trump. I’m pretty sure AI training will be declared fair use and copyright laws will remain the same for everybody else.

If I had to pay tuition for education (buying text books, pay for classes and stuff), then you have to pay me to train your stupid AI using my materials.

As an artist, kindly get fucked ass hole. I'd like compensation for all the work of mine you stole.

I love your name

"How are we supposed to win the race if we can't cheat?!"

Depends on if you consider teaching "cheating." Current AI is just learning material, similar to a human but at much faster rates and with a larger brain. Someone IS going to develop this tech. If you pay attention to the space at all, you'd know how rapidly it is developing and how much the competition in the space is heating up internationally. The East tends to have much more of a feeling of responsibility to the state, so if the state uses "their stuff" to train this extraordinarily powerful technology then they are going to be ok with that because it enhances their state in the world. The West seems to have more of an issue with this, and if you force the West to pay billions or trillions of dollars for everything to teach this system, then it simply either won't get done or will get done at a pace that puts the West at a severe disadvantage.

In my view, knowledge belongs to everyone. But I also don't want people more closely aligned with my ideals to be hobbled in the area of building these ultimate knowledge databases and tools. It could even be a major national security threat to not let these technologies develop in the way they need to.
- If the rules are "You gotta pay for the book" and they don't pay for the book, they broke the rules, that's what I consider cheating. I don't necessarily agree with the rule, I disagree with cheating. This is, of course, relative, as truth and morality in general are.

“The plagiarism machine will break without more things to plagiarize.”

In the early 80s I used to have fantasies about having a foster ~~robot~~ android that my family was teaching how to be a person. Oh the amusing mix-ups we got into! We could just do that. Train on experiential reality instead of on the dim cultural reflection of reality.

Edit: "robot" means "slave"

Good, end this AI bullshit, it has little upsides and a metric fuckton of downsides for the common man

It has some great upsides. But those upsides can be trained on specific information that they pay for instead of training AI on people's stuff who didn't consent.
You don't want to all become literally socially and mentally retarded together but apart?

Okay.

It was fun while it lasted.

For someone.

I presume.

But I can't pirate copyrighted materials to "train" my own real intelligence.

That's because the elites don't want you to think for yourself, and instead are designing tools that will tell you what to think.
Now you get why we were all told to hate AI. It's a patriot act for copywrite and IP laws. We should be able too. But that isn't where our discussions were steered was it
- It's copyright, not copywrite---you know, the right to copy. Copywriting is what ad people do. And what does this have to do with the PATRIOT Act?
- Man, what if we abolished copyright, but also banned gen AI completely. I think that would be the funniest answer.
True!
you can, however, go to your local library and read any book ever written for free
- Unless it's deemed a "bad" one by your local klanned karenhood and removed from the library for being tOo WoKe
- any book ever written
  
  Damn! Which library are you going to?!
- So can the AI

I am good with that.

dont threaten me with a good time

The only way this would be ok is if openai was actually open. make the entire damn thing free and open source, and most of the complaints will go away.

Truly open is the only way LLMs make sense.

They're using us and our content openly. The relationship should be reciprocal. Now, they need to somehow keep the servers running.

Perhaps a SETI like model?
- I mean, make em non profit (or not for profit) and perfecly good with that. Also open source the model so I can run it on my own hardware if I want to.

This is exactly what social media companies have been doing for a while (it’s free, yes) they use your data to train their algorithms to squeeze more money out of people. They get a tangible and monetary benefit from our collective data. These AI companies want to train their AI on our hard work and then get monetary benefit off of it. How is this not seen as theft or even if they are not doing it just yet…how is it not seen as an attempt at theft?

How come people (not the tech savvy) are unable to see how they are being exploited? These companies are not currently working towards any UBI bills or policies in governments that I am aware of. Since they want to take our work, and use it to get rich and their investors rich why do they think they are justified in using people’s work? It just seems so slime-y.

Capital calls its own theft "innovation" and that of the individual "crime".
They're actually not making money. They're losing money. Yes yes, I know they're raising billions of dollars, but that goes into the training of the these models which requires manpower and a massive amount of compute and energy. Yeah, they tend to charge to use it (but also offer free tiers) but this is to put back into training.

Here's the thing. The cat is out of the bag. It's coming one way or another, and it will either be by us, or it will be by not us.

I'd rather it be us. Id rather us not be so selfish and rather us be willing to contribute to this ultimate tool for the betterment of all.

Oh no anyway.jpg

"We can't succeed without breaking the law. We can't succeed without operating unethically."

I'm so sick of this bullshit. They pretend to love a free market until it's not in their favor and then they ask us to bend over backwards for them.

Too many people think they're superior. Which is ironic, because they're also the ones asking for handouts and rule bending. If you were superior, you wouldn't need all the unethical things that you're asking for.

Sounds like you are describing the orange baboon in the white house.
- these kinds of asshats are all the same. Only difference is the size of the hat.

Why does Sam have such a punchable face?

all billionaires do
- let's have a tier list of billionaires by face punchability.
- Yeah but his especially, it's so squishy.
- Keep your filthy paws away from my boy George Lucas
Cosmic justice?

over it is then. Buh bye!

Fine by me. Can it be over today?

I'll get the champagne for us and tissues for Sam.
- Unfortunately, the tissues have a 1000% tarrif. Perhaps sandpaper will do?
- Shit, save your $$$ and get some GPUs since the market would crash.
- I'll bring the meth

fucking thank goodness

Training that AI is absolutely fair use.

Selling that AI service that was trained on copyrighted material is absolutely not fair use.

Agreed... although I would go a step further and say distributing the LLM model or the results of use (even if done without cost) is not fair use, as the training materials weren't licensed.
- Ultimatelly it's "Doing Research that advances knowledge for everybody" that should be allowed free use of copyrighted materials, whils activities for direct or indirect commercial gains (included Research whose results are Patented and then licensed for a fee) should not, IMHO.

Business that stole everyone's information to train a model complains that businesses can steal information to train models.

Yeah I'll pour one out for folks who promised to open-source their model and then backed out the moment the money appeared... Wankers.

Oops, oh well. I very much hope it's over, asshole.

It will never be over. We will either be the ones dominant in this area, or it won't be us. If it's not us, well, the consequences could be dire.
- I fail to see the significance of not being dominant in bullshit generation, which is OpenAIs specialty.
  
  Non-LLM machine learning is more interesting, but "write me a poem about how you're my loving ai waifu" is just not a strategic resource.

So pirating full works for commercial use suddenly is "fair use", or what? Lets see what e.g. Disney says about this.

I'll take him seriously if & when OpenAI lives up to its name.

Copyrights should have never been extended longer than 5 years in the first place, either remove draconian copyright laws or outlaw LLM style models using copyrighted material, corpos can't have both.

So these companies are against what you call draconian, but you also disagree with these companies? Everyone here is so fucking short sighted, it's insane to me.
- The fact that you can't distinguish between being against something vs. being against a double-standard is insane to me.
Bro, what? Some books take more than 5 years to write and you want their authors to only have authorship of it for 5 years? Wtf. I have published books that are a dozen years old and I'm in my mid-30s. This is an insane take.
- The one I thought was a good compromise was 14 years, with the option to file again for a single renewal for a second 14 years. That was the basic system in the US for quite a while, and it has the benefit of being a good fit for the human life span--it means that the stuff that was popular with our parents when we were kids, i.e. the cultural milieu in which we were raised, would be public domain by the time we were adults, and we'd be free to remix it and revisit it. It also covers the vast majority of the sales lifetime of a work, and makes preservation and archiving more generally feasible.
  
  5 years may be an overcorrection, but I think very limited terms like that are closer to the right solution than our current system is.
- You don't have to stop selling when a book becomes public domain, publishers and authors sell public domain/commons books frequently, it's just you won't have a monopoly on the contents after the copyright expires.
I think copyright lasting 20 years or so is not unreasonable in our current society. I'd obviously love to live in a society where we could get away with lower. As a compromise, I'd like to see compulsory licensing applied to all copyrighted work. (E.g., after n years, anyone can use it if they pay royalties and you can't stop them; the amount of royalties gradually decreases until it's in the public domain.)
Using existing data on recordings and books we obtain a point estimate of around 15 years for optimal copyright term
- Thanks that's very insightful and I'll amend my position to 15 years 5 may be just a little zealous. 100 year US copyrights have been choking innovation due to things like Disney led trade group lobbyists, 15 years would be a huge boost to many creators being able to leverage more IPs and advancements being held in limbo unused or poorly used by corpo entities.
I agree that copyright is far too long, but at 5 years there's hardly incentive to produce. You could write a novel and have it only starting to get popular after 5 years.
- You don't have to stop selling when it becomes public domain, people sell books, movies, music, etc that are all in the public domain and people choose it over free versions all the time because of convenience, patroning arts, etc.
Send This comment To the top
I think 5 years is a bit short.
the issue is that foreign companies aren't subject to US copyright law, so if we hobble US AI companies, our country loses the AI war

I get that AI seems unfair, but there isn't really a way to prevent AI scraping (domestic and foreign) aside from removing all public content on the internet

Slave owners might go broke after abolition? 😂

I'm going to have to remember this

What a giant load of crap.

No, actually they've just finally admitted that they can't improve them any further because there's not enough training data in existence to squeeze any more demonizing returns out of.

I'm fine with this. "We can't succeed without breaking the law" isn't much of an argument.

Do I think the current copyright laws around the world are fine? No, far from it.

But why do they merit an exception to the rules that will make them billions, but the rest of us can be prosecuted in severe and dramatic fashion for much less. Try letting the RIAA know you have a song you've downloaded on your PC that you didn't pay for - tell them it's for "research and training purposes", just like AI uses stuff it didn't pay for - and see what I mean by severe and dramatic.

It should not be one rule for the rich guys to get even richer and the rest of us can eat dirt.

Figure out how to fix the laws in a way that they're fair for everyone, including figuring out a way to compensate the people whose IP you've been stealing.

Until then, deal with the same legal landscape as everyone else. Boo hoo

🌏👨‍🚀🔫👨‍🚀🌌
I also think it's really rich that at the same time they're whining about copyright they're trying to go private. I feel like the 'Open' part of OpenAI is the only thing that could possibly begin to offset their rampant theft and even then they're not nearly open enough.
- They are not releasing anything of value in open source recently.
  
  Sam altman said they were on the wrong side of history about this when deepseek released.
  
  They are not open anymore I want that to be clear. They decided to stop releasing open source because 💵💵💵💵💵💵💵💵.
  
  So yeah I can have huge fines for downloading copyrighted material where I live, and they get to make money out of that same material without even releasing anything open source? Fuck no.

That's a good litmus test. If asking/paying artists to train your AI destroys your business model, maybe you're the arsehole. ;)

Not only that, but their business model doesn't hold up if they were required to provide their model weights for free because the material that went into it was "free".
- There's also an argument that if the business was that reliant on free things to start with, then it shouldn't be a business.
  
  No-one would bat their eyes if the CEO of a real estate company was sobbing that it's the end of the rental market, because the company is no longer allowed to get houses for free.
- even the top phds can learn things off the amount of books that openai could easily purchase, assuming they can convince a judge that if the works aren't pirated the "learning" is fair use. however, they're all pirating and then regurgitating the works which wouldn't really be legal even if a human did it.
  
  also, they can't really say how they need fair use and open standards and shit and in the next breathe be begging trump to ban chinese models. the cool thing about allowing china to have global influence is that they will start to respect IP more... or the US can just copy their shit until they do.
  
  imo that would have been the play against tik tok etc. just straight up we will not protect the IP of your company (as in technical IP not logo, etc.) until you do the same. even if it never happens, we could at least have a direct tik tok knock off and it could "compete" for american eyes rather than some blanket ban bullshit.
This particular vein of "pro-copyright" thought continuously baffles me. Copyright has not, was not intended to, and does not currently, pay artists.

Its totally valid to hate these AI companies. But its absolutely just industry propaganda to think that copyright was protecting your data on your behalf
- Copyright has not, was not intended to, and does not currently, pay artists.
  
  You are correct, copyright is ownership, not income. I own the copyright for all my work (but not work for hire) and what I do with it is my discretion.
  
  What is income, is the content I sell for the price acceptable to the buyer. Copyright (as originally conceived) is my protection so someone doesn't take my work and use it to undermine my skillset. One of the reasons why penalties for copyright infringement don't need actual damages and why Facebook (and other AI companies) are starting to sweat bullets and hire lawyers.
  
  That said, as a creative who relied on artistic income and pays other creatives appropriately, modern copyright law is far, far overreaching and in need of major overhaul. Gatekeeping was never the intent of early copyright and can fuck right off; if I paid for it, they don't get to say no.
- Copyright has not, was not intended to, and does not currently, pay artists.
  
  Wrong in all points.
  
  Copyright has paid artists (though maybe not enough). Copyright was intended to do that (though maybe not that alone). Copyright does currently pay artists (maybe not in your country, I don't know that).
Interesting copyright question: if I own a copy of a book, can I feed it to a local AI installation for personal use?

Can a library train a local AI installation on everything it has and then allow use of that on their library computers? <— this one could breathe new life into libraries
- First off, I'm by far no lawyer, but it was covered in a couple classes.
  
  According to law as I know it, question 1 yes if there is no encryption, and question 2 no.
  
  In reality, if you keep it for personal use, artists don't care. A library however, isn't personal use and they have to jump through more hoops than a circus especially when it comes to digital media.
  
  But you raise a great point! I'd love to see a law library train AI for in-house use and test the system!
No, it means that copyrights should not exist in the first place.

I mean, if they are allowed to go forward then we should be allowed to freely pirate as well.

In the end, we're just training some non-artifical intelligence.
- Yeah, you can train your own neural network on pirated content, all right, but you better not enjoy that content at the same time or have any feelings while watching it, because that's not covered by "training".
Don't worry: the law will be very carefully crafted so that it will be legal only if they do it, not us.

Where are the copyright lawsuits by Nintendo and Disney when you need them lol

Come on guys, his company is only worth $157 billion.

Of course he can't pay for content he needs for his automated bullshit machine. He's not made of money!

Company burning stacks of hundred dollar bills to generate power to run hallucination machine worth $157 billion. What a world.

Good.

Fuck Sam Altman's greed. Pay the fucking artists you're robbing.

That sounds like a you problem.

"Our business is so bad and barely viable that it can only survive if you allow us to be overtly unethical", great pitch guys.

I mean that's like arguing "our economy is based on slave plantations! If you abolish the practice, you'll destroy our nation!"

Good point. I've never seen it framed this way before. Poignant.
- Thanks, heh, I just came back to look at what I'd written again, as it was 6am when I posted that, and sometimes I say some stupid shit when I'm still sleepy. Nice to know that I wasn't spouting nonsense.

So pirating full works suddenly is fair use, or what?

Only if you're doing it to learn, I guess

Wait until all those expensive scientific journals hear about this

What if we had taken the billions of dollars invested in AI and invested that into public education instead?

Imagine the return on investment of the information being used to train actual humans who can reason and don’t lie 60% of the time instead of using it to train a computer that is useless more than it is useful.

But you have to pay humans, and give them bathroom breaks, and allow them time off work to spend with their loved ones. Where's the profit in that? Surely it's more clever and efficient to shovel time and money into replacing something that will never be able to practically develop beyond current human understanding. After all, we're living in the golden age of humanity and history has ended! No new knowledge will ever be made so let's just make machines that regurgitate our infallible and complete knowledge.

God forbid you offer to PAY for access to works that people create like everyone else has to. University students have to pay out the nose for their books that they "train" on, why can't billion dollar AI companies?

looks good

If artificial intelligence can be trained on stolen information, then so should be "natural" intelligence.

Oh, wait. One is owned by oligarchs raking in billions, the other just serves the plebs.

couldnt' have said it better...the irony...

If everyone can 'train' themselves on copyrighted works, then I say "fair game.''

Otherwise, get fucked.

Over in the US, that's giving China the advantage in AI development. Won't happen.

It's it's like USA adopting China's IP laws.

If I'm using "AI" to generate subtitles for the "community" is ok if i have a large "datastore" of "licensable media" stored locally to work off of right?

If your business model only works if you break the Law, that mean's you're just another Organised Crime group.

Organized crime exists to make money; the way OpenAI is burning through it, they're more Disorganized Crime

Gentlemen, this is democracy manifest!

What is the charge, officer? Eating a meal? A succulent Chinese meal?

Look we may have driven Aaron Swartz to suicide for doing basically the same thing on a smaller scale, but dammit we are getting very rich of this. And, if we are getting rich, then it is okay to break the law while actively fucking over actually creative people. Trust us. We are tech bros and we know what is best for you is for us to become incredibly rich and out of touch. You need us.

In case anyone is unfamiliar, Aaron Swartz downloaded a bunch of academic journals from JSTOR. This wasn't for training AI, though. Swartz was an advocate for open access to scientific knowledge. Many papers are "open access" and yet are not readily available to the public.

Much of what he downloaded was open-access, and he had legitimate access to the system via his university affiliation. The entire case was a sham. They charged him with wire fraud, unauthorized access to a computer system, breaking and entering, and a host of other trumped-up charges, because he...opened an unlocked closet door and used an ethernet jack from there. The fucking Secret Service was involved.

https://en.wikipedia.org/wiki/Aaron_Swartz#Arrest_and_prosecution

The federal prosecution involved what was characterized by numerous critics (such as former Nixon White House counsel John Dean) as an "overcharging" 13-count indictment and "overzealous", "Nixonian" prosecution for alleged computer crimes, brought by then U.S. Attorney for Massachusetts Carmen Ortiz.

Nothing Swartz did is anywhere close to the abuse by OpenAI, Meta, etc., who openly admit they pirated all their shit.
- You're correct that their piracy was on a much more egregious scale than what Aaron did, but they don't openly admit to their piracy. Meta just argued that it isn't piracy because they didn't seed.
  
  Edit: to be clear. I don't think that Aaron Swartz did anything wrong. Unlike the chatGPT, meta, etc.

Sam Altman is a grifter, but on this topic he is right.

The reality is, that IP laws in their current form hamper innovation and technological development. Stephan Kinsella has written on this topic for the past 25 years or so and has argued to reform the system.

Here in the Netherlands, we know that it's true. Philips became a great company because they could produce lightbulbs here, which were patented in the UK. We also had a booming margarine business, because we weren't respecting British and French patents and that business laid the foundation for what became Unilever.

And now China is using those exact same tactics to build up their industry. And it gives them a huge competitive advantage.

A good reform would be to revert back to the way copyright and patent law were originally developed, with much shorter terms and requiring a significant fee for a one time extension.

The current terms, lobbied by Disney, are way too restrictive.

I totally agree. Patents and copyright have their place, but through greed they have been morphed into monstrous abominations that hold back society. I also think that if you build your business on crawled content, society has a right to the result to a fair price. If you cannot provide that without the company failing, then it deserves to fail because the business model obviously was built on exploitation.
- I agree, which is why I advocate for reform, not abolishment.
  
  Perhaps AI companies should pay a 15% surcharge on their services and that money goes directly into the arts.
That's not fair to change the system only when businesses require it. I received a fuckin' letter from a government entity where I live for having downloaded the trash tier movie "Demolition".

I agree copyright and patents are bad but it's so infuriating that only the rich and powerful can choose not to respect it.

So I think openAI has to pay because as of now that shitty copyright and patent system is still there and has hurt many individuals around the world.

We should try to change the laws for copyright but after the big businesses pay their due.
But Sam is talking about copyright and all your examples are patents
- It just so happens that in AI it's about copyright and with margarine (and most other technologies) it's about patents.
  
  But the point is the same. Technological development is held back by law in both cases.
  
  If all IP laws were reformed 50 years ago, we would probably have the technology from 2050, today.
- It's all the same shit. No patents and copyrights should exist.
I mean, I'd say there's a qualitative difference between industrial products and a novel, for example.
Lmao Sam Altman doesn't want tbe rules chanhed for you. He wants it changed for him.

You will still be beholden to the laws.

That's like calling stealing from shops essential for my existence and it would be "over" for me if they stop me. The shit these clowns say is just astounding. It's like they have no morals and no self awareness and awareness for people around them.

That's like calling stealing from shops essential for my existence and it would be "over" for me if they stop me.

What's really fucked up is that for some people this is not far from their reality at all
In America, companies have more rights than the human person.

If companies say that they need to do something to survive, that makes it ok. If a human needs to do something to survive, that's a crime.

Know the difference. (/s)
I think they are either completely delusional, or they know very well how important AI is for the government and the military. The same cannot be said for regular people and their daily struggles.
It’s like stealing from shops except the shops didn’t lose anything. You’re up a stolen widget, but they have just as many as before.
Copyright should not exist in the first place.
- It should exist

Come on bro, let us pirate bro, just one more ngram of books bro

Suddenly millions of people are downloading to "train their AI models".

Sounds like another way of saying "there actually isn't a profitable business in this."

But since we live in crazy world, once he gets his exemption to copyright laws for AI, someone needs to come up with a good self hosted AI toolset that makes it legal for the average person to pirate stuff at scale as well.

I mean, pirating media at scale for your own consumption can be considered "training of a neural network" as well..
- First step, be a business. Second step, accept Trump's dick in your ass. Congratulations, here's your "get out of jail free" card.
- Also, pirating media at scale isn't that hard to do right now anyway lol

For Sam:

If training an ai on copyrighted material is fair use, then piracy is archiving

I'm fine with that haha

Good if AI fails because it can't abuse copyright. Fuck AI.

*except the stuff used for science that isn't trained on copyrighted scraped data, that use is fine

Yeah unfortunately we’ve started calling any LLM “AI”
- In ye old notation ML was a subset of AI, and thus all LLM would be considered AI. It's why manual decision trees that codify get NPC behaviour are also called AI, because it is.
  
  Now people use AI to refer only to generative ML, but that's wrong and I'm willing to complain every time.

So Deepmind is good to train on your models then right?

Oh, so now you're just going to surrender our precious natural resources to the Imperialist Chinese?!

Guys, I think we've got a Wumao over here. Someone get what's left of the FBI to arrest him and show his ass the fucking door.

I hope generative AI obliterates copyright. I hope that its destruction is so thorough that we either forget it ever existed or we talk about it in disgust as something that only existed in stupider times.

Thing is that copywrite did serve a purpose and was for like 20 years before disney got it extended to the nth degree. The idea was the authors had a chance to make money but were expected to be prolific enough to have more writings by the time 20 years was over. I would like to see with patents that once you get one you have a limited time to go to market. Maybe 10 years and if you product is ever not available for purchase (at a cost equivalent to the average cost accounted for inflation or something) you lose the patent so others can produce it. So like stop making an attachment for a product and now anyone can.
- "Thing is, land ownership also served a purpose before lord's/landlord's/capitalists decided to expand it to the point of controlling and dictating the lives of serfs/renters/workers. "
  
  Creation's are not that of only the individual creator, they come from a common progress, culture, and history. When individual creator's copyright their works and their works become a major part of common culture they slice up culture for themselves, dictating how it may be used against the wishes of the masses. Desiring this makes them unworthy of having any cultural control IMO. They become just as much of an authoritarian as a lord, landlord, or capitalist.
  
  In fact, I'd go so far as to say that copyright also harms individual creators once culture has been carved up: Producing brand new stories inevitably are in some way derivative of previous existing works so because they are locked out of the existing IP unless they sign a deal with the devil they're usually doomed to failure due to no ability to have a grip on cultural relevance.
  
  Now, desiring the ability to make a living being an individual creator? That's completely reasonable. Copyright is not the solution however.
- The problem with these systems is that the more they are bureaucratized and legalized, the more publishing houses and attorney's offices will ultimately dictate the flow of lending and revenue. Ideally, copywrite is as straighforward as submitting a copy of your book to the Library of Congress and getting a big "Don't plagiarize this" stamp on it, such that works can't be lifted straight from one author by another. But because there's all sorts of shades of gray - were Dan Brown and JK Rowling ripping off the core conceits of their works, or were religious murder thrillers and YA wizard high school books simply done to death by the time they went mainstream? - a lot of what constitutes plagarism really boils down to whether or not you can afford extensive litigation.
  
  And that's before you get into the industrialization of ghostwriters that end up supporting "prolific" writers like Danielle Steele or Brian Sanderson or R.L. Stein. There's no real legal protection for staff writers, editors, and the like. The closest we've got is the WGA, and that's more exclusive to Hollywood.
Interesting take. I'm not opposed, but I feel like the necessary reverse engineering skill base won't ramp up enough to deal with SAS and holomorphic encryption. So, in a world without copyright, you might be able to analog hole whatever non-interactibe media you want, but software piracy will be rendered impossible at the end of the escalation of hostilities.

Copyright is an unnatural, authoritarian-imposed monopoly. I doubt it will last forever.
- Copyright is a good idea. It was just stretched beyond all reasonable expectations. Copyright should work like Patents. 15 years. You get one, and only one, 15 year extension. At either the 15 or 30 year mark, the work enters the public domain.
I find that very unlikely to happen. If AI is accepted as fair use by the legal system, then that means they have a motive to keep copyright as restrictive as possible; it protects their work but allows them to use every one else's. If you hate copyright law (and you should) AI is probably your enemy, not your ally.
- I suspect your assessment is at best subconsciously biased and at worst in bad faith. You'll need to elaborate on the mechanism of how they'd "keep copyright as restrictive as possible" in a world where it is not possible to copyright AI generated works.

Let's say I write a book.

If I don't want people copying it, people shouldn't be copying it. I don't care if it's been 500 years. It's my book.

This is a weird thread. Lots of people for artists losing control of their creations quickly while simultaneously against artist creations being used by others without consent. Just my perspective but why should artists lose control of their own creations at all? The problem in copyright is tech companies doing patent thickets; not artists.

Even artistic creations held by corporations. Waiting for Marvel stuff to hit public domain to publish a bunch of Marvel novels since they can't protect their creations any more? Why is that acceptable? If someone creates something and doesn't want it stolen, I don't give a fuck what the law says, stealing it is theft. The thief should instead be using Marvel stuff as inspiration as they make their own universe; not just waiting an amount of time before stealing someone else's creation without consent. It isn't holding progress back at all to make novel artistic creations instead of steal others. Art = very different from tech.

when I publish a book, to steal it is consenting to be Luigi'd; no matter how long ago it came out.

So, did we win?

Why does Sam keep threatening us with a good time?

Whoever brings Aaron Swartz back gets to violate all the copyright laws

Aaron Swartz was 100% opposed to all copyright laws, you remember that yah?
- Yes, and he killed himself after the FBI was throwing the book at him for doing exactly what these AI assholes are doing without repercussion
- I'm not just a copyright abolitionnist, I also abhor all intellectual property. Yes, even trademsrk
- And he also said "child pornography is not necessarily abuse."
  
  In the US, it is illegal to possess or distribute child pornography, apparently because doing so will encourage people to sexually abuse children.
  
  This is absurd logic. Child pornography is not necessarily abuse. Even if it was, preventing the distribution or posession of the evidence won't make the abuse go away. We don't arrest everyone with videotapes of murders, or make it illegal for TV stations to show people being killed.
  
  Wired has an article on how these laws destroy honest people's lives.
  
  https://web.archive.org/web/20130116210225/http://bits.are.notabug.com/
  
  Big yikes from me whenever I see him venerated.

Good riddance. This version of AI is just a glorified search engine anyways

And it's not even a good search engine either. It just spits out sarcastic jokes from barely up voted reddit posts.

Then die. I don't know what else to tell you.

If your business model is predicated on breaking the law then you don't deserve to exist.

You can't send people to prison for 5 years and charge them $100,000 for downloading a movie and then turn around and let big business do it for free because they need to "train their AI model" and call one of thief but not the other...

Absolutely. But in this case the law is also shit and needs to be reformed. I still want to see Altman fail, because he's an asshole. But copyright law in its current form is awful and does hold back society.
If your business model is predicated on breaking the law then you don’t deserve to exist.

All of Wall Street sweating nervously
The law isn't automatically moral.

This issue just exposes how ridiculous copyright law is and how much it needs to be changed. It exists specifically to allow companies to own, for hundreds of years, intellectual property.

It was originally intended to protect individual artists but has slowly mutated to being a tool of corporate ownership and control.

But, people would rather use this as an opportunity to dunk on companies trying to develop a new technology rather than as an object lesson in why copyright rules are ridiculous.
- I don't disagree but the idea being that the law is made by supposedly moral men and that law is at least moral within the perspective and context of society at the time.
It's literally worse than piracy, since the AI companies are also trying to sell shittier versions of the works they copy from

Like selling camrips except done by multi-billion dollar companies ripping off individuals and stores are trying to put them right next to the original DVDs in the store

I'm fine for them to use copyrighted material, provided that everyone can do the same without reprecautions Fuck double standards. Fuck IP. People should have access to knowledge without having to pay.

PS. I know this might be an unpopular opinion

Edit: typos

On the other side, creators should be paid for their labor.
- I couldn't agree more. The thing with IP is that it tends to last almost forever, thus it almost never enters public domain, at least in a man's lifetime. The result is it stifles innovation and prevents knowledge NAD entertainment to the masses. Lastly almost always, it's not the creator that benefits of it, rather than a huge corp

Technofascism on its way to legalize my 30TB trove of backups

I need a seamstress AI to take over 10 million seamstress robots so I don't have to pay 100million seamstresses for fruit of the loom underwear.... Could you tech it how to do double well and then back up at each end with some zigzags? For free? I mean everyone knows zigzag!

Apparantly their trying to get Deepseek banned again, really doesn't like competition this guy.

This sounds like socialism is good for capitalists

Oh no! How will I generate a picture of Sam Altman blowing himself now!?

Photoshop, just like the rest of us.
Wdym? He removed his rib or something?
- I was thinking more of a Sam 1 and Sam 2 type situation.

I don’t think they’re wrong in saying that if they aren’t allowed to train on copyrighted works then they will fall behind. Maybe I missed it in the article, but Japan for example has that exact law (use of copyright to train generative AI is allowed).

Personally I think we need to give them somewhat of an out by letting them do it but then taxing the fuck out of the resulting product. “You can use copyrighted works for training but then 50% of your profits are taxed”. Basically a recognition that the sum of all copyrighted works is a societal good and not just an individual copyright holders.

https://jackson.dev/post/generative-ai-and-copyright/

No, taxes implies a monopoly on the training data. The government profits. The rights holders get nothing back.

If private data is deemed public for AI training then the results of that training (code+weights+source list) should also be deemed public.
- fully agree, the only way I'm ok with fair use for AI is if the resulting product is public use. Even if they want to charge for the product to use their frontend, give the ability to use the system local (if your system can support it) much like how most self hosting software does it
50% is too little if you want to allow that

Good. Fuck off.

So all I need to do if I get caught torrenting a movie is say that im training an LLM for subtitles?

I find it odd that Lemmy users are so adverse to tech.

Sorry to say, but he's right. For AI to truly flourish in the West, it needs access to all previously human made information and media.

And as the rest of the conversation points out, if it's so important that for profit corporations can ignore copyright law, there is no justifying reason for the same laws to apply to any other content creators or consumers. Corporations are the reason copyright law is so draconic and stiffles innovation on established ideas, so to unironically say it makes their business model unsustainable is just rich.
Then it's a good thing they won't get it.
For a lot of things to truly flourish, copyright law has to be appended. But the exception is made specifically for AI because that's the thing billionaires can afford to develop while the rest cannot. This is a serious driver for inequality, and it is not normal some people can twist the law as they see fit.

"How am I supposed to make any money if I can't steal all of my products to sell back to the world that produced them?"

Yeah, fuck that. The whole industry deserves to die.

Sadly this comes down to OpenAI petitioning Trump, and expecting trump to do anything that could stop a scam like AI is pointless.

If AI gets to use copyrighted material for free and makes a profit off of the results, that means piracy is 1000% Legal. Excuse me while I go and download a car!!

No, stop! You wouldn't!
- I would, and a house. I'm a menace!
All you have to do is present credible evidence that these companies are distributing copyrighted works or a direct substitute for those copyrighted works. They have filters to specifically exclude matches though, so it doesn’t really happen.

It's so wild how laws just have no idea what to do with you if you just add one layer of proxy. "Nooo I'm not stealing and plagerizing, it's the AI doing it!"

Porque no los dos?

The ai race is over AND we abolish the copyright bullshit laws we have now?

I always felt using publicly available but copyrighted works could be ok but only if the model is publicly available as well

My main takeaway is that some contrived notion of "national security" has now become an acceptable justification for business decisions in the US.

Perhaps this is just a problem with the way the model works. Always requiring new data and unable to use current data, to ponder and expand upon while making new connections about ideas that influenced the author… LLM’s are a smoke and mirrors show, not a real intelligence.

They do seem fundamentally limited somehow. With all the bazillion watts they are cheap imitation at best compared to mere 20 Watts of human brain
- or it might be playing dumb...

Then perish, OpenAI. If your only innovation is a legal loophole then you did nothing.

🤞🤞 🙏

How many pages has a human author read and written before they can produce something worth publishing? I’m pretty sure that’s not even a million pages. Why does an AI require a gazillion pages to learn, but the quality is still unimpressive? I think there’s something fundamentally wrong with the way we teach these models.

To be fair, that's all they have to go on. If a picture's worth a thousand words, how many pages is a lifetime (or even a childhood) of sight and sound?
- That’s a good point. A human author would be influenced by life in general, not just the books.
Because an AI is not a human brain?

It's impressive how the technology have advanced in the last years. But obviously it is not a human brain.
Why does an AI require a gazillion pages to learn, but the quality is still unimpressive?

Because humans learn how to read and interpret those pages in school. Give that book to a toddler and not much will happen other than some bite marks.

AI needs to learn the language structure, grammar, math, logic, reasoning, problem solving and much more before it can even be trained with anything useful. Humans take years to acquire those skills, AI takes more content but can do that training much faster.

Maybe it is the wrong way to train machines but for now we have not invented robot schools yet so it's the best we got.

By the way, I still think companies should be banned from training with copyrighted content and user data behind closed doors. Keep your models in public domain or get out.
The more important question is: Why can a human absorb a ton of material in their learning without anyone crying about them "stealing"? Why shouldn't the same go for AI? What's the difference? I really don't understand the common mindset here. Is it because a trained AI is used for profit?
- Is it because a trained AI is used for profit?
  
  Absolutely. But especially because it skews the market dynamic. Copyright doesn't exist for moral reasons but financial reasons.
- There is a difference between me reading a book and learning from it and one of the biggest companies in the world pirating millions of books for their business. And it really gets bad when normal users are getting sued for tenthousands of dollars when they download a book or a MP3 and Meta is getting defended for doing the same thing, but in a much larger scale.
  
  Yes, we know that copyright is broken. But if it is broken, it has to be broken for all
- What you're talking about is if AI is actually inventing new work (imo, yes it is), but that's not the issue.
  
  The issue is these models were trained on our collective knowledge & culture without permission, then sold back to us.
  
  Unless they use only proprietary & public training data, every single one of these models should be open sourced/weighted & free for anyone to use, like libraries.
- I’ve been thinking about that as well. If an author has bought 500 books, and read them, it’s obviously going to influence the books they write in the future. There’s nothing illegal about that. Then again, they did pay for the books, so I guess that makes it fine.
  
  What if they got the books from a library? Well, they probably also paid taxes, so that makes it ok.
  
  What if they pirated those books? In that case, the pirating part is problematic, but I don’t think anyone will sue the author for copying the style of LOTR in their own works.
- It is because a human artist is usually inspired and uses knowledge to create new art and AI is just a mediocre mimic. A human artist doesn't accidentally put six fingers on people on a regular basis. If they put fewer fingers it is intentional.

Its simple really. We need to steal from humans who make things to train our computers to make things so the computers can replace the humans who make things and we dont want to pay the humans who made the original things because they will be replaced soon enough anyway. Easy peasy. Do you guys even capitalism?

His personal race is over? Oooohhhh, so sorry for him.

AI is not over at all. Maybe he himself will not become the ruler of the world now. No loss.

Yeah, China sure as shit isn't going to lose sleep over a US Copyright case.

National security my ass. More like his time span to show more dumb "achievements" while getting richer depends on it and nothing else

Corporations trying to profit by closing off vast tracts of human output are bumping into other corporations trying to mine it for profit.

Fair use doesn't mean shit if you're a pirate.

Arr, matey.

Guarantee their plan is to blow through copyright laws to create a monopoly fiefdom, close the door behind them and demand that copyright is used to protect the work their LLM creates.

yeah thats crazy

Good, fuck "AI" fuck copyright, fuck patents, fuck proprietary closed-source software, fuck capitalism, fuck billionaires, and fuck you, Sam, in particular.

AI always been about using stolen stuff

He means development of AI in public view is over. Governments will continue without regard for copyright protections until we are all dead.

Alright, I confess! Almost all of my training in computer programming came from copyrighted material. Put the cuffs on me!

You were trained and learned and are able to create new things.

AI poorly mimics thngs it has seen before.
- The issue being raised is copyright infringement, not the quality of the results. Writers "borrow" each other's clever figures of speech all the time without payment or attribution. I'm sure I have often copypasted code without permission. AI does nothing on its own, it's always a tool used by human beings. I think the copyright argument against AI is built on a false distinction between using one tool vs another.
  
  My larger argument is that nobody has an inherent right to control what everybody else does with some idea they've created. For many thousands of years people saw stuff and freely imitated it. Oh look, there's an "arch" - I think I'll build a building like that. Oh look, that tribe uses that root to make medicine, let's do the same thing. This process was known as "the spread of civilization" until somebody figured out that an authority structure could give people dibs on their ideas and force other people to pay to copy them. As we evolve more capabilities (like AI) I think it's time to figure out another way to reward creators without getting in the way of improvement, instead of hanging onto a "Hey, that's Mine!" mentality that does more to enrich copy producers than it does to enrich creators.

Sad to see you leave (not really, tho'), love to watch you go!

Edit: I bet if any AI developing company would stop acting and being so damned shady and would just ASK FOR PERMISSION, they'd receive a huge amount of data from all over. There are a lot of people who would like to see AGI become a real thing, but not if it's being developed by greedy and unscrupulous shitheads. As it stands now, I think the only ones who are actually doing it for the R&D and not as eye-candy to glitz away people's money for aesthetically believable nonsense are a handful of start-up-likes with (not in a condescending way) kids who've yet to have their dreams and idealism trampled.

In Spain we trained an AI using a mix of public resources available for AI training and public resources (legislation, congress sessions, etc). And the AI turned out quite good. Obviously not top of the line, but very good overall.

It was a public project not a private company.
But what data would it be?

Part of the "gobble all the data" perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give a $892 billion market cap when your model is a genius about a handful of narrow subjects that you could get deep volunteer support on.

OTOH maybe there's probably a sane business in narrow siloed (cheap and efficient and more bounded expectations) AI products: the reinvention of the "expert system" with clear guardrails, the image generator that only does seaside background landscapes but can't generate a cat to save its life, the LLM that's a prettified version of a knowledgebase search and NOTHING MORE
- You've highlighted exactly why I also fundamentally disagree with the current trend of all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn't even be an issue in the first place... It'd be like literally giving someone access to a public library.
  
  Edit: but to focus on this specific instance, where we have to deal with the here-and-now, I could see them receiving, say, 60-75% of what they have now, hassle-free. At the very least, and uniformly distributed. Again, AI development isn't what irks most people, it's calling plagiarism generators and search engine fuck-ups AI and selling them back to the people who generated the databases - or, worse, working toward replacing those people entirely with LLMs! - they used for those abhorrences.
  
  Train the AI to be factually correct instead and sell it as an easy-to-use knowledge base? Aces! Train the AI to write better code and sell it as an on-board stackoverflow Jr.? Amazing! Even having it as a mini-assistant on your phone so that you have someone to pester you to get the damned laundry out of the washing machine before it starts to stink is a neat thing, but that would require less advertising and shoving down our throats, and more accepting the fact that you can still do that with five taps and a couple of alarm entries.
  
  Edit 2: oh, and another thing which would require a buttload of humility, but would alleviate a lot of tension would be getting it to cite and link to its sources every time! Have it be transformative enough to give you the gist without shifting into plagiarism, then send you to the source for the details!

They are US based right?

So they literally do whatever they want anyway regardless of what any law might say.

Race over, eh? Welp, see ya later!

Oh no! How could we ever live without AI?

🏴‍☠️

He's right tho. China don't care. You think the west will be able to outcompete China with such limitations?

And the end result is the same, no one was compensated and a dictatorship is running one of the most important new IT tools.

Ip should solely be with the creator and not the corporation that owns that creator. A lot of problems in stems is IP held hostage by the corporations and by publishing companies of research papers

Is that a promise?

As far as the ai industry has already broken copyright laws. It will not be actually intelligent for a long time. Just like crypto this seems like a global scam that has squandered resources for a dream of a free workforce. Instead of working together to try and create an ai there are lots of technology companies doing the same ineffective bull 🤔

Oh yes. Deepseek can quote from copyright sources. So can openAI models, but they are programmed not to.

Facebook trained on the torrent of Annas archive.

The copyright horse has left the stable.

Arr, matey.

He's afraid of losing his little empire.

OpenAI also had no clue on recreation the happy little accident that gave them chatGPT3. That's mostly because their whole thing was using a simple model and brute forcing it with more data, more power, more nodes and then even more data and power until it produced results.

As expected, this isn't sustainable. It's beyond the point of decreasing returns. But Sam here has no idea on how to fix that with much better models so goes back to the one thing he knows: more data needed, just one more terabyte bro, ignore the copyright!

And now he's blaming the Chinese into forcing him to use even more data.

Please, let it be over. Idiotic "ai"....

chatgpt is stagnant, the newest model was lackluster despite using way more resources and costing a shitload of cash, Altman is floundering and on his way out he’s going to try do some lobbying bullshit

Copyright is bullshit and honestly if it disappeared it would help small creators more than anything but openai is not a small creator and guaranteed they will lobby for only huge corps like them to get such an exception. You and I will still get sued to shit by disney or whoever for daring to make $500 off of a shitty project that used some sample or something while meta and openai get free reign to steal the entirety of humanity’s creative output with no recompense

Idk about that, but openai is probably over

Yes, please.

Are they listening to themselves?

Many of you are completely two-faced on copyright laws.

So what Altman is saying here is that without the low hanging fruit of human generated training data, the AI race is over.

He's either full of shit or this AI bubble is about to burst.

Stop the count!

We need to annect Austria, Czechoslovak Republic and Poland otherwise China will do it first.
Hail Hydra

Oh noooooooooooo.

/s.

Strange that no one mentioned openai making money off copyrighted works.

Okay, bye!

I mean if they pay for it like everyone else does I don't think it is a problem. Yes it will cost you billions and billions to do it correctly, but then you basically have the smartest creature on earth (that we know of) and you can replicate/improve on it in perpetuity. We still will have to pay you licensing fees to use it in our daily lives, so you will be making those billions back.

Now I would say let them use anything that is old and freeware, textbooks, etc. government owned stuff - we sponsored it with our learning, taxes - so we get a percentage in all AI companies. Humanity gets a 51% stake in any AI business using humanity's knowledge, so we are then free to vote on how the tech is being used and we have a controlling share, also whatever price is set, we get half of it back in taxes at the end of the year. The more you use it the more you pay and the more you get back.

They're unprofitable as it is already. They're not going to be able to generate enough upfront capital to buy and then enclose all of humanity's previous works to then sell it back to us. I also think it would be heinous that they could enclose and exploit our commons in this manner. It belongs to all of us. Sure train it and use it, but also release it open (or the gov can confiscate it, fine with that as well). Anything but allowing those rat-snakes to keep it all for themselves.
- They can be even more unprofitable like Amazon was for years and years - and now they print money. I don't think it's a bad model, but it's gonna come down to just a couple governments/companies having powerful AIs where we are not needed anymore - so if it's privately owned it would spell doom for the human species or at least a huge portion of it, potential enslavement as well.
If it costs billions and billions, then only a handful of companies can afford to build an AI and they now have a monopoly on a technology that will eventually replace a chunk of the workforce. It would basically be giving our economy to Google.
- Yep exactly, that's why you make it people owned. What is your alternative ? They do have companies/governments that can afford it even at these steep prices.
The owners of the copyrighted works should be paid in perpetuity too though, since part of their work goes into everything the AI spits out.
- I don't see why I'm downvoted for this, but I don't agree with this opinion - it's like teaching a human being. If you buy everything once it's still a hell of a bill - we are talking all books, all movies, all games, all software, all memes, all things - 1 of each is still trillions if you legally want to train your new thing on it.