It's so ridiculous: when corporations steal everyone's work for their own profit, no one bats an eye, but when a group of individuals does the same to make education and knowledge free for everyone, it's somehow illegal, unethical, immoral and whatnot.
Using publicly available data to train isn't stealing.
Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can't use copyrighted materials freely to train on, it drives up the cost in such a way that only a handful of companies can afford the data.
They want to kill the open-source scene and are manipulating you to do so. Don't build their moat for them.
OpenAI is definitely not the one arguing that they have stolen data to train their AIs, and Disney will be fine whether AI requires owning the rights to training materials or not. Small artists, the ones protesting the most against it, will not. They are already seeing jobs and commission opportunities declining due to it.
Being publicly available in some form is not a permission to use and reproduce those works however you feel like. Only the real owner has the right to decide. We on the internet have always been a bit blasé about it, sometimes deservedly, but as we get to a point where we are driving away the very same artists that we enjoy and get inspired by, maybe we should be a bit more understanding about their position.
That depends on what your definition of "publicly available" is. If you're scraping New York Times articles and pulling art off Tumblr then yeah, it's exactly stealing in the same way scihub is. Only difference is, scihub isn't boiling the oceans in an attempt to make rich people even richer.
We have a mechanism for people to make their work publicly visible while reserving certain rights for themselves.
Are you saying that creators cannot (or ought not be able to) reserve the right to ML training for themselves? What if they want to selectively permit that right to FOSS or non-profits?
Scientific research papers are generally public too, in that you can always reach out to the researcher and they'll provide the papers for free; it's just the "corporate" journals that need their profit off of other people's work...
Yeah, by using the argument you just gave as an excuse to "launder" copyleft works in the training data into permissively-licensed output.
Including even a single copyleft work in the training data ought to force every output of the system to be copyleft. Or if it doesn't, then the alternative is that the output shouldn't be legal to use at all.
The point is that the entire concept of AI training off people's work to make profit for others is wrong without the permission of and compensation for the creator, regardless of whether it's corporate or open source.
If you can’t afford to pay the authors of the data required for your project to work, then that sucks for you, but doesn’t give you the right to take anything you want and violate copyright.
Making a data agnostic model and releasing the source is fine, but a released, trained model owes royalties to its training data.
True, Big Tech loves monopoly power. It's hard to see how there can be an AI monopoly without expanding intellectual property rights.
It would mean a nice windfall profit for intellectual property owners. I doubt they worry about open source or competition but only think as far as lobbying to be given free money. It's weird how many people here, who are probably not all rich, support giving extra money to owners, merely for owning things. That's how it goes when you grow up on Ayn Rand, I guess.
This is the hardest thing to explain to people. Just convert it into a person with unlimited memory.
OpenAI is sending said person to view every piece of human work, learn and make connections, then make art or reports based on what you tell/ask this person.
Sci-Hub is doing the same thing but you can ask it for a specific book and they will write it down word for word for you, an exact copy.
Both morally should be free to do so. But we have laws that say the Sci-Hub human is illegally selling the work of others. Whereas the OpenAI human has to be given so many specific instructions to reproduce a human work that it’s practically like handing it a book and it handing the book back to you.
Cue the Max Headroom episode where the blanks (disconnected people) are chased by the censors because the blanks steal cable so their children can watch the educational shows and learn to read, and they are forced to use clandestine printing presses to teach them.
what's this? an anti-corporate message that sneers at cable TV companies??? CANCEL THAT SHOW!!!
that show was so amazingly prescient: the theme of the first episode was how advertising literally kills its viewers and the news covers things up. No wonder they didn't get renewed. ;)
What really breaks the suspension of disbelief in this reality of ours is that fucking advertising is the most privacy invasive activity in the world. Seriously, even George Orwell would call bullshit on that.
The amount of advertisements you have to consume whether you consent or not is wild. Billboards on roads, bus banners, marquees: you have no choice unless you don't leave your house, and then you're still subject to ads, just ones you sort of consented to by buying TV or Internet service.
Road billboards are always a trip when I visit the US. Not only do they have everything on them from Jesus to abortion to guns, they are also incredibly distracting physically, especially at night.
Agreed. I hate ads passionately. I've been able to eliminate every source of ads from inside my house except websites, but I immediately back out of any site that won't do simple or reading view.
Every moment of my attention taken by some stupid billboard or hearing tvs at a gas station I had to stop at is a moment I could have been thinking about something better. Or nothing, which sometimes would be nice.
Make the AI folks use public domain training data or nothing and maybe we'll see the "life of the author + 75 years" bullshit get scaled back to something reasonable.
Exactly this. I can't believe how many comments I've read accusing the AI critics of holding back progress with regressive copyright ideas. No, the regressive ideas are already there, codified as law, holding the rest of us back. Holding AI companies accountable for their copyright violations will force them to either push to reform the copyright system completely, or to change their practices for the better (free software, free datasets, non-commercial uses, real non-profit orgs for the advancement of the technology). Either way we have a lot to gain by forcing them to improve the situation. Giving AI companies a free pass on the copyright system will waste what is probably the best opportunity we have ever had to improve the copyright system.
14 years. It wasn't life of the author either. It was 14 years after creation date plus an option to renew for another 14 at the end of that period. It was sensible. That's why we don't do it anymore.
AFAIK the individual researchers who get their work pirated and put on Sci-Hub don’t seem to particularly mind.
Why would they?
They don't get paid when people pay for articles.
Back before everyone left Twitter, the easiest way to get a paywalled study was to hit up one of the authors; they can legally give a copy to anyone, and they make no money from paywalls.
Also, no researcher would even exist if grad students had to pay for the papers they read and cite. A lot of people are not fortunate enough to have access to these publications through their uni. Heck, even when I had it, I'd still go to sci-hub just for the sake of convenience.
Like a lot of services nowadays, they offer a mediocre service and still charge for it.
Not necessarily. They often do not own the copyright, so then it depends on fair use exceptions. The real owners have gone after authors, which may be the reason they don't make their articles downloadable by default.
Academics don't care because they don't get paid for them anyway. A lot of the time you have to pay to have your paper published. Then companies like Elsevier just sit back and make money.
I follow a few researchers with interesting youtube channels, and they often mention that if you ask them or their colleagues for a publication of theirs, chances are they'll be glad to send it to you.
A lot of them love sharing their work, and don't care at all for science journal paywalls.
Other than being happy for the attention and curious about what extra things you can find in their field, they get cited, and that pushes their reputation a little higher. Locking up works heavily limits that, and the only reason behind it is a promise of basic quality control when accepting works - and it's not ideal, there are many shady publications. Other than that it's cash from simple consumers, plus subscription money from institutes for works these companies took hold of and maybe don't have physical editions of anymore, just because, return to fig. 1, researchers depend on being published and quoted.
Don't mind? Hell, we want people to read that shit. We don't profit at all if it's paywalled; it hurts us and hurts science in general. This is 100% the wishes of for-profit scientific journals.
I’m starting to think the term “piracy” is morally neutral. The act can be either positive or negative depending on the context. Unfortunately, the law does not seem to flow from morality, or even the consent of the supposed victims of this piracy.
The morals of piracy also depend on the economic system you're under. If you have UBI, the "support artists" argument is far less strong, because we're all paying taxes to support the UBI system that enables people to become skilled artists without worrying about starving or homelessness - as has already happened to a lesser degree before our welfare systems were kneecapped over the last 4 decades.
But that's just the art angle, a tonne of the early-stage (i.e. risky and expensive) scientific advancements had significant sums of government funding poured into them, yet corporations keep the rights to the inventions they derive from our government funded research. We're paying for a lot of this stuff, so maybe we should stop pretending that someone else 'owns' these abstract idea implementations and come up with a better system.
Yes it is, and that's the problem. I work my butt off to identify mechanisms to reduce musculoskeletal injury risk, and then to maintain my employment, I have to hand the rights to that work to a private organization that profits from it. To make matters worse, I then do the work to ensure the quality of other publications for the journal through the peer review process and am not compensated for it.
this is because the technocrats are allowed to steal from you, but when you steal from them what they've stolen from actual researchers that's a problem
There are no technocrats. Just oligarchs, titans of newer industries. Same as the old boss. Don't give them more credit than that. It's evil capitalism. Lump them with bankers, not UX designers imho
This is different. AI as a transformative tech is going to usher the US economy into the next boom of prosperity. The AI revolution will change the world and allow people to decide if they want to work for money or not (read UBI). In case you haven't caught on, I am being sarcastic.
All this despite ChatGPT being a total complete joke.
Honestly couldn't tell if you were being sarcastic or not, because of Poe's law, until I saw your note.
If all the wealth created by these sorts of things didn't funnel up to the 0.01% then yeah. It could usher in economic changes that help bring about greater prosperity in the same way mechanical automation should have.
Unfortunately it's just going to be another vector for more wealth to be removed from your average American and transferred to a corporation
Oh OpenAI is just as illegal as SciHub. More so because they're making money off of stolen IP. It's just that the Oligarchs get to pick and choose. So of course they choose the arrangement that gives them more control over knowledge.
It's great how most of us are taught that just changing the order of words is still plagiarism. For them, they frequently end up using the exact same words as other things, and people still argue it somehow is intelligent and somehow not plagiarism.
OpenAI isn't really proven as legal. They claim it is, and it's very difficult to mount a challenge, but there definitely is an argument that they have no fair use protection - their "research" is in fact development of a commercial product.
Using it to train is a grey area, if you paid for the works. If you didn't, it's still illegal
What it does is output copyrighted works, which is copyright infringement. That is the legal issue. It's very easy to prompt it into giving full copyrighted text they never even paid to look at, let alone give to other people.
"AI" can't even handle switching synonyms to make it technically different like a college kid cheating on an essay
Their argument is that the copying to their training database is "research". This would be a legal fair use of unauthorised copying. However, normally with research you make a prototype, and that prototype is distinctly different from the final commercial product. With LLMs the prototype is the finished commercial product, and they keep adding to it, thus it isn't normal fair use.
When a court considers fair use, the first step is the type of use. The exemptions are education, research, news, comment, or criticism. Next, they consider the nature of the use, in particular whether it is commercial. Calling their copying "research" is a bit of a stretch - it's not like they're writing academic papers and making their data publicly available for review from other scientists - and their use is absolutely commercial. However, it needs to go before a judge to make the decision and it's very difficult for someone to show a cause of action, if only because all their copying is done secretly behind closed doors.
The output of the AI itself is a bit more difficult. The database ChatGPT runs off of does not include the whole works it learned from - it's in the training database where all the copying occurs. However, ChatGPT and other LLMs can sometimes still manage to reproduce the original works, and arguably this should be an offense. If a human being reads a book and then later writes a story that replicates significant parts of the book, then they would be guilty of plagiarism and copyright infringement, regardless of whether they genuinely believe they were coming up with original ideas.
That’s a pretty strong accusation. You seem to like to wade through people’s post history but to my cursory glance nothing would indicate this poster is a troll.
You understand AI posts frequently surface on this platform and people will engage with those posts even if they disagree with you?
Yeah, realistically what will happen is China will get far ahead in natural language computing, which will benefit its economy; everyone who demanded ChatGPT be stopped because they're scared of change will demand the government do something to catch up, and they'll write exemptions into the law.
More likely they'll realise this is the obvious way things will go, and the only legislation will be the kind that makes it harder for open source and community-run AI, but hopefully not significantly.
The next round of ai gen will start reaching consumer space soon. CAD, especially for electronics and structural design (E.g. creating the right amount of supports to hold a given load). They'll scare a few corporations into pushing for legislation but I think the utility will be far clearer.
Doubt it. GenAI requires a shitton of resources, both in storage and for processing. Training a GenAI requires clusters upon clusters of NPUs and/or GPUs, even more than crypto miners and 3D renderers. The full storage requirements are proportional to the amount of training data you give, so expect them to be at least dozens of gigabytes in size.
I doubt AI companies do it "for science" (yeah, right) so if they're shut down by a court of law they'll just shut the thing down. They can upload the code somewhere, but without training data their engine is useless.
The IP system, which goes to great lengths to block things like open-access scientific publications, is borked borked borked borked borked.
If OpenAI and other generative AI projects are the means by which we finally break it so we can have culture and a public domain again, well, we had to nail Capone with tax evasion.
Yes, industrialists want to use AI [exactly the way they want to use every other idea -- plausible or not] to automate more of their industries so they can pay fewer people less money for more productivity. And this is a problem in which generative AI figures centrally, but it's not really all that new, and eventually we're going to have to force our society to recognize that it works for the public and not money. I don't think AI is going to break the system and lead us to communist revolution ( The owning class will tremble...! ) But eventually it will be 1789 all over again. Or we'll crush the fash and realize the only way we can get the fash to not come back is by restoring and extending FDR's New Deal.
I am skeptical the latter can happen without piles of elite heads and rivers of politician blood.
We need to ban the publishing business from academic stuff. Have the universities host a site that's free access. They can also better run the peer review system, and the journals would also no longer control what research sees the light of day even behind a paywall.
How would you publish if you're not a part of a major research institution? Los Alamos National Lab could host its own papers just fine, but what about small-time labs? I know of at least one person who doesn't even officially work in science but publishes original research they do in their free time.
The journal system still provides a service, even if they over-charge for access. The peer review system has value. Imagine if there was zero barrier to publish. As a reader, you'd have to wade through piles of trash to find decent science.
Where would you find it all? Currently we use journal aggregators, whose service also has value and costs money. Are you really going to go to every university's website looking for research relevant to your area? We could do that again, but with everyone responsible for publishing their own work, well, who gets indexed with the aggregators?
The problem isn’t just publishing though, it’s academia as well. Scientists are incentivized to publish in “prestigious” closed access journals such as Nature. They are led to believe it’s better for their career than publishing in open access journals such as PLOS One. As such, groundbreaking papers often get paywalled. Universities then feel obligated to pay outrageous subscription fees to access them.
Yeah but we got all the dys without any of the topia. I was promised high quality prosthetics, neon blinkenlights, and the right to bear arms. We've got like 15% of the appropriate level of any of those.
Yeah, but did SciHub pay Nigerians a pittance to look at and read about child rape? Because- wait, I have no idea what I'm even arguing. Fuck OpenAI though.
OpenAI did that subhuman training of ChatGPT in Kenya, not Nigeria. And since the Kenyan govt is a western lapdog these days, nothing will ever come of that.
Kind of a strawman, I'd like everything to be FOSS, and if we keep Capitalism (which we shouldn't), it should be HEAVILY regulated not the laissez-faire corporatocracy / oligarchy we have now.
I don't want any for-profit capitalists to have any control of AI. It should all be owned by the public and all productive gains from it taxed at 100%. But open source AI models, right on.
A website where you can download paywalled scientific literature. Most scientific literature is paywalled by publishers, and costs a really significant amount to read (like $30-50 per article if you don't have a subscription).
Scihub basically just pirates it. And it has been shut down several times. But as most scientific studies are already paid for with public money, scihub isn't that unethical at all.
Lots of scientists will just send you their article if you email them. They don’t get the money when you pay to read it - often they pay to submit. Reviewing journal articles is a privilege and doesn’t get you paid. The prestige of a scientific article is from the number of times people have cited it. The only “harm” done is that the publisher doesn’t get to make 100% profit for doing nothing.
Journal publishing is mostly a way to extract money from universities. Elsevier and its ilk name whatever price they think a research university can afford.
A.I. doesn't violate copyright laws. It is the data-mining done to train A.I. and the regurgitation of said data in the responses that ultimately violate these laws. A model trained on privately owned, properly licensed, or exclusively public works wouldn't be a problem.
Even then, I would argue that lack of attribution is a bigger problem than merely violating copyright. A big part of the LLM mystique is in how it can spit out a few lines of Shakespeare without accreditation and convince its users that it's some kind of master poet.
Copyright law is stupid and broken. But plagiarism is a problem in its own right, as it seeks to effectively sell people their own creative commons at an absurd markup.
A model trained on privately owned, properly licensed, or exclusively public works wouldn’t be a problem.
This is how we end up with only corpo owned AIs being allowed to exist imo, places like stock photo sites are the only ones with large enough repositories of images to train AI that they have all the legal rights to
The way I see it, either generative AI is legal, free for everyone to run locally, and the created works are public domain, OR, everyone pays $20/mo to massive faceless corpos for the rest of their lives to have the privilege of access to it because they're the only ones who own all (or have enough money to license) the IP needed to train them
Yes, because 1:1 duplication of copyrighted works violates copyright, but summarizing those works and relaying facts stated in them is perfectly legal (by an AI or not).
If you mean by "perfectly legal" a fair use claim, then could you please explain how a commercial for-profit company using the works, sometimes echoing verbatim results, is infringing on the copyrights in a fair use manner?
I do not mean a fair use claim. To quote the Copyright Office: "Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed."
Facts and ideas cannot be copyrighted, so what I was specifically referring to is that if I or an AI read a paper about jellyfish being ocean creatures, then later talk about jellyfish being ocean creatures, there are no restrictions on that whatsoever as long as we don't reproduce the paper word for word.
Now, most of the time AI summarizes things or collects facts, and since those themselves cannot be protected by copyright, it's perfectly legal. On the occasion when AI spits out copyrighted work, that's a gray area, and liability, if any, will probably be decided in the courts.