Reddit Inc. has signed a contract allowing a company to train its artificial intelligence models on the social media platform’s content, according to people familiar with the matter, as it nears the potential launch of its long-awaited initial public offering.
There's been so much genuinely great content posted across reddit over the years; it blows my mind that people decided that was something that needed to be mentioned all the time.
Why wouldn't they enhance it themselves, like Twitter has been doing for months? Once they make signing in mandatory and implement per-user rate limits, the information will disappear from the internet and will only be available to people who are paying in some way.
I'd be very surprised if comments weren't versioned in some way, so even if you delete or rewrite that data, it's probably still there and a part of training data.
They said years ago that they only kept one previous version, which is why everyone overwrote and then deleted their stuff.
It's possible that reddit changed that, but honestly? That requires a level of foresight that I believe is entirely beyond spez. He didn't foresee AI products; he literally paid for all the bandwidth for them to harvest the data. He didn't foresee the changes to API pricing, he didn't foresee the protests, how long they'd last, or how many people would just walk away.
Hell, in the previous big "closed subs" protest they'd never even considered a moderator rebellion: once the mods took the subs private, the admins were accidentally locked out as well, so they had to negotiate to get them re-opened while they worked on backdoor changes that wouldn't break reddit.
I just don't see them having the foresight to add in preservation code, nor to allocate the database and storage space to keep up with it. I think if you overwrote and then deleted your stuff, reddit doesn't have it anymore. Of course, it's still out there, in Google's cache, the Internet Archive, all the other snapshots and preservation schemes, and the data already harvested for the various AIs, but at least it's no longer in reddit's control, and they won't be able to profit from it.
That's what I just did with my 10-year-old account. I had all my comments overwritten with gibberish and purged them a few days later. I'll send them a final GDPR (DSGVO) request and delete the account afterwards.
I did it a few months ago, but then again, if I were working at reddit and in charge of preparing the dataset to feed to the LLM, I'd give it access to both a recent snapshot and one from before July 2023 (or whenever shit hit the fan and we all came to Lemmy), since most edits would have been made in protest. And the AI can figure out which ones by itself.
FYI: reddit orphans content. In other words, your posts/comments are undeletable.
I found instances of this late last year through search results. I clicked a username to see more posts by that account. The only content on their profile page was a final deletion message about the API changes.
Their post history was discoverable by using " site:reddit.com" on Google. All of their posts/comments still show up under their username instead of the normal [deleted]. Clicking the username takes you to their empty profile page.
So what we now know is that reddit has been saving original submissions, whereas before, their claim was that only the last edit was stored. That's why the deletion scripts became a thing: people took it on good faith that we could delete our posts. At some point they stopped honoring that. Or perhaps it was all a lie the whole time. Who knows.
This is how I cleaned up (most of) my old posts: I searched for them via Google. Since they're posted under my username, I was able to change them into nonsense before deleting them, even though they no longer appeared on my profile.
If you're talking about Glaze or Nightshade, those techniques are not proven to be particularly effective. Lots of people want them to work but that doesn't make it so.
This is going to produce the saltiest AI the world has ever seen.
"Hey reddit AI, give me an idea how to balance my budget and pay my student debt, mate." "Here ya go, I got a noose for you. Also I'm not your mate, dude."
"Hey reddit AI, draw for me a house with a Gen Z family." Here ya go. "Hey reddit AI, why did you show me a pic of a highway ramp with homeless people?"
It's a power grab. They'll justify control over the training data with intellectual property which keeps it out of the hands of everyday people but they stole the "intellectual property" from us in the first place. Then they'll control the "means of generation".
I'm just wondering what the hell they're expecting to teach this AI from Reddit user data. Reddit has been scraped for years, and I'd assume most of what could have been learned has already been used.
Reddit has been more and more video based lately and it feels to me like it is becoming a less algorithmic version of TikTok.
Spez has spent his entire time as CEO chasing the latest tech shiny - reddit crypto, reddit NFTs, reddit video a la TikTok. And he's always managed to do it after the craze has started to peak. So now he's chasing the latest shiny, reddit AI, starting a full year after everyone else already released their products.
He's late yet again, and he's proven repeatedly that he's failed to understand reddit's greatest strengths and value. This "reddit AI content" and the IPO are his last chance to get some value out of reddit, and his last chance to make money for nothing, because no one is going to hire him in a leadership role ever again. I just wonder if he's smart enough to understand that, or whether he's just hoping to get enough money to fully build out and stock his personal doomsday bunker.
To have better bots so they can advertise via posts and make it seem like a human recommended it. That's why a lot of people started using Reddit in the first place, myself included. The only time I use it now is when a search result takes me there. I can't remember the last time it was a useful result, though.