Thanks to Samantha Cole at 404 Media, we are now aware that Automattic plans to sell user data from Tumblr and WordPress.com (which is the host for my blog) for “AI” products. In respon…
They understand consent just fine. It's just well within their best interests to pretend they don't until the user base is motivated enough to bring financial or legal repercussions. Years of Facebook abuse have shown that anything levied will be "cost of doing business" at best, though.
The worst part is, the data has been scraped already, regardless of any opt-out, and there is no explicit confirmation that it hasn't been shared with Midjourney already. The wording on the staff post is... Very Vague about that.
Because if that's the case, no amount of opting out will change anything. Tumblr says they'll notify Midjourney if some data is now opted out, but come on, I have absolutely no reason to believe that Midjourney will do anything about it. They don't care; they already have the data.
So I guess there are two paths to training data: some company selling it explicitly, and AI companies just scraping publicly accessible data. Not that either is "good", but at least with public data, you only have the AI company profiting.
Yep. That's why the first two things I say Automattic MUST do to make things right are about proper consent controls for Automattic's use of data and its sale to AI vendors, and the third is a proposed proactive defense against scrapers.