It's a fundamental misunderstanding of how you automatically have copyright on any written work you produce, and how it's unclear whether any sort of licensing even applies to training data in the US.
Do I have to register with your office to be protected?
No. In general, registration is voluntary. Copyright exists from the moment the work is created. You will have to register, however, if you wish to bring a lawsuit for infringement of a U.S. work. See Circular 1, Copyright Basics, section “Copyright Registration.”
Yes, so you have copyright when you make the work. I have copyright on this comment just for having written it. Pasting a CC notice would give me less control over the use of this comment, not more. Regardless, I doubt anyone is planning on suing a multi-billion dollar business over their comments on social media being used as training data.
Yes. However, whether or not it has protection under copyright is not always clear. Likely your comment is too short and simple to be protected. But if it can't be protected, claiming to grant a license to that work doesn't change that.
Basically, by adding this note they are effectively granting a license to the work. There is no situation in which granting a license can restrict how a work is used beyond the default of granting no license at all (which is effectively maximum protection).
Why wouldn't it be? It's just as much a textual medium as a PDF, or a book, for that matter. Hell, any file on a computer can be read as characters. I could type Homer's Odyssey in a series of comments, or the source code to DOOM, or the color values of every pixel of every frame of a video I took of my friend chasing a duck.
Because certain types of text aren't actually copyrightable. You can't copyright a fact, for one.
Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed.
For some of those (ideas, systems, or methods of operation) you need a patent. Even with the copyright clause in a comment, it might not be valid. At least concerning US laws.
What about when people paste copypastas, share memes, or reference age-old arguments? The very culture of internet comments seems to be opposed to copyright.
Pasting a copypasta is probably actually copyright infringement. Same with memes.
The thing about copyright is that it really only matters if you choose to enforce your protection. Presumably the owners of the copypasta don't care enough and the owners of the memes think it brings more popularity to the movie than any licensing costs they could possibly gain from selling the stills.
(Some memes may be considered transformative enough to be fair use, but some of them almost certainly are not.)
Video game streaming is a clear example of this. Live-streaming or posting full gameplay videos almost certainly infringes the game owner's copyright. The work is often commercial, is often a replacement for the original (at least for some people), and is very rarely transformative. But most game publishers think that it is worth it for the advertising, so they don't enforce their copyright. Many publishers will explicitly grant licenses for streaming their games. A few publishers will enforce their copyright and take down videos, and they are likely well within their rights.
I don't know if I would say the internet is opposed to copyright. I think there is a lot of misunderstanding and a lot of not caring. If the average internet commenter posts a meme, it is of such minuscule cost to the owner of that work that it doesn't make sense to go after them. So it sort of just happens. This makes people think that it is allowed, even if it probably isn't. Most people would probably also agree that this is morally ok. But I don't think that means that they are against copyright in general. I think if you asked most people, "Should I be allowed to download a CGP Grey video and reupload it for my own profit?" they would say no. Probably similar for "Should I be allowed to sell cracked copies of Celeste for half price?"
As long as you aren't committing copyright infringement by using a meme you don't have the rights to, and you otherwise meet the standard of having a "modicum of creativity," I don't see why you wouldn't have copyright on it. That being said, there are few goals more futile than trying to remove something from the internet for copyright infringement.
That's why the DMCA exists. For the most part, the Internet is full of copyright infringement that is simply never acted upon. DMCA shit makes it so the posters and the host aren't liable for infringement, so long as they comply with official takedowns within X period of time (with a chance to appeal).
It’s a fundamental misunderstanding of how you automatically have copyright on any written work you produce, and how it’s unclear whether any sort of licensing even applies to training data in the US.
For what it's worth, I do understand copyright, and how it works. Part of my including the link is for the future's sake, as I know that right now, as we speak/type, Congress is getting lobbied for new laws on who owns the content that AI models are being trained from, and who has to pay who for the privilege of using that data to do so.
Congress is getting lobbied for new laws on who owns the content that AI models are being trained from
Training AI from something definitely can't change who owns that thing. This is ridiculous, and I'm pretty sure it isn't being considered.
If I let AI watch Frozen, does that change who owns it? No, Disney still does.
who has to pay who for the privilege of using that data
IIUC, most of the laws talk about whether AI training is "fair use". If it is fair use, copyright protections don't apply. But granting a license to your work won't change that.
The only thing I could see potentially being done would be changing the default copyright protections to allow a revocable default grant for AI training. But it isn't even clear if granting a new license would implicitly revoke that default grant. It also seems unlikely that this is the way the law would work.