That said, I can't say that I mind LLMs using copyrighted materials that they access legally/appropriately (lots of copyrighted content may be freely available to some extent, like news articles or song lyrics).
I'm open to arguments correcting me. I'd prefer to have another reason to be against this technology, rather than be arguing on the side of frauds like Sam Altman. Here's my take:
All content created by humans follows consumption of other content. If I read lots of Vonnegut, I should be able to churn out prose that roughly (or precisely) includes his idiosyncrasies as a writer. We read more than one author; we read dozens or hundreds over our lifetimes. Likewise musicians, film directors, etc etc.
If an LLM consumes the same copyrighted content and learns how to copy its various characteristics, how is it meaningfully different from me doing it and becoming a successful writer?
That is the trillion-dollar question, isn’t it?
I’ve got two thoughts to frame the question, but I won’t give an answer.
Laws are just social constructs, to help people get along with each other. They’re not supposed to be grand universal moral frameworks, or coherent/consistent philosophies. They’re always full of contradictions. So… does it even matter if it’s “meaningfully” different or not, if it’s socially useful to treat it as different (or not)?
We’ve seen with digital locks, gig work, algorithmic market manipulation, and playing either side of Section 230 when convenient… that the ethos of big tech is pretty much “define what’s illegal, so I can colonize the precise border of illegality, to a fractal level of granularity”. I’m not super stoked to come up with an objective quantitative framework for them to follow, cuz I know they’ll just flow around it like water and continue to find ways to do antisocial shit in ways that technically follow the rules.
Yup. Violating IP licenses is a great reason to prevent it. According to current law, if they get a license for the book they should be able to use it how they want. I'm not permitted to pirate a book just because I only intend to read it and then give it back. AI shouldn't be able to either if people can't.
Beyond that, we need to accept that we might need to come up with new rules for new technology. There are a lot of people, notably artists, who object to art they put on their website being used for training. Under current law if you make it publicly available, people can download it and use it on their computer as long as they don't distribute it. That current law allows something we don't want doesn't mean we need to find a way to interpret current law as not allowing it, it just means we need new laws that say "fair use for people is not the same as fair use for AI training".
and learns how to copy its various characteristics
Because you are a human. Not an immortal corporation.
I am tired of people trying to have iNtElLeCtUaL dIsCuSsIoN about/with entities that would feed you feet first into a wood chipper if they thought they could profit from it.
You can sue for anything in the USA. But it is pretty much impossible to successfully sue for "ripping off someone's style". Where do you even begin to define a writing style?