Analysis shows that indiscriminately training generative artificial intelligence on a mix of real and generated content, usually done by scraping data from the Internet, can lead to a collapse in the models' ability to generate diverse, high-quality output.
This has been obvious for a while to those of us using GitHub Copilot for programming. Start a function, then just keep hitting Tab to let it autotype based on what it already wrote. It quickly devolves into strange, random bullshit. You gotta babysit it.
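The mechanism is easy to see in miniature. Here's a toy sketch (mine, not the paper's code, and much simpler than an actual language model): fit a Gaussian to some data, draw the next "training set" entirely from that fit, refit, and repeat. Because each generation trains only on the previous generation's samples, the fitted spread tends to drift toward zero, so each generation's output is less diverse than the last.

```python
import random
import statistics

# Toy model-collapse loop: each generation is "trained" (a Gaussian fit)
# only on samples generated by the previous generation. N and the number
# of generations are arbitrary choices for illustration.

random.seed(0)

N = 10  # small training sets make the drift visible quickly
data = [random.gauss(0.0, 1.0) for _ in range(N)]  # real data: N(0, 1)

for generation in range(1, 51):
    mu = statistics.mean(data)     # "train": estimate mean and spread
    sigma = statistics.stdev(data)
    # "generate": the next generation sees only synthetic samples
    data = [random.gauss(mu, sigma) for _ in range(N)]
    if generation % 10 == 0:
        print(f"gen {generation:2d}: mean={mu:+.3f} std={sigma:.3f}")
```

Run it and watch the printed std: the fit never recovers information it didn't sample, so estimation noise compounds and the distribution narrows generation after generation. That's the collapse in diversity, in the smallest setting I could think of.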