AI's reliance on synthetic data can create "intersectional hallucinations," leading to unrealistic and potentially dangerous outcomes. These errors highlight the risks of AI misinterpreting complex human data relationships.
However, training these AI models needs a LOT of data, so much that some of it has to be synthetic: not real data from real people, but generated data that reproduces existing patterns. Most synthetic datasets are themselves generated by machine learning models.
Maybe having a blind man teach another blind man how to see based on how he imagines seeing works is a recipe for disaster...
Turns out analogies are not the actual thing they're analogizing, though. Synthetic data - when properly created and curated - has proven to be very useful and effective in training AI.
Except when it's not, or else this article would have no point.
ETA: also, it seems like a terrible idea to train science models on data you essentially invented. Science works because it follows the extant evidence, not the other way around.