I accidentally got Dall-E to be racist. I'm on a forum where we do endless AI Godzilla pictures (don't ask) and I did "Godzilla crosses the border illegally" hoping for some sort of spy thing. Instead, I got Godzilla in a sombrero by a Trump-like border wall.
I've been encountering this! I thought it was the topics I was using as prompts somehow being bad -- it was making some of my podcast sketches look stupidly racist, admittedly though some of them it seemed to style after some not-so-savoury podcasters, which made things worse.
"My sketches" as in "me using the AI software to draw pictures". It's not my podcast, I was trying to guess at what the presenters looked like based off the topics they discuss.
Nah, AI has just gotten really good at hands. They can still be a bit wonky, but I've seen images, hands and everything, that'd probably fool anyone who doesn't go over the picture with a magnifying glass.
That explanation makes no fucking sense and makes them look like they know fuck all about AI training.
The output keywords have nothing to do with the training data. If the model in use has fuck all BME training data, it will struggle to draw a BME regardless of what key words are used.
And any AI person training their algorithms on AI generated data is liable to get fired. That is a big no-no. Not only does it not provide any new information from the data, it also amplifies the mistakes made by the AI.
They are not talking about the training process, to combat racial bias on the training process, they insert words on the prompt, like for example "racially ambiguous". For some reason, this time the AI weighted the inserted promt too much that it made Homer from the Caribbean.
They literally say they do this "to combat the racial bias in its training data"
to combat racial bias on the training process, they insert words on the prompt, like for example “racially ambiguous”.
And like I said, this makes no fucking sense.
If your training processes, specifically your training data, has biases, inserting key words does not fix that issue. It literally does nothing to actually combat it. It might hide issues if the data model has sufficient training to do the job with the inserted key words, but that is not a fix, nor combating the issue. It is a cheap hack that does not address the underlying training issues.
There are 2 problems with not having enough diversity in training data:
The AI will be worse at depicting diversity when prompted, eg. If the AI hasn't seen enough pictures of black people it may not be able to depict black hair properly as it doesn't "know what it looks like"
The AI will not show as much diversity when not prompted. The AI is working off statistics so if you tell it to depict a person and most of the people it's "seen" are white men it will almost always depict a white man because that's statistically what a person is according to its data.
This method combats the second problem, but not the first. The first can mostly be solved by generally scaling the training data though, which is mostly what these companies have been doing. Even if only 1% of your images are of POC, if you have 1b images 10mil will be of POC which may be enough to train it. The second problem would remain unsolved though since the AI will always go with the statistically safe 99%.
any AI person training their algorithms on AI generated data is liable to get fired
though this isn't pertinent to the post in question, training AI (and by AI I presume you mean neural networks, since there's a fairly important distinction) on AI-generated data is absolutely a part of machine learning.
some of the most famous neural networks out there are trained on data that they've generated themselves -> e.g., AlphaGo Zero
They could try to compensate the imbalance by explicitly asking for the lesser represented classes in the data... It's an idea, not quite bad but not quite good either because of the problems you mentioned.