Silero is an open source text-to-speech (TTS) model that can run on your own hardware. It actually does quite well running on just a CPU, even an older one, as long as you have the RAM to load the model in the first place. The documentation kinda sucks for doing anything more advanced, and there are some issues with generating extremely long output, like reading whole books.
Silero is a Russian company. They do both TTS and STT (speech-to-text), although their STT requires advanced fine-tuning, and there are newer and more advanced options from others.
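For the long-text problem, one workaround is to split the text into sentence-sized chunks and synthesize them one at a time. Here is a rough sketch of that idea, not anything from Silero's docs. The 800-character limit, the `split_into_chunks` and `synthesize_book` names, and the `xenia` speaker choice are my own assumptions; the `torch.hub.load` and `apply_tts` calls follow the silero-models README as I remember it, so check them against the current version before relying on this.

```python
import re


def split_into_chunks(text, max_chars=800):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    max_chars=800 is an assumed safe size, not a documented Silero limit.
    """
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit gets hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_book(text, speaker="xenia", sample_rate=48000):
    """Hypothetical helper: run Silero TTS chunk by chunk on the CPU."""
    import torch  # imported here so the chunker above works without PyTorch

    # Model and argument names as recalled from the silero-models README
    # (assumption: the v4_ru model with the 'xenia' voice).
    model, _ = torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_tts",
        language="ru",
        speaker="v4_ru",
    )
    model.to(torch.device("cpu"))
    pieces = [
        model.apply_tts(text=chunk, speaker=speaker, sample_rate=sample_rate)
        for chunk in split_into_chunks(text)
    ]
    if not pieces:
        return torch.zeros(0)
    # Each piece is a 1-D audio tensor; join them end to end.
    return torch.cat(pieces)
```

Breaking at sentence ends keeps the intonation of each chunk natural, since the model never sees half a sentence unless a single sentence is longer than the limit.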
Hugging Face is like the GitHub of open source AI tools that you can run offline. They host example instances of many models. Unless you make an account with HF, you can only use them anonymously for something like 2 queries per day. This is an instance of Silero TTS set up with Russian speakers:
I hosted a Russian exchange student, and had been casually learning for a long time before that, and what I've learned about speaking Russian is that it's a language that isn't deeply concerned with pronouncing every syllable clearly. That surprised me, because Russian accents and speakers were always portrayed on TV as this heavy, lumbering, formidable-sounding language. Go cold war, huh? Seems to me that you really have to worry about getting the beginning and end right and just kind of glide over the stuff in the middle. It's something we had to teach our exchange student to stop doing when he spoke English; when he would talk in longer sentences, he'd go fast and mumble, and I've more or less figured that that's what you do in Russian.
So, I don't think it's Guh-Dyeh as much as it's [start with tongue in the hard G position]-Dyeh.