That would require an LLM, then. But multiple full walkthroughs are also written out in text on the internet, so how would you be sure it was figuring stuff out by itself?
As I said: I don't think an LLM could do it (since LLMs can't reason). Just saying that it wouldn't have to deduce the mechanics from a single screenshot.
I'm saying that if you're attempting to parse the mechanics of play by shoving in the whole internet and saying "well, the instructions are in there somewhere," then the best tool for that is an LLM.