Key word is "some". And it's not someone, it's a whole group of people. I was part of it.
There's still a lot missing. The archive only exists because, when Geocities was going to get deleted, a large group of web scrapers pulled together to grab as much as possible.
Then we manually combined all of the crawls to try to piece together this internet history.
Things not crawled are gone. Things Geocities had already deleted before the announcement are gone.
The worst part is that Yahoo gave a heads-up. Most hosting companies don't. And many delete behind the scenes, like Photobucket if you don't log in for a long time, or forums when they stop paying the bills.
I found everything I recalled posting to the page and no broken links, but my Geocities page was pretty flat, about two levels: index.html and the files linked from there.
I think the "under construction" and web ring links were broken.