Because AI bots ignore robots.txt (especially when you use a * wildcard rather than naming their user-agent explicitly), more and more people are implementing exactly that, and I wouldn't be surprised if that's what triggered the need to implement robots.txt support in FediDB.
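For what it's worth, the wildcard fallback is easy to see with Python's stdlib robots.txt parser; a quick sketch (the bot name is made up):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks everything via the * wildcard.
rules = """
User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A compliant crawler whose user-agent isn't named falls back to
# the * group; a badly behaved one simply never checks at all.
print(rp.can_fetch("SomeAIBot", "https://example.com/users"))  # False
```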
It is not possible to detect bots. Attempting to do so will invariably lead to false positives that deny access to your content for the people who are usually the most at-risk and marginalized.
Just implement a cache and forget about it. If read-only content is causing you too much load, you're doing something terribly wrong.
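Even a short TTL on read-only responses goes a long way. A minimal sketch (fetch_stats is a made-up stand-in for whatever expensive query you're serving):

```python
import time

_cache: dict[str, tuple[float, object]] = {}
TTL = 300  # seconds; tune to how stale your stats can be

def fetch_stats() -> dict:
    # Imagine a slow database query here.
    return {"users": 12345}

def cached(key: str, compute):
    now = time.time()
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < TTL:
        return hit[1]  # served from memory
    value = compute()
    _cache[key] = (now, value)
    return value

print(cached("instance_stats", fetch_stats))  # computed once
print(cached("instance_stats", fetch_stats))  # cache hit, no query
```

Within the TTL, even an aggressive crawler only triggers one real lookup.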
While I agree with you, the quantity of robots has greatly increased of late. While still not as numerous as users, they hit every link and wreck your caches because they don't focus on hotspots the way humans do.
This looks more accurate than FediDB, TBH: the initial surge from Reddit back in 2023, the slow decline of active members. I personally think the user count drops so much because certain instances turn off the ability for outside crawlers to get their user info.
No idea honestly. If anyone knows, let us know!
I don't think it's necessarily a bad thing. If their crawler was being too aggressive, it could accidentally DDoS smaller servers. I'm hoping that's what they're fixing, and that they're respecting the robots.txt that some sites have.
GoToSocial has a setting in development that is designed to baffle bots that don't respect robots.txt. FediDB didn't know about that feature and thought GoToSocial was trying to inflate its stats.
In the arguments that went back and forth between the devs of the apps involved, it turned out that FediDB was ignoring robots.txt, i.e., it was badly behaved.
Maybe the definition of the term "crawler" has changed, but crawling used to mean downloading a web page, parsing the links, then downloading all of those links, parsing those pages, and so on until the whole site has been downloaded. If links to other sites were found in that corpus, the same process repeated for them. Obviously this could cause heavy load, hence robots.txt.
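Roughly, in sketch form (fetch and extract_links are hypothetical stand-ins for an HTTP client and an HTML link parser):

```python
from collections import deque
from urllib.parse import urljoin

def crawl(start_url: str, fetch, extract_links):
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        page = fetch(url)                 # download the page
        for link in extract_links(page):  # parse out every link
            absolute = urljoin(url, link)
            if absolute not in seen:      # ...and queue the new ones
                seen.add(absolute)
                queue.append(absolute)
```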
FediDB isn't doing anything like that, so I'm a bit bemused by this whole thing.