One of the side effects of the Reddit meltdown is that many search results became unavailable because communities went private. It would be great if we could fill the void with Lemmy content instead.
I heard that Reddit has a dedicated CDN each for Microsoft and Google scraping. That's why they work so well for searching Reddit posts. It will probably take some effort to feed data that well from the fediverse.
On that note, perhaps we should have a per-community as well as a per-post scrape/noscrape toggle. It might be difficult to get buy-in from all parties.
Whether a community gets to opt out of being scraped depends on the scraper respecting robots.txt and/or the meta tag of the page.
Not all do, particularly the ones scraping for SEO purposes, so instances might need to add IP bans for scrapers that refuse to respect restrictions in those places.
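To make the robots.txt part concrete, here is a rough sketch of how a well-behaved scraper would check whether it may fetch a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the community name `private_club`, the post ID and the `ExampleBot` user agent are all made up for illustration; I'm only assuming the usual `/c/<community>` and `/post/<id>` URL layout, and this is not what any instance actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance could serve to opt one community
# and one post out of scraping. Paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /c/private_club
Disallow: /post/12345
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_scrape(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """Return True if the rules allow this user agent to fetch the path."""
    return parser.can_fetch(user_agent, path)


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for path in ("/c/private_club", "/c/open_community", "/post/12345"):
        print(path, may_scrape(parser, "ExampleBot", path))
```

Note that this only works if the scraper bothers to run a check like this at all, which is exactly why the IP bans would still be needed for the ones that don't.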
I've just tried a quick test using some popular queries and it looks as though communities are indexed but individual posts aren't? I agree, it would be nice to replace Reddit in this regard.
Maybe the above is only a temporary measure to help keep server load down?
Some Google searches already give me Lemmy posts, so it seems to work. I think indexing Lemmy posts takes more time, as I couldn't yet find my 'blog article' about hosting Lemmy on a Raspberry Pi, or the community where it was posted, through Google. But I was able to find older communities on Feddit.nl, so most of the posts probably can't be found yet simply because they are too new.