That's a good point. The same content exists on multiple instances.
I think Lemmy should set a canonical URL in the HTML <head>. The canonical URL of each post should point to the instance where the post originated.
That doesn't seem to be implemented in Lemmy. I also checked Mastodon, and it doesn't have a canonical tag either.
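For anyone who wants to check a page themselves, here is a rough sketch of how you could look for a canonical tag in a page's HTML using only Python's standard library. The `find_canonical` helper, the sample markup, and the post URL are all made up for illustration; this is not how Lemmy or Mastodon actually render their pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "canonical nofollow"
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical copy of a post as served by a second instance,
# pointing back to the (made-up) URL on the originating instance.
FEDERATED_COPY = """
<html><head>
  <title>Some post</title>
  <link rel="canonical" href="https://lemmy.ml/post/12345">
</head><body>...</body></html>
"""

# The same page without the tag, which is what I saw when I checked.
NO_CANONICAL = "<html><head><title>Some post</title></head><body>...</body></html>"

print(find_canonical(FEDERATED_COPY))
print(find_canonical(NO_CANONICAL))
```

If instances emitted a tag like the one in the first sample, a search engine could treat the federated copies as duplicates of the original rather than as separate pages.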
You can already search Lemmy posts using Google. Each instance is indexed as a separate site, so you may have to use "site:lemmy.ml" or "site:beehaw.org" in order to find a post. I do wonder if major search engines will try to handle federation more comprehensively in the future, though.
Yes, actually it's already getting indexed. For example, you can try searching for site:lemmy.ml on DDG or Google. It'll probably take a while, though, before search engines deem Lemmy instances "popular enough" for posts to show up for regular search queries (assuming that'll even happen at all).
Yes, Lemmy posts can be indexed and found, but there are disadvantages compared to big, centralized services. I just found some posts on page 3 of the Ecosia results.
I'm not sure if posts from instances without 'lemmy' in their name would show up when somebody searches for "something lemmy".
Legitimate search engines will index everything, except what's disallowed. Of course, the robots.txt could be changed to block all indexing by legitimate search engines.
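To make the robots.txt point concrete, here is a small sketch using Python's built-in `urllib.robotparser` to see what a well-behaved crawler would conclude from a given file. The two rule sets and the post URL are hypothetical, not copied from any real instance, and `can_crawl` is just a helper name I picked.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical permissive rules: only a couple of paths are disallowed.
DEFAULT_RULES = """\
User-agent: *
Disallow: /login
Disallow: /create_post
"""

# Hypothetical rules an admin could switch to in order to block all indexing.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

post = "https://lemmy.ml/post/12345"  # made-up post URL

print(can_crawl(DEFAULT_RULES, "Googlebot", post))
print(can_crawl(BLOCK_ALL, "Googlebot", post))
```

With the first rule set the post is crawlable; with the second, a crawler that respects robots.txt would skip the whole site. Crawlers that ignore robots.txt are not affected either way.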