How does having multiple lemmy servers spread the load?
Basically, I have read several statements addressing this topic. For example:
"If my server gets too big I will just close registrations"
"Server X got too big, so they closed registrations to manage the load"
While I do understand that this can help small servers that don't have a large number of external users, how does this help big and popular servers? Don't they have to serve requests from external users using their own resources? For example, I might self host a server just for my account but I read all my content from lemmy.world. Am I not using their bandwidth and their resources anyway?
Bonus question: Does federating with other servers increase the resource usage of my server? What kind of metadata/data do I have to store from each server I federate with?
The way I understand it, database/IO calls are heavy and network calls are relatively light. A user on the instance itself means database/IO work for every request, while a federated server means just one database call and a bunch of network calls. Since it's a push model, the instance only has to retrieve the data from its database once and then pushes it to all subscribed instances.
Up until early July, Lemmy was damned if you do, damned if you don't. Federation had massive performance overhead due to some bugs, and each additional instance that went online and subscribed to the big 4 popular servers was causing an even worse load problem than if, say, 30 users had joined directly. This was especially true of instances that wanted a fully populated All listing, since that meant every single thing was being sent to the server even if nobody was really reading it.
And things like searching for topic content are going to be pretty limited given these newer servers don't have much history.
The aftermath of this attempt to scale is that there is also likely a lot of duplicate data: conversations that are mostly repetitive and posts on the same topics. Not to mention the bugs Lemmy has federating deletes and moderation removals, which don't impact direct users on the main servers as much.
It reduces the load simply because your instance handles most of the traffic, particularly compute/database. Currently media is still served by the instance of the poster, but there's talk of also proxying and caching it locally, and CDNs like CloudFront/Cloudflare can help a lot with that.
So let's say we have servers A and B, each with a thousand users, totalling 2000 users. For the most part, A and B only have to handle their local thousand users, plus some extra traffic between them for federation. And assuming the users use communities on both instances roughly equally, the load of hosting pictures is also spread out between the two instances.
Federating with other servers does add some load (on theirs as well), because your instance is effectively ingesting all the data from the remote communities that your users have subscribed to. But ingesting that once is still much less demanding than thousands of users all requesting the same data. Your instance acts as a cache layer.
ActivityPub is also a push model. Remote instances push content to your instance, you don't pull from them.
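To put rough numbers on the "cache layer" point, here's a toy count, not a measurement. The function names and the figures (1000 users, 50 views each, 200 activities) are made up for illustration; it only compares how many requests server B would have to answer in each case.

```python
def requests_to_b_without_federation(remote_users: int, views_per_user: int) -> int:
    # Every remote user would browse B directly, so B answers every page view.
    return remote_users * views_per_user


def pushes_from_b_with_federation(activities: int, subscribed_instances: int) -> int:
    # B pushes each post/comment/vote once per subscribed instance;
    # the page views are then answered by the users' home instance.
    return activities * subscribed_instances


# Hypothetical numbers: 1000 users on A, each viewing 50 pages of B's content,
# versus 200 activities pushed once to A.
direct = requests_to_b_without_federation(remote_users=1000, views_per_user=50)
federated = pushes_from_b_with_federation(activities=200, subscribed_instances=1)
print(direct, federated)  # 50000 200
```

The exact ratio depends entirely on how active the community is versus how much people browse, so treat this as a way to think about it rather than a prediction.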
This means that if user 1 from server A requests a post from server B, server A will cache that post. Then, if user 2 from server A wants to see the same post, they get the cached version instead of the remote instance pushing it to server A? Is this cache eternal (i.e., it is never deleted from server A), or is that something the spec doesn't address and it is up to each server owner?
It works a little differently to that. When someone posts on server B, that post and its comments get blasted out to all subscribed servers. So server A will already have the post cached if someone is subscribed to that community. The cache in server A will update any time activity happens on server B.
It's eternal yes, unless the admin manually purges it.
I also said cache for the sake of simplicity, it's technically not a cache. Every instance gets activity pushed to them pretty much in realtime, and stores a copy of everything. Posts, comments, votes, even moderation actions. So it's more like a massively distributed multi-primary eventually-maybe-consistent database than a cache.
Apart from the initial preview that fetches the last 20 posts and no comments, everything is populated purely through ActivityPub messages being pushed to every subscribed instance, in mostly realtime.
So users 1 and 2 never ask A to go get a post from B. They simply request a post that's already on A, a copy that's been pushed by B and may have been published by C. B is only involved if a user from A comments on the post; then A will push that comment to B, which will then push it to C and D and others.
So 10,000 users viewing a post on A is entirely handled by A, and 20,000 users on C viewing the same post is entirely handled by C. B could have zero users and it would still work perfectly. Similarly, A could have zero communities and rely entirely on B to manage the communities. B would have very little work to do despite having a total of 30,000 users viewing its posts. In fact, B could even go down and A and C would still serve the post and even take comments and votes. Those would just be synchronized when B comes back up, and in the meantime A and C would temporarily have slightly different views of the same post.
So the more instances, the more distributed everything is. And that's why an instance that becomes too large can simply shut down registrations or even kick its users out. It could become B in this example.
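The A/B/C scenario above can be sketched as a toy simulation. This is not Lemmy's code and makes no ActivityPub calls. The `Instance` class, its methods and the retry queue are all hypothetical, and the fan-out is simplified to writing straight into each follower's store (it ignores followers that are down).

```python
from collections import deque


class Instance:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = []        # local copy of every activity this instance has seen
        self.followers = []    # instances subscribed to a community hosted here
        self.outbox = deque()  # (target, activity) pairs waiting for a target that is down

    def receive(self, activity):
        if activity not in self.store:
            self.store.append(activity)
        # The community host fans each activity out to all subscribed instances.
        for follower in self.followers:
            if activity not in follower.store:
                follower.store.append(activity)

    def send(self, target, activity):
        if target.up:
            target.receive(activity)
        else:
            self.outbox.append((target, activity))  # keep it until the host is back

    def publish(self, host, activity):
        self.store.append(activity)  # local users see it right away
        self.send(host, activity)

    def retry(self):
        pending, self.outbox = self.outbox, deque()
        for target, activity in pending:
            self.send(target, activity)


a, b, c = Instance("A"), Instance("B"), Instance("C")
b.followers = [a, c]  # B hosts the community, A and C subscribe

c.publish(b, "post by a C user")  # C -> B -> A
b.up = False
a.publish(b, "comment by an A user")  # B is down, so the comment is queued on A
during_outage = ("comment by an A user" in a.store, "comment by an A user" in c.store)

b.up = True
a.retry()  # A -> B -> C once B is reachable again
print(during_outage, a.store == c.store)  # (True, False) True
```

While B is down, A's users already see the comment and C's users don't, which is the "slightly different view" mentioned above. Once B is back, the copies converge.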
Serve browse traffic: This is what you're familiar with, when you view your post feed or a single post, the server has to fetch those posts or comments from its database and send them to you. The resources required to do so depend on the total number of browse requests the server handles... roughly num_users * num_feed_refreshes_or_post_views_per_user_per_minute. If a server has a lot of users that view a lot of stuff, splitting some of them off to a second server (or just stopping signups) will help.
Federated replication: This is what copies posts and comments from the server that hosts the community to the server that hosts your account, and what enables your account server to bear the browse load for communities hosted on this server. The resources required to do this work are roughly proportional to the total number of federation messages sent, or number_of_federated_peer_servers * number_of_subscribed_communities_per_server * number_of_posts_comments_votes_edits_etc_per_community.
What you may see here is that federation replication workload scales with the number of instances in the threadiverse and browse workload scales with the number of users per instance. This leads to a Goldilocks problem. Ideally, you want a medium number of servers that each have a medium number of subscribers. Obviously no real world network scales in this ideal way, but some guidelines emerge:
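The two rough estimates above can be written down as plain functions. This is just the arithmetic from the two descriptions, with shortened parameter names. The example numbers are invented, and real load also depends on things these formulas ignore (message size, retries, database indexes and so on).

```python
def browse_load(num_users: int, views_per_user_per_minute: int) -> int:
    # Feed refreshes and post views the server has to answer per minute.
    return num_users * views_per_user_per_minute


def federation_load(peer_servers: int,
                    subscribed_communities_per_server: int,
                    activities_per_community: int) -> int:
    # Federation messages sent: one per peer, per subscribed community, per activity.
    return peer_servers * subscribed_communities_per_server * activities_per_community


# Splitting 10,000 users across two servers halves each server's browse load...
print(browse_load(10_000, 2), browse_load(5_000, 2))  # 20000 10000

# ...while doubling the number of subscribed peers doubles the federation messages.
print(federation_load(100, 20, 5), federation_load(200, 20, 5))  # 10000 20000
```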
Single user instances are probably only a net win if the user is very active. If you read every post your instance subscribes to then maybe your browse load is bigger than your instance's federation load... but if you log in once a month and view 1% of the posts replicated to your instance... it's still generating federation workload while you're asleep for posts you'll never read.
Single-user instances using scripts that mass subscribe to thousands of communities, while they make your All feed lively... make you a pretty terrible fediverse citizen. Your instance is now generating the federation load of a 5k user instance to copy posts and comments you'll never read. BTW, your instance publicly serves copies of all the posts you subscribe to. So if one of these scripts subs porn, piracy, or hate speech communities on poorly admin'ed instances, it may be creating legal liability for you depending on your jurisdiction. Also, federated replication is pretty broky right now: https://github.com/LemmyNet/lemmy/issues/3101 (this recently got marked resolved but I continue to see replication issues daily and I expect similar but perhaps more targeted follow-ups to be filed soon)
Having an account on a Very Big Instance like lemmy.world or lemmy.ml is a bit of a personal risk. Those instances will always find the limits of both browse and federation scaling first because they have lots of active users and also lots of active communities that are widely subscribed by other instances. This will make them a bit unreliable as they're at the tip of the efforts to fix scaling constraints.
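For the single-user-instance guideline, a quick way to see the trade-off is to ask what share of the replicated activity is ever actually read. A minimal sketch, with a made-up function name and made-up numbers:

```python
def wasted_fraction(activities_replicated: int, activities_viewed: int) -> float:
    """Share of federation traffic that was copied to the instance but never read."""
    if activities_replicated == 0:
        return 0.0
    return (activities_replicated - activities_viewed) / activities_replicated


# Someone who reads everything their instance subscribes to wastes nothing...
heavy_reader = wasted_fraction(activities_replicated=1000, activities_viewed=1000)
# ...while someone who views 1% of it caused 99% of that replication for no benefit.
rare_visitor = wasted_fraction(activities_replicated=1000, activities_viewed=10)
print(heavy_reader, rare_visitor)  # 0.0 0.99
```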
For example, I might self host a server just for my account but I read all my content from lemmy.world. Am I not using their bandwidth and their resources anyway?
Well, it'd use your CPU to generate the webpages that you view. But, yeah, it'd need to transfer anything that you subscribe to to your system via federation (though the federation stuff may be "lower priority" -- I don't know how lemmy and kbin deal with transferring data to federated servers rather than requests from users directly browsing them at the moment, but at least in theory, serving the user browsing directly has to have a higher priority to be usable).
But what would be more ideal -- and people are going to have to find out what the scaling issues are with hard measurements, but this is probably a pretty reasonable guess -- is to have a number of instances, with multiple users on each. Then, once lemmy.world transfers a given post or comment once via federation, that other instance stores it and can serve up the webpages and content to all of the users registered on that other instance.
If you spread out the communities, too, then it also spreads out the bandwidth required to propagate each post.
As it stands, at least on kbin (and I assume lemmy), images don't transfer via federation, though, so they're an exception -- if you're attaching a bunch of images to your comments, only one instance is serving them. My guess is that that may wind up producing scaling problems too, and I am not at all sure that all lemmy or kbin servers are going to be able to do image-hosting, at least in this fashion.
Probably the same way a hybrid gas-electric car is more fuel efficient. In a hybrid, the battery revs up and down with need while the engine just powers the battery at a steady clip. Since the engine can run constantly at a fixed optimum speed, it is more fuel efficient.
Likewise, I figure, each server has a certain amount of bandwidth. If everyone is on one server, all the posts and comments come at random intervals with spikes and troughs. Either the bandwidth gets throttled, which causes lag, or all the comments go through at the same time, which uses a lot of bandwidth. With multiple servers, those posts get federated and (probably, I'm guessing at this point) wait for the federated server to signal that it is no longer busy, which flattens the bandwidth demand.