Skip Navigation

Lemmy PERFORMANCE CRISIS: popular instances with many remote instances following big communities could stop federating outbound for Votes on comments/posts - code path identified

I spent several hours tracing in production (updating the code a dozen times with extra logging) to identify the actual path the lemmy_server code uses for outbound federation of votes to subscribed servers.

Major popular servers, Beehaw, Leemy.world, Lemmy.ml - have a large number of instance servers subscribing to their communities to get copies of every post/comment. Comment votes/likes are the most common activity, and it is proposed that during the PERFORMANCE CRISIS that outbound vote/like sharing be turned off by these overwhelmed servers.

pull request for draft:

https://github.com/LemmyNet/lemmy/compare/main...RocketDerp:lemmy_comment_votes_nofed1:no_federation_of_votes_outbound0

EDIT: LEMMY_SKIP_FEDERATE_VOTES environment variable

15
15 comments
  • Thanks for doing all this.

    Do we have any real numbers from a real server? How many votes are trying to be federated to how many servers?

    Just ballparking some approximate numbers:

    • [email protected]
    • 15k subscribers
    • 4000 subscribed servers
    • 10 votes per subscriber per day

    15000 * 4000 * 10 = 600,000,000 federated actions. That is around 7,000 per second 24/7 for one community.

    IMO, this real time federation just doesn't scale. We need to start planning the specs for federation batching.

    • I'm hoping the 'subscribed servers' is maybe only 300 or so? But I don't know, the big sites haven't been sharing information like that in my experience. They did say there were "millions" of outbound federation tasks. I expect the number of votes by user is higher than your number. They did put in code changes to detect servers they can't reach and to stop attempting delivery.

      We need to start planning the specs for federation batching.

      I think a pull app that goes around to servers with content and uses the front-end API to grab 300 or more comments at a time, etc is the way to go. The client API is geared toward batch delivery. since lemmy.ml is so unstable for discussion, I opened a topic on GitHub: https://github.com/RocketDerp/lemmy_helper/discussions/4 - where I proposed some new /api/syncshare to get more raw data out of the PostgreSQL tables.

  • Part of what makes Lemmy (and other voting link aggregators) work is the voting aspect. By taking away outbound vote federation, it forces further consolidation into these popular instances. Thereby further exacerbate the problem because now they’re even more consolidated and the posts and comments eventually becomes the bottleneck for the exact same underlying chatty protocol. For this reason, I’d be vehemently against this change without a pairing PR that allows this information to be requested via a batch pull that the protocol makes available.

  • prototype pull

    pull request for prototype: https://github.com/LemmyNet/lemmy/pull/3475

    The environment variable LEMMY_SKIP_FEDERATE_VOTES is as good a way as any to reference this code hack.

  • Somewhat related, but why are we federating votes? Why not just federate the upvote count and downvote count? Does each server need to track the identity of every voter on a subscribed community?

    Each server will track votes from their own users, preventing duplicate votes.

    • Why not just federate the upvote count and downvote count?

      I think the answer to that is that it isn't an optimized design.

      Does each server need to track the identity of every voter on a subscribed community?

      I think so. Which isn't a terrible assumption that user who votes will eventually comment/post and that profile will be of use.

  • I have no idea why the file is named "mod.rs", as normal non-admin non-moderator users seem to go through this code path.

      • ok, I figured out how to get Rust to match the enum, is there a way to do this with match instead of if statements?

        +    if let AnnouncableActivities::UndoVote(_) = activity {
        +      warn!("zebratrace310 SKIP UndoVote");
        +    } else if let AnnouncableActivities::Vote(_) = activity {
        +      warn!("zebratrace310A SKIP Vote");
        +    } else {
        +      warn!("zebratrace311 send");
        +      AnnounceActivity::send(activity.clone().try_into()?, community, context).await?;
        +    };
        

        Code seems to work great, blocks UndoVote/Vote but does the send on comment reply.

  • I hacked in a bunch of extra logging to the latest Lemmy code to follow in production how votes flow from local to outbound. I tagged each log with "zebratrace" and these locations proved to be the key points to watch in the system log:

    diff --git a/crates/api/src/comment/like.rs b/crates/api/src/comment/like.rs
    index 6c4bdebc7..063249a7e 100644
    --- a/crates/api/src/comment/like.rs
    +++ b/crates/api/src/comment/like.rs
    @@ -17,6 +17,7 @@ use lemmy_db_schema::{
     };
     use lemmy_db_views::structs::{CommentView, LocalUserView};
     use lemmy_utils::error::LemmyError;
    +use tracing::warn;
     
     #[async_trait::async_trait(?Send)]
     impl Perform for CreateCommentLike {
    @@ -25,6 +26,7 @@ impl Perform for CreateCommentLike {
       #[tracing::instrument(skip(context))]
       async fn perform(&self, context: &Data<LemmyContext>) -> Result<CommentResponse, LemmyError> {
         let data: &CreateCommentLike = self;
    +    // Here we are with that database read on every single like for site data.
         let local_site = LocalSite::read(context.pool()).await?;
         let local_user_view = local_user_view_from_jwt(&data.auth, context).await?;
     
    @@ -62,16 +64,20 @@ impl Perform for CreateCommentLike {
         // Remove any likes first
         let person_id = local_user_view.person.id;
     
    +    // does database transaction, despite next one doing update condition?
         CommentLike::remove(context.pool(), person_id, comment_id).await?;
     
         // Only add the like if the score isnt 0
         let do_add = like_form.score != 0 && (like_form.score == 1 || like_form.score == -1);
         if do_add {
    +      // this does database transaction. it await for result
           CommentLike::like(context.pool(), &like_form)
             .await
             .map_err(|e| LemmyError::from_error_message(e, "couldnt_like_comment"))?;
         }
     
    +    warn!("zebratrace100 vote do_add {:?}", do_add);
    +
         build_comment_response(
           context,
           comment_id,
    diff --git a/crates/apub/src/activities/community/announce.rs b/crates/apub/src/activities/community/announce.rs
    index 116b02726..62af17a1a 100644
    --- a/crates/apub/src/activities/community/announce.rs
    +++ b/crates/apub/src/activities/community/announce.rs
    @@ -24,6 +24,7 @@ use lemmy_api_common::context::LemmyContext;
     use lemmy_utils::error::LemmyError;
     use serde_json::Value;
     use url::Url;
    +use tracing::warn;
     
     #[async_trait::async_trait]
     impl ActivityHandler for RawAnnouncableActivities {
    @@ -93,6 +94,11 @@ impl AnnounceActivity {
       ) -> Result<(), LemmyError> {
         let announce = AnnounceActivity::new(object.clone(), community, context)?;
         let inboxes = community.get_follower_inboxes(context).await?;
    +
    +    // example, local community with remote subscribers generates this: zebratrace500 like activity inboxes [Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("lemmy.ml")), port: None, path: "/inbox", query: None, fragment: None }, Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("lemmy.management")), port: None, path: "/inbox", query: None, fragment: None }, Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("lemmy.austinite.online")), port: None, path: "/inbox", query: None, fragment: None }, Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("lemmywinks.xyz")), port: None, path: "/inbox", query: None, fragment: None }, Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("lemmy.asc6.org")), port: None, path: "/inbox", query: None, fragment: None }]
    +    // example, local community with no remote subscribers, generates this: zebratrace500 like activity inboxes []
    +    warn!("zebratrace500 like activity inboxes {:?}", inboxes);
    +
         send_lemmy_activity(context, announce, community, inboxes.clone(), false).await?;
     
         // Pleroma and Mastodon can't handle activities like Announce/Create/Page. So for
    diff --git a/crates/apub/src/activities/community/mod.rs b/crates/apub/src/activities/community/mod.rs
    index 010bad4f4..c539b10e0 100644
    --- a/crates/apub/src/activities/community/mod.rs
    +++ b/crates/apub/src/activities/community/mod.rs
    @@ -9,6 +9,7 @@ use lemmy_api_common::context::LemmyContext;
     use lemmy_db_schema::source::person::PersonFollower;
     use lemmy_utils::error::LemmyError;
     use url::Url;
    +use tracing::warn;
     
     pub mod announce;
     pub mod collection_add;
    @@ -51,14 +52,33 @@ pub(crate) async fn send_activity_in_community(
         );
       }
     
    +  warn!("zebratrace300 send_activity_in_community vote local? {:?} act {:?}", community.local, activity);
    +
       if community.local {
         // send directly to community followers
         // see comments on GitHub
         AnnounceActivity::send(activity.clone().try_into()?, community, context).await?;
       } else {
         // send to the community, which will then forward to followers
         inboxes.push(community.shared_inbox_or_inbox());
       }
     
    +  // example output, remote homed community: zebratrace301 send_activity_in_community vote inboxes [Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("beehaw.org")), port: None, path: "/inbox", query: None, fragment: None }]
    +  // example output, local homed community with no remote followers: zebratrace301 send_activity_in_community vote inboxes []
    +  // same output even when local community has a remote follower: zebratrace301 send_activity_in_community vote inboxes []
    +  warn!("zebratrace301 send_activity_in_community vote inboxes {:?}", inboxes);
    +
       send_lemmy_activity(context, activity.clone(), actor, inboxes, false).await?;
    +
    +  warn!("zebratrace302 send_activity_in_community after");
    +
       Ok(())
     }
    diff --git a/crates/apub/src/activities/voting/mod.rs b/crates/apub/src/activities/voting/mod.rs
    index 8bae05577..1f680c516 100644
    --- a/crates/apub/src/activities/voting/mod.rs
    +++ b/crates/apub/src/activities/voting/mod.rs
    @@ -31,6 +31,7 @@ use lemmy_utils::error::LemmyError;
     
     pub mod undo_vote;
     pub mod vote;
    +use tracing::warn;
     
     #[async_trait::async_trait]
     impl SendActivity for CreatePostLike {
    @@ -76,6 +77,7 @@ impl SendActivity for CreateCommentLike {
       }
     }
     
    +// this send_activity is for votes, look at the score parameter
     async fn send_activity(
       object_id: ObjectId<PostOrComment>,
       community_id: CommunityId,
    @@ -89,6 +91,10 @@ async fn send_activity(
         .await?
         .into();
     
    +  // production testing reveals that this code is hit with a user doing a vote on local or federated comment.
    +  // production testing reveals that when a local user votes, even in a community with zero remote subscribers, this has a hefty &actor data payload. which is just the user who did the vote... lots of overhead to vote on a comment?
    +  warn!("zebratrace200 send_activity vote {:?}", &actor);
    +
       // score of 1 means upvote, -1 downvote, 0 undo a previous vote
       if score != 0 {
         let vote = Vote::new(object_id, &actor, &community, score.try_into()?, context)?;
    diff --git a/crates/apub/src/activities/voting/undo_vote.rs b/crates/apub/src/activities/voting/undo_vote.rs
    index bcb8ee406..44da390f3 100644
    --- a/crates/apub/src/activities/voting/undo_vote.rs
    +++ b/crates/apub/src/activities/voting/undo_vote.rs
    @@ -12,6 +12,7 @@ use crate::{
       },
       PostOrComment,
     };
    +// Ok, federation is involved here in this class somehow
     use activitypub_federation::{
       config::Data,
       kinds::activity::UndoType,
    diff --git a/crates/apub/src/activities/voting/vote.rs b/crates/apub/src/activities/voting/vote.rs
    index 7f36ed471..ee72563ed 100644
    --- a/crates/apub/src/activities/voting/vote.rs
    +++ b/crates/apub/src/activities/voting/vote.rs
    @@ -12,6 +12,7 @@ use crate::{
       },
       PostOrComment,
     };
    +// Ok, federation is involved somehow here
     use activitypub_federation::{
       config::Data,
       fetch::object_id::ObjectId,
    @@ -22,6 +23,7 @@ use lemmy_api_common::context::LemmyContext;
     use lemmy_db_schema::source::local_site::LocalSite;
     use lemmy_utils::error::LemmyError;
     use url::Url;
    +use tracing::warn;
     
     impl Vote {
       pub(in crate::activities::voting) fn new(
    @@ -57,6 +59,7 @@ impl ActivityHandler for Vote {
       #[tracing::instrument(skip_all)]
       async fn verify(&self, context: &Data<LemmyContext>) -> Result<(), LemmyError> {
         let community = self.community(context).await?;
    +    warn!("zebratrace000 vote verify {:?}", &self.actor);
         verify_person_in_community(&self.actor, &community, context).await?;
         let enable_downvotes = LocalSite::read(context.pool())
           .await
    @@ -70,6 +73,8 @@ impl ActivityHandler for Vote {
     
       #[tracing::instrument(skip_all)]
       async fn receive(self, context: &Data<LemmyContext>) -> Result<(), LemmyError> {
    +    // based on testing with a client locally, this code path is only used for federatio incoming votes?
    +    warn!("zebratrace000 vote receive {:?}", &self.id);
         insert_activity(&self.id, &self, false, true, context).await?;
         let actor = self.actor.dereference(context).await?;
         let object = self.object.dereference(context).await?;
    diff --git a/crates/apub/src/objects/community.rs b/crates/apub/src/objects/community.rs
    index 17476e9f8..e437c61ad 100644
    --- a/crates/apub/src/objects/community.rs
    +++ b/crates/apub/src/objects/community.rs
    @@ -33,6 +33,7 @@ use lemmy_utils::{
     };
     use std::ops::Deref;
     use tracing::debug;
    +use tracing::warn;
     use url::Url;
     
     #[derive(Clone, Debug)]
    @@ -207,6 +208,8 @@ impl ApubCommunity {
           })
           .collect();
     
    +    warn!("zebratrace600 is every single federated outbound Like hitting this poiint?");
    +
         Ok(inboxes)
       }
     }
    
15 comments