Lemmy as a project has suffered all month because Lemmy.ml has not been sharing critical logs from Nginx and Lemmy's code logging itself

Lemmy.ml front page has been full of nginx errors, 500, 502, etc. And 404 errors coming from Lemmy.

Every new Lemmy install begins with no votes, comments, postings, users to test against. So the problems related to performance, scaling, error handling, stability under user load can not easily be matched given that we can not download the established content of communities.

Either the developers have an attitude that the logs are of low quality and not useful for identifying problems in the code and design, or the importance of getting these logs in front of the technical community and trying to identify the underlying patterns of faults is being given too low of a priority.

It's also important to make each log of failures identifiable to where in the code this specific timeout, crash, exception, resource limit is encountered. Users and operations personnel reporting generic messages that are non-unique only slow down server operators, programmers, database experts, etc.
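To make that concrete, here is a minimal sketch of what an identifiable log line could look like. It is written in Python with the standard `logging` module purely for illustration (Lemmy itself is Rust), and the event ID `FED-OUT-TIMEOUT-001`, the logger name, and the `deliver_activity` function are all made up for this example. The idea is simply that each failure site carries its own ID plus the file, line, and function that emitted it.

```python
import io
import logging

# The event_id field is a hypothetical convention; filename, lineno and
# funcName are filled in automatically by the logging module.
LOG_FORMAT = "%(levelname)s [%(event_id)s] %(filename)s:%(lineno)d %(funcName)s: %(message)s"


def make_logger(stream):
    """Build a logger that writes to the given stream using LOG_FORMAT."""
    logger = logging.getLogger("federation_sketch")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


def deliver_activity(logger, peer, timeout_s):
    """Pretend delivery that always times out, only to show the log line."""
    # FED-OUT-TIMEOUT-001 is an invented ID; each failure site would get its own.
    logger.error(
        "delivery to %s timed out after %ss", peer, timeout_s,
        extra={"event_id": "FED-OUT-TIMEOUT-001"},
    )


buffer = io.StringIO()
log = make_logger(buffer)
deliver_activity(log, "peer.example", 10)
print(buffer.getvalue(), end="")
```

A report that quotes such a line points at one place in the code, instead of a generic message that could have come from anywhere.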

There are also a number of problems testing federation given the nature of multiple servers involved and trying not to bring down servers in front of end-users. It's absolutely critical that failures of servers to federate data be taken seriously, and that attempts to enhance logging and triangulate why peer instances have missing data be tracked down to protocol design issues, code failures, network failures, etc. Major Lemmy sites doing large amounts of data replication are an extremely valuable source of data about errors and performance. Please, for the love of god, share these logs and let us look for the underlying causes in hard to reproduce crashes and failures!
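As a rough sketch of the kind of check I mean, the snippet below compares which comment IDs the origin instance has against what each peer instance actually shows. It does not call any Lemmy API: the instance names and IDs are invented sample data, and how the ID lists are collected from each server is left open as an assumption.

```python
def replication_report(origin_ids, peers):
    """Map each peer name to (missing_ids, fraction_delivered)."""
    origin = set(origin_ids)
    report = {}
    for name, ids in peers.items():
        # Anything the origin has that the peer lacks failed to replicate.
        missing = sorted(origin - set(ids))
        delivered = 1.0 if not origin else (len(origin) - len(missing)) / len(origin)
        report[name] = (missing, delivered)
    return report


# Invented sample data: comment IDs seen on the origin and on two peers.
origin_comments = ["c1", "c2", "c3", "c4"]
peer_views = {
    "peer-a.example": ["c1", "c2", "c3", "c4"],
    "peer-b.example": ["c1", "c4"],
}

report = replication_report(origin_comments, peer_views)
for peer, (missing, delivered) in report.items():
    print(peer, missing, f"{delivered:.0%}")
```

Even something this primitive, run on a schedule, would give a server operator a number to watch instead of discovering missing comments by hand.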

I really hope internal logging and details of the inner workings of the biggest Lemmy instances are shared more openly, with more eyes on how to keep scaling the applications as the number of posts, messages, likes and votes continues to grow each and every day. Thank you.

Three recently created communities: [email protected] -- [email protected] -- [email protected]

52 comments

Hey buddy, I understand you're frustrated, but I just want to make a few points:

I have personally seen many instance admins and Lemmy contributors note many times over the past weeks that Lemmy is unoptimized and not ready for the current traffic

I have myself mentioned it several times in announcements to users of my own Lemmy instance

Lemmy maintainers have asked for help with optimization in several channels

Lemmy maintainers are clearly working hard at fixing Lemmy issues and improving performance - just look at the work that went into 0.18 - the fact that it's far from perfect is clear to everybody, but progress is constantly being made

Lemmy maintainers have mentioned multiple times that their inboxes are full of notifications and DMs - it's not that they're brushing anything under the rug, it's just that they're not physically able to keep up with the volume of communication that is being thrown at them

I really believe that you have some useful insights and can be very helpful for Lemmy, but I'm afraid that if you take this accusatory tone and blame people for not doing enough then that will overshadow anything helpful that you're actually saying.

Having said all that, if you would like to take a look at some stats about queries on lemm.ee (a Lemmy instance with 4k users - definitely much smaller than lemmy.ml), I have put together a spreadsheet here: https://docs.google.com/spreadsheets/d/e/2PACX-1vSPpqM6QCZYAAvnWe8p-xxN553ukRIquHw71j3nB763x7TNeqeUO-Oss51yPC7zVaT2x4jll39NCeMu/pubhtml#
- Lemmy maintainers have asked for help with optimization in several channels
  
  I do not see them using Lemmy itself to actually discuss the problems of Lemmy. Specific to lemmy.ml and the developer relationship with this specific server, crashes (logs) are not being shared.
  
  10 days ago: https://lemmy.ml/post/1271936
  
  I can not emphasize the title of the posting you are reading enough. "Lemmy as a project has suffered all month because Lemmy.ml has not been sharing critical logs from Nginx and Lemmy's code logging itself"
  
  Logs, logs, logs. Why were these crash logs not shared as part of the Lemmy project? When the busiest server on the whole project is not sharing its Rust code logs and crashes, what are those of us trying to work on the SQL and architecture problems supposed to do? I didn't even report 1 in 100 of the crashes I was experiencing.
  
  It is a peer to peer network, server to server, and the central hub has encouraged everyone to run out and create new servers without any concern to report the crashes going on within the central hub. I just don't get why everyone here is defending such behavior and leadership.
  
  What I see was sharing of CONCLUSIONS - that "increase the worker count" was the problem. No, the problem is fundamental to the whole Rust application's automatically generated SQL statements, lack of data caching, lack of proper MTA and queue for federation inbound and outbound data. Just saying that the federation worker count was the problem and making the value infinite was not in any way getting to the problems that sharing the server crash logs would have exposed.
  
  June 14, the GitHub issue on "Scaling Federation" was CLOSED by project leadership! Meanwhile, lemmy.ml was crashing for me every hour! Failing to federate with any reliability too. June 15 is when https://lemmy.ml/post/1271936 was opened, the day after this CLOSE of a GitHub issue:
  
  The DDOS is coming from WITHIN THE HOUSE. Lemmy's performance problems are causing federation to bring down peer servers, and the LOGS of Rust code exceptions that are being KEPT SECRET will reveal this! The sharing of logs and making this a federation-wide announcement that the hub is failing on data exchange is critical, not optional
  
  It's sad to me that the leadership of this project can't just come out and openly admit it is an "experimental" and "unstable" project, and is instead ignoring https://lemmy.ml/post/1271936 and bragging on GitHub that it is "high performance Rust". It might have seemed high performance when you sent 8 whole test messages to 4 servers a day, but that isn't the meaning of "high performance". Depressing to see such denial and the people who believe in the "reality distortion field" around the project.
I agree with you, but I think you sound a bit too harsh for developers.

I think they are doing their best currently and have probably identified more immediate issues before addressing all that we see.

There are other big instances which could share the logs, let's ask lemmy.world and beehaw if they can share the logs and leave main developers to work.

Another big thing I am thinking can benefit from information sharing is bot account detection.

I would like to take a look at that data and find ways to identify bots. I just don't know what data can be useful, but will try to make my own instance and work on it.
- I agree with you, but I think you sound a bit too harsh for developers.
  
  The failure to inform users by official announcement or mention in the 0.18 release notes that Lemmy is failing to replicate data reliably I think is a failure of the project management. "your data doesn't matter here on the Lemmy network". Why are end users not being told that their messages are in fact not reliably being shared to other instances? Why are the server install and release notes not warning the community that each additional instance being brought online is increasing the replication workload of established sites - that are already faltering?
  
  The problem is being covered up, brushed under the rug. The issue of creating tools to adequately load test federation and track problems wasn't raised during project development as an important ToDo item or a call for assistance, nor has it really been noticed by most of the server operators. I've personally been going around to dozens of Lemmy instances and hand observing the failures to replicate data. No thought was put into even the most primitive tools to operate a server and have a sense of 'how would you know' if federation was failing.
  
  Yet, the leaders of Lemmy have created directories of "recommended sites" to go sign up with and given the impression that you can access active communities from peer instances to help offload the server reliability problem. Federation itself is unreliable on Lemmy to Lemmy!
  
  Either they are covering up the problem, hiding it out of pride, or not opening bugs on GitHub or not calling for help in the 0.18 Release Notes. Which is it?
  
  The problem is being covered up, brushed under the rug
  
  Either they are covering up the problem, hiding it out of pride, or not opening bugs on GitHub or not calling for help in the 0.18 Release Notes.
  
  You've raised many good points to come to a wildly accusatory conclusion.
  
  Get off of this line of thinking if you want to raise support for fixing the valid issues you've raised.
  
  Have you made any attempt to see if these issues have been raised on GitHub? Have you made any attempt to create issues on GitHub? Have you made any attempt to submit code enhancements via GitHub?
  
  I like your posts and hunt for performance issues, I just think that developers decided (whether you and I agree or not) some other features are more important.
  
  Until a few weeks ago communication was clear since there were not many people here, so there was no need for some specific notes you are mentioning.
  
  Now we do need them and reminding developers of it, or even better doing it would be much appreciated I expect.
  
  I have seen developers on some threads here or on GitHub issues commenting on replication problems and they are hard at work on those.
  
  Even caching is discussed, as I understand, they first need to implement cache control headers so that admins can set up caching, as they see fit, outside lemmy.
  
  There is a lot of good will around, please have understanding and be part of it. Give them the time to grow up to this opportunity.
- I think you sound a bit too harsh for developers.
  
  Where is the caching in the Rust code for databases? Why isn't caching being discussed in a Lemmy community using Lemmy itself as a venue for discussing the major performance problems in the code?
  
  The thread this comment is from: https://www.reddit.com/r/rust/comments/zvt1mu/tips_on_scaling_a_monolithic_rust_web_server/
  
  https://github.com/LemmyNet/lemmy/issues/2975
  
  Caching is being worked on in the shape of cache control headers, not the SQL cache you mention, but it will get better.
  
  If it shows it's not enough I can imagine devs will change their opinion on it, like they did with websockets.
There was a time, far in the past before Reddit mocking became the central focus, that sharing server logs was actually recommended:

But then nginx 500 errors came along, and nobody running big servers bothered to share their crash logs.
People report the problem on GitHub, server operators ignore it, and they give up. The problem is being systemically ignored, and those who persistently raise that there are serious problems in the design, execution, and operation of the code are ignored, silenced and deflected.

This isn't me, this is another person ignored on GitHub:

RELEASE 0.18 went out without any warning of the ongoing federation failures.
The Lemmy community is an echo-chamber of people ignoring the fact that the developers boldly claim "HIGH PERFORMANCE" on the GitHub project page without any validation.

Not one person was willing to stand up and say "this is LOW performance, amateur mistakes in not having caching of" SELECT "local_site"."id", "local_site"."site_id", "local_site"."site_setup", "local_site"."enable_downvotes", "local_site"."enable_nsfw", "local_site"."community_creation_admin_only", "local_site"."require_email_verification", "local_site"."application_question", "local_site"."private_instance", "local_site"."default_theme", "local_site"."default_post_listing_type", "local_site"."legal_information", "local_site"."hide_modlog_mod_names", "local_site"."application_email_admins", "local_site"."slur_filter_regex", "local_site"."actor_name_max_length", "local_site"."federation_enabled", "local_site"."federation_worker_count", "local_site"."captcha_enabled", "local_site"."captcha_difficulty", "local_site"."published", "local_site"."updated", "local_site"."registration_mode", "local_site"."reports_email_admins" FROM "local_site" LIMIT $1

It's outright delusion that people labeled it "HIGH PERFORMANCE".

The Lemmy community sure has the bullying part and shit-talking peer applications down on social media, but the project internally shows that there is no street credibility when it comes to bold "HIGH PERFORMANCE" claims against even basic email systems.

Lemmy admins who won't use Lemmy to talk about and try to solve their hourly server crashes.

Gaslighting me about it was way more serious than me speaking up bravely about the problem. Intimidation tactics you are showing to me say a ton about why you have these problems in the first place. The lack of concern for the end-user data being lost also speaks volumes as to priorities.
So far, I've gotten nothing but replies that do not talk about the failure to show logs and the importance of logging in server applications.

Logging matters

Sharing logs matters

Server apps don't have a nice GUI, you use logs

Logging matters

Replies are DEFLECTING the problem

Do I need to keep repeating how much lemmy.ml's UNAVAILABLE logs of actual failures have been holding back the entire platform and community? This should be blindingly obvious to anyone who has built and supported big client/server apps - what do the error logs say is crashing when you get a 500 error? Issue a BOLO to other server operators on GitHub or on the LEMMY social media platform!
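Even a summary would help. As a sketch of how little it takes, the snippet below counts 5xx responses per path from access log lines in nginx's default "combined" format. The sample lines are invented for illustration, not taken from any real server, and the regular expression only looks at the request and status fields.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it in the
# nginx "combined" access log format.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_server_errors(lines):
    """Count 5xx responses, keyed by (status, path)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("status").startswith("5"):
            counts[(match.group("status"), match.group("path"))] += 1
    return counts


# Invented sample lines, not real log data.
sample = [
    '203.0.113.5 - - [01/Jul/2023:10:00:00 +0000] "POST /inbox HTTP/1.1" 502 157 "-" "sketch"',
    '203.0.113.6 - - [01/Jul/2023:10:00:01 +0000] "POST /inbox HTTP/1.1" 502 157 "-" "sketch"',
    '203.0.113.7 - - [01/Jul/2023:10:00:02 +0000] "GET /api/v3/post/list HTTP/1.1" 500 0 "-" "sketch"',
    '203.0.113.8 - - [01/Jul/2023:10:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "sketch"',
]

counts = count_server_errors(sample)
for (status, path), n in counts.most_common():
    print(status, path, n)
```

A table like that, posted alongside the matching application log lines, is the kind of BOLO other operators could actually act on.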

Data integrity, the failure to replicate comments to peer instances, is also being ignored. WHY WAS THIS NOT IN THE 0.18 RELEASE NOTES when the application is being pushed as 'High Performance' on the front of the GitHub page?

Lemmy isn't being used to even discuss the technical problems of Lemmy. "Eat your own dogfood" isn't cared about here. The people running servers aren't reporting major problems and sharing logs to the community.

I'll repeat it since so many comments DEFLECT and use FUD, gaslighting, downvotes, and intimidation to try and shut up the truth.

My name is Stephen Alfred Gutknecht, I don't HIDE my identity behind anonymous names and deflection. My username is "RocketDerp" on GitHub, and I created [email protected] as a subtle message weeks ago. The SUBTLE MESSAGE ISN'T WORKING, GOT THAT?!

The plain-spoken Truth was Posted by me on June 7: https://lemmy.ml/post/1166882
- The replies are a reflection of your abrasive and antisocial approach to this. Start working with people instead of yelling at them and maybe you can help improve the situation.
  
  The replies are a reflection of your abrasive and antisocial approach to this. Start working with people instead of yelling at them and maybe you can help improve the situation.
  
  The paid project management not making this in the release notes of 0.18 - that DATA LOSS IS A REGULAR THING on the platform, is a major sign of incompetent leadership.
  
  abrasive and antisocial approach to this
  
  You can't talk about logs, can you? That logs are critical and important in server applications that run without a front-end console? You just make personal insults about the person sharing the truth of the situation in an open message and are in DEEP DENIAL of the underlying code and project management communication/priority problems.
  
  The party came to Lemmy because of Reddit's failure, and Lemmy crashed day after day. And nobody running the project reported the crashes on GitHub. Show me where the logs and crashes were shared?
  
  FUD is the norm here, more replies that are DENIAL and gaslight that there is a priority and communication problem in the project. Trying to intimidate me MORE to shut up.
Go ahead, keep trying to gaslight me that Lemmy is a "high performance" server application

This is on a low-traffic community with a targeted audience, not on Reddit, not on /c/Lemmy

I have the receipts, I know where the bugs are, I know how much this problem is being IGNORED and I am personally being gaslit that the problem isn't real and true. DEFLECTION is the first response in this unhealthy community to newcomers who know their shit.

It's a disgrace to Rust, Linux, and PostgreSQL that this false statement is on the home page of GitHub for Lemmy.

“In the land of the blind, the one-eyed man is a hallucinating idiot...for he sees what no one else does: things that, to everyone else, are not there.” ― Marshall McLuhan

EDIT: I see your downvotes, and your replies intimidating me. Praise "the powers that be" of the project, eh?