First of all, I would like to thank the Lemmy.world team and the 2 admins of other servers @[email protected] and @[email protected] for their help! We did some thorough troubleshooting to get this working!
The upgrade
The upgrade itself isn't too hard. Create a backup, and then change the image names in the docker-compose.yml and restart.
But, like the first 2 tries, after a few minutes the site started getting slow until it stopped responding. Then the troubleshooting started.
The solutions
What I had noticed previously, is that the lemmy container could reach around 1500% CPU usage, above that the site got slow. Which is weird, because the server has 64 threads, so 6400% should be the max.
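To put those numbers side by side, here is a quick back-of-the-envelope sketch. It assumes the percentages follow the usual Docker convention where 100% means one fully busy thread. The 64 threads and the roughly 1500% ceiling come from the post; the variable names are only illustrative.

```python
# Assumption: CPU usage is reported Docker-style, 100% per fully busy thread.
THREADS = 64             # thread count of the server, from the post
OBSERVED_PERCENT = 1500  # approximate ceiling of the lemmy container, from the post

max_percent = THREADS * 100                    # theoretical maximum for the whole machine
busy_threads = OBSERVED_PERCENT / 100          # how many threads that usage corresponds to
utilisation = OBSERVED_PERCENT / max_percent   # share of the machine actually in use

print(f"theoretical max: {max_percent}%")
print(f"threads kept busy: {busy_threads:.0f} of {THREADS}")
print(f"share of machine used: {utilisation:.1%}")
```

Under that assumption a single container was keeping about 15 of the 64 threads busy, a bit under a quarter of the machine, which is consistent with the bottleneck being inside the process rather than the hardware.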
So we tried what @[email protected] had suggested before: we created extra lemmy containers to spread the load. (And extra lemmy-ui containers). And used nginx to load balance between them.
Et voilà. That seems to work.
Also, as he suggested, we start the lemmy containers with the scheduler disabled, and run 1 extra lemmy container with the scheduler enabled that is not used for anything else.
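For readers who want a mental model of that setup, here is a minimal sketch of the idea in Python rather than actual nginx configuration. It assumes plain round-robin distribution, which is nginx's default for an upstream group; the post does not say which method was used. The container names and the number of containers are hypothetical.

```python
from itertools import cycle

# Hypothetical names: the post does not say how many containers were started.
WEB_BACKENDS = ["lemmy-1", "lemmy-2", "lemmy-3"]  # scheduler disabled, serve user traffic
SCHEDULER_BACKEND = "lemmy-scheduler"             # scheduler enabled, kept out of the pool


def make_round_robin(backends):
    """Return a function that hands out the backends one after another, in a loop."""
    pool = cycle(backends)

    def pick():
        return next(pool)

    return pick


pick_backend = make_round_robin(WEB_BACKENDS)

# Six incoming requests are spread evenly over the three web containers.
assignments = [pick_backend() for _ in range(6)]
print(assignments)
```

The scheduler container never appears in the rotation, so the scheduled tasks run in exactly one place and do not compete with user requests.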
There will be room for improvement, and probably new bugs, but we're very happy lemmy.world is now at 0.18.1-rc. This fixes a lot of bugs.
Good work upgrading! I can't imagine it being too easy with a big instance.
I had issues with comments not federating to my own instance before this update (showing 0 for hours). Opening this up now showed most of them right away, if not all. Hopefully that means 0.18.1 fixed a fair few issues people had with federation.
This is caused by an issue in the latest RC of the Lemmy UI.
It's already been reported, and ruud will probably decide how to deal with it tomorrow.
The current workaround
Make sure you are on the main page (https://lemmy.world) and not looking at any posts or something like that before hitting the login button.
If you encounter other issues, please make sure to clear the browser cache. The latest upgrade also made changes to the API, which can cause issues with the cached version of the website.
A bit off topic, but does anyone else hate how when you click on a post and then go back, the page auto-resets to the top? Wish it would remember how far down you scrolled and return to that point.
Huge thanks to the lemmy.world team over the last couple of days to scale and maintain the instance! There's a link for donating on the sidebar for lemmy.world - just a couple bucks a month can help us support this instance!
So some strange behaviour: When I pressed the upvote arrows in 0.17.4, it'd immediately show this in the UI. Right now, it does not. The response appears quite slow. Is this a function of 0.18.1-rc or a function of the traffic of the Reddit-fugees?
I had a strange bug today where I wasn't able to upvote comments. So I cleared out my website data like the website suggested and I started having problems logging in. It would log in but then when I refreshed it wasn't logged in anymore. It stopped after a while but then when I clicked on an old tab when I refreshed I was logged out again. So, the log in issue must be something to do with how iOS Safari handles web cache on tabs.
I'm not sure if this has been said, but when I open lemmy in the browser, my account is sometimes someone else's.
I don't know if it's a bug, but I've seen it happen three times so far, and it even happened again a few minutes ago.
It's like I logged into someone else's account. I've seen three other usernames so far.
A few minutes ago it said my account was Professor -?-?-?- with that account's profile picture shown too.
It only does that for half a second before it returns back to my account.
I'm just making sure this is said because I don't want to one day accidentally log into someone else's account.
Thanks a bunch for your hard work, Ruud and other admin folks! It's so damn GOOD to be able to use Jerboa again!
Also, it's really nice to see the breakdown of your work. It helps a lot in understanding what you go through, and maybe even whether there's anything we can help with. Keep it up!
I'm one of the many who have had trouble logging in, and this issue is surely underreported, as those affected generally aren't able to report it. It also seems like I'm not able to upvote or downvote. I'll update with any more issues that I come across, but I only just now became able to log in after a long wait and several different browsers.
Edit: it seems like I can successfully upvote/downvote, but the updated vote count and my blue/red arrow only show after refreshing the page. Thanks for all the work you put into this instance btw
I may not be a user on your instance, but either way, thanks for the upgrade. I was noticing a lot of issues with federation from lemmy.world, and it seems like this upgrade more-or-less fixed them.
I'm just running a tiny, single-user instance, but I want you to know that I appreciate the work you're putting in! I run large-scale infra as my day job, so I understand how challenging this sudden influx of users (and federated servers!) is.
Real challenging this morning posting and commenting. Circle of death waiting for something to post. Then getting multiple posts if it does go through.
I am having some issues logging in to my lemmy.world account atm, just a heads up. I'm sure you folks are slammed right now, thanks for all the work you're doing!
I'd like to know more about the exact container topology you have, since I may try something similar on my instance as well.
Is it something like this?
It's faster now, and we finally have buttons for rich text features! Congratulations!
Update: upvotes are a bit broken and weird right now, I need to refresh every time to see that I upvoted. But that's really the only issue I see right now.
I really appreciate the transparency in this post. There's enough information for me to feel like I kind of know what's going on, and I can go dig into it deeper if I feel like it. This is a breath of fresh air from what I'm used to, thanks so much!
Thank you from England for all the hard work AND for giving such interesting details, especially as it will encourage others to set up their own instances, and help them cross similar hurdles!!
So, basically I can't see any content from lemmyworld; I'm commenting right now from another instance. When I log into my Lemmy world account it's just empty, zero content. Any solutions?
Edit: still seeing some performance issues. Needs more troubleshooting.
Federation overhead is putting a lot of load on servers. It creates one task for every single post, comment, and vote in a RAM-only queue. Pending changes: https://github.com/LemmyNet/lemmy/pull/3466
Login problem is fixed for me, yay! Back on Jerboa and here on the browser! Thanks for your hard work and for putting up with me, lol.
I'm getting network errors that aren't allowing me to actually view content on Jerboa right now, but at this point I'm assuming it's a Jerboa thing and not a problem with the instance.
To everyone having a login problem, it seems that resetting the password solves the issue! Maybe this means that the upgrade corrupted the stored hashes somehow?
Thanks for the update. I especially like the transparency on not only the “upgrade” itself but also the potential issues encountered, together with the solutions. Seems rare nowadays, or I’m just seeing fewer and fewer people doing this.
Nice, really liking the update!
Some questions about development for the fediverse:
Is the code for running Lemmy written by one person or a small core team?
Is there any decision making process as to which features will be worked on in the next release or which bugs to prioritize?
In theory what would happen if the original developers started making changes that other people don't agree with? Would we get a fork then where servers have to choose to adopt it or not?
Is there an issue with the API? (The API wrapper lemmy-js-client doesn't work on login.) I tried it yesterday but haven't yet today. I will test it when I can :)
Have you considered running your Lemmy instance on more than a single machine? If it is possible to run two lemmy containers anyway (ie, lemmy is not a singleton), why not run them on separate machines? With load balancing you could achieve a more stable experience.
It might be cheaper to have many mediocre machines rather than a single powerful one too, as well as more sustainable long-term (vertical vs horizontal scaling).
The downside would be that the set-up would be less obvious than with Docker compose and you would probably need to get into k8s/k3s/nomad territory in order to orchestrate a proper fleet.
obviously not critical, but it looks like there's a small sidebar bug (or feature?) that puts the pic near the instance name if it is the first thing in its description?
Running so many Lemmy instances against the same database doesn't cause race conditions? I wonder why that "just worked" so easily, usually load balancing DB-backed apps is a whole beast on its own.
0.18 looks a lot better. Far better use of screen real estate on PCs.
Lag is still very prevalent though. Page loading, upvote delay. It's frustrating.
Live comments (like on new Reddit) do not seem to be working on 0.18, so I have to manually refresh the page each time. That also resets the comment sort to Hot, causing further annoyance.
Thanks for the hard work! For the first few minutes, every time I logged in I got logged into a different stranger's account. Now that doesn't happen, but I can't log in at all, haha.
Browser still not working for me. The interface loads but there's no content. Also can't login on browser, after entering user and password and clicking login nothing happens.
Tried to log in but nothing happened, except a "?" was added to the link. Tried deleting data, cookies, etc., but the problem still persists. Commenting from another instance.
I was having trouble earlier but now able to log in just fine on browser. Voting on posts doesn't seem to be working for me on desktop or apps. In apps I keep seeing error notices about votes not going through and desktop browser (Firefox) doesn't work but there's no notification there. Anyone else? Maybe everything needs a little time to sync up.
Love the update, all back up and running again :)
I joined this morning after discovering this awesome Apollo replacement and was so disappointed that it was down already! I understand that the sudden surge must be huge. Looking forward to seeing the data on the number of users gained by Lemmy!
Honestly praying this is the solution we all want and need!
Does browsing in Incognito/Private mode open up new bugs, or does the refresh issue behave the same way there? I should stay logged in, but for some reason, after this update, whenever I open a new private tab from the tab I'm logged in on, I'm shown as not logged in.
Thank you for letting us know what's happening behind the scenes.
I'm a sysadmin myself and really love reading stories about scaling up servers where it actually works!
Once again. Thank you.
You know, this is a nice post because now I understand what was happening to me as a user. Thanks for confirming that I am not insane! Well, maybe I'm insane, but what I was trying to do and couldn't was real, not something I was doing wrong. Also, thanks for updating the stuff that makes it work.
Thank you for all the work. Do you have a need or plans for community help at all, outside of content moderation? Not quite sure how I could help, but I do software for a living.
Question @[email protected] why update to the release candidate? Just want to help testing? Or was there some readdition (ie: captcha) that had you quick on the trigger?
Thanks very much for your time and effort Ruud, it's much appreciated!
Now, after you've put the kids to bed, grab yourself a beer and put your feet up!
Congrats on figuring it out! I'm just wading into docker in a professional capacity so I admit some of it feels like magic to my traditional developer brain but glad it worked out.
What I had noticed previously, is that the lemmy container could reach around 1500% CPU usage, above that the site got slow. Which is weird, because the server has 64 threads, so 6400% should be the max. So we tried what @[email protected] had suggested before: we created extra lemmy containers to spread the load. (And extra lemmy-ui containers). And used nginx to load balance between them.
Et voilà. That seems to work.
They're on virtual tin and didn’t configure properly. There are limits in the flat files you need to change by hand to get it to scale properly, it’s tricky.
Kinda makes sense that multiple containers might scale better. The actual processes within the container may have some limitations in terms of how well they thread etc.
Thank you so much for doing this! Having an instance this big really made the difference for leaving reddit. I really missed Jerboa and am glad to have it back as a client.