[Outage] I dungoof'd - We were unreachable for several Hours! SORRY!

Hello, everyone!

Because I was fiddling around with Cloudflare and accidentally applied the wrong settings during one of my sleepless nights (I was tired again 😅), we weren't reachable for most people for a couple of hours.

Since it was a DNS settings problem, the change took some time to propagate, and so it took some time for me to notice. I've reverted my changes for now, and it'll take some time for that to propagate as well.

I'm really sorry. It'll not happen again (well, it might happen again), but I'll do my best to prevent things like that in the future. I just want a stable experience for everyone!


7 comments
  • Hey! First, let me thank you for doing what you do. Maintaining a public service on your own, even for a small group of people, isn't easy and brings all kinds of responsibilities and stress.

    However, this is a good opportunity to learn. In my more than 20 years of experience, the best way to deal with such unfortunate events is to go through a blameless postmortem; see https://sre.google/sre-book/postmortem-culture/

    In a nutshell, it's important to accept that something like this will happen again, find out what happened and why it happened, and decide what you can do to prevent it from happening in the future. The latter could be an additional check, a peer review, or a process decision, e.g. "don't apply changes late at night".

    Thank you again and keep up the good work!

    • Thank you very much for your kind words.

      The thing is that I'm a trained sysadmin and even I make mistakes. This (Lemmy instance) is a hobby of mine and I plan to put something up that's here to stay; however, a hobby needs time. Right now, in order to devote time to my hobbies, I usually go into what I call "sleep debt", which means I usually do stuff at night.

      I will implement a status monitor for this instance (and what belongs to it) later today and I'll also need to find someone else in a different timezone who I can trust with actual server (shell) access in order to fix stuff that might arise when I'm asleep.
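
For anyone curious what a very small status monitor could look like, here is a minimal sketch in Python. It is not the setup the admin ended up using. The URL, the function names (`fetch_status`, `check_once`, `monitor`), and the "alert after 3 failed checks in a row" rule are all assumptions made for illustration. Ideally something like this runs from a machine outside the instance's own network, so that DNS problems like the one in this post are actually noticed.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical placeholder; replace with the real instance URL.
INSTANCE_URL = "https://lemmy.example.org/"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url; raises OSError on DNS/network failure."""
    request = urllib.request.Request(url, headers={"User-Agent": "status-monitor-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code.
        return err.code


def check_once(url: str, fetch: Callable[[str], int] = fetch_status) -> Tuple[bool, str]:
    """Run one check and return (is_up, human-readable detail)."""
    try:
        code = fetch(url)
    except OSError as err:
        # URLError (which covers DNS failures) and timeouts are OSError subclasses.
        return False, f"unreachable: {err}"
    return 200 <= code < 400, f"HTTP {code}"


def monitor(
    url: str,
    interval: float = 60.0,
    failures_before_alert: int = 3,
    fetch: Callable[[str], int] = fetch_status,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    alert: Callable[[str], None] = print,
) -> None:
    """Check url every `interval` seconds; alert once per run of consecutive failures."""
    consecutive_failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        ok, detail = check_once(url, fetch)
        checks += 1
        if ok:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            # Alert only when the threshold is first reached, to avoid spamming.
            if consecutive_failures == failures_before_alert:
                alert(f"ALERT: {url} failed {consecutive_failures} checks in a row ({detail})")
        if max_checks is None or checks < max_checks:
            sleep(interval)
```

Calling `monitor(INSTANCE_URL)` would loop forever and print an alert after three failed checks in a row. Swapping `alert` for a function that sends a mail or a chat message is the obvious next step, and requiring several consecutive failures keeps a single slow response from waking anyone up.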
