Seconded. My only complaint (though this might already be a feature I haven't found yet) is that it doesn't seem to support multiple drives. But yes, it is shit easy to set up and has a beautiful UI
We just recently started using zabbix. Open source and has a web interface to get a central view that can be accessed from wherever we allow it.
So far it's been great, but we have had little time and so far have used only 1% of what it can do
Still, I'd recommend it. Super easy to install, seems lightweight, has clients for any OS you'd need, and can send out alerts (we currently use Pushover for that)
checkmk user here. i can second the adjustment phase. i tend to ignore my servers but when something goes sideways it's awesome to have checkmk's structure in place.
Zabbix is pretty quick and easy. Many different services are built in for sending notifications, along with your own custom ones (including webhooks).
Fully customizable dashboard as well so you can add whatever you want/need at a glance.
Any chance you'd be willing to share playbooks or point me toward any resources you used?
I use Ansible to manage config across all my workstations/servers but I haven't gotten around to automating log shipping yet or aggregating system metrics.
Some systems have their own metrics endpoints - instead of getting Prometheus to scrape these directly, I set up a cron job to curl these into files for node exporter - this means I don't need extra config in Prometheus to find the endpoints, and don't need to mess with firewall rules
Other systems don't directly expose metrics in a format Prometheus can use - in this case I will write/find a script that can do the conversion, then either set it up to write the metrics file directly and run it on a cron, or run it as a service with another cron job to do the scrape
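To make the "script that writes the metrics file" idea concrete, here is a minimal sketch, not the poster's actual script. It assumes the readings have already been collected into a flat dict of numbers; the `myapp` prefix, the function names, and the metric names are made up for illustration. It renders the dict in Prometheus text format and writes the file via a temp file plus rename, so node exporter never reads a half-written file.

```python
import os
import tempfile


def render_metrics(stats, prefix="myapp"):
    """Render a flat dict of numeric readings as Prometheus text format.

    `prefix` and the keys in `stats` are hypothetical names for this sketch.
    """
    lines = []
    for name, value in sorted(stats.items()):
        metric = f"{prefix}_{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {float(value)}")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write `text` to `path` by creating a temp file and renaming it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # mkstemp creates the file as 0600; loosen it so another user can read it.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
```

Run from cron, this would be called with a path ending in `.prom` inside whatever directory node exporter's textfile collector is pointed at; that directory is set by you, so it is left out here.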
On this specifically you might want to check ntfy, as it's quite easy to set up and can give you notifications on pretty much any device (including iOS) via your own infrastructure, all the way down to basics, e.g. SSE. That means you can subscribe to a topic, e.g. servers per physical location, alert level, etc., and only get the ones you need.
Netdata is exactly what you're looking for. It's basically an all-in-one monitoring and alerting suite that collects and analyzes data, and provides a gorgeous web dashboard for you to view.
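For anyone curious what publishing to a topic looks like: ntfy takes a plain HTTP POST with the message as the body and optional `Title` / `Priority` headers. A rough sketch using only the standard library follows; the server URL, topic name, and function names are placeholders I made up, so swap in your own.

```python
import urllib.parse
import urllib.request


def build_ntfy_request(base_url, topic, message, title=None, priority=None):
    """Build (but do not send) a POST that publishes `message` to an ntfy topic."""
    url = base_url.rstrip("/") + "/" + urllib.parse.quote(topic, safe="")
    request = urllib.request.Request(url, data=message.encode("utf-8"), method="POST")
    if title is not None:
        request.add_header("Title", title)
    if priority is not None:
        request.add_header("Priority", priority)
    return request


def send_ntfy(base_url, topic, message, **kwargs):
    """Send the notification and return the HTTP status code."""
    request = build_ntfy_request(base_url, topic, message, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Something like `send_ntfy("https://ntfy.example.com", "servers-rack1", "disk almost full", title="db01", priority="high")` would then show up on every device subscribed to that topic. If your server requires authentication you would need to add that header too, which is not shown here.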
You can also manually replicate this using Prometheus, Grafana and other tools, but that requires a much bigger effort to set up.
The five node limit is a dealbreaker for me too. I'm also annoyed the free version doesn't have any real built-in options to secure data by default. I followed a TechnoTim tutorial to get the NetData/Prometheus/Grafana stuff set up but it was too limited and required too much manual effort.
Nagios. It does depend on what you mean by monitor though. Nagios is good at telling you that "service A on host B is down" but less useful for looking at things like performance trends. I particularly like being able to set up dependencies between services, so I get the alert for the root cause, and not all of the services that have gone down because of it.
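To illustrate the dependency idea (this is just the logic in plain Python, not Nagios config or how Nagios implements it): given which services are down and what each one depends on, only alert on the ones whose own dependencies are all still up. The service names and the function name are invented for the example.

```python
def root_causes(down, depends_on):
    """Return the down services that have no down dependency of their own.

    `down` is a set of service names; `depends_on` maps a service name to
    the set of services it directly depends on.
    """
    causes = set()
    for service in down:
        dependencies = depends_on.get(service, set())
        # If any direct dependency is also down, this outage is a symptom.
        if not (dependencies & down):
            causes.add(service)
    return causes


# Hypothetical layout: the website needs the database, which needs storage.
depends_on = {
    "website": {"database", "webserver"},
    "database": {"storage"},
    "webserver": set(),
    "storage": set(),
}
```

With `website`, `database` and `storage` all down, `root_causes` returns only `storage`, which is the one alert you actually want.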
I just see if it works when I need it. If I’m at home it works. If I’m at work it may work. If I’ve left to travel it’s 95% definitely down and cannot be fixed. This works well!
I use LibreNMS, as it uses SNMP for monitoring (which is pretty much available everywhere). I don't believe it has HTTP alerts, but I know for a fact that it can send Telegram messages.
I remember liking Sensu. We used it a little bit at my previous job, but I didn't get a chance to work with it much. I can't remember what we specifically used it for though. Sorry, wish I had more info for you.