Prometheus is an open-source monitoring and alerting toolkit. In this guide, Prometheus and its Alertmanager run locally on your machine.
With the Prometheus integration, Zenduty sends new Prometheus alerts to the right team, notifies them according to on-call schedules via email, text messages (SMS), phone calls (Voice), Slack, Microsoft Teams, and iOS and Android push notifications, and escalates alerts until they are acknowledged or closed. Zenduty gives your NOC, SRE, and application engineers detailed context around each Prometheus alert, along with playbooks and a complete incident command framework to triage, remediate, and resolve incidents quickly.
Whenever the condition of a Prometheus alert rule is met and the alert fires, Alertmanager forwards it to Zenduty, which creates an alert and opens an incident. When the condition returns to normal, Zenduty automatically resolves the incident.
You can also use Zenduty Alert Rules to route specific Prometheus alerts to particular users, teams, or escalation policies, write suppression rules, and automatically add notes, responders, and incident tasks.
To add a new Prometheus integration, go to “Teams” on Zenduty and click on the “Manage” button corresponding to the team you want to add the integration to.
Next, go to “Services” and click on the “Manage” button corresponding to the relevant Service.
Go to “Integrations” and then “Add New Integration”. Give it a name and select the application “Prometheus” from the dropdown menu.
Go to “Configure” under your integrations and copy the generated webhook URL.
Ensure that both Prometheus and Alertmanager are downloaded and accessible locally on your system. Both are available from the official Prometheus download page.
Go to the Alertmanager folder and open “alertmanager.yml”. Add the webhook URL you copied earlier under “webhook_configs”. Your “alertmanager.yml” file should now look like this:
```yaml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'https://www.zenduty.com/api/integration/prometheus/8a02aa3b-4289-4360-9ad4-f31f40aea5ed/'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```
Tip: If you want to send alerts to multiple Zenduty Services, you can define your alert rules in different files, for example “first_rules.yml”, “second_rules.yml”, and so on, and route each set of alerts to a different Zenduty integration endpoint through its own Alertmanager receiver.
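One way to do this, sketched below under assumptions, is to tag the rules in each file with a distinguishing label and let Alertmanager route on it. The label name `zenduty_service`, the receiver names, and the `<first-integration-key>` / `<second-integration-key>` placeholders are hypothetical; replace each URL with the webhook URL generated for that Service's Prometheus integration. The `matchers` syntax assumes Alertmanager 0.22 or later, and only the `route` and `receivers` sections are shown.

```yaml
# Hypothetical sketch: two Zenduty Services, each with its own Prometheus integration.
# The zenduty_service label, receiver names and URL placeholders are assumptions.
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'zenduty-first'            # default: alerts that match no child route
  routes:
    - matchers:
        - zenduty_service="second"     # label set by the rules in second_rules.yml
      receiver: 'zenduty-second'

receivers:
  - name: 'zenduty-first'
    webhook_configs:
      - url: 'https://www.zenduty.com/api/integration/prometheus/<first-integration-key>/'
        send_resolved: true            # default for webhooks; resolved notices let Zenduty auto-resolve
  - name: 'zenduty-second'
    webhook_configs:
      - url: 'https://www.zenduty.com/api/integration/prometheus/<second-integration-key>/'
        send_resolved: true
```

In this sketch, each rule in “second_rules.yml” would carry `zenduty_service: second` under its `labels`, so those alerts reach the second Service, while every other alert falls through to the first. Child routes inherit the grouping and timing settings of the top-level route.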
In the Prometheus folder, open “prometheus.yml”. Add the rules files you created under “rule_files”, and point “alerting” at your Alertmanager (which listens on localhost:9093 by default). Zenduty groups Prometheus alerts by the alertname label. Your “prometheus.yml” file should look like this:
```yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']
```
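The configuration above loads “first_rules.yml” but does not show its contents. A minimal sketch of what that file might contain follows; the group name, the `InstanceDown` alert, the 5-minute duration, and the `severity` label are illustrative assumptions, not values Zenduty requires. The `alert:` field becomes the alertname label that Zenduty uses to group alerts.

```yaml
# first_rules.yml: a minimal sketch of a Prometheus alerting rules file.
# Group name, alert name, duration and labels are illustrative assumptions.
groups:
  - name: example-alerts
    rules:
      - alert: InstanceDown            # becomes the alertname label Zenduty groups on
        expr: up == 0                  # the target's most recent scrape failed
        for: 5m                        # condition must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.job }} target {{ $labels.instance }} has been unreachable for more than 5 minutes."
```

With `for: 5m`, a brief scrape failure stays pending and never reaches Zenduty; only a failure lasting five minutes fires the alert.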
Run Prometheus and Alertmanager, each from its own folder, using commands like these:
To run Prometheus: `./prometheus --config.file=prometheus.yml`
To run Alertmanager: `./alertmanager --config.file=alertmanager.yml`
Once Prometheus is running, you can see the alert rules you configured on the Alerts page of the Prometheus web UI (http://localhost:9090/alerts with the configuration above).
When an alert fires, Zenduty automatically creates an incident.
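To check the pipeline end to end without waiting for a real failure, you could temporarily load an always-firing rule. The file name `zenduty_test_rules.yml` and the alert name `ZendutyTestAlert` are hypothetical; add the file under “rule_files” in “prometheus.yml” and restart Prometheus.

```yaml
# zenduty_test_rules.yml (hypothetical name): an always-firing rule for testing.
groups:
  - name: zenduty-test
    rules:
      - alert: ZendutyTestAlert
        expr: vector(1)                # always returns one sample, so the alert fires on every evaluation
        labels:
          severity: warning
        annotations:
          summary: "Test alert to confirm Prometheus alerts reach Zenduty"
```

Because the rule has no `for` clause, it fires on the next rule evaluation, and with the example Alertmanager settings the notification goes out after the 30-second `group_wait`. Once you remove the test file from “rule_files” and restart Prometheus, Alertmanager should mark the alert as resolved after a short delay, and Zenduty should then auto-resolve the incident.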
Prometheus is now integrated.