I sometimes get 503 errors in the logs:
karma-5666566999-n28pv karma level=error msg="Request failed" error="request to https://***redacted***alertmanager:9093/metrics failed with 503 Service Unavailable" alertmanager=***redacted*** uri=https://***redacted***alertmanager:9093/
It seems that the error message comes from /internal/alertmanager/models.go#L92, and probeVersion is called at /internal/alertmanager/models.go#L366.
Question 1: could you confirm this?
In this code, I also notice that when an error occurs, probeVersion will return "" with some logging, but:
Question 2 / Bug?: when probing the version fails, Karma still goes on to retrieve silences and alerts. Is this a bug?
Question 3 / Feature request: could you create a metric that shows when probing the Alertmanager version failed? Shouldn't it stop at line 370?
For this feature request, maybe you could create a metric named karma_alertmanager_probed_version with the version as a label and with the value set to 1, or 0 if something failed?
I have not had 503 errors for a while.
When I wrote this issue, there may have been a problem on the Alertmanager side that I could not reproduce at the time, and that blocked Karma from retrieving the version but not the alerts and silences.
As you say, Karma assumes the latest compatible version when it cannot retrieve the Alertmanager version. This is why Karma kept working and I noticed nothing except the log line with the 503 error.
No problems for a while: should we close this issue?
About the feature request and the metric: it could be a counter that increments every time Karma fails to connect to Alertmanager. A native metric should be easier than building a custom one with Promtail matching on 503. But I have had no problems for a while, so do I still need it? I don't know...
Owner Name | prymitive |
Repo Name | karma |
Full Name | prymitive/karma |
Language | TypeScript |
Created Date | 2018-09-09 |
Updated Date | 2023-03-17 |
Star Count | 1921 |
Watcher Count | 33 |
Fork Count | 166 |
Issue Count | 2 |