Incorrect Container Memory Consumption Graph Behavior When Pod is Restarted #2522

Open
vladmalynych opened this issue Sep 19, 2024 · 1 comment · May be fixed by #2524
@vladmalynych

Problem:

The Grafana dashboards defined in grafana-dashboardDefinitions.yaml include graphs for memory consumption per pod. The memory consumption query currently used is:

https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/grafana-dashboardDefinitions.yaml#L8300

                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container)",
                          "legendFormat": "__auto"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(\n    kube_pod_container_resource_requests{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
                          "legendFormat": "requests"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "${datasource}"
                          },
                          "expr": "sum(\n    kube_pod_container_resource_limits{job=\"kube-state-metrics\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", resource=\"memory\"}\n)\n",
                          "legendFormat": "limits"
                      }
                  ],
                  "title": "Memory Usage (WSS)",
                  "type": "timeseries"
              },

When a pod is restarted, metrics for both the old and the new container are reported for a short period, and the sum … by (container) aggregation adds the two series together. This can produce temporary spikes in the displayed memory consumption, so the dashboard may show memory usage that exceeds the container's memory limit even though the actual consumption is within the limit.

(Screenshots: 2024-09-17 at 14:20:49 and 2024-09-17 at 14:22:55)

Steps to Reproduce:

  • Trigger a pod restart (e.g. an OOM kill or an eviction).
  • Compare the graph whose expression groups only by container with one whose expression groups by both container and id:
"expr": "sum(container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\", cluster=\"$cluster\", namespace=\"$namespace\", pod=\"$pod\", container!=\"\", image!=\"\"}) by (container, id)"
@froblesmartin commented Nov 4, 2024

Hi! I also ran into this issue, but I am thinking it may not be a problem with the dashboard, but with the metric itself or with the scraping configuration, no? 🤔 In my case I still see the previous container's run for 4:30 minutes (comparing when the new run started with when the metrics from the previous one disappear).
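
For reference, a query like the following (same label selectors as the dashboard query above, so treat it as a rough sketch) makes it easy to see how long the old container id keeps reporting after a restart, since each id shows up as its own series:

    # One series per (container, id); the old id's series should stop a few
    # minutes after the restart, once the kubelet stops reporting it.
    count by (container, id) (
      container_memory_working_set_bytes{
        job="kubelet", metrics_path="/metrics/cadvisor",
        cluster="$cluster", namespace="$namespace", pod="$pod",
        container!="", image!=""
      }
    )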

I would expect this metric to show only what the official documentation describes:

Current working set of the container in bytes

but, as seen, it also includes the memory of a container that is no longer running.

Could it be a misconfiguration in the metrics scraper?

And the problem is that, for a dashboard showing a single pod, displaying the different container instances separately is viable, but what about a dashboard showing the total memory usage in the cluster? If you still sum the different instances of the same container, you will be displaying something wrong. 🤔
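
Just as a rough sketch (assuming the same cadvisor labels as the dashboard query above), a cluster-wide panel would probably need to deduplicate per individual container before summing:

    # Take the max across overlapping old/new instances of the same container
    # first, then sum for the cluster-wide total.
    sum(
      max by (namespace, pod, container) (
        container_memory_working_set_bytes{
          job="kubelet", metrics_path="/metrics/cadvisor",
          cluster="$cluster", container!="", image!=""
        }
      )
    )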
