[kafkaexporter] How to optimize the performance of the Collector, especially CPU utilization? #36974

Open
xiaoyao2246 opened this issue Dec 26, 2024 · 2 comments
Labels
bug Something isn't working

Comments

xiaoyao2246 commented Dec 26, 2024

Describe the bug
I deployed the Collector on Kubernetes to receive trace data and export it to Kafka. I found that the Collector's CPU utilization is high while its memory utilization is low.

Steps to reproduce

I allocated 1 CPU core and 2 GiB of memory (1C2G) to the Collector.

What did you expect to see?
I expect the Collector's CPU utilization to be lower, given that its memory utilization is very low.

Are there other ways to optimize this Collector? I want it to use less CPU.

What did you see instead?
After I sent data to this Collector, its resource monitoring looked as follows:
[screenshot: CPU and memory monitoring of the Collector pod]

What version did you use?
v0.95.0

What config did you use?

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-gd-config
  namespace: default
data:
  config.yaml: |-
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 2000
        spike_limit_mib: 400
      batch:
        send_batch_size: 500
        send_batch_max_size: 500
      resource:
        attributes:
          - key: from-collector
            value: gd-fat-k8s
            action: insert
    exporters:
      logging:
        verbosity: normal
      kafka:
        brokers:
          - xx.xx.xx.xx:9092
          - xx.xx.xx.xx:9092
          - xx.xx.xx.xx:9092
        topic: otlp_trace_fat
        partition_traces_by_id: true
        protocol_version: 1.0.0
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 10000
    
    extensions:
      pprof:
        endpoint: ":1777"
    service:
      extensions: [pprof]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, resource, batch]
          exporters: [logging, kafka]

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector-gd
  namespace: default
  labels:
    app: opentelemetry
    component: otel-collector-gd
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector-gd
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector-gd
    spec:
      containers:
        - name: otel-collector-gd
          image: otel/opentelemetry-collector-contrib:0.95.0
          resources:
            limits:
              cpu: 1000m
              memory: 2048Mi
          volumeMounts:
            - mountPath: /var/log
              name: varlog
              readOnly: true
            - mountPath: /var/lib/docker/containers
              name: varlibdockercontainers
              readOnly: true
            - mountPath: /etc/otelcol-contrib/config.yaml
              name: data
              subPath: config.yaml
              readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: data
          configMap:
            name: otel-collector-gd-config

---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-gd
  namespace: default
  labels:
    app: opentelemetry
    component: otel-collector-gd
spec:
  ports:
    - name: otlp-grpc
      port: 4317
      protocol: TCP
      targetPort: 4317
    - name: otlp-http
      port: 4318
      protocol: TCP
      targetPort: 4318
    - name: pprof
      port: 1777
      protocol: TCP
      targetPort: 1777
  selector:
    component: otel-collector-gd

Environment

Additional context
I used pprof to analyze the CPU usage.
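The profile was captured through the pprof extension configured above (port 1777), with a command along these lines (the pod address is a placeholder):

# capture a 300-second CPU profile from the Collector's pprof extension
go tool pprof -seconds 300 http://<collector-pod-ip>:1777/debug/pprof/profile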

File: otelcol-contrib
Type: cpu
Time: Dec 26, 2024 at 1:50pm (CST)
Duration: 300s, Total samples = 230.42s (76.81%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 82.53s, 35.82% of 230.42s total
Dropped 1059 nodes (cum <= 1.15s)
Showing top 10 nodes out of 261
      flat  flat%   sum%        cum   cum%
    14.37s  6.24%  6.24%     14.37s  6.24%  runtime/internal/syscall.Syscall6
    10.79s  4.68% 10.92%     13.18s  5.72%  compress/flate.(*decompressor).huffSym
    10.43s  4.53% 15.45%     20.19s  8.76%  runtime.scanobject
    10.29s  4.47% 19.91%     45.47s 19.73%  runtime.mallocgc
     8.19s  3.55% 23.47%      8.19s  3.55%  runtime.memclrNoHeapPointers
     7.17s  3.11% 26.58%      7.17s  3.11%  runtime.memmove
     7.03s  3.05% 29.63%      7.95s  3.45%  runtime.lock2
     5.52s  2.40% 32.02%     24.51s 10.64%  compress/flate.(*decompressor).huffmanBlock
     4.57s  1.98% 34.01%      4.75s  2.06%  runtime.unlock2
     4.17s  1.81% 35.82%      4.17s  1.81%  runtime.nextFreeFast (inline)
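To see which call paths account for the allocation and decompression cost, the interactive session can be continued with standard pprof commands such as the following (output omitted here):

(pprof) top -cum        (sort by cumulative time to surface parent call paths)
(pprof) peek huffSym    (show callers of the compress/flate decompression hot spot)
(pprof) web             (render the full call graph; requires graphviz)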

[screenshot: pprof CPU profile visualization]
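For context, the kind of tuning I am considering looks roughly like the sketch below. The knob names come from my reading of the kafka exporter and batch processor documentation, and the values are untested guesses rather than measured results; I may also drop the logging exporter from the traces pipeline.

processors:
  batch:
    send_batch_size: 500
    send_batch_max_size: 500
    timeout: 5s                      # flush less frequently (assumed knob; trades latency for CPU)
exporters:
  kafka:
    brokers:
      - xx.xx.xx.xx:9092
    topic: otlp_trace_fat
    partition_traces_by_id: false    # skip per-trace-ID partitioning work (guess)
    producer:
      compression: snappy            # lighter-weight compression, if the exporter supports this setting as I assume
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000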

xiaoyao2246 added the bug label Dec 26, 2024
JaredTan95 (Member) commented

hi @xiaoyao2246, please move this issue to the opentelemetry-collector-contrib repo.

bogdandrutu transferred this issue from open-telemetry/opentelemetry-collector on Dec 27, 2024
bogdandrutu (Member) commented

@JaredTan95 done
