
Loki retention stops working after a while #15479

Open
janxe opened this issue Dec 18, 2024 · 0 comments

Describe the bug
Hi, it seems that log retention for our Loki deployment stops working after we have redeployed Loki a few times, and all logs then appear to be stored forever.
I can't say for sure what triggers this, only that it has occurred at least 2-3 times now, each time after we redeployed Loki with a changed PVC size and other miscellaneous changes while reusing the same S3 buckets.

To Reproduce

  1. Deploy Loki via the Helm chart
  2. Delete the Loki deployment
  3. Redeploy Loki with different settings (different PVC sizes, replica counts, etc.) while reusing the same S3 buckets as the previous deployment

Expected behavior
Loki retention continues to be applied to logs stored by the later deployments, including the current one.

Environment:

  • Infrastructure: company Kubernetes cluster, local Hitachi S3 storage
  • Deployment tool: helm, argocd

Screenshots, Promtail config, or terminal output
Here is the configuration used, together with some Loki log errors we can see.
loki_index_20075 seems to be the newest index stored in the S3 bucket.
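(A side note for anyone triaging: with a 24h index `period`, the numeric suffix of a periodic index table should be the number of days elapsed since the Unix epoch, so it can be decoded to a date. A quick sketch under that assumption; the helper name is ours, not a Loki API:)

```python
from datetime import datetime, timedelta, timezone

def table_date(table_name: str, period_hours: int = 24) -> datetime:
    """Decode a periodic index table suffix (periods elapsed since the Unix epoch)."""
    n = int(table_name.rsplit("_", 1)[-1])
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(hours=n * period_hours)

print(table_date("loki_index_20075").date())  # 2024-12-18
```

That decodes to the day this issue was filed, which is consistent with it being the newest table.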

loki:
  podAnnotations:
    kyverno.io/inject-cacerts: enabled

  podSecurityContext:
    runAsNonRoot: true
    runAsGroup: 10001
    runAsUser: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault

  auth_enabled: true

  analytics:
    reporting_enabled: false

  compactor:
    retention_enabled: true
    delete_request_store: s3
    retention_delete_delay: 10m
  limits_config:
    retention_period: 49h
    max_streams_per_user: 100000

  frontend:
    max_outstanding_per_tenant: 4096

  commonConfig:
    ring:
      kvstore:
        store: memberlist

  storage:
    filesystem: null
    s3:
      endpoint: "${LOKI_ENDPOINT}"
      accessKeyId: "${LOKI_ACCESS_KEY_ID}"
      secretAccessKey: "${LOKI_SECRET_ACCESS_KEY}"
      s3ForcePathStyle: false
    bucketNames:
      chunks: loki-chunk-dev
      ruler: loki-ruler-dev
      admin: loki-admin-dev

  # tsdb and v13 are needed for Loki v3.0.0
  # https://grafana.com/docs/loki/latest/operations/storage/tsdb/
  schemaConfig:
    configs:
      - from: "2022-01-11"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v12
        store: boltdb-shipper

      - from: "2024-09-03"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v13
        store: tsdb

  # the TSDB index dispatches many more, but each individually smaller, requests.
  # We increase the pending request queue sizes to compensate.
  query_scheduler:
    max_outstanding_requests_per_tenant: 32768

  # Each `querier` component process runs a number of parallel workers to process queries simultaneously.
  querier:
    max_concurrent: 16
    multi_tenant_queries_enabled: true

# Tests need to be disabled in order to disable Canary
test:
  enabled: false

lokiCanary:
  enabled: false

gateway:
  replicas: 3
  resources:
    requests:
      memory: 25Mi
      cpu: 5m
  podSecurityContext:
    fsGroup: 101
    runAsGroup: 101
    runAsNonRoot: true
    runAsUser: 101
    seccompProfile:
      type: RuntimeDefault

write:
  replicas: 2
  resources:
    limits:
      memory: 1500Mi
      cpu: 300m
    requests:
      memory: 1500Mi
      cpu: 50m
  persistence:
    size: 3Gi
  affinity:
    podAntiAffinity:
      # this needs to be empty, otherwise it will conflict with preferred
      requiredDuringSchedulingIgnoredDuringExecution:

      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: write
                app.kubernetes.io/instance: loki
                app.kubernetes.io/name: loki
            topologyKey: kubernetes.io/hostname
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: LOKI_ENDPOINT
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_ENDPOINT
    - name: LOKI_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_ACCESS_KEY_ID
    - name: LOKI_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_SECRET_ACCESS_KEY

read:
  replicas: 2
  resources:
    limits:
      memory: 700Mi
      cpu: 400m
    requests:
      memory: 256Mi
      cpu: 100m
  persistence:
    size: 5Gi
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: LOKI_ENDPOINT
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_ENDPOINT
    - name: LOKI_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_ACCESS_KEY_ID
    - name: LOKI_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_SECRET_ACCESS_KEY

backend:
  replicas: 2
  resources:
    limits:
      memory: 1300Mi
      cpu: 600m
    requests:
      memory: 592Mi
      cpu: 50m
  persistence:
    size: 7Gi
    enableStatefulSetAutoDeletePVC: true
  extraArgs:
    - "-config.expand-env=true"
  extraEnv:
    - name: LOKI_ENDPOINT
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_ENDPOINT
    - name: LOKI_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_ACCESS_KEY_ID
    - name: LOKI_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: loki-block-storage
          key: LOKI_SECRET_ACCESS_KEY
  extraVolumes:
    - name: override-rules
      configMap:
        name: loki-tenant-override-rules
  extraVolumeMounts:
    - name: override-rules
      mountPath: /etc/loki/config/override/

sidecar:
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL
    readOnlyRootFilesystem: true

memcached:
  podSecurityContext:
    runAsNonRoot: true
    runAsGroup: 11211
    runAsUser: 11211
    seccompProfile:
      type: RuntimeDefault

resultsCache:
  enabled: true
  resources:
    limits:
      memory: 1331Mi
    requests:
      cpu: 20m
      memory: 1331Mi
  defaultValidity: 12h
  allocatedMemory: 1024

chunksCache:
  enabled: true
  resources:
    limits:
      memory: 2662Mi
    requests:
      cpu: 20m
      memory: 2662Mi
  defaultValidity: 0s
  allocatedMemory: 2048

rbac:
  namespaced: true

level=error ts=2024-12-18T15:00:46.160085038Z caller=compactor.go:571 msg="failed to apply retention" err="SerializationError: empty response payload\n\tstatus code: 204, request id: , host id: \ncaused by: EOF"
level=error ts=2024-12-18T15:00:46.159994556Z caller=compactor.go:662 msg="failed to compact files" table=loki_index_20075 err="SerializationError: empty response payload\n\tstatus code: 204, request id: , host id: \ncaused by: EOF"
level=info ts=2024-12-18T15:00:35.726949025Z caller=expiration.go:78 msg="overall smallest retention period 1734357635.726, default smallest retention period 1734357635.726"
level=error ts=2024-12-18T14:55:33.479896021Z caller=compactor.go:548 msg="failed to run compaction" err="SerializationError: empty response payload\n\tstatus code: 204, request id: , host id: \ncaused by: EOF"
level=error ts=2024-12-18T14:55:33.479838188Z caller=compactor.go:662 msg="failed to compact files" table=loki_index_20075 err="SerializationError: empty response payload\n\tstatus code: 204, request id: , host id: \ncaused by: EOF"
level=error ts=2024-12-18T14:45:35.726608822Z caller=compactor.go:561 msg="failed to apply retention" err="SerializationError: empty response payload\n\tstatus code: 204, request id: , host id: \ncaused by: EOF"
level=error ts=2024-12-18T14:45:35.72653095Z caller=compactor.go:662 msg="failed to compact files" table=loki_index_20075 err="SerializationError: empty response payload\n\tstatus code: 204, request id: , host id: \ncaused by: EOF"
level=info ts=2024-12-18T14:45:24.901131705Z caller=expiration.go:78 msg="overall smallest retention period 1734356724.901, default smallest retention period 1734356724.901"
level=info ts=2024-12-18T14:45:24.90106913Z caller=compactor.go:592 msg="compactor started"
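For anyone grepping these logs at scale: the lines are logfmt, so the failing table and error can be pulled out programmatically. A minimal sketch (deliberately not a complete logfmt parser) assuming the shape shown above:

```python
import re

# key=value pairs; values are either bare tokens or double-quoted strings
# with backslash escapes (as in the err="..." fields above).
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

def parse_logfmt(line: str) -> dict:
    fields = {}
    for key, val in PAIR.findall(line):
        if val.startswith('"') and val.endswith('"'):
            # Strip quotes and expand \n, \t escapes.
            val = val[1:-1].encode().decode("unicode_escape")
        fields[key] = val
    return fields

rec = parse_logfmt(
    'level=error ts=2024-12-18T15:00:46.159994556Z caller=compactor.go:662 '
    'msg="failed to compact files" table=loki_index_20075 '
    'err="SerializationError: empty response payload\\n\\tstatus code: 204, '
    'request id: , host id: \\ncaused by: EOF"'
)
print(rec["table"], "->", rec["msg"])  # loki_index_20075 -> failed to compact files
```

Filtering on `rec["level"] == "error"` and grouping by `rec["table"]` makes it easy to see whether the failures are pinned to a single index table, as they appear to be here.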