As part of #68 I investigated an issue in the containerd restart routine. When the node-installer installs a runtime and restarts containerd, the corresponding pod terminates with status Unknown.
Overview:
kubectl get job
NAME COMPLETIONS DURATION AGE
kwasm-worker-spin-v2-install 1/1 28s 21m
kubectl get po
NAME READY STATUS RESTARTS AGE
kwasm-worker-spin-v2-install-n82d9 0/1 Unknown 0 7m25s
kwasm-worker-spin-v2-install-rq78d 0/1 Completed 0 7m3s
Logs of Pod with status Unknown
kubectl logs kwasm-worker-spin-v2-install-n82d9 -c downloader
2024-05-20T20:49:40 INFO start downloading shim from https://github.com/spinkube/containerd-shim-spin/releases/download/v0.14.1/containerd-shim-spin-v2-linux-aarch64.tar.gz...
2024-05-20T20:49:42 INFO download successful:
total 40M
drwxrwxrwx 1 root root 46 May 20 20:49 .
drwxr-xr-x 1 root root 48 May 20 20:49 ..
-rwxr-xr-x 1 1001 127 39.6M May 8 17:13 containerd-shim-spin-v2
kubectl logs kwasm-worker-spin-v2-install-n82d9 -c provisioner
2024/05/20 20:49:46 INFO shim installed shim=spin-v2 path=/opt/kwasm/bin/containerd-shim-spin-v2 new-version=true
2024/05/20 20:49:46 INFO shim configured shim=spin-v2 path=/etc/containerd/config.toml
2024/05/20 20:49:46 INFO restarting containerd
Logs of Pod with status Completed
kubectl logs kwasm-worker-spin-v2-install-rq78d -c downloader
2024-05-20T20:49:57 INFO start downloading shim from https://github.com/spinkube/containerd-shim-spin/releases/download/v0.14.1/containerd-shim-spin-v2-linux-aarch64.tar.gz...
2024-05-20T20:49:59 INFO download successful:
total 40M
drwxrwxrwx 1 root root 46 May 20 20:49 .
drwxr-xr-x 1 root root 48 May 20 20:49 ..
-rwxr-xr-x 1 1001 127 39.6M May 8 17:13 containerd-shim-spin-v2
kubectl logs kwasm-worker-spin-v2-install-rq78d -c provisioner
2024/05/20 20:50:00 INFO shim installed shim=spin-v2 path=/opt/kwasm/bin/containerd-shim-spin-v2 new-version=false
2024/05/20 20:50:00 INFO runtime config already exists, skipping runtime=spin-v2
2024/05/20 20:50:00 INFO shim configured shim=spin-v2 path=/etc/containerd/config.toml
2024/05/20 20:50:00 INFO nothing changed, nothing more to do
The Completed pod only gets scheduled in the first place because the first one did not terminate successfully, even though the actual job (rewriting containerd config and removing the binary) was already done. As a result, the second run of the job has nothing left to do.
Description of Pod with Status Unknown
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Mon, 20 May 2024 22:49:46 +0200
Finished: Mon, 20 May 2024 22:49:48 +0200
kubectl describe po kwasm-worker-spin-v2-install-n82d9
Name: kwasm-worker-spin-v2-install-n82d9
Namespace: default
Priority: 0
Service Account: default
Node: kwasm-worker/192.168.228.5
Start Time: Mon, 20 May 2024 22:49:35 +0200
Labels: batch.kubernetes.io/controller-uid=7878f58f-1b99-4e81-99f1-7bd5b7bf54ac
batch.kubernetes.io/job-name=kwasm-worker-spin-v2-install
controller-uid=7878f58f-1b99-4e81-99f1-7bd5b7bf54ac
job-name=kwasm-worker-spin-v2-install
Annotations: <none>
Status: Failed
IP: 10.244.2.2
IPs:
IP: 10.244.2.2
Controlled By: Job/kwasm-worker-spin-v2-install
Init Containers:
downloader:
Container ID: containerd://7f63983e513efa392e3cc684bf53d2553aeb898b4bfe08fb22229fbae83406cb
Image: ghcr.io/spinkube/shim-downloader:latest-feat-add_shim_downloader
Image ID: ghcr.io/spinkube/shim-downloader@sha256:719f54c518fc0fc65abbe8ac27978ea188d13faee23530544faf9d622aa2be92
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 20 May 2024 22:49:40 +0200
Finished: Mon, 20 May 2024 22:49:42 +0200
Ready: True
Restart Count: 0
Environment:
SHIM_NAME: spin-v2
SHIM_LOCATION: https://github.com/spinkube/containerd-shim-spin/releases/download/v0.14.1/containerd-shim-spin-v2-linux-aarch64.tar.gz
Mounts:
/assets from shim-download (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wnr2x (ro)
Containers:
provisioner:
Container ID: containerd://92dd4c994b2fc95d269b5de630c00f55fff233d04d1d649a6b69ce512936278b
Image: ghcr.io/spinkube/node-installer:latest-feat-add_shim_downloader
Image ID: ghcr.io/spinkube/node-installer@sha256:fcbfa4d8197d3de3b9953219af6a8784f23abf7d798150b2c2a606daaeebe6df
Port: <none>
Host Port: <none>
Args:
install
-H
/mnt/node-root
-r
spin-v2
State: Terminated
Reason: Unknown
Exit Code: 255
Started: Mon, 20 May 2024 22:49:46 +0200
Finished: Mon, 20 May 2024 22:49:47 +0200
Ready: False
Restart Count: 0
Environment:
HOST_ROOT: /mnt/node-root
SHIM_FETCH_STRATEGY: /mnt/node-root
Mounts:
/assets from shim-download (rw)
/mnt/node-root from root-mount (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-wnr2x (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
shim-download:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
root-mount:
Type: HostPath (bare host directory volume)
Path: /
HostPathType:
kube-api-access-wnr2x:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 25m kubelet Pulling image "ghcr.io/spinkube/shim-downloader:latest-feat-add_shim_downloader"
Normal Pulled 25m kubelet Successfully pulled image "ghcr.io/spinkube/shim-downloader:latest-feat-add_shim_downloader" in 4.108s (4.108s including waiting)
Normal Created 25m kubelet Created container downloader
Normal Started 25m kubelet Started container downloader
Normal Pulling 25m kubelet Pulling image "ghcr.io/spinkube/node-installer:latest-feat-add_shim_downloader"
Normal Pulled 25m kubelet Successfully pulled image "ghcr.io/spinkube/node-installer:latest-feat-add_shim_downloader" in 3.105s (3.105s including waiting)
Normal Created 25m kubelet Created container provisioner
Normal Started 25m kubelet Started container provisioner
Entire resource of Job (e.g. for recreation of the bug)
The install pods of kwasm do not terminate with status Unknown, but with Completed. The main difference is that kwasm's install script uses the system scheduler's restart functionality.
Seeing similar behavior in the uninstall jobs (when deleting a shim). The first pod deletes the shim and restarts containerd but ends with status Unknown. The second and subsequent pods then enter a failure loop, failing with e.g.:
$ k -n rcm logs kind-worker1-spin-v2-uninstall-6m57l
2024/11/06 22:39:35 INFO uninstall called shim=spin-v2
2024/11/06 22:39:35 ERROR failed to uninstall error="failed to delete shim '/opt/kwasm/bin/spin-v2': shim spin-v2 not installed"
While the goal of installing or uninstalling the shim is achieved, this behavior is not desirable and calls for a solution.