Deployment of Functions in vHive failing #539
Comments
Hi @aditya2803, indeed, vHive didn't have the firecracker-containerd folks' fix. We are about to merge a new version of vHive/Firecracker snapshots with #465. The code in that PR has a more recent firecracker-containerd binary, which is already tested, but the docs are not updated yet. You can use that branch before the PR is merged. |
Hi @ustiugov thanks for your suggestion. I cloned the PR branch and rebuilt the stack. However, the functions are still not getting deployed properly. Below is the error I get:
Output of
Note that I am using the #481 fix in my local code, as suggested by you. Another observation is that running the script with the 'stock-only' option results in proper deployment of the functions. It is with firecracker-containerd (default) that the issue comes up. Services in case of using stock-only:
|
@aditya2803 I cannot say much without lower-level logs from the Firecracker setup (vHive, containerd, firecracker-containerd). The vHive CRI test worked in that branch. Try deploying a new cluster on a fresh node. Also, the YAML files of the workloads do not suit the stock-only setup. You need to use YAML files in the conventional Knative format (you can take them from their website). |
@ustiugov Here are the logs: vhive.stdout logs
containerd.stderr logs
firecracker.stderr logs
|
the logs don't show any problems. what does |
Hi @ustiugov, the output of The output of
|
|
Yes, I presume that's because there are no ready revisions. But I am not sure why that is happening. Output of
Output of
Output of
|
after you started using the other branch, have you started with a new clean node or kept using the old one? |
I have been using the old node. But I have cleared all previous files (starting with an empty filesystem), and cloned the new branch, then started the process. |
I suggest using a fresh node |
Sure, I'll try that. That may take a few days, however. Is there a way to clean up the current node so that I can use it with the new branch? |
Hi @ustiugov, I tried using a fresh AWS EC2 instance running AMD and Ubuntu 20.04 for this. I used the new branch (#465) and also applied the change from #481 locally. However, I ran into exactly the same issue once again (same output for vhive.stdout
firecracker.stderr
Let me know if you observe something, or need more detailed logs. Note |
Hi @ustiugov, I managed to get the problem fixed by setting up on a new machine, enabling KVM, and ensuring that this script worked okay. The functions are now getting deployed properly. However, I am having an issue with the Istio setup. This is similar to #475.
Output of
Output of
Output of
Output of
Istio Error during set-up
|
@aditya2803 ok, pre-requisites is a good catch. I don't quite get how functions can be deployed properly without istio installation being successful. Can you collect a complete log of the bash scripts that set up the cluster? |
Here it is:
Also, for my understanding, isn't Istio just used for serving the function endpoints? Isn't the deployment of the functions independent of it? I understand why function invocation won't work without it, but I am not clear about the deployment part. |
Debugging function deployment in a failed knative cluster is not a good strategy. Let us focus on Istio first. Please provide |
Sure. I tried running the cleanup and again starting the cluster a couple of times, so the logs are appended with that. Sorry about it.
|
I suggest to clean up the node again with Please crop the logs in future and attach full logs as files if necessary, otherwise the issue thread becomes unmanageably long. |
Hi @ustiugov, apologies for the full log files, I'll keep it in mind next time onward. After running the clean* script, and redeploying the cluster, istio suddenly was deployed perfectly :) I did try this a few hours earlier, but somehow did not work that time. Anyways, all pods are running okay now. Also, functions are getting deployed and invoked normally now, and I am getting the final output in the Thanks a lot for your support. Also, shall I submit a PR for the changes in the set-up guide, including the KVM check-up etc ? |
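For anyone following along, the "all pods are running" check mentioned above can be automated. Below is a minimal sketch, not part of vHive: it assumes `kubectl` is on the PATH and configured for the cluster, and the helper names `unhealthy_pods` and `fetch_pods` are made up for illustration. It only looks at the pod phase, so a pod that is Running but not yet Ready would still pass.

```python
import json
import subprocess


def unhealthy_pods(pod_list):
    """Return (namespace, name, phase) for pods that are not Running or Succeeded.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    Hypothetical helper, not part of vHive or Kubernetes tooling.
    """
    bad = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            meta = pod.get("metadata", {})
            bad.append((meta.get("namespace", ""), meta.get("name", ""), phase))
    return bad


def fetch_pods():
    """Ask kubectl for all pods in all namespaces (assumes kubectl is installed)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# On a cluster node one might run:
#     for ns, name, phase in unhealthy_pods(fetch_pods()):
#         print(f"{ns}/{name}: {phase}")
```

An empty result only means that no pod is stuck in Pending, Failed, or Unknown; it does not replace looking at the Istio or Knative logs.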
@aditya2803 glad to hear! we always welcome improvements from the community 👍 please close the Issue if it's resolved |
Description
I am trying to set up vHive on a single-node cluster and get it working by deploying and then invoking the functions, as described in the guide here. I am able to follow the steps manually, and all the Kubernetes pods are running as desired. However, when deploying functions using this link, I ran into some errors.
System Configuration
lscpu output:
cat /etc/os-release output:
Logs
vHive logs:
(I get the same issue as #476 initially. I then used the solution proposed on the ticket. Above logs are post application of the solution.)
Notes
There is a similar issue mentioned here. This seems to be a firecracker-containerd issue for non-Intel vendors, which they seem to have fixed later (as per the issue). I am not sure whether the firecracker-containerd binary used in vHive is the latest one. When I clone the latest firecracker-containerd repo, install it, and replace the /vhive/bin/firecracker-containerd binary with the one I built, the vHive error log gets reduced to:
I have also gone through #525 and have access to /dev/kvm. Also, I am running on a bare-metal x86_64 AMD server running Ubuntu 20.04.
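As a rough way to double-check the KVM prerequisite, here is a minimal sketch. It is not a vHive script, and the function names are hypothetical. It assumes an x86 Linux host, where `/proc/cpuinfo` lists `vmx` (Intel VT-x) or `svm` (AMD-V) among the CPU flags, and it tests whether the current user can read and write `/dev/kvm`.

```python
import os


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text (x86 layout)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def virtualization_extension(flags):
    """Name the hardware virtualization extension advertised by the CPU, if any."""
    if "vmx" in flags:
        return "Intel VT-x (vmx)"
    if "svm" in flags:
        return "AMD-V (svm)"
    return None


def kvm_device_usable(path="/dev/kvm"):
    """True if the KVM device node exists and is readable and writable by this user."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def check_kvm(cpuinfo_path="/proc/cpuinfo", kvm_path="/dev/kvm"):
    """Collect both checks; a missing cpuinfo file is treated as 'no flags found'."""
    try:
        with open(cpuinfo_path) as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()
    return {
        "extension": virtualization_extension(flags),
        "kvm_device": kvm_device_usable(kvm_path),
    }


if __name__ == "__main__":
    result = check_kvm()
    print("virtualization extension:", result["extension"] or "none found")
    print("/dev/kvm usable:", result["kvm_device"])
```

Passing both checks does not guarantee that Firecracker microVMs will start, but a failure of either one is a strong hint that KVM needs to be enabled or that permissions on `/dev/kvm` need fixing first.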
Expected Behavior
Functions should be deployed normally.
Steps to reproduce
Simply follow the start-up guide provided to set up a one-node cluster and then run the deployer.