[TDX] Added basic documentation to enable TDX in ChatQnA #1212
base: main
68149c6 to e47902a
### Kubelet Configuration

To run a large, resource-intensive application such as OPEA, the cluster administrator must increase the kubelet timeout for container creation; otherwise, pod creation may fail with the timeout error `Context deadline exceeded`.
This is required because container creation can take a long time due to the size of the pod images and the need to download the AI models.
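For illustration, the kubelet field that most likely governs this timeout is `runtimeRequestTimeout` in the kubelet configuration file, which defaults to `2m`. The sketch below shows one way to set it. The helper name `set_runtime_request_timeout`, the config path `/var/lib/kubelet/config.yaml`, and the `30m` value are assumptions made for the example, not values taken from this PR.

```bash
#!/usr/bin/env bash
# Sketch: set runtimeRequestTimeout in a KubeletConfiguration file.
# The helper name, file path, and timeout value are assumptions for illustration.
set_runtime_request_timeout() {
  local config_file="$1" timeout="$2"
  if grep -q '^runtimeRequestTimeout:' "$config_file"; then
    # Replace the existing top-level setting, writing through a temp file.
    sed "s/^runtimeRequestTimeout:.*/runtimeRequestTimeout: ${timeout}/" \
      "$config_file" > "${config_file}.tmp" && mv "${config_file}.tmp" "$config_file"
  else
    # Append the setting when it is not present yet.
    printf 'runtimeRequestTimeout: %s\n' "$timeout" >> "$config_file"
  fi
}

# Example (run as root on the node, then restart the kubelet):
#   set_runtime_request_timeout /var/lib/kubelet/config.yaml 30m
#   systemctl restart kubelet
```

The function is idempotent: running it again with a different value rewrites the existing line instead of adding a second one.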
Is this timeout change generally required for any k8s deployment? If so should this be added to the main k8s readme?
This is generally required for all use cases where container creation takes a long time. When TDX is involved, container creation time increases so much that it usually exceeds the default of 2 minutes. It is described in the k8s docs: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
Is that just for peer pods, or does running CoCo on the host also often exceed 2 minutes?
> [!NOTE]
> Running TDX-protected services requires the user to define the pod's resource requests (cpu, memory).
>
> Because TDX lacks a hotplugging feature, the assigned resources cannot be changed after the pod is scheduled, and they are not shared with any other pod.
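As a rough sketch of what defining resource requests looks like, the helper below prints a minimal Pod manifest with explicit `cpu` and `memory` requests. The function name `render_pod_with_requests`, the pod name, the image, and the values are all placeholders chosen for the example. The runtime class a TDX pod would also need is left out because its name is not shown in this excerpt.

```bash
#!/usr/bin/env bash
# Sketch: print a minimal Pod manifest with explicit resource requests.
# Pod name, image, and cpu/memory values are placeholders (assumptions).
render_pod_with_requests() {
  local name="$1" image="$2" cpu="$3" memory="$4"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  containers:
    - name: ${name}
      image: ${image}
      resources:
        requests:
          cpu: "${cpu}"
          memory: "${memory}"
EOF
}

# Example:
#   render_pod_with_requests demo example.com/demo:latest 4 8Gi | kubectl apply -f -
```

Since the note says the resources cannot be changed after scheduling, the values should be sized for the service's peak needs before the pod is created.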
check hotplugging (TEE-specific? or kata-specific?)
>
> After the kubelet restarts, some of the internal pods in the `kube-system` namespace might be reloaded automatically.

All kubelet configuration options can be found [here](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).
remove
```bash
# Find the name of the pod whose name contains 'chatqna-tgi'
POD_NAME=$(kubectl get pods | grep 'chatqna-tgi' | awk '{print $1}')
# Print the runtime class assigned to that pod
kubectl get pod "$POD_NAME" -o jsonpath='{.spec.runtimeClassName}'
```

In the output you should see:
Just show that it is running
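One possible way to act on this suggestion is to check the pod's phase rather than its runtime class. The sketch below is only an illustration: the helper name `pod_is_running` is hypothetical, and the pod name is supplied by the caller. It relies on the standard `.status.phase` Pod field.

```bash
#!/usr/bin/env bash
# Sketch: report whether a pod has reached the Running phase.
# The helper name is hypothetical; the pod name is supplied by the caller.
pod_is_running() {
  local pod_name="$1" phase
  # .status.phase is a standard Pod field (Pending, Running, Succeeded, Failed, Unknown).
  phase=$(kubectl get pod "$pod_name" -o jsonpath='{.status.phase}')
  [ "$phase" = "Running" ]
}

# Example:
#   pod_is_running "$POD_NAME" && echo "pod is running"
```

The function returns a non-zero status for any phase other than `Running`, so it can be used directly in a shell condition.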
- added README_tdx.md
- described steps to run ChatQnA using helm and GMC

Signed-off-by: Jakub Ledworowski <[email protected]>

- Removed deployment option with helm
- Added sample chatqna_tdx.yaml
- Generalized description but left ChatQnA as an example

Signed-off-by: Jakub Ledworowski <[email protected]>
Signed-off-by: Jakub Ledworowski <[email protected]>
12e4c94 to 608e869
Signed-off-by: Jakub Ledworowski <[email protected]>
Description
Confidential computing for cloud-based AI focuses on protecting sensitive data and computations from unauthorized access and tampering. It uses security technologies such as hardware-based isolation and encryption to create secure environments where data and AI models can be processed safely. This ensures that even cloud service providers cannot access the data, providing a higher level of privacy and security. By leveraging confidential computing, organizations can use AI in the cloud for tasks that involve sensitive information, such as healthcare data analysis or financial predictions, while complying with strict data protection regulations.
This change introduces a guide on protecting chosen microservices with Intel TDX technology:
- README_tdx.md
- chatqna_tdx.yaml, which has all microservices configured with TDX protection and default settings

Issues
n/a
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
n/a
Tests
Manual tests with a sample request, with TDX enabled in all ChatQnA services: dataprep, embedding, llm, redis, reranking, retriever, tei, teirerank, tgi