Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vLLM support to DocSum Helm chart #649

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

eero-t
Copy link
Contributor

@eero-t eero-t commented Dec 18, 2024

Description

This was split from Helm vLLM support added in #610. It adds vLLM support to DocSum Helm chart.

(Similarly to how it's already done for ChatQnA app + Agent component, there are tgi.enabled & vllm.enabled flags for selecting which LLM will be used.)

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

opea/llm-docsum-vllm:latest image is currently missing from CI & DockerHub registries:
opea-project/GenAIComps#961

(Although corresponding opea/llm-docsum-tgi:latest image for TGI, and opea/llm-vllm:latest vLLM text-generation images already exist.)

Tests

Manual testing with opea/llm-docsum-vllm:latest image built locally.

@eero-t eero-t marked this pull request as draft December 18, 2024 18:58
@eero-t
Copy link
Contributor Author

eero-t commented Dec 18, 2024

Setting as draft because the required image is still missing from DockerHub, and this needs retesting after currently pending DocSum changes for Comps & Examples repos have completed.

@eero-t
Copy link
Contributor Author

eero-t commented Dec 20, 2024

While CI "docsum, gaudi, ci-gaudi-vllm-values" test fails as expected, due to OPEA missing llm-docsum-vllm image...

There seems to be a bug in component unrelated to this PR, as also run "llm-uservice, xeon, ci-faqgen-values, common" CI test fails to a package missing from image:

[pod/llm-uservice20241218190439-5b9b7b79fd-r65l9/llm-uservice20241218190439]
...
   File "/home/user/comps/llms/faq-generation/tgi/langchain/llm.py", line 77, in stream_generator
     from langserve.serialization import WellKnownLCSerializer
   File "/home/user/.local/lib/python3.11/site-packages/langserve/__init__.py", line 8, in <module>
     from langserve.client import RemoteRunnable
   File "/home/user/.local/lib/python3.11/site-packages/langserve/client.py", line 24, in <module>
     from httpx._types import AuthTypes, CertTypes, CookieTypes, HeaderTypes, VerifyTypes
 ImportError: cannot import name 'VerifyTypes' from 'httpx._types' (/home/user/.local/lib/python3.11/site-packages/httpx/_types.py)

=> requirements.txt for llm-faqgen-tgi:latest image generation is not up to date in Comps repo?

@lianhao?

@eero-t
Copy link
Contributor Author

eero-t commented Dec 30, 2024

Rebased to main + dropped "draft" status, as the required OPEA image is now available in DockerHub!

@eero-t
Copy link
Contributor Author

eero-t commented Dec 30, 2024

CI still fails.

PR #659 fixes DocSum issues with updates in other repos, includes the same model ID workaround as this one, and passed CI => better to merge that first & rebase this?


"docsum, gaudi, ci-gaudi-vllm-values" fails because CI registry is out of date. Although required image has been at DockerHub for 4 days [1], fetching it still fails:
Normal BackOff 4m47s (x18 over 9m42s) kubelet Back-off pulling image "100.83.111.229:5000/opea/llm-docsum-vllm:latest"
[1] https://hub.docker.com/r/opea/llm-docsum-vllm/tags

"docsum, xeon, ci-values" fails to connection failure:

[pod/docsum20241230125518-5599c984c6-dt5fc/docsum20241230125518] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

"docsum, gaudi, ci-gaudi-tgi-values" fails to similar CI issue:

[pod/docsum20241230130614-d598c6674-qqhjf/docsum20241230130614] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

Signed-off-by: Eero Tamminen <[email protected]>
@eero-t
Copy link
Contributor Author

eero-t commented Jan 2, 2025

Rebased to main to get CI tests passing, and dropped already merged fix. However, CI is still broken.

@daisy-ycguo CI "docsum, gaudi, ci-gaudi-vllm-values" still fails to CI registry not being up to date with DockerHub: https://hub.docker.com/r/opea/llm-docsum-vllm/tags
Failed to pull image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": rpc error: code = NotFound desc = failed to pull and unpack image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": failed to resolve reference "100.83.111.229:5000/opea/llm-docsum-vllm:latest": 100.83.111.229:5000/opea/llm-docsum-vllm:latest: not found

@lianhao, CI "docsum, gaudi, ci-gaudi-tgi-values" test fails now to test bug?
[pod/docsum20250102125737-llm-uservice-7d6f8d968f-v9b4w/docsum20250102125737] | huggingface_hub.errors.ValidationError: Input validation error: 'inputs' tokens + 'max_new_tokens' must be <= 4096. Given: 4095 'inputs' tokens and 17 'max_new_tokens'

Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eero-t I found pending PR opea-project/GenAIComps#1101 will make a big change, there will be no more llm-docsum-llm image any more, a single llm-docsum image will be able to talk to both tgi and vllm, so maybe we should wait until that PR to be merged first.

tgi:
enabled: true
LLM_MODEL_ID: Intel/neural-chat-7b-v3-3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAX_INPUT_LENGTH and MAX_TOTAL_TOKENS settings for tgi is missing during the rebase. We need to set them otherwise, CI test will fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants