
Add LiteLLM router to FastAPI container #14

Merged: 4 commits from feature/litellm-router into main on Jun 3, 2024

Conversation

petermuller (Contributor):

This set of changes adds a dependency on LiteLLM in our FastAPI container so that we may pass requests from the LISA Serve ALB directly to LiteLLM. This enables us to handle any form of OpenAI API spec implementation so long as LiteLLM also supports it.

Summary of changes:

  1. Add LiteLLM as a dependency
  2. Add a /v2/serve route to the FastAPI container for the new implementation (a rough sketch of such a passthrough route follows this list)
  3. Update the schema file to embed the entire LiteLLM config into the LISA config YAML file
  4. Add a LiteLLM config snippet to example_config.yaml
  5. Add runtime logic for registering LISA-served models with LiteLLM on container start
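The passthrough route itself is not shown in this diff, so the following is only a minimal sketch of the idea, assuming LiteLLM runs as a local proxy inside the container; the LITELLM_BASE address and the proxy function are illustrative, not the actual implementation:

# Hypothetical sketch of a /v2/serve passthrough route; the real
# litellm_passthrough router in this PR may differ.
from fastapi import APIRouter, Request, Response
import httpx

router = APIRouter()

# Assumed address of the LiteLLM proxy running inside the container.
LITELLM_BASE = "http://localhost:4000"

@router.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request) -> Response:
    """Forward the incoming OpenAI-spec request to LiteLLM unchanged."""
    async with httpx.AsyncClient(base_url=LITELLM_BASE) as client:
        upstream = await client.request(
            request.method,
            f"/{path}",
            content=await request.body(),
            headers={"Content-Type": request.headers.get("Content-Type", "application/json")},
        )
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("Content-Type"),
    )

Streaming responses are omitted here for brevity; a real passthrough would need to stream chunks back for requests like the "stream": true example below.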

Testing:

  1. Deployed a personal stack using the TGI and TEI container models from the example config
  2. Validated that I could perform OpenAI-spec operations against my backend ALB and received intelligible responses, for example:
curl -H 'Api-Key: myRedactedKey' -X GET https://myredacteddomain/v2/serve/models
curl -H 'Api-Key: myRedactedKey' -X POST https://myredacteddomain/v2/serve/chat/completions -d '{"model":"mistralai/Mistral-7B-Instruct-v0.2","messages":[{"role":"user","content":"You are a helpful AI assistant who responds to user requests in a concise and accurate manner. All of the following is a conversation between you and the user, and you respond helpfully and thoughtfully to the user. Your responses should be as short as possible."},{"role":"assistant","content":"Understood."},{"role":"user","content":"Suggest a programming language and a project to complete in it."}], "stream": true, "max_tokens": 1500}'
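Because /v2/serve follows the OpenAI API spec, the same operations should also work from a standard OpenAI Python client. The snippet below is a sketch under that assumption; the domain and key are the same redacted placeholders as in the curl calls, and the Api-Key default header mirrors them:

# Illustrative only: point an OpenAI client at the LISA Serve ALB's /v2/serve route.
# Domain and key are placeholders matching the redacted curl examples above.
from openai import OpenAI

client = OpenAI(
    base_url="https://myredacteddomain/v2/serve",
    api_key="unused",  # auth is carried in the Api-Key header instead
    default_headers={"Api-Key": "myRedactedKey"},
)

print(client.models.list())  # equivalent to the GET /v2/serve/models call

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Suggest a programming language and a project to complete in it."}],
    max_tokens=1500,
)
print(completion.choices[0].message.content)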

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@petermuller petermuller requested a review from KyleLilly May 29, 2024 01:43
@petermuller petermuller self-assigned this May 29, 2024
@petermuller (Contributor, Author): I'll add a new commit to fix the failing tests.

lib/schema.ts (resolved review comments)
@@ -30,6 +31,9 @@
router.include_router(models.router, prefix="/v1", tags=["models"], dependencies=[Depends(security)])
router.include_router(embeddings.router, prefix="/v1", tags=["embeddings"], dependencies=[Depends(security)])
router.include_router(generation.router, prefix="/v1", tags=["generation"], dependencies=[Depends(security)])
router.include_router(
    litellm_passthrough.router, prefix="/v2/serve", tags=["litellm_passthrough"], dependencies=[Depends(security)]
)

A reviewer (Contributor):

I'm assuming this is /v2/serve because we're going to move the API GW stuff into /v2/ as well?


@petermuller (Contributor, Author):

Yes, that is correct. I'm also open to changing the name, but I wanted to avoid the situation of calling it "model" and having a URL like the following when using it directly for the OpenAI list models call:

curl -X GET https://mydomain/v2/model/models

lib/serve/rest-api/src/requirements.txt (resolved review comment, outdated)
lib/serve/rest-api/src/entrypoint.sh (resolved review comment)
These changes address pull request feedback by adding comments for inter-file dependencies and updating the README file. This also modifies the LiteLLM configuration to use the user-provided modelId if it exists, otherwise falling back to the model name. This allows users to serve the same model and weights from two different containers.
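The modelId fallback described in that commit message might look roughly like the following at container start. This is only a sketch; the field names (modelId, modelName, endpointUrl) and the litellm_params values are assumptions rather than the actual LISA schema:

# Sketch of merging LISA-served models into the user-provided LiteLLM config
# at container start; field names here are assumptions, not the real schema.
from typing import Any


def merge_lisa_models(litellm_config: dict[str, Any], lisa_models: list[dict[str, Any]]) -> dict[str, Any]:
    """Append a LiteLLM model_list entry for every LISA-served model."""
    config = dict(litellm_config)
    model_list = list(config.get("model_list", []))
    for model in lisa_models:
        # Prefer the user-provided modelId so the same model and weights can be
        # served from two different containers under distinct names.
        name = model.get("modelId") or model["modelName"]
        model_list.append({
            "model_name": name,
            "litellm_params": {
                "model": f"openai/{model['modelName']}",  # assumes an OpenAI-compatible endpoint
                "api_base": model["endpointUrl"],
            },
        })
    config["model_list"] = model_list
    return config
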
@petermuller petermuller requested a review from estohlmann May 31, 2024 18:53
@@ -30,6 +31,9 @@
router.include_router(models.router, prefix="/v1", tags=["models"], dependencies=[Depends(security)])

A reviewer (Member):

Are we going to be marking the v1 endpoints as deprecated?


@petermuller (Contributor, Author):

There's not going to be a lot of time between them being deprecated and us releasing the next revision, so I think for now we can just leave them as-is.

@petermuller petermuller merged commit 646e0fa into main Jun 3, 2024
2 checks passed
@petermuller petermuller deleted the feature/litellm-router branch June 3, 2024 18:46