
Add LiteLLM router to FastAPI container #14

Merged: 4 commits from feature/litellm-router into main on Jun 3, 2024

Conversation

petermuller (Contributor):

This set of changes adds a dependency on LiteLLM in our FastAPI container so that we may pass requests from the LISA Serve ALB directly to LiteLLM. This enables us to handle any form of OpenAI API spec implementation so long as LiteLLM also supports it.

Summary of changes:

  1. Add LiteLLM as a dependency
  2. Add a /v2/serve route to the FastAPI container for the new implementation (a rough sketch of such a passthrough route follows this list)
  3. Update the schema file to embed the entire LiteLLM config into the LISA config YAML file
  4. Add a LiteLLM config snippet to example_config.yaml
  5. Add runtime logic for registering LISA-served models with LiteLLM on container start
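The passthrough route itself is not shown in this diff, so the following is only a minimal sketch of the idea, assuming LiteLLM runs as a local proxy inside the container; the LITELLM_BASE address and the proxy function are illustrative, not the actual implementation:

# Hypothetical sketch of a /v2/serve passthrough route; the real
# litellm_passthrough router in this PR may differ.
from fastapi import APIRouter, Request, Response
import httpx

router = APIRouter()

# Assumed address of the LiteLLM proxy running inside the container.
LITELLM_BASE = "http://localhost:4000"

@router.api_route("/{path:path}", methods=["GET", "POST"])
async def proxy(path: str, request: Request) -> Response:
    """Forward the incoming OpenAI-spec request to LiteLLM unchanged."""
    async with httpx.AsyncClient(base_url=LITELLM_BASE) as client:
        upstream = await client.request(
            request.method,
            f"/{path}",
            content=await request.body(),
            headers={"Content-Type": request.headers.get("Content-Type", "application/json")},
        )
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("Content-Type"),
    )

Streaming responses are omitted here for brevity; a real passthrough would need to stream chunks back for requests like the "stream": true example below.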

Testing:

  1. Deployed a personal stack using the TGI and TEI container models from the example config
  2. Validated that I could perform OpenAI-spec operations against my backend ALB and received intelligible responses, for example:
curl -H 'Api-Key: myRedactedKey' -X GET https://myredacteddomain/v2/serve/models
curl -H 'Api-Key: myRedactedKey' -X POST https://myredacteddomain/v2/serve/chat/completions -d '{"model":"mistralai/Mistral-7B-Instruct-v0.2","messages":[{"role":"user","content":"You are a helpful AI assistant who responds to user requests in a concise and accurate manner. All of the following is a conversation between you and the user, and you respond helpfully and thoughtfully to the user. Your responses should be as short as possible."},{"role":"assistant","content":"Understood."},{"role":"user","content":"Suggest a programming language and a project to complete in it."}], "stream": true, "max_tokens": 1500}'
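Because /v2/serve follows the OpenAI API spec, the same operations should also work from a standard OpenAI Python client. The snippet below is a sketch under that assumption; the domain and key are the same redacted placeholders as in the curl calls, and the Api-Key default header mirrors them:

# Illustrative only: point an OpenAI client at the LISA Serve ALB's /v2/serve route.
# Domain and key are placeholders matching the redacted curl examples above.
from openai import OpenAI

client = OpenAI(
    base_url="https://myredacteddomain/v2/serve",
    api_key="unused",  # auth is carried in the Api-Key header instead
    default_headers={"Api-Key": "myRedactedKey"},
)

print(client.models.list())  # equivalent to the GET /v2/serve/models call

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Suggest a programming language and a project to complete in it."}],
    max_tokens=1500,
)
print(completion.choices[0].message.content)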

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@petermuller petermuller requested a review from KyleLilly May 29, 2024 01:43
@petermuller petermuller self-assigned this May 29, 2024
@petermuller (Contributor, Author): I'll add a new commit to fix the failing tests.

lib/schema.ts (resolved review comments)
@@ -30,6 +31,9 @@
router.include_router(models.router, prefix="/v1", tags=["models"], dependencies=[Depends(security)])
router.include_router(embeddings.router, prefix="/v1", tags=["embeddings"], dependencies=[Depends(security)])
router.include_router(generation.router, prefix="/v1", tags=["generation"], dependencies=[Depends(security)])
router.include_router(
    litellm_passthrough.router, prefix="/v2/serve", tags=["litellm_passthrough"], dependencies=[Depends(security)]
)

A reviewer (Contributor):

I'm assuming this is /v2/serve because we're going to move the API GW stuff into /v2/ as well?


@petermuller (Contributor, Author):

Yes, that is correct. I'm also open to changing the name, but I wanted to avoid the situation of calling it "model" and having a URL like the following when using it directly for the OpenAI list models call:

curl -X GET https://mydomain/v2/model/models

lib/serve/rest-api/src/requirements.txt (resolved review comment, outdated)
lib/serve/rest-api/src/entrypoint.sh (resolved review comment)
These changes address pull request feedback by adding comments for inter-file dependencies and updating the README file. This also modifies the LiteLLM configuration to use the user-provided modelId if it exists, otherwise falling back to the model name. This allows users to serve the same model and weights from two different containers.
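The modelId fallback described in that commit message might look roughly like the following at container start. This is only a sketch; the field names (modelId, modelName, endpointUrl) and the litellm_params values are assumptions rather than the actual LISA schema:

# Sketch of merging LISA-served models into the user-provided LiteLLM config
# at container start; field names here are assumptions, not the real schema.
from typing import Any


def merge_lisa_models(litellm_config: dict[str, Any], lisa_models: list[dict[str, Any]]) -> dict[str, Any]:
    """Append a LiteLLM model_list entry for every LISA-served model."""
    config = dict(litellm_config)
    model_list = list(config.get("model_list", []))
    for model in lisa_models:
        # Prefer the user-provided modelId so the same model and weights can be
        # served from two different containers under distinct names.
        name = model.get("modelId") or model["modelName"]
        model_list.append({
            "model_name": name,
            "litellm_params": {
                "model": f"openai/{model['modelName']}",  # assumes an OpenAI-compatible endpoint
                "api_base": model["endpointUrl"],
            },
        })
    config["model_list"] = model_list
    return config
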
@petermuller petermuller requested a review from estohlmann May 31, 2024 18:53
@@ -30,6 +31,9 @@
router.include_router(models.router, prefix="/v1", tags=["models"], dependencies=[Depends(security)])

A reviewer (Member):

Are we going to be marking the v1 endpoints as deprecated?


@petermuller (Contributor, Author):

There's not going to be a lot of time between them being deprecated and us releasing the next revision, so I think for now we can just leave them as-is.

@petermuller petermuller merged commit 646e0fa into main Jun 3, 2024
2 checks passed
@petermuller petermuller deleted the feature/litellm-router branch June 3, 2024 18:46