Add LiteLLM router to FastAPI container #14
Conversation
Signed-off-by: Peter Muller <[email protected]>
I'll add a new commit to fix the failing tests.
@@ -30,6 +31,9 @@
router.include_router(models.router, prefix="/v1", tags=["models"], dependencies=[Depends(security)])
router.include_router(embeddings.router, prefix="/v1", tags=["embeddings"], dependencies=[Depends(security)])
router.include_router(generation.router, prefix="/v1", tags=["generation"], dependencies=[Depends(security)])
router.include_router(
    litellm_passthrough.router, prefix="/v2/serve", tags=["litellm_passthrough"], dependencies=[Depends(security)]
I'm assuming this is /v2/serve because we're going to move the API GW stuff into /v2/ as well?
Yes, that is correct. I'm also open to changing the name, but I wanted to avoid the situation of calling it "model" and ending up with a URL like the following when using it directly for the OpenAI list models call:
curl -X GET https://mydomain/v2/model/models
These changes address pull request feedback by adding comments for inter-file dependencies and updating the README file. This also modifies the LiteLLM configuration to use the user-provided modelId if it exists, otherwise defaulting back to the model name. This allows users to serve the same model and weights from two different containers.
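As a rough illustration of that fallback, here is a minimal sketch of building a LiteLLM model entry that prefers a user-provided modelId and otherwise defaults to the model name. The field names ("modelId", "modelName", "endpointUrl") and the overall config structure are assumptions for illustration, not the actual LISA configuration schema.

# Hypothetical sketch of the modelId-vs-model-name fallback described above.
# Field names are illustrative assumptions, not the real LISA configuration keys.
from typing import Any, Dict


def build_litellm_entry(model_config: Dict[str, Any]) -> Dict[str, Any]:
    """Build one LiteLLM model_list entry for a deployed model container."""
    # Prefer the user-provided modelId; otherwise default back to the model name.
    model_id = model_config.get("modelId") or model_config["modelName"]
    return {
        "model_name": model_id,
        "litellm_params": {
            # "openai/..." tells LiteLLM to treat the container as an
            # OpenAI-compatible endpoint.
            "model": f"openai/{model_config['modelName']}",
            "api_base": model_config["endpointUrl"],
        },
    }

With this shape, two containers serving the same weights can register under distinct modelId values while sharing the same underlying model name.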
@@ -30,6 +31,9 @@
router.include_router(models.router, prefix="/v1", tags=["models"], dependencies=[Depends(security)])
Are we going to be marking the v1 endpoints as deprecated?
There's not going to be a lot of time between them being deprecated and us releasing the next revision, so I think for now we can just leave them as-is.
This set of changes adds a dependency on LiteLLM in our FastAPI container so that we may pass requests from the LISA Serve ALB directly to LiteLLM. This enables us to handle any form of OpenAI API spec implementation so long as LiteLLM also supports it.
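For context, below is a minimal sketch of what such a passthrough route could look like, assuming LiteLLM runs as a local proxy inside the container. The LITELLM_BASE_URL value, route name, and handler are illustrative assumptions, not the actual implementation in this PR.

# Illustrative sketch only: forwards any request under the router's prefix to a
# LiteLLM proxy assumed to be listening locally inside the container.
import httpx
from fastapi import APIRouter, Request, Response

LITELLM_BASE_URL = "http://localhost:4000"  # assumed LiteLLM proxy address

router = APIRouter()


@router.api_route("/{api_path:path}", methods=["GET", "POST"])
async def litellm_passthrough(request: Request, api_path: str) -> Response:
    """Forward the incoming OpenAI-spec request unchanged to LiteLLM."""
    async with httpx.AsyncClient(base_url=LITELLM_BASE_URL) as client:
        proxied = await client.request(
            request.method,
            f"/{api_path}",
            content=await request.body(),
            headers={"Content-Type": request.headers.get("Content-Type", "application/json")},
        )
    return Response(content=proxied.content, status_code=proxied.status_code)

Because the route simply relays the request body and method, any OpenAI API spec feature that LiteLLM supports is handled without adding new endpoints to the FastAPI container.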
Summary of changes:
Testing:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.