[Examples] vLLM example for SkyServe + Mixtral #2948
llm/vllm/README.md
Outdated
@@ -126,3 +126,61 @@ curl http://$IP:8000/v1/chat/completions \
}
}
```

## Serving Mixtral 8x7b model with vLLM and SkyServe
We already have Mixtral 8x7b + vLLM + SkyServe in llm/mixtral. Should we just make the example above launchable with sky serve and add a link referring to llm/mixtral?
Makes sense! Changed. PTAL again 🫡
Thanks for updating the example @cblmemo! Left several comments.
llm/vllm/service.yaml
Outdated
  HF_TOKEN: <your-huggingface-token> # Change to your own huggingface token

resources:
  accelerators: L4:1
Let's use multiple accelerators for this and the original yaml files, so that a user without GCP credentials can use the yaml out of the box, e.g., {L4:1, A10G:1, A10:1, A100:1, A100-80GB:1}
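For illustration, the suggestion could look roughly like this in the `resources` section. This is only a sketch, assuming SkyPilot's set syntax for candidate accelerators; the GPU list is just the one proposed above, not a tested configuration.

```yaml
resources:
  # Candidate accelerators (assumption: treated as alternatives, so having
  # any one of them available in the user's enabled clouds is enough).
  accelerators: {L4:1, A10G:1, A10:1, A100:1, A100-80GB:1}
```

Listing several GPU types rather than only `L4:1` is what lets the yaml work without GCP credentials.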
Done. Thanks!
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Added example in #2922 to llm/vllm.

Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh