[Frontend] Added chat templates for LLaMa4 pythonic tool calling #16463
Conversation
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, which starts running only a small and essential subset of CI tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Looks good. So the failed unit tests in llama-stack are expected?
The llama-stack implementation hasn't passed all of them yet; it does pass one more test than vLLM now. Multi-turn tool calling is a bit flaky and failing for many Llama 4 inference providers too. @wukaixingxp and team are still actively investigating.
Summary
This PR fixes the tool-calling chat templates for Llama 4, which follows a similar pythonic tool-call format. Co-authored with @wukaixingxp.
Tool-call messages and tool-response messages should end with <|eom|>. <|python_start|> and <|python_end|> are not emitted very consistently by the current checkpoint. More details in meta-llama/llama-stack#1886.
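To make the format concrete, here is a minimal sketch of what pythonic tool-call output looks like and how it can be decoded. This is not vLLM's actual parser, and the tool name and arguments are hypothetical examples:

```python
import ast

# Illustrative raw model output: a Python-style list of calls ending in <|eom|>.
raw_output = '[get_weather(city="San Francisco")]<|eom|>'

def parse_pythonic_calls(text: str):
    # Strip the end-of-message token the template expects.
    text = text.removesuffix("<|eom|>").strip()
    tree = ast.parse(text, mode="eval")
    calls = []
    for node in tree.body.elts:  # top level is a Python list of Call nodes
        name = node.func.id
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((name, kwargs))
    return calls

print(parse_pythonic_calls(raw_output))
# -> [('get_weather', {'city': 'San Francisco'})]
```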
Unit tests
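(The exact command was collapsed in the capture; a plausible invocation, assuming the pythonic-parser tests live under vLLM's tool-use test suite:)

```bash
# Assumed test location and filter; the exact path may differ in the PR.
pytest tests/tool_use -k llama4
```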
Integration tests with llama-stack
server:
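(The original launch command was collapsed; a representative one is below. The template path and model name are assumptions, not taken from the PR:)

```bash
# Model name and template path are illustrative assumptions.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser pythonic \
    --chat-template examples/tool_chat_template_llama4_pythonic.jinja
```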
llama-stack client:
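(The llama-stack client steps were elided in the capture. As a quick independent smoke test against the same server, a standard OpenAI-compatible tool-call request also works; the tool schema and model name below are assumptions:)

```python
# A minimal sketch (not from the PR): exercising the template end to end
# through vLLM's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool schema for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```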