[Frontend] Added chat templates for LLaMa4 pythonic tool calling #16463
Conversation
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, which starts running only a small and essential subset of CI tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Looks good. So the failed unit tests in llama-stack are expected?
The llama-stack implementation hasn't passed all of them yet; it does pass one more test than vLLM now. Multi-turn tool calling is a bit flaky and failing for many Llama 4 inference providers too. @wukaixingxp and team are still actively investigating.
Summary
This PR fixes the tool-calling chat templates for Llama 4, which follows a similar pythonic tool-call format. Co-authored with @wukaixingxp.
Tool-call messages and tool-response messages should end with <|eom|>. <|python_start|> and <|python_end|> are not emitted very consistently by the current checkpoint. More details in meta-llama/llama-stack#1886.
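To make the format concrete, here is a minimal sketch of what pythonic tool-call output looks like and how it can be decoded. This is not vLLM's actual parser, and the tool name and arguments are hypothetical examples:

```python
import ast

# Illustrative raw model output: a Python-style list of calls ending in <|eom|>.
raw_output = '[get_weather(city="San Francisco")]<|eom|>'

def parse_pythonic_calls(text: str):
    # Strip the end-of-message token the template expects.
    text = text.removesuffix("<|eom|>").strip()
    tree = ast.parse(text, mode="eval")
    calls = []
    for node in tree.body.elts:  # top level is a Python list of Call nodes
        name = node.func.id
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append((name, kwargs))
    return calls

print(parse_pythonic_calls(raw_output))
# -> [('get_weather', {'city': 'San Francisco'})]
```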
Unit tests
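(The exact command was collapsed in the capture; a plausible invocation, assuming the pythonic-parser tests live under vLLM's tool-use test suite:)

```bash
# Assumed test location and filter; the exact path may differ in the PR.
pytest tests/tool_use -k llama4
```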
Integration tests with llama-stack
server:
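(The original launch command was collapsed; a representative one is below. The template path and model name are assumptions, not taken from the PR:)

```bash
# Model name and template path are illustrative assumptions.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser pythonic \
    --chat-template examples/tool_chat_template_llama4_pythonic.jinja
```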
llama-stack client:
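(The llama-stack client steps were elided in the capture. As a quick independent smoke test against the same server, a standard OpenAI-compatible tool-call request also works; the tool schema and model name below are assumptions:)

```python
# A minimal sketch (not from the PR): exercising the template end to end
# through vLLM's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool schema for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```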