[V1] Refactor Structured Output for multiple backends #14694

russellb · 2025-03-12T18:17:27Z

This change does some refactoring of the V1 structured output
implementation to prepare for supporting multiple backends. This code is
already successfully in use in a branch to support a second backend. I
think it will be easier to review other backends if the refactoring goes
in first on its own.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
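
For readers unfamiliar with the pattern, a multi-backend split of this kind can be sketched roughly as below. This is a minimal illustration and not the classes added by this PR: `Backend`, `Grammar`, `AllowListBackend`, `AllowListGrammar`, and their methods are all hypothetical names, and the toy "grammar" simply allows a fixed set of token ids.

```python
from abc import ABC, abstractmethod


class Grammar(ABC):
    """Hypothetical per-request state for constrained decoding."""

    @abstractmethod
    def accept_token(self, token_id: int) -> bool:
        """Return True if the token is allowed, False otherwise."""


class Backend(ABC):
    """Hypothetical engine-wide factory that compiles grammars."""

    @abstractmethod
    def compile(self, spec: str) -> Grammar:
        """Build a Grammar from a backend-specific specification."""


class AllowListGrammar(Grammar):
    """Toy grammar: only token ids from a fixed set are accepted."""

    def __init__(self, allowed: set[int]) -> None:
        self.allowed = allowed

    def accept_token(self, token_id: int) -> bool:
        return token_id in self.allowed


class AllowListBackend(Backend):
    """Toy backend: the spec is a comma-separated list of token ids."""

    def compile(self, spec: str) -> Grammar:
        return AllowListGrammar({int(tok) for tok in spec.split(",")})
```

With this shape, the caller only depends on the two abstract classes, so a second backend can be added without touching the calling code.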

github-actions · 2025-03-12T18:17:37Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

WoosukKwon · 2025-03-13T01:11:19Z

@aarnphm Could you please take a look?

aarnphm

just a few comments wrt structuring, otherwise LGTM

aarnphm · 2025-03-13T01:40:42Z

vllm/v1/structured_output/__init__.py

-            vocab_size=self.vocab_size,
-            ctx=ctx,
-        )
+        assert self.backend is not None


We should try not to use assert in the critical path (and I believe this is one), given that -O and -OO will strip asserts (I know we aren't using those flags at the moment, but it's probably worth knowing).

russellb

Well, it's something that should never happen, and we'd want to know if it did, because we know it'll break anyway. It also gives hints to mypy, which is often how I end up adding it.

This should be covered in a style guide somewhere so we have guidelines for the project.
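
The trade-off in this thread can be sketched as follows. This is a minimal illustration rather than the code under review: `Manager` and both method names are hypothetical. Under `python -O` or `-OO` the `assert` statement is stripped, whereas an explicit check always runs and still narrows the `Optional` type for mypy.

```python
from typing import Optional


class Manager:
    """Hypothetical stand-in for an object whose backend is set lazily."""

    def __init__(self) -> None:
        self.backend: Optional[str] = None

    def name_with_assert(self) -> str:
        # Narrows Optional[str] to str for mypy, but the check
        # disappears when Python runs with -O or -OO.
        assert self.backend is not None
        return self.backend.upper()

    def name_with_raise(self) -> str:
        # Same narrowing for mypy; the check survives -O and -OO.
        if self.backend is None:
            raise RuntimeError("backend was never initialized")
        return self.backend.upper()
```

Both forms fail loudly in a default interpreter; they only differ once optimization flags are in use.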

vllm/v1/structured_output/backend_types.py

vllm/v1/structured_output/backend_xgrammar.py

WoosukKwon · 2025-03-14T00:43:59Z

@aarnphm @russellb Just wanted to double check: Is this PR ready for merge?

aarnphm · 2025-03-14T05:17:44Z

Yes, but I don't think this would block 0.8.0, given that functionally it doesn't change anything.

vllm/v1/structured_output/__init__.py

russellb · 2025-03-14T15:11:38Z

please hold this while I fix an issue. I'll comment again when ready

russellb · 2025-03-14T16:03:28Z

This is good now -- I also hit a different bug along the way and fixed it in #14826

mergify · 2025-03-14T16:19:46Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

WoosukKwon · 2025-03-15T03:52:03Z

@russellb Just to double check: This is not a release blocker, is it?

aarnphm · 2025-03-15T03:55:47Z

This is not.

- Ensure request-level choice matches server side config if specified. We don't support requesting something different than what was configured.
- Fix handling when a backend is not specified in the request.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
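
The two rules in this commit message can be sketched as a small helper. This is an assumption-laden illustration, not the code from the commit: `resolve_backend` and its parameters are hypothetical names.

```python
from typing import Optional


def resolve_backend(server_backend: str, requested: Optional[str]) -> str:
    """Return the backend to use for one request (hypothetical helper)."""
    if requested is None:
        # No backend in the request: fall back to the server's choice.
        return server_backend
    if requested != server_backend:
        # Per-request overrides of the configured backend are rejected.
        raise ValueError(
            f"Request asked for backend {requested!r}, but the server "
            f"is configured with {server_backend!r}.")
    return requested
```

Rejecting a mismatch up front keeps the failure at request validation time instead of deep inside the decoding loop.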

DarkLight1337

Otherwise LGTM

NickLucche

As a sanity check, I've also run this on V1 TPU, and it doesn't interfere with startup.

russellb · 2025-03-18T18:21:37Z

thank you for checking! much appreciated

Importing xgrammar appears to initialize the CUDA context, which we don't want to do in the front-end process. It also means that the server can't be started with the (default) multiproc context mode of fork. I guess this is what LazyLoader is meant to help with, but it doesn't seem to be working as intended since vllm-project#14694 was merged.

Signed-off-by: Nick Hill <nhill@redhat.com>

russellb requested a review from mgoin as a code owner March 12, 2025 18:17

mergify bot added the v1 label Mar 12, 2025

mgoin added the structured-output label Mar 12, 2025

aarnphm approved these changes Mar 13, 2025

aarnphm modified the milestone: v0.8.0 Mar 13, 2025

WoosukKwon added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 13, 2025

russellb force-pushed the v1-structured-output-multi-backend-refactor branch from f2e6222 to 2932546 March 13, 2025 19:31

russellb mentioned this pull request Mar 13, 2025

[V1] guidance backend for structured output + auto fallback mode #14779

Merged

simon-mo added this to the v0.8.0 milestone Mar 14, 2025

aarnphm reviewed Mar 14, 2025

vllm/v1/structured_output/__init__.py (resolved)

russellb force-pushed the v1-structured-output-multi-backend-refactor branch from 2932546 to 5187ccf March 14, 2025 16:01

russellb requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners March 14, 2025 16:01

mergify bot added the needs-rebase label Mar 14, 2025

russellb force-pushed the v1-structured-output-multi-backend-refactor branch from 5187ccf to e2ee14d March 14, 2025 16:20

mergify bot removed the needs-rebase label Mar 14, 2025

WoosukKwon removed this from the v0.8.0 milestone Mar 15, 2025

russellb added 2 commits March 18, 2025 15:05

russellb force-pushed the v1-structured-output-multi-backend-refactor branch from e2ee14d to dc6dedf March 18, 2025 15:07

aarnphm reviewed Mar 18, 2025

vllm/v1/structured_output/backend_types.py (resolved)

aarnphm approved these changes Mar 18, 2025

Add docstrings for structured output abstract classes

084c47b

Signed-off-by: Russell Bryant <rbryant@redhat.com>

DarkLight1337 reviewed Mar 18, 2025

vllm/v1/structured_output/__init__.py (resolved)

DarkLight1337 approved these changes Mar 18, 2025

DarkLight1337 enabled auto-merge (squash) March 18, 2025 16:23

NickLucche approved these changes Mar 18, 2025

DarkLight1337 merged commit 3a1e648 into vllm-project:main Mar 18, 2025
31 checks passed

njhill mentioned this pull request Mar 19, 2025

[BugFix] Lazily import XgrammarBackend to avoid early cuda init #15171

Merged

gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Apr 1, 2025

[V1] Refactor Structured Output for multiple backends (vllm-project#1…

b36e07b

…4694) Signed-off-by: Russell Bryant <rbryant@redhat.com>

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

[V1] Refactor Structured Output for multiple backends (vllm-project#1…

1c5b4a8

…4694) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>

nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025

[V1] Refactor Structured Output for multiple backends (vllm-project#1…

d15d849

…4694) Signed-off-by: Russell Bryant <rbryant@redhat.com>

hmellor mentioned this pull request Apr 16, 2025

[V1] Set structured output backend to auto by default #15724

Merged

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed
