Skip to content

[V1] Refactor Structured Output for multiple backends #14694

New issue

Have a question about this project? No Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “No Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? No Sign in to your account

Conversation

russellb
Copy link
Member

This change does some refactoring of the V1 structured output
implementation to prepare for supporting multiple backends. This code is
already successfully in use in a branch to support a second backend. I
think it will be easier to review other backends if the refactoring goes
in first on its own.

Signed-off-by: Russell Bryant rbryant@redhat.com

@russellb russellb requested a review from mgoin as a code owner March 12, 2025 18:17
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@WoosukKwon
Copy link
Collaborator

@aarnphm Could you please take a look?

Copy link
Collaborator

@aarnphm aarnphm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few comments wrt structuring, otherwise LGTM

vocab_size=self.vocab_size,
ctx=ctx,
)
assert self.backend is not None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should try not to use assert in critical path (and I believe this is)

Given that -O and -OO will strip assert (ik that we aren't using it atm, but probably worth knowing)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, it's something that should never happen and we'd want to know if it did because we know it'll break anyway. It also gives hints to mypy, which is often how I end up adding it.

This should be covered in a style guide somewhere so we have guidelines for the project.

@aarnphm aarnphm modified the milestone: v0.8.0 Mar 13, 2025
@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 13, 2025
@russellb russellb force-pushed the v1-structured-output-multi-backend-refactor branch from f2e6222 to 2932546 Compare March 13, 2025 19:31
@WoosukKwon
Copy link
Collaborator

@aarnphm @russellb Just wanted to double check: Is this PR ready for merge?

@simon-mo simon-mo added this to the v0.8.0 milestone Mar 14, 2025
@aarnphm
Copy link
Collaborator

aarnphm commented Mar 14, 2025

yes, but I don't think this would block 0.8.0.

Given that functionally it doesn't change anything.

@russellb
Copy link
Member Author

please hold this while I fix an issue. I'll comment again when ready

@russellb russellb force-pushed the v1-structured-output-multi-backend-refactor branch from 2932546 to 5187ccf Compare March 14, 2025 16:01
@russellb
Copy link
Member Author

This is good now -- I also hit a different bug along the way and fixed it in #14826

Copy link

mergify bot commented Mar 14, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @russellb.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 14, 2025
@russellb russellb force-pushed the v1-structured-output-multi-backend-refactor branch from 5187ccf to e2ee14d Compare March 14, 2025 16:20
@mergify mergify bot removed the needs-rebase label Mar 14, 2025
@WoosukKwon
Copy link
Collaborator

@russellb Just to double check: This is not a release blocker, is it?

@aarnphm
Copy link
Collaborator

aarnphm commented Mar 15, 2025

This is not.

@WoosukKwon WoosukKwon removed this from the v0.8.0 milestone Mar 15, 2025
This change does some refactoring of the V1 structured output
implementation to prepare for supporting multiple backends. This code is
already successfully in use in a branch to support a second backend. I
think it will be easier to review other backends if the refactoring goes
in first on its own.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
- Ensure request-level choice matches server side config if specified.
  We don't support requesting something different than what was
  configured.

- Fix handling when a backend is not specified in the request.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
@russellb russellb force-pushed the v1-structured-output-multi-backend-refactor branch from e2ee14d to dc6dedf Compare March 18, 2025 15:07
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 18, 2025 16:23
Copy link
Contributor

@NickLucche NickLucche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sanity check I've also ran this on V1 TPU and it's not interfering on start up

@russellb
Copy link
Member Author

For sanity check I've also ran this on V1 TPU and it's not interfering on start up

thank you for checking! much appreciated

@DarkLight1337 DarkLight1337 merged commit 3a1e648 into vllm-project:main Mar 18, 2025
31 checks passed
njhill added a commit to njhill/vllm that referenced this pull request Mar 19, 2025
Importing xgrammar appears to initialize the cuda context, which we don't want to do in the front-end process. It also means that the server can't be started with the (default) multiproc context mode of fork.

I guess this is what LazyLoader is meant to help with, but it doesn't seem to be working as intended since vllm-project#14694 was merged.

Signed-off-by: Nick Hill <nhill@redhat.com>
gmarinho2 pushed a commit to gmarinho2/vllm that referenced this pull request Apr 1, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…4694)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
No Sign up for free to join this conversation on GitHub. Already have an account? No Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed structured-output v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants