[V1] Refactor Structured Output for multiple backends #14694
@aarnphm Could you please take a look?
just a few comments wrt structuring, otherwise LGTM
    vocab_size=self.vocab_size,
    ctx=ctx,
)
assert self.backend is not None
we should try not to use assert in the critical path (and I believe this is one), given that -O and -OO will strip assert (ik that we aren't using them atm, but probably worth knowing)
well, it's something that should never happen and we'd want to know if it did because we know it'll break anyway. It also gives hints to mypy, which is often how I end up adding it.
This should be covered in a style guide somewhere so we have guidelines for the project.
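To make the trade-off in this thread concrete, here is a minimal sketch of the two styles. `Backend` and `Manager` are hypothetical stand-ins, not the classes touched by this PR. Both forms narrow `Optional[Backend]` for mypy, but only the explicit check survives when Python runs with `-O` or `-OO`.

```python
from typing import Optional


class Backend:
    """Hypothetical stand-in for a structured output backend."""

    def compile(self, spec: str) -> str:
        return f"compiled:{spec}"


class Manager:
    """Hypothetical manager whose backend is created lazily."""

    def __init__(self) -> None:
        self.backend: Optional[Backend] = None

    def compile_with_assert(self, spec: str) -> str:
        # Narrows the type for mypy, but the statement is removed under
        # -O / -OO, so a None backend would then fail on the next line
        # with an AttributeError instead of an AssertionError.
        assert self.backend is not None
        return self.backend.compile(spec)

    def compile_with_check(self, spec: str) -> str:
        # Also narrows the type for mypy, and is never stripped.
        if self.backend is None:
            raise RuntimeError("structured output backend is not initialized")
        return self.backend.compile(spec)
```

The explicit check costs two lines but keeps the failure mode identical regardless of interpreter flags, which is the property the reviewer is asking for.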
Force-pushed from f2e6222 to 2932546.
yes, but I don't think this would block 0.8.0, given that functionally it doesn't change anything.
please hold this while I fix an issue. I'll comment again when ready
Force-pushed from 2932546 to 5187ccf.
This is good now -- I also hit a different bug along the way and fixed it in #14826
This pull request has merge conflicts that must be resolved before it can be
Force-pushed from 5187ccf to e2ee14d.
@russellb Just to double check: This is not a release blocker, is it?
This is not.
This change does some refactoring of the V1 structured output implementation to prepare for supporting multiple backends. This code is already successfully in use in a branch to support a second backend. I think it will be easier to review other backends if the refactoring goes in first on its own.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
- Ensure request-level choice matches server side config if specified. We don't support requesting something different than what was configured.
- Fix handling when a backend is not specified in the request.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
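The two rules in this commit message can be sketched as a small resolution function. This is an illustration under assumptions, not the code in the PR: `resolve_backend` and the backend name `other_backend` are hypothetical.

```python
from typing import Optional


def resolve_backend(requested: Optional[str], configured: str) -> str:
    """Pick the backend for a request (hypothetical helper).

    - If the request names no backend, fall back to the server config.
    - If it names one, it must match the server config.
    """
    if requested is None:
        return configured
    if requested != configured:
        raise ValueError(
            f"Request asked for backend {requested!r}, but the server is "
            f"configured with {configured!r}; overriding is not supported."
        )
    return requested
```

For example, `resolve_backend(None, "xgrammar")` falls back to the configured backend, while `resolve_backend("other_backend", "xgrammar")` raises a `ValueError`.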
Force-pushed from e2ee14d to dc6dedf.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Otherwise LGTM
As a sanity check, I also ran this on V1 TPU, and it doesn't interfere with startup
thank you for checking! much appreciated
Importing xgrammar appears to initialize the CUDA context, which we don't want to do in the front-end process. It also means that the server can't be started with the (default) multiproc context mode of fork. I guess this is what LazyLoader is meant to help with, but it doesn't seem to have been working as intended since vllm-project#14694 was merged.

Signed-off-by: Nick Hill <nhill@redhat.com>
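For readers unfamiliar with the idea, deferring an import until first use can be sketched as below. This is a generic illustration built on `importlib`, not vLLM's `LazyLoader`; the class name `LazyModule` is hypothetical, and the stdlib `json` module stands in for a heavy dependency.

```python
import importlib
import types
from typing import Any, Optional


class LazyModule(types.ModuleType):
    """Hypothetical proxy that imports the real module on first attribute access."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self._target_name = name
        self._loaded: Optional[types.ModuleType] = None

    def _load(self) -> types.ModuleType:
        # The real import (and any side effects it has) happens here,
        # not when the proxy object is created.
        if self._loaded is None:
            self._loaded = importlib.import_module(self._target_name)
        return self._loaded

    def __getattr__(self, attr: str) -> Any:
        # Only called for attributes not found on the proxy itself.
        return getattr(self._load(), attr)


# Creating the proxy does not import anything yet.
lazy_json = LazyModule("json")
```

The catch described in the commit message is that any attribute access on such a proxy at import time, for example in a module-level expression, triggers the real import anyway, so the proxy only helps if every use is deferred to call time.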
…4694) Signed-off-by: Russell Bryant <rbryant@redhat.com>
…4694) Signed-off-by: Russell Bryant <rbryant@redhat.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
…4694) Signed-off-by: Russell Bryant <rbryant@redhat.com>