[Feature]: Implement Concurrent Partial Prefills In V1 Engine #14003

robertgshaw2-redhat · 2025-02-28T01:34:31Z

🚀 The feature, motivation and pitch

In V0, we support concurrent partial prefills to reduce TTFT latency when long requests are present. Implement it in V1.

cc @WoosukKwon

Alternatives

No response

Additional context

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

plops655 · 2025-03-16T03:33:24Z

I can take this up.

ccw1996 · 2025-03-20T14:42:30Z

I'm interested in contributing this.

houseroad · 2025-03-25T22:16:29Z

Shall we mark this as done? In the context of chunked prefill, it probably doesn't make sense to add support for max_num_partial_prefills and max_long_partial_prefills. cc: @comaniac

comaniac · 2025-03-25T23:30:15Z

> Shall we mark this as done? In the context of chunked prefill, it probably doesn't make sense to add support for max_num_partial_prefills and max_long_partial_prefills. cc: @comaniac

IIRC, in V0 we still apply max_num_partial_prefills, for example, even when chunked prefill is enabled, but I'm not fully sure which one (max_num_partial_prefills or long_prefill_token_threshold) has higher priority. This is a topic worth discussing. cc @joerunde @njhill
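
To make the interaction between the two knobs concrete, here is a minimal toy sketch of a single scheduling step. It is not vLLM's scheduler code: the `Request` class, the `schedule_step` function, the per-step `token_budget`, and the rule that a threshold of 0 disables the per-request cap are all assumptions made for illustration. Only the parameter names max_num_partial_prefills and long_prefill_token_threshold come from the thread, and the toy counts every request prefilled in a step toward max_num_partial_prefills, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for a queued request (not a vLLM class)."""
    name: str
    remaining_prompt_tokens: int


def schedule_step(waiting, token_budget, max_num_partial_prefills,
                  long_prefill_token_threshold):
    """Toy model of one scheduling step; NOT vLLM's actual scheduler.

    Returns a dict mapping request name to the number of prompt tokens
    scheduled for that request in this step.
    """
    scheduled = {}
    for req in waiting:
        # Stop once the step's token budget or the concurrency cap is used up.
        if token_budget <= 0 or len(scheduled) >= max_num_partial_prefills:
            break
        chunk = min(req.remaining_prompt_tokens, token_budget)
        # Assumption: a threshold of 0 means "no per-request cap".
        if long_prefill_token_threshold > 0:
            chunk = min(chunk, long_prefill_token_threshold)
        if chunk == 0:
            continue
        scheduled[req.name] = chunk
        req.remaining_prompt_tokens -= chunk
        token_budget -= chunk
    return scheduled


def make_queue():
    # One long prompt queued ahead of one short prompt.
    return [Request("long", 10_000), Request("short", 200)]


# No per-request cap: the long prompt consumes the whole step budget.
print(schedule_step(make_queue(), 2048, 4, 0))
# Per-request cap of 1024 tokens: the short prompt fits in the same step.
print(schedule_step(make_queue(), 2048, 4, 1024))
# Same cap, but only one prefill allowed per step: the short prompt waits.
print(schedule_step(make_queue(), 2048, 1, 1024))
```

In this toy model, the per-request cap alone is enough to let the short prompt share a step with the long one, while the concurrency cap can still hold it back. Which of the two takes priority in the real V0 scheduler is the open question raised above.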

robertgshaw2-redhat added the "feature request" and "help wanted" labels Feb 28, 2025

ywang96 added this to Onboarding Tasks Mar 7, 2025

zhuohan123 moved this to Todo in Onboarding Tasks Mar 7, 2025

houseroad mentioned this issue Mar 24, 2025

[V1] Support long_prefill_token_threshold in v1 scheduler #15419

Merged
