
[V1] [Feature] Collective RPC #15444


Merged
merged 13 commits into vllm-project:main on Mar 29, 2025

Conversation

wwl2755
Contributor

@wwl2755 wwl2755 commented Mar 25, 2025

FIX: #15430 and #15349.

  1. Support calling collective_rpc() directly from LLM, as requested in [Feature]: [V1] Collective RPC #15430 and [Bug]: LLM.collective_rpc is broken in v1 by default #15349 (see the usage sketch below).

  2. Use cloudpickle to make sure that locally defined functions can still be serialized.
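
For context, here is a rough usage sketch of what this enables (the model name, tensor_parallel_size, and the echo_rank helper are illustrative only, not part of this PR):

from vllm import LLM

# Illustrative worker-side helper: collective_rpc invokes it on every
# worker, passing the worker object as the first argument.
def echo_rank(worker) -> int:
    return worker.rank

llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)

# The string form calls a method by name on each worker; the callable form
# (used here) requires shipping the function to the worker processes, which
# is why cloudpickle is needed for locally defined functions.
print(llm.collective_rpc(echo_rank))  # e.g. [0, 1]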

wwl2755 added 3 commits March 25, 2025 05:14
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

I have verified that it passes the tests when tp-size=1. I should get access to a multi-GPU machine within a few days; I can test it then, or someone else could help verify it in the meantime. Thanks!

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@robertgshaw2-redhat
Collaborator

cc @russellb for security

@njhill njhill self-assigned this Mar 25, 2025
Member

@njhill njhill left a comment

Thanks for this @wwl2755.

The collective_rpc changes look good but we may want to look at the cloudpickle change more closely.

IIRC cloudpickle is generally slower than pickle, so using it as the default here might have an impact on the performance of other code paths (I'm thinking of the multi-modal case in particular, where this path is used to transfer large data to the engine). One possibility is to make it configurable via an env var so that it can be turned on when needed.

Something like VLLM_PICKLE = pickle, cloudpickle, or disabled. And we may actually want to aim to make disabled the default for security reasons (though we would need to modify some built-in things like the multi-modal case to work without it).
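
As a rough sketch of the kind of switch being suggested (VLLM_PICKLE and the helper names here are hypothetical, not an existing vLLM setting):

import os
import pickle

import cloudpickle

# Hypothetical env-var switch along the lines described above.
_MODE = os.environ.get("VLLM_PICKLE", "pickle")  # "pickle" | "cloudpickle" | "disabled"

def dumps(obj) -> bytes:
    if _MODE == "disabled":
        raise RuntimeError("Serialization of arbitrary objects is disabled")
    if _MODE == "cloudpickle":
        return cloudpickle.dumps(obj)
    return pickle.dumps(obj)

def loads(data: bytes):
    if _MODE == "disabled":
        raise RuntimeError("Deserialization of arbitrary objects is disabled")
    # cloudpickle output can be loaded with plain pickle.loads.
    return pickle.loads(data)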

Could you also remove the use of VLLM_ENABLE_V1_MULTIPROCESSING=0 from the tests in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml, to make sure that they now pass without it?

@russellb
Member

I'm OK with the change as it is here from a security perspective.

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Thanks for this @wwl2755.

The collective_rpc changes look good but we may want to look at the cloudpickle change more closely.

IIRC cloudpickle is generally slower than pickle, so using it as the default here might have an impact on the performance of other code paths (I'm thinking of the multi-modal case in particular, where this path is used to transfer large data to the engine). One possibility is to make it configurable via an env var so that it can be turned on when needed.

Yes, cloudpickle adds some latency, but it supports a wider range of object types. For example, in tests/entrypoints/llm/test_collective_rpc.py, pickle cannot serialize the function that is local to the frontend process.

Something like VLLM_PICKLE = pickle, cloudpickle, or disabled. And we may actually want to aim to make disabled the default for security reasons (though we would need to modify some built-in things like the multi-modal case to work without it).

Maybe we could make cloudpickle a fallback for pickle? I can open a follow-up PR to take care of this logic.
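
Roughly, the fallback would look like this (a sketch of the idea only, not code from this PR):

import pickle

import cloudpickle

def serialize(obj) -> bytes:
    # Try plain pickle first: it is faster, but it cannot handle locally
    # defined functions (it raises PicklingError/AttributeError for them).
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Fall back to cloudpickle, which can serialize local functions,
        # lambdas, and closures at the cost of some extra latency.
        return cloudpickle.dumps(obj)

# Either result can be loaded on the worker side with pickle.loads().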

Could you also remove the use of VLLM_ENABLE_V1_MULTIPROCESSING=0 from the tests in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml, to make sure that they now pass without it?

Sure. With the updated commit, VLLM_ENABLE_V1_MULTIPROCESSING=0 can also work.

@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Could you also remove the use of VLLM_ENABLE_V1_MULTIPROCESSING=0 from the tests in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml, to make sure that they now pass without it?

I'm not so familiar with the CI/CD setup. After removing VLLM_ENABLE_V1_MULTIPROCESSING=0 (it appears in three places), how can I run the yaml locally to make sure it works? Will python -m pytest do the checking?

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot added the ci/build label Mar 25, 2025
@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

@wwl2755 wwl2755 requested a review from njhill March 25, 2025 19:15
@russellb
Member

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7

It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle:
GHSA-rh4j-5rhw-hr54

@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7

It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle: GHSA-rh4j-5rhw-hr54

I see. I guess it's similar to an injection attack. Then we should probably put stricter restrictions on the format of the function call, or build our own serialization method?

One naive solution I can think of is to restrict the method argument to strings only and maintain a mapping from function names to functions on the worker side, but this limits what collective_rpc can accept (maybe it indeed should be limited?).
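
Roughly, that idea would look like this (a sketch only; the registry and the function names are made up for illustration):

from typing import Any, Callable, Dict

# Allow-listed worker-side RPC targets, keyed by name. Only strings travel
# over the wire, so no user-controlled bytes are ever unpickled into code.
WORKER_RPC_REGISTRY: Dict[str, Callable[..., Any]] = {}

def register_rpc(name: str):
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        WORKER_RPC_REGISTRY[name] = fn
        return fn
    return wrap

@register_rpc("echo_rank")
def echo_rank(worker) -> int:
    return worker.rank

def dispatch_rpc(worker, method: str, *args, **kwargs):
    if method not in WORKER_RPC_REGISTRY:
        raise ValueError(f"Unknown RPC method: {method!r}")
    return WORKER_RPC_REGISTRY[method](worker, *args, **kwargs)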

Do you have any suggestion?

@russellb
Member

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7
It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle: GHSA-rh4j-5rhw-hr54

I see. I guess it's similar to an injection attack. Then we should probably put stricter restrictions on the format of the function call, or build our own serialization method?

One naive solution I can think of is to restrict the method argument to strings only and maintain a mapping from function names to functions on the worker side, but this limits what collective_rpc can accept (maybe it indeed should be limited?).

Do you have any suggestion?

There's a tension here between the ultimate flexibility of pickle and providing a secure interface.

I'm working on some improvements here, but I don't want to give too much detail because the details are a bit sensitive. For now, don't worry about it.

@wwl2755
Contributor Author

wwl2755 commented Mar 26, 2025

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7
It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle: GHSA-rh4j-5rhw-hr54

I see. I guess it's similar to an injection attack. Then we should probably put stricter restrictions on the format of the function call, or build our own serialization method?
One naive solution I can think of is to restrict the method argument to strings only and maintain a mapping from function names to functions on the worker side, but this limits what collective_rpc can accept (maybe it indeed should be limited?).
Do you have any suggestion?

There's a tension here between the ultimate flexibility of pickle and providing a secure interface.

I'm working on some improvements here, but I don't want to give too much detail because the details are a bit sensitive. For now, don't worry about it.

I see. Thank you for commenting.

@wwl2755
Contributor Author

wwl2755 commented Mar 26, 2025

Besides the (cloud)pickle issue, could you help to review this PR again? Thanks! @njhill

@wwl2755 wwl2755 requested a review from njhill March 26, 2025 19:07
wwl2755 added 2 commits March 26, 2025 19:08
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@wwl2755
Contributor Author

wwl2755 commented Mar 26, 2025

Thanks @wwl2755, the changes look good apart from one minor comment inline.

Also could you revert the cloudpickle changes for now? Then I think we can merge this and deal with the cloudpickle thing as a follow-on (and @russellb is working on some related changes to that part anyhow).

The comments are addressed by the latest commits. PTAL. Thank you! @njhill

Member

@njhill njhill left a comment

Thanks @wwl2755

@njhill njhill added the ready label Mar 26, 2025
@wwl2755
Contributor Author

wwl2755 commented Mar 27, 2025

It seems weird, because I tested it on my local machine and it passed all the tests in test_collective_rpc.py. Can you see any reason why the run fails in CI? @njhill

@DarkLight1337
Member

Retrying

@njhill
Member

njhill commented Mar 27, 2025

This one is a legit failure: https://buildkite.com/vllm/ci/builds/16314#0195d722-84b7-4833-a6e3-734e8e657969/206-571.

@wwl2755 perhaps you could add a third custom type in serial_utils.py like this:

def custom_enc_hook(obj: Any) -> Any:
    if isinstance(obj, torch.Tensor):
        # NOTE(rob): it is fastest to use numpy + pickle
        # when serializing torch tensors.
        # https://gist.github.com/tlrmchlsmth/8067f1b24a82b6e2f90450e7764fa103 # noqa: E501
        return msgpack.Ext(CUSTOM_TYPE_TENSOR, pickle.dumps(obj.numpy()))

    if isinstance(obj, FunctionType):
        return msgpack.Ext(CUSTOM_TYPE_CLOUDPICKLE, cloudpickle.dumps(obj))

    return msgpack.Ext(CUSTOM_TYPE_PICKLE, pickle.dumps(obj))


def custom_ext_hook(code: int, data: memoryview) -> Any:
    if code == CUSTOM_TYPE_TENSOR:
        return torch.from_numpy(pickle.loads(data))
    if code == CUSTOM_TYPE_PICKLE:
        return pickle.loads(data)
    if code == CUSTOM_TYPE_CLOUDPICKLE:
        return cloudpickle.loads(data)

    raise NotImplementedError(f"Extension type code {code} is not supported")
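
For reference, these hooks would plug into the msgspec encoder/decoder roughly like this (a sketch that reuses custom_enc_hook/custom_ext_hook from above; the actual wiring in serial_utils.py may differ):

from typing import Any

from msgspec import msgpack

encoder = msgpack.Encoder(enc_hook=custom_enc_hook)
decoder = msgpack.Decoder(Any, ext_hook=custom_ext_hook)

def roundtrip(obj: Any) -> Any:
    # Anything msgpack cannot encode natively goes through the hooks.
    return decoder.decode(encoder.encode(obj))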

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@wwl2755
Contributor Author

wwl2755 commented Mar 27, 2025

I have retried the test script on my local machine.

pytest -v -s tests/entrypoints/llm/test_collective_rpc.py passes all the tests, but cd tests && pytest -v -s entrypoints/llm/test_collective_rpc.py fails in the [ray-2] test.

After modifying it based on your suggestion, all tests pass. PTAL. Thanks! @njhill


mergify bot commented Mar 28, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 28, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot removed the needs-rebase label Mar 28, 2025
@wwl2755
Contributor Author

wwl2755 commented Mar 28, 2025

Hi, @njhill

I see that CI failed in some kernel and TPU tests, which I don't think are relevant to this PR. Do these failures need to be addressed before merging?

@njhill
Member

njhill commented Mar 28, 2025

@wwl2755 could you merge in the latest main branch one more time? It should address those failures. Thanks!

@wwl2755
Contributor Author

wwl2755 commented Mar 28, 2025

@wwl2755 could you merge in the latest main branch one more time? It should address those failures. Thanks!

Updated.

@vllm-bot vllm-bot merged commit 94744ba into vllm-project:main Mar 29, 2025
57 of 59 checks passed
hijkzzz pushed a commit to OpenRLHF/OpenRLHF that referenced this pull request Apr 2, 2025
Collective RPC is supported now in vLLM (nightly), so we
no longer need to always set `VLLM_ENABLE_V1_MULTIPROCESSING=0`
when we use vLLM V1. vllm-project/vllm#15444

But we still need to set that when we enable the full_determinism
mode, as vLLM does not guarantee the reproducibility of the results
by default, for the sake of performance. We need to turn off
multiprocessing to make the scheduling deterministic. This is
not needed and will be ignored for V0.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Apr 2, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Labels
ci/build, frontend, ready, v1
Development

Successfully merging this pull request may close these issues.

[Feature]: [V1] Collective RPC
6 participants