
[V1] [Feature] Collective RPC #15444


Merged
merged 13 commits into vllm-project:main on Mar 29, 2025

Conversation

wwl2755
Contributor

@wwl2755 wwl2755 commented Mar 25, 2025

FIX: #15430 and #15349.

  1. Support calling collective_rpc() directly from LLM, as requested in [Feature]: [V1] Collective RPC #15430 and [Bug]: LLM.collective_rpc is broken in v1 by default #15349 (see the usage sketch below).

  2. Use cloudpickle to make sure that locally defined functions can still be serialized.
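
For context, here is a rough usage sketch of what this enables (the model name, tensor_parallel_size, and the echo_rank helper are illustrative only, not part of this PR):

from vllm import LLM

# Illustrative worker-side helper: collective_rpc invokes it on every
# worker, passing the worker object as the first argument.
def echo_rank(worker) -> int:
    return worker.rank

llm = LLM(model="facebook/opt-125m", tensor_parallel_size=2)

# The string form calls a method by name on each worker; the callable form
# (used here) requires shipping the function to the worker processes, which
# is why cloudpickle is needed for locally defined functions.
print(llm.collective_rpc(echo_rank))  # e.g. [0, 1]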

wwl2755 added 3 commits March 25, 2025 05:14
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

I have verified that it passes the tests when tp-size=1. I should get access to a multi-GPU machine within a few days; I can test it then, or someone else could help verify it in the meantime. Thanks!

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@robertgshaw2-redhat
Collaborator

cc @russellb for security

@njhill njhill self-assigned this Mar 25, 2025
Member

@njhill njhill left a comment

Thanks for this @wwl2755.

The collective_rpc changes look good but we may want to look at the cloudpickle change more closely.

IIRC cloudpickle is generally slower than pickle, so using it as the default here might have an impact on the performance of other code paths (I'm thinking of the multi-modal case in particular, where this path is used to transfer large data to the engine). One possibility is to make it configurable via an env var so that it can be turned on when needed.

Something like VLLM_PICKLE = pickle, cloudpickle, or disabled. And we may actually want to aim to make disabled the default for security reasons (though we would need to modify some built-in things like the multi-modal case to work without it).
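
As a rough sketch of the kind of switch being suggested (VLLM_PICKLE and the helper names here are hypothetical, not an existing vLLM setting):

import os
import pickle

import cloudpickle

# Hypothetical env-var switch along the lines described above.
_MODE = os.environ.get("VLLM_PICKLE", "pickle")  # "pickle" | "cloudpickle" | "disabled"

def dumps(obj) -> bytes:
    if _MODE == "disabled":
        raise RuntimeError("Serialization of arbitrary objects is disabled")
    if _MODE == "cloudpickle":
        return cloudpickle.dumps(obj)
    return pickle.dumps(obj)

def loads(data: bytes):
    if _MODE == "disabled":
        raise RuntimeError("Deserialization of arbitrary objects is disabled")
    # cloudpickle output can be loaded with plain pickle.loads.
    return pickle.loads(data)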

Could you also remove the use of VLLM_ENABLE_V1_MULTIPROCESSING=0 from the tests in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml, to make sure that they now pass without it?

@russellb
Member

I'm OK with the change as it is here from a security perspective.

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Thanks for this @wwl2755.

The collective_rpc changes look good but we may want to look at the cloudpickle change more closely.

IIRC cloudpickle is generally slower than pickle, so using it as the default here might have an impact on the performance of other code paths (I'm thinking of the multi-modal case in particular, where this path is used to transfer large data to the engine). One possibility is to make it configurable via an env var so that it can be turned on when needed.

Yes, cloudpickle adds some latency, but it supports a wider range of object types. For example, in tests/entrypoints/llm/test_collective_rpc.py, pickle cannot serialize the function that is local to the frontend process.

Something like VLLM_PICKLE = pickle, cloudpickle, or disabled. And we may actually want to aim to make disabled the default for security reasons (though we would need to modify some built-in things like the multi-modal case to work without it).

Maybe we could make cloudpickle a fallback for pickle? I can open a follow-up PR to take care of this logic.
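
Roughly, the fallback would look like this (a sketch of the idea only, not code from this PR):

import pickle

import cloudpickle

def serialize(obj) -> bytes:
    # Try plain pickle first: it is faster, but it cannot handle locally
    # defined functions (it raises PicklingError/AttributeError for them).
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Fall back to cloudpickle, which can serialize local functions,
        # lambdas, and closures at the cost of some extra latency.
        return cloudpickle.dumps(obj)

# Either result can be loaded on the worker side with pickle.loads().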

Could you also remove the use of VLLM_ENABLE_V1_MULTIPROCESSING=0 from the tests in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml, to make sure that they now pass without it?

Sure. With the updated commit, VLLM_ENABLE_V1_MULTIPROCESSING=0 can also work.

@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Could you also remove the use of VLLM_ENABLE_V1_MULTIPROCESSING=0 from the tests in https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml, to make sure that they now pass without it?

I'm not so familiar with the CI/CD setup. After removing VLLM_ENABLE_V1_MULTIPROCESSING=0 (it appears in three places), how can I run the yaml locally to make sure it works? Will python -m pytest do the checking?

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot added the ci/build label Mar 25, 2025
@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

@wwl2755 wwl2755 requested a review from njhill March 25, 2025 19:15
@russellb
Member

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7

It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle:
GHSA-rh4j-5rhw-hr54

@wwl2755
Contributor Author

wwl2755 commented Mar 25, 2025

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7

It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle: GHSA-rh4j-5rhw-hr54

I see. I guess it's similar to an injection attack. Then we should probably put stricter restrictions on the format of the function call, or build our own serialization method?

One naive solution I can think of is to restrict the method argument to strings only and maintain a mapping from function names to functions on the worker side, but this limits what collective_rpc can accept (maybe it indeed should be limited?).
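
Roughly, that idea would look like this (a sketch only; the registry and the function names are made up for illustration):

from typing import Any, Callable, Dict

# Allow-listed worker-side RPC targets, keyed by name. Only strings travel
# over the wire, so no user-controlled bytes are ever unpickled into code.
WORKER_RPC_REGISTRY: Dict[str, Callable[..., Any]] = {}

def register_rpc(name: str):
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        WORKER_RPC_REGISTRY[name] = fn
        return fn
    return wrap

@register_rpc("echo_rank")
def echo_rank(worker) -> int:
    return worker.rank

def dispatch_rpc(worker, method: str, *args, **kwargs):
    if method not in WORKER_RPC_REGISTRY:
        raise ValueError(f"Unknown RPC method: {method!r}")
    return WORKER_RPC_REGISTRY[method](worker, *args, **kwargs)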

Do you have any suggestion?

@russellb
Member

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7
It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle: GHSA-rh4j-5rhw-hr54

I see. I guess it's similar to an injection attack. Then we should probably put stricter restrictions on the format of the function call, or build our own serialization method?

One naive solution I can think of is to restrict the method argument to strings only and maintain a mapping from function names to functions on the worker side, but this limits what collective_rpc can accept (maybe it indeed should be limited?).

Do you have any suggestion?

There's a tension here between the ultimate flexibility of pickle and providing a secure interface.

I'm working on some improvements here, but I don't want to give too much detail because the details are a bit sensitive. For now, don't worry about it.

@wwl2755
Contributor Author

wwl2755 commented Mar 26, 2025

Where this will likely be a problem is if we want to support this over multiple hosts. At that point, the use of pickle (or cloudpickle) would most likely be a problem.

Are you also considering the latency impact? Otherwise, should we come up with our own serialization logic?

No, (cloud)pickle is generally not a safe serialization format to use across hosts. It is a common cause of security vulnerabilities that allow the execution of arbitrary code on a remote host. Here is the most recent example for vLLM: GHSA-x3m8-f7g5-qhm7
It's not just a concern with network communications. It's also a problem if used as a serialization format for sharing data. See this example that's a result of torch.load() using pickle: GHSA-rh4j-5rhw-hr54

I see. I guess it's similar to an injection attack. Then we should probably put stricter restrictions on the format of the function call, or build our own serialization method?
One naive solution I can think of is to restrict the method argument to strings only and maintain a mapping from function names to functions on the worker side, but this limits what collective_rpc can accept (maybe it indeed should be limited?).
Do you have any suggestion?

There's a tension here between the ultimate flexibility of pickle and providing a secure interface.

I'm working on some improvements here, but I don't want to give too much detail because the details are a bit sensitive. For now, don't worry about it.

I see. Thank you for commenting.

@wwl2755
Contributor Author

wwl2755 commented Mar 26, 2025

Besides the (cloud)pickle issue, could you help to review this PR again? Thanks! @njhill

@wwl2755 wwl2755 requested a review from njhill March 26, 2025 19:07
wwl2755 added 2 commits March 26, 2025 19:08
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@wwl2755
Contributor Author

wwl2755 commented Mar 26, 2025

Thanks @wwl2755, the changes look good apart from one minor comment inline.

Also could you revert the cloudpickle changes for now? Then I think we can merge this and deal with the cloudpickle thing as a follow-on (and @russellb is working on some related changes to that part anyhow).

The comments are addressed by the latest commits. PTAL. Thank you! @njhill

Member

@njhill njhill left a comment

Thanks @wwl2755

@njhill njhill added the ready label Mar 26, 2025
@wwl2755
Contributor Author

wwl2755 commented Mar 27, 2025

It seems weird, because I tested it on my local machine and it passed all the tests in test_collective_rpc.py. Can you see any reason why the run fails in CI? @njhill

@DarkLight1337
Member

Retrying

@njhill
Member

njhill commented Mar 27, 2025

This one is a legit failure: https://buildkite.com/vllm/ci/builds/16314#0195d722-84b7-4833-a6e3-734e8e657969/206-571.

@wwl2755 perhaps you could add a third custom type in serial_utils.py like this:

def custom_enc_hook(obj: Any) -> Any:
    if isinstance(obj, torch.Tensor):
        # NOTE(rob): it is fastest to use numpy + pickle
        # when serializing torch tensors.
        # https://gist.github.com/tlrmchlsmth/8067f1b24a82b6e2f90450e7764fa103 # noqa: E501
        return msgpack.Ext(CUSTOM_TYPE_TENSOR, pickle.dumps(obj.numpy()))

    if isinstance(obj, FunctionType):
        return msgpack.Ext(CUSTOM_TYPE_CLOUDPICKLE, cloudpickle.dumps(obj))

    return msgpack.Ext(CUSTOM_TYPE_PICKLE, pickle.dumps(obj))


def custom_ext_hook(code: int, data: memoryview) -> Any:
    if code == CUSTOM_TYPE_TENSOR:
        return torch.from_numpy(pickle.loads(data))
    if code == CUSTOM_TYPE_PICKLE:
        return pickle.loads(data)
    if code == CUSTOM_TYPE_CLOUDPICKLE:
        return cloudpickle.loads(data)

    raise NotImplementedError(f"Extension type code {code} is not supported")
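
For reference, these hooks would plug into the msgspec encoder/decoder roughly like this (a sketch that reuses custom_enc_hook/custom_ext_hook from above; the actual wiring in serial_utils.py may differ):

from typing import Any

from msgspec import msgpack

encoder = msgpack.Encoder(enc_hook=custom_enc_hook)
decoder = msgpack.Decoder(Any, ext_hook=custom_ext_hook)

def roundtrip(obj: Any) -> Any:
    # Anything msgpack cannot encode natively goes through the hooks.
    return decoder.decode(encoder.encode(obj))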

Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@wwl2755
Contributor Author

wwl2755 commented Mar 27, 2025

I have retried the test script on my local machine.

pytest -v -s tests/entrypoints/llm/test_collective_rpc.py passes all the tests, but cd tests && pytest -v -s entrypoints/llm/test_collective_rpc.py fails in the [ray-2] test.

After modifying it based on your suggestion, all tests pass. PTAL. Thanks! @njhill


mergify bot commented Mar 28, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @wwl2755.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 28, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
@mergify mergify bot removed the needs-rebase label Mar 28, 2025
@wwl2755
Contributor Author

wwl2755 commented Mar 28, 2025

Hi, @njhill

I see that CI failed in some kernel and TPU tests, which I don't think are relevant to this PR. Do these failures need to be addressed before merging?

@njhill
Member

njhill commented Mar 28, 2025

@wwl2755 could you merge in the latest main branch one more time? It should address those failures. Thanks!

@wwl2755
Contributor Author

wwl2755 commented Mar 28, 2025

@wwl2755 could you merge in the latest main branch one more time? It should address those failures. Thanks!

Updated.

@vllm-bot vllm-bot merged commit 94744ba into vllm-project:main Mar 29, 2025
57 of 59 checks passed
hijkzzz pushed a commit to OpenRLHF/OpenRLHF that referenced this pull request Apr 2, 2025
Collective RPC is supported now in vLLM (nightly), so we
no longer need to always set `VLLM_ENABLE_V1_MULTIPROCESSING=0`
when we use vLLM V1. vllm-project/vllm#15444

But we still need to set that when we enable the full_determinism
mode, as vLLM does not guarantee the reproducibility of the results
by default, for the sake of performance. We need to turn off
multiprocessing to make the scheduling deterministic. This is
not needed and will be ignored for V0.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Apr 2, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Labels
ci/build, frontend, ready, v1
Development

Successfully merging this pull request may close these issues.

[Feature]: [V1] Collective RPC
6 participants