[Bugfix] Add int8 torch dtype for KVCache #15260

shen-shanshan · 2025-03-21T02:07:47Z

Some attention backends require an int8 KV cache dtype (e.g., for quantization). The dtype is looked up from CacheConfig during initialization:

if cache_config.cache_dtype == "auto":
    self.dtype = model_config.dtype
else:
    self.dtype = STR_DTYPE_TO_TORCH_DTYPE[cache_config.cache_dtype]

But there is no int8 entry in STR_DTYPE_TO_TORCH_DTYPE:

STR_DTYPE_TO_TORCH_DTYPE = {
    "half": torch.half,
    "bfloat16": torch.bfloat16,
    "float": torch.float,
    "fp8": torch.uint8,
    "fp8_e4m3": torch.uint8,
    "fp8_e5m2": torch.uint8,
}

So I think it is better to add int8 to STR_DTYPE_TO_TORCH_DTYPE.
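A minimal sketch of the lookup described above, assuming plain strings as stand-ins for the torch dtypes so that it runs without PyTorch. The helper name `resolve_kv_cache_dtype` and the `patched` table are hypothetical and only illustrate the effect of adding the entry; they are not code from the PR.

```python
# Stand-in table: in vLLM the values are real torch dtypes (torch.half,
# torch.uint8, ...). Strings are used here only to keep the sketch
# dependency-free.
STR_DTYPE_TO_TORCH_DTYPE = {
    "half": "torch.half",
    "bfloat16": "torch.bfloat16",
    "float": "torch.float",
    "fp8": "torch.uint8",
    "fp8_e4m3": "torch.uint8",
    "fp8_e5m2": "torch.uint8",
}


def resolve_kv_cache_dtype(cache_dtype, model_dtype, table):
    """Hypothetical helper mirroring the branch quoted above."""
    if cache_dtype == "auto":
        # "auto" means the KV cache follows the model's own dtype.
        return model_dtype
    # Any other value must be a key of the table.
    return table[cache_dtype]


# Without an "int8" key, the dictionary lookup in this sketch fails.
try:
    resolve_kv_cache_dtype("int8", "torch.half", STR_DTYPE_TO_TORCH_DTYPE)
    lookup_failed = False
except KeyError:
    lookup_failed = True

# With the extra entry (assumed here to map to torch.int8), it resolves.
patched = {**STR_DTYPE_TO_TORCH_DTYPE, "int8": "torch.int8"}
resolved = resolve_kv_cache_dtype("int8", "torch.half", patched)
```

The sketch only covers the dictionary lookup; whether a backend can actually read and write an int8 cache depends on its own kernels and quantization config.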

Signed-off-by: shen-shanshan <467638484@qq.com>

github-actions · 2025-03-21T02:07:55Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

houseroad

Have we ever tested anything with int8 KV cache?

Is adding an item to a map enough? I am wondering how int8 KV works here.

Isotr0py · 2025-03-21T07:28:14Z

> Have we ever tested anything with int8 KV cache?
> Is adding an item to a map enough? I am wondering how int8 KV works here.

I don't think any quantization method in the main repo currently supports an int8 KV cache, but some out-of-tree (OOT) hardware plugins such as vllm-ascend do support it: https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/quantization/quant_config.py#L60-L96

shen-shanshan · 2025-03-21T07:30:55Z

> > Have we ever tested anything with int8 KV cache?
> > Is adding an item to a map enough? I am wondering how int8 KV works here.
>
> I don't think any quantization method in the main repo currently supports an int8 KV cache, but some out-of-tree (OOT) hardware plugins such as vllm-ascend do support it: https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/quantization/quant_config.py#L60-L96

Yes, thanks for your explanation.

add new torch dtype for kv_cache

a02337e

Signed-off-by: shen-shanshan <467638484@qq.com>

Isotr0py approved these changes Mar 21, 2025

Isotr0py changed the title ~~[Bugfix] Add new torch dtype for KVCache~~ [Bugfix] Add int8 torch dtype for KVCache Mar 21, 2025

Isotr0py enabled auto-merge (squash) March 21, 2025 07:12

github-actions bot added the ready label Mar 21, 2025

houseroad reviewed Mar 21, 2025

Isotr0py merged commit a989ca2 into vllm-project:main Mar 21, 2025
43 checks passed

shen-shanshan mentioned this pull request Mar 21, 2025

[Bugfix][Worker] Add Custom Cache Engine for NPU Worker to avoid patch vllm-project/vllm-ascend#356

Closed

erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

698f488

Signed-off-by: shen-shanshan <467638484@qq.com>

lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 2, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

e7dabe8

Signed-off-by: shen-shanshan <467638484@qq.com>

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

3c5cf68

Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>

nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025

[Bugfix] Add int8 torch dtype for KVCache (vllm-project#15260)

aac8b26

Signed-off-by: shen-shanshan <467638484@qq.com>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed
