[Bug]: MLA Warnings when using FP8 KV cache in v0.7.1 #12680

Syst3m1cAn0maly · 2025-02-03T07:03:00Z

Your current environment

The output of `python collect_env.py`

Your output of `python collect_env.py` here

Model Input Dumps

No response

🐛 Describe the bug

Since v0.7.1, many warnings are logged when using an FP8 KV cache with models quantized with llm-compressor:

WARNING 02-02 22:55:17 config.py:991] compressed-tensors MLA support requires fp8 activations and weights in group 'group_0', but got activations type 'float' and weights type 'float'.
WARNING 02-02 22:55:17 config.py:991] Full config: {'config_groups': {'group_0': {'input_activations': {'actorder': None, 'block_structure': None, 'dynamic': False, 'group_size': None, 'num_bits': 8, 'observer': 'minmax', 'observer_kwargs': {}, 'strategy': 'tensor', 'symmetric': True, 'type': 'float'}, 'output_activations': None, 'targets': ['Linear'], 'weights': {'actorder': None, 'block_structure': None, 'dynamic': False, 'group_size': None, 'num_bits': 8, 'observer': 'minmax', 'observer_kwargs': {}, 'strategy': 'tensor', 'symmetric': True, 'type': 'float'}}}, 'format': 'float-quantized', 'global_compression_ratio': 1.462046196596282, 'ignore': ['lm_head'], 'kv_cache_scheme': {'actorder': None, 'block_structure': None, 'dynamic': False, 'group_size': None, 'num_bits': 8, 'observer': 'minmax', 'observer_kwargs': {}, 'strategy': 'tensor', 'symmetric': True, 'type': 'float'}, 'quant_method': 'compressed-tensors', 'quantization_status': 'compressed'}

This was introduced by the following PR:
[Attention] Deepseek v3 MLA support with FP8 compute #12601
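For reference, the config printed in the warning declares both the weights and the input activations of `group_0` as `'type': 'float'` with `'num_bits': 8`, which appears to be how an FP8 scheme is expressed in this config format, so the warning looks like a false positive. Below is a minimal sketch of a check that treats that combination as FP8. The helper name `is_fp8_group` is hypothetical and this is not the project's actual check; the dict is trimmed from the log above.

```python
# Hypothetical helper, for illustration only: not the real check in config.py.
def is_fp8_group(group: dict) -> bool:
    """Return True if both weights and input activations are 8-bit float."""

    def is_fp8(scheme) -> bool:
        # Assumption: FP8 is expressed as type 'float' with num_bits 8.
        return (
            scheme is not None
            and scheme.get("type") == "float"
            and scheme.get("num_bits") == 8
        )

    return is_fp8(group.get("weights")) and is_fp8(group.get("input_activations"))


# Trimmed copy of the 'config_groups' entry from the warning above.
config = {
    "config_groups": {
        "group_0": {
            "input_activations": {"num_bits": 8, "strategy": "tensor", "type": "float"},
            "output_activations": None,
            "targets": ["Linear"],
            "weights": {"num_bits": 8, "strategy": "tensor", "type": "float"},
        }
    },
    "format": "float-quantized",
    "quant_method": "compressed-tensors",
}

result = {name: is_fp8_group(g) for name, g in config["config_groups"].items()}
print(result)  # {'group_0': True}
```

Under that assumption, `group_0` qualifies as FP8 for both weights and activations, which is the opposite of what the warning reports.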

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.


LucasWilkinson · 2025-02-03T22:06:17Z

Should be fixed by #12704.

Syst3m1cAn0maly added the bug (Something isn't working) label on Feb 3, 2025
