Thanks to the llm-compressor team for this model-compression tool. I have a few questions about K/V cache quantization in llm-compressor:
- Does llm-compressor currently support K/V cache quantization? It does not appear to be supported at the moment. If not, is support planned for a future release?
- vLLM appears to support K/V cache quantization. Is this feature available only in vLLM, and not in llm-compressor?
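
For context, this is the vLLM behavior I'm referring to. A minimal launch sketch, assuming a recent vLLM build where the `--kv-cache-dtype` flag accepts `fp8`; the model name is just a placeholder:

```shell
# Launch sketch (not verified end-to-end): serve a model with the K/V cache
# quantized to FP8 via vLLM's --kv-cache-dtype flag.
# The model name below is a placeholder, not a specific recommendation.
vllm serve meta-llama/Llama-3.1-8B-Instruct --kv-cache-dtype fp8
```

My question is whether producing a checkpoint with quantized K/V cache scales is meant to happen on the llm-compressor side, or whether the runtime-side flag above is the only supported path.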