Thanks to the llm-compressor team for this model-compression tool. I have a few questions about K/V cache quantization in llm-compressor:
- Does llm-compressor currently support K/V cache quantization? It does not appear to be supported at the moment. If not, is support planned for a future release?
- vLLM appears to support K/V cache quantization. Is this feature available only in vLLM, and not in llm-compressor?
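
For context, this is the vLLM behavior I'm referring to. A minimal launch sketch, assuming a recent vLLM build where the `--kv-cache-dtype` flag accepts `fp8`; the model name is just a placeholder:

```shell
# Launch sketch (not verified end-to-end): serve a model with the K/V cache
# quantized to FP8 via vLLM's --kv-cache-dtype flag.
# The model name below is a placeholder, not a specific recommendation.
vllm serve meta-llama/Llama-3.1-8B-Instruct --kv-cache-dtype fp8
```

My question is whether producing a checkpoint with quantized K/V cache scales is meant to happen on the llm-compressor side, or whether the runtime-side flag above is the only supported path.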