QSpec: Speculative Decoding with Complementary Quantization Schemes

Jan 1, 2024

Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu

Last updated on Jan 1, 2024
