LLM-PQ: Serving LLM on heterogeneous clusters with phase-aware partition and adaptive quantization

Jan 1, 2024 · Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu