POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
Jan 1, 2024 · Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin
Type: Conference paper
Publication: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
Last updated on Jan 1, 2024