LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
Jan 1, 2024 · Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Chuan Wu
Type: Journal article
Publication: arXiv preprint arXiv:2403.01136
Last updated on Jan 1, 2024