POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

Jan 1, 2024
Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin
Type
Publication
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming