CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Jan 1, 2024 · Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu