Publications

(2025). OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training. arXiv preprint arXiv:2504.09844.
(2025). Efficient LLM Serving on Hybrid Real-time and Best-effort Requests. arXiv preprint arXiv:2504.09590.
(2024). QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
(2024). QSpec: Speculative Decoding with Complementary Quantization Schemes. arXiv preprint arXiv:2410.11305.
(2024). POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming.
(2024). LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization. arXiv preprint arXiv:2403.01136.
(2024). CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs. Proceedings of the Nineteenth European Conference on Computer Systems.
(2023). Adaptive Message Quantization and Parallelization for Distributed Full-Graph GNN Training. Proceedings of Machine Learning and Systems.
(2022). Synchronization in Games Sound: An Audiovisual Study on Player Experience and Performance. Proceedings of the 2nd Workshop on Games Systems.
(2022). CryptoArcade: A Cloud Gaming System With Blockchain-Based Token Economy. IEEE Transactions on Cloud Computing.