
QSpec: Speculative Decoding with Complementary Quantization Schemes

Jan 1, 2024 · Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu
Last updated on Jan 1, 2024


© 2026 Me. This work is licensed under CC BY NC ND 4.0
