SubSpec is an open-source LLM inference framework for research on tree-based speculative decoding (SD).
Highlights
- Supports multiple offloading pipelines and target/draft model quantization strategies.
- Includes an optimized tree-based SD pipeline with full `torch.compile` support for speedups.
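
For readers new to tree-based SD, the following is a minimal, framework-independent sketch of greedy draft-tree verification. It is not SubSpec's API or implementation: `DraftNode`, `verify_tree`, and `toy_target` are hypothetical names used only for illustration, and the target model is a toy function rather than an LLM. Practical pipelines typically score every tree node in one batched target forward pass using a tree attention mask; this sketch calls the target once per accepted token to keep the logic readable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class DraftNode:
    """One drafted token plus its alternative continuations (hypothetical type)."""
    token: int
    children: List["DraftNode"] = field(default_factory=list)


def verify_tree(
    prefix: Sequence[int],
    roots: List[DraftNode],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Return the tokens emitted in one verification step under greedy decoding.

    At each depth, the target's greedy next token is compared with the drafted
    candidates. A matching candidate is accepted and the walk descends into its
    children. On the first mismatch (or at a leaf), the target's own token is
    emitted, so at least one token is always produced.
    """
    accepted: List[int] = []
    candidates = roots
    while True:
        expected = target_next(list(prefix) + accepted)
        match = next((n for n in candidates if n.token == expected), None)
        if match is None:
            accepted.append(expected)  # correction/bonus token from the target
            return accepted
        accepted.append(match.token)
        candidates = match.children


def toy_target(seq: List[int]) -> int:
    """Stand-in for a target model (assumption): predicts last token + 1, mod 10."""
    return (seq[-1] + 1) % 10


# Draft tree after prefix [1]:
#   2 -> 3 -> 5
#   2 -> 4
#   7
roots = [
    DraftNode(2, [DraftNode(3, [DraftNode(5)]), DraftNode(4)]),
    DraftNode(7),
]
result = verify_tree([1], roots, toy_target)
print(result)  # [2, 3, 4]
```

In this toy run, the drafted tokens 2 and 3 are accepted, the drafted 5 is rejected, and the target supplies 4, so a single verification step yields three tokens. Drafting several branches raises the chance that some path matches the target, which is the motivation for using a tree rather than a single drafted sequence.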
Repository
The code is available via the link above.