tiny-vllm

Educational LLM inference engine built from scratch in C++ and CUDA

tiny-vllm is an educational sibling of vLLM. It implements the full forward pass of an LLM (Llama 3.2 1B) from scratch in C++ and CUDA, including a KV cache, continuous batching, grouped-query attention (GQA), rotary position embeddings (RoPE), and custom CUDA kernels.
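
To make one of the listed components concrete, here is a minimal C++ sketch of RoPE applied to a single attention head vector. It is not taken from tiny-vllm: the function name apply_rope, the interleaved pairing of elements, and the default base of 10000 are assumptions made for illustration. The project itself may use a different element layout, base, or frequency scaling, and would presumably do this work in a CUDA kernel rather than on the CPU.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from tiny-vllm): rotate one attention head vector
// in place using rotary position embeddings.
// Assumption: interleaved layout, so elements (i, i + 1) with even i form one
// rotated pair. head.size() is expected to be even.
void apply_rope(std::vector<float>& head, std::size_t position,
                float base = 10000.0f) {
    const std::size_t dim = head.size();
    for (std::size_t i = 0; i + 1 < dim; i += 2) {
        // Frequency for this pair: base^(-i / dim), where i is the even index.
        const float freq =
            std::pow(base, -static_cast<float>(i) / static_cast<float>(dim));
        const float angle = static_cast<float>(position) * freq;
        const float c = std::cos(angle);
        const float s = std::sin(angle);
        const float x0 = head[i];
        const float x1 = head[i + 1];
        // Standard 2D rotation of the pair by `angle`.
        head[i]     = x0 * c - x1 * s;
        head[i + 1] = x0 * s + x1 * c;
    }
}
```

Each pair of elements is rotated by an angle proportional to the token position, so position 0 leaves the vector unchanged and the vector's length is preserved at every position. Because queries and keys are rotated the same way, their dot product depends only on the distance between their positions, which is what lets rotated keys sit in a KV cache and be reused for later tokens.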