tiny-vllm
AI Tools
Educational LLM inference engine built from scratch in C++ and CUDA
Visit site: https://github.com/jmaczan/tiny-vllm

tiny-vllm is an educational sibling of vLLM. It implements a full LLM forward pass (Llama 3.2 1B) from scratch in C++ and CUDA, with a KV cache, continuous batching, grouped-query attention (GQA), rotary position embeddings (RoPE), and custom CUDA kernels.
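
To give a feel for one of the techniques listed above, here is a minimal CPU sketch of RoPE applied to a single attention head vector. It is not taken from the tiny-vllm repository: the function name `apply_rope`, the parameter `theta_base`, and the interleaved-pair layout are assumptions chosen for illustration, and a real engine would typically do this inside a CUDA kernel with the base frequency read from the model configuration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch only (hypothetical helper, not tiny-vllm's code).
// Rotates one head vector in place with rotary position embeddings.
// Assumption: interleaved layout, where (x[2i], x[2i+1]) form one 2D pair.
// Assumption: theta_base defaults to 10000, the value from the original
// RoPE formulation; a given model may configure a different base.
void apply_rope(std::vector<float>& x, int position, float theta_base = 10000.0f) {
    assert(x.size() % 2 == 0);  // RoPE needs an even head dimension
    const std::size_t head_dim = x.size();
    for (std::size_t i = 0; i < head_dim / 2; ++i) {
        // Per-pair frequency: theta_base^(-2i / head_dim)
        const float freq = std::pow(
            theta_base,
            -2.0f * static_cast<float>(i) / static_cast<float>(head_dim));
        const float angle = static_cast<float>(position) * freq;
        const float c = std::cos(angle);
        const float s = std::sin(angle);
        // Standard 2D rotation of the pair by `angle`
        const float x0 = x[2 * i];
        const float x1 = x[2 * i + 1];
        x[2 * i]     = x0 * c - x1 * s;
        x[2 * i + 1] = x0 * s + x1 * c;
    }
}
```

Because each pair is only rotated, the vector's length is unchanged, and position 0 leaves the vector as it was. In an attention layer the same rotation would be applied to both query and key vectors before their dot product, so the score depends on the relative distance between token positions.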