Three picks today: two for developers, one for anyone who wants free live TV.

tiny-vllm: Build an LLM Inference Engine from Scratch in C++ and CUDA

285 stars on GitHub, 81 points on Show HN. Over 123 commits, developer Jędrzej Maczan built a complete Llama 3.2 1B inference pipeline from scratch in pure C++17 and CUDA.

This isn't meant for production. Its value is in showing exactly how LLM inference works at every step: how embeddings are gathered, how RMSNorm uses a parallel tree reduction, how rotary position embeddings (RoPE) are computed, and how grouped-query attention (GQA) lets 4 query heads share 1 key-value head. The kernels are all hand-written CUDA, with no PyTorch dependency.
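To make two of those ideas concrete, here is a minimal Python sketch, not code from tiny-vllm. It assumes the usual Llama 3.2 1B layout of 32 query heads and 8 key-value heads (which matches the 4-to-1 sharing above), and it simulates the stride-halving reduction sequentially. The names `tree_sum`, `rms_norm`, and `kv_head_for` are made up for illustration.

```python
import math

# Assumed attention shape for Llama 3.2 1B: 32 query heads, 8 key-value heads.
NUM_Q_HEADS = 32
NUM_KV_HEADS = 8
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # 4 query heads per key-value head


def kv_head_for(q_head):
    """Return the key-value head that a given query head reads from."""
    return q_head // GROUP_SIZE


def tree_sum(values):
    """Sum by folding the upper half onto the lower half until one value is left.

    This mirrors the stride-halving pattern of a GPU block reduction.
    Assumes len(values) is a power of two.
    """
    buf = list(values)
    stride = len(buf) // 2
    while stride > 0:
        for i in range(stride):  # on a GPU, these additions run in parallel
            buf[i] += buf[i + stride]
        stride //= 2
    return buf[0]


def rms_norm(x, weight, eps=1e-5):
    """Divide x by its root mean square, then scale by a per-element weight."""
    mean_sq = tree_sum([v * v for v in x]) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

With this mapping, query heads 0 through 3 all read key-value head 0, heads 4 through 7 read head 1, and so on, which is why the KV cache is a quarter of the size it would be with one key-value head per query head.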

The project implements both prefill (processing all input tokens) and decode (generating tokens one at a time), with KV cache and continuous batching. PagedAttention is listed as planned.
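The split between prefill and decode is easier to see in a toy. The sketch below is a single-head illustration in plain Python, not the project's code: it uses the token vector itself as query, key, and value (a real model applies learned projections), and it loops over prompt tokens where a real prefill processes them in one batched pass with a causal mask. `KVCache`, `prefill`, and `decode_step` are hypothetical names.

```python
import math


def attend(q, keys, values):
    """Softmax attention of one query vector over all cached keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[d] for e, v in zip(exps, values)) / total for d in range(dim)]


class KVCache:
    """Stores keys and values of past tokens so they are never recomputed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def prefill(cache, prompt_vectors):
    """Fill the cache from the whole prompt; return the last token's output."""
    out = None
    for x in prompt_vectors:
        cache.append(x, x)  # toy assumption: identity K and V projections
        out = attend(x, cache.keys, cache.values)
    return out


def decode_step(cache, x):
    """Add one new token to the cache and attend over everything so far."""
    cache.append(x, x)
    return attend(x, cache.keys, cache.values)
```

Each `decode_step` only computes a key and value for the one new token; everything earlier comes from the cache, which is the whole reason the cache exists.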

The README is exceptionally detailed, starting from how bfloat16 floating-point works, through cuBLAS column-major/row-major tricks, to why KV cache exists. If you want to understand GPU inference internals, this is far more approachable than reading vLLM's source code.
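As a taste of the bfloat16 part: a bfloat16 value is the top 16 bits of a float32 (1 sign bit, 8 exponent bits, 7 mantissa bits). The following Python sketch shows that relationship using round-to-nearest-even. It is an illustration written for this post, not taken from the README, and it deliberately ignores NaN handling.

```python
import struct


def float_to_bf16_bits(x):
    """Round a float32 value to bfloat16 and return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even: add 0x7FFF plus the lowest bit we keep.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF


def bf16_bits_to_float(b):
    """Widen a bfloat16 bit pattern to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

For example, `float_to_bf16_bits(1.0)` gives `0x3F80`, and round-tripping 3.14159 returns 3.140625: the range of float32 is preserved, but only about two to three decimal digits of precision survive.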

🔗 GitHub: https://github.com/jmaczan/tiny-vllm


Zot: AI Coding in a Single Go Binary

61 points on Show HN, 90 stars on GitHub. Zot is a terminal-based AI coding agent shipped as a single static Go binary. No Node.js, no Docker: just drop it somewhere on your PATH.

It has 4 built-in tools: read, write, edit, and bash. Enough to be useful, not so many that things break.

Zot supports 25+ LLM providers including Anthropic, OpenAI, Google Gemini, DeepSeek, Kimi, GitHub Copilot, AWS Bedrock, Azure OpenAI, xAI, Groq, and more. You can use API keys or log in with your Claude/ChatGPT subscription directly.

The extension system is interesting: plugins can be written in any language and talk to Zot via JSON-RPC over a subprocess. Plugins can register slash commands, expose new tools, and intercept tool calls for permission gating. There's also a skills system using markdown files with YAML frontmatter that the model loads on demand.
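To give a feel for what a permission-gating plugin could look like, here is a rough Python sketch of a JSON-RPC 2.0 handler that reads one request per line. Everything Zot-specific in it is an assumption: the method name `tool_call.check`, the `command` parameter, the `allow` result field, and the line-delimited framing are all invented for illustration, so check the Zot docs for the real protocol. Only the envelope and the error codes come from the JSON-RPC 2.0 spec.

```python
import json
import sys

# Hypothetical policy: refuse bash commands containing these substrings.
BLOCKED_SUBSTRINGS = ("rm -rf",)


def error(req_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_request(req):
    """Answer one request. 'tool_call.check' is an invented method name."""
    if not isinstance(req, dict):
        return error(None, -32600, "Invalid Request")
    req_id = req.get("id")
    if req.get("method") == "tool_call.check":
        command = req.get("params", {}).get("command", "")
        allowed = not any(s in command for s in BLOCKED_SUBSTRINGS)
        return {"jsonrpc": "2.0", "id": req_id, "result": {"allow": allowed}}
    return error(req_id, -32601, "Method not found")


def handle_line(line):
    """Parse one line of JSON and return the serialized response."""
    try:
        req = json.loads(line)
    except json.JSONDecodeError:
        return json.dumps(error(None, -32700, "Parse error"))
    return json.dumps(handle_request(req))


def serve(inp=sys.stdin, out=sys.stdout):
    """Read requests line by line and write one response per line."""
    for line in inp:
        if line.strip():
            out.write(handle_line(line) + "\n")
            out.flush()
```

A plugin script would end with a call to `serve()`. Because the host only needs a process that reads and writes JSON on its standard streams, the same shape works in any language.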

Four run modes: interactive TUI with streaming, print mode for shell pipelines, JSON mode for scripts/CI, and RPC mode as a long-lived subprocess you can embed in other apps.

If you're tired of Node.js-based coding agent toolchains, Zot is worth a look.

🔗 Website: https://www.zot.sh
🔗 GitHub: https://github.com/patriceckhart/zot


TV Explorer: Free Live TV from 200+ Countries

104 points on Show HN. TV Explorer is an IPTV player front-end aggregating 11,000+ free, publicly available live TV channels from 200+ countries. The channel data comes from the open-source IPTV project on GitHub, and the streams are all legitimate free-to-air broadcasts.

Feature set includes Chromecast casting, multiview (watch multiple channels at once), DVR recording, favorites, and a hotbar for quick access. Search supports triple filtering by category, country, and language.

No login required, no data collection. Coverage spans Europe (2,348 channels), North America (2,343), Asia (1,744), South America (818), and more. Categories include news, sports, entertainment, music, and kids.

If you're abroad and want to watch channels from home, or just curious what news looks like in other countries, this is a handy tool.

🔗 Website: https://tvexplorer.live