Why RAG

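LLMs have two limitations: a knowledge cutoff and no access to your private data. RAG (retrieval-augmented generation) addresses the second directly by retrieving relevant documents before generating an answer, and it eases the first whenever the indexed documents are newer than the model's training data.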

RAG = Search + AI Answer

Basic Flow

  1. Convert question to vector (embedding)
  2. Search vector database for relevant document chunks
  3. Send retrieved chunks + question to LLM
  4. LLM generates answer based on retrieved content
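
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: embed() is a toy bag-of-words stand-in for a real embedding model, the "vector database" is a plain list, and the names embed, cosine, retrieve, build_prompt, CHUNKS, and QUESTION are all hypothetical. The final LLM call is left out because it depends on the provider you use.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Steps 1 and 2: embed the question, return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: combine the retrieved chunks and the question into one prompt."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Hypothetical example data.
CHUNKS = [
    "A refund is accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]
QUESTION = "What is the refund window after purchase?"

if __name__ == "__main__":
    top = retrieve(QUESTION, CHUNKS, k=1)
    # Step 4 would send this prompt to the LLM of your choice.
    print(build_prompt(QUESTION, top))
```

With a real embedding model the structure stays the same; only embed() and the similarity search change.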

Building a Simple RAG System

  1. Prepare documents: PDF, Word, Markdown, web pages
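  2. Chunking: Split into blocks of roughly 500-1000 tokens, overlapping so that text at a boundary appears in both neighbors
  3. Embedding: Use a model such as text-embedding-3-small (OpenAI) or BGE-M3 (BAAI, open weights); embed documents and questions with the same model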
  4. Vector database: Chroma, Milvus, Qdrant, or FAISS
  5. Retrieve + Generate: Search top-K chunks, send with question to LLM
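
The chunking step is the easiest one to get subtly wrong, so here is a small sketch of fixed-size chunking with overlap. It assumes the text has already been split into a list of tokens; the example uses stringified numbers for readability, whereas a real system would use the tokenizer that matches its embedding model. The function name chunk_tokens and its parameters are illustrative, not taken from any library.

```python
def chunk_tokens(tokens: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window reaches the end, so no trailing chunk
        # consists only of tokens already covered by the overlap.
        if start + size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    tokens = [str(i) for i in range(10)]
    for chunk in chunk_tokens(tokens, size=4, overlap=1):
        print(chunk)
    # ['0', '1', '2', '3']
    # ['3', '4', '5', '6']
    # ['6', '7', '8', '9']
```

Fixed-size windows are the simplest option. Splitting on headings or paragraphs first and only then applying a size limit usually keeps related sentences together better.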

Quality Tips

  • Chunk quality determines answer quality
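  • Hybrid search (vector + keyword) often beats either method alone, especially for exact terms such as product codes or names
  • Reranking the retrieved chunks before sending them to the LLM improves precision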
  • Cite sources in answers
  • Update index when docs change
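
One common way to combine vector and keyword results for hybrid search is reciprocal rank fusion (RRF). The notes above do not prescribe a fusion method, so treat this as one possible sketch: each input is a ranked list of document IDs, and k=60 is a conventional smoothing constant rather than a tuned value. The function name and the example IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # ranked by embedding similarity
    keyword_hits = ["b", "d", "a"]  # ranked by a keyword scorer
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
    # ['b', 'a', 'd', 'c']
```

RRF uses only rank positions, so the two retrievers' raw scores do not need to be on the same scale. A reranker can then be applied to the fused top results.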

Ready-made Tools

Dify, FastGPT, AnythingLLM, ChatGPT file upload

Limitations

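RAG is only as good as its retrieval: if the relevant chunk is not retrieved, the LLM cannot use it. Questions that need multi-step reasoning across many documents usually call for agent architectures on top of retrieval.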