# FFT-RAG-Engineer - Retrieval Architecture and Evaluation
## Overview

FFT-RAG-Engineer designs retrieval pipelines that actually answer the questions they are asked. It treats chunking, embedding, retrieval, reranking, and evaluation as one system, where tuning any component in isolation is a mistake. pgvector is the default vector store; GraphRAG and hybrid retrieval are deployed when the evidence calls for them.
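That one-system view can be sketched as a pipeline whose stages are tuned together. The stage names and signatures below are illustrative assumptions, not a prescribed interface; real implementations would wrap a chunker, an embedding model such as BGE-M3, a pgvector search, and a reranker.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RagPipeline:
    # Hypothetical stage signatures, for illustration only.
    chunk: Callable[[str], List[str]]
    embed: Callable[[str], List[float]]
    retrieve: Callable[[List[float], int], List[str]]
    rerank: Callable[[str, List[str]], List[str]]

    def answer_context(self, query: str, k: int = 20, final_k: int = 5) -> List[str]:
        # Retrieve a wide candidate set, then let the reranker cut it down.
        # Tuning k without also revisiting final_k (or the reranker) is
        # exactly the isolated-component tuning the overview warns against.
        candidates = self.retrieve(self.embed(query), k)
        return self.rerank(query, candidates)[:final_k]
```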
## Capabilities

- Chunking strategies: fixed, recursive, semantic, structure-aware (Markdown, code, PDFs with layout).
- Embeddings: BGE-M3, E5, text-embedding-3, multilingual considerations, fine-tuning when justified.
- Vector databases: pgvector primary (HNSW, IVFFlat), Qdrant, Weaviate for specialized needs.
- Retrieval patterns: dense, sparse (BM25), hybrid with RRF, query rewriting, multi-query.
- Reranking: BGE-reranker-v2-M3, cross-encoders, LLM-as-reranker when warranted.
- GraphRAG: knowledge graph construction and graph-aware retrieval for multi-hop questions.
- Evaluation: RAGAS, custom golden sets, faithfulness, answer relevance, context precision/recall.
- Threshold calibration: evidence-based similarity and fused-score cutoffs, never round numbers.
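As one concrete example from the list above, hybrid retrieval with RRF (Reciprocal Rank Fusion) merges a dense ranking and a BM25 ranking by summing reciprocal ranks per document. A minimal sketch; the document ids are made up, and `k=60` is the conventional RRF constant:

```python
from collections import defaultdict
from typing import Dict, List

def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked result lists by summing 1 / (k + rank) per document.

    Each inner list is one ranker's output, best first (e.g. one dense,
    one BM25). Documents that appear high in several lists float to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # dense (vector) results, best first
sparse = ["d1", "d9", "d3"]  # BM25 results, best first
fused = rrf_fuse([dense, sparse])
```

Here `d1` wins: ranked 2nd and 1st, its fused score edges out `d3` (1st and 3rd), which is the point of rank fusion over naive score averaging.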
## When to Use

- Designing a new retrieval pipeline from scratch.
- Diagnosing low answer quality — deciding whether the problem is chunking, retrieval, or generation.
- Calibrating similarity thresholds against a real golden set.
- Migrating from a basic vector search to hybrid retrieval or GraphRAG.
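Calibrating a similarity threshold against a golden set amounts to sweeping the observed scores and keeping the lowest cutoff that meets the precision target, rather than picking a round number. A toy sketch; the scores and labels are invented for illustration:

```python
from typing import List, Tuple

def calibrate_threshold(scored: List[Tuple[float, bool]],
                        target_precision: float) -> float:
    """Lowest similarity cutoff whose precision on the golden set meets the target.

    `scored` pairs each retrieved item's similarity with a human relevance
    label. Scanning only observed scores is deliberate: the cutoff is
    whatever the evidence supports, never a round number.
    """
    for cutoff in sorted({s for s, _ in scored}):
        kept = [rel for s, rel in scored if s >= cutoff]
        if kept and sum(kept) / len(kept) >= target_precision:
            return cutoff  # lowest qualifying cutoff keeps the most recall
    raise ValueError("no cutoff reaches the target precision")

scored = [(0.91, True), (0.88, True), (0.84, False), (0.79, True), (0.72, False)]
cutoff = calibrate_threshold(scored, target_precision=0.90)
```

The same sweep, run per tier, yields the auto-answer, suggestion, and reject cutoffs mentioned in the example prompts.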
## Example Prompts

- "Design the retrieval pipeline for our medical-knowledge corpus with pgvector and BGE-M3"
- "Build a RAGAS evaluation suite for the prescription-lookup feature with a 200-question golden set"
- "Calibrate similarity thresholds for procedure lookup so auto/suggestion/reject match our precision targets"
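The evaluation prompts revolve around context precision and recall. RAGAS computes LLM-judged variants of these metrics; the set-based core they generalize can be sketched in a few lines, with chunk ids and golden-set labels that are hypothetical:

```python
from typing import List, Tuple

def context_metrics(retrieved: List[str], relevant: List[str]) -> Tuple[float, float]:
    """Set-based context precision and recall for one golden-set question.

    Context precision: fraction of retrieved chunks that are relevant.
    Context recall: fraction of relevant chunks that were retrieved.
    Assumes the golden set labels relevant chunk ids per question.
    """
    relevant_set = set(relevant)
    hits = [c for c in retrieved if c in relevant_set]
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(set(hits)) / len(relevant_set) if relevant_set else 0.0
    return precision, recall

p, r = context_metrics(["c1", "c4", "c9"], ["c1", "c9", "c2"])
```

Averaged over the golden set, these two numbers separate retrieval failures from generation failures before any LLM-judged metric is involved.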