FFT-RAG-Engineer - Retrieval Architecture and Evaluation

FFT-RAG-Engineer designs retrieval pipelines that actually answer the questions they are asked. It treats chunking, embedding, retrieval, reranking, and evaluation as one system where tuning any component in isolation is a mistake. pgvector is the default vector store; GraphRAG and hybrid retrieval are deployed when the evidence calls for them.

  • Chunking strategies: fixed, recursive, semantic, structure-aware (Markdown, code, PDFs with layout).
  • Embeddings: BGE-M3, E5, text-embedding-3, multilingual considerations, fine-tuning when justified.
  • Vector databases: pgvector primary (HNSW, IVFFlat), Qdrant, Weaviate for specialized needs.
  • Retrieval patterns: dense, sparse (BM25), hybrid with RRF, query rewriting, multi-query.
  • Reranking: BGE-reranker-v2-M3, cross-encoders, LLM-as-reranker when warranted.
  • GraphRAG: knowledge graph construction and graph-aware retrieval for multi-hop questions.
  • Evaluation: RAGAS, custom golden sets, faithfulness, answer relevance, context precision/recall.
  • Threshold calibration: similarity and fused-score cutoffs derived from evaluation data, never arbitrary round numbers.

  • Designing a new retrieval pipeline from scratch.
  • Diagnosing low answer quality: deciding whether the problem is chunking, retrieval, or generation.
  • Calibrating similarity thresholds against a real golden set.
  • Migrating from a basic vector search to hybrid retrieval or GraphRAG.
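
To make the "hybrid with RRF" retrieval pattern listed above concrete, here is a minimal sketch of Reciprocal Rank Fusion. It assumes each retriever returns an ordered list of document IDs. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are illustrative assumptions, not part of FFT-RAG-Engineer's documented interface.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (doc_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc_id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from a dense (vector) retriever and a sparse (BM25) one.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

RRF uses only ranks, so the dense and sparse scores never need to be put on a common scale. Documents found by both retrievers (`doc_a`, `doc_b` here) rise above documents found by only one.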
"Design the retrieval pipeline for our medical-knowledge corpus with pgvector and BGE-M3"
"Build a RAGAS evaluation suite for the prescription-lookup feature with a 200-question golden set"
"Calibrate similarity thresholds for procedure lookup so auto/suggestion/reject match our precision targets"
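
The auto/suggestion/reject calibration in the last prompt could be approached roughly as follows. This is a sketch under stated assumptions: the golden set is a list of `(score, is_correct)` pairs, the eight sample rows are toy data invented for illustration, the precision targets (1.0 for auto, 0.8 for suggestion) are placeholders, and the names `lowest_threshold_for_precision` and `route` are hypothetical.

```python
def lowest_threshold_for_precision(scored, target_precision):
    """Return the lowest score cutoff whose accepted set meets target_precision.

    `scored` is a list of (score, is_correct) pairs from a labeled golden set.
    Returns None if no cutoff reaches the target.
    """
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for count, (score, is_correct) in enumerate(ordered, start=1):
        correct += int(is_correct)
        # Evaluate only at the last item of a run of tied scores, so that
        # "score >= cutoff" accepts exactly the items counted so far.
        last_of_tie = count == len(ordered) or ordered[count][0] != score
        if last_of_tie and correct / count >= target_precision:
            best = score
    return best


def route(score, auto_cutoff, suggest_cutoff):
    """Map a similarity score to one of the three decision tiers."""
    if score >= auto_cutoff:
        return "auto"
    if score >= suggest_cutoff:
        return "suggestion"
    return "reject"


# Toy golden set (illustrative only): retrieval score and whether the
# top hit was judged correct by a human reviewer.
golden = [
    (0.912, True), (0.874, True), (0.841, True), (0.793, False),
    (0.768, True), (0.702, True), (0.617, False), (0.548, False),
]

auto_cutoff = lowest_threshold_for_precision(golden, target_precision=1.0)
suggest_cutoff = lowest_threshold_for_precision(golden, target_precision=0.8)
print(auto_cutoff, suggest_cutoff)
print(route(0.75, auto_cutoff, suggest_cutoff))
```

The cutoffs come straight from scores observed in the golden set, which is what keeps them from being guessed round numbers. A caller would need to handle the `None` case, which signals that no cutoff meets the target on the data available.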