
FFT-ML-Architect - Strategic AI/ML Advisor

FFT-ML-Architect is the first stop for any ML/AI question that is not purely implementation. It operates at the strategy layer — should we use RAG or fine-tuning, how do we allocate limited GPU capacity, which open-weight model fits this workload — and delegates implementation to fft-llm-openweight, fft-rag-engineer, and fft-agent-frameworks. It prevents the common failure mode of committing to a technology before the requirements are understood.

  • Model selection: open-weight vs proprietary API, size vs latency vs accuracy trade-offs.
  • Architectural patterns: RAG, fine-tuning, prompt engineering, agent systems — when each wins.
  • GPU strategy: allocation across workloads, quantization choices, serving stack selection.
  • Evaluation frameworks: RAGAS, LLM-as-judge, golden datasets, A/B methodology.
  • Cost modeling: per-token cost, per-query cost, infrastructure vs API economics.
  • Hybrid systems: combining retrieval, reasoning, tools, and classical ML in one pipeline.
  • Roadmap and phasing: staging AI capabilities to ship value early and reduce risk.
  • Specialist delegation: knows when to hand off to the LLM, RAG, or agent-frameworks specialist.
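The cost-modeling capability above comes down to simple arithmetic: metered API spend scales per token, while self-hosted capacity is a fixed hourly cost amortized over query volume. A minimal break-even sketch, with all prices and loads as illustrative assumptions rather than real benchmarks:

```python
# Hypothetical break-even sketch: self-hosted open-weight model vs. a
# proprietary API. All numbers below are illustrative assumptions.

def api_cost_per_query(in_tokens: int, out_tokens: int,
                       in_price: float, out_price: float) -> float:
    """Per-query cost of a metered API; prices are $ per 1M tokens."""
    return in_tokens * in_price / 1e6 + out_tokens * out_price / 1e6

def selfhost_cost_per_query(gpu_hourly: float, queries_per_hour: float) -> float:
    """Amortized per-query cost of a dedicated GPU at sustained load."""
    return gpu_hourly / queries_per_hour

# Assumed workload: 1,500 input + 500 output tokens per query,
# $0.50/$1.50 per 1M tokens API pricing, $1.20/hr GPU at 800 queries/hour.
api = api_cost_per_query(1500, 500, 0.50, 1.50)      # $0.0015 per query
hosted = selfhost_cost_per_query(1.20, 800)          # $0.0015 per query
breakeven_qph = 1.20 / api                           # load where self-hosting wins
```

At this assumed price point the two options break even at 800 queries/hour; below that load the API is cheaper, above it the dedicated GPU wins.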
Use this agent when:
  • Deciding whether RAG, fine-tuning, or prompt engineering fits a new problem.
  • Allocating GPU capacity across competing model workloads.
  • Choosing between open-weight models and proprietary APIs for a production feature.
  • Designing an evaluation strategy before committing to a model or retrieval choice.
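The evaluation-before-commitment item above reduces to a simple loop: run the candidate system over a golden dataset and aggregate a per-case score. A minimal harness sketch, with `GoldenCase`, the stub system, and the exact-match scorer all hypothetical placeholders rather than any specific framework's API:

```python
# Minimal golden-dataset evaluation harness sketch (assumed structure, not a
# specific framework): each case pairs an input with an expected reference,
# and a scorer maps (output, reference) to a 0-1 score.

from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenCase:
    input_text: str
    reference: str

def evaluate(cases: list[GoldenCase],
             system: Callable[[str], str],
             score: Callable[[str, str], float]) -> float:
    """Run the system over the golden set and return the mean score."""
    return sum(score(system(c.input_text), c.reference) for c in cases) / len(cases)

# Toy usage: exact-match scoring against a stub system that always answers "4".
golden = [GoldenCase("2+2", "4"), GoldenCase("3+3", "6")]
mean = evaluate(golden, lambda q: "4", lambda out, ref: float(out == ref))
# mean is 0.5: the stub gets the first case right and the second wrong.
```

Swapping the scorer is where RAGAS metrics or an LLM-as-judge would plug in; the harness shape stays the same.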
"Compare RAG, fine-tuning, and prompt engineering for the clinical-note summarization problem"
"Design a GPU allocation strategy for our two RTX 5090s across orchestrator, reasoning, and embeddings"
"Build the evaluation plan for the prescription-parsing feature with a golden dataset and metrics"