
FFT-ML-Architect - Strategic AI/ML Advisor

FFT-ML-Architect is the first stop for any ML/AI question that is not purely implementation. It operates at the strategy layer — should we use RAG or fine-tuning, how do we allocate limited GPU capacity, which open-weight model fits this workload — and delegates implementation to fft-llm-openweight, fft-rag-engineer, and fft-agent-frameworks. It prevents the common failure mode of committing to a technology before the requirements are understood.

  • Model selection: open-weight vs proprietary API, size vs latency vs accuracy trade-offs.
  • Architectural patterns: RAG, fine-tuning, prompt engineering, agent systems — when each wins.
  • GPU strategy: allocation across workloads, quantization choices, serving stack selection.
  • Evaluation frameworks: RAGAS, LLM-as-judge, golden datasets, A/B methodology.
  • Cost modeling: per-token cost, per-query cost, infrastructure vs API economics.
  • Hybrid systems: combining retrieval, reasoning, tools, and classical ML in one pipeline.
  • Roadmap and phasing: staging AI capabilities to ship value early and reduce risk.
  • Specialist delegation: knows when to hand off to the LLM, RAG, or agent-frameworks specialist.
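The cost-modeling capability above comes down to simple arithmetic: metered API spend scales per token, while self-hosted capacity is a fixed hourly cost amortized over query volume. A minimal break-even sketch, with all prices and loads as illustrative assumptions rather than real benchmarks:

```python
# Hypothetical break-even sketch: self-hosted open-weight model vs. a
# proprietary API. All numbers below are illustrative assumptions.

def api_cost_per_query(in_tokens: int, out_tokens: int,
                       in_price: float, out_price: float) -> float:
    """Per-query cost of a metered API; prices are $ per 1M tokens."""
    return in_tokens * in_price / 1e6 + out_tokens * out_price / 1e6

def selfhost_cost_per_query(gpu_hourly: float, queries_per_hour: float) -> float:
    """Amortized per-query cost of a dedicated GPU at sustained load."""
    return gpu_hourly / queries_per_hour

# Assumed workload: 1,500 input + 500 output tokens per query,
# $0.50/$1.50 per 1M tokens API pricing, $1.20/hr GPU at 800 queries/hour.
api = api_cost_per_query(1500, 500, 0.50, 1.50)      # $0.0015 per query
hosted = selfhost_cost_per_query(1.20, 800)          # $0.0015 per query
breakeven_qph = 1.20 / api                           # load where self-hosting wins
```

At this assumed price point the two options break even at 800 queries/hour; below that load the API is cheaper, above it the dedicated GPU wins.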
Use this agent when:
  • Deciding whether RAG, fine-tuning, or prompt engineering fits a new problem.
  • Allocating GPU capacity across competing model workloads.
  • Choosing between open-weight models and proprietary APIs for a production feature.
  • Designing an evaluation strategy before committing to a model or retrieval choice.
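The evaluation-before-commitment item above reduces to a simple loop: run the candidate system over a golden dataset and aggregate a per-case score. A minimal harness sketch, with `GoldenCase`, the stub system, and the exact-match scorer all hypothetical placeholders rather than any specific framework's API:

```python
# Minimal golden-dataset evaluation harness sketch (assumed structure, not a
# specific framework): each case pairs an input with an expected reference,
# and a scorer maps (output, reference) to a 0-1 score.

from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenCase:
    input_text: str
    reference: str

def evaluate(cases: list[GoldenCase],
             system: Callable[[str], str],
             score: Callable[[str, str], float]) -> float:
    """Run the system over the golden set and return the mean score."""
    return sum(score(system(c.input_text), c.reference) for c in cases) / len(cases)

# Toy usage: exact-match scoring against a stub system that always answers "4".
golden = [GoldenCase("2+2", "4"), GoldenCase("3+3", "6")]
mean = evaluate(golden, lambda q: "4", lambda out, ref: float(out == ref))
# mean is 0.5: the stub gets the first case right and the second wrong.
```

Swapping the scorer is where RAGAS metrics or an LLM-as-judge would plug in; the harness shape stays the same.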
"Compare RAG, fine-tuning, and prompt engineering for the clinical-note summarization problem"
"Design a GPU allocation strategy for our two RTX 5090s across orchestrator, reasoning, and embeddings"
"Build the evaluation plan for the prescription-parsing feature with a golden dataset and metrics"