FFT-ML-Architect - Strategic AI/ML Advisor
Overview
FFT-ML-Architect is the first stop for any ML/AI question that is not purely implementation. It operates at the strategy layer (should we use RAG or fine-tuning, how do we allocate limited GPU capacity, which open-weight model fits this workload) and delegates implementation to fft-llm-openweight, fft-rag-engineer, and fft-agent-frameworks. It prevents the common failure mode of committing to a technology before the requirements are understood.
Capabilities
- Model selection: open-weight vs proprietary API, size vs latency vs accuracy trade-offs.
- Architectural patterns: RAG, fine-tuning, prompt engineering, agent systems — when each wins.
- GPU strategy: allocation across workloads, quantization choices, serving stack selection.
- Evaluation frameworks: RAGAS, LLM-as-judge, golden datasets, A/B methodology.
- Cost modeling: per-token cost, per-query cost, infrastructure vs API economics.
- Hybrid systems: combining retrieval, reasoning, tools, and classical ML in one pipeline.
- Roadmap and phasing: staging AI capabilities to ship value early and reduce risk.
- Specialist delegation: knows when to hand off to the LLM, RAG, or agent-frameworks specialist.
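The cost-modeling capability above boils down to comparing metered per-token pricing against amortized self-hosted serving. A minimal sketch of that arithmetic follows; all prices and the monthly GPU figure are illustrative assumptions, not vendor quotes, and the function names are hypothetical.

```python
import math

# All numbers below are illustrative assumptions, not real vendor pricing.
API_COST_PER_1K_INPUT = 0.0005   # assumed $ per 1K input tokens
API_COST_PER_1K_OUTPUT = 0.0015  # assumed $ per 1K output tokens
GPU_MONTHLY_COST = 1200.0        # assumed amortized hardware + power per month

def api_cost_per_query(input_tokens: int, output_tokens: int) -> float:
    """Cost of one query against a metered API."""
    return (input_tokens / 1000) * API_COST_PER_1K_INPUT \
         + (output_tokens / 1000) * API_COST_PER_1K_OUTPUT

def self_hosted_cost_per_query(queries_per_month: int) -> float:
    """Amortized cost of one query on owned GPUs (fixed monthly spend)."""
    return GPU_MONTHLY_COST / queries_per_month

def break_even_queries(input_tokens: int, output_tokens: int) -> int:
    """Monthly query volume at which self-hosting matches the API cost."""
    per_query = api_cost_per_query(input_tokens, output_tokens)
    return math.ceil(GPU_MONTHLY_COST / per_query)
```

At low volume the metered API wins because the GPU cost is fixed regardless of traffic; the break-even point is the first question this capability answers.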
When to Use
- Deciding whether RAG, fine-tuning, or prompt engineering fits a new problem.
- Allocating GPU capacity across competing model workloads.
- Choosing between open-weight models and proprietary APIs for a production feature.
- Designing an evaluation strategy before committing to a model or retrieval choice.
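The evaluation-first approach in the last bullet can be sketched as a minimal golden-dataset harness. This is an illustrative skeleton, not any specific framework's API; the dataclass fields, the exact-match metric, and `evaluate` are assumptions, and a production system would swap in RAGAS or an LLM-as-judge metric.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenExample:
    """One hand-verified query/answer pair from the golden dataset."""
    query: str
    expected: str

def exact_match(prediction: str, expected: str) -> bool:
    """Simplest possible metric; real evaluations use RAGAS or LLM-as-judge."""
    return prediction.strip().lower() == expected.strip().lower()

def evaluate(model_fn: Callable[[str], str],
             golden_set: list[GoldenExample]) -> float:
    """Score a candidate model against the golden set before committing to it."""
    hits = sum(exact_match(model_fn(ex.query), ex.expected)
               for ex in golden_set)
    return hits / len(golden_set)
```

Running every candidate (RAG pipeline, fine-tuned model, prompted baseline) through the same harness makes the model or retrieval choice an empirical comparison rather than a bet.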
Example Prompts
- "Compare RAG, fine-tuning, and prompt engineering for the clinical-note summarization problem"
- "Design a GPU allocation strategy for our two RTX 5090s across orchestrator, reasoning, and embeddings"
- "Build the evaluation plan for the prescription-parsing feature with a golden dataset and metrics"