Case Study
A Support System Architecture I Couldn't Have Imagined a Year Ago
AI capabilities are changing so fundamentally, so fast, that the established patterns aren't the only option anymore. Build your own playbook.
A small company had years of support knowledge trapped in WhatsApp chats and email threads. Experts answered similar questions from scratch, over and over. All that hard-won knowledge was effectively lost: unsearchable, unusable.
So I built them an AI-driven support platform with Claude Code - a tool I'd been using extensively (almost addictively) since its launch. This project evolved alongside it, rebuilt multiple times as we both got better.
Over the past year, every update to Claude prompted a rethink of what the "best possible buildable solution" was. The biggest shift came when Anthropic released the Agent SDK, which changed everything from data structures to UI. The result landed somewhere I couldn't have even imagined a year ago.
After all that rebuilding, here are my five main learnings - patterns that ended up enabling more than just support.
1. Collaboration Over Automation
For this client, the human touch was the point
Full automation makes sense in a lot of contexts. But it's not the only model.
This client valued the human connection. The goal wasn't to replace experts - it was to make their jobs more pleasant while capturing knowledge that was otherwise disappearing into WhatsApp threads.
So instead of an answer-bot, I built a system where AI drafts and experts refine. Every response goes through someone who knows the domain. The human stays in control.
The flow: a customer question enters the queue → the right person picks it up → AI drafts a response using the knowledge base and product catalog → the expert edits, approves, and sends.
AI handles the grunt work - finding relevant history, drafting a starting point - so experts can focus on judgment and nuance, informed by the right, targeted supporting information.
Their jobs get more interesting. Customers still get that personal connection. And the learnings compound instead of vanishing.
Every response is expert-verified before it goes out. When experts make corrections, the system captures what changed and why, and surfaces it when future searches hit similar situations - drawing on the accumulated wisdom of every past interaction. Expert knowledge compounds. Repetitive questions get easier. Experts spend their time on genuinely hard problems.
Even if the end goal is full automation, this is a useful first step, with observability on where the system is most fragile. When you're ready to pull humans out of simple cases, the knowledge is already there.
2. Context Engineering
Not RAG, not brute force send-every-token - the middle path
The system draws on two data sources: a knowledge base of expert-verified Q&As that grows with every interaction, and a full product catalog with specs, price ranges, and compatibility notes.
There's a spectrum of approaches for making this searchable.
One extreme: full vector infrastructure - embeddings, retrieval, reranking. For a few hundred Q&As and products, it's overkill and not quite the right optimization. Semantic similarity can't combine "budget + experience level + use case" into a coherent filter. You end up with complex infrastructure that still can't express the queries you actually need.
The other extreme: dump everything into a giant context window. Modern context windows make this tempting, but bigger isn't better - it's just hoping. Performance degrades in surprising ways: irrelevant information competes with relevant, and the degradation isn't smooth or predictable.
The middle path is context engineering: curating the smallest set of high-signal tokens that maximize the likelihood of good outcomes. The carefully structured data of a database, combined with the reasoning of an LLM.
But first, the data needs structure.
You wouldn't hand a colleague a 200-page Word doc of copy-pasted emails and say "figure it out" - you'd organize it, highlight what matters, give them context. LLMs deserve the same setup.
Using LLMs doesn't mean "free-for-all, no structure needed." Quite the opposite: thoughtful structure lets you leverage LLMs more intelligently. I extracted the Q&A knowledge base from email chaos into structured records, and consolidated the product catalog from scattered brand materials into an organized database. Subagents made both scalable - what would have been tedious for humans or disastrous for scripts became consistent and effective.
Structuring the Chaos
The first task: turn 500+ email Q&As, plus product marketing and informational docs, into structured data. I used Claude Code with subagents - parallel Claude instances, each with its own focused instructions, that can divide and conquer large tasks - to parse the documents in chunks, extracting each Q&A into a consistent schema.
Extract Q&As from this chunk. For each:
- question: the customer's actual question
- answer: the expert's response
- budget_range: if mentioned (e.g., "$1500-2000")
- experience_level: if mentioned (beginner/enthusiast/pro)
- products_mentioned: array of product names
- use_case: what they're trying to shoot
Return as JSON array. Skip email headers and signatures.
{
"id": "qa_127",
"question": "Budget $2k, landscapes + portraits, what camera?",
"answer": "For that budget the Alpha 200 is great...",
"budget_range": "$1500-2500",
"experience_level": "enthusiast",
"products_mentioned": ["Alpha 200", "Alpha 100"],
"use_case": "landscape and portrait photography",
"summary": "Mid-budget enthusiast seeking versatile full-frame"
}
Each subagent processed 20-30 emails per chunk. The prompts instructed them to flag malformed data, ambiguity, or potential errors - making human review targeted rather than exhaustive. Each resolved question fed back into the subagents' instructions, improving the next round and reducing human intervention over time.
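For reference, the target record shape looked roughly like this - a minimal TypeScript sketch using zod for validation (the schema library is my assumption; the field names follow the extraction prompt above):

```typescript
import { z } from "zod";

// Target schema for each extracted Q&A record (sketch; zod is an assumed choice).
export const QARecord = z.object({
  id: z.string(),                                   // e.g. "qa_127"
  question: z.string(),                             // the customer's actual question
  answer: z.string(),                               // the expert's response
  budget_range: z.string().optional(),              // e.g. "$1500-2500", only if mentioned
  experience_level: z.enum(["beginner", "enthusiast", "pro"]).optional(),
  products_mentioned: z.array(z.string()).default([]),
  use_case: z.string().optional(),                  // what the customer is trying to shoot
  summary: z.string(),                              // one-line summary used later for LLM filtering
});
export type QARecord = z.infer<typeof QARecord>;

// Validate a subagent's JSON output before it enters the knowledge base;
// anything that fails validation gets flagged for human review.
export function parseSubagentOutput(raw: string): QARecord[] {
  return z.array(QARecord).parse(JSON.parse(raw));
}
```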
LLM Filtering
Once the data was structured, the next challenge: giving the model efficient access to it. My first pass (early 2025, pre-Agent SDK) was simple - feed an LLM all the Q&A summaries and carefully chosen structured fields (significantly smaller than the full text!), and have it return a list of the most "relevant" Q&As, with instructions on how "relevant" was defined.
Customer question: "{question}"
Below are summaries of previous Q&As. Select up to 15 that would
help answer this question. Consider:
- Similar customer profiles (budget, experience level)
- Related products or product categories
- Similar use cases or concerns
Return only the IDs of selected Q&As as JSON array.
Q&As:
{qa_summaries}
Summaries + structured fields give you the coverage of brute force with the precision of a database query.
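A minimal sketch of that filtering call, written against the Anthropic TypeScript SDK (model name and exact prompt wiring are illustrative, not the production code):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// One LLM call: summaries in, a JSON array of relevant Q&A IDs out.
export async function filterRelevantQAs(
  question: string,
  qaSummaries: string, // pre-formatted "id: summary" lines for every Q&A
): Promise<string[]> {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // assumed model; pick whatever fits cost and latency
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content:
          `Customer question: "${question}"\n\n` +
          `Below are summaries of previous Q&As. Select up to 15 that would help answer ` +
          `this question. Consider similar customer profiles (budget, experience level), ` +
          `related products, and similar use cases.\n` +
          `Return only the IDs of selected Q&As as a JSON array.\n\nQ&As:\n${qaSummaries}`,
      },
    ],
  });

  const block = msg.content[0];
  return block.type === "text" ? JSON.parse(block.text) : [];
}
```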
3. Workflow → Agent
Iterative, adaptive search instead of a fixed pipeline
V1: The Workflow
Three sequential LLM calls, deterministic path.
That first version was a workflow: a fixed sequence of LLM calls that ran the same way every time. Filter relevant Q&As, generate a response, potentially regenerate it based on expert feedback, and extract metadata and organize the information once the expert marked the answer as ready.
1. Filter: the LLM reads all summaries and picks the ones that match the customer profile and question.
2. Generate: filtered Q&As + product catalog + system prompt. Model choice scales with complexity.
3. Extract: metadata, expert edits, correction patterns. Every interaction grows the knowledge base.
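In code, the V1 shape was roughly this - the same calls in the same order, no branching. A sketch with hypothetical helper names, reusing filterRelevantQAs from the earlier snippet:

```typescript
// Hypothetical helpers, each wrapping one LLM call in the style of filterRelevantQAs above.
const generateDraft = async (question: string, qaIds: string[], catalog: string): Promise<string> =>
  "...draft built from the filtered Q&As, catalog excerpt, and system prompt...";
const extractAndStoreMetadata = async (question: string, approvedAnswer: string): Promise<void> => {
  /* parse metadata, expert edits, and correction patterns into the knowledge base */
};

// V1 workflow: a fixed pipeline - the same steps in the same order, every time.
export async function draftResponse(question: string, allSummaries: string, catalog: string) {
  const ids = await filterRelevantQAs(question, allSummaries); // 1. filter
  return generateDraft(question, ids, catalog);                // 2. generate (expert reviews next)
}

export async function onExpertApproval(question: string, finalAnswer: string) {
  await extractAndStoreMetadata(question, finalAnswer);        // 3. extract and grow the knowledge base
}
```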
Workflows are a classic pattern in software, and this one worked for straightforward questions. But it struggled with complex queries, multi-part requests, and edge cases. The workflow couldn't zoom in, couldn't refocus, couldn't try a different approach if the first one missed.
V2: The Agent
When Anthropic released the Agent SDK, I rewrote the whole system. Same data sources, same underlying operations - but now the LLM reasons about what it needs instead of following a fixed path.
The difference was immediate: fewer failed searches, better handling of ambiguous questions, and the ability to cross-reference between Q&As dynamically.
The agent gets four tools:
search_qas → find relevant Q&As by semantic query
get_qa_by_id → fetch full details for a specific Q&A
search_products → look up specs from the product catalog
submit_answer → return structured response with sources

A sample run for "Best camera for a beginner, low light, around $2k?":
→ Found 12 Q&As about beginner + low light
→ Retrieved full answer about $2k budget enthusiast
→ Got specs: 24MP, full-frame, ISO 51200
→ Structured response with sources, alternatives
The agent searched Q&As, found a relevant one, then looked up the product specs to verify the recommendation was still current. If it hadn't found enough matches, it could have tried a keyword search, or broadened the query, or checked the product catalog directly.
The agent also unlocked the product catalog. The workflow had dumped a static, summarized catalog into context - including full specs for every product was too noisy. The agent searches it on demand, pulling detailed specs only when relevant.
This is the power of agents: the same underlying data, but the system adapts to what each question actually needs. Instead of telling the model to read everything, it can intelligently choose what to read.
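For the curious, here is roughly what that loop looks like when written directly against the Anthropic Messages API rather than the Agent SDK - a simplified sketch, with abbreviated schemas and only two of the four tools shown:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Tool definitions: the API between the agent's reasoning and the data.
const tools: Anthropic.Tool[] = [
  {
    name: "search_qas",
    description: "Find relevant expert-verified Q&As by semantic query.",
    input_schema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "get_qa_by_id",
    description: "Fetch the full Q&A record, including the complete expert answer.",
    input_schema: {
      type: "object",
      properties: { id: { type: "string" } },
      required: ["id"],
    },
  },
  // ...search_products and submit_answer omitted for brevity
];

// Hypothetical dispatcher into the knowledge base / catalog.
async function runTool(name: string, input: unknown): Promise<string> {
  /* look up Q&As, products, etc. and return JSON as a string */
  return "[]";
}

export async function answerQuestion(question: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: question }];

  // The loop: the model decides which tool to call next, until it stops asking for tools.
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5", // assumed model
      max_tokens: 2048,
      tools,
      messages,
    });

    if (response.stop_reason !== "tool_use") return response;

    messages.push({ role: "assistant", content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: await runTool(block.name, block.input),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
}
```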
Multi-Turn Context
The agent maintains state across turns
Experts can probe the agent's reasoning, request alternatives, or add constraints without starting over. The agent remembers what it found and builds from there.
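Concretely, multi-turn just means the expert's feedback becomes another user turn on the same message history - a sketch that reuses the client and tools from the loop above:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Expert feedback becomes another user turn on the same message history, so the
// agent keeps its earlier searches, tool results, and reasoning in context.
// (client and tools here are the ones defined in the previous sketch.)
export async function continueWithFeedback(
  messages: Anthropic.MessageParam[], // full history from the first pass
  expertFeedback: string,             // e.g. "Customer already owns two lenses - does that change the pick?"
) {
  messages.push({ role: "user", content: expertFeedback });
  return client.messages.create({
    model: "claude-sonnet-4-5", // assumed model
    max_tokens: 2048,
    tools,                      // same tools as before; the agent may search again
    messages,
  });
}
```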
The challenge is surfacing this context and making expert feedback effortless, without making them wade through chat transcripts or type repetitively.
4. Agent + UI
More than chat: structured output becomes interactive interface
The answer isn't "give experts a chatbot." Rather, take everything the agent figured out - the reasoning, the sources it used, the alternatives it considered - and turn it into an interface designed for speed.
Glance and click, not read and type.
The expert sees a draft response they can edit directly. Sources are there if they want to verify. Alternatives are one click away. Feedback goes back to the agent in natural language.
The reasoning that would have been buried in a chat transcript becomes interactive UI elements.
The key is structured output. Instead of returning plain text, the agent returns data the UI can render: confidence levels, source references, alternative products with explanations. Each field becomes a component. The agent's reasoning becomes the interface.
{
"response": "For your budget the Alpha 200 is great...",
"confidence": "high",
"confidence_explanation": "Multiple similar Q&As with consistent advice",
"sources": [
{ "id": "qa_127", "relevance": "Same budget, same low-light priority",
"caveat": "Customer was enthusiast-level, not beginner" },
{ "id": "qa_203", "relevance": "Direct comparison in same price tier",
"caveat": "Focused on specs, not beginner-friendliness" }
],
"alternatives": [
{ "product": "Alpha 100", "considered": "Under budget, beginner-friendly",
"why_not": "APS-C sensor struggles in low light" },
{ "product": "Alpha 300", "considered": "Best low-light in lineup",
"why_not": "Over budget at $2,400" }
]
}
In the rendered view, the expert sees the customer question ("Best camera for a beginner, low light, around $2k?") with an expandable "reasoning & sources" panel: the customer is a beginner with a $2k budget prioritizing low light; the agent found 2 Q&As with similar profiles recommending the Alpha 200 for its full-frame sensor and high ISO performance, with no conflicting recommendations in the knowledge base.
Each source is summarized with its relevance and a caveat:
- "Enthusiast with ~$2k looking for low-light performance at concerts" - same budget range, same priority on low light, ended up with the Alpha 200. Caveat: that customer was enthusiast-level, not a beginner, so the learning curve may need more emphasis.
- "Comparing full-frame options under $2500" - a direct comparison of the Alpha 200 vs alternatives in the same price tier. Caveat: the comparison focused on specs, not beginner-friendliness; supplement with usability notes.
Alternatives appear the same way:
- Alpha 100 - considered: under budget at $699, beginner-friendly. Not recommended: APS-C sensor struggles in low light.
- Alpha 300 - considered: best low-light in the lineup, pro features. Not recommended: over budget at $2,400.
- A third model - considered: competitive specs, in budget. Not recommended: smaller ecosystem, fewer tutorials for beginners.
Three things make the interface work:
- AI drafts, expert refines. The response area is fully editable before approval.
- One-click product swaps. Hover to see why each alternative was considered and why it wasn't chosen.
- The learning loop. Every approved response joins the knowledge base.
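As a rough illustration of "each field becomes a component," here is a minimal React sketch (component and prop names are hypothetical) that renders the structured output above:

```tsx
import { useState } from "react";

// Shape of the agent's structured output (mirrors the JSON above).
type AgentAnswer = {
  response: string;
  confidence: "high" | "medium" | "low";
  confidence_explanation: string;
  sources: { id: string; relevance: string; caveat: string }[];
  alternatives: { product: string; considered: string; why_not: string }[];
};

// Minimal sketch: the structured output rendered as interface, not transcript.
export function DraftReview({
  answer,
  onApprove,
}: {
  answer: AgentAnswer;
  onApprove: (finalText: string) => void;
}) {
  const [draft, setDraft] = useState(answer.response); // fully editable before approval

  return (
    <div>
      <textarea value={draft} onChange={(e) => setDraft(e.target.value)} />
      <p title={answer.confidence_explanation}>Confidence: {answer.confidence}</p>

      {/* Sources: glance and verify, caveats surfaced inline */}
      <ul>
        {answer.sources.map((s) => (
          <li key={s.id}>
            {s.relevance} <em>({s.caveat})</em>
          </li>
        ))}
      </ul>

      {/* Alternatives as one-click buttons; hover shows why considered / why not chosen.
          A real system would rewrite the draft around the swapped product. */}
      {answer.alternatives.map((a) => (
        <button
          key={a.product}
          title={`${a.considered} / not chosen: ${a.why_not}`}
          onClick={() => setDraft(`${draft}\n\nAlternative to mention: ${a.product}`)}
        >
          {a.product}
        </button>
      ))}

      <button onClick={() => onApprove(draft)}>Approve and send</button>
    </div>
  );
}
```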
You can tailor any part of this interface by asking the agent to restructure the data. For the Q&As it found most relevant, for example, it generates custom summaries of why each one applies to the current customer question, including caveats about how the situations differ and how that should shape the expert's reasoning.
Instead of typing, experts get the top alternatives as one-click buttons; hovering over one shows why the agent considered it and why it wasn't chosen. Rather than an enormous transcript, targeted information and reasoning is exposed in actionable ways.
Once you have structured output and well-designed tools, those same building blocks work for different purposes. The support agent wasn't the end - it became the foundation.
5. The Power of Tools
The agent is only as good as its tools
Tools aren't utility functions you bolt on at the end. They're the API between the agent's reasoning and your data. A poorly designed tool leads to hallucination - the agent guesses because it can't get what it needs. A well-designed tool guides the agent toward grounded, verifiable answers.
Every tool in this system was shaped by real usage. What worked, what broke, what edge cases emerged. Most have been rewritten multiple times.
Search as Zoom Controls
The tools aren't just utilities - they're designed as zoom controls. Broad sweeps to find candidates, targeted lookups to verify, the judgment to know which to use when.
- Semantic search: an LLM reads all summaries and returns relevant matches. It understands meaning, not just keywords.
- Keyword search: direct text search when you need specific terms. Works well because all Q&As are LLM-processed - no typos, consistent formatting.
- Full lookup: after a search finds a promising match, fetch the full conversation with all its context.
submit_answer: Forcing Structure
The output tool is just as important as the search tools. Instead of letting the agent return free-form text, submit_answer requires a specific structure: the customer-facing response, the internal reasoning, confidence level, sources used, and alternatives considered.
This is what enables Section 4's UI. Without structured output, you can't build interactive elements.
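A sketch of what forcing that structure looks like as a tool definition - abbreviated, with the field list mirroring the JSON in Section 4 rather than the real system's exact schema:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// submit_answer: the agent can only "finish" by filling in this structure.
const submitAnswer: Anthropic.Tool = {
  name: "submit_answer",
  description: "Submit the final answer. Call exactly once, after researching.",
  input_schema: {
    type: "object",
    properties: {
      response: { type: "string", description: "Customer-facing draft for the expert to edit" },
      reasoning: { type: "string", description: "Internal reasoning, not shown to the customer" },
      confidence: { type: "string", enum: ["high", "medium", "low"] },
      confidence_explanation: { type: "string" },
      sources: {
        type: "array",
        items: {
          type: "object",
          properties: {
            id: { type: "string" },
            relevance: { type: "string" },
            caveat: { type: "string" },
          },
          required: ["id", "relevance"],
        },
      },
      alternatives: {
        type: "array",
        items: {
          type: "object",
          properties: {
            product: { type: "string" },
            considered: { type: "string" },
            why_not: { type: "string" },
          },
        },
      },
    },
    required: ["response", "reasoning", "confidence", "sources"],
  },
};
```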
Tools Recombine
Once you have good tools, you can recombine them.
The same Q&A search, product lookup, and pattern analysis tools that power the support agent also power a second agent - this one focused on strategic brand intelligence rather than customer support.
And here's where the compounding pays off. The Q&A knowledge base isn't a static dump of historical emails anymore - it's a living resource that grows with every customer interaction.
Every question answered, every expert refinement, every edge case resolved becomes searchable institutional knowledge. A research agent tapping into that knowledge base isn't just searching old data; it's drawing on the accumulated wisdom of months or years of customer conversations.
The second agent's toolkit:
- Q&A search and pattern analysis
- Product catalog and specs
- Brand knowledge base
- Web search for competitive intel
- Google Docs read/write
- Spreadsheet analysis and editing
Two agents, same core tools, completely different purposes. Build tools well and they become infrastructure, not one-off solutions.
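In code, that recombination can be as simple as two agent configurations sharing one tool registry - a hypothetical sketch:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Tool definitions as sketched earlier, plus the research agent's extras
// (webSearch, googleDocs, spreadsheets are hypothetical stand-ins here).
declare const searchQAs: Anthropic.Tool;
declare const getQAById: Anthropic.Tool;
declare const searchProducts: Anthropic.Tool;
declare const submitAnswer: Anthropic.Tool;
declare const webSearch: Anthropic.Tool;
declare const googleDocs: Anthropic.Tool;
declare const spreadsheets: Anthropic.Tool;

const coreTools = [searchQAs, getQAById, searchProducts]; // shared infrastructure

// Same core tools, different system prompt and extras = a different agent.
const supportAgent = {
  system: "Draft customer responses grounded in verified Q&As and current product specs.",
  tools: [...coreTools, submitAnswer],
};

const brandIntelAgent = {
  system: "Analyze question patterns, gaps, and competitive positioning for the brand team.",
  tools: [...coreTools, webSearch, googleDocs, spreadsheets],
};
```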
Camera gear used as example domain. The real system is in a different industry.
Built with Next.js · Patterns applicable to any LLM-powered system