Case Study
A Support System Architecture I Couldn't Have Imagined a Year Ago
AI capabilities are changing so fundamentally, so fast, that the established patterns aren't the only option anymore. Build your own playbook.
A small company had years of support knowledge trapped in WhatsApp chats and email threads. Experts answered similar questions from scratch, over and over. All that hard-won knowledge was effectively lost: unsearchable, unusable.
So I built them an AI-driven support platform with Claude Code - a tool I'd been using extensively (almost addictively) since its launch. This project evolved alongside it, rebuilt multiple times as we both got better.
Over the past year, every update to Claude prompted a rethink of what the "best possible buildable solution" was. The biggest shift came when Anthropic released the Agent SDK, which changed everything from data structures to UI. The result landed somewhere I couldn't have even imagined a year ago.
After all that rebuilding, here are my five main learnings - patterns that ended up enabling more than just support.
1. Collaboration Over Automation
For this client, the human touch was the point
Full automation makes sense in a lot of contexts. But it's not the only model.
This client valued the human connection. The goal wasn't to replace experts - it was to make their jobs more pleasant while capturing knowledge that was otherwise disappearing into WhatsApp threads.
So instead of an answer-bot, I built a system where AI drafts and experts refine. Every response goes through someone who knows the domain. The human stays in control.
The flow: a customer question enters the queue → the right person picks it up → AI drafts a response using the knowledge base and product catalog → the expert edits, approves, and sends.
AI handles the grunt work - finding relevant history, drafting a starting point - so experts can focus on judgment and nuance, informed by the right, targeted supporting information.
Their jobs get more interesting. Customers still get that personal connection. And the learnings compound instead of vanishing.
Every response is expert-verified before it goes out. When experts make corrections, the system captures what changed and why, and surfaces it when future searches hit similar situations - drawing on the accumulated wisdom of every past interaction. Expert knowledge compounds. Repetitive questions get easier. Experts spend their time on genuinely hard problems.
Even if the end goal is full automation, this is a useful first step, with observability on where the system is most fragile. When you're ready to pull humans out of simple cases, the knowledge is already there.
2. Context Engineering
Not RAG, not brute force send-every-token - the middle path
The system draws on two data sources: a knowledge base of expert-verified Q&As that grows with every interaction, and a full product catalog with specs, price ranges, and compatibility notes.
There's a spectrum of approaches for making this searchable.
One extreme: full vector infrastructure - embeddings, retrieval, reranking. For a few hundred Q&As and products, it's overkill and not quite the right optimization. Semantic similarity can't combine "budget + experience level + use case" into a coherent filter. You end up with complex infrastructure that still can't express the queries you actually need.
The other extreme: dump everything into a giant context window. Modern context windows make this tempting, but bigger isn't better - it's just hoping. Performance degrades in surprising ways: irrelevant information competes with relevant, and the degradation isn't smooth or predictable.
The middle path is context engineering: curating the smallest set of high-signal tokens that maximize the likelihood of good outcomes. The carefully structured data of a database, combined with the reasoning of an LLM.
But first, the data needs structure.
You wouldn't hand a colleague a 200-page Word doc of copy-pasted emails and say "figure it out" - you'd organize it, highlight what matters, give them context. LLMs deserve the same setup.
Using LLMs doesn't mean "free-for-all, no structure needed." Quite the opposite: thoughtful structure lets you leverage LLMs more intelligently. I extracted the Q&A knowledge base from email chaos into structured records, and consolidated the product catalog from scattered brand materials into an organized database. Subagents made both scalable - what would have been tedious for humans or disastrous for scripts became consistent and effective.
Structuring the Chaos
The first task: turn 500+ email Q&As, plus product marketing and informational docs, into structured data. I used Claude Code with subagents - parallel Claude instances, each with its own focused instructions, that can divide and conquer large tasks - to parse the documents in chunks, extracting each Q&A into a consistent schema.
Extract Q&As from this chunk. For each:
- question: the customer's actual question
- answer: the expert's response
- budget_range: if mentioned (e.g., "$1500-2000")
- experience_level: if mentioned (beginner/enthusiast/pro)
- products_mentioned: array of product names
- use_case: what they're trying to shoot
Return as JSON array. Skip email headers and signatures.
{
"id": "qa_127",
"question": "Budget $2k, landscapes + portraits, what camera?",
"answer": "For that budget the Alpha 200 is great...",
"budget_range": "$1500-2500",
"experience_level": "enthusiast",
"products_mentioned": ["Alpha 200", "Alpha 100"],
"use_case": "landscape and portrait photography",
"summary": "Mid-budget enthusiast seeking versatile full-frame"
}
Each subagent processed 20-30 emails per chunk. The prompts instructed them to flag malformed data, ambiguity, or potential errors - making human review targeted rather than exhaustive. Each resolved question fed back into the subagents' instructions, improving the next round and reducing human intervention over time.
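For reference, the target record shape looked roughly like this - a minimal TypeScript sketch using zod for validation (the schema library is my assumption; the field names follow the extraction prompt above):

```typescript
import { z } from "zod";

// Target schema for each extracted Q&A record (sketch; zod is an assumed choice).
export const QARecord = z.object({
  id: z.string(),                                   // e.g. "qa_127"
  question: z.string(),                             // the customer's actual question
  answer: z.string(),                               // the expert's response
  budget_range: z.string().optional(),              // e.g. "$1500-2500", only if mentioned
  experience_level: z.enum(["beginner", "enthusiast", "pro"]).optional(),
  products_mentioned: z.array(z.string()).default([]),
  use_case: z.string().optional(),                  // what the customer is trying to shoot
  summary: z.string(),                              // one-line summary used later for LLM filtering
});
export type QARecord = z.infer<typeof QARecord>;

// Validate a subagent's JSON output before it enters the knowledge base;
// anything that fails validation gets flagged for human review.
export function parseSubagentOutput(raw: string): QARecord[] {
  return z.array(QARecord).parse(JSON.parse(raw));
}
```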
LLM Filtering
Once the data was structured, the next challenge: giving the model efficient access to it. My first pass (early 2025, pre-Agent SDK) was simple - feed an LLM all the Q&A summaries and carefully chosen structured fields (significantly smaller than the full text!), and have it return a list of the most "relevant" Q&As, with instructions on how "relevant" was defined.
Customer question: "{question}"
Below are summaries of previous Q&As. Select up to 15 that would
help answer this question. Consider:
- Similar customer profiles (budget, experience level)
- Related products or product categories
- Similar use cases or concerns
Return only the IDs of selected Q&As as JSON array.
Q&As:
{qa_summaries}
Summaries + structured fields give you the coverage of brute force with the precision of a database query.
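A minimal sketch of that filtering call, written against the Anthropic TypeScript SDK (model name and exact prompt wiring are illustrative, not the production code):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// One LLM call: summaries in, a JSON array of relevant Q&A IDs out.
export async function filterRelevantQAs(
  question: string,
  qaSummaries: string, // pre-formatted "id: summary" lines for every Q&A
): Promise<string[]> {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // assumed model; pick whatever fits cost and latency
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content:
          `Customer question: "${question}"\n\n` +
          `Below are summaries of previous Q&As. Select up to 15 that would help answer ` +
          `this question. Consider similar customer profiles (budget, experience level), ` +
          `related products, and similar use cases.\n` +
          `Return only the IDs of selected Q&As as a JSON array.\n\nQ&As:\n${qaSummaries}`,
      },
    ],
  });

  const block = msg.content[0];
  return block.type === "text" ? JSON.parse(block.text) : [];
}
```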
3. Workflow → Agent
Iterative, adaptive search instead of a fixed pipeline
V1: The Workflow
Three sequential LLM calls, deterministic path.
That first version was a workflow: a fixed sequence of LLM calls that ran the same way every time. Filter relevant Q&As, generate a response, potentially regenerate it based on expert feedback, and extract metadata and organize the information once the expert marked the answer as ready.
1. Filter: the LLM reads all summaries and picks the ones that match the customer profile and question.
2. Generate: filtered Q&As + product catalog + system prompt. Model choice scales with complexity.
3. Extract: metadata, expert edits, correction patterns. Every interaction grows the knowledge base.
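In code, the V1 shape was roughly this - the same calls in the same order, no branching. A sketch with hypothetical helper names, reusing filterRelevantQAs from the earlier snippet:

```typescript
// Hypothetical helpers, each wrapping one LLM call in the style of filterRelevantQAs above.
const generateDraft = async (question: string, qaIds: string[], catalog: string): Promise<string> =>
  "...draft built from the filtered Q&As, catalog excerpt, and system prompt...";
const extractAndStoreMetadata = async (question: string, approvedAnswer: string): Promise<void> => {
  /* parse metadata, expert edits, and correction patterns into the knowledge base */
};

// V1 workflow: a fixed pipeline - the same steps in the same order, every time.
export async function draftResponse(question: string, allSummaries: string, catalog: string) {
  const ids = await filterRelevantQAs(question, allSummaries); // 1. filter
  return generateDraft(question, ids, catalog);                // 2. generate (expert reviews next)
}

export async function onExpertApproval(question: string, finalAnswer: string) {
  await extractAndStoreMetadata(question, finalAnswer);        // 3. extract and grow the knowledge base
}
```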
Workflows are a classic pattern in software, and this one worked for straightforward questions. But it struggled with complex queries, multi-part requests, and edge cases. The workflow couldn't zoom in, couldn't refocus, couldn't try a different approach if the first one missed.
V2: The Agent
When Anthropic released the Agent SDK, I rewrote the whole system. Same data sources, same underlying operations - but now the LLM reasons about what it needs instead of following a fixed path.
The difference was immediate: fewer failed searches, better handling of ambiguous questions, and the ability to cross-reference between Q&As dynamically.
The agent gets four tools:
search_qas → find relevant Q&As by semantic query
get_qa_by_id → fetch full details for a specific Q&A
search_products → look up specs from the product catalog
submit_answer → return structured response with sources

A sample run for "Best camera for a beginner, low light, around $2k?":
→ Found 12 Q&As about beginner + low light
→ Retrieved full answer about $2k budget enthusiast
→ Got specs: 24MP, full-frame, ISO 51200
→ Structured response with sources, alternatives
The agent searched Q&As, found a relevant one, then looked up the product specs to verify the recommendation was still current. If it hadn't found enough matches, it could have tried a keyword search, or broadened the query, or checked the product catalog directly.
The agent also unlocked the product catalog. The workflow had dumped a static, summarized catalog into context - including full specs for every product was too noisy. The agent searches it on demand, pulling detailed specs only when relevant.
This is the power of agents: the same underlying data, but the system adapts to what each question actually needs. Instead of telling the model to read everything, it can intelligently choose what to read.
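For the curious, here is roughly what that loop looks like when written directly against the Anthropic Messages API rather than the Agent SDK - a simplified sketch, with abbreviated schemas and only two of the four tools shown:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Tool definitions: the API between the agent's reasoning and the data.
const tools: Anthropic.Tool[] = [
  {
    name: "search_qas",
    description: "Find relevant expert-verified Q&As by semantic query.",
    input_schema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "get_qa_by_id",
    description: "Fetch the full Q&A record, including the complete expert answer.",
    input_schema: {
      type: "object",
      properties: { id: { type: "string" } },
      required: ["id"],
    },
  },
  // ...search_products and submit_answer omitted for brevity
];

// Hypothetical dispatcher into the knowledge base / catalog.
async function runTool(name: string, input: unknown): Promise<string> {
  /* look up Q&As, products, etc. and return JSON as a string */
  return "[]";
}

export async function answerQuestion(question: string) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: question }];

  // The loop: the model decides which tool to call next, until it stops asking for tools.
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5", // assumed model
      max_tokens: 2048,
      tools,
      messages,
    });

    if (response.stop_reason !== "tool_use") return response;

    messages.push({ role: "assistant", content: response.content });
    const results: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: await runTool(block.name, block.input),
        });
      }
    }
    messages.push({ role: "user", content: results });
  }
}
```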
Multi-Turn Context
The agent maintains state across turns
Experts can probe the agent's reasoning, request alternatives, or add constraints without starting over. The agent remembers what it found and builds from there.
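Concretely, multi-turn just means the expert's feedback becomes another user turn on the same message history - a sketch that reuses the client and tools from the loop above:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Expert feedback becomes another user turn on the same message history, so the
// agent keeps its earlier searches, tool results, and reasoning in context.
// (client and tools here are the ones defined in the previous sketch.)
export async function continueWithFeedback(
  messages: Anthropic.MessageParam[], // full history from the first pass
  expertFeedback: string,             // e.g. "Customer already owns two lenses - does that change the pick?"
) {
  messages.push({ role: "user", content: expertFeedback });
  return client.messages.create({
    model: "claude-sonnet-4-5", // assumed model
    max_tokens: 2048,
    tools,                      // same tools as before; the agent may search again
    messages,
  });
}
```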
The challenge is surfacing this context and making expert feedback effortless, without making them wade through chat transcripts or type repetitively.
4. Agent + UI
More than chat: structured output becomes interactive interface
The answer isn't "give experts a chatbot." Rather, take everything the agent figured out - the reasoning, the sources it used, the alternatives it considered - and turn it into an interface designed for speed.
Glance and click, not read and type.
The expert sees a draft response they can edit directly. Sources are there if they want to verify. Alternatives are one click away. Feedback goes back to the agent in natural language.
The reasoning that would have been buried in a chat transcript becomes interactive UI elements.
The key is structured output. Instead of returning plain text, the agent returns data the UI can render: confidence levels, source references, alternative products with explanations. Each field becomes a component. The agent's reasoning becomes the interface.
{
"response": "For your budget the Alpha 200 is great...",
"confidence": "high",
"confidence_explanation": "Multiple similar Q&As with consistent advice",
"sources": [
{ "id": "qa_127", "relevance": "Same budget, same low-light priority",
"caveat": "Customer was enthusiast-level, not beginner" },
{ "id": "qa_203", "relevance": "Direct comparison in same price tier",
"caveat": "Focused on specs, not beginner-friendliness" }
],
"alternatives": [
{ "product": "Alpha 100", "considered": "Under budget, beginner-friendly",
"why_not": "APS-C sensor struggles in low light" },
{ "product": "Alpha 300", "considered": "Best low-light in lineup",
"why_not": "Over budget at $2,400" }
]
}
In the rendered view, the expert sees the customer question ("Best camera for a beginner, low light, around $2k?") with an expandable "reasoning & sources" panel: the customer is a beginner with a $2k budget prioritizing low light; the agent found 2 Q&As with similar profiles recommending the Alpha 200 for its full-frame sensor and high ISO performance, with no conflicting recommendations in the knowledge base.
Each source is summarized with its relevance and a caveat:
- "Enthusiast with ~$2k looking for low-light performance at concerts" - same budget range, same priority on low light, ended up with the Alpha 200. Caveat: that customer was enthusiast-level, not a beginner, so the learning curve may need more emphasis.
- "Comparing full-frame options under $2500" - a direct comparison of the Alpha 200 vs alternatives in the same price tier. Caveat: the comparison focused on specs, not beginner-friendliness; supplement with usability notes.
Alternatives appear the same way:
- Alpha 100 - considered: under budget at $699, beginner-friendly. Not recommended: APS-C sensor struggles in low light.
- Alpha 300 - considered: best low-light in the lineup, pro features. Not recommended: over budget at $2,400.
- A third model - considered: competitive specs, in budget. Not recommended: smaller ecosystem, fewer tutorials for beginners.
Three things make the interface work:
- AI drafts, expert refines. The response area is fully editable before approval.
- One-click product swaps. Hover to see why each alternative was considered and why it wasn't chosen.
- The learning loop. Every approved response joins the knowledge base.
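As a rough illustration of "each field becomes a component," here is a minimal React sketch (component and prop names are hypothetical) that renders the structured output above:

```tsx
import { useState } from "react";

// Shape of the agent's structured output (mirrors the JSON above).
type AgentAnswer = {
  response: string;
  confidence: "high" | "medium" | "low";
  confidence_explanation: string;
  sources: { id: string; relevance: string; caveat: string }[];
  alternatives: { product: string; considered: string; why_not: string }[];
};

// Minimal sketch: the structured output rendered as interface, not transcript.
export function DraftReview({
  answer,
  onApprove,
}: {
  answer: AgentAnswer;
  onApprove: (finalText: string) => void;
}) {
  const [draft, setDraft] = useState(answer.response); // fully editable before approval

  return (
    <div>
      <textarea value={draft} onChange={(e) => setDraft(e.target.value)} />
      <p title={answer.confidence_explanation}>Confidence: {answer.confidence}</p>

      {/* Sources: glance and verify, caveats surfaced inline */}
      <ul>
        {answer.sources.map((s) => (
          <li key={s.id}>
            {s.relevance} <em>({s.caveat})</em>
          </li>
        ))}
      </ul>

      {/* Alternatives as one-click buttons; hover shows why considered / why not chosen.
          A real system would rewrite the draft around the swapped product. */}
      {answer.alternatives.map((a) => (
        <button
          key={a.product}
          title={`${a.considered} / not chosen: ${a.why_not}`}
          onClick={() => setDraft(`${draft}\n\nAlternative to mention: ${a.product}`)}
        >
          {a.product}
        </button>
      ))}

      <button onClick={() => onApprove(draft)}>Approve and send</button>
    </div>
  );
}
```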
You can tailor any part of this interface by asking the agent to restructure the data. For the Q&As it found most relevant, for example, it generates custom summaries of why each one applies to the current customer question, including caveats about how the situations differ and how that should shape the expert's reasoning.
Instead of typing, experts get the top alternatives as one-click buttons; hovering over one shows why the agent considered it and why it wasn't chosen. Rather than an enormous transcript, targeted information and reasoning is exposed in actionable ways.
Once you have structured output and well-designed tools, those same building blocks work for different purposes. The support agent wasn't the end - it became the foundation.
5. The Power of Tools
The agent is only as good as its tools
Tools aren't utility functions you bolt on at the end. They're the API between the agent's reasoning and your data. A poorly designed tool leads to hallucination - the agent guesses because it can't get what it needs. A well-designed tool guides the agent toward grounded, verifiable answers.
Every tool in this system was shaped by real usage. What worked, what broke, what edge cases emerged. Most have been rewritten multiple times.
Search as Zoom Controls
The tools aren't just utilities - they're designed as zoom controls. Broad sweeps to find candidates, targeted lookups to verify, the judgment to know which to use when.
- Semantic search: an LLM reads all summaries and returns relevant matches. It understands meaning, not just keywords.
- Keyword search: direct text search when you need specific terms. Works well because all Q&As are LLM-processed - no typos, consistent formatting.
- Full lookup: after a search finds a promising match, fetch the full conversation with all its context.
submit_answer: Forcing Structure
The output tool is just as important as the search tools. Instead of letting the agent return free-form text, submit_answer requires a specific structure: the customer-facing response, the internal reasoning, confidence level, sources used, and alternatives considered.
This is what enables Section 4's UI. Without structured output, you can't build interactive elements.
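A sketch of what forcing that structure looks like as a tool definition - abbreviated, with the field list mirroring the JSON in Section 4 rather than the real system's exact schema:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// submit_answer: the agent can only "finish" by filling in this structure.
const submitAnswer: Anthropic.Tool = {
  name: "submit_answer",
  description: "Submit the final answer. Call exactly once, after researching.",
  input_schema: {
    type: "object",
    properties: {
      response: { type: "string", description: "Customer-facing draft for the expert to edit" },
      reasoning: { type: "string", description: "Internal reasoning, not shown to the customer" },
      confidence: { type: "string", enum: ["high", "medium", "low"] },
      confidence_explanation: { type: "string" },
      sources: {
        type: "array",
        items: {
          type: "object",
          properties: {
            id: { type: "string" },
            relevance: { type: "string" },
            caveat: { type: "string" },
          },
          required: ["id", "relevance"],
        },
      },
      alternatives: {
        type: "array",
        items: {
          type: "object",
          properties: {
            product: { type: "string" },
            considered: { type: "string" },
            why_not: { type: "string" },
          },
        },
      },
    },
    required: ["response", "reasoning", "confidence", "sources"],
  },
};
```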
Tools Recombine
Once you have good tools, you can recombine them.
The same Q&A search, product lookup, and pattern analysis tools that power the support agent also power a second agent - this one focused on strategic brand intelligence rather than customer support.
And here's where the compounding pays off. The Q&A knowledge base isn't a static dump of historical emails anymore - it's a living resource that grows with every customer interaction.
Every question answered, every expert refinement, every edge case resolved becomes searchable institutional knowledge. A research agent tapping into that knowledge base isn't just searching old data; it's drawing on the accumulated wisdom of months or years of customer conversations.
The second agent's toolkit:
- Q&A search and pattern analysis
- Product catalog and specs
- Brand knowledge base
- Web search for competitive intel
- Google Docs read/write
- Spreadsheet analysis and editing
Two agents, same core tools, completely different purposes. Build tools well and they become infrastructure, not one-off solutions.
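In code, that recombination can be as simple as two agent configurations sharing one tool registry - a hypothetical sketch:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Tool definitions as sketched earlier, plus the research agent's extras
// (webSearch, googleDocs, spreadsheets are hypothetical stand-ins here).
declare const searchQAs: Anthropic.Tool;
declare const getQAById: Anthropic.Tool;
declare const searchProducts: Anthropic.Tool;
declare const submitAnswer: Anthropic.Tool;
declare const webSearch: Anthropic.Tool;
declare const googleDocs: Anthropic.Tool;
declare const spreadsheets: Anthropic.Tool;

const coreTools = [searchQAs, getQAById, searchProducts]; // shared infrastructure

// Same core tools, different system prompt and extras = a different agent.
const supportAgent = {
  system: "Draft customer responses grounded in verified Q&As and current product specs.",
  tools: [...coreTools, submitAnswer],
};

const brandIntelAgent = {
  system: "Analyze question patterns, gaps, and competitive positioning for the brand team.",
  tools: [...coreTools, webSearch, googleDocs, spreadsheets],
};
```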
Camera gear used as example domain. The real system is in a different industry.
Built with Next.js · Patterns applicable to any LLM-powered system