The principle first, because the steps only make sense with it: the agent owns the first response, a human owns the close. Get that division right and a triage agent is one of the few agent deployments that reliably works in 2026. Get it backwards (let the agent close tickets unsupervised on day one) and you become a line item in Gartner’s prediction that over 40 percent of agentic AI projects will be canceled by the end of 2027.

Here’s the path. Six steps, each with its reason.

Step 1: pick one ticket category, not your inbox

Choose a single high-volume, low-stakes category: order status, password resets, “where is my invoice.” Not refunds. Not cancellations. One category.

Why: an agent’s error rate compounds with the size of the task space. A narrow category gives you a clean baseline, a small knowledge surface to ground, and a contained blast radius when it gets something wrong. You can expand later; you can’t un-burn a customer.
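If your helpdesk can export a category label per ticket, the choice can start from the data. The sketch below assumes a plain list of labels. The category names and the deny list are hypothetical, and the deny list simply encodes "not refunds, not cancellations." Volume is computed; "low-stakes" is a judgment you make yourself and write into that list.

```python
from collections import Counter
from typing import Optional

# Hypothetical deny list mirroring "Not refunds. Not cancellations."
HIGH_STAKES = {"refund", "cancellation"}

def pick_pilot_category(ticket_categories: list[str],
                        high_stakes: set[str] = HIGH_STAKES) -> Optional[str]:
    """Return the highest-volume category that is not on the high-stakes list."""
    counts = Counter(ticket_categories)
    for category, _count in counts.most_common():  # highest volume first
        if category not in high_stakes:
            return category
    return None  # every category is high-stakes: no safe pilot

# Hypothetical month of tickets, one category label per ticket.
tickets = (["refund"] * 40 + ["order_status"] * 30
           + ["password_reset"] * 25 + ["cancellation"] * 5)
print(pick_pilot_category(tickets))  # order_status
```

Refunds are the busiest category here, which is exactly why the deny list matters. Raw volume alone would point the pilot at the riskiest queue.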

Step 2: ground it in your content, then read what it retrieves

Connect the agent to your help center, macros, and a sample of solved tickets in that category. Then spot-check the retrieval: ask it the ten most common questions and read what sources it pulled.

Why: nearly every wrong answer a triage agent gives traces back to a gap or a contradiction in the source content, not to the model. Fix the docs and you fix the agent. This is also the cheapest improvement you will ever make to it.
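The spot-check is easy to make repeatable. Below is a minimal sketch of a harness. It takes your ten most frequent questions, prints the sources retrieved for each, and flags any question that pulled nothing, which is usually a content gap. `toy_retrieve` and the `kb-` article IDs are hypothetical stand-ins. Your vendor's retrieval will have its own interface, so treat the `retrieve` callable as an assumption you adapt.

```python
from collections import Counter
from typing import Callable

def top_questions(questions: list[str], n: int = 10) -> list[str]:
    """The n most frequent questions after trimming and lowercasing."""
    normalized = [q.strip().lower() for q in questions]
    return [q for q, _count in Counter(normalized).most_common(n)]

def spot_check(questions: list[str],
               retrieve: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Print and return the sources retrieved for each question, for a human to read."""
    report: dict[str, list[str]] = {}
    for q in questions:
        sources = retrieve(q)
        report[q] = sources
        flag = "   <-- no source: probable content gap" if not sources else ""
        print(f"{q!r}: {sources}{flag}")
    return report

# Toy stand-in for the vendor's retriever: keyword overlap with article tags.
ARTICLES = {
    "kb-101": {"order", "tracking", "shipped"},
    "kb-102": {"password", "reset", "login"},
}

def toy_retrieve(question: str) -> list[str]:
    words = set(question.replace("?", "").split())
    return sorted(aid for aid, tags in ARTICLES.items() if words & tags)

asked = (["Where is my order?"] * 5 + ["How do I reset my password?"] * 3
         + ["Where is my invoice?"] * 2)
report = spot_check(top_questions(asked), toy_retrieve)
```

The harness only collects sources; a person still has to read them. An empty result is the easy case to catch. A wrong-but-plausible source is the one that needs human eyes.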

Step 3: define “resolved” before your vendor does

Write down, in one sentence, what counts as a resolution for your team. Then compare it to the vendor’s billing definition. Intercom’s Fin, the reference point for this market, charges $0.99 per resolution with a 50-resolution monthly minimum, and a resolution includes the customer simply not replying again after the answer. Teams reviewing transcripts have found that “assumed resolutions” can include customers who gave up and emailed someone instead.

Why: the gap between marketed and audited numbers is the budget. Intercom claims 67 percent resolution across 40M+ conversations, while its own published case studies put real-world rates between 42 and 50 percent. Forecast your costs on the audited number, not the keynote number. And remember the structural point: on per-resolution pricing you pay only for what the vendor counts as a win, while per-conversation or per-session models bill you for the failures too.
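The arithmetic is worth running both ways before signing. This back-of-envelope sketch uses the $0.99 price and the rates quoted above. Two inputs are assumptions: the 2,000-conversation monthly volume is hypothetical, and the sketch treats the 50-resolution minimum as a billing floor. It returns agent spend alongside the tickets still left for your team.

```python
PRICE_PER_RESOLUTION = 0.99  # Fin's per-resolution price as quoted above
MONTHLY_MINIMUM = 50         # assumption: the minimum acts as a billing floor

def forecast(conversations: int, resolution_rate: float) -> tuple[float, int]:
    """Return (monthly agent spend in USD, conversations still left for humans)."""
    resolutions = round(conversations * resolution_rate)
    spend = max(resolutions, MONTHLY_MINIMUM) * PRICE_PER_RESOLUTION
    return round(spend, 2), conversations - resolutions

monthly_conversations = 2000  # hypothetical volume for the pilot category
for label, rate in [("headline 67%", 0.67), ("audited 42%", 0.42), ("audited 50%", 0.50)]:
    spend, left = forecast(monthly_conversations, rate)
    print(f"{label}: ${spend:,.2f} agent spend, {left} tickets still need a human")
```

At the headline rate, 660 tickets go to humans. At the low audited rate, 1,160 do, and that difference is staffing you would otherwise forget to budget for. If your billed rate runs above your audited rate (the assumed-resolution effect described above), plug the billed rate into the spend column and the audited rate into the human column.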

Step 4: build the review gate

Configure the agent so that, for the first two to four weeks, every drafted response in your chosen category routes to a human queue before sending. The human approves, edits, or rejects, and you log which.

Why: the approve/edit/reject log is your eval set. It tells you the true accuracy rate on your tickets, in your voice, before a single customer sees an unsupervised answer. When the edit rate falls below a threshold you chose in advance (one in ten is a reasonable bar for a low-stakes category), you let the agent send directly and keep sampling 10 percent for review.
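The gate decision itself is a few lines once the log exists. Below is a minimal sketch, assuming each reviewed draft is logged as the string `"approve"`, `"edit"`, or `"reject"`. Two choices here are mine, not the playbook's: rejects count against the agent alongside edits, and the 200-review minimum before the gate can open is a hypothetical floor. Both are labeled in the code.

```python
import random
from collections import Counter

EDIT_RATE_THRESHOLD = 0.10  # "one in ten", fixed before the pilot starts
SAMPLE_RATE = 0.10          # share of direct sends still pulled for review
MIN_REVIEWS = 200           # hypothetical: don't judge the gate on a handful of tickets

def edit_rate(log: list[str]) -> float:
    """Share of reviewed drafts not approved as-is (assumption: rejects count with edits)."""
    if not log:
        return 1.0  # no evidence yet: treat as failing
    counts = Counter(log)
    return (counts["edit"] + counts["reject"]) / len(log)

def gate_open(log: list[str]) -> bool:
    """True once the agent may send directly in this category."""
    return len(log) >= MIN_REVIEWS and edit_rate(log) < EDIT_RATE_THRESHOLD

def needs_human_review(gate_is_open: bool, rng: random.Random) -> bool:
    """Every draft while the gate is closed; a random ~10 percent after it opens."""
    return not gate_is_open or rng.random() < SAMPLE_RATE

log = ["approve"] * 185 + ["edit"] * 12 + ["reject"] * 3   # 200 reviews
print(edit_rate(log), gate_open(log))  # 0.075 True
```

The threshold lives in a constant set before launch so nobody can quietly move it once the numbers come in.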

Here is a system prompt skeleton that enforces the division of labor. Adapt the brackets; keep the structure.

You are a support triage assistant for [COMPANY]. You handle ONLY
[CATEGORY] questions.

Rules:
1. Answer only from the provided help articles and ticket macros.
   If the answer is not in them, say so and escalate. Never guess.
2. You may look up order/account status via the [TOOL] tool.
   You may NOT issue refunds, change account state, or promise
   timelines not stated in the articles.
3. If the customer is angry or mentions legal action, churn, or a
   payment dispute, escalate immediately with the tag [ESC-HUMAN]
   and a one-line summary.
4. End every reply by asking if that resolved the issue. If the
   customer says no twice, escalate.
5. Output format: {category, confidence: high/medium/low,
   draft_reply, escalate: yes/no, reason}.
   Anything below high confidence routes to human review.
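Rule 5 only helps if something downstream reads the output and acts on it. Here is a rough sketch of that router. It assumes the model returns the format as a JSON object with string values (`"yes"`/`"no"`, `"high"`/`"medium"`/`"low"`); how you enforce JSON output depends on your vendor. Anything unparseable is treated as low confidence. A result of `"send"` here means eligible to send, and still subject to the review gate during the supervised weeks.

```python
import json

def route(raw_output: str) -> str:
    """Return 'human', 'review', or 'send' for one model reply."""
    try:
        out = json.loads(raw_output)
    except json.JSONDecodeError:
        return "review"  # unparseable output never goes straight to a customer
    if not isinstance(out, dict):
        return "review"
    if out.get("escalate") == "yes":
        return "human"   # Rule 3 escalations skip the review queue
    if out.get("confidence") != "high":
        return "review"  # "Anything below high confidence routes to human review"
    return "send"

reply = ('{"category": "order_status", "confidence": "high", '
         '"draft_reply": "Your order shipped today.", "escalate": "no", "reason": ""}')
print(route(reply))  # send
```

Defaulting to review on missing or malformed fields keeps the failure mode boring: a human sees the ticket.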

Step 5: wire the escalation paths before launch

Three paths, minimum: low-confidence drafts to the review queue, trigger phrases (legal, refund, cancel, complaint) straight to a human, and a hard stop that hands the conversation to a human after two failed clarification turns.

Why: customers forgive a bot that hands off fast. They do not forgive a bot that loops. The escalation design is the customer experience; the answers are just the happy path.
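All three paths fit in one function that runs before any reply goes out. In this sketch, the return labels are hypothetical and the trigger list matches the four words above. Keyword matching is deliberately crude: it will also fire on "I don't want to cancel." That is acceptable, because a false handoff costs a few minutes while a loop costs the customer.

```python
import re
from typing import Optional

# The four trigger words from the text, plus suffixes ("refunds", "cancellation").
TRIGGER_PATTERN = re.compile(r"\b(legal|refund\w*|cancel\w*|complaint\w*)\b", re.IGNORECASE)
MAX_FAILED_TURNS = 2  # hard stop after two failed clarification turns

def escalation_path(message: str, confidence: str, failed_turns: int) -> Optional[str]:
    """Return where this turn goes, or None if the agent may proceed."""
    if TRIGGER_PATTERN.search(message):
        return "human:trigger"  # path 2: trigger phrase, straight to a person
    if failed_turns >= MAX_FAILED_TURNS:
        return "human:loop"     # path 3: stop the loop, hand off
    if confidence != "high":
        return "review_queue"   # path 1: low-confidence draft waits for review
    return None

print(escalation_path("Cancellation please", "high", 0))  # human:trigger
```

Trigger checks come first on purpose. A confident draft to a customer threatening legal action is still the wrong reply.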

Step 6: measure the only three numbers that matter

Audited resolution rate (your definition, from transcript samples, not the dashboard), edit rate at the review gate, and escalation latency. Review weekly for the first month.

Why: these three tell you when to widen the category, when to loosen the gate, and when something upstream broke. Everything else on the analytics page is decoration.
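The weekly review can be a single function over three small exports. Below is a minimal sketch under stated assumptions. The audit sample is a list of booleans (did this "resolved" ticket meet your definition?). The gate log uses the approve/edit/reject strings from step 4. "Escalation latency" is read here as minutes from escalation to a human's first reply, since the playbook doesn't pin the definition down; use whichever clock your team actually cares about.

```python
from datetime import datetime
from statistics import median

def weekly_metrics(audited: list[bool], gate_log: list[str],
                   escalations: list[tuple[datetime, datetime]]) -> dict[str, float]:
    """The three weekly numbers; NaN where there is no data yet."""
    nan = float("nan")
    # Share of sampled "resolved" transcripts that meet YOUR definition of resolved.
    audited_rate = sum(audited) / len(audited) if audited else nan
    # Share of reviewed drafts a human edited or rejected.
    edits = sum(1 for action in gate_log if action != "approve")
    edit_rate = edits / len(gate_log) if gate_log else nan
    # Assumption: latency = minutes from escalation to a human's first reply.
    minutes = [(answered - raised).total_seconds() / 60 for raised, answered in escalations]
    latency = median(minutes) if minutes else nan
    return {"audited_resolution_rate": audited_rate,
            "edit_rate": edit_rate,
            "median_escalation_latency_min": latency}

day = datetime(2026, 1, 5)
metrics = weekly_metrics(
    audited=[True, True, False, True],
    gate_log=["approve", "edit", "approve", "reject"],
    escalations=[(day.replace(hour=9), day.replace(hour=9, minute=12)),
                 (day.replace(hour=11), day.replace(hour=11, minute=30)),
                 (day.replace(hour=14), day.replace(hour=14, minute=4))],
)
print(metrics)
```

The median is used for latency so that one ticket abandoned over a weekend doesn't swamp the number; a p90 next to it is a reasonable addition.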

Six steps to a supervised triage agent

  1. Pick one narrow ticket category
  2. Ground in help content, audit retrieval
  3. Define 'resolved' before the vendor does
  4. Human review gate on every draft
  5. Wire three escalation paths
  6. Track audited resolution, edit rate, escalation latency

Where this goes wrong in practice

The failure I see most: teams skip step 3, trust the vendor dashboard, and discover at renewal that they paid for hundreds of “resolutions” that were customers walking away. The guard is a monthly transcript audit of 50 random resolved tickets. Thirty minutes, once a month. It will pay for itself the first time.
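The draw should be genuinely random; otherwise someone will pick the tickets that look fine. This small sketch assumes you can export the IDs of tickets the vendor marked resolved this month. After a human reads each transcript, the second function gives the share that were walk-aways rather than resolutions. The verdict format is a hypothetical choice.

```python
import random
from typing import Optional

def monthly_audit_sample(resolved_ids: list[str], k: int = 50,
                         seed: Optional[int] = None) -> list[str]:
    """Draw k resolved tickets uniformly at random, without replacement."""
    rng = random.Random(seed)  # pass a seed to make the draw reproducible
    return rng.sample(resolved_ids, min(k, len(resolved_ids)))

def walkaway_share(verdicts: dict[str, bool]) -> float:
    """verdicts maps ticket ID -> True if genuinely resolved by your definition."""
    return sum(not ok for ok in verdicts.values()) / len(verdicts)

ids = [f"T-{n}" for n in range(1, 301)]  # hypothetical month of billed resolutions
sample = monthly_audit_sample(ids, seed=7)
```

Multiply the walk-away share by the month's billed resolutions and the per-resolution price to see roughly what the unaudited dashboard cost you.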

Second failure: the prompt forgets to tell the agent what it may NOT do. Models fill silence with initiative. Rule 2 above exists because an agent that can check an order will, eventually, try to fix one unless told otherwise.
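A prompt rule is an instruction, not an enforcement mechanism, so it helps to back Rule 2 with a deny-by-default check wherever your integration dispatches tool calls. This is a rough sketch. The tool names are hypothetical: `lookup_order_status` stands in for `[TOOL]`, and both functions are stubs. The point is the shape: anything not on the allowlist fails loudly instead of running.

```python
from typing import Any, Callable

class ToolNotPermitted(Exception):
    """Raised when the agent asks for a tool outside its allowlist."""

def lookup_order_status(order_id: str) -> dict[str, str]:
    return {"order_id": order_id, "status": "shipped"}  # stub for the real lookup

def issue_refund(order_id: str, amount: float) -> dict[str, str]:
    raise RuntimeError("unreachable while the allowlist excludes refunds")  # stub

TOOLS: dict[str, Callable[..., Any]] = {
    "lookup_order_status": lookup_order_status,
    "issue_refund": issue_refund,
}
ALLOWED_TOOLS = {"lookup_order_status"}  # read-only; anything else is denied

def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a tool call only if it is explicitly allowed."""
    if name not in ALLOWED_TOOLS:
        raise ToolNotPermitted(f"agent may not call {name!r}; escalate instead")
    return TOOLS[name](**kwargs)

print(call_tool("lookup_order_status", order_id="A-1001"))
```

Catching `ToolNotPermitted` and routing the ticket to a human turns the agent's initiative into an escalation rather than an action.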

Third: launching across the whole inbox because the pilot category went well. Each new category resets your accuracy to unknown. Expand one category at a time, each with its own two-week gated period.

Where the workflow stops being enough

Be honest about the ceiling. This playbook covers informational and lookup tickets. The moment your roadmap includes agent-executed refunds, plan changes, or anything with money attached, you’ve left playbook territory. That’s an engineering project with permissioning, spending limits, and audit logs. At the enterprise end, vendors like Decagon price accordingly: pricing starts around $95K a year, and deployment takes weeks. Our June 18 piece covers how to eval an agent before you extend that kind of trust, and the July 2 guardrails explainer covers the permission layer. Until you’ve read both, the agent drafts, the human closes.

FAQ

How do I set up an AI support agent for the first time? Start with one high-volume, low-stakes ticket category and connect the agent to your help center and solved tickets for that category only. Route every drafted reply through human review for the first weeks, and only allow direct sending once the human edit rate drops below a threshold you set in advance. Expand to new categories one at a time, each with its own review period.

What resolution rate should I expect from an AI support agent? Plan around audited rates of roughly 40 to 50 percent for a well-grounded agent on suitable ticket types, which matches Intercom’s published case-study range of 42 to 50 percent for Fin. Vendor headline claims run higher (Intercom cites 67 percent across 40M+ conversations), partly because billing definitions count customers who simply stop replying. Audit transcripts monthly and forecast costs on your own measured rate.

Should an AI agent be allowed to issue refunds? Not in a first deployment. Refunds and account changes carry financial and trust risk that an unproven agent should not own, and they require permissioning, spending limits, and audit infrastructure beyond a triage setup. Keep money-touching actions behind human approval until the agent has months of measured accuracy and you have a tested evaluation and guardrail layer in place.

Edited by Aditya Marin Gasga

About Adithya Sulaiman

Contributor · CEO, Demand Nexus

Adithya Sulaiman is the CEO of Demand Nexus, a B2B demand-generation company built on a pay-for-performance model: clients pay only when a BANT-qualified meeting reaches a rep's calendar. He built the firm's client-acquisition engine and its network of niche B2B publications, and writes on pipeline economics from the operator's seat.