td-5b731a: Eval case: agent-as-gatekeeper behavior on merchant reward proposals (70743 replay)

Description

**Why this matters:** Real-replay evidence that the onboarding agent defaults to gatekeeper behavior when a merchant proposes a reward structure the agent considers unusual. Merchant 70743 (Pacer Virtual Adventure Challenges) proposed 10% friend / 60% advocate; agent responded with unsolicited correction ("too aggressive") and a softened counter ($10 store credit). Merchant quit the conversation immediately after. **Two compounding failures:** 1. **Agent jumped to correction without curiosity.** Expert move when a merchant proposes something unusual = ask why, not offer a smaller version. Merchant 70743 had been running a campaign for a long time; they had context the agent didn't. Agent's "safer version" read as override, not expertise. Rule: understand reasoning BEFORE recommending. 2. **Agent defaulted to margin-protection framing.** We don't have the merchant's full picture — margins, goals, cash position, LTV math, campaign-period experiments. 60% off may be an intentional design choice (VC-funded growth, LTV-driven math, buzz/launch mechanic, reactivation play). Bigger rewards often IMPROVE program performance. Margin is ONE input, not the overriding factor. Default posture should be "support the decision with information," not "protect the merchant from their numbers." **Scope:** Capture this as an eval scenario in the onboarding agent test harness (eval.farscry.io). Scenario shape: - Merchant volunteers an unusual reward structure (e.g. 10%/60%, or high-asymmetry inverse) - Pass criteria: agent asks about goals/reasoning BEFORE evaluating economics - Fail criteria: agent delivers unsolicited correction, proposes softened counter, or frames margin as overriding concern Also: add a behavioral knowledge doc (or extend the existing reward-strategy.md / messaging-patterns.md) that teaches the consultant-not-gatekeeper posture explicitly: - When merchant proposes an unusual structure, the first move is curiosity - Don't treat high rewards as inherently wrong; treat them as a design choice the merchant has reasons for - Margin is not the overriding factor in reward design **Source signal:** Zach shared real 70743 replay 2026-04-16. Voice log entries 21:59 + 22:03. Related: REF-118 knowledge-doc grounding discipline, td-da61c1 onboarding agent architecture. **Parent:** td-df820a (onboarding redesign epic).

Handoff

Done

AC-104 evidence bundle tightened to production-only. Triggering incident now carries the verbatim 70743 production conversation (rc_landing.chat_messages, conv_69e06938..., agent_name=GuidedSetupAgent). 3-variant diagnostic + 45-run eval + plan-v3 all removed from doc-shared — all used the non-production prompt. Doc-shared at b61360d contains just ticket.md + 10 scenario files + README. AC-104 description updated in place. Next PM trigger: adjudicate first eng iteration against the 10 scenarios.

Session Log (19 entries)

Apr 16, 10:24 PM

lg-1a1950f5 ses_b8a38b

progress

Plan drafted: projects/onboarding/plans/2026-04-16-agent-gatekeeper-fix-plan.md. Architectural fix = treat merchant deviation as goal signal, not wrong answer. Default-to-balanced-growth posture, adapt on deviation signal. NO required goal-discovery step. 5 workstreams: reward-strategy reorg, messaging-patterns deviation-signals, persona objective line, research rebalance, eval scenarios. Dispatching codex review next.

Apr 16, 10:39 PM

lg-606a7ec5 ses_b8a38b

progress

Plan v2 drafted with revised 3-generator diagnosis after diagnostic (V1/V2/V3 all exhibit margin framing + counter-proposal, confirming architectural). Fix expanded to 7 workstreams covering anti-pattern rules, margin-anchored mental model, tool contract framing, default redefinition, deviation-signal duplication, eval audit-before-changes, observable success criteria. Resubmitting to codex for second review.

Apr 16, 10:49 PM

lg-7b8dc198 ses_b8a38b

progress

Plan v3 finalized (approved-for-implementation). All 5 codex-refinement changes integrated: (1) 'architectural' → 'systemic' with confound disclosure, (2) workstream 1 rule rewrite preserves genuine checks for platform/fraud/math, (3) eval audit scope expanded to 5 flagged scenarios + ambient-language sweep across 27, (4) widget/quick-reply steering added as first-class success criterion (steering_in_ui_neutral label), (5) 8 observable labels now have operational definitions + baseline capture + variance test, (6) base prompt gets one-sentence economics-as-information line. Sequencing: eval audit first, then 5 parallel doc edits, then eval gate, then ship. Ready to start workstream 1 (anti-pattern rule rewrite) when Zach greenlights.

Apr 16, 10:59 PM

lg-1474c641 ses_b8a38b

progress

Workstream 1 shipped. Rule rewrite lands for contextual-frame cases (V2 growth, V3 LTV): agent uses merchant frame, proceeds to implementation. V1 (no frame) still partial gatekeeper because generator 2 unchanged. Heuristic labels too crude to reflect shift — real eval needs LLM-judge. Next: workstream 2 (margin demotion) to close V1 case.

Friday, April 17, 2026

Apr 17, 7:29 AM

lg-32895f58 ses_b8a38b

progress

Workstream 2 shipped. V1 cold case behavior shifted: agent now asks clarifying question first, surfaces tradeoffs conditionally not as counter-proposals, zero alternative numbers proposed, quick reply 'Let's go with these rewards' as primary action (was 'Recommend a safer setup'). V2/V3 maintained. Residual: AOV as norm reference still appears in V1 — LLM-judge anchor_not_invoked label will catch. Generators 1+2 done. Generator 3 (tool contract framing + base prompt) is next.

Apr 17, 7:38 AM

lg-79f4f0ea ses_b8a38b

progress

Workstream 4 shipped. 3-generator fix complete (WS1: anti-pattern rules, WS2: margin demotion, WS4: tool framing + base prompt). All 3 diagnostic variants now show shipping-quality behavior. V1 cold: agent commits to merchant's exact structure ('configure this exactly as you described'), 'Let's go with 10%/60%' primary quick reply. V2 growth: validates merchant frame, proceeds. V3 LTV: uses merchant LTV math, proceeds. Tradeoff: WS4 V1 dropped clarifying-question behavior WS2 added — agent now prefers surface-tradeoff + let-merchant-choose. Residual: industry-norm anchors still surface in V1 (WS5 territory). Remaining workstreams: 3 (default redefinition), 5 (deviation-signal duplication), 6 (eval audit + new scenarios), 7 (LLM-judge rubric).

Apr 17, 10:47 AM

lg-80ee4c47 ses_b8a38b

progress

WS3b shipped. Prompt-level negative instructions close the norm-anchor gap that doc changes alone couldn't reach (model pre-training leakage). V1 cold case now clean: zero norm citations, zero counter-numbers, 'Let's go with this' as only actionable quick reply. Full implementation fix (WS1+WS2+WS3+WS3b+WS4) structurally complete. Remaining: WS5 (deviation-signal duplication — potentially skip given WS4 base prompt), WS6 (eval audit of 27 scenarios + 5 new), WS7 (LLM-judge rubric). These are eval infrastructure, not behavioral fixes.

Apr 17, 8:49 PM

lg-57ca256f ses_8410b7

progress

Started work

Apr 17, 8:49 PM

lg-d2b97e58 ses_8410b7

progress

Picking up WS6 (eval audit + new scenarios) and WS7 (LLM-judge rubric). WS1-4, 3b shipped. WS5 skipped.

Apr 17, 8:59 PM

lg-d297ad60 ses_8410b7

progress

WS6 + WS7 drafted. 7 files written to projects/onboarding/eval-scenarios/: - 2026-04-17-ws6-audit-and-sweep.md (audit memo: 01/14/15/08 REWRITE, 22 KEEP + ambient fix; 16-file sweep with prescribed replacement) - 2026-04-17-28 through -32 (NS-1 through NS-4 + regression control; scenario IDs 28-32) - 2026-04-17-ws7-judge-rubric.md (8-label LLM-judge rubric with operational definitions, PASS/FAIL examples, edge cases, aggregate scoring, calibration tests) Key decisions logged: - Scenario 15 flagged REWRITE heavy — unsolicited conservative alternative after merchant accept is the exact gatekeeper behavior the fix is removing; Why-It-Works narrative needs full rewrite, not just pass-criteria tweak. - Scenario 22 KEEP — merchant raised concern first, agent's response is mechanism-grounded (closed channel, conversion-trigger), satisfies new platform_fraud_flagged_only_if_present label. - Ambient sweep: 16/27 scenarios carry 'margins and buying behavior' opening; prescribed replacement 'goals and your customers'. - Regression control picks math-contradicts-goal (not platform-infeasible) since it's the cleanest test of WS1 allowable-counter case 3. Not executed in this run: - Ambient sweep patch (mechanical replacement across 16 files) - Per-scenario rewrites for 01/08/14/15 - Baseline pass-rate capture against current agent These gate on confirmation from Zach that scenario decisions are right before editing existing scenarios — the 15 REWRITE-heavy call in particular could be contested.

Monday, April 20, 2026

Apr 20, 3:34 PM

lg-85e221a3 ses_8410b7

progress

WS6 fully complete. 12 commits across session: - d4b075c1 ambient sweep (16 files) - d93e5598/81ab1ef5/fa8a4ab7 scenario 15 rewrite (gatekeeper + voice + style) - b17418bc Store em-dash sweep (32 files) - 7ddf69c8 permission-asking segues (5 files) - 1d068858/6913c0cb scenario 01 rewrite + cleanup - 8ed970b3/0780c042 scenario 14 rewrite + cleanup - f5eaf66d scenario 08 rewrite - 7ae5871b title em-dash sweep (19 files) Pending per plan sequencing: 1. Baseline pass-rate capture (pre-ship) against rewritten + new scenarios 2. Full eval with variance (5x per scenario) 3. Ship Also flagged but NOT in scope: - Fabricated-customer-psychology patterns may exist in knowledge docs (reward-strategy.md, messaging-patterns.md) since scenarios mirror doc language. Worth a sweep of knowledge docs before ship.

Apr 20, 8:22 PM

lg-4d8f2e93 ses_4bd54b

progress

Reid picking up WS7 baseline capture — pre-WS1-4 eval against rewritten scenarios + knowledge docs

Apr 20, 11:01 PM

lg-6143a23d ses_4bd54b

progress

WS7 baseline captured — 0/5 overall_pass on NS-1/2/4, 0/4 NS-3 (1 run missing), 0/5 REG-32, 0/5 EX-01/14, 1/5 EX-08, 5/5 EX-15. Primary failure modes: clarifying_question_before_eval 0/15, tradeoff_surfaced 4/37, anchor_not_invoked 14/44. REG-32 fails as rubber-stamp not gatekeeper. Memo: projects/onboarding/eval-scenarios/2026-04-20-ws7-baseline.md

Wednesday, April 22, 2026

Apr 22, 1:51 PM

lg-3ae46277 ses_7298e7

progress

Reid picked up: task 1 (memo reframe), task 2 (NS-3-30 refill), task 3 (draft AC ticket).

Apr 22, 1:59 PM

lg-f87eec3b ses_7298e7

progress

Task 1 done: memo reframed as post-fix gate status. Task 2 done: NS-3-30 refilled (0/5 overall); judge returned FAIL on the refilled run; aggregate + memo per-label tables updated to match ground truth (corrected multiple stale numbers in the original memo). Moving to task 3 (AC ticket draft).

Apr 22, 3:46 PM

lg-b274e65b ses_7298e7

progress

Ticket draft is ready: projects/onboarding/tickets/2026-04-21-ac-gatekeeper-ship-gate.md. Evidence bundle pushed to Anafore/doc-shared at 3083a95. Ran two codex review passes (HoP + SE lenses) with surgical revisions between rounds. Added WARN-33 (Harbor Tide Supply) positive-warning scenario as a second balancing control alongside REG-32. All Anafore/doc-shared URLs in the ticket are live. Waiting for Zach's greenlight to file to AC board.

Apr 22, 3:53 PM

lg-e305ef4a ses_7298e7

progress

Filed to AC board as AC-104 (Task). https://anafore.atlassian.net/browse/AC-104. Evidence bundle pushed to Anafore/doc-shared at 3083a95. Ticket body + 10 scenario files + diagnostic JSON + eval transcripts JSON + plan v3 are all live on doc-shared. PM hand-off complete; eng owns iteration path.

Apr 22, 5:23 PM

lg-30da2743 ses_7298e7

progress

AC-104 description updated with verbatim production conversation from rc_landing.chat_messages (conv_69e06938..., agent_name=GuidedSetupAgent, 2026-04-16 04:44-04:45 UTC). Dropped 3-variant replay + 45-run eval references (both used non-production prompt). Added direct links to each scenario file in the eval table. Doc-shared pushed to 8b58c83.

Apr 22, 5:30 PM

lg-6bbdfcdc ses_7298e7

progress

Removed plan-v3.md from doc-shared and AC-104 evidence — same non-production-prompt methodology concern as the earlier JSON removals. Doc-shared now minimal: ticket.md + 10 scenario files + README. Pushed to b61360d.

Git State

Started f5cbcde (master) Current 3cef8f6 (master)

Sessions Involved

ses_b8a38b
ses_8410b7 (implementer)
ses_4bd54b
ses_7298e7