td-2038d7

Intercom MCP → Merchant KB pipeline

closed task P2 Parent: td-5a8ff4
Created Mar 21, 2026 8:49 PM Updated Mar 23, 2026 11:37 AM
Description
Connect Intercom via MCP to extract verbatim customer conversations into the merchant-knowledge repo.

**Context:**
- Exploring the official Intercom MCP server (https://mcp.intercom.com/mcp); US region confirmed
- Bearer token auth configured in .mcp.json, but it needs a restart to activate
- RC Intercom app ID: udx320s0

**Design decisions made:**
- Store verbatim conversations, not summaries: nuance matters and it keeps querying flexible [[td-verbatim-decision]]
- Each Intercom conversation → one markdown file in transcripts/, the same pattern as meeting transcripts
- New interaction type: 'support' (possibly 'live-chat' for sales)
- Run conversations through the existing extraction prompt for structured fields (pain_points, feature_requests, etc.)
- Profile stays lean (structured); transcript stays rich (verbatim)

**Research completed:**
- 4 Intercom MCP servers evaluated: official (Intercom), raoulbia-ai, fast-intercom, fabian1710
- Official chosen for the exploration phase: clean tools, 6 endpoints (search, fetch, get_conversation, get_contact, search_conversations, search_contacts)
- raoulbia-ai is the backup if the official server doesn't fit production extraction (better batch processing, no region lock)

**Intercom API constraints to know:**
- Max 500 conversation parts per conversation
- Rate limit: 83 requests per 10 seconds
- Each conversation needs an individual GET for full message content
- source.body search only covers the initial message, not replies
- Emails must be resolved to a contact_id before filtering

**Next steps:**
1. Restart Claude Code to activate the Intercom MCP server
2. Browse real conversations: understand structure, volume, signal vs noise
3. Pull 10-20 sample conversations
4. Test mapping to the existing transcript format
5. Design filtering heuristics (what's noise vs signal)
6. Decide: extend process.sh or write a parallel ingestion script

See voice log [[2026-03-21]] for POV entries on the verbatim preference.
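If the extraction script calls the Intercom REST API directly (or the MCP server does not throttle on its behalf), one way to stay under the 83 requests/10 seconds limit while issuing one GET per conversation is a sliding-window limiter. This is a minimal sketch, not the pipeline's actual code; the class name and the injectable `clock`/`sleep` hooks are hypothetical, added so the behaviour can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window_s`-second window."""

    def __init__(self, max_calls: int = 83, window_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._calls and now - self._calls[0] >= self.window_s:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: wait until the oldest call leaves it.
            self._sleep(self.window_s - (now - self._calls[0]))
```

Call `limiter.acquire()` immediately before each per-conversation GET. A sliding window trades a little throughput for never exceeding the cap in any 10-second span; a fixed bucket can let up to twice the limit through around a bucket boundary.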
Handoff
Session Log (12 entries)
Mar 21, 8:53 PM
lg-6e11b694 ses_d1015b
progress
Started work
Mar 21, 9:14 PM
lg-18c20181 ses_d1015b
progress
Exploration complete. Intercom MCP live and working. Browsed conversation structure — identified tag-based filtering (Shopify Merchant, Other Platform Merchant = signal; Advocate/Affiliate, Spam = noise). Rich custom_attributes available: AI Title, Conversation Type/Sub-type, merchant goal summaries. Reviewed merchant-kb repo structure at ~/Claude/merchant-knowledge/ — profiles in merchants/profiles/{id}.json, transcripts in transcripts/, pipeline via process.sh. 314 merchants with data currently.
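The tag-based signal/noise split above could be expressed as a small predicate. This is a sketch under assumptions: the tag spellings are copied from the log entry, the `{"tags": {"tags": [{"name": ...}]}}` shape is the Intercom REST conversation format (the MCP tools may return something different), and this is not the repo's actual `should_extract`.

```python
from typing import Any, Mapping

# Tag names as observed while browsing; exact spellings are assumed.
SIGNAL_TAGS = {"Shopify Merchant", "Other Platform Merchant"}
NOISE_TAGS = {"Advocate/Affiliate", "Spam"}


def tag_names(conversation: Mapping[str, Any]) -> set[str]:
    """Collect tag names, tolerating a missing or empty tag list."""
    tags = (conversation.get("tags") or {}).get("tags") or []
    return {t.get("name", "") for t in tags}


def should_extract(conversation: Mapping[str, Any]) -> bool:
    """Keep merchant conversations; drop anything carrying a noise tag."""
    names = tag_names(conversation)
    if names & NOISE_TAGS:
        return False  # noise wins if a conversation carries both kinds of tag
    return bool(names & SIGNAL_TAGS)
```

Untagged conversations are rejected here; the later backfill entry notes that older conversations without tags needed separate handling.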
Mar 21, 9:25 PM
lg-a18ae21b ses_d1015b
progress
Sample extraction complete. Created 2 sample transcript+profile pairs in pipeline/intercom-samples/: Govee (API data deletion gap — high signal, feature request) and Wipebook (widget code onboarding — positive sentiment, review). Format validated — maps cleanly to existing merchant-kb schema with new 'intercom_metadata' sub-object for Intercom-specific fields. Key additions to interaction schema: source (intercom), intercom_id, intercom_metadata (conversation_type, subtype, merchant_goal, ai_title, fin_resolution, response_time, handling_time).
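A sketch of how one Intercom conversation might map onto an interaction record with the `intercom_metadata` sub-object listed above. The output field names come from the log entry; the input keys are assumptions: the `custom_attributes` labels ("AI Title", "Merchant Goal", etc.) and the `statistics` keys used for response and handling time are hypothetical and would need to match the workspace's real data.

```python
from typing import Any, Mapping


def to_interaction(conv: Mapping[str, Any]) -> dict[str, Any]:
    """Build an interaction record with an intercom_metadata sub-object.

    All input key names below are assumptions; adjust to the real payload.
    """
    attrs = conv.get("custom_attributes") or {}
    stats = conv.get("statistics") or {}
    return {
        "type": "support",          # new interaction type from the design notes
        "source": "intercom",
        "intercom_id": str(conv["id"]),
        "intercom_metadata": {
            "conversation_type": attrs.get("Conversation Type"),
            "subtype": attrs.get("Conversation Sub-type"),
            "merchant_goal": attrs.get("Merchant Goal"),
            "ai_title": attrs.get("AI Title"),
            "fin_resolution": attrs.get("Fin Resolution"),
            "response_time": stats.get("time_to_admin_reply"),
            "handling_time": stats.get("handling_time"),
        },
    }
```

Keeping the Intercom-only fields in one sub-object leaves the shared interaction schema unchanged for meeting transcripts.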
Mar 21, 9:47 PM
lg-2805aff4 ses_d1015b
progress
Code review complete (senior eng + data analyst). P0: SQL injection in ENRICH_QUERY, is_already_processed substring matching, is_signed_up operator precedence. P1: merchant_contact string/array mismatch, missing reward_structure/canonical fields, intercom_id dedup location, empty insight arrays, strip_html losing linebreaks. Fixing now.
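Three of the review findings have standard fixes, sketched below under assumptions. `sqlite3` stands in for the Redshift driver, and the table and column names are hypothetical (psycopg2, for example, uses `%s` placeholders instead of `?`). Whole-ID set membership replaces substring matching, and `strip_html` emits newlines for block-level tags so line breaks survive. None of this is the repo's actual code.

```python
import sqlite3
from html.parser import HTMLParser

# 1. SQL injection: bind values as parameters instead of formatting them into SQL.
ENRICH_QUERY = "SELECT plan FROM merchants WHERE email = ?"  # hypothetical table/columns


def enrich(conn: sqlite3.Connection, email: str):
    row = conn.execute(ENRICH_QUERY, (email,)).fetchone()
    return row[0] if row else None


# 2. Dedup: compare whole IDs, not substrings ("12" must not match "1234").
def is_already_processed(intercom_id: str, processed_ids: set[str]) -> bool:
    return intercom_id in processed_ids


# 3. strip_html: turn block-level tags into newlines before dropping markup.
class _TextExtractor(HTMLParser):
    BREAK_TAGS = {"br", "p", "div", "li"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)  # entities are already unescaped by default


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```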
Monday, March 23, 2026
Mar 23, 8:09 AM
lg-bcf2e83f ses_a29fc2
progress
Resuming: testing real batch Intercom extraction
Mar 23, 10:17 AM
lg-dbc6ab43 ses_a29fc2
progress
Intercom auth token added to .env.local. MCP config already in .mcp.json. Needs a Claude restart to activate. Next: test batch extraction of 10-20 conversations, validate the output, then hand ongoing sync to the weekly cron task (rc-mkb-intercom-sync).
Mar 23, 11:31 AM
lg-2775ad70 ses_c0a399
progress
Resuming: Intercom auth token added, testing MCP pipeline
Mar 23, 11:51 AM
lg-8e550682 ses_f60aaf
progress
Intercom MCP pipeline tested end-to-end. 20 conversations fetched (21 qualifying, 1 pending), 19 converted (1 skipped as idle), all 19 merged into merchant-kb (16 new profiles, 3 merged). 14 profiles enriched via Redshift. Index regenerated: 336 total merchants. Embeddings blocked by disk space (988MB free, needs ~1.5GB for torch+sentence-transformers). Fixed None date bug in regenerate_index. 4 inbox transcripts still pending (Drive format, not Intercom).
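The log does not describe the None date fix in regenerate_index. A common cause is that sorting a mix of date strings and `None` raises `TypeError` (`sorted(["2026-03-01", None])` fails). A minimal sketch of one fix, assuming ISO-8601 date strings and a hypothetical `last_interaction` field:

```python
from typing import Any


def order_for_index(profiles: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort profiles newest-first by ISO-8601 date string.

    `p.get(...) or ""` maps a missing or None date to the empty string, which
    sorts below every real date, so undated profiles land at the end instead
    of raising TypeError when None is compared with a str.
    """
    return sorted(profiles, key=lambda p: p.get("last_interaction") or "", reverse=True)
```

ISO-8601 strings in one consistent format sort chronologically as plain strings, so no date parsing is needed for ordering.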
Mar 23, 11:58 AM
lg-ed7fc335 ses_f60aaf
progress
Pipeline complete. 4 inbox Drive transcripts still pending — process.sh times out on claude -p calls. These are low priority (BlueNile 2018, Haylen DiFusco Oct 2025, Dylan Yuska x2 Feb/Mar 2026). Embeddings also pending — disk too full for torch (988MB free). Core Intercom pipeline fully validated end-to-end.
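Because the embeddings step is blocked on free space, a pre-flight check can skip it cleanly instead of failing mid-install. This is a sketch; the ~1.5GB threshold is the rough figure from the log, and the function name is hypothetical.

```python
import shutil

# Rough requirement from the session log (~1.5GB for torch + sentence-transformers).
REQUIRED_BYTES = int(1.5 * 1024**3)


def enough_disk(path: str = ".", required_bytes: int = REQUIRED_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    free_mb = shutil.disk_usage(".").free / 1024**2
    if not enough_disk():
        print(f"Skipping embeddings: {free_mb:.0f}MB free, need ~1.5GB")
```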
Mar 23, 12:09 PM
lg-911e6bf3 ses_f60aaf
progress
Inbox transcripts: 2/4 processed via process.sh (IPG/BlueNile, Dylan Yuska onboarding). 2 remain (Haylen DiFusco, Dylan Yuska #2) — Claude extraction failed to produce valid JSON. These can be retried or manually extracted next session.
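When extraction output is not valid JSON, a validation step lets the caller decide between retrying and flagging the transcript for manual extraction. This is a sketch under assumptions: the model may wrap JSON in a code fence, and `REQUIRED_FIELDS` lists only the two fields named in the description, not the full extraction schema.

```python
import json
import re
from typing import Any, Optional

# Fields the extraction prompt is expected to return; the full list is an assumption.
REQUIRED_FIELDS = ("pain_points", "feature_requests")

# Matches a fenced block (three backticks, optional "json" tag) and captures its body.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def parse_extraction(raw: str) -> Optional[dict[str, Any]]:
    """Return the extracted object, or None if the output is not usable JSON."""
    match = _FENCE.search(raw)
    text = match.group(1) if match else raw.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(f not in data for f in REQUIRED_FIELDS):
        return None
    return data
```

A `None` result is the signal to re-run the extraction or set the transcript aside for manual handling.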
Mar 23, 2:00 PM
lg-16a5ddfd ses_f60aaf
progress
Cron task created: rc-mkb-intercom-sync (Mon 08:15, 15min timeout). Timer installed and enabled. First test run timed out at 10min — bumped to 15min. Will validate on next Monday's scheduled run. Pipeline is fully operational for manual use; cron automation is set up pending successful unattended run.
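Whatever scheduler runs rc-mkb-intercom-sync, enforcing the 15-minute budget inside the job makes a timeout show up as a logged failure rather than a silent kill. A minimal sketch; the command passed in is hypothetical. `subprocess.run` kills and reaps the child before raising `TimeoutExpired`.

```python
import subprocess
import sys

TIMEOUT_S = 15 * 60  # raised from 10 to 15 minutes after the first test run timed out


def run_sync(cmd: list[str], timeout_s: float = TIMEOUT_S) -> bool:
    """Run one sync pass; return True on exit code 0, False on failure or timeout."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"sync timed out after {timeout_s:.0f}s", file=sys.stderr)
        return False
    return result.returncode == 0
```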
Mar 23, 5:50 PM
lg-2c363af5 ses_873043
progress
Backfill started. Reviews perception analysis complete (13 years, 1505 reviews). Intercom backfill batch 1 done: 162 raw files fetched, 145 converted, 488 total merchants. ~700 remaining IDs saved in backfill summaries. Created td-2f7de5 for autonomous pickup. Qualifying rate is ~89% (much higher than the initial 25% estimate). Updated the should_extract filter to handle older conversations without tags.
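Saving the remaining IDs is what allows a later session (such as td-2f7de5) to resume. A sketch of that bookkeeping, assuming a hypothetical summary file shaped like `{"remaining_ids": [...]}`; the real backfill summaries may be laid out differently.

```python
import json
from pathlib import Path
from typing import Iterator


def remaining_ids(all_ids: list[str], done_ids: set[str]) -> list[str]:
    """IDs not yet processed, in their original order."""
    return [i for i in all_ids if i not in done_ids]


def batches(ids: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def save_remaining(path: Path, ids: list[str]) -> None:
    """Persist unprocessed IDs so a later session can resume (hypothetical layout)."""
    path.write_text(json.dumps({"remaining_ids": ids}, indent=2))


def load_remaining(path: Path) -> list[str]:
    """Read back saved IDs; an absent file means nothing is pending."""
    if not path.exists():
        return []
    return json.loads(path.read_text()).get("remaining_ids", [])
```

Rewriting the summary after every batch keeps the resume point accurate even if a run is interrupted partway through.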
Git State
Started f1f0a80 (master) Current eac1af2 (master)
Sessions Involved