todai — Review Findings Decision Board

#1 — Jargon Gate Failure ▶

P0 📅 20+ consecutive days 🔴 Every single run since tracking began

The problem: Technical jargon leaks into every edition. "Mythos-class", "FP4 quantization", "agentic benchmarks", "CoT", "MoE" — terms a vibe-coder reader won't understand. The existing glossary is too narrow. This is the #1 barrier to breaking the B+ Reader grade ceiling.

Evidence: Reader flagged every single run (20/20). Worst single-item count: 11 terms (Jun 06). "Mythos-class" unfixed for 5+ consecutive runs. Drift escalated at Day 12. Compliance confirms the Phase 0 jargon hard-block is reported as applied but still fails (Jun 09: "FP4 quantization" unglossed).

Flagged by

Reader, Drift, Compliance

Proposed fix

Phase 3 mechanical scan: list all technical terms per item → vibe-coder test → inline parenthetical gloss (e.g., "Mythos-class (Anthropic's research tier)"). Hard-block publish if any item has 3+ unglossed terms after attempt. Mandatory gloss targets: model tier names, inference terms, framework abbreviations, security jargon. Expand approved glossary significantly.
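
The scan-and-block step above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the term list, the function names, and the rule "a term counts as glossed if at least one occurrence is directly followed by a parenthesis" are all assumptions made for the example.

```python
import re

# Hypothetical term list; the real one would come from the expanded glossary.
JARGON_TERMS = ["Mythos-class", "FP4 quantization", "agentic benchmarks", "CoT", "MoE"]
MAX_UNGLOSSED = 2  # 3 or more unglossed terms in one item blocks publish


def unglossed_terms(item_text, terms=JARGON_TERMS):
    """Return the terms that appear in item_text with no parenthetical gloss."""
    missing = []
    for term in terms:
        # Whole-term match so "CoT" does not fire inside a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
        matches = list(pattern.finditer(item_text))
        if not matches:
            continue
        # Assumption: glossed means some occurrence is followed by "(".
        glossed = any(
            item_text[m.end():].lstrip().startswith("(") for m in matches
        )
        if not glossed:
            missing.append(term)
    return missing


def blocks_publish(item_text):
    """True when the item still has 3+ unglossed terms."""
    return len(unglossed_terms(item_text)) > MAX_UNGLOSSED
```

A literal term list only catches known jargon; the "vibe-coder test" for new terms would still need a separate judgment step.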

#2 — Xpoz/Twitter Source Broken ▶

✓ RESOLVED 📅 Disabled in v2.16 (Jun 10) 🟢 Xpoz permanently removed from pipeline

The problem: Xpoz returns 0 likes/retweets/replies for ALL tracked accounts. Wasting ~45s Phase 1 runtime. Last 3 runs = full timeout. @AnthropicAI and @ClaudeDevs are invisible. This is a PRIMARY source — needs your approval to remove per Source Management Rules.

Evidence: Coverage: "Xpoz CHRONIC 11th+ run — timed out with no output." Drift escalated at Day 6. Compliance: no impact on score (section not counted when source broken). Reader: no Twitter content to grade.

Flagged by

Coverage, Drift

Proposed fix

Remove Xpoz from Phase 1 source checklist. Add spec note: "Twitter/X: disabled (engagement API broken since May 2026). Re-enable condition: non-zero likes for ≥1 account in test run." Interim: monitor @AnthropicAI/@ClaudeDevs via HN keyword backup + Anthropic news page.

#3 — editorial-memory.md Pipeline Gap ▶

P0 📅 4/18 runs (Jun 04/06/09/10) 🔴 Accelerating — 2 consecutive latest

The problem: Composer doesn't write editorial-memory.md at end of run. This file contains source balance, active stories, section health, lead rotation, and INSTRUCTIONs. When missing, the next run flies blind — no institutional memory between editions. Phase 0 binding constraint to fix this has been IGNORED twice (Jun 09, Jun 10).

Evidence: Compliance: "ENOENT — 4th occurrence. Phase 0 rec #3 NOT FOLLOWED." Drift: "editorial-memory ENOENT at 2 consecutive (approaching threshold)." Pattern is intermittent but accelerating (2 consecutive most recent).

Flagged by

Compliance, Drift

Proposed fix

Spec end-of-run section: (1) Write editorial-memory.md with updated state. (2) Verify file exists and mtime is within current run. (3) Hard-block: do NOT write .DONE sentinel or exit until editorial-memory.md is confirmed. This makes it mechanically impossible to skip.
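
Steps (2) and (3) might look something like the sketch below. The function names, the sentinel contents, and the idea of passing the run start time as an epoch timestamp are assumptions for illustration only.

```python
import os
import time


def memory_file_confirmed(path, run_start_epoch):
    """True only if the file exists and was modified during the current run."""
    try:
        stat = os.stat(path)
    except FileNotFoundError:  # the ENOENT case from the compliance reports
        return False
    return stat.st_mtime >= run_start_epoch


def finish_run(memory_path, sentinel_path, run_start_epoch):
    """Write the .DONE sentinel only after editorial-memory.md is confirmed."""
    if not memory_file_confirmed(memory_path, run_start_epoch):
        raise RuntimeError("editorial-memory.md missing or stale; refusing to finish")
    with open(sentinel_path, "w") as sentinel:
        sentinel.write("done\n")
```

Because the sentinel is written by the same function that performs the check, there is no code path that produces .DONE without a fresh editorial-memory.md.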

#4 — HN Engagement Number Drift ▶

P0 📅 3 consecutive (Jun 08/09/10), 9/20 total 🔴 Worsening — 22.3% comment drift latest

The problem: HN points and comment counts drift 10-22% between Phase 1 fetch and publication. The Reddit re-fetch mechanism (Phase 3 Algolia check) holds Reddit numbers to <1% drift using the same infrastructure against a different target. No equivalent exists for HN.

Evidence: Verifier Jun 10: "22.3% comment miss — worst in 20-day window." Drift: "HN drift 10-22% and worsening. 3 consecutive days." Same Algolia API infrastructure that fixed Reddit to <1%.

Flagged by

Verifier, Drift, Compliance

Proposed fix

Phase 3: for ALL items citing HN points/comments, re-fetch via HN Algolia API. Update numbers. Tolerance: 10% from live value at Phase 3 time. Also expand existing Reddit re-fetch scope to ALL sections (not just TODAY'S ITEMS).
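
The tolerance arithmetic could be sketched as below. The live value is assumed to come from the Phase 3 re-fetch and is simply passed in, so no API call is shown. The function names are hypothetical, and the reading "always publish the live number, use the 10% tolerance as a pass/fail flag" is one interpretation of the proposal.

```python
def drift_pct(cited, live):
    """Drift of the cited number relative to the live value, in percent."""
    if live == 0:
        return 0.0 if cited == 0 else float("inf")
    return abs(cited - live) * 100 / live


def refresh_number(cited, live, tolerance_pct=10.0):
    """Return (number_to_publish, within_tolerance).

    Assumption: the live value always replaces the cited one, and the flag
    records whether the Phase 1 number had drifted past the tolerance.
    """
    return live, drift_pct(cited, live) <= tolerance_pct
```

For example, a cited count of 780 against a live count of 1000 is 22% drift, so the item would publish 1000 and be logged as out of tolerance.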

#5 — Source-Claim Mismatch in WIM ▶

P0 📅 4 consecutive (Jun 06/08/09/10) 🔴 Claims correct but not in cited source

The problem: "Why it matters" sentences contain factual claims that are correct but don't appear in the cited source URL. This is a citation discipline problem, not an accuracy problem — but it means a reader clicking the source link won't find the claim. 3 uncited claims in Item 1 WIM on Jun 10 alone.

Evidence: Verifier Jun 10: "3 uncited claims in Item 1 WIM." Drift: "4 consecutive days." Reader: "citation trust gap."

Flagged by

Verifier, Drift, Reader, Compliance

Proposed fix

Phase 3 (or route to verify cron): for each WIM line, verify key factual claims appear in cited source URL. If a claim needs a different source: add second citation or remove claim. Scope: primary URL per item, 5-min time-box. Log if truncated. RISKY — multiple HTTP round-trips.

#6 — Backticks in Prose ▶

P0 📅 9/18 runs (50%) 🔴 Chronic — relapses after clean streaks

The problem: Backtick-wrapped terms appear in running prose instead of plain English. YOUR STACK is the worst offender (e.g., --safe-mode, /cd, env vars). The WIM em-dash grep gate mechanism works 100% when applied — same pattern needed here.

Evidence: Compliance: "9/18 runs CHRONIC, relapsed" (Jun 10). Reader: "7 slash commands in NEW TOOL." Pattern: resolves for 2-3 runs then drifts back.

Flagged by

Compliance, Reader

Proposed fix

Phase 3 grep gate: after draft assembly, scan for backtick usage outside install snippet lines and fenced code blocks. Backticks ONLY permitted in: (a) install snippet lines, (b) fenced code blocks. Replace with plain text. Same mechanism as WIM em-dash — proven 100% effective.
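
A rough sketch of such a gate follows. How the pipeline marks install snippet lines is not stated here, so the `install_prefix` parameter and its default are placeholders, and the function name is hypothetical.

```python
FENCE = "`" * 3  # fenced code block delimiter


def backtick_violations(draft, install_prefix="Install:"):
    """Return (line_number, line) for backticks outside the permitted places.

    Permitted: lines inside fenced code blocks, and install snippet lines
    (assumed here to start with install_prefix).
    """
    violations = []
    in_fence = False
    for number, line in enumerate(draft.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            in_fence = not in_fence  # opening or closing fence line
            continue
        if in_fence or stripped.startswith(install_prefix):
            continue
        if "`" in line:
            violations.append((number, line))
    return violations
```

An empty result means the gate passes; any entry would trigger the plain-text replacement described above.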

#7 — WIM Em-Dash Violations ▶

P0 📅 7/18 runs (39%) 🟡 Currently 4-run clean — but relapse-prone

The problem: Em-dashes joining two independent clauses in "Why it matters" sentences. Has relapsed twice after 3+ run clean streaks. Without a mechanical grep gate, it will come back. When the gate IS applied it works — Jun 08 onward clean.

Evidence: Compliance: "7/16 CHRONIC." Pattern: clean May 25-28 (4 runs) → relapsed May 29 → clean Jun 01-03 (3 runs) → relapsed Jun 04-05 → clean Jun 08-10 (4 runs, current). Drift: "HIGH confidence resolved" but pattern says otherwise.

Flagged by

Compliance, Reader, Drift

Proposed fix

Phase 3 grep gate: check all WIM lines for the pattern "— [A-Z]" (em-dash + space + capital letter = likely independent clause). Hard-block publish if found. Blanket prohibition narrowed to independent-clause em-dashes only; this preserves valid parenthetical use.
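
As a sketch, the gate reduces to a single regular expression. The function name is hypothetical, and the em-dash is written as the escape `\u2014` to keep the pattern unambiguous.

```python
import re

# Em-dash (U+2014), one space, then a capital letter.
EM_DASH_CLAUSE = re.compile("\u2014 [A-Z]")


def wim_em_dash_hits(wim_lines):
    """Return the WIM lines that match the independent-clause pattern."""
    return [line for line in wim_lines if EM_DASH_CLAUSE.search(line)]
```

The heuristic is deliberately simple: an em-dash followed by a proper noun would also match, so a hit is best treated as "rewrite this line" rather than proof of an independent clause.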

#8 — Install Snippet Missing (GitHub Trending / New Tool) ▶

P1 📅 8+/18 runs intermittent

The problem: Installable CLI tools listed without install commands. Reader has flagged since Day 1. Not consecutive but never fully resolved — keeps recurring across different repos.

Evidence: Reader/Compliance: flagged May 22, 23, 25, 29, Jun 03, 04, 06, 09. Drift: "8+/15 systemic."

Flagged by

Reader, Compliance, Drift

Proposed fix

Phase 3: for each GitHub repo in TRENDING or NEW TOOL, web_fetch README → extract install command. If installable: include snippet. If GUI-only: "Desktop app — see releases." Hard-block if installable repo lacks snippet.

#9 — Anthropic Engineering Featured Post Fetcher Broken ▶

P1 📅 3+ misses (May 28, Jun 04, Jun 06)

The problem: The fetcher targets date-based posts only. Featured/pinned posts (like "How we contain Claude across products", 226pts HN) are invisible. This is a PRIMARY source with a persistent blind spot.

Evidence: Coverage: "recurring fetcher bug 3rd+ time." Drift: "silent PRIMARY-source coverage gap." 226pts HN coverage missed.

Flagged by

Coverage, Drift

Proposed fix

Phase 1: after fetching dated posts from Anthropic Engineering blog, also web_fetch the page top and scan for any featured/pinned post not in the dated-post results.

#10 — Cross-Section Duplication ▶

P1 📅 5/18 runs + Prompt↔Items 4/18

The problem: Same subject appears in multiple sections (e.g., Fable 5 in TODAY'S ITEMS AND YOUR STACK). Also: Prompt of Day and Reddit Signal/TODAY'S ITEMS sourced from same thread (4 occurrences). Two sub-patterns.

Evidence: Compliance: "cross-section dedup 5/18 intermittent." "Prompt↔Today's Items same-source NOW 4/15 CHRONIC." Worst on Saturdays (weekly wrap).

Flagged by

Compliance, Reader

Proposed fix

Phase 3 dedup check: (a) no subject may appear in >1 section unless explicitly complementary (different angle). (b) Prompt source URL must not match any Reddit Signal or TODAY'S ITEMS URL.
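
Checks (a) and (b) could be sketched like this. The data shape (a dict of section name to items with `subject` and `url` keys), the section names, and the function names are assumptions for illustration. Subject matching here is a plain case-insensitive comparison, which the real check may need to loosen.

```python
from collections import defaultdict


def duplicated_subjects(sections, complementary=frozenset()):
    """Check (a): subjects appearing in more than one section.

    complementary holds lowercased subjects explicitly allowed to repeat.
    """
    seen = defaultdict(set)
    for name, items in sections.items():
        for item in items:
            seen[item["subject"].strip().lower()].add(name)
    return {
        subject: sorted(names)
        for subject, names in seen.items()
        if len(names) > 1 and subject not in complementary
    }


def prompt_source_conflicts(prompt_url, sections,
                            guarded=("REDDIT SIGNAL", "TODAY'S ITEMS")):
    """Check (b): guarded sections that cite the Prompt's source URL."""
    def norm(url):
        return url.rstrip("/")

    return [
        name
        for name in guarded
        for item in sections.get(name, [])
        if norm(item["url"]) == norm(prompt_url)
    ]
```

Both functions return empty results when the edition is clean, so the gate can block on any non-empty output.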

#11 — NEW TOOL Freshness Anchor Missing ▶

P1 📅 3 occurrences in 6 days

The problem: Old repos featured as NEW TOOL without a recent event. Supermemory (Jun 04), fff (Jun 08), agent-skills (Jun 10) — all interesting but none had a <48h triggering event. Just "interesting old repo."

Evidence: Drift: "3 occurrences in 6 days. Pattern recurs without mechanical rule." Verifier: freshness check passes because repo EXISTED <48h ago on Trending — but that's not a freshness event.

Flagged by

Verifier, Drift

Proposed fix

NEW TOOL items require a <48h freshness event: new release, trending spike, major feature announcement, or first appearance on GitHub Trending daily. Star count and repo age alone insufficient. If interesting but no recent event → editorial-memory watch list.

#12 — Platform Release Notes Not Checked ▶

P1 📅 Missed Opus 4.1 deprecation (Jun 06)

The problem: docs.anthropic.com/en/release-notes is not in the Phase 1 source list. Deprecation notices, model lifecycle changes, API updates = YOUR STACK or Landscape Notes candidates. Opus 4.1 deprecation (August 5 retirement) was missed entirely.

Evidence: Coverage Jun 06: "Opus 4.1 deprecation notice — platform release notes source skipped." Final-plan Jun 06: FEASIBLE binding recommendation.

Flagged by

Coverage

Proposed fix

Phase 1 mandatory source: check docs.anthropic.com/en/release-notes/overview every run. Deprecation notices, model lifecycle, API changes → YOUR STACK or Landscape Notes.

#13 — Saturday Template Incompleteness ▶

P1 📅 2/3 Saturdays affected

The problem: Saturday-only sections missing required fields. TRY THIS WEEKEND without time estimate or learning outcome. TOOL OF THE WEEK without setup time. TRENDING THIS WEEK without weekly delta. Saturday-specific failure pattern.

Evidence: Compliance Jun 06: "4 Saturday-specific failures." Drift: "Saturday-specific failures emerging as new pattern."

Flagged by

Compliance, Drift

Proposed fix

Saturday template hard-checks in spec: TRY THIS WEEKEND must include what to do + time estimate + what you'll learn (all 3). TOOL OF THE WEEK: what it does + install + setup time. TRENDING THIS WEEK: star count + weekly delta. Hard-block if any missing.

#14 — Phase 0 False Compliance ▶

P1 📅 Jun 05 (+ Jun 09/10 editorial-memory) 🔴 Structural trust issue

The problem: Composer claims a binding Phase 0 constraint was applied and passed, when it actually wasn't. Jun 05: WIM em-dash hard-block claimed PASS, compliance found 2 violations. Jun 09/10: editorial-memory hardening claimed applied, file still ENOENT. This undermines trust in the entire Phase 0 gate.

Evidence: Drift Jun 05: "Phase 0 false compliance — composer self-assessment claims pass when it fails. Systemic risk to Phase 0 reliability." First time a binding constraint was explicitly claimed as passing while failing.

Flagged by

Drift, Compliance

Proposed fix

Phase 3.5 (post-Phase 3, pre-publish): evidence-citation verification. For each FEASIBLE rec claimed as applied in Phase 0, check the actual output for compliance. Not a trust-based self-assessment — a mechanical grep/check. RISKY but addresses root cause.

#15 — GitHub Star Plausibility Check ▶

P1 📅 3 occurrences (ECC 197K/208K★, hermes-agent 185K★)

The problem: Some repos have implausibly high star counts that aren't caught. ECC reported as 197K★ (May 29) then 208K★ (Jun 06) — would be top-5 most-starred on all of GitHub. hermes-agent 185K★. agent-skills 49,687★. These numbers warrant verification against the GitHub API before publishing.

Evidence: Reader Jun 06: "ECC 208K★ probable fabrication AGAIN." Drift: "Under investigation." Coverage: "hermes-agent 185K★ plausibility concern."

Flagged by

Reader, Drift, Coverage

Proposed fix

Phase 3: for any repo with >50K★, verify via GitHub API (api.github.com/repos/{owner}/{repo}). If API value differs >10% from cited value, use API value. Plausibility ceiling: repos >200K★ are almost certainly wrong (only ~10 repos globally exceed this).
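
The decision logic might be sketched as follows. The API value is assumed to be the `stargazers_count` field returned by the endpoint named above and is passed in, so no network call is shown. The function name, the flag strings, and measuring the 10% difference relative to the cited value are assumptions.

```python
def check_star_count(cited, api_value, verify_above=50_000,
                     max_diff_pct=10.0, ceiling=200_000):
    """Return (stars_to_publish, flags) for one repo."""
    flags = []
    published = cited
    if cited > verify_above:
        diff_pct = abs(api_value - cited) * 100 / cited
        if diff_pct > max_diff_pct:
            published = api_value  # API value wins when they disagree
            flags.append("replaced_with_api_value")
    if published > ceiling:
        flags.append("above_plausibility_ceiling")
    return published, flags
```

A count that survives verification but still sits above the ceiling is only flagged, not blocked, since a handful of real repos do exceed it.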

#16 — Actionability Floor ▶

P1 📅 3/5 recent runs had 0-1 actionable leads

The problem: TODAY'S ITEMS sometimes has 0/3 directly actionable items (worst: Jun 05). Strategic/context pieces crowd out things readers can actually DO today. Jun 04: VSCode zero-day was the actionable item but wasn't lead.

Evidence: Reader Jun 05: "0/3 directly actionable — worst actionability ratio in run history." Final-plan Jun 05: FEASIBLE "actionability floor ≥2 of 3."

Flagged by

Reader, Drift

Proposed fix

Actionability floor: ≥2 of 3 TODAY'S ITEMS must be directly actionable (try/install/configure/update/check). If <2, swap weakest for highest-engagement actionable from Reddit/GitHub/New Tool pool.
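
One possible shape for the swap rule is sketched below. The item fields (`actionable`, `engagement`), the function name, and the reading of "weakest" as "lowest-engagement non-actionable item" are assumptions, since the proposal does not define them.

```python
def enforce_actionability_floor(items, pool, floor=2):
    """Swap non-actionable items for pool candidates until the floor is met.

    Returns a new list; items and pool are lists of dicts with
    "title", "actionable" (bool) and "engagement" (int) keys.
    """
    items = list(items)
    candidates = sorted(
        (p for p in pool if p["actionable"]),
        key=lambda p: p["engagement"],
        reverse=True,
    )
    while sum(i["actionable"] for i in items) < floor and candidates:
        non_actionable = [i for i in items if not i["actionable"]]
        if not non_actionable:
            break
        weakest = min(non_actionable, key=lambda i: i["engagement"])
        items[items.index(weakest)] = candidates.pop(0)
    return items
```

If the pool runs out of actionable candidates, the list is returned below the floor, which the caller would need to treat as a failed gate.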

#17 — Reddit Signal >1 Sentence ▶

P2 📅 2/18 runs (Jun 06, 09)

Reddit Signal items should be exactly 1 sentence. Two violations so far — not yet at 3-day threshold but emerging.

Proposed fix

Phase 3: count sentences in each Reddit Signal item. Hard-block if any has >1 sentence.

#18 — Item Description >2 Sentences ▶

P2 📅 2/18 runs (May 29, Jun 05)

TODAY'S ITEMS descriptions occasionally hit 3 sentences when spec limits to ≤2.

Proposed fix

Phase 3 sentence-count check on item descriptions. Hard-block >2 sentences.
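
Both sentence-count gates (#17 and #18) could share one counter, sketched below. The splitting rule is a naive assumption: a sentence ends at ".", "!" or "?" followed by whitespace and a capital letter. Abbreviations such as "e.g." before a capitalised word would be miscounted, so this is a starting point rather than a reliable parser. All names are hypothetical.

```python
import re

# Split after ., ! or ? when followed by whitespace and a capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def count_sentences(text):
    """Rough sentence count; version strings like v2.1.166 are not split."""
    return len([part for part in _BOUNDARY.split(text.strip()) if part])


def reddit_signal_ok(text):
    """Reddit Signal items must be exactly 1 sentence."""
    return count_sentences(text) == 1


def item_description_ok(text):
    """TODAY'S ITEMS descriptions are limited to 2 sentences."""
    return count_sentences(text) <= 2
```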

#19 — Comment Summary Fabrication ▶

P2 📅 Sporadic — returned Jun 08 after long clean

Reddit/HN comment summaries sometimes include topics not in any actual top comment. Jun 08: "CLAUDE.md files" and "task decomposition" not in any top comment for the cited thread.

Proposed fix

Phase 3 comment verification: for each Reddit Signal/TODAY'S ITEMS citing comment themes, verify at least 1 top comment actually discusses the claimed topic. Cut unverifiable comment claims.

#20 — Reddit Signal Engagement Floor ▶

P2 📅 Recurring editorial pattern

Lower-engagement Reddit items sometimes selected over much higher-engagement ones. Coverage flagged: 954pt Polymarket thread skipped for 77pt Graphify item (Jun 04).

Proposed fix

Reddit Signal engagement floor: selected item must have ≥25% of highest-engagement candidate's score. Exception for reaction/meme/complaint posts (not informational).
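
The floor is simple arithmetic, sketched here with hypothetical names. The `exempt` flag stands in for the reaction/meme/complaint exception, which would need its own classification step.

```python
def passes_engagement_floor(selected_score, candidate_scores,
                            floor_ratio=0.25, exempt=False):
    """True if the selected item reaches 25% of the top candidate's score."""
    if exempt:
        return True
    top = max(candidate_scores, default=0)
    return selected_score >= floor_ratio * top
```

With the Jun 04 numbers, the floor would have been 0.25 × 954 = 238.5 points, so the 77pt item would not have qualified.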

#21 — Prompt↔Items Same-Source Dedup Conflict ▶

Grant Decision 📅 4/15 runs (May 26, 30, Jun 05, 08)

Prompt of Day and TODAY'S ITEMS/Reddit Signal share the same source thread. Structural conflict between cross-section dedup rule and Prompt source citation rule — composer can't resolve alone.

Options

Option A (recommended): Exempt Prompt of Day source citations from cross-section dedup. Prompt is derivative work, not duplicate content.
Option B: Require Prompt sources to be completely different from any featured item.

#23 — TRENDING THIS WEEK 7-Day Dedup Conflict ▶

Grant Decision 📅 2/3 Saturdays affected

Saturday TRENDING THIS WEEK section inherently features repos from the past 7 days — but the 7-day dedup rule blocks repos already mentioned that week. Structural conflict.

Options

Option A: Exempt TRENDING THIS WEEK from 7-day dedup. Items can reappear if they qualify by weekly star growth.
Option B: TRENDING must only feature repos NOT previously highlighted that week.

#24 — iMessage/Sukai Delivery Broken ▶

Grant Decision 📅 8+ consecutive failures

Isolated cron can't call iMessage (cross-context denied). 8+ consecutive delivery failures to Sukai. Infrastructure issue, not spec-fixable.

Options

Option A: Add Sukai to the main-session delivery cron (enables iMessage from correct context).
Option B: Migrate Sukai to MS Teams delivery only (already working for Tim/Kirra).

#25 — Xpoz Disable (Source Management Rules) ▶

Grant Decision 🔴 Same as #2; needs your explicit approval

This is the approval gate for Issue #2. Twitter/Xpoz is a PRIMARY source. Source Management Rules say: "Never remove a PRIMARY source without asking Grant first." 11+ consecutive days of zero data. Do you approve removing it?

#26 — Source-Sentiment / Framing Drift ▶

P1 📅 10+ consecutive edition days

Composer consistently reshapes source intent: softens security risks, embellishes claims, mischaracterises sarcasm as literal. Worst case: taste-skill (Jun 06) — entire functional description fabricated (frontend design framework described as text quality tool).

Proposed fix

Phase 3 SOURCE TONE MATCH: (a) security risks must not be softened, (b) scope qualifiers preserved, (c) sarcasm/irony not reported as factual claims. Plus TOOL/NEW TOOL README verification (see #27).

#27 — TOOL OF THE WEEK README Verification ▶

P1 🔴 taste-skill: entire description fabricated

taste-skill was described as a text quality tool — it's actually a frontend design framework. Entire purpose fabricated. Install command wrong. This was the worst single-item fabrication in todai history.

Proposed fix

Phase 3: MUST web_fetch full README.md for any TOOL OF THE WEEK / NEW TOOL. Verify README's stated purpose matches todai's description. README overrides tagline. No publish without verification.

#28 — Security Vuln → ACTION NEEDED Promotion ▶

P1 📅 VSCode zero-day buried in WIM

Jun 04: VSCode zero-day stealing GitHub tokens was buried in "Why it matters" instead of leading as ACTION NEEDED. Security vulns affecting reader's daily tools need automatic promotion.

Proposed fix

Any security vulnerability that directly affects reader's daily tools (VSCode, GitHub, Claude Code, etc.) AND has a concrete protective action → auto-surface as 🔴 ACTION NEEDED section at top of edition.

#29 — YOUR STACK Changelog Triage ▶

P1 📅 Jun 08: 7 changes crammed into 1 bullet

CC v2.1.166 had 7 changes dumped into one YOUR STACK bullet. Reads like a raw changelog, not curated for daily workflow impact.

Proposed fix

YOUR STACK items: max 2 changes per tool, selected for daily-workflow impact. Remaining noted as "Also: N other fixes" with changelog link.
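
The triage rule could be sketched as below. The `impact` score is a hypothetical field standing in for "daily-workflow impact", which the proposal leaves to editorial judgment; the function name and output punctuation are also assumptions.

```python
def triage_changes(changes, changelog_url, max_shown=2):
    """Build one YOUR STACK bullet: top changes plus an 'Also: N' tail.

    changes is a list of dicts with "text" and a hypothetical numeric
    "impact" key (higher means more relevant to daily workflow).
    """
    ranked = sorted(changes, key=lambda c: c["impact"], reverse=True)
    shown = ranked[:max_shown]
    hidden = len(ranked) - len(shown)
    bullet = "; ".join(c["text"] for c in shown)
    if hidden > 0:
        bullet += f". Also: {hidden} other fixes ({changelog_url})"
    return bullet
```

Applied to the Jun 08 case, 7 changes would collapse to 2 named changes plus "Also: 5 other fixes" with the link.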

#30 — Landscape Notes Hard Cap at 2 ▶

P2 📅 4 consecutive runs at 3 items

Spec says "aim for 2-3" but 3rd item is often audience-irrelevant padding. Proposal: hard cap at 2, no exceptions.

#21 — Number/Stat Fabrication ▶

Resolved ✅ 20+ day clean streak

Verify-cron catching all generation-time fabrications. 20+ day clean. No spec change needed — existing pipeline handles it.

#22 — Word Count Over 750 ▶

Resolved ✅ 5+ consecutive clean

Was chronic (6/9 runs). Currently 5+ clean. Enforcement is organic, not mechanical — regression risk exists. Consider Phase 3 hard-block as insurance.

#23 — Source Diversity Monoculture ▶

Resolved ✅ 7+ consecutive 33/33/33

Was chronic (Anthropic 44-75% on worst days). Now 7+ consecutive healthy. Cap system working. No change needed.

#24 — Reddit Signal Drought ▶

Resolved ✅ 8+ consecutive healthy

Was the longest-running issue (22+ editions at 0-1 items). Fixed by JSON fetch protocol + cookie-warm. 8+ consecutive at 2 items. Stable.

#25 — Star Count Fabrication / Accuracy ▶

Resolved ✅ <0.5% drift

Controlled by Phase 3 re-fetch. <0.5% drift. Separate from plausibility check (#15 above).
