The Honest Starting Point
Most "AI case studies" lead with user counts and retention curves. This one leads with what we built before opening the doors. MyFantasy.ai is live in beta with one show — RuPaul’s Drag Race All Stars — and the discipline you see below is what we proved on that show before promising a second.
Fantasy sports for reality TV means scoring events extracted from episodes — lip syncs, eliminations, alliances, untucked drama — with point values driven by a commissioner-editable ruleset. The wrong way to build it is to let an LLM write directly to the scoring table. The right way is to treat the model as a research assistant and treat the database as a court reporter.
Context Engineering is what made that boundary easy to enforce: the trust model is documented before the code is written, so every Lambda knows what it’s allowed to do.
Why It Shipped in 22 Days
The first commit is dated April 17, 2026. The beta launched against All Stars in early May. Between those two dates: 178 commits, 60 Lambda functions, 34 UI pages, and a working AI scoring pipeline. The speed is real, and it is repeatable.
The Trick Isn’t Speed — It’s the Pattern Library
A separate repository (outcomeops-adrs) holds the Architectural Decision Records covering Terraform, Lambda, testing, secrets, CI/CD, frontend, and analytics. The OutcomeOps MCP server indexes them as a RAG and serves them on demand. When Claude Code writes a Lambda, it queries the MCP for handler patterns. When it writes Terraform, it queries for module versions. The full library never enters the context window — only what the current task touches.
Every Lambda function in this app uses the community module that ADR-001 mandates, fetched from the MCP on demand:
```hcl
module "auth_me_lambda" {
  source  = "terraform-aws-modules/lambda/aws"
  version = "8.1.2"
  # ...
}
```

Sixty of these. No hand-rolled IAM. No bespoke logging glue. The module is battle-tested, the version is pinned by ADR, and the AI never had to decide.
What Claude Code Didn’t Have to Invent
- Terraform module versions (ADR-001)
- Lambda handler structure & auth pattern (ADR-004)
- Test layout and pytest fixtures (ADR-003, ADR-012)
- Secrets handling (ADR-006)
- CI/CD workflow shape (ADR-007)
- React + Tailwind v4 conventions (ADR-005, ADR-010)
- AWS resource tagging (ADR-011)
- Decimal-over-float for money/math (ADR-009)
What Claude Code Did Decide
ADRs cover the patterns — not every choice. One example: every Lambda in this app runs on arm64 / Graviton (roughly 20% cheaper than x86 at identical performance). That call isn’t in any ADR yet. Claude Code made it independently on the first Lambda and stayed consistent across the other 59.
Good outcome — and a candidate for the next ADR.
The Scoring Pipeline
Four stages. Only one of them writes to the scoring table, and a human gates that write.
Extract
Bedrock Haiku 4.5 reads episode transcripts and produces candidate scoring events. Cost is metered per call; CloudWatch tracks tokens in/out per episode.
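The contract between the extraction call and the rest of the pipeline can be sketched as a small validator that accepts only well-formed candidates from the model's output. The field names (`event_type`, `quote`) are illustrative assumptions, not the app's actual schema:

```python
import json

# Illustrative required fields for a candidate scoring event; the real
# schema is internal to the app and may differ.
REQUIRED_FIELDS = {"event_type", "quote"}

def parse_candidates(model_output: str) -> list[dict]:
    """Parse the model's JSON array and keep only well-formed candidates.

    Malformed entries are dropped rather than repaired -- the human review
    stage, not the parser, is where judgment calls happen.
    """
    try:
        raw = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    candidates = []
    for item in raw:
        if isinstance(item, dict) and REQUIRED_FIELDS <= item.keys():
            candidates.append(dict(item))
    return candidates
```

Dropping (rather than retrying) malformed output keeps the extraction stage cheap; anything the model garbles simply never reaches the review queue.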
Dedupe
Each candidate is fingerprinted as `SHA256(event_type + first_quote)[:8]`. Duplicates from re-runs collapse to the same row — the model can hallucinate the same event twice and the index will not.
Review
An admin opens /admin/scoring/review/ and approves, edits, or rejects each candidate. Nothing is auto-scored. The model is a draftsman; the human is the editor.
Finalize
Approved events become SCORE# rows in the single-table DynamoDB design. Standings recompute atomically. A weekly Sonnet analyst run summarizes performance per league.
Real Decisions That Made It Work
Every one of these decisions is documented before it’s coded. That is what Context Engineering means in practice — the AI doesn’t pick a pattern, it executes one that’s already approved.
Deterministic Event Hashing
A model that’s asked to extract events from the same transcript twice should produce the same row, not two. The 8-character SHA256 prefix of event_type + first_quote turns idempotency into a constraint instead of a hope.
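The fingerprint described above is a one-liner; whether the app concatenates with a separator is an assumption, so plain concatenation is shown here:

```python
import hashlib

def event_fingerprint(event_type: str, first_quote: str) -> str:
    """8-character SHA256 prefix: the same extracted event always maps
    to the same row, so re-running extraction is idempotent by construction."""
    digest = hashlib.sha256((event_type + first_quote).encode("utf-8")).hexdigest()
    return digest[:8]
```

Because the key is derived from the event's content, no coordination or lookup is needed to detect a duplicate: a conditional write on the fingerprint does the work.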
Frequency-Bucketed Rulesets
Some events score once per season (winning the show). Some score once per partner (forming an alliance with someone new). Some score every time (an argument). The ruleset encodes the frequency semantics — once_per_season, once_per_partner, per_event, per_episode — and the extraction handler pre-computes the dedup key so the rule applies before the scoring math.
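The frequency buckets can be sketched as a function that maps a rule's bucket to the tuple of fields the dedup key is built from. The bucket names come from the article; the event fields (`partner`, `episode`, `fingerprint`) are illustrative assumptions:

```python
def dedup_key(frequency: str, event: dict) -> tuple:
    """Components of the dedup key for one candidate event.

    The wider the tuple, the more often the event can score:
    once per season < once per partner < once per episode < every occurrence.
    """
    etype = event["event_type"]
    if frequency == "once_per_season":
        return (etype,)
    if frequency == "once_per_partner":
        return (etype, event["partner"])
    if frequency == "per_episode":
        return (etype, event["episode"])
    if frequency == "per_event":
        # Content fingerprint keeps distinct occurrences distinct while
        # still collapsing re-extraction duplicates.
        return (etype, event["episode"], event["fingerprint"])
    raise ValueError(f"unknown frequency bucket: {frequency}")
```

Pre-computing this key in the extraction handler means the frequency rule is enforced as a uniqueness constraint, before any scoring math runs.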
Commissioner-Editable Rulesets
Every show owner gets a deep-merge of the default ruleset with their own overrides — gated by an allowlist of editable fields and a state machine that restricts what can change mid-season. Once a season is active, only lineup_size and late_lineup_penalty_multiplier stay mutable. The rules are flexible, the boundaries aren’t.
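The deep-merge-plus-allowlist gate described above can be sketched as follows. Only `lineup_size` and `late_lineup_penalty_multiplier` come from the article; the other field names and the allowlists are hypothetical:

```python
def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively layer overrides onto defaults without mutating either."""
    out = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical allowlists: the article names only the two mid-season fields.
EDITABLE_FIELDS = {"lineup_size", "late_lineup_penalty_multiplier", "scoring"}
MIDSEASON_EDITABLE = {"lineup_size", "late_lineup_penalty_multiplier"}

def apply_overrides(defaults: dict, overrides: dict, season_active: bool) -> dict:
    """Gate commissioner overrides through the allowlist, then deep-merge."""
    allowed = MIDSEASON_EDITABLE if season_active else EDITABLE_FIELDS
    gated = {k: v for k, v in overrides.items() if k in allowed}
    return deep_merge(defaults, gated)
```

The gate runs before the merge, so an out-of-bounds override is silently ignored rather than half-applied; the state machine stays the single source of truth for what is mutable.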
Single-Table DynamoDB
One DynamoDB table per environment holds leagues, shows, events, scores, standings, sessions, and chat. Composite keys (PK/SK) and per-entity prefixes do the work that schemas usually do. No Redis. No DAX. No leaderboard cache. DynamoDB is fast enough, and the design stays legible.
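The key discipline can be illustrated with a pair of key builders. `SCORE#` and `SESSION#` prefixes appear in the article; the `LEAGUE#` prefix and the exact sort-key layout are assumptions:

```python
def league_key(league_id: str) -> dict:
    """Item key for a league's metadata row (illustrative prefixes)."""
    return {"PK": f"LEAGUE#{league_id}", "SK": "META"}

def score_key(league_id: str, episode: int, event_id: str) -> dict:
    """Item key for one finalized scoring event.

    Zero-padding the episode keeps SCORE# rows sorted, so a single Query
    with PK = LEAGUE#<id> and begins_with(SK, "SCORE#EP003") returns every
    score for episode 3 -- no secondary index, no cache.
    """
    return {"PK": f"LEAGUE#{league_id}", "SK": f"SCORE#EP{episode:03d}#{event_id}"}
```

Entity prefixes in the sort key are what let one table serve leagues, scores, sessions, and chat without the item types colliding.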
Bedrock Economics, Measured Not Estimated
These numbers come from real All Stars episodes, with CloudWatch token counts logged at every call. Not a pricing-page calculator — the receipts.
At current beta scale (one season of All Stars) the inference bill rounds to a coffee. The cost model holds linearly — adding the next show is a finance question, not a re-architecture question.
Beta Discipline
Beta with real users on a real show is the test — not a soft launch where the rough edges hide. The choices below are the ones the ADRs settled before the first line of code shipped.
- Magic-link auth, no passwords. SES-delivered 15-minute codes exchange for 7-day JWTs. Session revocation is a DynamoDB `SESSION#` row with TTL. Secrets follow ADR-006.
- Terraform workspaces per env, community modules everywhere. ADR-001 pins `terraform-aws-modules/lambda/aws` at 8.1.2; every one of the 60 Lambdas uses it. Dev applies are automated; production applies need a human on the keyboard.
- Tests on every Lambda from day one. ADR-003 sets the pytest layout; ADR-012 autouses AWS mocks so a missing `moto` fixture never leaks to billable AWS calls. The replay harness re-runs extraction on past episodes against the current model to catch ruleset regressions before live shows.
- Lambda arm64 across the board. Roughly 20% cheaper than x86 at the same performance. Not in any ADR yet; Claude Code defaulted to it on the first Lambda and stayed consistent. A pattern worth promoting back into the library.
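The magic-link lifecycle can be sketched with the standard library alone. The 15-minute TTL comes from the article; the storage shape, hashing of the code at rest, and function names are illustrative assumptions (the real flow stores a `SESSION#` row in DynamoDB and delivers the code via SES):

```python
import hashlib
import secrets
import time

CODE_TTL_SECONDS = 15 * 60  # 15-minute magic-link codes, per the article

def issue_code(store: dict, email: str) -> str:
    """Mint a one-time login code; only its hash is kept at rest."""
    code = secrets.token_urlsafe(16)
    store[email] = {
        "code_hash": hashlib.sha256(code.encode()).hexdigest(),
        # Maps naturally onto a DynamoDB TTL attribute.
        "expires_at": int(time.time()) + CODE_TTL_SECONDS,
    }
    return code  # delivered out-of-band (SES) in the real flow

def redeem_code(store: dict, email: str, code: str) -> bool:
    """Exchange a code exactly once; expired or reused codes fail."""
    row = store.pop(email, None)  # single use: consumed on first attempt
    if row is None or time.time() > row["expires_at"]:
        return False
    return hashlib.sha256(code.encode()).hexdigest() == row["code_hash"]
```

Popping the row on the first redemption attempt is the property that matters: a code that was ever presented, valid or not, can never be presented again.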
The Point
A blank repo on April 17. A beta live against All Stars 22 days later. 60 Lambda functions, 178 commits, $0.031 per episode. The speed is unremarkable when you stop asking the AI to invent the patterns.
Context Engineering is a pattern library Claude Code can query. MyFantasy.ai is what that looks like.