The Honest Starting Point
Most "AI case studies" lead with user counts and retention curves. This one leads with what we built before opening the doors. MyFantasy.ai is live in beta with one show — RuPaul’s Drag Race All Stars — and the discipline you see below is what we proved on that show before promising a second.
Fantasy sports for reality TV means scoring events extracted from episodes — lip syncs, eliminations, alliances, untucked drama — with point values driven by a commissioner-editable ruleset. The wrong way to build it is to let an LLM write directly to the scoring table. The right way is to treat the model as a research assistant and treat the database as a court reporter.
Context Engineering is what made that boundary easy to enforce: the trust model is documented before the code is written, so every Lambda knows what it’s allowed to do.
Why It Shipped in 22 Days
The first commit is dated April 17, 2026. The beta launched against All Stars in early May. Between those two dates: 178 commits, 60 Lambda functions, 34 UI pages, and a working AI scoring pipeline. The speed is real, and it is repeatable.
The Trick Isn’t Speed — It’s the Pattern Library
A separate repository (outcomeops-adrs) holds the Architectural Decision Records covering Terraform, Lambda, testing, secrets, CI/CD, frontend, and analytics. The OutcomeOps MCP server indexes them as a RAG and serves them on demand. When Claude Code writes a Lambda, it queries the MCP for handler patterns. When it writes Terraform, it queries for module versions. The full library never enters the context window — only what the current task touches.
Every Lambda function in this app uses the community module that ADR-001 mandates, fetched from the MCP on demand:
```hcl
module "auth_me_lambda" {
  source  = "terraform-aws-modules/lambda/aws"
  version = "8.1.2"
  # ...
}
```

Sixty of these. No hand-rolled IAM. No bespoke logging glue. The module is battle-tested, the version is pinned by ADR, and the AI never had to decide.
What Claude Code Didn’t Have to Invent
- Terraform module versions (ADR-001)
- Lambda handler structure & auth pattern (ADR-004)
- Test layout and pytest fixtures (ADR-003, ADR-012)
- Secrets handling (ADR-006)
- CI/CD workflow shape (ADR-007)
- React + Tailwind v4 conventions (ADR-005, ADR-010)
- AWS resource tagging (ADR-011)
- Decimal-over-float for money/math (ADR-009)
What Claude Code Did Decide
ADRs cover the patterns — not every choice. One example: every Lambda in this app runs on arm64 / Graviton (roughly 20% cheaper than x86 at identical performance). That call isn’t in any ADR yet. Claude Code made it independently on the first Lambda and stayed consistent across the other 59.
Good outcome — and a candidate for the next ADR.
The Scoring Pipeline
Four stages. Only one of them writes to the scoring table, and a human gates that write.
Extract
Bedrock Haiku 4.5 reads episode transcripts and produces candidate scoring events. Cost is metered per call; CloudWatch tracks tokens in/out per episode.
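The contract between the extraction call and the rest of the pipeline can be sketched as a small validator that accepts only well-formed candidates from the model's output. The field names (`event_type`, `quote`) are illustrative assumptions, not the app's actual schema:

```python
import json

# Illustrative required fields for a candidate scoring event; the real
# schema is internal to the app and may differ.
REQUIRED_FIELDS = {"event_type", "quote"}

def parse_candidates(model_output: str) -> list[dict]:
    """Parse the model's JSON array and keep only well-formed candidates.

    Malformed entries are dropped rather than repaired -- the human review
    stage, not the parser, is where judgment calls happen.
    """
    try:
        raw = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    candidates = []
    for item in raw:
        if isinstance(item, dict) and REQUIRED_FIELDS <= item.keys():
            candidates.append(dict(item))
    return candidates
```

Dropping (rather than retrying) malformed output keeps the extraction stage cheap; anything the model garbles simply never reaches the review queue.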
Dedupe
Each candidate is fingerprinted as `SHA256(event_type + first_quote)[:8]`. Duplicates from re-runs collapse to the same row — the model can hallucinate the same event twice and the index will not.
Review
An admin opens /admin/scoring/review/ and approves, edits, or rejects each candidate. Nothing is auto-scored. The model is a draftsman; the human is the editor.
Finalize
Approved events become SCORE# rows in the single-table DynamoDB design. Standings recompute atomically. A weekly Sonnet analyst run summarizes performance per league.
Real Decisions That Made It Work
Every one of these decisions is documented before it’s coded. That is what Context Engineering means in practice — the AI doesn’t pick a pattern, it executes one that’s already approved.
Deterministic Event Hashing
A model that’s asked to extract events from the same transcript twice should produce the same row, not two. The 8-character SHA256 prefix of event_type + first_quote turns idempotency into a constraint instead of a hope.
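The fingerprint described above is a one-liner; whether the app concatenates with a separator is an assumption, so plain concatenation is shown here:

```python
import hashlib

def event_fingerprint(event_type: str, first_quote: str) -> str:
    """8-character SHA256 prefix: the same extracted event always maps
    to the same row, so re-running extraction is idempotent by construction."""
    digest = hashlib.sha256((event_type + first_quote).encode("utf-8")).hexdigest()
    return digest[:8]
```

Because the key is derived from the event's content, no coordination or lookup is needed to detect a duplicate: a conditional write on the fingerprint does the work.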
Frequency-Bucketed Rulesets
Some events score once per season (winning the show). Some score once per partner (forming an alliance with someone new). Some score every time (an argument). The ruleset encodes the frequency semantics — once_per_season, once_per_partner, per_event, per_episode — and the extraction handler pre-computes the dedup key so the rule applies before the scoring math.
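The frequency buckets can be sketched as a function that maps a rule's bucket to the tuple of fields the dedup key is built from. The bucket names come from the article; the event fields (`partner`, `episode`, `fingerprint`) are illustrative assumptions:

```python
def dedup_key(frequency: str, event: dict) -> tuple:
    """Components of the dedup key for one candidate event.

    The wider the tuple, the more often the event can score:
    once per season < once per partner < once per episode < every occurrence.
    """
    etype = event["event_type"]
    if frequency == "once_per_season":
        return (etype,)
    if frequency == "once_per_partner":
        return (etype, event["partner"])
    if frequency == "per_episode":
        return (etype, event["episode"])
    if frequency == "per_event":
        # Content fingerprint keeps distinct occurrences distinct while
        # still collapsing re-extraction duplicates.
        return (etype, event["episode"], event["fingerprint"])
    raise ValueError(f"unknown frequency bucket: {frequency}")
```

Pre-computing this key in the extraction handler means the frequency rule is enforced as a uniqueness constraint, before any scoring math runs.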
Commissioner-Editable Rulesets
Every show owner gets a deep-merge of the default ruleset with their own overrides — gated by an allowlist of editable fields and a state machine that restricts what can change mid-season. Once a season is active, only lineup_size and late_lineup_penalty_multiplier stay mutable. The rules are flexible, the boundaries aren’t.
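The deep-merge-plus-allowlist gate described above can be sketched as follows. Only `lineup_size` and `late_lineup_penalty_multiplier` come from the article; the other field names and the allowlists are hypothetical:

```python
def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively layer overrides onto defaults without mutating either."""
    out = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

# Hypothetical allowlists: the article names only the two mid-season fields.
EDITABLE_FIELDS = {"lineup_size", "late_lineup_penalty_multiplier", "scoring"}
MIDSEASON_EDITABLE = {"lineup_size", "late_lineup_penalty_multiplier"}

def apply_overrides(defaults: dict, overrides: dict, season_active: bool) -> dict:
    """Gate commissioner overrides through the allowlist, then deep-merge."""
    allowed = MIDSEASON_EDITABLE if season_active else EDITABLE_FIELDS
    gated = {k: v for k, v in overrides.items() if k in allowed}
    return deep_merge(defaults, gated)
```

The gate runs before the merge, so an out-of-bounds override is silently ignored rather than half-applied; the state machine stays the single source of truth for what is mutable.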
Single-Table DynamoDB
One DynamoDB table per environment holds leagues, shows, events, scores, standings, sessions, and chat. Composite keys (PK/SK) and per-entity prefixes do the work that schemas usually do. No Redis. No DAX. No leaderboard cache. DynamoDB is fast enough, and the design stays legible.
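The key discipline can be illustrated with a pair of key builders. `SCORE#` and `SESSION#` prefixes appear in the article; the `LEAGUE#` prefix and the exact sort-key layout are assumptions:

```python
def league_key(league_id: str) -> dict:
    """Item key for a league's metadata row (illustrative prefixes)."""
    return {"PK": f"LEAGUE#{league_id}", "SK": "META"}

def score_key(league_id: str, episode: int, event_id: str) -> dict:
    """Item key for one finalized scoring event.

    Zero-padding the episode keeps SCORE# rows sorted, so a single Query
    with PK = LEAGUE#<id> and begins_with(SK, "SCORE#EP003") returns every
    score for episode 3 -- no secondary index, no cache.
    """
    return {"PK": f"LEAGUE#{league_id}", "SK": f"SCORE#EP{episode:03d}#{event_id}"}
```

Entity prefixes in the sort key are what let one table serve leagues, scores, sessions, and chat without the item types colliding.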
Bedrock Economics, Measured Not Estimated
These numbers come from real All Stars episodes, with CloudWatch token counts logged at every call. Not a pricing-page calculator — the receipts.
At current beta scale (one season of All Stars) the inference bill rounds to a coffee. The cost model holds linearly — adding the next show is a finance question, not a re-architecture question.
Beta Discipline
Beta with real users on a real show is the test — not a soft launch where the rough edges hide. The choices below are the ones the ADRs settled before the first line of code shipped.
- Magic-link auth, no passwords. SES-delivered 15-minute codes exchange for 7-day JWTs. Session revocation is a DynamoDB `SESSION#` row with TTL. Secrets follow ADR-006.
- Terraform workspaces per env, community modules everywhere. ADR-001 pins `terraform-aws-modules/lambda/aws` at 8.1.2; every one of the 60 Lambdas uses it. Dev applies are automated; production applies need a human on the keyboard.
- Tests on every Lambda from day one. ADR-003 sets the pytest layout; ADR-012 autouses AWS mocks so a missing `moto` fixture never leaks to billable AWS calls. The replay harness re-runs extraction on past episodes against the current model to catch ruleset regressions before live shows.
- Lambda arm64 across the board. Roughly 20% cheaper than x86 at the same performance. Not in any ADR yet; Claude Code defaulted to it on the first Lambda and stayed consistent. A pattern worth promoting back into the library.
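The magic-link lifecycle can be sketched with the standard library alone. The 15-minute TTL comes from the article; the storage shape, hashing of the code at rest, and function names are illustrative assumptions (the real flow stores a `SESSION#` row in DynamoDB and delivers the code via SES):

```python
import hashlib
import secrets
import time

CODE_TTL_SECONDS = 15 * 60  # 15-minute magic-link codes, per the article

def issue_code(store: dict, email: str) -> str:
    """Mint a one-time login code; only its hash is kept at rest."""
    code = secrets.token_urlsafe(16)
    store[email] = {
        "code_hash": hashlib.sha256(code.encode()).hexdigest(),
        # Maps naturally onto a DynamoDB TTL attribute.
        "expires_at": int(time.time()) + CODE_TTL_SECONDS,
    }
    return code  # delivered out-of-band (SES) in the real flow

def redeem_code(store: dict, email: str, code: str) -> bool:
    """Exchange a code exactly once; expired or reused codes fail."""
    row = store.pop(email, None)  # single use: consumed on first attempt
    if row is None or time.time() > row["expires_at"]:
        return False
    return hashlib.sha256(code.encode()).hexdigest() == row["code_hash"]
```

Popping the row on the first redemption attempt is the property that matters: a code that was ever presented, valid or not, can never be presented again.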
The Point
A blank repo on April 17. A beta live against All Stars 22 days later. 60 Lambda functions, 178 commits, $0.031 per episode. The speed is unremarkable when you stop asking the AI to invent the patterns.
Context Engineering is a pattern library Claude Code can query. MyFantasy.ai is what that looks like.