How I Refactored a 1,348-Line Lambda Using Context Engineering

Brian Carpio
OutcomeOps · Context Engineering · Refactoring · AI

I had a problem. A Lambda function that started as a quick prototype had grown to 1,348 lines. It handled AI character chat, vector memory, moderation, credits, and creator payouts. It had zero tests. It was untouchable.

Most teams would assign this to a senior developer and hope for the best. I used Context Engineering to systematically dismantle it.

One hour later: 60 lines of routing code. 83 passing tests. 100% backward compatibility.

The Starting Point

The chat_bot Lambda was technical debt incarnate:

  • 1,348 lines in a single file
  • 27 functions (5 async, 22 sync)
  • 5 API endpoints
  • 0% test coverage
  • Global state mutations
  • Tight AWS coupling

Every change risked breaking something. New developers avoided it. Production bugs were inevitable.

But this wasn’t just messy code. This Lambda powered the core feature of our platform: AI character conversations with long-term memory. Users pay credits. Creators earn revenue. Vector embeddings store context. Moderation blocks violations. One wrong move and we’d break the entire business.

The Traditional Approach Doesn’t Scale

Here’s what most teams would do:

  • Assign it to a senior developer
  • They spend 2-3 weeks refactoring
  • Hope they understand the conventions
  • Pray nothing breaks
  • Tests? Maybe if there’s time
  • No clear plan, just “make it better”

The problem: it’s ad-hoc. Every developer refactors differently. No guarantee the result matches your standards. And when you’re done, you have cleaner code—but did it follow your patterns? Who knows.

The Context Engineering Approach

I took a different path. Before touching any code, I created a document: chat-bot-tech-debt-clean-up.md.

This wasn’t documentation. It was executable architecture.

Step 1: Define the Outcome

Goal: Reduce handler.py from 1,348 lines to under 200 lines. Create 76+ unit tests. Achieve over 80% coverage. Zero regressions.

Not “refactor the code.” Specific, measurable outcomes.

Step 2: Map the Current State

I documented everything:

  • 27 functions by type (async vs sync)
  • 5 API endpoints with exact behavior
  • External dependencies (OpenAI, Venice AI, svectorDB, DynamoDB, S3)
  • Business logic flows (credit deduction, creator payouts, memory summarization)
  • Global state (banned word cache, refresh intervals)

This wasn’t busy work. Understanding what exists is how you know what to move where.
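
A quick way to automate part of that inventory, as a rough sketch rather than the exact process I used, is to walk handler.py with Python's standard-library ast module:

import ast

# Minimal sketch: parse handler.py and count sync vs async functions to build
# the kind of inventory described above (assumes handler.py is in the cwd).
with open("handler.py") as f:
    tree = ast.parse(f.read())

sync_funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
async_funcs = [n for n in ast.walk(tree) if isinstance(n, ast.AsyncFunctionDef)]

print(f"{len(sync_funcs)} sync functions, {len(async_funcs)} async functions")
for node in sync_funcs + async_funcs:
    print(f"  line {node.lineno}: {node.name}")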

Step 3: Design the Target Architecture

I split the monolith into 6 focused modules:

  • ai/ – Client wrappers and prompt building
  • moderation/ – Text filtering and violation tracking
  • memory/ – Vector storage and retrieval (svectorDB)
  • storage/ – Message persistence and character data
  • business/ – Credits, payouts, chat history
  • routes/ – API endpoint handlers

Each module: clear responsibilities, no circular dependencies, clean boundaries.
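
To make the target concrete, the layout looks roughly like this (the three ai/ files come from Milestone 1 below; the other file-level details are illustrative):

chat_bot/
├── handler.py            # pure routing (see the final handler below)
├── ai/
│   ├── openai_client.py
│   ├── venice_client.py
│   └── prompt_builder.py
├── moderation/           # text filtering and violation tracking
├── memory/               # svectorDB storage and retrieval
├── storage/              # message persistence and character data
├── business/             # credits, payouts, chat history
└── routes/               # one handler per API endpoint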

Step 4: Create Executable Milestones

This is where Context Engineering diverges from traditional refactoring.

I broke the work into 7 milestones, ordered by dependency:

ai → moderation → memory → storage → business → routes → handler

Each milestone specified:

  • Exactly which functions to extract
  • Exactly which lines to move
  • Exactly which tests to create
  • Exactly which files to update
  • Exactly how to validate success
  • Exactly what commit message to use

Here’s Milestone 1:

Milestone 1: AI Module

Create ai/openai_client.py
Create ai/venice_client.py
Create ai/prompt_builder.py

Move get_openai_client() from handler.py
Move build_character_prompt() from handler.py (lines 370-427)
Move generate_ai_reply() from handler.py (lines 429-486)

Update handler.py imports

Create test_chat_bot_ai_clients.py (3 tests)
Create test_chat_bot_ai_prompt_builder.py (5 tests)

Run: pytest lambda/tests/unit/test_chat_bot_ai_*.py -v

Validate: All 8 tests passing

Commit: refactor(chat_bot): extract AI module with client wrappers and prompt builders

Every step: actionable, testable, verifiable.
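
As a sketch of what one extracted file could look like, here is a minimal ai/openai_client.py. The function name get_openai_client() comes from the milestone; the caching behavior and the OPENAI_API_KEY environment variable are assumptions:

import os
from functools import lru_cache

from openai import OpenAI


@lru_cache(maxsize=1)
def get_openai_client() -> OpenAI:
    """Return a single, lazily created OpenAI client reused across invocations."""
    return OpenAI(api_key=os.environ["OPENAI_API_KEY"])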

Step 5: Execute With Claude Code

Here’s where it gets interesting.

I told Claude Code: “Execute Milestone 1 from chat-bot-tech-debt-clean-up.md”

Claude Code:

  • Read the milestone
  • Read ADR-003 (my testing standards)
  • Read ADR-002 (my commit conventions)
  • Created the ai/ directory structure
  • Extracted the specified functions from handler.py
  • Created the new module files
  • Updated the handler.py imports
  • Generated 12 tests following my standards
  • Ran the tests
  • Committed with a conventional message

Result: Milestone complete in minutes. All code matches my organizational standards. All tests pass. All commits follow conventions.
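
To give a flavor of the output, one of the generated tests might look roughly like this; build_character_prompt's exact signature is an assumption based on its name in the milestone:

from ai.prompt_builder import build_character_prompt


def test_prompt_includes_character_persona():
    # The prompt should embed the character's name and persona so the model
    # stays in character.
    prompt = build_character_prompt(
        character={"name": "Ada", "persona": "a witty mathematician"},
        recent_messages=[{"role": "user", "content": "Hello!"}],
    )
    assert "Ada" in prompt
    assert "witty mathematician" in prompt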

The Results

I executed all 7 milestones in about an hour.

Milestone 1: AI Module. Handler: 1,348 → 1,190 lines (-158, -11.7%). Tests: 12 passing.

Milestone 2: Moderation Module. Handler: 1,190 → 979 lines (-211, -17.7%). Tests: 20 passing.

Milestone 3: Memory Module. Handler: 979 → 835 lines (-144, -14.7%). Tests: 21 passing (all async).

Milestone 4: Storage Module. Handler: 835 → 731 lines (-104, -12.4%). Tests: 14 passing.

Milestone 5: Business Module. Handler: 731 → 633 lines (-98, -13.4%). Tests: 16 passing.

Milestone 6: Routes Module. Handler: 633 → 110 lines (-523, -82.6%). Tests: 0 (routes reuse tested modules).

Milestone 7: Handler Cleanup. Handler: 110 → 60 lines (-50, -45.5%). Tests: 0 (pure routing).

Final: 1,348 → 60 lines (-1,288, -95.5%). Total tests: 83 passing.

The Final Handler

Here’s what 60 lines of routing logic looks like:

import asyncio

# The route handlers live in the extracted routes/ module (the exact import
# paths shown here are illustrative). _response is the small JSON-response
# helper kept in, or imported into, handler.py.
from routes import (
    handle_agreed_to_chat_terms,
    handle_generate_reply,
    handle_get_messages,
    handle_rate_message,
    handle_send_message,
)


def lambda_handler(event, context):
    # rawPath covers HTTP API (v2) payloads; path covers REST API (v1) payloads
    path = event.get("rawPath") or event.get("path")
    method = event.get("requestContext", {}).get("http", {}).get("method")

    if path == "/api/chatbot/send-message" and method == "POST":
        # Async routes are driven with asyncio.run from the sync Lambda entry point
        return asyncio.run(handle_send_message(event))

    elif path == "/api/chatbot/generate-reply" and method == "POST":
        return asyncio.run(handle_generate_reply(event))

    elif path == "/api/chatbot/get-messages" and method == "GET":
        return handle_get_messages(event)

    elif path == "/api/chatbot/rate-message" and method == "POST":
        return handle_rate_message(event)

    elif path == "/api/chatbot/agreed-to-chat-terms" and method == "POST":
        return handle_agreed_to_chat_terms(event)

    return _response(404, {"error": "Not Found"})

That’s it. Pure routing. No business logic. No AWS calls. No global state. Just clean delegation.
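
For comparison, here is a hedged sketch of what one of those delegated route handlers might look like inside routes/; the helper names and module paths beyond the ones above are assumptions:

import json

from storage.messages import get_messages_for_conversation  # assumed helper


def _response(status_code, body):
    # Standard API Gateway proxy response envelope.
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def handle_get_messages(event):
    params = event.get("queryStringParameters") or {}
    conversation_id = params.get("conversationId")
    if not conversation_id:
        return _response(400, {"error": "conversationId is required"})
    return _response(200, {"messages": get_messages_for_conversation(conversation_id)})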

The Key Insight

The AI didn’t need to understand my entire system. It needed to understand ONE milestone at a time, with clear instructions and queryable standards.

That’s Context Engineering.

Traditional refactoring: “Claude, clean up this file” and hope for the best.

Context Engineering: “Claude, execute Milestone 1 following ADR-003 and ADR-002” and get aligned output.

What Made This Possible

Three things enabled this velocity:

  1. ADRs as Guardrails

ADR-003 defines my testing standards. ADR-002 defines my commit format. These aren’t documentation—they’re queryable rules that Claude Code enforces automatically.

When Claude generates tests, they follow my patterns. When Claude commits, the messages match my conventions. Not because I told it each time, but because it queries the standards.

  2. Executable Milestones

The chat-bot-tech-debt-clean-up.md document wasn’t a plan. It was a script.

Each milestone: specific files, specific functions, specific tests, specific validations. Claude Code didn’t improvise. It executed.

  3. Systematic Validation

After each milestone:

  • All tests must pass
  • Handler must import correctly
  • No circular dependencies
  • Git commit with conventional message

Validation caught issues immediately. No big-bang failures. No “hope it works” deployment.
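
The "handler must import correctly" check is cheap to automate; here is a minimal sketch of such a guard test, assuming the module is importable as handler:

import importlib


def test_handler_imports_cleanly():
    # Importing the refactored handler proves the module graph resolves
    # (i.e., no circular imports) before the milestone is committed.
    module = importlib.import_module("handler")
    assert callable(module.lambda_handler)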

The Difference From Gene Kim’s Vibe Coding

Gene Kim’s “Vibe Coding” book describes AI productivity gains. He’s right about 10-100x speed. But he also documents the nightmares:

  • AI deleting 80% of his tests
  • 3,000-line functions that became unmaintainable
  • Code violating team conventions
  • Git branches named cryptically

That’s speed without alignment.

Context Engineering solves this. When your standards are queryable, AI doesn’t guess at conventions—it queries your ADRs and generates code that matches YOUR patterns.

I didn’t tell Claude “write tests.” I told Claude “write tests following ADR-003.” The difference is everything.

The Impact

Before refactoring:

  • Bug fix time: 2-4 hours
  • New feature time: 1-2 days
  • Onboarding time: 2-3 weeks
  • Production incidents: 2-3 per month

After refactoring:

  • Bug fix time: 30-60 minutes
  • New feature time: 4-8 hours
  • Onboarding time: 2-3 days
  • Production incidents: fewer than 1 per month

Estimated savings: 60-80 hours per month in development and bug fixes.

The Broader Lesson

Every enterprise has 1,348-line functions. Every team has technical debt. Every organization struggles with consistency when using AI tools.

They need a systematic way to fix it.

Context Engineering provides that:

  1. Document your standards as ADRs
  2. Structure work as executable milestones
  3. Use AI tools that query your standards
  4. Validate outcomes, not implementation
  5. Maintain velocity AND alignment

Technical debt remediation becomes systematic instead of heroic. AI assistance becomes aligned instead of chaotic. Teams go fast AND stay consistent.

My Evolution

I built my platform with GPT, which made me 10x faster through copy/paste workflows.

I switched to the Claude Code CLI, which made me 10x faster again through direct file manipulation.

I added Context Engineering, which made the process systematic instead of ad-hoc.

The result: 100x velocity with alignment. Not just fast, but fast in the right direction.

What This Proves

If AI can systematically remediate a 1,348-line Lambda while maintaining organizational standards, it can handle any technical debt.

The challenge isn’t AI capability. It’s making your systems AI-understandable.

That’s Context Engineering. That’s the future.

Not AI that codes. But AI that codes the way YOUR organization codes.