Context Engineering vs. Nova Forge: $100K to Bake In What You Could Retrieve for $0.05

Nova Forge trains your knowledge into the weights. Context Engineering retrieves it live. One of these costs $100K/year. The other costs five cents per query.

Brian Carpio
Context Engineering, AWS, Nova Forge, Enterprise AI, OutcomeOps

When AWS launched Nova Forge at re:Invent 2025, it answered a question enterprises have been asking since GPT-3: How do I make AI that actually knows my business?

Nova Forge lets you take Amazon's Nova models and bake your proprietary data into the weights. Start from early training checkpoints. Blend your data with Amazon's curated training data. Deploy the result as a private model on Bedrock.

It costs $100,000 per year. Plus compute. And that doesn't include help from Amazon engineers.

Early adopters include Reddit, Sony, Booking.com, and Nimbus Therapeutics. It's a real capability. For certain problems, it's the right answer.

But for AI Engineering and Organizational Intelligence — the problems most enterprises are actually trying to solve — it's the wrong tool entirely.

What Nova Forge Actually Does

Nova Forge introduces what Amazon calls "open training." You get access to Nova model checkpoints at different stages — pre-training, mid-training, and post-training — and blend your proprietary data in at whichever phase makes sense. Amazon provides curated datasets you mix with yours to prevent catastrophic forgetting, where the model loses its general capabilities while learning your domain.

The result is what Amazon calls a "Novella" — a private, custom version of a Nova model with your domain knowledge embedded in its weights. Deploys on Bedrock with the same security and APIs as any other model.

This is legitimately powerful. Reddit trained a Nova-based moderation model on their proprietary data and it outperformed commercially available LLMs on their internal tasks. Nimbus Therapeutics built a drug discovery assistant that showed 20-50% improvement over Claude Sonnet on property prediction benchmarks. Sony is building domain-specific models for review and assessment workflows.

Good use cases. They all share one pattern: the knowledge they're embedding rarely changes, and the model needs to reason differently — not just know more.

What Context Engineering Actually Does

Context Engineering takes the opposite approach. Keep the model generic. Feed it exactly the right context at query time.

Instead of training a model that "knows" your architecture standards, you maintain those standards as living documents — ADRs, runbooks, compliance policies, code-maps — and a retrieval system pulls the relevant ones into the model's context window when someone asks a question or when an automated workflow needs to make a decision.

The model doesn't need to "know" your organization. It needs to read your organization's current state. Right now. Every time.
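A minimal sketch of that query-time retrieval, using a toy in-memory index (the doc names, `retrieve`, and `build_prompt` are illustrative, not OutcomeOps APIs — a real system would use embeddings and a vector store):

```python
# Toy query-time retrieval: score docs by keyword overlap with the query,
# then inject the top hits into the prompt the model reads.
DOCS = {
    "adr-012-error-handling.md": "ADR-012: All services wrap errors in Result types.",
    "adr-007-logging.md": "ADR-007: Structured JSON logging via the shared logger.",
    "runbook-deploys.md": "Deploys go through the staging gate before prod.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k docs sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(DOCS.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(query: str) -> str:
    """Assemble the context block the model reads at query time."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("how do services wrap errors"))
```

Because the index reads from the current documents, nothing here is frozen: change the ADR, and the next call to `build_prompt` reflects it.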

Here's what that looks like in production. When a developer on OutcomeOps creates a pull request, the system:

  1. Queries the RAG pipeline during planning to retrieve relevant ADRs, code-maps, and standards
  2. Caches everything into a story file (~$0.06)
  3. Every subsequent step — code generation, test generation, PR review — reads from that cached context
  4. The model produces code that follows your architecture, not generic best practices
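The steps above can be sketched as a retrieve-once, read-many pattern (the names `plan_story`, `read_story`, and the story-file shape are hypothetical, not the OutcomeOps implementation):

```python
import json
import pathlib
import tempfile

def plan_story(ticket: str, retrieved_docs: list[str]) -> pathlib.Path:
    """Steps 1-2: query the RAG pipeline once, cache results in a story file."""
    story = {"ticket": ticket, "context": retrieved_docs}
    path = pathlib.Path(tempfile.gettempdir()) / f"story-{ticket}.json"
    path.write_text(json.dumps(story))
    return path

def read_story(path: pathlib.Path) -> dict:
    """Steps 3-4: code gen, test gen, and PR review all read this cache
    instead of hitting the RAG pipeline again."""
    return json.loads(path.read_text())

path = plan_story("PROJ-123", ["ADR-012: wrap errors in Result types"])
story = read_story(path)
print(story["context"][0])
```

The design point: retrieval cost is paid once per story, and every downstream step works from the same cached context, so the plan, code, tests, and review stay consistent with each other.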

When an ADR changes — say you adopt a new error handling pattern or deprecate a library — you update the markdown file, reindex, and every future query reflects the change.

Minutes. Not a training cycle.
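A toy illustration of that update loop, assuming the index is just rebuilt from the docs on disk (names are illustrative):

```python
# Toy "live feed": the index is rebuilt from current doc contents, so an
# edited ADR shows up in the next query with no training cycle.
docs = {"adr-012.md": "Errors are raised as exceptions."}

def reindex(store: dict) -> dict:
    """Rebuild the searchable index from current doc contents."""
    return {path: text.lower() for path, text in store.items()}

index = reindex(docs)
assert "exceptions" in index["adr-012.md"]

# The team adopts a new error-handling pattern: edit the doc, reindex.
docs["adr-012.md"] = "Errors are wrapped in Result types."
index = reindex(docs)
assert "result types" in index["adr-012.md"]  # next query sees the change
```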

Snapshot vs. Live Feed

This is the fundamental difference, and it's the one most enterprises get wrong.

Nova Forge: The Snapshot

  • Train on your data at a point in time
  • Knowledge is baked into the weights
  • Standards evolve? Retrain
  • Codebase shifts? Retrain
  • Compliance requirements change? Retrain
  • Each retrain = compute costs + validation + deployment

Context Engineering: The Live Feed

  • Knowledge retrieved at query time
  • Always current — reads from your actual systems
  • Standards evolve? Already reflected
  • Codebase shifts? Already indexed
  • Compliance requirements change? Update the doc
  • Each update = edit a file, reindex, done

For Reddit's content moderation, a snapshot makes sense. Their moderation policies don't change hourly. They need the model to reason differently about content — to understand the nuanced culture of thousands of subreddits. That's a behavioral change, not a knowledge change. Train it in.

For an enterprise engineering team? Your ADRs evolve monthly. Your codebase changes daily. Your compliance requirements update quarterly. Your Confluence is a living organism.

A snapshot of that knowledge is outdated before the training run finishes.

Where Nova Forge Wins (Be Honest)

I'm not going to pretend Forge doesn't have real use cases. It does.

You need the model to reason differently, not just know more.

Reddit needed moderation judgment. Nimbus needed drug property prediction patterns. These are behavioral changes that benefit from weight-level training.

Your domain knowledge is large but stable.

Medical literature for pharma. Historical case law for legal research. Geological survey data for mining. Massive corpus, doesn't change weekly — train it in.

Latency is critical and context windows are a bottleneck.

A trained model doesn't need to retrieve and process 50K tokens of context on every request. The knowledge is in the weights. For high-throughput, latency-sensitive inference, that matters.

You need to consolidate multiple ML models into one.

Reddit replaced several separate models with a single Forge-trained model. If you're running five fine-tuned models for different tasks, Forge might unify them.

Real use cases. All of them.

None of them describe the problem most enterprise engineering teams are trying to solve.

Where Context Engineering Wins (And It's Not Close)

Your knowledge changes.

Architecture decisions. Compliance policies. Code standards. Team processes. Onboarding docs. If it evolves — and in engineering organizations, everything evolves — you need runtime context, not frozen weights.

You need model flexibility.

OutcomeOps swaps the underlying model with a single environment variable change. Today it's Haiku. Tomorrow it could be Sonnet, Nova 2 Lite, or whatever frontier model drops next month. Your $100K Novella doesn't run on Claude.
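A sketch of that swap, assuming the model choice lives in a `MODEL_ID` environment variable (the variable name and model IDs here are illustrative): the rest of the pipeline calls the same function no matter which model is configured.

```python
import os

# Hypothetical: model choice comes from the environment, so swapping
# Haiku for Sonnet (or a Nova model) is a config change, not a retrain.
DEFAULT_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"

def current_model_id() -> str:
    return os.environ.get("MODEL_ID", DEFAULT_MODEL)

def invoke(prompt: str) -> dict:
    """Placeholder for a Bedrock inference call; only the model ID varies."""
    return {"model": current_model_id(), "prompt": prompt}

os.environ["MODEL_ID"] = "anthropic.claude-3-5-sonnet-20240620-v1:0"
print(invoke("Review this PR")["model"])
```

Nothing about the retrieval layer changes when the model does, which is exactly the flexibility a weight-trained Novella gives up.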

You need Organizational Intelligence.

Querying across Confluence, Jira, GitHub, SharePoint, Outlook — federating knowledge from where it already lives. Nova Forge trains on a dataset you prepare. Context Engineering retrieves from your live systems.

Your budget is engineering-team-sized, not ML-lab-sized.

$100K/year for Forge, plus SageMaker compute, plus ML engineering to prepare data, manage runs, evaluate results, and handle retraining. Context Engineering runs on standard Bedrock inference. A complex query costs $0.05–$0.40.
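Using the article's own figures, a rough break-even sketch (these are the estimates quoted above, not measured costs):

```python
# Break-even: how many complex queries does the Forge subscription alone
# buy at retrieval pricing? Worst case of the $0.05-$0.40 range.
forge_subscription = 100_000        # $/year, before compute and staffing
cost_per_query = 0.40               # high end per complex query

queries_per_year = forge_subscription / cost_per_query
print(f"{queries_per_year:,.0f} queries")  # 250,000 queries
```

At the $0.05 low end the break-even is 2,000,000 queries per year — before counting Forge's compute, data preparation, and retraining costs.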

You need AI-assisted engineering workflows.

PR review against your standards. Jira-to-code pipelines that respect your architecture. Vulnerability scanning against your security policies. These are context-dependent, not model-dependent.

The Cost Comparison Nobody's Running

Here's where the procurement spreadsheet lies to you. Again.

Nova Forge

  • $100K/year subscription
  • SageMaker compute for training runs
  • ML engineer(s) to prepare data and manage training
  • Validation and evaluation cycles
  • Retraining every time knowledge changes
  • Locked to Nova models

$200K+/year all-in

Context Engineering (OutcomeOps)

  • Standard Bedrock inference pricing
  • $0.05–$0.40 per complex query
  • $2–$4 per feature (plan + code + tests + review)
  • No ML engineers required
  • Knowledge updates in minutes, not training cycles
  • Swap models with an environment variable

Pay per query. No subscription.

Cost per trained model is the wrong metric. Cost per useful engineering outcome is what matters.

Sound familiar? It's the same mistake I wrote about in Same Context, Three Models. Enterprises optimize for the wrong denominator. They compare cost-per-token when they should compare cost-per-answer. They compare cost-per-training-run when they should compare cost-per-outcome.

The Question Most Enterprises Are Actually Asking

Here's what I hear from engineering leaders every week:

"Our developers are using ChatGPT and getting generic answers that don't follow our standards. How do we make AI that knows how we build software?"

That's not a Nova Forge problem. They don't need to train a custom model. They need their existing standards, architecture decisions, and codebase context fed to a capable model at the right time, in the right format, for the right task.

That's Context Engineering.

The developer asking "how should I implement error handling in this service?" doesn't need a model that was trained on your error handling patterns six months ago. They need a model that can read your current ADR on error handling, see how it's implemented in your actual codebase today, and generate code that matches.

Right now. Not six months ago.

They're Not Competitors. They're Different Tools.

Nova Forge and Context Engineering solve different problems. I respect what AWS built. The Reddit and Nimbus use cases are legitimate.

The danger is reaching for Nova Forge when your actual problem is context delivery.

Spending $100K/year and months of ML engineering effort to bake in knowledge that could be retrieved at runtime for pennies per query. That's not an AI strategy. That's an expensive misdiagnosis.

The Decision Framework:

  • Stable knowledge + behavioral change needed? Nova Forge.
  • Living knowledge + organizational intelligence needed? Context Engineering.
  • Not sure? Ask yourself: does my knowledge change monthly? If yes, you need a live feed, not a snapshot.

Most enterprise engineering teams? Their knowledge is living. Their standards evolve. Their code changes daily. Their Confluence is a living organism.

They need the live feed, not the snapshot.

See the Live Feed in Action

We'll show you how Context Engineering retrieves your organizational knowledge at query time — no model training, no $100K subscriptions, no ML engineers.

  • How ADRs and code-maps power real-time context retrieval
  • Why $0.05 per query beats $100K per year for engineering teams
  • Live demo: PR review against your actual architecture standards
  • How to swap models without retraining anything

Don't bake in what you can retrieve. Your knowledge is alive. Your AI should be too.

$100K to freeze your knowledge into weights. Five cents to read it live. The math isn't hard — the procurement committee just hasn't seen it yet.