Context Engineering Examples: The Five Components, From Corpus to Enforcement — With Working Code You Can Clone and Run
Most teams treat context engineering as a vague concept — a synonym for “better prompts” or “RAG, but we mean it.” It is not vague. It is a discipline with five components, and once you can name them you can evaluate any vendor claim, design your own implementation, and tell the difference between a context engineering platform and a RAG wrapper with a marketing budget.
This post walks through the five components that make context engineering a discipline, illustrated with working code you can clone and run against a real corpus — Spring PetClinic plus a set of Architecture Decision Records — on Amazon Bedrock. By the end you will have a mental model that lets you evaluate any platform’s claims, and a reference implementation you can extend. This is the architectural framework I walked through at Tech Alley here in Las Vegas, expanded for written form with links to the running code and the original slides. For context on where I’m coming from: I’m an AWS Community Builder and one of the official AWS User Group Leaders for Las Vegas, where I co-organize the AWS Las Vegas Meetup — this five-component pipeline is exactly the kind of architecture we take apart there.
The Five-Component Framework
Context engineering is a pipeline. Material flows through five stages, and each stage has a job. Here is the whole framework in one table before we walk through the code for each stage.
| Component | What it does | Why it matters |
|---|---|---|
| 1. Corpus | The authoritative source material: ADRs, code, runbooks, internal docs. | If the source isn’t authoritative and weighted, the model grounds on noise. |
| 2. Retrieval | Hybrid semantic + keyword search that returns the relevant slice. | Pure vector search loses recall in production; hybrid wins. |
| 3. Injection | Assembling retrieved context, task, and instructions into the prompt. | Token budget and ordering decide what the model actually attends to. |
| 4. Output | A structured, schema-validated artifact — a PR, not a chat turn. | An unreviewable chat turn is not a deliverable. |
| 5. Enforcement | Validating that the artifact actually used the context it was given. | Drift is caught at submission, not in code review or production. |
Here is the load-bearing observation, and it is the entire reason this framework matters: a system with only components 1 through 3 is a RAG system. The output and enforcement layers are what make context engineering different — they make the generated content reviewable and governable. Hold onto that sentence. Everything below is an argument for why those last two components are the ones that actually change how an engineering organization operates.
The Pipeline (1/2): Corpus → Retrieval → Injection
These are the three stages that any RAG system has. In the reference implementation they run against the same Spring PetClinic source plus three ADRs — Spring Boot, the H2/Postgres split, and Thymeleaf — so you can watch each stage operate on a corpus you can read in an afternoon.
[Slide: The Pipeline (1/2), Corpus → Retrieval → Injection. Panels: Corpus, “How the org thinks, builds, decides” (ingest_adrs.py → corpus.jsonl); Retrieval, “Find the relevant slice” (embed_corpus.py · query.py); Injection, “Assemble the prompt the model sees” (build_prompt.py).]
Corpus — the authoritative source
The corpus is what goes in: ADRs, code, runbooks, internal documentation — the material that encodes how your organization thinks, builds, and decides. What separates a corpus from a pile of documents is authoritative metadata weighting. An ADR that says “we use Postgres in production and H2 only for local tests” should outrank a stale wiki page that says the opposite. In the reference implementation, ingest_adrs.py reads the ADRs and the source and emits corpus.jsonl — a flat, inspectable corpus with the metadata each chunk carries into retrieval. It lives under 01-corpus/. I made the case for ADRs as the corpus primitive in How 3 ADRs Changed Everything.
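To make "authoritative metadata weighting" concrete, here is a minimal sketch of a corpus record that carries its own authority score into a JSONL file. The field names (`source_type`, `status`, `authority`) and the weights are illustrative assumptions, not the schema that ingest_adrs.py actually emits.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical authority weights per source type; real values would be tuned.
AUTHORITY = {"adr": 1.0, "code": 0.8, "runbook": 0.6, "wiki": 0.3}


@dataclass
class Chunk:
    chunk_id: str
    source_type: str  # "adr", "code", "runbook", or "wiki"
    title: str
    text: str
    status: str = "accepted"  # anything else (e.g. "superseded") loses authority

    def authority(self) -> float:
        base = AUTHORITY.get(self.source_type, 0.1)
        return base if self.status == "accepted" else base * 0.25


def to_jsonl(chunks: list[Chunk]) -> str:
    """Serialize one JSON object per line, with the authority score attached."""
    lines = []
    for chunk in chunks:
        record = asdict(chunk)
        record["authority"] = chunk.authority()
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)


chunks = [
    Chunk("adr-002", "adr", "H2/Postgres split",
          "Postgres in production; H2 only for local tests."),
    Chunk("wiki-17", "wiki", "Database notes", "We run H2 everywhere."),
]
corpus_jsonl = to_jsonl(chunks)
```

With these assumed weights the ADR enters retrieval with an authority of 1.0 and the stale wiki page with 0.3, so a downstream ranker has something to break the tie with when the two disagree.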
Retrieval — the relevant slice, not the whole document
Retrieval pulls the relevant section of the corpus into play — not the entire document, and not everything that is vaguely related. The reference implementation builds a FAISS index over Amazon Titan embeddings and retrieves with a hybrid strategy: semantic similarity for meaning, keyword matching for the exact identifiers and acronyms that semantic search routinely misses. Pure vector search looks impressive in a demo and loses recall in production the moment someone searches for a literal class name or an internal term that has no semantic neighbors. embed_corpus.py builds the index; query.py returns ranked chunks with their metadata. Both live under 02-retrieval/. The argument for why retrieval over code needs more than vectors is in Why RAG Isn’t Enough for Code: Adding a Graph.
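The effect of adding a keyword signal can be shown without FAISS or Bedrock. The sketch below is a stand-in, not the repo's query.py: it uses brute-force cosine similarity over hand-written three-dimensional vectors in place of a FAISS index over Titan embeddings, and the weighted-sum fusion with `alpha` is one common way to combine scores that I am assuming for illustration.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query tokens that appear verbatim in the chunk."""
    q = set(re.findall(r"[A-Za-z0-9_]+", query.lower()))
    t = set(re.findall(r"[A-Za-z0-9_]+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(query, query_vec, chunks, alpha=0.6, k=2):
    """Rank chunks by alpha * semantic + (1 - alpha) * keyword."""
    scored = []
    for chunk in chunks:
        score = (alpha * cosine(query_vec, chunk["vec"])
                 + (1 - alpha) * keyword_score(query, chunk["text"]))
        scored.append((score, chunk["id"]))
    scored.sort(reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones would come from the embedding model.
chunks = [
    {"id": "adr-002", "text": "Postgres in production, H2 for local tests",
     "vec": [0.9, 0.1, 0.0]},
    {"id": "OwnerController", "text": "class OwnerController handles owners",
     "vec": [0.1, 0.2, 0.9]},
    {"id": "adr-003", "text": "Thymeleaf renders server-side views",
     "vec": [0.2, 0.9, 0.1]},
]
# The query vector sits closest to adr-003, but the literal identifier
# appears only in the OwnerController chunk.
results = hybrid_search("OwnerController", [0.3, 0.8, 0.5], chunks)
```

With `alpha=1.0` (pure vector) the Thymeleaf ADR ranks first; with the keyword term included, the chunk that actually contains the class name moves to the top. That is the identifier-recall failure in miniature.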
Injection — assembling the prompt the model sees
Injection is where retrieved context becomes the model’s working memory. The retrieved ADRs, the diff under review, and the instructions get packed into the prompt — and the two decisions that matter here are the token budget and the ordering. You cannot inject everything; you inject the highest-signal chunks until the budget is spent, and you order them so the model attends to the authoritative context rather than burying it. build_prompt.py under 03-injection/ assembles it, and a companion with_vs_without.py runs the same query with and without the injected context so you can see the difference in the output directly.
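The budget-and-ordering decision can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of build_prompt.py: the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, the `score` field and the section labels are hypothetical, and placing context before the diff is one ordering choice among several.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def build_prompt(instructions: str, diff: str, ranked_chunks: list[dict],
                 budget: int) -> tuple[str, list[str]]:
    """Pack instructions, then the highest-scoring chunks that fit, then the diff."""
    fixed = estimate_tokens(instructions) + estimate_tokens(diff)
    remaining = budget - fixed
    if remaining < 0:
        raise ValueError("instructions and diff alone exceed the token budget")

    included = []
    for chunk in sorted(ranked_chunks, key=lambda c: c["score"], reverse=True):
        cost = estimate_tokens(chunk["text"])
        if cost <= remaining:  # greedy: skip what does not fit, keep going
            included.append(chunk)
            remaining -= cost

    context = "\n\n".join(f"[{c['id']}]\n{c['text']}" for c in included)
    prompt = f"{instructions}\n\n## Context\n{context}\n\n## Diff\n{diff}"
    return prompt, [c["id"] for c in included]


ranked = [
    {"id": "adr-003", "score": 0.41, "text": "T" * 400},  # ~100 tokens each
    {"id": "adr-002", "score": 0.92, "text": "P" * 400},
    {"id": "adr-001", "score": 0.77, "text": "S" * 400},
]
prompt, used = build_prompt("Describe this change.", "- old\n+ new",
                            ranked, budget=220)
```

Here the budget admits two of the three chunks, so the lowest-scoring ADR is dropped and the remaining two appear in score order ahead of the diff. Returning the list of included ids alongside the prompt is deliberate: a later enforcement step needs to know what the model was actually shown.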
Stages 1 through 3 alone equal a RAG system. Most vendors claiming “context engineering” stop here. They retrieve, they inject, they hand you a chat response, and they call it a platform. The next two components are where the discipline actually begins.
The Pipeline (2/2): Output → Enforcement
This is what separates context engineering from “just RAG.” The first three stages get authoritative context in front of the model. These two stages make what comes out of the model into something you can review, merge, and audit.
[Slide: The Pipeline (2/2), Output → Enforcement. Panels: Output, “An artifact, not a chat turn” (schema.py · generate_pr_description.py); Enforcement, “Did it actually use the context?” (check_pr_cites_adrs.py).]
Output — an artifact, not a chat turn
A chat turn is not a deliverable. It is a thing a human has to read, interpret, copy, and trust before any of it becomes real work. The output component replaces the chat turn with a structured artifact whose shape is defined by a JSON schema — and that schema is the single source of truth. In the reference implementation it constrains Bedrock’s generation via tool-use, validates the result against the same schema, and then renders Markdown ready to paste into a pull request. The schema lives in schema.py; generate_pr_description.py produces the artifact; both are under 04-output/, and the generated pr.json is checked in so you can see exactly what the pipeline produced against the PetClinic corpus — no human wrote it. The point is structural: when output is a schema-validated PR, the model’s work enters the same review and version-control surface as everyone else’s.
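A rough sketch of how one schema can serve all three jobs: constrain, validate, render. The schema fields and the tool name are hypothetical (the repo's schema.py may look different), and the validator is deliberately tiny; a full JSON Schema validator such as the `jsonschema` package would be the realistic choice. `TOOL_CONFIG` follows the shape of the `toolConfig` argument to the Bedrock Converse API, but no Bedrock call is made here.

```python
# Hypothetical schema; the fields in the repo's schema.py may differ.
PR_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "cited_adrs": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "summary", "cited_adrs"],
}

# The schema becomes the tool's input schema, and toolChoice asks the model
# to answer by calling that tool rather than with free text.
TOOL_CONFIG = {
    "tools": [{"toolSpec": {
        "name": "emit_pr_description",  # hypothetical tool name
        "description": "Return the pull request description as structured data.",
        "inputSchema": {"json": PR_SCHEMA},
    }}],
    "toolChoice": {"tool": {"name": "emit_pr_description"}},
}

_TYPES = {"string": str, "array": list, "object": dict}


def validate(doc: dict, schema: dict) -> list[str]:
    """Tiny validator covering only 'required', 'type', and array 'items'."""
    errors = []
    for key in schema.get("required", []):
        if key not in doc:
            errors.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in doc:
            continue
        if not isinstance(doc[key], _TYPES[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
        elif spec["type"] == "array":
            item_type = _TYPES[spec["items"]["type"]]
            if not all(isinstance(item, item_type) for item in doc[key]):
                errors.append(f"{key}: items must be {spec['items']['type']}")
    return errors


def render_markdown(doc: dict) -> str:
    cites = "\n".join(f"- {adr}" for adr in doc["cited_adrs"])
    return f"# {doc['title']}\n\n{doc['summary']}\n\n## ADRs cited\n{cites}\n"


# Stand-in for the tool-call input the model would return.
artifact = {"title": "Use Postgres profile in prod",
            "summary": "Wires the production profile to Postgres.",
            "cited_adrs": ["ADR-002"]}
errors = validate(artifact, PR_SCHEMA)
markdown = render_markdown(artifact) if not errors else ""
```

The design point is that `PR_SCHEMA` appears exactly once: the same object is handed to the model as the tool's input schema and used to check what comes back, so the two cannot drift apart.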
Enforcement — did it actually use the context?
Enforcement is the component nobody else ships, and it is the one that turns “the model had access to our standards” into “the model demonstrably used our standards.” check_pr_cites_adrs.py, under 05-enforcement/, validates that the generated PR cites the ADRs that retrieval actually returned. If the model was handed the Postgres ADR and then produced a PR that wires up H2 for production without citing it, enforcement flags the missing citation at submission time, not three weeks later in a code review, and not in an incident. That is the difference between hoping the context worked and proving it did. Reviewable. Auditable. Queryable.
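A citation check of this kind is small enough to sketch in full. The version below is an assumption-laden illustration, not the logic of check_pr_cites_adrs.py: it assumes ADRs are referenced in the PR text as `ADR-<number>`, and it treats both an ignored ADR and a citation of an ADR that was never retrieved as failures, which is a policy choice.

```python
import re

# Assumed citation format: "ADR-2", "adr-002", "ADR-0002" all mean the same ADR.
ADR_PATTERN = re.compile(r"\bADR-(\d+)\b", re.IGNORECASE)


def cited_adrs(pr_text: str) -> set[str]:
    """Extract ADR identifiers, normalized to the form ADR-003."""
    return {f"ADR-{int(num):03d}" for num in ADR_PATTERN.findall(pr_text)}


def check_citations(pr_text: str, retrieved: set[str]) -> dict:
    """Compare what the PR cites against what retrieval actually returned."""
    cited = cited_adrs(pr_text)
    ignored = sorted(retrieved - cited)      # retrieved but never cited
    unsupported = sorted(cited - retrieved)  # cited but never retrieved
    return {
        "ignored": ignored,
        "unsupported": unsupported,
        "passed": not ignored and not unsupported,
    }


report = check_citations(
    "Switches the prod profile to Postgres per ADR-002. See also adr-7.",
    retrieved={"ADR-002", "ADR-003"},
)
```

In this example the report lists ADR-003 as ignored and ADR-007 as unsupported, and `passed` is false. Note what the check does and does not prove: it shows which context the artifact acknowledged, which makes the gap auditable, but it does not by itself verify that the code complies with the cited decision.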
This is not a toy pattern that only survives on a 12-file demo. The same five components run at production scale behind RetrieveIT — sixteen OAuth integrations on AWS Lambda, SQS, and S3 Vectors — built solo on exactly this architecture.
How to Evaluate a Vendor’s “Context Engineering” Claim
The framework is also an evaluation rubric. Every platform on the market right now claims context engineering. Most of them are RAG systems — components 1 through 3 — with a confident landing page. Here is one question per component to ask any vendor, and what a good answer sounds like.
- Corpus: “Where does your corpus live, and how is authoritativeness scored?” You want a real answer about source weighting and freshness, not “we index your repos.”
- Retrieval: “Is retrieval hybrid or pure-vector, and how do you measure recall?” If they only do vectors and can’t quote a recall number, expect them to miss exact identifiers.
- Injection: “How do you handle token-budget enforcement at injection time, and does injection order matter?” “We send everything to a big context window” is the wrong answer.
- Output: “What is the structured output format, and is it schema-validated?” If the deliverable is a chat response, it is not an artifact, and a human is the validation layer.
- Enforcement: “Can I audit which corpus items the model cited and which it ignored?” This is the question that separates a platform from a wrapper. Most vendors have no answer at all.
A platform that cannot answer the output and enforcement questions is selling you RAG. That may be fine for your use case — but you should know which one you are buying. I went deeper on the regulated-vs-unregulated buyer split in Context Engineering Platforms: A Comparison Guide.
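For the retrieval question above, "a recall number" usually means something like recall@k over a set of labeled queries. A minimal sketch, assuming you have hand-labeled which corpus items are relevant to each query; the ids and labels below are made up for illustration.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k across labeled queries."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Hypothetical labeled queries: (ranked results, ids a human marked relevant).
labeled = [
    (["adr-002", "wiki-17", "adr-001"], {"adr-002"}),
    (["adr-003", "adr-001", "OwnerController"], {"OwnerController", "adr-001"}),
]
score = mean_recall_at_k(labeled, k=2)
```

The second query is the interesting one: the chunk containing the literal identifier sits at rank three, so it falls outside the top two and drags recall down. A vendor who measures this can tell you how often that happens; one who does not is guessing.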
The Repo, the Prerequisites, the Quickstart
The reference implementation is the open-source context engineering examples repo on GitHub, and it is fully runnable. It runs against Spring PetClinic plus three ADRs and uses Amazon Bedrock for both embeddings (Amazon Titan) and generation (a Claude model). The only prerequisite that takes more than a minute is Bedrock model access in your AWS account — the repo README has the authoritative list and exact flags.
The folder layout maps one-to-one onto the five components, and the scripts run in pipeline order:
```shell
git clone https://github.com/outcomeops/context-engineering
cd context-engineering

# Prereqs: AWS account with Amazon Bedrock model access
# (Amazon Titan embeddings + a Claude model), AWS creds configured.

# 01 corpus -> 02 retrieval -> 03 injection
python 01-corpus/ingest_adrs.py                # build corpus.jsonl
python 02-retrieval/embed_corpus.py            # embed into a FAISS index
python 02-retrieval/query.py                   # retrieve ranked chunks
python 03-injection/build_prompt.py            # assemble the prompt

# 04 output -> 05 enforcement
python 04-output/generate_pr_description.py    # schema-validated PR
python 05-enforcement/check_pr_cites_adrs.py   # prove it used the ADRs
```

Clone it, run it, and change the corpus. Swap PetClinic for one of your own repositories and a handful of your own ADRs, and watch the output change. That is the fastest way to internalize why the corpus is the moat and the model is the commodity.
What This Implies for Engineering Organizations
The five-component framework is not just a technical pattern. It changes which roles matter and which artifacts get version-controlled. When output is a schema-validated PR and enforcement proves the PR honored the corpus, the scarce skill stops being “writes code fast” and becomes “curates the corpus and the ADRs the whole organization generates against” — the shift I described in The Rise of the Outcome Engineer and The Death of the Traditional Product Owner.
It also reframes the platform question. A context engineering platform is not a better autocomplete; it is the layer where corpus, retrieval, injection, output, and enforcement become an organizational capability rather than a per-team science project. That is the argument in What Is an AI Engineering Platform? (2026 Guide), and the methodology that operationalizes it is in The OutcomeOps Way: Stop Prompting, Start Co-Engineering.
Context Engineering Is a Discipline
Five components. Corpus is the authoritative source. Retrieval finds the relevant slice. Injection assembles the prompt. Output produces a structured artifact. Enforcement proves the artifact used the context. The first three are RAG. The last two are what make the work reviewable and governable — which is to say, the last two are what make it engineering.
Context engineering is a discipline. The five components are the structure. The code is the proof.
Clone the Five Components. Then Build the Nucleus.
The reference implementation runs all five components against a real corpus on Amazon Bedrock. The OutcomeOps platform is the same pattern at organizational scale — deployed in your own AWS account, with the corpus, the audit trail, and the enforcement built into the surface.
The model is the commodity. The context is the moat.
Related reading
- What Are Context Engineering Platforms? — the definitional companion to this architectural deep-dive.
- Context Engineering Platforms: A Comparison Guide — how to compare platforms, with the regulated-buyer split.
- Why RAG Isn’t Enough for Code: Adding a Graph — the retrieval architecture behind component 2.
- How 3 ADRs Changed Everything — the corpus primitive, proven on the same PetClinic codebase.
- What Is an AI Engineering Platform? (2026 Guide) — the platform layer the five components live inside.
- The OutcomeOps Way: Stop Prompting, Start Co-Engineering — the methodology that operationalizes the framework.