OutcomeOps + OpenAI on Bedrock: The Layer Above Every Frontier Model

Brian Carpio·

OpenAI’s GPT-5.5 went generally available on Amazon Bedrock a few days ago. For OutcomeOps customers, the migration plan was: change a few lines in a Terraform file.

bedrock_advanced_model_id = "openai.gpt-5.5"
bedrock_basic_model_id    = "openai.gpt-5.4"  # the cheaper sibling -- often the right pick for grounded retrieval
bedrock_default_backend   = "responses"

That’s it. No product release. No vendor coordination call. No retraining. The substrate — the corpus, the retrieval pipeline, the role-scoped workspaces, the audit trail — kept doing its job. The only thing that changed was which model handled the synthesis, and which Bedrock API path the dispatch layer routed the call through.

This post is about why that’s the architecture, what it actually took to build it, and why — once you understand what Bedrock is really shipping — this is a much bigger story than “OutcomeOps now supports OpenAI.”

Why OpenAI on Bedrock Changes the Calculus

Until last week, frontier-model choice on AWS Bedrock was effectively a single name: Anthropic. That was fine for most enterprises. It was not fine for everyone.

Regulated buyers — defense agencies, federal contractors, certain financial-services clearance tiers, healthcare systems with active federal grants — have been increasingly cautious through 2026 about placing strategic bets on a single frontier model whose regulatory exposure profile is still moving. Some of those buyers want OpenAI specifically. Some of them want optionality. Some of them already have an OpenAI procurement vehicle in place from earlier programs and would rather not stand up a second one. Either way, “we only support Anthropic” was a procurement objection waiting to happen.

OpenAI on Bedrock takes that objection off the table. For OutcomeOps customers, switching frontier models is now a question of tfvars, not architecture. Same Terraform module. Same VPC. Same KMS keys. Same audit trail. Same corpus. Different model.

Why It Was Easy: Configuration, Not Code

I made the architectural argument in What Does a Good Organization’s Intelligence Layer Look Like? — specifically in Pillar 4:

Everything that varies between customers is configuration, not code. Which model handles which class of question. Which cloud region the layer runs in. Which corpora are ingested. How token spend is bounded per workspace. All of it lives in a deployment configuration the customer controls. Swapping to a newer or cheaper model is a configuration change, not a product release.

Today’s post is the proof — and the honest engineering story behind it. Because the pitch “swap your frontier model with a config change” is real only if a layer somewhere is absorbing the API-surface differences. Bedrock didn’t do that work. We did.

Bedrock Is Not One API. It Is Two.

The first thing to understand about “OpenAI on Bedrock” is that AWS Bedrock is not a single API surface. It is two surfaces, and they are deliberately not interchangeable:

  • Converse API. Bedrock’s unified chat-completion surface. Same JSON shape across Anthropic Claude, Amazon Titan and Nova, Meta Llama, Mistral, and Cohere. It is the API you reach for when you want one client to drive several model families with the same code path. OutcomeOps uses Converse for the Claude families.
  • Responses API (internal AWS codename: Mantle). Bedrock’s newer surface, originally stood up to host OpenAI’s GPT-5 family on AWS infrastructure. Its payload shape mirrors OpenAI’s own /v1/responses API rather than Bedrock’s Converse shape. AWS made the deliberate choice not to normalize it into Converse — the reasoning-token and tool-use semantics don’t flatten cleanly without losing information.

Same Bedrock. Same IAM. Same VPC endpoint. Same KMS keys. But two completely different payload shapes depending on which model family you are calling.

If you want a customer-facing tfvars-swap experience that crosses that boundary, somebody has to write the dispatch layer that hides it.

What We Built: The Dispatch Layer

For OutcomeOps, that somebody was us. The work lives behind a single shared.model_client (formalized as ADR-035) that:

  • Inspects the configured model ID and the per-deployment bedrock_default_backend tfvars variable (converse | responses)
  • Routes to the correct Bedrock API surface
  • Translates parameter names where they are spelled differently for the same concept
  • Drops the parameters the target backend rejects, with a logged warning rather than a silent failure
  • Normalizes reasoning-effort vocabulary across vendors (the OpenAI surface accepts minimal/low/medium/high; the Bedrock Mantle equivalent accepts none/low/medium/high/xhigh; same dial, different words)
  • Applies family-aware budget ceilings so a single “give me an answer” call cannot burn the whole Lambda runtime on internal reasoning

A single call site in the application can target Claude on Converse this deploy and GPT-5.5 on Mantle the next, without ever knowing the difference.

Two Production War Stories

None of this is theoretical. Two of the production lessons that shaped the dispatch layer:

Token-budget semantics are not the same concept across vendors.

Claude’s max_tokens is a ceiling on visible output. GPT-5’s max_output_tokens is a ceiling on reasoning tokens plus visible output combined. With a high reasoning effort and a generous budget, GPT-5 will happily spend the entire allocation thinking and emit zero user-visible tokens.

We hit this in production exactly the way you would expect. 32K-token budget. The model reasoned for the full 900-second Lambda max. Emitted nothing. Timed out. The fix was a hard ceiling at 8K and dialing reasoning effort down to none by default. The dispatch layer encodes that family-aware ceiling so no call site has to remember it.

Stream-idle behavior is invisible to your load balancer.

GPT-5 streams reasoning tokens silently — they never reach the client, but the model is working. From the ALB’s perspective, the connection looks idle. The default idle_timeout of 60 seconds killed the stream before the first visible byte arrived. We had to raise it to 900. Claude doesn’t have this problem because its first stream chunk arrives within seconds. Knowing the semantic exists is what stops the next outage.

Those are two of about five divergences we hit. Each one is small in isolation. In aggregate, they are the difference between “we support OpenAI” as a tfvars line and “we support OpenAI” as a quarterly engineering migration. We did that work once, in the platform. The customer doesn’t.

That is the architecture. That is what it means to be the layer above the model.

What Mantle Actually Hosts (Spoiler: A Lot More Than OpenAI)

When AWS shipped Mantle, the press coverage focused on OpenAI — fairly, because GPT-5.5 GA on Bedrock is the big news. But Mantle is not OpenAI-only. The same OpenAI-compatible /v1/responses shape has become the de-facto open-model gateway on Bedrock.

Pull the model list from a current Mantle endpoint and you find, alongside the OpenAI GPT-5 family:

  • xAI Grok 4.3 * — live on Bedrock today. The “rumored Grok on Bedrock” everyone was waiting for is already here, behind the same Mantle surface as OpenAI.
  • Mistral — Mistral Large 3 (675B), Magistral, the Ministral 3/8/14B family, the Devstral coding model, and the Voxtral audio family.
  • Google Gemma — the Gemma 3 and Gemma 4 families, multi-size.
  • Alibaba Qwen 3 — Qwen 3 Coder (including the 480B variant), Qwen 3 VL for multimodal, Qwen 3 Next.
  • NVIDIA Nemotron — multiple sizes, Nano through Super (120B).
  • Moonshot Kimi, DeepSeek v3, ZAI GLM, MiniMax, Writer Palmyra Vision, plus OpenAI’s own open-weights GPT-OSS family.

For OutcomeOps customers, this means model choice is not a Claude-vs-OpenAI binary. It is access to the open-model frontier — Grok for the customers who want xAI specifically, Qwen Coder for engineering-heavy workloads, Mistral or DeepSeek for cost-tier flexibility, Gemma or NVIDIA Nemotron for organizations standardizing on open-weights models for export-control or sovereignty reasons.

All of it gated by Bedrock’s IAM and VPC perimeter. All of it in the customer’s own cloud account. All of it accessible through the same dispatch layer with a tfvars change.

* A note on what “support” means here: model-agnostic doesn’t mean untested. The dispatch path reaches every model above, but we validate each one against the production workflows before we claim full support. Models marked with an asterisk are routable today, but their full workflow validation pass is still in progress.

Side-by-Side: One File, Three Frontier Models

For a customer who wants Anthropic Claude on the Converse path:

bedrock_advanced_model_id = "us.anthropic.claude-sonnet-4-6"
bedrock_basic_model_id    = "us.anthropic.claude-haiku-4-5-20251001-v1:0"
bedrock_default_backend   = "converse"

For a customer who wants OpenAI’s GPT-5.5 on the Mantle path:

bedrock_advanced_model_id = "openai.gpt-5.5"
bedrock_basic_model_id    = "openai.gpt-5.4"
bedrock_default_backend   = "responses"

For a customer who wants xAI’s Grok *, also on Mantle:

bedrock_advanced_model_id = "xai.grok-4.3"  # dispatch path validated; full workflow validation in progress
bedrock_basic_model_id    = "xai.grok-4.3"
bedrock_default_backend   = "responses"

That is three real options. There are more — Mistral Large 3, Qwen 3 Coder 480B, DeepSeek v3, Moonshot Kimi, ZAI GLM 5, NVIDIA Nemotron Super, the GPT-OSS open-weights variants — all reachable through the same bedrock_default_backend = "responses" configuration. The dispatch layer routes whichever one you pick.

What Stays the Same When the Model Changes

When a customer swaps frontier models with OutcomeOps, the surface they care about doesn’t move:

  • The corpus stays the same. The same ADRs, runbooks, compliance frameworks, code-maps, Confluence pages, and Jira tickets feed the same retrieval pipeline.
  • The retrieval pipeline stays the same. The precision-targeted multi-source retrieval that gathers exactly the right slice of the corpus for each question runs identically regardless of which Bedrock backend the dispatch layer is targeting.
  • The role-scoped workspaces stay the same. Security has its workspace. Operations has theirs. Compliance has theirs. The angle of access each role gets into the substrate is unchanged.
  • The audit trail stays the same. Every prompt, every response, every refusal, every admin action continues to land in the customer’s own cloud account, under the customer’s own encryption keys, exportable to the customer’s SIEM via OCSF. The audit row records which model was used and which backend handled it, so the customer always knows.
  • The orchestration stays the same. The chat synthesizes the same way. The MCP interface exposes the same primitives to the same agents and IDEs.

The model changes. The Bedrock API path may change. The intelligence layer doesn’t. That is the entire point.

Why “Better Together” Actually Means Something

The marketing phrase “better together” usually means two vendors have a partnership PDF. In this case it means something more specific.

OpenAI alone — even GPT-5.5, even at Bedrock pricing — doesn’t know what your organization’s PHI-handling standard says. It doesn’t know which ADR superseded which architectural decision two years ago. It doesn’t know what your on-prem application’s integration patterns are, or which of them survive a cloud migration. The model is generic; your organization is specific.

OutcomeOps is what makes OpenAI useful inside your organization. The same way it’s what makes Claude useful inside your organization. The same way it will be what makes Grok, Mistral Large 3, Qwen 3 Coder, or whatever model your security team clears next useful inside your organization. The model brings the synthesis capability; OutcomeOps brings the substrate the synthesis runs against.

That’s “better together” as architecture, not as a logo lockup.

The Procurement Angle: Model Selection Is a Compliance Artifact

For the regulated and government buyer specifically: this is the architecture that survives a five-year procurement cycle. You are not committing to a model. You are not committing to a model vendor. You are committing to a substrate that runs in your own cloud account, against your own corpus, with your own audit trail, and that swaps frontier models in a configuration change when the procurement landscape shifts under you.

Regulated customers don’t pick models the way a startup does. The model selection is a compliance artifact. It lives in vendor risk assessments, in CISO-approved model lists, in agency-specific procurement vehicles. Three years from now, the model your security team has cleared will not be the model your security team has cleared today. The customers who built on a substrate that treats model choice as configuration will absorb that change with a tfvars update. The customers who built on a tightly-coupled stack will negotiate, integrate, and rebuild — for months.

A dispatch layer that hides the API-surface fragmentation between Converse and Mantle is what makes “we’ll meet you on whatever model your CISO clears” a credible claim instead of marketing.

If you are in defense, federal, healthcare, or financial services and you are evaluating an AI engineering platform, this is the architectural question to ask: when the model changes, what else changes with it?

For OutcomeOps customers, the honest answer is: almost nothing.

Closing

OpenAI on Bedrock changes which frontier model you can run on AWS. It doesn’t change anything about your intelligence layer — and that’s exactly the point.

The model is the commodity. The dispatch layer is plumbing. The context is the moat. The customer picks the model. That’s how it should work.

OutcomeOps: The Future of AI Engineering

Opens Substack in a new tab to confirm. No spam — unsubscribe anytime.

Run Your Intelligence Layer on Whichever Frontier Model You Choose.

OutcomeOps deploys into your own cloud account, runs against Amazon Bedrock’s Converse and Mantle (Responses) APIs through a single dispatch layer, and lets you pick between Anthropic Claude, OpenAI GPT-5, xAI Grok, Mistral, Qwen, DeepSeek, NVIDIA Nemotron, and whatever frontier model lands on Bedrock next — without rebuilding your corpus or rewriting code.

Model selection is a compliance artifact. The substrate is the moat.

Related reading