Loading...


Updated 20 May 2026 • 7 mins read

A complete, practical guide to Amazon Bedrock, the managed generative AI service from AWS. It covers what Bedrock is, its core features, how token-based pricing works, the main billing modes, foundation model rates, the hidden costs that inflate bills, real-world cost examples, and proven ways to keep spend under control.
Generative AI has moved from experiment to line item. Teams everywhere are shipping AI features, and then watching the bill arrive. If you build on AWS, that bill very often has one name on it: Amazon Bedrock.
Bedrock makes it easy to use powerful foundation models through a single API. The pricing, though, is famously layered. Tokens, billing modes, regional surcharges, knowledge bases, agents, each is its own line item, and most teams only connect the dots after a surprising invoice.
This guide explains everything you need to know about Amazon Bedrock and its pricing. We will cover what the service does, how it charges, where costs hide, and how to keep spend predictable as you scale.
Here is the quick version. Amazon Bedrock is a fully managed AWS service that gives you access to foundation models from Anthropic, Meta, Mistral, Cohere, Amazon, and others through one API. There are no servers to manage and no GPUs to provision.
Pricing is pay-as-you-go and token-based. You are billed per input and output token, and the rate depends on which model you pick and which billing mode you use. Costs range from a few dollars a month for light experiments to several thousand dollars a month once agents, knowledge bases, and high-volume inference enter the picture.
Amazon Bedrock is a fully managed, serverless service from AWS for building and scaling generative AI applications. Instead of provisioning GPUs or managing model infrastructure, you call a single API and get access to a wide catalog of foundation models, often shortened to FMs.
Think of Bedrock as an AI model marketplace with the plumbing already done. You pick a model, send a prompt, and pay only for what you use. The security, scaling, and availability are AWS's problem, not yours.
As of 2026, Bedrock offers access to more than 85 foundation models from providers including:
The appeal is simple. You can test, compare, and switch between models without integrating a separate API for each one.
Bedrock is more than a model gateway. Its main capabilities include:
Enterprise security, with IAM access control, PrivateLink, and encryption in transit and at rest.
Understanding Bedrock costs is no longer optional. Generative AI spending is climbing fast, and most of it lands directly on a cloud bill.
According to Gartner, worldwide generative AI spending was forecast to reach roughly 644 billion dollars in 2025, a jump of more than 76 percent year over year. For teams building on Bedrock, that growth is not abstract. It shows up as a larger, faster-growing AWS invoice.
EXPERT INSIGHT
In practice, many teams spend 1.5 to 2 times their initial Bedrock estimate. The overrun rarely comes from hidden fees. It comes from costs that are simply hard to forecast up front, such as retries, experimentation, agent token amplification, and idle vector-store charges. Budgeting for those four things early is what separates a calm invoice from a stressful one.
Bedrock pricing comes down to four cost drivers. Get these four right, and the rest of the bill makes sense.
For text models, you pay per token. A token is a small chunk of text, and roughly 1,000 tokens equals about 750 words. Bedrock bills input tokens, which are your prompt and context, separately from output tokens, which are the model's response. Output tokens usually cost three to five times more than input tokens, because generating text is more compute-heavy than reading it.
Rates differ enormously across models. A lightweight model like Amazon Nova Micro can be more than 100 times cheaper per token than a frontier model like Claude Opus. Picking the right model for each task is the single biggest cost lever you have.
The same model can cost very different amounts depending on the billing mode. On-demand, batch, provisioned throughput, and newer flexible tiers all change the math, and we will break them down next.
Knowledge Bases, Agents, Guardrails, customization, and data transfer all bill independently. They can quietly become the largest part of the invoice if no one is watching them.
Bedrock offers several billing modes, and each one trades flexibility for cost or performance in a different way.
| Pricing Mode | How It Works | Commitment | Best For |
|---|---|---|---|
| On-Demand | Pay per token, no commitment | None | Variable, spiky, or experimental workloads |
| Batch | Asynchronous processing at about 50 percent off | None (24-hour turnaround) | Bulk, non-real-time jobs like summarization |
| Provisioned Throughput | Reserve dedicated capacity, billed hourly | 1-month or 6-month | High, steady, predictable production traffic |
| Flex / Priority | Newer tiers trading latency for price or speed | None | Workloads tuning the latency and cost balance |
As a rule of thumb, on-demand is where almost everyone starts. The moment a workload can tolerate a 24-hour wait, move it to batch for an instant 50 percent saving. Consider provisioned throughput only when a single model consistently runs above roughly 30 to 40 dollars per day on-demand.
Below are representative on-demand rates per one million tokens, based on the US East region in early-to-mid 2026. They show the scale of the differences between models. They are not a fixed price list.
| Foundation Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical Use Case |
|---|---|---|---|
| Amazon Nova Micro | $0.035 | $0.14 | Classification, routing, extraction |
| Amazon Nova Lite | $0.06 | $0.24 | Lightweight chat and summarization |
| Amazon Nova Pro | $0.80 | $3.20 | Balanced general-purpose tasks |
| Meta Llama 3.3 70B | about $0.72 | about $0.72 | Open-weight general reasoning |
| Claude Haiku 4.5 | about $1.00 | about $5.00 | Fast, low-cost everyday tasks |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Strong reasoning, production chat |
| Claude Opus | about $5.00 | about $25.00 | Complex, high-stakes reasoning |
| Mistral Large | about $3.00 | about $9.00 | Multilingual, EU data residency |
Model rates and versions change often, so always confirm current numbers on the official AWS Bedrock pricing page before you build a budget. Note that image models bill per image, and embedding models bill input tokens only.
Abstract token rates only become real when you apply them to actual workloads. Here are three representative scenarios.
A small team testing prompts and comparing models, running 10,000 to 50,000 requests a month. The expected cost is roughly 500 to 2,000 dollars a month, mostly driven by experimentation and model selection.
A customer-facing assistant serving thousands of conversations a day on a mid-tier model. The expected cost is roughly 2,000 to 8,000 dollars a month, depending on prompt size and whether prompt caching is used.
An automation handling 5,000 monthly runs on a frontier model with Agents and Guardrails enabled. Because agents amplify token usage, this commonly lands at 2,500 to 6,000 dollars a month, far above a naive per-request estimate. The pattern is consistent: the more orchestration you add, the wider the gap between the pricing page and the invoice.
The good news is that Bedrock costs are highly controllable. Most teams can cut their bill by 40 to 60 percent without hurting quality. Here is the priority order.
Tag and monitor spend. AWS bills at the account level, so without tagging you cannot tell which app or team is driving cost.
A common question in 2026 is whether Bedrock charges a markup over calling model providers directly. The honest answer is that it depends on the model.
| Factor | Amazon Bedrock | Direct Provider API |
|---|---|---|
| Claude model pricing | Matches Anthropic's direct rates | Same per-token rates |
| Open-weight model pricing | Can run 10 to 70 percent higher | Often cheaper via specialist hosts |
| AWS integration | Native, with IAM, PrivateLink, VPC | Requires separate key management |
| Model variety | More than 85 models, one API | One provider per integration |
| Best for | Teams already standardized on AWS | Single-provider, non-AWS stacks |
For most AWS-native teams, Bedrock's value is integration and security, not a lower sticker price. For Claude models specifically, pricing is at parity, so there is no cost penalty for the convenience
The hardest part of Bedrock is rarely getting a model to respond. It is keeping the bill predictable as usage scales. That is where Opslyft focuses.
Opslyft is a FinOps platform that brings visibility and accountability to cloud and AI spend across AWS, including Bedrock workloads. Instead of discovering cost problems at month-end, teams see them as they happen.
In practice, Opslyft supports Bedrock cost management in a few concrete ways:
The goal is simple. It makes Bedrock spend something you plan for, not something you explain after the fact.
Amazon Bedrock makes powerful foundation models genuinely easy to use. The trade-off is a bill with many moving parts, and that bill rewards the teams who pay attention to it.
If you master the four cost drivers, keep an eye on the hidden costs, and right-size every model choice, Bedrock spend becomes predictable instead of alarming. In AI, the cheapest token is the one you never needed to send.
Amazon Bedrock is used to build and scale generative AI applications such as chatbots, content generation, summarization, RAG search, and AI agents. It does this through a single managed API, without you having to provision or manage any infrastructure.
Bedrock uses pay-as-you-go, token-based pricing. You pay separately for input and output tokens, at a rate set by the model you choose. Billing modes include on-demand, batch, which is about 50 percent cheaper, and provisioned throughput, plus optional charges for features like Knowledge Bases and Agents.
Bedrock has no permanent free tier, so you pay from the first API call. New AWS accounts do receive starter credits that work across many AWS services, including Bedrock, but those credits expire after a few months
The most common reasons are agent token amplification, which can be 5 to 10 times the visible tokens, idle Knowledge Base vector-store charges, cross-region surcharges, retries and experimentation, and defaulting to expensive frontier models for simple tasks.
For Claude models, Bedrock pricing matches Anthropic's direct API, so there is no premium. For open-weight models like Llama, Bedrock can cost more than specialist providers. Bedrock's main advantage is deep AWS integration, not lower per-token rates.