Updated 20 May 2026 • 7 mins read

Amazon Bedrock Pricing Explained

Cloud Services

Khushi Dubey
Author

Table of Content

A complete, practical guide to Amazon Bedrock, the managed generative AI service from AWS. It covers what Bedrock is, its core features, how token-based pricing works, the main billing modes, foundation model rates, the hidden costs that inflate bills, real-world cost examples, and proven ways to keep spend under control.

Generative AI has moved from experiment to line item. Teams everywhere are shipping AI features, and then watching the bill arrive. If you build on AWS, that bill very often has one name on it: Amazon Bedrock.

Bedrock makes it easy to use powerful foundation models through a single API. The pricing, though, is famously layered. Tokens, billing modes, regional surcharges, knowledge bases, agents, each is its own line item, and most teams only connect the dots after a surprising invoice.

This guide explains everything you need to know about Amazon Bedrock and its pricing. We will cover what the service does, how it charges, where costs hide, and how to keep spend predictable as you scale.

What Is Amazon Bedrock, and How Is It Priced?

Here is the quick version. Amazon Bedrock is a fully managed AWS service that gives you access to foundation models from Anthropic, Meta, Mistral, Cohere, Amazon, and others through one API. There are no servers to manage and no GPUs to provision.

Pricing is pay-as-you-go and token-based. You are billed per input and output token, and the rate depends on which model you pick and which billing mode you use. Costs range from a few dollars a month for light experiments to several thousand dollars a month once agents, knowledge bases, and high-volume inference enter the picture.

What Is Amazon Bedrock?

Amazon Bedrock is a fully managed, serverless service from AWS for building and scaling generative AI applications. Instead of provisioning GPUs or managing model infrastructure, you call a single API and get access to a wide catalog of foundation models, often shortened to FMs.

Think of Bedrock as an AI model marketplace with the plumbing already done. You pick a model, send a prompt, and pay only for what you use. The security, scaling, and availability are AWS's problem, not yours.

As of 2026, Bedrock offers access to more than 85 foundation models from providers including:

Anthropic, the Claude family of Haiku, Sonnet, and Opus models.
Amazon, the Nova family of Micro, Lite, Pro, and Premier, plus the Titan models.
Meta, the Llama family of open-weight models.
Mistral AI, Cohere, AI21 Labs, Stability AI, DeepSeek, and Google, a broad mix of text, image, and embedding models.

The appeal is simple. You can test, compare, and switch between models without integrating a separate API for each one.

Key Features of Amazon Bedrock

Bedrock is more than a model gateway. Its main capabilities include:

Unified API access, one endpoint for many model providers.
Knowledge Bases, managed Retrieval-Augmented Generation, also called RAG, using your own data.
Agents, multi-step automation that can call tools and APIs to complete tasks.
Guardrails, safety filters for blocking harmful or off-topic content.
Customization, fine-tuning and model distillation on proprietary data.
Flows and evaluation, a visual workflow builder and built-in model comparison tools.

Enterprise security, with IAM access control, PrivateLink, and encryption in transit and at rest.

Why Bedrock Pricing Matters in 2026

Understanding Bedrock costs is no longer optional. Generative AI spending is climbing fast, and most of it lands directly on a cloud bill.

According to Gartner, worldwide generative AI spending was forecast to reach roughly 644 billion dollars in 2025, a jump of more than 76 percent year over year. For teams building on Bedrock, that growth is not abstract. It shows up as a larger, faster-growing AWS invoice.

EXPERT INSIGHT
In practice, many teams spend 1.5 to 2 times their initial Bedrock estimate. The overrun rarely comes from hidden fees. It comes from costs that are simply hard to forecast up front, such as retries, experimentation, agent token amplification, and idle vector-store charges. Budgeting for those four things early is what separates a calm invoice from a stressful one.

How Amazon Bedrock Pricing Works

Bedrock pricing comes down to four cost drivers. Get these four right, and the rest of the bill makes sense.

1. Tokens, the fundamental billing unit

For text models, you pay per token. A token is a small chunk of text, and roughly 1,000 tokens equals about 750 words. Bedrock bills input tokens, which are your prompt and context, separately from output tokens, which are the model's response. Output tokens usually cost three to five times more than input tokens, because generating text is more compute-heavy than reading it.

2. Model choice, the 100x variable

Rates differ enormously across models. A lightweight model like Amazon Nova Micro can be more than 100 times cheaper per token than a frontier model like Claude Opus. Picking the right model for each task is the single biggest cost lever you have.

3. Pricing mode, how you choose to be billed

The same model can cost very different amounts depending on the billing mode. On-demand, batch, provisioned throughput, and newer flexible tiers all change the math, and we will break them down next.

4. Add-on features, the easy-to-miss layer

Knowledge Bases, Agents, Guardrails, customization, and data transfer all bill independently. They can quietly become the largest part of the invoice if no one is watching them.

Amazon Bedrock Pricing Models Explained

Bedrock offers several billing modes, and each one trades flexibility for cost or performance in a different way.

Pricing Model Comparison

Pricing Mode	How It Works	Commitment	Best For
On-Demand	Pay per token, no commitment	None	Variable, spiky, or experimental workloads
Batch	Asynchronous processing at about 50 percent off	None (24-hour turnaround)	Bulk, non-real-time jobs like summarization
Provisioned Throughput	Reserve dedicated capacity, billed hourly	1-month or 6-month	High, steady, predictable production traffic
Flex / Priority	Newer tiers trading latency for price or speed	None	Workloads tuning the latency and cost balance

As a rule of thumb, on-demand is where almost everyone starts. The moment a workload can tolerate a 24-hour wait, move it to batch for an instant 50 percent saving. Consider provisioned throughput only when a single model consistently runs above roughly 30 to 40 dollars per day on-demand.

Amazon Bedrock Foundation Model Pricing

Below are representative on-demand rates per one million tokens, based on the US East region in early-to-mid 2026. They show the scale of the differences between models. They are not a fixed price list.

Foundation Model Pricing Comparison

Foundation Model	Input (per 1M tokens)	Output (per 1M tokens)	Typical Use Case
Amazon Nova Micro	$0.035	$0.14	Classification, routing, extraction
Amazon Nova Lite	$0.06	$0.24	Lightweight chat and summarization
Amazon Nova Pro	$0.80	$3.20	Balanced general-purpose tasks
Meta Llama 3.3 70B	about $0.72	about $0.72	Open-weight general reasoning
Claude Haiku 4.5	about $1.00	about $5.00	Fast, low-cost everyday tasks
Claude Sonnet 4.5	$3.00	$15.00	Strong reasoning, production chat
Claude Opus	about $5.00	about $25.00	Complex, high-stakes reasoning
Mistral Large	about $3.00	about $9.00	Multilingual, EU data residency

Model rates and versions change often, so always confirm current numbers on the official AWS Bedrock pricing page before you build a budget. Note that image models bill per image, and embedding models bill input tokens only.

The Hidden Costs of Amazon Bedrock

Token rates are the visible part of the bill. The costs below are the ones that blindside teams most often.

Hidden AI Infrastructure Costs

Hidden Cost	What It Is	Why It Surprises Teams
Knowledge Base vector store	OpenSearch Serverless minimum of about 2 compute units	Costs around $345 a month even with zero queries
Agent token amplification	Agents make several internal model calls per request	Token usage runs 5 to 10 times the visible prompt
Cross-region inference	Routing requests to other regions for capacity	Adds a flat 10 percent surcharge on token pricing
Customization storage	Monthly storage for fine-tuned model weights	Billed continuously, often $0.02 to $0.10 per GB
Retries and experimentation	Failed requests and developer testing	Invisible in estimates, very real on the invoice
Adjacent AWS services	CloudWatch, data transfer, parsing, reranking	Stack up independently of model costs

A practical note from the field. The OpenSearch Serverless baseline is the single most common Bedrock surprise. If you are starting a new Knowledge Base in 2026, Amazon S3 Vectors, launched in late 2025, can be up to 90 percent cheaper and is often the smarter default.

Real-World Amazon Bedrock Cost Examples

Abstract token rates only become real when you apply them to actual workloads. Here are three representative scenarios.

Example 1: Early-stage experimentation

A small team testing prompts and comparing models, running 10,000 to 50,000 requests a month. The expected cost is roughly 500 to 2,000 dollars a month, mostly driven by experimentation and model selection.

Example 2: A production chatbot

A customer-facing assistant serving thousands of conversations a day on a mid-tier model. The expected cost is roughly 2,000 to 8,000 dollars a month, depending on prompt size and whether prompt caching is used.

Example 3: An agentic workflow

An automation handling 5,000 monthly runs on a frontier model with Agents and Guardrails enabled. Because agents amplify token usage, this commonly lands at 2,500 to 6,000 dollars a month, far above a naive per-request estimate. The pattern is consistent: the more orchestration you add, the wider the gap between the pricing page and the invoice.

How to Optimize Amazon Bedrock Costs

The good news is that Bedrock costs are highly controllable. Most teams can cut their bill by 40 to 60 percent without hurting quality. Here is the priority order.

Right-size the model. Start with the cheapest model that meets your quality bar. Around 40 to 60 percent of tasks do not need a frontier model.
Use Batch mode. Any workload that tolerates a 24-hour wait gets an instant 50 percent discount.
Turn on prompt caching. Reusing system prompts and context can cut cached-input costs by up to 90 percent.
Enable Intelligent Prompt Routing. It auto-routes simple queries to cheaper models in the same family, and AWS reports up to 30 percent savings.
Trim your prompts. Many prompts carry 30 to 50 percent unnecessary tokens. Shorter prompts cost less on every single call.
Cap agent loops. Set iteration limits so agentic workflows cannot spiral into runaway token consumption.

Tag and monitor spend. AWS bills at the account level, so without tagging you cannot tell which app or team is driving cost.

Bedrock vs Direct API Access: Which Is Cheaper?

A common question in 2026 is whether Bedrock charges a markup over calling model providers directly. The honest answer is that it depends on the model.

Amazon Bedrock vs Direct Provider API

Factor	Amazon Bedrock	Direct Provider API
Claude model pricing	Matches Anthropic's direct rates	Same per-token rates
Open-weight model pricing	Can run 10 to 70 percent higher	Often cheaper via specialist hosts
AWS integration	Native, with IAM, PrivateLink, VPC	Requires separate key management
Model variety	More than 85 models, one API	One provider per integration
Best for	Teams already standardized on AWS	Single-provider, non-AWS stacks

For most AWS-native teams, Bedrock's value is integration and security, not a lower sticker price. For Claude models specifically, pricing is at parity, so there is no cost penalty for the convenience

How Opslyft Helps Businesses Manage Amazon Bedrock Costs

The hardest part of Bedrock is rarely getting a model to respond. It is keeping the bill predictable as usage scales. That is where Opslyft focuses.

Opslyft is a FinOps platform that brings visibility and accountability to cloud and AI spend across AWS, including Bedrock workloads. Instead of discovering cost problems at month-end, teams see them as they happen.

In practice, Opslyft supports Bedrock cost management in a few concrete ways:

Integration that connects to your AWS accounts and surfaces Bedrock spend alongside the rest of your cloud bill.
Cost visibility and allocation that attributes token spend to the right team, app, or environment, even though AWS bills at the account level.
Optimization that flags expensive model usage, idle vector stores, and workloads that belong in batch or provisioned modes.
Anomaly detection that catches runaway agent loops and usage spikes before they become an invoice surprise.
Consulting and support for forecasting, governance, and building a sustainable FinOps practice around AI spend.

The goal is simple. It makes Bedrock spend something you plan for, not something you explain after the fact.

Conclusion

Amazon Bedrock makes powerful foundation models genuinely easy to use. The trade-off is a bill with many moving parts, and that bill rewards the teams who pay attention to it.

If you master the four cost drivers, keep an eye on the hidden costs, and right-size every model choice, Bedrock spend becomes predictable instead of alarming. In AI, the cheapest token is the one you never needed to send.

FAQs

What is Amazon Bedrock used for?

Amazon Bedrock is used to build and scale generative AI applications such as chatbots, content generation, summarization, RAG search, and AI agents. It does this through a single managed API, without you having to provision or manage any infrastructure.

How is Amazon Bedrock priced?

Bedrock uses pay-as-you-go, token-based pricing. You pay separately for input and output tokens, at a rate set by the model you choose. Billing modes include on-demand, batch, which is about 50 percent cheaper, and provisioned throughput, plus optional charges for features like Knowledge Bases and Agents.

Does Amazon Bedrock have a free tier?

Bedrock has no permanent free tier, so you pay from the first API call. New AWS accounts do receive starter credits that work across many AWS services, including Bedrock, but those credits expire after a few months

Why is my Amazon Bedrock bill higher than expected?

The most common reasons are agent token amplification, which can be 5 to 10 times the visible tokens, idle Knowledge Base vector-store charges, cross-region surcharges, retries and experimentation, and defaulting to expensive frontier models for simple tasks.

Is Amazon Bedrock cheaper than calling AI APIs directly?

For Claude models, Bedrock pricing matches Anthropic's direct API, so there is no premium. For open-weight models like Llama, Bedrock can cost more than specialist providers. Bedrock's main advantage is deep AWS integration, not lower per-token rates.

Related Blogs

The True Cost of GPT-4 Tokens: Why AI Bills Are Higher

AI Cost Optimization: A Simple Guide by Opslyft

AWS Cost Optimization with FinOps: Best Practices, Tools, and Real-World Benefits

Cloud waste? Bench it. Opslyft puts the right players on the field.