Agents

AWS Strands Agents Review: Model-First Agent SDK Worth Using?

For developers tired of wiring up agent workflows manually, Strands delivers. Just know what you're trading away.

Oliver Senti
Senior AI Editor
December 19, 2025 · 7 min read

QUICK VERDICT

Rating 7.5/10
Best For Developers on AWS who want fast agent prototypes without workflow orchestration headaches
Pricing SDK is free; pay for model inference (Claude 4 Sonnet on Bedrock: ~$0.003/1K input, $0.015/1K output tokens)
Strength Under 10 lines of code to a working agent with automatic tool selection
Weakness Full context sent on every tool call drives up costs for multi-step tasks

AWS released Strands Agents as an open-source SDK in May 2025, and it hit 1 million PyPI downloads within four months. The pitch is straightforward: instead of defining explicit workflows for your agents, hand that job to the model. You provide tools and a goal. The LLM figures out the steps.

I spent three weeks building agents with Strands to see if the simplicity holds up under real use. The short version: it does what it promises, faster than any framework I've used. The longer version involves some caveats about cost, control, and when you should reach for something else entirely.

What I Tested

I built three agents over the testing period. A document analyzer that pulls files from S3, parses them, and answers questions. A code review assistant that scans GitHub repos and flags potential issues. And a simple research agent using web search tools.

All ran against Claude 4 Sonnet on Amazon Bedrock, the default configuration. I tracked token usage, completion times, and the number of tool calls per task. I also tried switching to Ollama with Llama 3 locally to test the model-agnostic claims.

What I did not test: multi-agent orchestration patterns (added in version 1.0), the TypeScript SDK (still in preview), or production deployment on Lambda/ECS. Those would each need their own deep dive.

Features That Actually Matter

The core loop is dead simple. Create an agent, give it tools, call it with a prompt. A few lines of Python get you something functional:

from strands import Agent
from strands_tools import calculator
agent = Agent(tools=[calculator])
agent("What's the square root of 1764?")

The agent figures out it needs the calculator, calls it, returns "42." No workflow definition. No chain configuration. The @tool decorator turns any Python function into something the model can invoke.
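To show the idea behind the decorator without requiring the SDK, here is a minimal stand-in: the real @tool in Strands does more (it registers the function with the agent's tool registry), but the core move is the same — the function's name, docstring, and signature become the metadata the model uses to decide when to call it. The `tool_spec` attribute below is my own illustrative name, not Strands' API.

```python
import inspect

# Stand-in for a @tool-style decorator (illustrative only, not the Strands
# implementation). It captures the metadata a model would see when choosing
# tools: the function name, its docstring, and its parameter names.
def tool(func):
    func.tool_spec = {
        "name": func.__name__,
        "description": inspect.getdoc(func),
        "parameters": list(inspect.signature(func).parameters),
    }
    return func

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

print(word_count.tool_spec["name"])             # word_count
print(word_count("Strands keeps this simple"))  # 4
```

The docstring doubles as the tool description the model reads, which is why (as discussed below) vague or overlapping descriptions lead to poor tool selection.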

Where this pays off is iteration speed. I had my document analyzer working in about 20 minutes. The equivalent in LangChain took me most of an afternoon the last time I built something similar. Strands gets out of your way.

MCP support is native and works well. I connected an AWS documentation MCP server in maybe five minutes. The agent automatically discovered available tools and used them appropriately. This interoperability matters if you're planning to build agents that talk to other agents or external tool ecosystems.

The observability story is solid. OpenTelemetry integration ships by default. Every reasoning step, tool call, and token count shows up in traces. For a preview SDK, the production instrumentation is more complete than I expected.

Where It Falls Short

The model-driven philosophy has a cost problem.

Every time the agent calls a tool, Strands sends the full conversation context back to the model. For my document analyzer, a single question generated 8 tool calls. Each call included the system prompt, user prompt, and all previous tool results. By the third iteration, I was sending 15,000+ tokens per model invocation.

AWS documentation acknowledges this is "by design" to maintain coherence. They suggest context management strategies like sliding windows or summarization. These help, but they're workarounds for an architecture that trades cost efficiency for simplicity.
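A sliding window is easy to sketch in plain Python. This is not the Strands context-manager API, just the shape of the workaround: keep the system prompt, drop everything but the most recent messages, and stop re-sending the whole history on every tool call.

```python
# Minimal sliding-window context trimmer (illustrative, not the Strands API):
# always keep system messages, but cap the rest at the N most recent entries.
def sliding_window(messages, window=6):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-window:]

# Simulate a conversation that accumulates 10 tool-call results.
history = [{"role": "system", "content": "You analyze documents."}]
for i in range(10):
    history.append({"role": "tool", "content": f"result {i}"})

trimmed = sliding_window(history, window=4)
print(len(history), "->", len(trimmed))  # 11 -> 5
```

The trade-off is exactly the one AWS's docs imply: anything outside the window is invisible to the model, so a tool result from step 2 can't inform a decision at step 9 unless you summarize it back in.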

My research agent burned through $4.30 in a single 45-minute session of fairly light use. The equivalent tasks in a workflow-based framework would have cost maybe $1.50, because I'd control exactly what context goes where.

No human-in-the-loop primitives. LangGraph and CrewAI both let you inject human approval at any point in a workflow. Strands expects you to build that as a custom tool if you need it. For compliance-heavy use cases, this matters.

The model picks the tools, and sometimes it picks wrong. When tool descriptions overlap (I had both a "search documents" and "find files" tool), the model would sometimes call both redundantly or pick the less appropriate one. The fix is better tool descriptions, but debugging why the model made a particular choice requires digging through traces.

Throttling caught me off guard. AWS Bedrock has rate limits, and Strands agents that make rapid sequential tool calls hit them fast. I got ThrottlingException errors multiple times during testing. The SDK retries automatically, but my agents would hang for 10-15 seconds waiting.
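If the SDK's automatic retries hang longer than you'd like, the standard mitigation is wrapping your agent invocations in your own exponential backoff with jitter and a hard cap. The sketch below is generic Python: `ThrottlingException` and `flaky_invoke` are stand-ins, not Strands or boto3 objects.

```python
import random
import time

# Stand-in for the Bedrock throttling error (illustrative only).
class ThrottlingException(Exception):
    pass

def with_backoff(invoke, max_retries=4, base_delay=0.5):
    """Call invoke(), retrying on throttling with exponential backoff + jitter."""
    for attempt in range(max_retries + 1):
        try:
            return invoke()
        except ThrottlingException:
            if attempt == max_retries:
                raise
            # Delays grow 0.5s, 1s, 2s, 4s..., randomized to avoid thundering herd.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))

calls = {"n": 0}
def flaky_invoke():
    calls["n"] += 1
    if calls["n"] < 3:  # fail twice, then succeed
        raise ThrottlingException()
    return "agent response"

print(with_backoff(flaky_invoke, base_delay=0.01))  # agent response
```

Capping `max_retries` turns a 15-second silent hang into a fast, visible failure you can surface to the user instead.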

Pricing Reality

Strands itself is free and Apache-2.0 licensed.

The actual cost is model inference. Using Claude 4 Sonnet on Amazon Bedrock (the default), you're looking at roughly $0.003 per 1,000 input tokens and $0.015 per 1,000 output tokens. A typical agent interaction with 3-5 tool calls runs $0.05-0.15 depending on context size.
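The back-of-envelope math behind that range is simple. The token counts below are illustrative, not measured; the rates are the Bedrock Claude Sonnet prices quoted above.

```python
# Per-token rates from the quoted Bedrock pricing:
# $0.003 per 1K input tokens, $0.015 per 1K output tokens.
INPUT_RATE = 0.003 / 1000
OUTPUT_RATE = 0.015 / 1000

def interaction_cost(tool_calls, input_tokens_per_call, output_tokens_per_call):
    """Cost of one agent interaction that re-sends context on every tool call."""
    per_call = (input_tokens_per_call * INPUT_RATE
                + output_tokens_per_call * OUTPUT_RATE)
    return tool_calls * per_call

# Example: 4 tool calls, ~6K input tokens each (context grows as results
# accumulate), ~400 output tokens each.
print(f"${interaction_cost(4, 6000, 400):.3f}")  # $0.096
```

Note that the input side dominates: because the full context rides along on every call, input tokens scale with both conversation length and the number of tool calls, while output stays roughly flat.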

That adds up. My document analyzer averaged $0.12 per question answered. Across a team using it regularly, that's a meaningful line item.

You can switch to cheaper models. Amazon Nova Pro costs a fraction of Claude pricing. Ollama with local Llama models costs nothing beyond compute. But the model-driven approach depends on model quality. I tried Nova Pro for my research agent and tool selection accuracy dropped noticeably.

For comparison, if you already have a LangChain setup and just need to add agent capabilities, sticking with that ecosystem probably costs less in inference because you control context more precisely.

vs. The Competition

LangChain/LangGraph gives you control at the cost of complexity. You define nodes, edges, state machines. It's verbose but predictable. If you need guaranteed workflow execution order or human approval gates, LangGraph is still the better choice.

CrewAI focuses on role-based agent teams. Agents have personas, goals, backstories. It's optimized for multi-agent collaboration where you want agents with distinct responsibilities. Strands can do multi-agent now (version 1.0), but it's not the primary design focus.

OpenAI Agents SDK is newer and tightly integrated with OpenAI's models. If you're committed to GPT-4 and want something similar to Strands' philosophy, it's worth evaluating. Not an option if you need AWS integration or model flexibility.

The real differentiator for Strands is AWS ecosystem integration. If you're already on Bedrock, already using AWS services, already deploying to Lambda, Strands fits into that stack with minimal friction. The same code runs locally during development and on Lambda in production.

What I Liked

  • Working agent in under 10 lines of code, genuinely
  • Native MCP support that just works
  • Model-agnostic design (tested with Bedrock and Ollama successfully)
  • OpenTelemetry observability built in from the start
  • Tool creation via Python decorators takes seconds

What Needs Work

  • Full context on every tool call makes multi-step tasks expensive
  • No built-in human-in-the-loop patterns
  • Throttling issues with Bedrock rate limits during rapid tool calls
  • Tool selection struggles when descriptions overlap

The Verdict

Strands Agents is the fastest path I've found from "I need an agent" to "I have a working agent." For prototypes, internal tools, and AWS-native teams that value development speed over inference cost optimization, it's a strong choice.

Skip it if you need precise workflow control, human approval gates, or you're cost-sensitive on a high-volume use case. LangGraph handles those scenarios better.

If you're building on AWS and want to ship an agent this week instead of this month, Strands is worth the trade-offs.

Tags: AI agents, AWS, MCP, agent framework, open source AI
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.


