Module 9: From Toy to System • Lesson 4 of 6
Cost Control
AI agents can get expensive. Here's how to manage costs.
Understanding Costs
API Costs
- Input tokens: What you send (prompts, context)
- Output tokens: What the model generates
- Tool calls: Tool definitions and results count as tokens, and some hosted tools (such as web search) are billed per use
Cost by Model (per 1M tokens)
| Model | Input | Output |
|---|---|---|
| Claude Opus | $15 | $75 |
| Claude Sonnet | $3 | $15 |
| Claude Haiku | $0.25 | $1.25 |
| GPT-4o | $2.50 | $10 |
| GPT-4o-mini | $0.15 | $0.60 |
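To make the table concrete, here is a minimal sketch of estimating the cost of one request from its token counts. The dictionary keys are illustrative labels rather than official API model IDs, and the prices are simply copied from the table, so treat them as a snapshot.

```python
# Prices in USD per 1M tokens, copied from the table above.
# Keys are illustrative labels, not official API model IDs.
PRICES = {
    "claude-opus": (15.00, 75.00),
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku": (0.25, 1.25),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example request: 20,000 input tokens and 1,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${estimate_cost(model, 20_000, 1_000):.4f}")
```

Note how input tokens dominate in this example: agents that reload large contexts on every turn pay mostly for what they send, not for what the model writes.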
Cost Reduction Strategies
1. Right-Size Your Models
Don't use Opus for everything:
# Coordinator: Smart (but expensive)
default: claude-opus-4
# Background tasks: Good enough (cheaper)
cron:
  model: claude-sonnet-4
# Simple tasks: Fast (very cheap)
simple:
  model: claude-haiku
Potential savings: 60-80%
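The same routing idea can be sketched in code. This is only an illustration: the task names mirror the config above, the input prices are taken from the cost table, and the token mix in the example is a made-up assumption used to show how a blended price is computed.

```python
# Task-to-model routing that mirrors the config above.
MODEL_BY_TASK = {"cron": "claude-sonnet-4", "simple": "claude-haiku"}
DEFAULT_MODEL = "claude-opus-4"

# Input prices (USD per 1M tokens) from the cost table; mapping them to
# these model names is an assumption made for this sketch.
INPUT_PRICE = {"claude-opus-4": 15.00, "claude-sonnet-4": 3.00, "claude-haiku": 0.25}


def pick_model(task_type: str) -> str:
    """Return the model for a task type, falling back to the default."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)


def blended_input_price(token_share: dict[str, float]) -> float:
    """Average input price per 1M tokens, given each task type's share of tokens."""
    return sum(INPUT_PRICE[pick_model(task)] * share for task, share in token_share.items())


if __name__ == "__main__":
    # Hypothetical mix: 20% of tokens in main sessions, 50% cron, 30% simple tasks.
    mix = {"main": 0.2, "cron": 0.5, "simple": 0.3}
    blended = blended_input_price(mix)
    savings = 1 - blended / INPUT_PRICE[DEFAULT_MODEL]
    print(f"Blended input price: ${blended:.3f} per 1M tokens")
    print(f"Savings versus sending everything to the default model: {savings:.0%}")
```

How much you actually save depends entirely on your own token mix, so measure it before relying on any estimate.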
2. Minimize Context
Every token counts:
# Bad: Loading everything
Load: SOUL.md, USER.md, MEMORY.md, all daily logs,
all project files, all people files...
# Good: Load on demand
Always: SOUL.md, USER.md, MEMORY.md (index only)
On demand: Specific detail files when needed
3. Cache When Possible
If you're doing the same lookups:
- Store results in files
- Check cache before calling APIs
- Set reasonable TTLs
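The three steps above can be sketched as a small file-backed cache. The function name, the `.cache` directory, and the one-hour default TTL are all assumptions chosen for illustration, and the cached value must be JSON-serializable.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Callable


def cached_lookup(
    key: str,
    fetch: Callable[[], Any],
    ttl_seconds: float = 3600,
    cache_dir: Path = Path(".cache"),  # hypothetical location
) -> Any:
    """Return a cached value for `key` if it is fresher than the TTL, else call `fetch`."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so any string is safe to use as a file name.
    path = cache_dir / (hashlib.sha256(key.encode()).hexdigest() + ".json")

    if path.exists():
        entry = json.loads(path.read_text())
        if time.time() - entry["stored_at"] < ttl_seconds:
            return entry["value"]  # cache hit: no API call needed

    value = fetch()  # cache miss or stale entry: pay for the call once
    path.write_text(json.dumps({"stored_at": time.time(), "value": value}))
    return value
```

Pass the expensive call as `fetch`, for example `cached_lookup("weather:berlin", lambda: call_weather_api("berlin"))`, where `call_weather_api` stands in for whatever lookup you are repeating.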
4. Set Budgets
Implement spending limits:
# Example budget config
limits:
  daily: $10
  monthly: $200
  perSession: $1
Get alerts before hitting limits.
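One way to enforce limits like these is a small guard that records each charge, warns as a limit gets close, and refuses charges that would go over. The class names and the 80% alert threshold are assumptions for this sketch, and resetting the counters at day, month, and session boundaries is left out.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push spending past a limit."""


class BudgetGuard:
    def __init__(self, limits: dict[str, float], alert_fraction: float = 0.8):
        self.limits = limits  # USD limit per scope, e.g. {"daily": 10.0}
        self.spent = {scope: 0.0 for scope in limits}
        self.alert_fraction = alert_fraction  # assumed threshold: alert at 80%

    def record(self, cost: float) -> list[str]:
        """Record a charge and return alerts; raise if any limit would be exceeded."""
        # Check every scope first so a rejected charge changes nothing.
        for scope, limit in self.limits.items():
            if self.spent[scope] + cost > limit:
                raise BudgetExceeded(f"{scope} limit of ${limit:.2f} would be exceeded")

        alerts = []
        for scope, limit in self.limits.items():
            self.spent[scope] += cost
            if self.spent[scope] >= self.alert_fraction * limit:
                alerts.append(f"{scope}: ${self.spent[scope]:.2f} of ${limit:.2f} used")
        return alerts


if __name__ == "__main__":
    guard = BudgetGuard({"daily": 10.0, "monthly": 200.0, "perSession": 1.0})
    print(guard.record(0.50))  # well under every limit, so no alerts
    print(guard.record(0.40))  # session spend reaches $0.90, so an alert is returned
```

Call `record` with the estimated cost before or after each API call, and stop the session when `BudgetExceeded` is raised.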
5. Monitor Usage
Track where your money goes:
- Which sessions cost most?
- What tasks are expensive?
- Are there runaway processes?
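Answering these questions mostly comes down to logging a cost per call and grouping the log. A minimal sketch, assuming each log record is a dict with hypothetical `session`, `task`, and `cost` fields:

```python
from collections import defaultdict


def summarize_usage(records: list[dict], group_by: str) -> dict[str, float]:
    """Sum cost per value of `group_by`, most expensive group first."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[group_by]] += record["cost"]
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    log = [
        {"session": "main", "task": "chat", "cost": 1.20},
        {"session": "main", "task": "research", "cost": 0.60},
        {"session": "cron", "task": "research", "cost": 0.40},
    ]
    for name, total in summarize_usage(log, "session").items():
        print(f"{name}: ${total:.2f}")
    for name, total in summarize_usage(log, "task").items():
        print(f"{name}: ${total:.2f}")
```

A group whose total keeps climbing while its work stays the same is a good candidate for a runaway process.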
Cost Monitoring Example
Here is what my setup tracks:
Daily Usage Report
- Main sessions: $2.40 (Opus)
- Cron jobs: $0.80 (Sonnet)
- Sub-agents: $0.30 (Sonnet)
Total: $3.50/day
Monthly estimate: ~$100
Red Flags
Watch for:
- Sudden cost spikes
- Infinite loops (keeps calling APIs)
- Unnecessary tool calls
- Oversized contexts
- Wrong model for task
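Two of these red flags are easy to check automatically. The sketch below flags a daily cost spike against recent history and caps the number of API calls per session as a cheap guard against infinite loops. The names, the 2x spike factor, and the 50-call cap are assumptions to tune for your own workload.

```python
from statistics import mean


def is_cost_spike(history: list[float], today: float, factor: float = 2.0) -> bool:
    """True if today's cost exceeds `factor` times the average of previous days."""
    if not history:
        return False  # nothing to compare against yet
    return today > factor * mean(history)


class CallLimiter:
    """Stops a session once it makes more than `max_calls` API calls."""

    def __init__(self, max_calls: int = 50):
        self.max_calls = max_calls
        self.calls = 0

    def check(self) -> None:
        """Call once before every API request."""
        self.calls += 1
        if self.calls > self.max_calls:
            raise RuntimeError(f"more than {self.max_calls} API calls in one session")


if __name__ == "__main__":
    print(is_cost_spike([3.50, 3.40, 3.60], today=9.00))  # True
    print(is_cost_spike([3.50, 3.40, 3.60], today=4.00))  # False
```

A hard cap is blunt, but it turns a silent runaway loop into a visible error.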
Budget Template
| Category | Model | Daily Budget |
|---|---|---|
| Main chat | Opus | $3 |
| Cron jobs | Sonnet | $1 |
| Research | Sonnet | $1 |
| Quick tasks | Haiku | $0.50 |
| Total | | $5.50/day |
Adjust based on your actual usage patterns.