Adding Guardrails and Safety Limits
Guardrails stop your agent from wasting money, looping forever, or taking dangerous actions. Add them before you deploy.
Guardrail 1: Max Iteration Limit
Stop the agent after N tool calls.
Why: Prevents infinite loops and runaway costs.
How much is enough?
- Simple task (answer a question): 5 steps
- Medium task (research something): 10 steps
- Complex task (multi-step workflow): 20 steps
- Rarely need more than 30 steps
Code example:
max_iterations = 15

for iteration in range(max_iterations):
    tool_call = model.plan()
    if tool_call is None:  # the model is done; no more tools to call
        break
    result = call_tool(tool_call)
    agent_response += format_result(result)
else:
    # The loop ran out of iterations without finishing
    return {"status": "max_iterations_reached", "response": agent_response}
Guardrail 2: Cost Cap
Stop the agent when it hits a cost limit.
Why: Prevents $500 bills from a single agent run.
How much is enough?
- Free tier experiments: $0.10
- User-facing feature: $0.50 to $5.00
- Internal tool: $10 to $50
Code example:
from openai import OpenAI

max_cost = 1.00  # $1 per run
total_cost = 0.0
client = OpenAI()

def track_cost(tokens_used, model):
    global total_cost
    # GPT-4: $0.03 per 1K input tokens, $0.06 per 1K output tokens;
    # $0.045 per 1K is a blended estimate across input and output.
    cost = (tokens_used / 1000) * 0.045
    total_cost += cost
    if total_cost > max_cost:
        raise RuntimeError(f"Cost limit exceeded: ${total_cost:.2f}")
    return total_cost

try:
    while True:
        response = client.chat.completions.create(...)
        cost = track_cost(response.usage.total_tokens, "gpt-4")
        print(f"Cost so far: ${cost:.2f}")
except RuntimeError as e:
    print(f"Agent stopped: {e}")
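To pick a sensible cap, estimate cost per run from tokens per step. A rough sketch, assuming about 2,000 tokens per step (a hypothetical figure) and the blended $0.045-per-1K rate used above:

```python
tokens_per_step = 2000           # rough assumption for a medium task
blended_rate = 0.045 / 1000      # dollars per token (blended GPT-4 estimate)

# Estimated cost for simple, medium, and complex runs
for steps in (5, 15, 30):
    cost = steps * tokens_per_step * blended_rate
    print(f"{steps:2d} steps ~ ${cost:.2f}")
# 5 steps ~ $0.45, 15 steps ~ $1.35, 30 steps ~ $2.70
```

Under these assumptions, a $1 cap cuts off a medium run partway through, so size the cap to the largest run you actually want to allow.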
Guardrail 3: Tool Call Rate Limiting
Cap the number of times the agent can call each tool in a single run.
Why: Prevents the agent from spamming the same API.
Example limits:
- Search tool: 5 calls per run
- External API: 3 calls per run
- Email tool: 1 call per run
Code example:
from collections import defaultdict

tool_call_limits = {
    "search": 5,
    "fetch_webpage": 5,
    "send_email": 1,
    "database_query": 10,
}

tool_call_count = defaultdict(int)

def call_tool_with_rate_limit(tool_name, params):
    if tool_name not in tool_call_limits:
        return call_tool(tool_name, params)
    tool_call_count[tool_name] += 1
    limit = tool_call_limits[tool_name]
    if tool_call_count[tool_name] > limit:
        return {
            "error": f"Rate limit exceeded for {tool_name} ({limit} calls max)"
        }
    return call_tool(tool_name, params)
Guardrail 4: Output Validation
Check the tool output before returning it to the model.
Why: Catches bad data before the model acts on it.
What to validate:
- Is the output the expected type? (dict, list, string)
- Is the output non-empty?
- Does it have required fields?
- Is the data reasonable? (e.g., price is positive)
Code example:
def validate_output(tool_name, output):
    # Check it is not None
    if output is None:
        return {"error": f"{tool_name} returned None"}
    # Check type
    if tool_name == "search" and not isinstance(output, list):
        return {"error": f"{tool_name} should return list, got {type(output)}"}
    # Check required fields
    if tool_name == "get_user" and "id" not in output:
        return {"error": f"{tool_name} missing required field 'id'"}
    # Check data is reasonable
    if tool_name == "get_price" and output.get("price", 0) < 0:
        return {"error": f"{tool_name} returned negative price"}
    return output
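To see the validator in action, here is a minimal self-contained run against some deliberately bad outputs (the sample data is hypothetical):

```python
def validate_output(tool_name, output):
    # Trimmed copy of the validator above: None check, type check, range check
    if output is None:
        return {"error": f"{tool_name} returned None"}
    if tool_name == "search" and not isinstance(output, list):
        return {"error": f"{tool_name} should return list, got {type(output).__name__}"}
    if tool_name == "get_price" and output.get("price", 0) < 0:
        return {"error": f"{tool_name} returned negative price"}
    return output

print(validate_output("search", ["result one"]))    # good data passes through unchanged
print(validate_output("search", "not a list"))      # wrong type -> error dict
print(validate_output("get_price", {"price": -5}))  # unreasonable value -> error dict
```

Returning an error dict instead of raising lets the model see the failure and try a different tool, rather than crashing the whole run.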
The Circuit Breaker Pattern
Stop the agent if it behaves badly.
Pattern: Track symptoms (looping, high cost, high token usage). When a symptom is detected, break the circuit and stop the agent.
Symptoms to watch:
- Same tool called 3+ times in a row
- Total tokens exceed 50% of context window
- Total cost exceeds budget
- Iteration count exceeds max
Code example:
class CircuitBreaker:
    def __init__(self, max_iterations=15, max_cost=1.0, max_consecutive_same_tool=3):
        self.max_iterations = max_iterations
        self.max_cost = max_cost
        self.max_consecutive_same_tool = max_consecutive_same_tool
        self.iteration_count = 0
        self.total_cost = 0.0
        self.consecutive_same_tool = 0
        self.last_tool = None

    def check_and_update(self, tool_name, cost):
        self.iteration_count += 1
        self.total_cost += cost
        # Check iteration limit
        if self.iteration_count > self.max_iterations:
            return False, f"Max iterations ({self.max_iterations}) exceeded"
        # Check cost limit
        if self.total_cost > self.max_cost:
            return False, f"Cost limit (${self.max_cost}) exceeded"
        # Check for looping (same tool called repeatedly)
        if tool_name == self.last_tool:
            self.consecutive_same_tool += 1
        else:
            self.consecutive_same_tool = 1
            self.last_tool = tool_name
        # >= so the breaker trips at 3 in a row, matching the symptom list
        if self.consecutive_same_tool >= self.max_consecutive_same_tool:
            return False, f"Agent called {tool_name} {self.consecutive_same_tool} times in a row (looping)"
        return True, "OK"
# Usage
cb = CircuitBreaker(max_iterations=15, max_cost=1.0, max_consecutive_same_tool=3)

for iteration in range(100):
    tool_call = model.plan()
    result = call_tool(tool_call)
    cost = calculate_cost(result)
    ok, message = cb.check_and_update(tool_call["name"], cost)
    if not ok:
        print(f"Circuit breaker triggered: {message}")
        break
Allowlisting Tool Calls
For sensitive operations, only allow specific tools or parameters.
Why: Prevents the agent from calling dangerous tools.
Example: Do not let the agent delete data or send emails without approval.
Code example:
# Define safe and unsafe tools
safe_tools = ["search", "fetch_webpage", "summarize"]
unsafe_tools = ["delete_data", "send_email", "modify_database"]

def filter_tool_calls(tool_name, params):
    # Safe tools: always allow
    if tool_name in safe_tools:
        return call_tool(tool_name, params)
    # Unsafe tools: require human approval
    if tool_name in unsafe_tools:
        print(f"REQUIRES APPROVAL: {tool_name}")
        approval = input("Approve this action? (yes/no): ")
        if approval.lower() == "yes":
            return call_tool(tool_name, params)
        return {"error": "Action denied by user"}
    # Unknown tool
    return {"error": f"Unknown tool: {tool_name}"}
Human-in-the-Loop Approval
For high-risk actions, ask a human before the agent proceeds.
When to use:
- Before deleting anything
- Before sending external communications (email, API calls)
- Before large financial transactions
- Before accessing sensitive data
Code example:
import json

def call_tool_with_approval(tool_name, params):
    high_risk_tools = ["delete", "send_email", "send_payment"]
    if tool_name in high_risk_tools:
        print(f"\nTool: {tool_name}")
        print(f"Parameters: {json.dumps(params, indent=2)}")
        approval = input("\nDo you approve this action? (yes/no): ")
        if approval.lower() != "yes":
            return {"status": "denied", "message": "User declined approval"}
    return call_tool(tool_name, params)
Checklist: Guardrails for Your Agent
- Set max iteration limit (recommend 15)
- Set cost cap (recommend $1 for testing)
- Add tool call rate limits per tool
- Validate all tool outputs
- Implement circuit breaker for looping and cost overruns
- Allowlist or denylist tools based on risk
- Add human approval for high-risk actions
- Test guardrails work before deploying
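Wired together, the checklist items above fit in one compact loop. A minimal sketch, with hypothetical `plan` and `call_tool` stubs standing in for a real model and tool dispatcher, and an assumed flat cost per step:

```python
from collections import defaultdict

MAX_ITERATIONS = 15
MAX_COST = 1.00
TOOL_LIMITS = {"search": 5, "send_email": 1}

def plan(step):
    # Hypothetical stub: a real implementation would call the model
    return {"name": "search", "params": {"q": f"query {step}"}}

def call_tool(name, params):
    # Hypothetical stub: a real implementation would dispatch to the tool
    return {"result": f"{name} ok"}

def run_agent():
    total_cost = 0.0
    counts = defaultdict(int)
    for step in range(MAX_ITERATIONS):            # iteration limit
        tool_call = plan(step)
        name = tool_call["name"]
        counts[name] += 1                          # per-tool call cap
        if counts[name] > TOOL_LIMITS.get(name, MAX_ITERATIONS):
            return {"status": "rate_limited", "tool": name}
        result = call_tool(name, tool_call["params"])
        if result is None or "error" in result:    # output validation
            return {"status": "bad_tool_output", "tool": name}
        total_cost += 0.09                          # assumed ~2K tokens/step cost cap
        if total_cost > MAX_COST:
            return {"status": "cost_cap_hit", "cost": round(total_cost, 2)}
    return {"status": "max_iterations_reached"}

print(run_agent())  # this stub loops on "search", so the per-tool cap fires first
```

With these stubs the agent keeps choosing `search`, so the per-tool cap is the guardrail that fires; in a real run, whichever limit is hit first stops the agent, and the returned status tells you which one.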
Guardrails are not fancy, but they save you money and headaches.