Adding Guardrails and Safety Limits
Guardrails stop your agent from wasting money, looping forever, or taking dangerous actions. Add them before you deploy.
Guardrail 1: Max Iteration Limit
Stop the agent after N tool calls.
Why: Prevents infinite loops and runaway costs.
How much is enough?
- Simple task (answer a question): 5 steps
- Medium task (research something): 10 steps
- Complex task (multi-step workflow): 20 steps
- Rarely need more than 30 steps
Code example:
max_iterations = 15

for iteration in range(max_iterations):
    tool_call = model.plan()
    if tool_call is None:  # the model is done; no more tools to call
        break
    result = call_tool(tool_call)
    agent_response += format_result(result)
else:
    # The loop ran out of iterations without finishing
    return {"status": "max_iterations_reached", "response": agent_response}
Guardrail 2: Cost Cap
Stop the agent when it hits a cost limit.
Why: Prevents $500 bills from a single agent run.
How much is enough?
- Free tier experiments: $0.10
- User-facing feature: $0.50 to $5.00
- Internal tool: $10 to $50
Code example:
from openai import OpenAI

max_cost = 1.00  # $1 per run
total_cost = 0.0
client = OpenAI()

def track_cost(tokens_used, model):
    global total_cost
    # GPT-4: $0.03 per 1K input tokens, $0.06 per 1K output tokens;
    # $0.045 per 1K is a blended estimate across input and output.
    cost = (tokens_used / 1000) * 0.045
    total_cost += cost
    if total_cost > max_cost:
        raise RuntimeError(f"Cost limit exceeded: ${total_cost:.2f}")
    return total_cost

try:
    while True:
        response = client.chat.completions.create(...)
        cost = track_cost(response.usage.total_tokens, "gpt-4")
        print(f"Cost so far: ${cost:.2f}")
except RuntimeError as e:
    print(f"Agent stopped: {e}")
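To pick a sensible cap, estimate cost per run from tokens per step. A rough sketch, assuming about 2,000 tokens per step (a hypothetical figure) and the blended $0.045-per-1K rate used above:

```python
tokens_per_step = 2000           # rough assumption for a medium task
blended_rate = 0.045 / 1000      # dollars per token (blended GPT-4 estimate)

# Estimated cost for simple, medium, and complex runs
for steps in (5, 15, 30):
    cost = steps * tokens_per_step * blended_rate
    print(f"{steps:2d} steps ~ ${cost:.2f}")
# 5 steps ~ $0.45, 15 steps ~ $1.35, 30 steps ~ $2.70
```

Under these assumptions, a $1 cap cuts off a medium run partway through, so size the cap to the largest run you actually want to allow.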
Guardrail 3: Tool Call Rate Limiting
Cap the number of times the agent can call each tool in a single run.
Why: Prevents the agent from spamming the same API.
Example limits:
- Search tool: 5 calls per run
- External API: 3 calls per run
- Email tool: 1 call per run
Code example:
from collections import defaultdict

tool_call_limits = {
    "search": 5,
    "fetch_webpage": 5,
    "send_email": 1,
    "database_query": 10,
}

tool_call_count = defaultdict(int)

def call_tool_with_rate_limit(tool_name, params):
    if tool_name not in tool_call_limits:
        return call_tool(tool_name, params)
    tool_call_count[tool_name] += 1
    limit = tool_call_limits[tool_name]
    if tool_call_count[tool_name] > limit:
        return {
            "error": f"Rate limit exceeded for {tool_name} ({limit} calls max)"
        }
    return call_tool(tool_name, params)
Guardrail 4: Output Validation
Check the tool output before returning it to the model.
Why: Catches bad data before the model acts on it.
What to validate:
- Is the output the expected type? (dict, list, string)
- Is the output non-empty?
- Does it have required fields?
- Is the data reasonable? (e.g., price is positive)
Code example:
def validate_output(tool_name, output):
    # Check it is not None
    if output is None:
        return {"error": f"{tool_name} returned None"}
    # Check type
    if tool_name == "search" and not isinstance(output, list):
        return {"error": f"{tool_name} should return list, got {type(output)}"}
    # Check required fields
    if tool_name == "get_user" and "id" not in output:
        return {"error": f"{tool_name} missing required field 'id'"}
    # Check data is reasonable
    if tool_name == "get_price" and output.get("price", 0) < 0:
        return {"error": f"{tool_name} returned negative price"}
    return output
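To see the validator in action, here is a minimal self-contained run against some deliberately bad outputs (the sample data is hypothetical):

```python
def validate_output(tool_name, output):
    # Trimmed copy of the validator above: None check, type check, range check
    if output is None:
        return {"error": f"{tool_name} returned None"}
    if tool_name == "search" and not isinstance(output, list):
        return {"error": f"{tool_name} should return list, got {type(output).__name__}"}
    if tool_name == "get_price" and output.get("price", 0) < 0:
        return {"error": f"{tool_name} returned negative price"}
    return output

print(validate_output("search", ["result one"]))    # good data passes through unchanged
print(validate_output("search", "not a list"))      # wrong type -> error dict
print(validate_output("get_price", {"price": -5}))  # unreasonable value -> error dict
```

Returning an error dict instead of raising lets the model see the failure and try a different tool, rather than crashing the whole run.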
The Circuit Breaker Pattern
Stop the agent if it behaves badly.
Pattern: Track symptoms (looping, high cost, high token usage). When a symptom is detected, break the circuit and stop the agent.
Symptoms to watch:
- Same tool called 3+ times in a row
- Total tokens exceed 50% of context window
- Total cost exceeds budget
- Iteration count exceeds max
Code example:
class CircuitBreaker:
    def __init__(self, max_iterations=15, max_cost=1.0, max_consecutive_same_tool=3):
        self.max_iterations = max_iterations
        self.max_cost = max_cost
        self.max_consecutive_same_tool = max_consecutive_same_tool
        self.iteration_count = 0
        self.total_cost = 0.0
        self.consecutive_same_tool = 0
        self.last_tool = None

    def check_and_update(self, tool_name, cost):
        self.iteration_count += 1
        self.total_cost += cost
        # Check iteration limit
        if self.iteration_count > self.max_iterations:
            return False, f"Max iterations ({self.max_iterations}) exceeded"
        # Check cost limit
        if self.total_cost > self.max_cost:
            return False, f"Cost limit (${self.max_cost}) exceeded"
        # Check for looping (same tool called repeatedly)
        if tool_name == self.last_tool:
            self.consecutive_same_tool += 1
        else:
            self.consecutive_same_tool = 1
            self.last_tool = tool_name
        # >= so the breaker trips at 3 in a row, matching the symptom list
        if self.consecutive_same_tool >= self.max_consecutive_same_tool:
            return False, f"Agent called {tool_name} {self.consecutive_same_tool} times in a row (looping)"
        return True, "OK"
# Usage
cb = CircuitBreaker(max_iterations=15, max_cost=1.0, max_consecutive_same_tool=3)

for iteration in range(100):
    tool_call = model.plan()
    result = call_tool(tool_call)
    cost = calculate_cost(result)
    ok, message = cb.check_and_update(tool_call["name"], cost)
    if not ok:
        print(f"Circuit breaker triggered: {message}")
        break
Allowlisting Tool Calls
For sensitive operations, only allow specific tools or parameters.
Why: Prevents the agent from calling dangerous tools.
Example: Do not let the agent delete data or send emails without approval.
Code example:
# Define safe and unsafe tools
safe_tools = ["search", "fetch_webpage", "summarize"]
unsafe_tools = ["delete_data", "send_email", "modify_database"]

def filter_tool_calls(tool_name, params):
    # Safe tools: always allow
    if tool_name in safe_tools:
        return call_tool(tool_name, params)
    # Unsafe tools: require human approval
    if tool_name in unsafe_tools:
        print(f"REQUIRES APPROVAL: {tool_name}")
        approval = input("Approve this action? (yes/no): ")
        if approval.lower() == "yes":
            return call_tool(tool_name, params)
        return {"error": "Action denied by user"}
    # Unknown tool
    return {"error": f"Unknown tool: {tool_name}"}
Human-in-the-Loop Approval
For high-risk actions, ask a human before the agent proceeds.
When to use:
- Before deleting anything
- Before sending external communications (email, API calls)
- Before large financial transactions
- Before accessing sensitive data
Code example:
import json

def call_tool_with_approval(tool_name, params):
    high_risk_tools = ["delete", "send_email", "send_payment"]
    if tool_name in high_risk_tools:
        print(f"\nTool: {tool_name}")
        print(f"Parameters: {json.dumps(params, indent=2)}")
        approval = input("\nDo you approve this action? (yes/no): ")
        if approval.lower() != "yes":
            return {"status": "denied", "message": "User declined approval"}
    return call_tool(tool_name, params)
Checklist: Guardrails for Your Agent
- Set max iteration limit (recommend 15)
- Set cost cap (recommend $1 for testing)
- Add tool call rate limits per tool
- Validate all tool outputs
- Implement circuit breaker for looping and cost overruns
- Allowlist or denylist tools based on risk
- Add human approval for high-risk actions
- Test guardrails work before deploying
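Wired together, the checklist items above fit in one compact loop. A minimal sketch, with hypothetical `plan` and `call_tool` stubs standing in for a real model and tool dispatcher, and an assumed flat cost per step:

```python
from collections import defaultdict

MAX_ITERATIONS = 15
MAX_COST = 1.00
TOOL_LIMITS = {"search": 5, "send_email": 1}

def plan(step):
    # Hypothetical stub: a real implementation would call the model
    return {"name": "search", "params": {"q": f"query {step}"}}

def call_tool(name, params):
    # Hypothetical stub: a real implementation would dispatch to the tool
    return {"result": f"{name} ok"}

def run_agent():
    total_cost = 0.0
    counts = defaultdict(int)
    for step in range(MAX_ITERATIONS):            # iteration limit
        tool_call = plan(step)
        name = tool_call["name"]
        counts[name] += 1                          # per-tool call cap
        if counts[name] > TOOL_LIMITS.get(name, MAX_ITERATIONS):
            return {"status": "rate_limited", "tool": name}
        result = call_tool(name, tool_call["params"])
        if result is None or "error" in result:    # output validation
            return {"status": "bad_tool_output", "tool": name}
        total_cost += 0.09                          # assumed ~2K tokens/step cost cap
        if total_cost > MAX_COST:
            return {"status": "cost_cap_hit", "cost": round(total_cost, 2)}
    return {"status": "max_iterations_reached"}

print(run_agent())  # this stub loops on "search", so the per-tool cap fires first
```

With these stubs the agent keeps choosing `search`, so the per-tool cap is the guardrail that fires; in a real run, whichever limit is hit first stops the agent, and the returned status tells you which one.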
Guardrails are not fancy, but they save you money and headaches.