Adding Guardrails and Safety Limits

Protect Your Agent with Guardrails

Guardrails stop your agent from wasting money, looping forever, or taking dangerous actions. Add them before you deploy.

Guardrail 1: Max Iteration Limit

Stop the agent after N tool calls.

Why: Prevents infinite loops and runaway costs.

How many steps is enough?

  • Simple task (answer a question): 5 steps
  • Medium task (research something): 10 steps
  • Complex task (multi-step workflow): 20 steps
  • Rarely need more than 30 steps

Code example:

max_iterations = 15

for iteration in range(max_iterations):
  tool_call = model.plan()
  if tool_call is None:  # assuming plan() returns None when the task is done
    break
  result = call_tool(tool_call)
  agent_response += format_result(result)
else:
  # for/else: the loop exhausted every iteration without breaking
  return {"status": "max_iterations_reached", "response": agent_response}

Guardrail 2: Cost Cap

Stop the agent when it hits a cost limit.

Why: Prevents $500 bills from a single agent run.

How much is enough?

  • Free tier experiments: $0.10
  • User-facing feature: $0.50 to $5.00
  • Internal tool: $10 to $50

Code example:

from openai import OpenAI

max_cost = 1.00  # $1 per run
total_cost = 0.0
client = OpenAI()

def track_cost(usage):
  global total_cost
  # GPT-4: $0.03 per 1K input tokens, $0.06 per 1K output tokens
  cost = (usage.prompt_tokens / 1000) * 0.03 + (usage.completion_tokens / 1000) * 0.06
  total_cost += cost
  
  if total_cost > max_cost:
    raise Exception(f"Cost limit exceeded: ${total_cost:.2f}")
  
  return total_cost

try:
  while True:
    response = client.chat.completions.create(...)
    cost = track_cost(response.usage)
    print(f"Cost so far: ${cost:.2f}")
except Exception as e:
  print(f"Agent stopped: {e}")
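One way to pick a cap is to estimate the cost of a single iteration and multiply by your max iteration count, with some headroom. A rough sketch, using the same illustrative per-1K-token rates as above (check your provider's pricing page for current numbers):

```python
# Illustrative per-1K-token rates; not current prices
INPUT_RATE = 0.03   # $ per 1K input tokens
OUTPUT_RATE = 0.06  # $ per 1K output tokens

def estimate_cap(avg_input_tokens, avg_output_tokens, max_iterations, headroom=2.0):
  """Estimate a cost cap: per-iteration cost * iterations * safety headroom."""
  per_iteration = (avg_input_tokens / 1000) * INPUT_RATE \
                + (avg_output_tokens / 1000) * OUTPUT_RATE
  return per_iteration * max_iterations * headroom

# e.g. 2K input / 500 output tokens per step, 15 steps, 2x headroom
cap = estimate_cap(2000, 500, 15)
print(f"Suggested cap: ${cap:.2f}")
```

The 2x headroom is a judgment call: too tight and legitimate runs get killed, too loose and the cap stops protecting you.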

Guardrail 3: Tool Call Rate Limiting

Allow the agent to call each tool only N times.

Why: Prevents the agent from spamming the same API.

Example limits:

  • Search tool: 5 calls per run
  • External API: 3 calls per run
  • Email tool: 1 call per run

Code example:

from collections import defaultdict

tool_call_limits = {
  "search": 5,
  "fetch_webpage": 5,
  "send_email": 1,
  "database_query": 10
}

tool_call_count = defaultdict(int)

def call_tool_with_rate_limit(tool_name, params):
  if tool_name not in tool_call_limits:
    return call_tool(tool_name, params)
  
  tool_call_count[tool_name] += 1
  limit = tool_call_limits[tool_name]
  
  if tool_call_count[tool_name] > limit:
    return {
      "error": f"Rate limit exceeded for {tool_name} ({limit} calls max)"
    }
  
  return call_tool(tool_name, params)

Guardrail 4: Output Validation

Check the tool output before returning it to the model.

Why: Catches bad data before the model acts on it.

What to validate:

  • Is the output the expected type? (dict, list, string)
  • Is the output non-empty?
  • Does it have required fields?
  • Is the data reasonable? (e.g., price is positive)

Code example:

def validate_output(tool_name, output):
  # Check it is not None
  if output is None:
    return {"error": f"{tool_name} returned None"}
  
  # Check type
  if tool_name == "search" and not isinstance(output, list):
    return {"error": f"{tool_name} should return list, got {type(output)}"}
  
  # Check required fields
  if tool_name == "get_user" and "id" not in output:
    return {"error": f"{tool_name} missing required field 'id'"}
  
  # Check data is reasonable
  if tool_name == "get_price" and output.get("price", 0) < 0:
    return {"error": f"{tool_name} returned negative price"}
  
  return output

The Circuit Breaker Pattern

Stop the agent if it behaves badly.

Pattern: Track symptoms (looping, high cost, high token usage). When a symptom is detected, break the circuit and stop the agent.

Symptoms to watch:

  • Same tool called 3+ times in a row
  • Total tokens exceed 50% of context window
  • Total cost exceeds budget
  • Iteration count exceeds max

Code example:

class CircuitBreaker:
  def __init__(self, max_iterations=15, max_cost=1.0, max_consecutive_same_tool=3):
    self.max_iterations = max_iterations
    self.max_cost = max_cost
    self.max_consecutive_same_tool = max_consecutive_same_tool
    
    self.iteration_count = 0
    self.total_cost = 0.0
    self.consecutive_same_tool = 0
    self.last_tool = None
  
  def check_and_update(self, tool_name, cost):
    self.iteration_count += 1
    self.total_cost += cost
    
    # Check iteration limit
    if self.iteration_count > self.max_iterations:
      return False, f"Max iterations ({self.max_iterations}) exceeded"
    
    # Check cost limit
    if self.total_cost > self.max_cost:
      return False, f"Cost limit (${self.max_cost}) exceeded"
    
    # Check for looping (same tool called repeatedly)
    if tool_name == self.last_tool:
      self.consecutive_same_tool += 1
    else:
      self.consecutive_same_tool = 1
    
    self.last_tool = tool_name
    
    if self.consecutive_same_tool > self.max_consecutive_same_tool:
      return False, f"Agent called {tool_name} {self.consecutive_same_tool} times in a row (looping)"
    
    return True, "OK"

# Usage
cb = CircuitBreaker(max_iterations=15, max_cost=1.0, max_consecutive_same_tool=3)

for iteration in range(100):
  tool_call = model.plan()
  result = call_tool(tool_call)
  cost = calculate_cost(result)
  
  ok, message = cb.check_and_update(tool_call["name"], cost)
  if not ok:
    print(f"Circuit breaker triggered: {message}")
    break
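The breaker above covers iterations, cost, and looping, but not the token symptom from the list. A minimal token-budget check in the same (ok, message) style, assuming a 128K context window (adjust to your model's actual window):

```python
CONTEXT_WINDOW = 128_000            # assumed context size; adjust per model
TOKEN_BUDGET = CONTEXT_WINDOW // 2  # trip at 50% of the window

class TokenBudget:
  def __init__(self, budget=TOKEN_BUDGET):
    self.budget = budget
    self.used = 0
  
  def add(self, tokens):
    """Record token usage; return (ok, message) like check_and_update."""
    self.used += tokens
    if self.used > self.budget:
      return False, f"Token budget exceeded: {self.used}/{self.budget}"
    return True, "OK"

tb = TokenBudget()
ok, msg = tb.add(40_000)
ok, msg = tb.add(30_000)  # total 70,000 > 64,000, so this trips
```

You could fold this into CircuitBreaker as a fourth check; it is kept separate here for clarity.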

Allowlisting Tool Calls

For sensitive operations, only allow specific tools or parameters.

Why: Prevents the agent from calling dangerous tools.

Example: Do not let the agent delete data or send emails without approval.

Code example:

# Define safe and unsafe tools
safe_tools = ["search", "fetch_webpage", "summarize"]
unsafe_tools = ["delete_data", "send_email", "modify_database"]

def filter_tool_calls(tool_name, params):
  # Safe tools: allow always
  if tool_name in safe_tools:
    return call_tool(tool_name, params)
  
  # Unsafe tools: require human approval
  if tool_name in unsafe_tools:
    print(f"REQUIRES APPROVAL: {tool_name}")
    approval = input("Approve this action? (yes/no): ")
    if approval.lower() == "yes":
      return call_tool(tool_name, params)
    else:
      return {"error": "Action denied by user"}
  
  # Unknown tool
  return {"error": f"Unknown tool: {tool_name}"}

Human-in-the-Loop Approval

For high-risk actions, ask a human before the agent proceeds.

When to use:

  • Before deleting anything
  • Before sending external communications (email, API calls)
  • Before large financial transactions
  • Before accessing sensitive data

Code example:

import json

def call_tool_with_approval(tool_name, params):

  high_risk_tools = ["delete", "send_email", "send_payment"]
  
  if tool_name in high_risk_tools:
    print(f"\nTool: {tool_name}")
    print(f"Parameters: {json.dumps(params, indent=2)}")
    
    approval = input("\nDo you approve this action? (yes/no): ")
    if approval.lower() != "yes":
      return {"status": "denied", "message": "User declined approval"}
  
  return call_tool(tool_name, params)

Checklist: Guardrails for Your Agent

  • Set max iteration limit (recommend 15)
  • Set cost cap (recommend $1 for testing)
  • Add tool call rate limits per tool
  • Validate all tool outputs
  • Implement circuit breaker for looping and cost overruns
  • Allowlist or denylist tools based on risk
  • Add human approval for high-risk actions
  • Test guardrails work before deploying
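Most of the checklist can be wired into a single wrapper that runs before every tool call. A sketch combining the limits from this article (call it once per iteration, before dispatching to your real tool-calling code):

```python
from collections import defaultdict

MAX_ITERATIONS = 15
MAX_COST = 1.00
TOOL_LIMITS = {"search": 5, "send_email": 1}  # per-run limits

class GuardedAgent:
  def __init__(self):
    self.iterations = 0
    self.cost = 0.0
    self.calls = defaultdict(int)
  
  def guard(self, tool_name, cost):
    """Return an error string if any guardrail trips, else None."""
    self.iterations += 1
    self.cost += cost
    self.calls[tool_name] += 1
    if self.iterations > MAX_ITERATIONS:
      return "max iterations exceeded"
    if self.cost > MAX_COST:
      return "cost cap exceeded"
    if self.calls[tool_name] > TOOL_LIMITS.get(tool_name, float("inf")):
      return f"rate limit exceeded for {tool_name}"
    return None
```

Output validation and human approval still belong in the tool-dispatch layer itself, since they need the tool's parameters and result.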

Guardrails are not fancy, but they save you money and headaches.
