AI Agent Infinite Loop Detection
AI agent infinite loop detection is the practice of identifying repetitive or recursive tool call patterns in real time and triggering a circuit breaker before costs escalate. Unlike simple iteration counters, proper loop detection uses pattern matching, semantic similarity, and frequency analysis to catch all three major loop types: exact repetition, near-identical repetition, and behavioral loops.
TL;DR
- Agents loop for four distinct reasons — each requires a different detection strategy.
- String-match detection catches exact repetition in O(1) time using call hashing.
- Semantic similarity detection catches near-identical loops using cosine distance between embeddings.
- Frequency analysis catches behavioral loops regardless of call content.
- A three-state circuit breaker (CLOSED → OPEN → HALF-OPEN) provides the cleanest architectural response.
Why Agents Loop
Infinite loops in AI agents are not the result of buggy logic in the conventional sense. They emerge from the intersection of LLM non-determinism and tool call architecture. There are four documented causes, each with a distinct failure signature.
Erroneous Error Interpretation
When a tool returns an error, the LLM interprets the error message as a signal that the task failed and should be retried. It has no built-in understanding of transient vs. permanent errors, and no mechanism to prevent retrying immediately. A rate-limit error is particularly dangerous: the API will keep returning 429 for 60 seconds, and the agent will keep calling it, generating 1,000 identical calls before the window expires.
# Tool returns HTTP 429 (rate limit)
tool_result = {"error": "rate_limit_exceeded", "message": "Too many requests", "retry_after": 60}
# LLM interprets: "The task failed, I should try again"
# No built-in mechanism prevents retrying immediately
# Result: 1,000 identical calls in 60 seconds
Hallucinated Incompletion
After completing a long task, the LLM may hallucinate that it did not actually finish. This is more likely with tasks involving large datasets or multi-step processes where the agent cannot easily verify its own output. The result is a full restart from scratch — consuming the same resources as the original run — repeated until the budget is exhausted.
# Agent completes a report generation task
# LLM internal reasoning: "Did I actually send all 500 reports?
# I'm not sure. Let me check by running the task again."
# Runs the full task again from scratch
# Repeats until budget is exhausted
Tool Dependency Cycle
Circular dependencies between tools create infinite recursion at the infrastructure level, not the LLM level. Tool A needs output from Tool B to run, and Tool B needs output from Tool A to initialize. Unlike the other loop types, this one is deterministic — it will always loop given the same dependency graph, regardless of LLM behavior.
# Tool A needs output from Tool B
# Tool B needs output from Tool A to initialize
def tool_a(context):
    b_result = tool_b(context)  # Tool A calls Tool B
    return process(b_result)

def tool_b(context):
    a_result = tool_a(context)  # Tool B calls Tool A
    return transform(a_result)  # → infinite recursion
Recursive Agent Spawning
In multi-agent systems, orchestrator agents that spawn sub-agents can create exponential growth if there is no depth limit. Each sub-agent spawns its own sub-agents, and the tree grows geometrically. By depth 5 with a branching factor of 3, you have 243 simultaneous agents — each consuming tokens and making tool calls in parallel. This is the most expensive loop type because the cost compounds multiplicatively.
# Orchestrator spawns sub-agents that each spawn more sub-agents
# With no depth limit, this creates an exponential tree
# Depth 1: 3 agents, Depth 2: 9, Depth 3: 27, Depth 5: 243 simultaneous agents
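The geometric growth above can be capped with a depth guard at spawn time. Below is a minimal, self-contained sketch; `spawn_agent` and its counting behavior are illustrative stand-ins, not part of any real orchestrator API:

```python
def spawn_agent(task: str, depth: int = 0, max_depth: int = 3) -> int:
    """Spawn an agent tree; return the total number of agents created.

    Refuses to spawn past max_depth, so the tree stays bounded
    instead of growing as 3^depth.
    """
    if depth >= max_depth:
        return 0  # refuse to spawn past the limit
    count = 1  # this agent
    for i in range(3):  # branching factor 3, as in the comment above
        count += spawn_agent(f"{task}/sub{i}", depth + 1, max_depth)
    return count

print(spawn_agent("root"))  # 13 agents (1 + 3 + 9) instead of unbounded growth
```

With `max_depth=3` the tree tops out at 13 agents; without the guard, depth 5 alone would add 243.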
Three Detection Strategies
No single detection strategy catches all loop types. A production circuit breaker combines all three, applying each in sequence from fastest to slowest. String-match is O(1) and runs first; frequency analysis is O(n) over the time window; semantic similarity is the most expensive and runs last.
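That sequencing can be sketched as a thin wrapper. This is a minimal illustration, assuming each detector exposes a callable that raises a loop exception on a confirmed hit; the class and parameter names here are illustrative:

```python
class LoopDetected(RuntimeError):
    """Raised by any detector when a loop pattern is confirmed."""

class CombinedDetector:
    """Run loop checks in cost order: cheapest first, most expensive last."""

    def __init__(self, string_match, frequency, semantic):
        # fastest → slowest, matching the sequencing described above
        self.checks = [string_match, frequency, semantic]

    def check(self, tool_name: str, args: dict) -> bool:
        for check in self.checks:
            check(tool_name, args)  # each check raises LoopDetected on a hit
        return True
```

In practice the three callables would wrap the detector classes shown in the following sections, with the frequency check simply ignoring the `args` parameter.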
String-Match Detection
Fast · catches exact loops
Hash each tool call (name + serialized arguments) and maintain a rolling window of recent call hashes. If the same hash appears more than the threshold within the window, a loop is confirmed. This is the most efficient strategy: O(1) lookup per call, zero external dependencies, and zero false negatives for exact repetition.
from collections import deque
import hashlib
import json

class LoopDetected(Exception):
    pass

class StringMatchDetector:
    def __init__(self, window_size=10, threshold=3):
        self.call_history = deque(maxlen=window_size)
        self.threshold = threshold

    def check(self, tool_name: str, args: dict) -> bool:
        call_hash = hashlib.md5(
            f"{tool_name}:{json.dumps(args, sort_keys=True)}".encode()
        ).hexdigest()
        identical_count = self.call_history.count(call_hash)
        self.call_history.append(call_hash)
        if identical_count >= self.threshold:
            raise LoopDetected(
                f"Tool '{tool_name}' called {identical_count + 1}x with identical args"
            )
        return True
Catches
Exact repetition — the most common loop type (Cause 1 and Cause 2)
Misses
Near-identical calls with minor argument variation (e.g., incrementing page numbers)
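The miss case is easy to see from the hashing scheme itself: each page increment produces a fresh hash, so the window never fills with duplicates. A standalone sketch of just the hash step:

```python
import hashlib
import json

def call_hash(tool_name: str, args: dict) -> str:
    """Same name-plus-sorted-args hashing scheme as the detector above."""
    payload = f"{tool_name}:{json.dumps(args, sort_keys=True)}"
    return hashlib.md5(payload.encode()).hexdigest()

# Incrementing page numbers: every call hashes differently,
# so exact-match detection never sees a repeat.
hashes = {call_hash("fetch", {"query": "AI security", "page": p}) for p in range(1, 6)}
print(len(hashes))  # 5 distinct hashes, no exact repetition detected
```

This is exactly the gap semantic similarity detection is designed to close.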
Semantic Similarity Detection
Slower · catches near-identical loops
Embed each call as a vector using a lightweight sentence transformer model. Compare the new embedding against recent embeddings using cosine similarity. If any previous call is above the similarity threshold, the calls are semantically equivalent — a near-identical loop is detected. This catches the hard cases that string-match misses: search queries with slightly different phrasing, API calls with minor parameter variations, and reformulated research tasks.
from collections import deque
from sentence_transformers import SentenceTransformer
import json
import numpy as np

class SemanticLoopDetected(Exception):
    pass

class SemanticLoopDetector:
    def __init__(self, threshold=0.95, window=5):
        self.model = SentenceTransformer('all-MiniLM-L6-v2')
        self.recent_embeddings = deque(maxlen=window)
        self.threshold = threshold

    def check(self, tool_name: str, args: dict) -> bool:
        call_text = f"{tool_name}: {json.dumps(args, sort_keys=True)}"
        # Normalized embeddings make the dot product equal to cosine similarity
        embedding = self.model.encode(call_text, normalize_embeddings=True)
        for prev_embedding in self.recent_embeddings:
            similarity = np.dot(embedding, prev_embedding)
            if similarity > self.threshold:
                raise SemanticLoopDetected(
                    f"Semantically similar call detected (similarity: {similarity:.3f})"
                )
        self.recent_embeddings.append(embedding)
        return True
Example: catches "search for 'AI security'" followed by "search for 'AI security tools'" — same intent, different string, same loop.
Frequency Analysis
Content-agnostic · catches behavioral loops
Track how many times each tool is called within a sliding time window. If any tool exceeds the frequency threshold, halt regardless of whether the individual calls are identical or semantically similar. This strategy catches behavioral loops that neither string-match nor semantic similarity can detect: an agent that cycles through 50 different search queries is running a behavioral loop even though no two queries are the same.
from collections import defaultdict
import time

class FrequencyLoopDetected(Exception):
    pass

class FrequencyDetector:
    def __init__(self, max_calls_per_window=20, window_seconds=60):
        self.call_counts = defaultdict(list)
        self.max_calls = max_calls_per_window
        self.window = window_seconds

    def check(self, tool_name: str) -> bool:
        now = time.time()
        # Drop timestamps that have aged out of the sliding window
        self.call_counts[tool_name] = [
            t for t in self.call_counts[tool_name]
            if now - t < self.window
        ]
        self.call_counts[tool_name].append(now)
        if len(self.call_counts[tool_name]) > self.max_calls:
            raise FrequencyLoopDetected(
                f"Tool '{tool_name}' called {len(self.call_counts[tool_name])}x "
                f"in {self.window}s (max: {self.max_calls})"
            )
        return True
The Circuit Breaker Pattern
The circuit breaker is a formal software design pattern originally developed for distributed systems to prevent cascade failures. When applied to AI agents, it provides the architectural response layer on top of loop detection: the detectors identify that a loop is happening; the circuit breaker decides what to do about it. The pattern operates across three states:
- CLOSED: Normal operation. All tool calls pass through. Failures are counted within the window. When the failure count reaches the threshold, the breaker trips.
- OPEN: Tripped. All tool calls are immediately rejected without execution. The agent receives a CircuitBreakerOpen error. After the timeout expires, the breaker transitions to HALF-OPEN.
- HALF-OPEN: Recovery testing. A single test call is allowed through. If it succeeds, the breaker resets to CLOSED. If it fails, it returns to OPEN.
              failure_count >= threshold
  CLOSED ──────────────────────────────────► OPEN
    ▲                                          │
    │                                          │ timeout expires
    │                                          ▼
    │             success                  HALF-OPEN
    └──────────────────────────────────────────┘
              (test call succeeds)

Here is a complete Python implementation from scratch:
import time
from enum import Enum

class CircuitBreakerOpen(Exception):
    pass

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Tripped, blocking all calls
    HALF_OPEN = "half_open"  # Testing recovery

class AgentCircuitBreaker:
    def __init__(self, failure_threshold=10, timeout=60, window=60):
        self.state = CircuitState.CLOSED
        self.failure_times = []  # timestamps of recent failures
        self.last_failure_time = None
        self.failure_threshold = failure_threshold
        self.timeout = timeout   # seconds before trying HALF_OPEN
        self.window = window     # seconds to count failures in

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitBreakerOpen("Circuit breaker is OPEN — agent halted")
        try:
            result = func(*args, **kwargs)
            if self.state == CircuitState.HALF_OPEN:
                self._reset()  # Recovery succeeded
            return result
        except Exception:
            self._record_failure()
            raise

    def _record_failure(self):
        now = time.time()
        self.last_failure_time = now
        # Only count failures that fall inside the sliding window
        self.failure_times = [t for t in self.failure_times if now - t < self.window]
        self.failure_times.append(now)
        if len(self.failure_times) >= self.failure_threshold:
            self.state = CircuitState.OPEN

    def _reset(self):
        self.state = CircuitState.CLOSED
        self.failure_times = []
The SupraWall protect() wrapper replaces all of the above with a single configuration block. It combines all three detection strategies and handles the state machine internally:
from suprawall import protect
secured = protect(agent, budget={
    "circuit_breaker": {
        "strategy": "combined",      # uses all three detection strategies
        "max_identical_calls": 10,   # string-match threshold
        "semantic_threshold": 0.95,  # cosine similarity threshold
        "max_tool_frequency": 20,    # calls per minute per tool
        "window_seconds": 60,
        "recovery_timeout": 300,     # 5 minutes before HALF_OPEN
    }
})
Graceful Degradation
When the circuit trips, you have three choices for what happens next. The right choice depends on the criticality of the task and how much human oversight you want in the loop. These are ranked from most to least conservative.
Halt Gracefully
Raise a structured exception and return a status object to the orchestrator. The agent receives enough context to log partial results, report the incident, and halt cleanly. This is the safest option and the correct default for production systems where a partial result is better than a corrupt one.
on_circuit_break = "halt"
# Agent receives: {"status": "halted", "reason": "circuit_breaker", "incident_id": "..."}
Notify and Pause
Send a webhook notification and pause the agent until a human approves resumption. This is the right choice for business-critical agents where you cannot afford to lose partial progress, but also cannot afford to let a runaway loop continue unchecked. The human reviews the loop evidence and either approves a resume or confirms the halt.
on_circuit_break = {
    "action": "pause",
    "notify": "https://hooks.slack.com/...",
    "resume_after_approval": True
}
Degrade Gracefully
Skip the problematic tool and continue executing the agent with reduced capabilities. This is the least conservative option and should only be used when the looping tool is optional for task completion. For example, if a web search tool is looping, the agent can continue generating a response from its training knowledge without search augmentation.
on_circuit_break = {
    "action": "degrade",
    "skip_tool": True,                  # skip the problematic tool
    "continue_without": ["web_search"]  # proceed without this tool
}
Testing Loop Detection
Loop detection should be tested before deploying to production. The testing strategy is straightforward: create a fixture with a tool that deterministically triggers the circuit breaker, configure a low threshold, and assert that the breaker trips within the expected number of calls. This pattern works for all three detection strategies.
import pytest
from suprawall import protect
from suprawall.exceptions import CircuitBreakerTripped

@pytest.fixture
def looping_tool():
    """Tool that always returns an error, simulating a stuck retry loop"""
    call_count = {"n": 0}
    def tool(query: str) -> dict:
        call_count["n"] += 1
        return {"error": "rate_limit_exceeded", "attempt": call_count["n"]}
    return tool, call_count

def test_circuit_breaker_triggers(looping_tool):
    tool_fn, call_count = looping_tool
    secured = protect(
        agent_with_tool(tool_fn),
        budget={"circuit_breaker": {"max_identical_calls": 5, "window_seconds": 30}}
    )
    with pytest.raises(CircuitBreakerTripped) as exc_info:
        secured.invoke({"input": "Run the search that will loop"})
    assert call_count["n"] <= 6, f"Circuit breaker should have triggered before call {call_count['n']}"
    assert "circuit_breaker" in str(exc_info.value).lower()
Write one test per detection strategy using different fixture types: an exact-repeat fixture for string-match detection, a semantically similar fixture for semantic detection, and a rapid-fire fixture (calls made in a tight loop) for frequency detection. All three tests should be part of your CI pipeline and must pass before every production deployment.
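The rapid-fire case for frequency detection needs no agent harness at all. Here is a standalone sketch that drives a minimal frequency detector (re-declared here so the snippet is self-contained) in a tight loop and asserts it trips on the first call past the threshold:

```python
import time
from collections import defaultdict

class FrequencyLoopDetected(RuntimeError):
    pass

class FrequencyDetector:
    """Minimal re-declaration of the frequency detector for a standalone test."""
    def __init__(self, max_calls_per_window=5, window_seconds=60):
        self.call_counts = defaultdict(list)
        self.max_calls = max_calls_per_window
        self.window = window_seconds

    def check(self, tool_name: str) -> bool:
        now = time.time()
        self.call_counts[tool_name] = [
            t for t in self.call_counts[tool_name] if now - t < self.window
        ]
        self.call_counts[tool_name].append(now)
        if len(self.call_counts[tool_name]) > self.max_calls:
            raise FrequencyLoopDetected(f"{tool_name} exceeded {self.max_calls} calls")
        return True

def test_rapid_fire_trips_frequency_detector():
    detector = FrequencyDetector(max_calls_per_window=5, window_seconds=60)
    tripped_on = None
    for i in range(1, 51):  # 50 calls in a tight loop, content irrelevant
        try:
            detector.check("web_search")
        except FrequencyLoopDetected:
            tripped_on = i
            break
    assert tripped_on == 6, "should trip on the first call past the threshold"
```

The same shape works for the semantic fixture: swap the tight loop for a list of paraphrased queries and assert the semantic detector raises instead.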
Frequently Asked Questions
What is an AI agent circuit breaker?
A circuit breaker monitors tool call patterns and trips to the OPEN state (blocking all calls) when a loop is detected. It prevents runaway cost escalation by catching infinite loops within seconds rather than hours.
How is this different from LangChain's max_iterations?
max_iterations counts total steps regardless of repetition. A circuit breaker specifically detects repetitive patterns — it will trip after 5 identical calls in 60 seconds even if max_iterations is set to 1,000.
Can I tune the sensitivity of loop detection?
Yes. max_identical_calls (default: 10), window_seconds (default: 60), and semantic_threshold (default: 0.95) are all configurable. For agents that legitimately retry failing calls, increase the threshold.
What happens to in-progress work when the circuit breaks?
SupraWall returns a structured CircuitBreakerTripped exception to the agent, which can be caught to save partial results before halting.
How do I test that my loop detection works before deploying?
Write a pytest fixture with a tool that always returns an error, configure the circuit breaker with a low threshold (3-5 calls), and assert CircuitBreakerTripped is raised within the expected number of calls.