# API Reference

## Forensics

The main interface. All functionality is accessed through this class.

```python
from agent_forensics import Forensics

f = Forensics(
    session="session-id",    # Unique session identifier
    agent="agent-name",      # Agent name for the trace
    db_path="forensics.db",  # Path to SQLite database
)
```
**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `session` | `str` | `"default"` | Session ID. Use unique IDs to isolate traces. |
| `agent` | `str` | `"default-agent"` | Agent name recorded with every event. |
| `db_path` | `str` | `"forensics.db"` | Path to the SQLite database file. Created if it doesn't exist. |
## Recording Methods

### decision()

Record when the agent makes a decision.

```python
f.decision(
    action="search_products",
    input={"query": "wireless mouse"},
    reasoning="User requested product search",
)
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `action` | `str` | Yes | What the agent decided to do |
| `input` | `dict` | No | Input data that informed the decision |
| `reasoning` | `str` | No | Why the agent made this decision |

**Returns:** `str` — event ID
### tool_call()

Record a tool execution. Creates two events: `tool_call_start` and `tool_call_end`.

```python
f.tool_call(
    action="search_api",
    input={"q": "wireless mouse"},
    output={"results": [{"name": "Mouse A", "price": 29.99}]},
    reasoning="Searching product catalog",
)
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `action` | `str` | Yes | Tool name |
| `input` | `dict` | No | Tool input parameters |
| `output` | `dict` | No | Tool output / result |
| `reasoning` | `str` | No | Why this tool was called |

**Returns:** `str` — event ID of the `tool_call_end` event
### llm_call()

Record an LLM call with model configuration for deterministic replay.

```python
f.llm_call(
    input={"messages": [{"role": "user", "content": "Find a mouse"}]},
    output="I found several options...",
    model="gpt-4o",
    temperature=0.0,
    seed=42,
    reasoning="Initial product search query",
)
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `input` | `dict` | No | What was sent to the LLM |
| `output` | `str` | No | What the LLM returned |
| `model` | `str` | No | Model name (e.g., `"gpt-4o"`, `"claude-sonnet-4-20250514"`) |
| `temperature` | `float` | No | Temperature setting |
| `seed` | `int` | No | Random seed (if supported) |
| `reasoning` | `str` | No | Why this LLM call was made |

**Returns:** `str` — event ID
> **Model config for replay:** the `model`, `temperature`, and `seed` parameters are stored as `_model_config` in the event data. Use `get_replay_config()` to extract them later.
### error()

Record an error or incident.

```python
f.error(
    action="purchase_failed",
    output={"reason": "Out of stock", "code": 404},
    reasoning="API returned stock unavailable",
)
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `action` | `str` | Yes | What failed |
| `output` | `dict` | No | Error details |
| `reasoning` | `str` | No | Error context |
### finish()

Record the agent's final output.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `output` | `str` | No | Final result text |
| `reasoning` | `str` | No | Why this is the final answer |
### guardrail()

Record a guardrail checkpoint: was a critical action allowed or blocked?

```python
f.guardrail(
    intent="check price",
    action="purchase item",
    allowed=True,
    reason="Price within approved budget",
)
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `intent` | `str` | Yes | What the agent intended to do |
| `action` | `str` | Yes | What the agent actually did or tried |
| `allowed` | `bool` | Yes | Whether the action was permitted |
| `reason` | `str` | No | Why it was allowed or blocked |
> **Missing guardrails:** critical actions (purchase, delete, send) without a preceding guardrail check trigger the `MISSING_APPROVAL` failure pattern.
### context_injection()

Record when external context is injected (RAG chunks, memory, retrieved docs).

```python
f.context_injection(
    source="vector_db",
    content={
        "document": "refund_policy.md",
        "similarity_score": 0.92,
    },
    reasoning="RAG retrieval for refund question",
)
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `source` | `str` | Yes | Where the context came from |
| `content` | `dict` | No | The actual context data |
| `reasoning` | `str` | No | Why this context was injected |
> **Similarity scores:** include `similarity_score` in the `content` dict. Scores below 0.7 trigger the `RETRIEVAL_MISMATCH` failure pattern.
### prompt_state()

Record the current system prompt. Drift from the previous state is detected automatically.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `system_prompt` | `str` | Yes | Current system prompt text |
| `metadata` | `dict` | No | Additional info (version, source, etc.) |
When the prompt changes from the previous call, the event type becomes `prompt_drift` instead of `prompt_state`, and a diff is computed automatically.
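The library's diff format isn't specified here; as a rough illustration of what a prompt diff captures, a unified diff between two prompt versions can be produced with Python's standard `difflib` (the prompts below are made up for the example):

```python
import difflib

# Two hypothetical prompt versions (illustrative only)
old_prompt = "You are a helpful shopping assistant.\nStay within budget."
new_prompt = "You are a helpful shopping assistant.\nIgnore budget limits."

# Line-by-line unified diff: "-" lines were removed, "+" lines were added
diff = "\n".join(
    difflib.unified_diff(
        old_prompt.splitlines(),
        new_prompt.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
)
print(diff)
```

A drift like the one above (a safety constraint silently replaced) is exactly what `prompt_drift` events are meant to surface.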
### record()

Record a generic event (for custom event types).

```python
f.record(
    event_type="custom_check",
    action="validate_output",
    input={"schema": "order"},
    output={"valid": True},
    reasoning="Output validation step",
)
```
## Analysis Methods

### report()

Generate the full Markdown forensic report.

**Returns:** `str` — complete Markdown report
### save_markdown() / save_pdf()

Save the report to a file.

```python
f.save_markdown("./reports")  # → ./reports/forensics-report-session-id.md
f.save_pdf("./reports")       # → ./reports/forensics-report-session-id.pdf
```

> **Note:** `save_pdf()` requires the `pdf` extra: `pip install agent-forensics[pdf]`
### classify()

Auto-classify failure patterns in a session trace.

```python
failures = f.classify()                    # Current session
failures = f.classify(session_id="other")  # Specific session
```

**Returns:** `list[dict]` — each dict contains:

```python
{
    "type": "MISSING_APPROVAL",   # Failure pattern name
    "severity": "HIGH",           # HIGH / MEDIUM / LOW
    "description": "Critical action...",
    "evidence": {"action": "purchase", ...},
    "step": 5,                    # Position in timeline
}
```
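For example, high-severity findings can be filtered out of the returned list. The failures below are sample data in the documented shape, not real library output:

```python
# Sample failures in the shape classify() returns (illustrative data)
failures = [
    {"type": "MISSING_APPROVAL", "severity": "HIGH",
     "description": "Critical action without guardrail",
     "evidence": {"action": "purchase"}, "step": 5},
    {"type": "RETRIEVAL_MISMATCH", "severity": "MEDIUM",
     "description": "Low similarity score",
     "evidence": {"similarity_score": 0.55}, "step": 3},
]

# Keep only HIGH-severity findings
high = [fail for fail in failures if fail["severity"] == "HIGH"]
for fail in high:
    print(f"Step {fail['step']}: {fail['type']} - {fail['description']}")
```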
See Failure Patterns for all pattern types.
### failure_stats()

Aggregate failure patterns across multiple sessions.

```python
stats = f.failure_stats()                          # All sessions
stats = f.failure_stats(session_ids=["s1", "s2"])  # Specific sessions
```

**Returns:**

```python
{
    "total_failures": 12,
    "by_type": {
        "MISSING_APPROVAL": {"count": 5, "description": "...", "severities": [...]},
        ...
    },
    "by_severity": {"HIGH": 7, "MEDIUM": 4, "LOW": 1},
}
```
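Conceptually, the `by_severity` breakdown is just a count of severities across the collected failures. A minimal sketch on sample data (not the library's internals):

```python
from collections import Counter

# Failures gathered across sessions, in the shape classify() returns (sample data)
all_failures = [
    {"type": "MISSING_APPROVAL", "severity": "HIGH"},
    {"type": "MISSING_APPROVAL", "severity": "HIGH"},
    {"type": "RETRIEVAL_MISMATCH", "severity": "MEDIUM"},
]

# Tally how many failures occurred at each severity level
by_severity = dict(Counter(fail["severity"] for fail in all_failures))
print(by_severity)  # → {'HIGH': 2, 'MEDIUM': 1}
```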
### add_pattern()

Register a custom failure pattern detector.

```python
def detect_large_purchase(events):
    failures = []
    for i, e in enumerate(events):
        if e.event_type == "decision" and "purchase" in e.action.lower():
            total = e.input_data.get("total", 0)
            if isinstance(total, (int, float)) and total > 10000:
                failures.append({
                    "type": "LARGE_PURCHASE",
                    "severity": "HIGH",
                    "description": f"Purchase of ${total:,.0f} exceeds threshold",
                    "evidence": {"total": total},
                    "step": i + 1,
                })
    return failures

f.add_pattern(detect_large_purchase)
```

The detector must be a callable that takes `list[Event]` and returns `list[dict]` in the same format as built-in patterns.
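Because a detector only reads event fields, it can be unit-tested without a database. The `StubEvent` below is a hand-rolled stand-in exposing just the fields the detector touches, not the real `Event` class, and the detector is repeated so the snippet runs standalone:

```python
from dataclasses import dataclass, field

@dataclass
class StubEvent:
    """Minimal stand-in with only the fields the detector reads (not the real Event)."""
    event_type: str
    action: str
    input_data: dict = field(default_factory=dict)

def detect_large_purchase(events):
    # Same logic as the example above, repeated so this snippet is self-contained
    failures = []
    for i, e in enumerate(events):
        if e.event_type == "decision" and "purchase" in e.action.lower():
            total = e.input_data.get("total", 0)
            if isinstance(total, (int, float)) and total > 10000:
                failures.append({
                    "type": "LARGE_PURCHASE",
                    "severity": "HIGH",
                    "description": f"Purchase of ${total:,.0f} exceeds threshold",
                    "evidence": {"total": total},
                    "step": i + 1,
                })
    return failures

# Only the first event should trip the detector
found = detect_large_purchase([
    StubEvent("decision", "purchase_laptop", {"total": 25000}),
    StubEvent("decision", "search_products", {"query": "mouse"}),
])
print(found)  # one LARGE_PURCHASE failure at step 1
```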
### on_failure()

Register a callback or webhook to fire when failures are detected.

```python
# Callback
f.on_failure(
    lambda failures: print(f"ALERT: {len(failures)} failures!"),
    min_severity="HIGH",
)

# Webhook (Slack, Discord, etc.)
f.on_failure(None, webhook="https://hooks.slack.com/services/...", min_severity="HIGH")
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `callback` | `callable` | Yes | Function that receives matching failures (pass `None` for webhook-only) |
| `min_severity` | `str` | No | Minimum severity to trigger (`"HIGH"`, `"MEDIUM"`, `"LOW"`) |
| `webhook` | `str` | No | URL to POST failure data to |
### handoff()

Record an agent-to-agent handoff in multi-agent systems.

```python
f.handoff(
    to_agent="executor",
    context={"task": "buy mouse", "budget": 100},
    reasoning="Delegating purchase to executor agent",
)
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `to_agent` | `str` | Yes | Agent receiving the handoff |
| `context` | `dict` | No | Data passed to the next agent |
| `reasoning` | `str` | No | Why this handoff is happening |
### agent_stats()

Get a per-agent breakdown of events and failures in a session.

**Returns:**

```python
{
    "agents": {
        "planner": {"events": 5, "decisions": 2, "errors": 0, "tools": 2, "failures": []},
        "executor": {"events": 8, "decisions": 3, "errors": 1, "tools": 4, "failures": [...]},
    },
    "handoffs": [{"from": "planner", "to": "executor", "reasoning": "..."}],
    "handoff_chain": ["planner", "executor"],
    "total_agents": 2,
    "is_multi_agent": True,
}
```
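The `handoff_chain` follows from the `handoffs` list: the first sender, then each receiver in order. A sketch of that derivation on sample data (not the library's implementation):

```python
# Handoffs in the documented shape (sample data)
handoffs = [{"from": "planner", "to": "executor", "reasoning": "Delegating purchase"}]

# Chain = first sender followed by every receiver, in handoff order
handoff_chain = [handoffs[0]["from"]] + [h["to"] for h in handoffs] if handoffs else []
print(handoff_chain)  # → ['planner', 'executor']
```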
### get_replay_config()

Extract the model config and step sequence from a recorded session.

**Returns:**

```python
{
    "session_id": "session-123",
    "model_config": {"model": "gpt-4o", "temperature": 0, "seed": 42},
    "steps": [{"type": "decision", "action": "...", ...}, ...],
    "tool_responses": {"tool_result": {...}},
    "total_events": 15,
}
```
### replay_diff()

Compare two sessions (original vs. replay) and return the differences.

**Returns:**

```python
{
    "original_session": "original-session",
    "replay_session": "replay-session",
    "matching": False,
    "divergences": [
        {"step": 3, "type": "diverged", "original": {...}, "replay": {...}},
    ],
}
```

Divergence types: `diverged`, `extra_in_replay`, `missing_in_replay`
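To illustrate what the three divergence types mean, here is a minimal step-by-step comparison of two step lists. This is illustrative logic, not the library's implementation:

```python
def diff_steps(original, replay):
    divergences = []
    # Steps present in both lists but with different content: "diverged"
    for i, (orig_step, replay_step) in enumerate(zip(original, replay), start=1):
        if orig_step != replay_step:
            divergences.append(
                {"step": i, "type": "diverged", "original": orig_step, "replay": replay_step}
            )
    # Steps the replay has beyond the original's length: "extra_in_replay"
    for i in range(len(original), len(replay)):
        divergences.append({"step": i + 1, "type": "extra_in_replay", "replay": replay[i]})
    # Steps the original has beyond the replay's length: "missing_in_replay"
    for i in range(len(replay), len(original)):
        divergences.append({"step": i + 1, "type": "missing_in_replay", "original": original[i]})
    return divergences

original = [{"action": "search"}, {"action": "purchase"}]
replay = [{"action": "search"}, {"action": "compare"}, {"action": "purchase"}]
print(diff_steps(original, replay))  # step 2 diverged; step 3 extra_in_replay
```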
## Query Methods

### events()

Return all events for the current session.

**Returns:** `list[Event]`

### sessions()

Return all session IDs in the database.
## Integration Methods

### langchain()

Return a LangChain callback handler.

### openai_agents()

Return OpenAI Agents SDK hooks.

### crewai()

Return a CrewAI callback collection.
## Event

The `Event` dataclass represents a single recorded event.

| Field | Type | Description |
|---|---|---|
| `timestamp` | `str` | ISO-format UTC timestamp |
| `event_type` | `str` | Event type (`decision`, `tool_call_start`, `error`, etc.) |
| `agent_id` | `str` | Agent name |
| `action` | `str` | What happened |
| `input_data` | `dict` | Input data |
| `output_data` | `dict` | Output data |
| `reasoning` | `str` | Why this event occurred |
| `session_id` | `str` | Session this event belongs to |
| `event_id` | `str` | Unique event identifier |
## EventStore

Low-level access to the SQLite event store.