
API Reference

Forensics

The main interface. All functionality is accessed through this class.

from agent_forensics import Forensics

f = Forensics(
    session="session-id",       # Unique session identifier
    agent="agent-name",         # Agent name for the trace
    db_path="forensics.db",     # Path to SQLite database
)

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| session | str | "default" | Session ID. Use unique IDs to isolate traces. |
| agent | str | "default-agent" | Agent name recorded with every event. |
| db_path | str | "forensics.db" | Path to the SQLite database file. Created if it doesn't exist. |
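Because traces are isolated by session ID, one option is to generate a fresh ID for every run. The following is a minimal sketch using only the standard library; the helper name `new_session_id` and the `prefix-timestamp-suffix` layout are illustrative assumptions, not part of agent_forensics.

```python
import uuid
from datetime import datetime, timezone


def new_session_id(prefix: str = "run") -> str:
    """Build a unique, roughly time-sortable session ID (hypothetical helper)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    suffix = uuid.uuid4().hex[:8]  # random part keeps concurrent runs apart
    return f"{prefix}-{stamp}-{suffix}"


session_id = new_session_id("order")
print(session_id)
# The value could then be passed as Forensics(session=session_id, ...)
```

A timestamp alone can collide when two runs start in the same second, which is why the sketch appends a random suffix.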

Recording Methods

decision()

Record when the agent makes a decision.

f.decision(
    action="search_products",
    input={"query": "wireless mouse"},
    reasoning="User requested product search",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | str | Yes | What the agent decided to do |
| input | dict | No | Input data that informed the decision |
| reasoning | str | No | Why the agent made this decision |

Returns: str — event ID


tool_call()

Record a tool execution. Creates two events: tool_call_start and tool_call_end.

f.tool_call(
    action="search_api",
    input={"q": "wireless mouse"},
    output={"results": [{"name": "Mouse A", "price": 29.99}]},
    reasoning="Searching product catalog",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | str | Yes | Tool name |
| input | dict | No | Tool input parameters |
| output | dict | No | Tool output / result |
| reasoning | str | No | Why this tool was called |

Returns: str — event ID of the tool_call_end event


llm_call()

Record an LLM call with model configuration for deterministic replay.

f.llm_call(
    input={"messages": [{"role": "user", "content": "Find a mouse"}]},
    output="I found several options...",
    model="gpt-4o",
    temperature=0.0,
    seed=42,
    reasoning="Initial product search query",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | dict | No | What was sent to the LLM |
| output | str | No | What the LLM returned |
| model | str | No | Model name (e.g., "gpt-4o", "claude-sonnet-4-20250514") |
| temperature | float | No | Temperature setting |
| seed | int | No | Random seed (if supported) |
| reasoning | str | No | Why this LLM call was made |

Returns: str — event ID

Model config for replay

The model, temperature, and seed parameters are stored as _model_config in the event data. Use get_replay_config() to extract them later.


error()

Record an error or incident.

f.error(
    action="purchase_failed",
    output={"reason": "Out of stock", "code": 404},
    reasoning="API returned stock unavailable",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| action | str | Yes | What failed |
| output | dict | No | Error details |
| reasoning | str | No | Error context |

finish()

Record the agent's final output.

f.finish(
    output="Ordered Logitech M750 for $45",
    reasoning="Purchase completed successfully",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| output | str | No | Final result text |
| reasoning | str | No | Why this is the final answer |

guardrail()

Record a guardrail checkpoint: was a critical action allowed or blocked?

f.guardrail(
    intent="check price",
    action="purchase item",
    allowed=True,
    reason="Price within approved budget",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| intent | str | Yes | What the agent intended to do |
| action | str | Yes | What the agent actually did or tried |
| allowed | bool | Yes | Whether the action was permitted |
| reason | str | No | Why it was allowed or blocked |

Missing guardrails

Critical actions (purchase, delete, send) without a preceding guardrail check trigger the MISSING_APPROVAL failure pattern.
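To make the idea concrete, here is a rough sketch of how a check for unguarded critical actions could work. It is not the library's implementation: the `"guardrail"` event type name, the keyword matching on `action`, and the helper `find_unguarded_actions` are all assumptions, and `SimpleNamespace` objects stand in for real events.

```python
from types import SimpleNamespace

CRITICAL_KEYWORDS = ("purchase", "delete", "send")  # the examples named above


def find_unguarded_actions(events):
    """Return 1-based steps of critical decisions with no guardrail check before them."""
    unguarded = []
    guardrail_seen = False
    for step, event in enumerate(events, start=1):
        if event.event_type == "guardrail":  # assumed event type name
            guardrail_seen = True
        elif event.event_type == "decision" and any(
            word in event.action.lower() for word in CRITICAL_KEYWORDS
        ):
            if not guardrail_seen:
                unguarded.append(step)
            guardrail_seen = False  # each critical action needs its own check
    return unguarded


# Stand-in events; real traces would come from the event store.
trace = [
    SimpleNamespace(event_type="decision", action="search_products"),
    SimpleNamespace(event_type="decision", action="purchase_item"),
    SimpleNamespace(event_type="guardrail", action="purchase_item"),
    SimpleNamespace(event_type="decision", action="purchase_item"),
]
print(find_unguarded_actions(trace))  # [2]
```

In this sketch the second event is flagged because nothing approved it, while the fourth is not because a guardrail event precedes it.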


context_injection()

Record when external context is injected (RAG chunks, memory, retrieved docs).

f.context_injection(
    source="vector_db",
    content={
        "document": "refund_policy.md",
        "similarity_score": 0.92,
    },
    reasoning="RAG retrieval for refund question",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| source | str | Yes | Where the context came from |
| content | dict | No | The actual context data |
| reasoning | str | No | Why this context was injected |

Similarity scores

Include similarity_score in the content dict. Scores below 0.7 trigger the RETRIEVAL_MISMATCH failure pattern.
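If it helps to know in advance which retrieved chunks would fall under that threshold, a pre-check along these lines could be run before recording them. This is a sketch only: `split_by_similarity` is a hypothetical helper, and treating a chunk without a score as not flagged is an assumption about how missing scores are handled.

```python
RETRIEVAL_THRESHOLD = 0.7  # threshold stated above


def split_by_similarity(chunks, threshold=RETRIEVAL_THRESHOLD):
    """Split chunks into (ok, flagged); flagged means score strictly below threshold."""
    ok, flagged = [], []
    for chunk in chunks:
        score = chunk.get("similarity_score")
        if isinstance(score, (int, float)) and score < threshold:
            flagged.append(chunk)
        else:
            ok.append(chunk)  # includes chunks with no score (assumption)
    return ok, flagged


chunks = [
    {"document": "refund_policy.md", "similarity_score": 0.92},
    {"document": "shipping_faq.md", "similarity_score": 0.41},
    {"document": "notes.md"},
]
ok, flagged = split_by_similarity(chunks)
print([c["document"] for c in flagged])  # ['shipping_faq.md']
```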


prompt_state()

Record the current system prompt. Automatically detects drift from the previous state.

f.prompt_state(
    system_prompt="You are a helpful shopping assistant.",
    metadata={"version": 2},
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| system_prompt | str | Yes | Current system prompt text |
| metadata | dict | No | Additional info (version, source, etc.) |

When the prompt changes from the previous call, the event type becomes prompt_drift instead of prompt_state, and a diff is computed automatically.
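The drift check can be pictured as a text diff between consecutive prompts. The sketch below uses the standard library's `difflib` to illustrate the idea; how agent_forensics actually computes and stores its diff is not documented here, so `prompt_diff` and the diff format are assumptions.

```python
import difflib


def prompt_diff(previous: str, current: str) -> list[str]:
    """Return unified-diff lines between two prompts; empty when they are identical."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


old = "You are a helpful shopping assistant."
new = "You are a helpful shopping assistant.\nNever exceed the user's budget."

diff = prompt_diff(old, new)
# Mirror the rule above: a non-empty diff means the prompt drifted.
event_type = "prompt_drift" if diff else "prompt_state"
print(event_type)
print("\n".join(diff))
```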


record()

Record a generic event (for custom event types).

f.record(
    event_type="custom_check",
    action="validate_output",
    input={"schema": "order"},
    output={"valid": True},
    reasoning="Output validation step",
)

Analysis Methods

report()

Generate the full Markdown forensic report.

markdown = f.report()
print(markdown)

Returns: str — complete Markdown report


save_markdown() / save_pdf()

Save the report to a file.

f.save_markdown("./reports")   # → ./reports/forensics-report-session-id.md
f.save_pdf("./reports")        # → ./reports/forensics-report-session-id.pdf

save_pdf() requires the pdf extra: pip install agent-forensics[pdf]


classify()

Auto-classify failure patterns in a session trace.

failures = f.classify()                    # Current session
failures = f.classify(session_id="other")  # Specific session

Returns: list[dict] — each dict contains:

{
    "type": "MISSING_APPROVAL",       # Failure pattern name
    "severity": "HIGH",               # HIGH / MEDIUM / LOW
    "description": "Critical action...",
    "evidence": {"action": "purchase", ...},
    "step": 5,                        # Position in timeline
}

See Failure Patterns for all pattern types.
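Since the result is a plain list of dicts, it can be post-processed with ordinary Python. The sketch below groups failures by type and keeps only those at or above a chosen severity; `group_by_type` and the sample data are illustrative, not part of the library.

```python
from collections import defaultdict

SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def group_by_type(failures, min_severity="LOW"):
    """Map failure type -> list of steps, filtered by minimum severity."""
    floor = SEVERITY_RANK[min_severity]
    grouped = defaultdict(list)
    for failure in failures:
        if SEVERITY_RANK.get(failure["severity"], 0) >= floor:
            grouped[failure["type"]].append(failure["step"])
    return dict(grouped)


# Made-up sample in the documented shape (what f.classify() would return).
sample = [
    {"type": "MISSING_APPROVAL", "severity": "HIGH", "description": "...", "evidence": {}, "step": 5},
    {"type": "RETRIEVAL_MISMATCH", "severity": "MEDIUM", "description": "...", "evidence": {}, "step": 2},
    {"type": "MISSING_APPROVAL", "severity": "HIGH", "description": "...", "evidence": {}, "step": 9},
]
print(group_by_type(sample, min_severity="HIGH"))  # {'MISSING_APPROVAL': [5, 9]}
```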


failure_stats()

Aggregate failure patterns across multiple sessions.

stats = f.failure_stats()                              # All sessions
stats = f.failure_stats(session_ids=["s1", "s2"])      # Specific sessions

Returns:

{
    "total_failures": 12,
    "by_type": {
        "MISSING_APPROVAL": {"count": 5, "description": "...", "severities": [...]},
        ...
    },
    "by_severity": {"HIGH": 7, "MEDIUM": 4, "LOW": 1},
}
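As an illustration of working with this structure, the following sketch ranks failure types by how often they occur. `top_failure_types` is a hypothetical helper and the sample counts are invented to match the documented shape.

```python
def top_failure_types(stats, limit=3):
    """Return up to `limit` (type, count) pairs, most frequent first."""
    by_type = stats.get("by_type", {})
    ranked = sorted(by_type.items(), key=lambda item: item[1]["count"], reverse=True)
    return [(name, info["count"]) for name, info in ranked[:limit]]


# Invented sample in the shape returned by failure_stats().
sample_stats = {
    "total_failures": 12,
    "by_type": {
        "RETRIEVAL_MISMATCH": {"count": 4, "description": "...", "severities": []},
        "MISSING_APPROVAL": {"count": 5, "description": "...", "severities": []},
        "LARGE_PURCHASE": {"count": 3, "description": "...", "severities": []},
    },
    "by_severity": {"HIGH": 7, "MEDIUM": 4, "LOW": 1},
}
print(top_failure_types(sample_stats, limit=2))
# [('MISSING_APPROVAL', 5), ('RETRIEVAL_MISMATCH', 4)]
```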

add_pattern()

Register a custom failure pattern detector.

def detect_large_purchase(events):
    failures = []
    for i, e in enumerate(events):
        if e.event_type == "decision" and "purchase" in e.action.lower():
            total = e.input_data.get("total", 0)
            if isinstance(total, (int, float)) and total > 10000:
                failures.append({
                    "type": "LARGE_PURCHASE",
                    "severity": "HIGH",
                    "description": f"Purchase of ${total:,.0f} exceeds threshold",
                    "evidence": {"total": total},
                    "step": i + 1,
                })
    return failures

f.add_pattern(detect_large_purchase)

The detector must be a callable that takes list[Event] and returns list[dict] in the same format as built-in patterns.
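Because a detector is an ordinary function, it can be exercised without a database by passing stand-in objects that expose the Event fields it reads. The sketch below shows a second, hypothetical detector (`detect_error_streak`, with the invented pattern name `ERROR_STREAK`) and a check that its output has the expected keys; `SimpleNamespace` is used in place of real Event instances.

```python
from types import SimpleNamespace

REQUIRED_KEYS = {"type", "severity", "description", "evidence", "step"}


def detect_error_streak(events, limit=3):
    """Flag the step at which `limit` consecutive error events have occurred."""
    failures = []
    streak = 0
    for i, e in enumerate(events):
        streak = streak + 1 if e.event_type == "error" else 0
        if streak == limit:  # report once, when the streak first reaches the limit
            failures.append({
                "type": "ERROR_STREAK",  # hypothetical pattern name
                "severity": "MEDIUM",
                "description": f"{limit} consecutive errors",
                "evidence": {"last_action": e.action},
                "step": i + 1,
            })
    return failures


# Stand-in events with only the fields the detector reads.
fake_events = [SimpleNamespace(event_type="decision", action="plan")] + [
    SimpleNamespace(event_type="error", action="purchase_failed") for _ in range(4)
]
result = detect_error_streak(fake_events)
print(result[0]["step"])  # 4
```

Once it behaves as expected, such a function would be registered with `f.add_pattern(detect_error_streak)` in the same way as the example above.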


on_failure()

Register a callback or webhook to fire when failures are detected.

# Callback
f.on_failure(lambda failures: print(f"ALERT: {len(failures)} failures!"), min_severity="HIGH")

# Webhook (Slack, Discord, etc.)
f.on_failure(None, webhook="https://hooks.slack.com/services/...", min_severity="HIGH")

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| callback | callable | Yes | Function that receives matching failures (pass None for webhook-only) |
| min_severity | str | No | Minimum severity to trigger ("HIGH", "MEDIUM", "LOW") |
| webhook | str | No | URL to POST failure data to |
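A callback can be any function that accepts the list of failures. As a sketch, the hypothetical `log_failures` below writes one JSON line per failure through the standard `logging` module; it assumes each failure is a dict in the format returned by classify(), and that the return value is ignored by the caller.

```python
import json
import logging

logger = logging.getLogger("forensics.alerts")  # logger name is an arbitrary choice


def log_failures(failures):
    """Log each failure as a compact JSON line and return the lines written."""
    lines = []
    for failure in failures:
        line = json.dumps(
            {
                "type": failure.get("type"),
                "severity": failure.get("severity"),
                "step": failure.get("step"),
            },
            sort_keys=True,
        )
        logger.warning(line)
        lines.append(line)
    return lines


# Direct call with made-up data, to show what the callback would receive.
lines = log_failures([{"type": "MISSING_APPROVAL", "severity": "HIGH", "step": 5}])
print(lines[0])
# Registration would look like: f.on_failure(log_failures, min_severity="HIGH")
```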

handoff()

Record an agent-to-agent handoff in multi-agent systems.

f.handoff(
    to_agent="executor",
    context={"task": "buy mouse", "budget": 100},
    reasoning="Delegating purchase to executor agent",
)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| to_agent | str | Yes | Agent receiving the handoff |
| context | dict | No | Data passed to the next agent |
| reasoning | str | No | Why this handoff is happening |

agent_stats()

Get per-agent breakdown of events and failures in a session.

stats = f.agent_stats()

Returns:

{
    "agents": {
        "planner": {"events": 5, "decisions": 2, "errors": 0, "tools": 2, "failures": []},
        "executor": {"events": 8, "decisions": 3, "errors": 1, "tools": 4, "failures": [...]},
    },
    "handoffs": [{"from": "planner", "to": "executor", "reasoning": "..."}],
    "handoff_chain": ["planner", "executor"],
    "total_agents": 2,
    "is_multi_agent": True,
}
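One way to use this breakdown is to compute a per-agent error rate. The sketch below works on a dict in the documented shape; `error_rates` is a hypothetical helper and the sample mirrors the example above with the failure lists left empty.

```python
def error_rates(stats):
    """Return {agent_name: errors / events}, using 0.0 for agents with no events."""
    rates = {}
    for name, info in stats.get("agents", {}).items():
        events = info.get("events", 0)
        rates[name] = info.get("errors", 0) / events if events else 0.0
    return rates


# Sample in the shape returned by agent_stats().
sample_agent_stats = {
    "agents": {
        "planner": {"events": 5, "decisions": 2, "errors": 0, "tools": 2, "failures": []},
        "executor": {"events": 8, "decisions": 3, "errors": 1, "tools": 4, "failures": []},
    },
    "handoff_chain": ["planner", "executor"],
    "total_agents": 2,
    "is_multi_agent": True,
}
print(error_rates(sample_agent_stats))  # {'planner': 0.0, 'executor': 0.125}
```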

get_replay_config()

Extract model config and step sequence from a recorded session.

config = f.get_replay_config("session-123")

Returns:

{
    "session_id": "session-123",
    "model_config": {"model": "gpt-4o", "temperature": 0, "seed": 42},
    "steps": [{"type": "decision", "action": "...", ...}, ...],
    "tool_responses": {"tool_result": {...}},
    "total_events": 15,
}
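When re-running a session, the recorded model settings typically need to be turned back into call arguments. A minimal sketch, assuming the dict shape shown above; `replay_kwargs` is a hypothetical helper, and dropping unset values is a design choice so that providers without seed support are not sent a `None`.

```python
REPLAY_KEYS = ("model", "temperature", "seed")


def replay_kwargs(config):
    """Extract the recorded model settings, skipping any that were not set."""
    model_config = config.get("model_config") or {}
    return {
        key: model_config[key]
        for key in REPLAY_KEYS
        if model_config.get(key) is not None
    }


# Trimmed sample in the shape returned by get_replay_config().
sample_config = {
    "session_id": "session-123",
    "model_config": {"model": "gpt-4o", "temperature": 0, "seed": 42},
}
print(replay_kwargs(sample_config))
# {'model': 'gpt-4o', 'temperature': 0, 'seed': 42}
```

Note that a temperature of 0 is kept, since the check is against `None` rather than truthiness.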

replay_diff()

Compare two sessions (original vs replay) and return differences.

diff = f.replay_diff("original-session", "replay-session")

Returns:

{
    "original_session": "original-session",
    "replay_session": "replay-session",
    "matching": False,
    "divergences": [
        {"step": 3, "type": "diverged", "original": {...}, "replay": {...}},
    ],
}

Divergence types: diverged, extra_in_replay, missing_in_replay
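For long sessions it can be easier to read a summary than the full list of divergences. The sketch below counts divergences by type and finds the first step at which the runs differ; `summarize_diff` is a hypothetical helper and the sample data is invented in the documented shape.

```python
from collections import Counter


def summarize_diff(diff):
    """Condense a replay diff into counts per divergence type and the first divergent step."""
    divergences = diff.get("divergences", [])
    counts = Counter(d["type"] for d in divergences)
    first_step = min((d["step"] for d in divergences), default=None)
    return {
        "matching": diff.get("matching", False),
        "counts": dict(counts),
        "first_divergent_step": first_step,
    }


# Invented sample in the shape returned by replay_diff().
sample_diff = {
    "original_session": "original-session",
    "replay_session": "replay-session",
    "matching": False,
    "divergences": [
        {"step": 3, "type": "diverged", "original": {}, "replay": {}},
        {"step": 7, "type": "extra_in_replay", "original": {}, "replay": {}},
        {"step": 5, "type": "diverged", "original": {}, "replay": {}},
    ],
}
print(summarize_diff(sample_diff))
# {'matching': False, 'counts': {'diverged': 2, 'extra_in_replay': 1}, 'first_divergent_step': 3}
```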


Query Methods

events()

Return all events for the current session.

events = f.events()
for e in events:
    print(f"[{e.event_type}] {e.action}")

Returns: list[Event]

sessions()

Return all session IDs in the database.

print(f.sessions())  # ['order-123', 'order-456']

Integration Methods

langchain()

Return a LangChain callback handler.

# `agent` is an existing LangChain agent or runnable
handler = f.langchain()
agent.invoke({"input": "..."}, config={"callbacks": [handler]})

openai_agents()

Return OpenAI Agents SDK hooks.

from agents import Agent  # OpenAI Agents SDK

hooks = f.openai_agents()
agent = Agent(name="shopper", hooks=hooks)

crewai()

Return a CrewAI callback collection.

from crewai import Agent

hooks = f.crewai()
agent = Agent(role="...", step_callback=hooks.step_callback)

Event

The Event dataclass represents a single recorded event.

from agent_forensics import Event

| Field | Type | Description |
| --- | --- | --- |
| timestamp | str | ISO format UTC timestamp |
| event_type | str | Event type (decision, tool_call_start, error, etc.) |
| agent_id | str | Agent name |
| action | str | What happened |
| input_data | dict | Input data |
| output_data | dict | Output data |
| reasoning | str | Why this event occurred |
| session_id | str | Session this event belongs to |
| event_id | str | Unique event identifier |

EventStore

Low-level access to the SQLite event store.

from agent_forensics import EventStore

store = EventStore("forensics.db")
events = store.get_session_events("session-123")
sessions = store.get_all_sessions()