Agent Forensics
Black box for AI agents. Capture every decision, auto-detect failure patterns, generate forensic reports for EU AI Act compliance.
When an AI agent makes a wrong purchase, leaks data, or fails silently — you need to know why. Agent Forensics records every decision point, tool call, and LLM interaction, then reconstructs the causal chain and auto-classifies what went wrong.
Why
- EU AI Act (Aug 2026): High-risk AI systems must provide decision traceability. Fines up to €35M or 7% of global revenue.
- AI agents are already causing incidents: unauthorized purchases, fabricated customer responses, silent data leaks.
- No existing tool reconstructs the why behind agent failures. Monitoring tools watch in real-time. Forensics analyzes after the fact.
Key Features
- Framework-agnostic — works with any agent (LangChain, OpenAI, CrewAI, or custom)
- One-line integration — add a callback handler, get full forensics
- 6 failure patterns — auto-detected with severity and evidence
- Prompt drift detection — catches when system prompts change mid-session
- Deterministic replay — reproduce and compare agent runs
- Compliance-ready — reports aligned with EU AI Act Article 14
Quick Example
from agent_forensics import Forensics
f = Forensics(session="order-123", agent="shopping-agent")
f.decision("search_products", input={"query": "mouse"}, reasoning="User request")
f.tool_call("search_api", input={"q": "mouse"}, output={"results": [...]})
f.guardrail(intent="check price", action="purchase", allowed=True, reason="Within budget")
f.finish("Ordered Logitech M750 for $45")
# Generate forensic report
print(f.report())
# Auto-classify failures
failures = f.classify()
Architecture
Your Agent (any framework)
│
│ Callback / Hook (1 line of code)
▼
┌───────────────────────────┐
│ Forensics Collector │ Captures decisions, tool calls, LLM interactions
├───────────────────────────┤
│ Context & Prompt Tracker │ Tracks RAG injections + prompt drift
├───────────────────────────┤
│ Event Store (SQLite) │ Immutable event log with session isolation
├───────────────────────────┤
│ Failure Classifier │ Auto-detects 6 failure patterns
├───────────────────────────┤
│ Report Generator │ Markdown / PDF / Dashboard
├───────────────────────────┤
│ Replay Engine │ Deterministic trace reproduction + diff
└───────────────────────────┘
Next Steps
- Getting Started — install and generate your first report
- API Reference — full method documentation
- Integrations — LangChain, OpenAI Agents SDK, CrewAI
- Failure Patterns — what each pattern means and how to fix it
- EU AI Act Compliance — how Agent Forensics maps to Article 14