AI Agent Audit Trail
& Logging.

An AI agent audit trail is a tamper-proof chronological record of every tool call, policy decision, and agent action. SupraWall generates structured audit logs that satisfy EU AI Act Article 12 technical documentation requirements and support forensic investigation of incidents — with risk scores, session IDs, and cost attribution on every entry.

TL;DR

  • Article 12 of the EU AI Act mandates logging capabilities for high-risk AI systems — logs must be automatic, comprehensive, and retained for the system's operational lifetime.
  • Logs must capture what the agent did, not just what it said — tool call arguments, policy decisions, and outcomes matter more than LLM output text.
  • Tamper-proofing requires cryptographic chaining or immutable storage — a log that can be edited is legally worthless in a regulatory investigation.
  • SupraWall generates logs with risk scores, session IDs, cost attribution, and integrity hashes on every entry — structured for both compliance submission and forensic investigation.

What Regulators Actually Need

Most engineering teams approach audit logging from the wrong direction. They ask "what should we log?" and answer it by logging whatever is convenient — typically LLM inputs and outputs, maybe some timestamps. This produces logs that satisfy developers and satisfy no one else.

Regulators investigating an AI incident ask a different set of questions: What did the agent do? What policy governed that action? Why was the action permitted or denied? Who was responsible? What was the cost? Can we verify these logs have not been altered? These questions have nothing to do with LLM output text and everything to do with the agent's actions in the world.

Article 12 of the EU AI Act specifies that logging capabilities must enable “automatic recording of events throughout the lifetime of the AI system.” For the specific case of autonomous agents, the regulation's guidance makes clear that “events” include: input data, the AI system's output, the period of use, and any human oversight measures applied. For agents, “output” means tool calls executed, not text generated — the distinction is critical.

The technical documentation required under Article 11 must include a description of the logging system and demonstrate that it is capable of producing the evidence required for a conformity assessment. An audit log that is missing key fields, is not tamper-proof, or only covers a subset of agent actions will fail this assessment. The cost of getting logging wrong is not a warning — it is a prohibition on placing the system on the EU market.

The Anatomy of an Agent Audit Log

A complete agent audit log entry is not a print statement or a simple key-value record. It is a structured document that captures the full context of a single decision point in the agent's execution. Below is the canonical SupraWall audit log entry format, with annotations for each field:

SupraWall Audit Log Entry — JSON Format

{
  // Identity & Context
  "logId":          "log_01J8XK2M3N4P5Q6R7S8T9U0V1W",
  "sessionId":      "sess_01J8XK2M3N4P5Q6R7S8T9U0V1W",  // Ties all events in one agent run
  "agentId":        "billing-agent",
  "agentVersion":   "2.3.1",
  "userId":         "user_7f2a9c",                         // Who invoked the agent
  "organizationId": "org_acme_corp",

  // Action Details
  "toolName":       "stripe.charge",
  "toolVersion":    "stripe-python@8.1.0",
  "arguments": {
    "amount":       2400,
    "currency":     "usd",
    "customer_id":  "cus_NffrFeUfNV2Hib",
    "description":  "Annual plan renewal"
    // Note: no secrets, no PII beyond what's operationally required
  },

  // Policy Decision
  "decision":       "ALLOW",            // ALLOW | DENY | REQUIRE_APPROVAL | APPROVED | DENIED
  "policyId":       "pol_stripe_charge_allow",
  "reason":         "Tool within agent scope; amount $2,400 below $5,000 auto-approve threshold",
  "policyVersion":  "v4",

  // Risk Assessment
  "riskScore":      42,                 // 0-100; computed by SupraWall risk engine
  "riskFactors": [
    "financial_transaction",
    "external_api_call",
    "customer_data_access"
  ],
  "riskLevel":      "MEDIUM",           // LOW | MEDIUM | HIGH | CRITICAL

  // Cost Attribution
  "cost_usd":       0.0031,             // LLM inference cost for this decision step
  "tokens_used":    847,
  "model":          "gpt-4o",

  // Timing
  "timestamp":      "2026-03-19T14:32:07.412Z",
  "latency_ms":     187,                // End-to-end decision + execution latency

  // Result
  "outcome":        "SUCCESS",          // SUCCESS | FAILURE | TIMEOUT | CANCELLED
  "responseCode":   200,
  "responseBytes":  1240,

  // Tamper-Proof Chain
  "sequenceNumber": 47,                 // Monotonically increasing within session
  "previousHash":   "sha256:a3f4b2c1d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5",
  "integrityHash":  "sha256:b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5"
  // integrityHash = SHA256(all fields above + previousHash)
}

The key insight is that a complete audit entry captures the decision as much as the action. Knowing that the agent called stripe.charge is useful; knowing that it was allowed by policy pol_stripe_charge_allow version 4, with a risk score of 42, at 14:32:07 UTC, is what regulators and forensic investigators actually need.
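For readers who want the mechanics, the hash computation described in the entry's trailing comment can be sketched in a few lines of Python. The canonical serialization (sorted keys, compact separators) and the reduced field set are illustrative assumptions; the production wire format may differ.

```python
import hashlib
import json

def compute_integrity_hash(entry: dict, previous_hash: str) -> str:
    """Hash a log entry together with the previous entry's integrity hash.

    Assumes a canonical JSON serialization (sorted keys, no whitespace);
    the production format may canonicalize differently.
    """
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((payload + previous_hash).encode("utf-8")).hexdigest()
    return "sha256:" + digest

# A trimmed-down entry; the real format carries every field shown above.
entry = {
    "toolName": "stripe.charge",
    "decision": "ALLOW",
    "riskScore": 42,
    "sequenceNumber": 47,
    "timestamp": "2026-03-19T14:32:07.412Z",
}
previous = "sha256:a3f4b2c1d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5"
print(compute_integrity_hash(entry, previous))
```

Changing any field of the entry, or the previous hash it chains to, produces a completely different digest, which is the property the tamper-proof chain relies on.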

Risk Scoring

Every action in the SupraWall audit trail carries a risk score between 0 and 100, computed by the SupraWall risk engine at the time of the policy decision. The risk score is not a post-hoc annotation — it is part of the policy evaluation and influences which policy branch is applied.

The risk engine evaluates each tool call across four dimensions:

Dimension              Weight   Examples
Action Reversibility   40%      Deletion (+40), Write (+20), Read (+0)
Scope of Impact        25%      External API (+25), Internal DB (+15), Read-only (+0)
Data Sensitivity       20%      PII/Financial (+20), Business Data (+10), Public (+0)
Volume / Scale         15%      Bulk (>100 records) (+15), Multi-record (+8), Single (+0)

Risk scores map to risk levels: 0–24 is LOW, 25–49 is MEDIUM, 50–74 is HIGH, and 75–100 is CRITICAL. Policy rules can reference risk levels directly, making it possible to write policies like “require approval for all CRITICAL actions regardless of tool” — a catch-all that covers novel attack vectors not explicitly anticipated in the tool-level policy set.
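As an illustration, the dimension point values and the level thresholds can be combined into a toy scoring function. This is a sketch, not the actual SupraWall risk engine, which evidently scores more granularly (the example log entry above lands on 42).

```python
# Point values per dimension, taken from the weighting table above.
# Illustrative sketch only; the production engine is more granular.
RISK_POINTS = {
    "reversibility": {"deletion": 40, "write": 20, "read": 0},
    "scope":         {"external_api": 25, "internal_db": 15, "read_only": 0},
    "sensitivity":   {"pii_financial": 20, "business": 10, "public": 0},
    "volume":        {"bulk": 15, "multi": 8, "single": 0},
}

def risk_score(reversibility: str, scope: str, sensitivity: str, volume: str) -> int:
    return (RISK_POINTS["reversibility"][reversibility]
            + RISK_POINTS["scope"][scope]
            + RISK_POINTS["sensitivity"][sensitivity]
            + RISK_POINTS["volume"][volume])

def risk_level(score: int) -> str:
    if score <= 24:
        return "LOW"
    if score <= 49:
        return "MEDIUM"
    if score <= 74:
        return "HIGH"
    return "CRITICAL"

# A bulk deletion against an internal database holding business data:
score = risk_score("deletion", "internal_db", "business", "bulk")
print(score, risk_level(score))  # 80 CRITICAL
```

A policy like “require approval for all CRITICAL actions” then reduces to a single comparison against the computed level, independent of which tool triggered it.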

In the audit trail, risk scores serve a second purpose beyond real-time policy enforcement: they make incident reconstruction significantly faster. During a forensic investigation, analysts can immediately filter the audit trail to HIGH and CRITICAL events rather than reviewing thousands of low-risk read operations. In practice this filtering can cut investigation time from hours to minutes.

Forensic Fields

Four fields in the SupraWall audit log exist specifically for forensic and compliance use cases. They are not operationally useful during normal agent execution — their value surfaces only during an investigation.

Integrity Hash

A SHA-256 hash of the entire log entry, computed at write time. Comparing the stored hash against a recomputation of the entry immediately reveals if any field has been modified after the fact. This is the primary tamper-detection mechanism.

Previous Hash Chain

Each entry's integrityHash is computed using the previousHash of the preceding entry. This creates a cryptographic chain — modifying any historical entry changes its hash, which cascades through all subsequent entries, making retroactive alteration detectable across the entire log history.

Sequence Number

A monotonically increasing integer within each session. Gaps in the sequence indicate that log entries have been deleted. Combined with the hash chain, sequence numbers ensure that entries cannot be removed from the middle of a session log without detection.
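Putting these mechanisms together, a verifier can walk a session log and flag gaps, broken links, and altered entries. The sketch below assumes a zero-filled genesis hash and a canonical JSON serialization; both are illustrative choices, not SupraWall's documented format.

```python
import hashlib
import json

GENESIS = "sha256:" + "0" * 64   # assumed sentinel for the first entry

def entry_hash(entry: dict, previous_hash: str) -> str:
    # Hash every field except integrityHash itself, chained to the
    # previous entry's hash (serialization format is an assumption).
    body = {k: v for k, v in entry.items() if k != "integrityHash"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256((payload + previous_hash).encode()).hexdigest()

def append_entry(log: list, fields: dict) -> None:
    previous_hash = log[-1]["integrityHash"] if log else GENESIS
    entry = dict(fields, sequenceNumber=len(log) + 1, previousHash=previous_hash)
    entry["integrityHash"] = entry_hash(entry, previous_hash)
    log.append(entry)

def verify_session(entries: list) -> list:
    """Return a list of problems found in an ordered session log."""
    problems, prev_hash, prev_seq = [], GENESIS, 0
    for e in entries:
        seq = e["sequenceNumber"]
        if seq != prev_seq + 1:
            problems.append(f"sequence gap before entry {seq}")
        if e["previousHash"] != prev_hash:
            problems.append(f"broken chain at entry {seq}")
        if e["integrityHash"] != entry_hash(e, prev_hash):
            problems.append(f"tampered entry {seq}")
        prev_hash, prev_seq = e["integrityHash"], seq
    return problems

log = []
for tool in ["crm.lookup", "stripe.charge", "email.send"]:
    append_entry(log, {"toolName": tool, "decision": "ALLOW"})
print(verify_session(log))        # [] while the chain is intact
log[1]["decision"] = "DENY"       # a retroactive edit...
print(verify_session(log))        # ...is flagged as tampering
```

Deleting an entry from the middle of the log trips all three checks at once: the sequence gap, the broken chain link, and the hash mismatch of the entry that follows.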

Risk Factors

The specific risk factors that contributed to the risk score for this entry. During investigation, risk factors explain why the system considered an action high-risk and whether the policy response was proportionate. They are the risk score's “show your work” field.

Tamper-Proof Storage

Cryptographic chaining at the log entry level provides tamper detection — it tells you if a log has been modified. But detection alone is not sufficient for compliance purposes. Regulators also require that the system prevents tampering, or at minimum makes tampering unambiguously attributable. This requires immutable storage at the infrastructure level.

SupraWall supports three storage backends designed for tamper-proof compliance archival:

1. AWS S3 with Object Lock (WORM)

Write-Once-Read-Many storage with Governance or Compliance mode locking. Compliance mode prevents deletion even by root users for the duration of the retention period. This is the gold standard for regulatory archival in AWS environments.

2. Google Cloud Storage with Retention Policies

Bucket-level retention policies prevent object deletion or modification before the retention period expires. Combined with bucket lock, the retention policy itself cannot be shortened, providing verifiable long-term immutability.

3. SupraWall Hosted Audit Store

SupraWall's managed audit store uses append-only storage with cryptographic anchoring to a public blockchain checkpoint every 24 hours. The checkpoint hash can be independently verified to prove the log state at any historical point in time.
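The anchoring idea can be shown with a toy sketch: publish a digest of the current chain head externally, then later prove that the log still matches it. The checkpoint format here is invented for illustration; SupraWall's actual scheme is not specified in this article.

```python
import hashlib

def checkpoint_digest(chain_head_hash: str, date: str) -> str:
    """Digest committing the log's chain head on a given date.

    Publishing this value in an immutable public location (e.g. a
    blockchain transaction) fixes the log state at that moment: any
    later rewrite of history changes the chain head, so a fresh
    recomputation no longer matches the published digest.
    """
    return hashlib.sha256(f"{date}:{chain_head_hash}".encode()).hexdigest()

# Day 1: anchor the current chain head (placeholder value) externally.
published = checkpoint_digest("sha256:b4c5d6e7f8a9", "2026-03-19")

# Later: recompute from today's copy of the log and compare.
assert checkpoint_digest("sha256:b4c5d6e7f8a9", "2026-03-19") == published
```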

Configuring Tamper-Proof Audit Storage

import suprawall

sw = suprawall.Client(api_key="sw_live_...")

# Configure immutable audit log archival
sw.audit.configure(
    # Primary storage: SupraWall hosted (append-only + blockchain anchoring)
    primary="suprawall_hosted",

    # Archive to AWS S3 with Object Lock for long-term retention
    archive={
        "backend": "s3",
        "bucket": "acme-ai-audit-logs",
        "region": "eu-west-1",
        "object_lock": True,
        "retention_days": 1095,     # 3 years
        "retention_mode": "COMPLIANCE",
    },

    # Export format for compliance submissions
    export_format="json_ld",        # Linked Data format for regulatory submissions

    # Cryptographic anchoring interval
    blockchain_checkpoint_hours=24,

    # Alert if log integrity verification fails
    integrity_alert_channel="slack",
    integrity_alert_slack="#security-alerts",
)

# Verify integrity of historical logs on demand
result = sw.audit.verify_integrity(
    session_id="sess_01J8XK2M3N4P5Q6R7S8T9U0V1W",
    from_sequence=1,
    to_sequence=47
)

print(f"Verified {result.entries_checked} entries")
print(f"Integrity: {result.status}")  # VALID or TAMPERED
print(f"Last checkpoint: {result.blockchain_anchor}")

EU AI Act Article 12 Compliance

EU AI Act Article 12 — Logging Requirements

Article 12(1) requires that high-risk AI systems have “capabilities enabling the automatic recording of events throughout the lifetime of the AI system.” The SupraWall audit trail satisfies this requirement through automatic instrumentation at the SDK level — no manual logging calls are required in agent code.

Article 12(2) specifies that logging capabilities shall ensure traceability of the AI system's functioning throughout its lifetime. The hash chain and sequence numbers in SupraWall audit logs provide cryptographic traceability — a continuous verifiable record from the first agent action to the most recent.

Article 12(3) adds specific requirements for certain biometric and critical infrastructure AI systems, including logging of the reference database used, input data, and operating periods. For general autonomous agent deployments, the SupraWall log format's arguments, sessionId, timestamp, and agentVersion fields map directly to these requirements.

SupraWall generates a compliance evidence report that cross-references each log field against the specific Article 12 sub-requirement it satisfies, formatted for direct inclusion in the technical documentation required under Article 11. This report can be generated on-demand for conformity assessments and audit requests.

Cost Attribution and Operational Intelligence

Beyond compliance, the SupraWall audit trail serves as the primary source of operational intelligence for agent deployments. The cost_usd and tokens_used fields on every entry enable per-session, per-agent, per-user, and per-task cost attribution — answering the question that every engineering leader asks: “how much are our agents actually costing us, broken down by what they're doing?”

The latency_ms field enables performance regression detection — if a specific tool call or policy evaluation starts taking significantly longer than its historical average, the audit trail surfaces this before it becomes a user-facing incident. Combined with the sequence number, you can reconstruct the exact execution timeline of any agent session down to the millisecond.

Session-level aggregates are computed automatically by SupraWall and available via the API: total cost, total tokens, total actions, action breakdown by risk level, policy hit rate, approval rate, and denial rate. These aggregates power the SupraWall dashboard's governance reporting view, which is the primary interface for AI governance teams conducting ongoing oversight.
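These rollups are straightforward to derive from raw entries. The sketch below follows the field names of the log format shown earlier; the aggregate shape is illustrative rather than the actual API response.

```python
from collections import Counter

def session_aggregates(entries: list) -> dict:
    """Roll up one session's audit entries into governance-level totals."""
    decisions = Counter(e["decision"] for e in entries)
    total = len(entries)
    return {
        "total_cost_usd": round(sum(e["cost_usd"] for e in entries), 6),
        "total_tokens":   sum(e["tokens_used"] for e in entries),
        "total_actions":  total,
        "by_risk_level":  dict(Counter(e["riskLevel"] for e in entries)),
        "approval_rate":  decisions["APPROVED"] / total if total else 0.0,
        "denial_rate":    (decisions["DENY"] + decisions["DENIED"]) / total if total else 0.0,
    }

entries = [
    {"decision": "ALLOW",    "riskLevel": "LOW",    "cost_usd": 0.0012, "tokens_used": 310},
    {"decision": "APPROVED", "riskLevel": "MEDIUM", "cost_usd": 0.0031, "tokens_used": 847},
    {"decision": "DENY",     "riskLevel": "HIGH",   "cost_usd": 0.0009, "tokens_used": 205},
]
print(session_aggregates(entries))
```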

Frequently Asked Questions

What does EU AI Act Article 12 require for audit logging?

Article 12 of the EU AI Act requires high-risk AI systems to have logging capabilities that enable automatic recording of events throughout the system's lifetime. For certain biometric systems, Article 12 further specifies minimum content: the period of each use, the reference database against which input data was checked, the input data that led to a match, and the natural persons involved in verifying the results. For autonomous agents, the practical translation is logging every tool call, every policy decision, and the context that led to each action.

What makes an AI agent audit log tamper-proof?

Tamper-proof audit logs use cryptographic chaining — each log entry contains a hash of the previous entry, creating a chain where modifying any historical entry invalidates all subsequent hashes. This makes retroactive tampering detectable. SupraWall additionally stores audit logs in append-only storage with cryptographic integrity verification, and can export to immutable storage services like AWS S3 with Object Lock or Google Cloud Storage with retention policies.

How long should AI agent audit logs be retained?

Article 12 of the EU AI Act does not set a retention period, but Article 19 requires providers of high-risk AI systems to keep automatically generated logs for at least six months where the logs are under their control, and Article 18 requires technical documentation to be kept for at least 10 years after the system is placed on the market. For audit logs specifically, legal guidance generally recommends at least 3 years to cover typical statute-of-limitations periods for regulatory investigations. SupraWall supports configurable retention policies with automatic archival to cold storage.


Start Logging Now.

Get tamper-proof, Article-12-ready audit logs for every agent action in your fleet. No code changes to your agent logic — just wrap and deploy.