# Why August 2026 Is A Real Deadline
The EU AI Act entered into force on 1 August 2024. The bans on prohibited practices applied from 2 February 2025. Obligations for general-purpose AI models started on 2 August 2025. And the full set of high-risk system requirements becomes enforceable on 2 August 2026. That is the date most product teams should be watching, because it is the one that imposes the biggest engineering burden.
This post is not legal advice. We have lawyers for that; you should too. What this post covers is what the regulation asks of software systems in language an engineer can act on, and how to implement the common obligations without turning your codebase into a compliance wasteland.
## The Four Risk Tiers
The Act classifies AI systems into four tiers, each with different obligations.
| Tier | Definition | What you must do |
|---|---|---|
| Unacceptable | Social scoring, real-time biometric ID in public, manipulative systems | Prohibited — don't build it |
| High-risk | Listed use cases in Annex III (employment, credit, education, critical infra, law enforcement, etc.) | Full set of requirements — documentation, oversight, logging, monitoring |
| Limited risk | Chatbots, generative AI, emotion recognition | Transparency obligations |
| Minimal risk | Everything else | Voluntary codes of conduct |
Most startup teams will fall into one of two categories: limited-risk for a customer-facing LLM feature, or high-risk if you touch HR, credit, education, or other Annex III domains. If you are in any doubt, err toward high-risk.
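Classification is worth encoding in code rather than leaving in a spreadsheet, so the CI gate described later can consume it. A minimal sketch (the tag sets below are illustrative shorthand, not a legal mapping of Annex III or the prohibited-practices list; the real mapping comes from legal review):

```python
from enum import Enum


class RiskTier(str, Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Illustrative tag sets only -- not a substitute for legal review.
PROHIBITED = {"social-scoring", "realtime-public-biometric-id"}
ANNEX_III_DOMAINS = {
    "employment", "credit", "education",
    "critical-infrastructure", "law-enforcement",
}
LIMITED_RISK_FEATURES = {"chatbot", "generative", "emotion-recognition"}


def classify(tags: set[str]) -> RiskTier:
    """Map a system's self-declared tags to a provisional risk tier.

    Most-restrictive tier wins: a chatbot used for hiring is high-risk,
    not limited-risk.
    """
    if tags & PROHIBITED:
        return RiskTier.UNACCEPTABLE
    if tags & ANNEX_III_DOMAINS:
        return RiskTier.HIGH
    if tags & LIMITED_RISK_FEATURES:
        return RiskTier.LIMITED
    return RiskTier.MINIMAL
```

Note the ordering: prohibited and Annex III checks run before the limited-risk check, which is exactly the "err toward high-risk" rule above.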
## High-Risk: What It Actually Demands
A high-risk AI system must, under Articles 9–15, implement at minimum:
- A risk management system (ongoing, documented, updated as the system evolves).
- Data and data governance practices — quality, representativeness, bias analysis.
- Technical documentation sufficient to allow an auditor to assess conformity.
- Record-keeping (automated logs throughout the lifecycle).
- Transparency and information to deployers.
- Human oversight — the system must be designable for meaningful human supervision.
- Accuracy, robustness, cybersecurity.
- A quality management system.
- Post-market monitoring.
- Incident reporting for serious incidents.
Let's walk through the ones that show up in code.
## Record-Keeping: Prompt And Response Logging
Article 12 mandates automatic logging throughout the operation of a high-risk AI system. For an LLM application this means every prompt, every response, every tool call, every decision. The logs must allow tracing the system's behavior back to inputs.
A minimal logging middleware for a FastAPI app:
```python
from __future__ import annotations

import hashlib
import time
import uuid
from datetime import datetime, timezone

import structlog
from fastapi import FastAPI, Request
from pydantic import BaseModel, Field

logger = structlog.get_logger()
app = FastAPI()


class LLMInteraction(BaseModel):
    interaction_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    timestamp_utc: str = Field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    system_id: str
    system_version: str
    model: str
    model_version: str
    user_id_hash: str
    input_hash: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    safety_flags: list[str] = []
    human_review_requested: bool = False


def hash_user_id(raw: str) -> str:
    salt = b"eu-ai-act-log-salt-v1"
    return hashlib.sha256(salt + raw.encode()).hexdigest()[:16]


@app.middleware("http")
async def log_llm_interaction(request: Request, call_next):
    start = time.monotonic()
    response = await call_next(request)
    latency = int((time.monotonic() - start) * 1000)
    if request.url.path.startswith("/v1/generate"):
        # The route handler is expected to have populated request.state
        # with the input hash and token counts before we get here.
        interaction = LLMInteraction(
            system_id="hr-screening",
            system_version=request.app.state.version,
            model="claude-sonnet-4.6",
            model_version="20260115",
            user_id_hash=hash_user_id(request.headers.get("x-user-id", "anon")),
            input_hash=request.state.input_hash,
            input_tokens=request.state.input_tokens,
            output_tokens=request.state.output_tokens,
            latency_ms=latency,
            safety_flags=getattr(request.state, "safety_flags", []),
        )
        logger.info("llm.interaction", **interaction.model_dump())
    return response
```
Two notes. First, we hash user identifiers with a salt — you are not required to pseudonymize in the logs themselves, but you almost always want to for GDPR reasons. Second, we log token counts rather than raw content. The Act requires record-keeping but not necessarily full content retention; you can store hashes or summaries if you have a policy that says so. Pick one and document it.
## Retention
There is no single retention number in the Act. Article 19 requires providers to keep logs "for a period appropriate to the intended purpose of the high-risk AI system, of at least six months", unless EU or national law (notably data protection law) provides otherwise. We default clients to 12 months in hot storage, 5 years in cold storage, with documented deletion on user request for personal data components.
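Whatever numbers you pick, encode them once so lifecycle jobs and deletion handlers agree. A sketch using the defaults above (the day counts are our policy choice, not the Act's):

```python
from datetime import date, timedelta

# Policy defaults described above; the Act itself only sets a
# six-month floor for high-risk system logs.
HOT_DAYS = 365
COLD_DAYS = 5 * 365


def retention_schedule(logged_on: date) -> dict[str, date]:
    """When a log record moves to cold storage and when it is deleted."""
    return {
        "move_to_cold": logged_on + timedelta(days=HOT_DAYS),
        "delete": logged_on + timedelta(days=COLD_DAYS),
    }
```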
## Technical Documentation: The Model Card
Annex IV lists what the technical documentation must cover. For a model-backed system the most efficient format is a model card kept in version control next to the code.
A Pydantic model registry entry that doubles as a model card:
```python
from __future__ import annotations

from datetime import date
from typing import Literal

from pydantic import BaseModel, Field, HttpUrl


class IntendedPurpose(BaseModel):
    description: str
    deployers: list[str]
    geographic_scope: list[str]
    not_intended_for: list[str] = []


class DataGovernance(BaseModel):
    training_data_sources: list[str]
    training_data_licensing: list[str]
    bias_evaluation: str
    bias_metrics: dict[str, float]
    pii_handling: str


class HumanOversight(BaseModel):
    oversight_type: Literal[
        "human-in-the-loop", "human-on-the-loop", "human-in-command"
    ]
    override_mechanism: str
    escalation_path: str


class PostMarketMonitoring(BaseModel):
    metrics: list[str]
    alerting: str
    review_cadence_days: int


class ModelCard(BaseModel):
    system_id: str
    name: str
    provider: str
    version: str
    risk_tier: Literal["unacceptable", "high", "limited", "minimal"]
    annex_iii_category: str | None = None
    intended_purpose: IntendedPurpose
    data_governance: DataGovernance
    human_oversight: HumanOversight
    monitoring: PostMarketMonitoring
    known_limitations: list[str]
    accuracy_estimate: float = Field(ge=0, le=1)
    robustness_notes: str
    last_reviewed: date
    reviewed_by: list[str]
    documentation_url: HttpUrl
```
Store one of these per system in a `model-cards/` directory. CI validates that every system referenced by production code has a card, and that the card has been reviewed in the last 180 days.
## Human Oversight: Designing For It
Article 14 requires that high-risk systems can be effectively overseen by natural persons. For developers, this usually reduces to three questions:
- Can the human see what the system is doing and why?
- Can the human intervene?
- Can the human override the output before it has an effect?
Concretely: if your system auto-rejects job applications, it cannot be a fire-and-forget pipeline. A human must see the decision, see the reasoning, and be able to override it before the candidate is notified.
A simple pattern: the AI produces a recommendation with a confidence and a reason; the workflow queues it for human review; only after explicit approval is the action taken.
```python
from typing import Literal

from pydantic import BaseModel


class Recommendation(BaseModel):
    candidate_id: str
    decision: Literal["proceed", "reject", "uncertain"]
    confidence: float
    reasoning: str
    model_version: str
    generated_at: str


class ReviewOutcome(BaseModel):
    recommendation: Recommendation
    reviewer_id: str
    reviewer_decision: Literal["approved", "overridden", "escalated"]
    reviewer_notes: str
    reviewed_at: str
```
The auditor will expect to see examples of overridden outcomes. If 100% of recommendations are approved as-is, either your model is perfect or your oversight is theater.
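That sanity check can be automated. A sketch that computes the override rate from `ReviewOutcome` records (the 2% floor and 100-review minimum are illustrative thresholds, not regulatory numbers):

```python
from collections import Counter


def oversight_health(
    outcomes: list[str], min_override_rate: float = 0.02
) -> dict:
    """Flag review queues where humans never disagree with the model.

    `outcomes` are reviewer_decision values ("approved", "overridden",
    "escalated"). The 2% floor is an illustrative starting point.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    # Both overrides and escalations count as genuine human intervention.
    overridden = counts["overridden"] + counts["escalated"]
    rate = overridden / total if total else 0.0
    return {
        "total_reviews": total,
        "override_rate": rate,
        # Only flag once there is enough volume to be meaningful.
        "suspicious": total >= 100 and rate < min_override_rate,
    }
```

Wire the `suspicious` flag into the same alerting channel as your post-market monitoring, and review it on the same cadence.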
## Transparency: What The User Sees
Article 50 (numbered Article 52 in earlier drafts) covers transparency obligations for limited-risk systems. The key ones:
- Users must be informed they are interacting with an AI system (unless obvious).
- AI-generated or manipulated content (deepfakes, synthetic images) must be labeled.
- People exposed to emotion recognition or biometric categorization systems must be informed.
- Providers of generative AI systems must mark synthetic outputs as artificially generated in a machine-readable format, where technically feasible.
In practice, this is a small UI change. A persistent badge near the chat interface: "You're talking to an AI assistant. Responses may be inaccurate." For generated images: a metadata tag and a visible indicator.
## Deployment Gating In CI
The cleanest way to enforce this in day-to-day engineering is a CI check that refuses to promote code to production if the associated model card is missing, stale, or has unresolved gaps.
```yaml
name: AI Compliance Gate

on:
  pull_request:
    paths:
      - "apps/**"
      - "model-cards/**"

jobs:
  gate:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pydantic==2.9 pyyaml==6.0
      - name: Validate model cards
        run: python scripts/validate_model_cards.py
      - name: Check referenced systems have cards
        run: python scripts/check_coverage.py
      - name: Enforce staleness
        run: python scripts/check_staleness.py --max-days 180
```
And the validation script is a few lines:
```python
from __future__ import annotations

import pathlib
import sys
from datetime import date, timedelta

import yaml

from compliance.model_card import ModelCard

root = pathlib.Path("model-cards")
max_age = timedelta(days=180)
today = date.today()
errors: list[str] = []

for path in root.glob("*.yaml"):
    raw = yaml.safe_load(path.read_text())
    try:
        card = ModelCard.model_validate(raw)
    except Exception as exc:
        errors.append(f"{path}: {exc}")
        continue
    if today - card.last_reviewed > max_age:
        errors.append(f"{path}: stale, last reviewed {card.last_reviewed}")
    if card.risk_tier == "high" and not card.annex_iii_category:
        errors.append(f"{path}: high-risk card missing Annex III category")

if errors:
    print("Compliance gate failed:")
    for e in errors:
        print(f"  - {e}")
    sys.exit(1)

print(f"All {len(list(root.glob('*.yaml')))} model cards valid.")
```
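`check_coverage.py` can be similarly small. A sketch, assuming the convention that production code declares its system with a `SYSTEM_ID = "..."` literal and that cards are named `<system_id>.yaml` (both conventions are ours, not the Act's):

```python
import pathlib
import re
import sys

CARD_DIR = pathlib.Path("model-cards")
CODE_DIR = pathlib.Path("apps")

# Assumed convention: each production service declares the system it
# belongs to with a literal like SYSTEM_ID = "hr-screening".
PATTERN = re.compile(r'SYSTEM_ID\s*=\s*"([a-z0-9-]+)"')


def referenced_systems(code_dir: pathlib.Path) -> set[str]:
    """Every system_id mentioned anywhere in the code tree."""
    ids: set[str] = set()
    for path in code_dir.rglob("*.py"):
        ids |= set(PATTERN.findall(path.read_text()))
    return ids


def carded_systems(card_dir: pathlib.Path) -> set[str]:
    """Every system that has a model card on disk."""
    return {p.stem for p in card_dir.glob("*.yaml")}


if __name__ == "__main__":
    missing = referenced_systems(CODE_DIR) - carded_systems(CARD_DIR)
    if missing:
        print(f"Systems without model cards: {sorted(missing)}")
        sys.exit(1)
```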
## Post-Market Monitoring
Article 72 requires a post-market monitoring plan — not just metrics, but a documented process for how you collect, analyze, and act on real-world performance data. Your existing observability stack is 80% of the answer. The other 20% is a scheduled review where a named person looks at the data, writes down what they saw, and files it. Quarterly is the floor; monthly is better for newer systems.
Serious incidents — where the system causes harm or a near-miss — must be reported to the national authority within 15 days of the provider becoming aware (shorter windows apply for deaths and widespread infringements) under Article 73. Your incident runbook needs a branch for this: "is this reportable under the EU AI Act?" If yes, the compliance team gets paged alongside engineering.
## A Pragmatic Readiness Checklist
- All AI systems classified into risk tiers
- High-risk systems have model cards covering Annex IV
- Prompt/response logging in place with defined retention
- Human oversight designed into every high-risk workflow
- Transparency disclosures in UI for limited-risk systems
- CI gate preventing deployment of uncarded systems
- Incident runbook includes AI Act reporting branch
- Post-market monitoring plan documented and scheduled
## Next Steps
The August 2026 deadline is close enough that product teams should be treating this as a this-quarter problem, not a this-year problem. The implementation pattern is not exotic — it's a registry, some middleware, a CI gate, and a review cadence — but it is boring enough that it will not get done unless it's scheduled. Book the work now. If you want help mapping your systems to the Act and building the gating pipeline, get in touch.