Why ISO 42001 Is On Your Roadmap Whether You Want It Or Not
ISO/IEC 42001:2023 is the first international, certifiable management system standard for artificial intelligence. If you've ever been through ISO 27001, you already know the shape: a Plan-Do-Check-Act cycle wrapped around a set of controls, a statement of applicability, an internal audit, and eventually an external certification audit. ISO 42001 brings that same machinery to AI systems — covering how you build them, govern them, monitor them, and retire them.
Through 2025 it was an interesting thing to read about. By early 2026 it is becoming a buyer requirement. Enterprise customers — especially in regulated industries — are starting to add "ISO 42001 certified or on a path to certification" to their RFPs. The EU AI Act points the same way: harmonized standards will carry a presumption of conformity for several of its obligations, and ISO 42001 is the obvious foundation for the AI-management-system piece of that work. If you're a startup building AI features or selling to enterprises, this standard is no longer optional to understand.
This post is the engineering side of an ISO 42001 implementation. We leave the paperwork to your compliance team where we can, and focus on what you, the people who run the systems, actually have to build.
What The Standard Actually Requires
ISO 42001 defines an AI Management System (AIMS). At a high level:
- Clauses 4-10 describe the management system itself — context, leadership, planning, support, operation, performance evaluation, improvement. This is the same harmonized "ISO-shaped" skeleton you already know from 27001.
- Annex A lists 38 controls covering AI-specific concerns: policies for AI, internal organization, resources (data, tools, people), impact assessments, AI system lifecycle, data for AI, information for interested parties, use of AI systems, and third-party relationships.
- Annex B is implementation guidance for those controls.
- Annex C maps AI objectives and risk sources.
- Annex D covers sector-specific considerations.
The certification auditor will want to see evidence that each applicable Annex A control is implemented, maintained, and effective. Your job as engineers is to generate that evidence automatically wherever possible — because the alternative is PDFs.
How It Maps To ISO 27001
If you already run a 27001 program, roughly 40-50% of the machinery carries over. We've done the overlap analysis for several clients and here is the rough picture:
| ISO 27001 control area | ISO 42001 equivalent | Reuse % |
|---|---|---|
| Access control | A.4 Resources (access to AI systems/data) | ~80% |
| Cryptography | Applies to model weights, training data | ~90% |
| Supplier relationships | A.10 Third-party relationships | ~70% |
| Incident management | A.6 AI system life cycle (operation, incidents) | ~60% |
| Change management | A.6 AI system life cycle (change management) | ~40% |
| Risk assessment | A.5 Assessing impacts of AI systems | ~30% |
The areas with the lowest reuse are the ones unique to AI: impact assessments on people and groups, bias and fairness considerations, model lifecycle management, and explainability. Plan to invest real engineering time there.
Step 1: Scoping The AIMS
The first question the auditor will ask is: what AI systems are in scope? The answer can't be "everything" or "nothing." It has to be a defined, defensible set of systems with boundaries you can document.
We recommend an inventory approach. Build a model registry — yes, an actual piece of software, not a spreadsheet — and treat it as the source of truth. Every AI system in scope has a record. Here is a minimal schema:
```yaml
# models/claude-rag-assistant.yaml
id: claude-rag-assistant
name: Customer Support RAG Assistant
owner: team-ai-platform
contact: "[email protected]"
deployment:
  environment: production
  regions: [eu-west-1]
  endpoint: https://api.example.com/v1/assistant
classification:
  purpose: customer-support
  data_sensitivity: pii
  risk_tier: high
  eu_ai_act_tier: limited
  affected_parties: [customers, support-agents]
model:
  type: llm-rag
  base_model: claude-sonnet-4.6
  provider: anthropic
  version_pinned: true
  weights_location: vendor-hosted
data:
  training_sources: none  # no fine-tuning, so no training data of our own
  rag_sources:
    - name: support-kb
      location: s3://example-kb/support/
      pii: false
    - name: ticket-history
      location: postgres://prod-rds/tickets
      pii: true
      retention_days: 365
governance:
  impact_assessment_id: IA-2026-014
  last_reviewed: "2026-01-15"
  approved_by: [cto, dpo]
lifecycle:
  stage: production
  decommission_date: null
```
Every model in the registry triggers a downstream workflow: an impact assessment if one doesn't exist, a data-governance check, logging configuration, monitoring dashboards, and an audit trail. A human signs off before the record can move to stage: production.
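The orchestration doesn't need to be fancy. Here is a minimal sketch of the idea in Python; the models/ and compliance/ paths and the specific task list are illustrative, not prescribed by the standard:

```python
# check_registry_record.py: sketch of the "downstream workflow" idea.
# Assumes the registry schema shown above; paths and field names are our own convention.
import sys
from pathlib import Path

import yaml  # pip install pyyaml

ASSESSMENT_DIR = Path("compliance/impact-assessments")


def outstanding_tasks(record: dict) -> list[str]:
    """Return the governance steps still missing for one registry record."""
    tasks = []
    gov = record.get("governance", {})

    ia_id = gov.get("impact_assessment_id")
    if not ia_id:
        tasks.append("write an impact assessment")
    elif not (ASSESSMENT_DIR / f"{ia_id}.md").exists():
        tasks.append(f"impact assessment {ia_id} referenced but not found in compliance repo")

    high_risk = record.get("classification", {}).get("risk_tier") == "high"
    if high_risk and "dpo" not in gov.get("approved_by", []):
        tasks.append("obtain DPO approval")

    if not gov.get("last_reviewed"):
        tasks.append("schedule a governance review")

    return tasks


if __name__ == "__main__":
    failed = False
    for path in Path("models").glob("*.yaml"):
        record = yaml.safe_load(path.read_text())
        for task in outstanding_tasks(record):
            failed = True
            print(f"{record['id']}: {task}")
    sys.exit(1 if failed else 0)
```

Run it in CI or a nightly job and wire the output into tickets; the point is that the registry record, not a spreadsheet, drives the follow-up work.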
Step 2: Risk And Impact Assessment
ISO 42001 takes risk assessment further than 27001. You assess risks not only to the organization but to individuals and society affected by the AI system. Annex B.5 gives you the framework: identify the AI system, describe its context, identify impacts (positive and negative), assess severity and likelihood, and document the decision to proceed, mitigate, or decline.
A lightweight impact assessment template:
```markdown
# Impact Assessment: <System Name>

## 1. System context
- Purpose and intended users
- Operating environment
- Data flows

## 2. Affected parties
- Direct users
- Third parties
- Vulnerable populations

## 3. Potential impacts
- Privacy
- Fairness / discrimination
- Safety
- Financial
- Autonomy and human oversight

## 4. Likelihood and severity
- Matrix scored 1-5 each

## 5. Mitigations
- Technical
- Procedural

## 6. Residual risk and decision
- Accept / mitigate / decline
- Approver and date
```
Store these as markdown in a compliance repo that is read-only to everyone except a small group, version-controlled, and referenced by model ID from the registry.
Step 3: Controls As Code
The 38 Annex A controls are the thing an auditor will trace. You can meet most of them with paperwork, but the faster path is to express as many as possible as machine-checkable policy. We use OPA with Rego for this. It runs in CI, in admission controllers, and as a standalone check against the model registry.
A policy that blocks any model from going to production without an impact assessment and DPO sign-off:
```rego
package ai.model_registry

import rego.v1

default allow := false

# PII-handling models cannot reach production without an impact assessment.
deny contains msg if {
	input.deployment.environment == "production"
	input.classification.data_sensitivity == "pii"
	not input.governance.impact_assessment_id
	msg := sprintf("model %q handles PII but has no impact assessment", [input.id])
}

# High-risk models need explicit DPO sign-off.
deny contains msg if {
	input.deployment.environment == "production"
	input.classification.risk_tier == "high"
	not "dpo" in input.governance.approved_by
	msg := sprintf("high-risk model %q missing DPO approval", [input.id])
}

# Every production record must carry a review date.
deny contains msg if {
	input.lifecycle.stage == "production"
	not input.governance.last_reviewed
	msg := sprintf("model %q has no review date", [input.id])
}

allow if {
	count(deny) == 0
}
```
Running in CI against every PR to the models/ directory:
```yaml
- name: Validate model registry
  run: |
    set -euo pipefail
    for file in models/*.yaml; do
      # --fail-defined: a non-empty deny set makes opa exit non-zero and fail the job.
      yq -o=json "$file" | \
        opa eval -I -d policies/ --format=pretty --fail-defined \
          "data.ai.model_registry.deny[x]"
    done
```
Every merge that changes a model record produces an audit entry. The auditor gets a Git log.
Step 4: Data Governance For Training And RAG
Annex A control A.7 covers data for AI: quality, provenance, preparation, and ongoing management. For an LLM-based application this usually means:
- Training data, if you fine-tune — source, license, consent, PII scrubbing.
- RAG corpus — what documents are indexed, how they are updated, how they are removed on user request.
- Prompt and response logs — what you capture, how long you keep it, who can read it.
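The removal-on-request point is the one teams most often have no concrete path for, so it is worth sketching early. A minimal example, assuming boto3 for the corpus bucket and a hypothetical VectorIndex wrapper around whatever vector store you run; the bucket name and key layout are placeholders:

```python
# delete_document.py: sketch of propagating a deletion request through the RAG stack.
# Bucket name, key layout, and the VectorIndex wrapper are assumptions, not a prescribed design.
import boto3


class VectorIndex:
    """Placeholder for your vector store client (pgvector, OpenSearch, etc.)."""

    def delete_by_document_id(self, document_id: str) -> int:
        raise NotImplementedError


def delete_document(document_id: str, *, index: VectorIndex,
                    bucket: str = "example-ai-corpus-prod") -> None:
    """Remove a source document and its embeddings when a user requests deletion."""
    s3 = boto3.client("s3")

    # 1. Remove the source object. With versioning enabled this writes a delete marker;
    #    purging locked versions before retention expires also needs s3:BypassGovernanceRetention.
    s3.delete_object(Bucket=bucket, Key=f"support/{document_id}")

    # 2. Remove the derived embeddings so the document stops appearing in retrieval.
    removed = index.delete_by_document_id(document_id)

    # 3. Leave an audit trail for the deletion itself.
    print(f"deleted {document_id}: source object removed, {removed} vectors purged")
```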
Lock down the S3 bucket that holds the RAG corpus with object lock, versioning, and access logging. Terraform:
resource "aws_s3_bucket" "ai_corpus" {
bucket = "example-ai-corpus-prod"
}
resource "aws_s3_bucket_versioning" "corpus" {
bucket = aws_s3_bucket.ai_corpus.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_object_lock_configuration" "corpus" {
bucket = aws_s3_bucket.ai_corpus.id
rule {
default_retention {
mode = "GOVERNANCE"
days = 365
}
}
}
resource "aws_s3_bucket_logging" "corpus" {
bucket = aws_s3_bucket.ai_corpus.id
target_bucket = aws_s3_bucket.audit_logs.id
target_prefix = "s3-access/ai-corpus/"
}
resource "aws_cloudtrail" "ai_data_events" {
name = "ai-data-events"
s3_bucket_name = aws_s3_bucket.audit_logs.id
include_global_service_events = false
event_selector {
read_write_type = "All"
include_management_events = true
data_resource {
type = "AWS::S3::Object"
values = ["${aws_s3_bucket.ai_corpus.arn}/"]
}
}
}
That CloudTrail setup gives you per-object access logs. Feed them into your SIEM and you have a defensible evidence trail for the auditor.
Step 5: Model Lifecycle And Change Management
Every material change to a model in production triggers an evaluation. Upgrading the base model from Claude Sonnet 4.5 to Sonnet 4.6 is a change. Swapping the embedding model is a change. Adding a new tool to an agent is a change. Each of these must go through:
- A change record (PR to the model registry).
- An updated impact assessment if the purpose, users, or data sensitivity changed.
- A regression eval against a saved benchmark.
- A canary deployment with monitoring.
- Approval and rollout.
The eval step is the one most teams skip. Build it now. A minimal Python harness using LangSmith, Promptfoo, or your own scripts is fine. The goal is: "does the new model still produce acceptable outputs on our golden dataset?"
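A sketch of what "minimal" can look like, assuming a golden.jsonl file of prompts with expected keywords and a call_model placeholder for whatever client you use; the grading rule is deliberately crude and is ours, not a standard:

```python
# regression_eval.py: minimal golden-dataset regression check, run before a model change ships.
# golden.jsonl format and the keyword grader are assumptions; swap in your own grader.
import json
import sys
from pathlib import Path

PASS_THRESHOLD = 0.95  # fraction of golden cases that must still pass


def call_model(prompt: str) -> str:
    """Placeholder: call the candidate model/config under test."""
    raise NotImplementedError


def grade(output: str, expected_keywords: list[str]) -> bool:
    """Crude grader: every expected keyword must appear in the output."""
    return all(k.lower() in output.lower() for k in expected_keywords)


def main() -> int:
    lines = Path("golden.jsonl").read_text().splitlines()
    cases = [json.loads(line) for line in lines if line.strip()]
    passed = 0
    for case in cases:
        output = call_model(case["prompt"])
        if grade(output, case["expected_keywords"]):
            passed += 1
        else:
            print(f"FAIL: {case['id']}")
    rate = passed / len(cases)
    print(f"{passed}/{len(cases)} passed ({rate:.0%})")
    return 0 if rate >= PASS_THRESHOLD else 1


if __name__ == "__main__":
    sys.exit(main())
```

Keep the golden dataset in the compliance repo next to the impact assessments; the CI run of this script then doubles as audit evidence for the change-management control.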
Step 6: Performance Monitoring And Incident Response
The life-cycle controls in A.6 cover operation, monitoring, and event logging, and A.8 requires you to communicate incidents to interested parties. In practice you need:
- Quality metrics — task completion, hallucination rate, user feedback signals.
- Safety metrics — refusal rate, policy violations, jailbreak detection.
- Performance metrics — latency, token usage, cost.
- Drift detection — input distribution shifts over time.
Wire all of these into the same observability stack you use for regular services. OpenTelemetry spans for each LLM call, tagged with the model ID, user ID (hashed), input and output token counts, and a quality signal.
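A sketch of the span shape using the OpenTelemetry Python API; the ai.* attribute names are a local convention rather than an official semantic standard, and call_llm stands in for your client:

```python
# llm_telemetry.py: wrap each LLM call in an OpenTelemetry span tagged for the AIMS.
# Attribute names are a local convention; call_llm is a placeholder for your client.
import hashlib

from opentelemetry import trace

tracer = trace.get_tracer("ai-platform")


def call_llm(prompt: str) -> dict:
    """Placeholder: returns e.g. {'text': ..., 'input_tokens': ..., 'output_tokens': ...}."""
    raise NotImplementedError


def assistant_reply(prompt: str, *, model_id: str, user_id: str) -> str:
    with tracer.start_as_current_span("llm.call") as span:
        # Registry ID ties the span back to the model record and its impact assessment.
        span.set_attribute("ai.model_id", model_id)
        # Hash the user ID so traces stay useful without holding raw identifiers.
        span.set_attribute("ai.user_id_hash", hashlib.sha256(user_id.encode()).hexdigest())

        result = call_llm(prompt)

        span.set_attribute("ai.tokens.input", result["input_tokens"])
        span.set_attribute("ai.tokens.output", result["output_tokens"])
        # Quality signal: a heuristic score here, or user feedback attached later.
        span.set_attribute("ai.quality.flagged", False)
        return result["text"]
```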
Incident response is the other half. When a model misbehaves — produces a harmful output, leaks a prompt, violates a policy — you need a runbook that triggers within minutes. It should look like any other incident runbook: on-call engineer paged, initial triage, containment (disable the model or route to a fallback), investigation, postmortem. The only AI-specific piece is that the postmortem may require a dataset review and a re-evaluation before the model goes back online.
Step 7: Internal Audit
ISO requires an internal audit before certification. Don't treat it as a formality. Pick an engineer who wasn't on the implementation team, give them the Statement of Applicability, and have them trace each applicable control to the evidence. Wherever they can't find automated evidence, you have a gap. Fix it, then book the external audit.
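One way to make that trace cheap is to keep the Statement of Applicability machine-readable and generate the gap report from it. A sketch, assuming a compliance/soa.yaml that lists each control with its evidence links; the file layout is our own convention, not part of the standard:

```python
# evidence_index.py: sketch that turns a machine-readable SoA into an auditor-facing gap report.
# soa.yaml layout is an assumption: controls: [{control, applicable, evidence: [urls]}].
from pathlib import Path

import yaml  # pip install pyyaml


def main() -> None:
    soa = yaml.safe_load(Path("compliance/soa.yaml").read_text())
    gaps = []
    for entry in soa["controls"]:
        if not entry.get("applicable", True):
            continue
        evidence = entry.get("evidence", [])
        status = "OK" if evidence else "GAP"
        if not evidence:
            gaps.append(entry["control"])
        print(f"{entry['control']:<8} {status:<4} {len(evidence)} evidence link(s)")
    if gaps:
        print(f"\n{len(gaps)} applicable control(s) have no evidence: {', '.join(gaps)}")


if __name__ == "__main__":
    main()
```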
Gap Analysis Checklist
A short checklist we use in client engagements to triage where you are:
- Inventory of in-scope AI systems exists and is machine-readable
- Each system has an owner and a contact
- Each system has an impact assessment on file
- Training data provenance is documented
- RAG corpus has access logs and object versioning
- Model changes go through PR review with OPA checks
- Prompt/response logs have defined retention and access control
- Quality and safety metrics are monitored in production
- Incident runbook for AI-specific issues exists and has been tested
- Supplier due diligence on model vendors is documented
- DPO and security teams have sign-off on high-risk systems
- Internal audit has been completed at least once
If fewer than 7 of those are true today, you have 4-6 months of work ahead of certification readiness.
Next Steps
ISO 42001 is significantly more tractable than it looks on first read, especially for teams that already have ISO 27001 or SOC 2. The trick is to treat the controls as engineering problems, not documentation problems — build a registry, wire up policy-as-code, automate your evidence, and the paperwork takes care of itself. If you want a hand standing up the machinery or running a gap assessment, get in touch.