The Governance Advantage: Trustworthy AI at Planet Scale

Introduction: a new fiduciary dilemma

Artificial intelligence has become an operating system for the modern economy. Large language models write emails and code, recommender systems steer our consumption, and pattern‑recognition engines decide who gets a loan or a job. The boardroom challenge is not whether to use AI but how to govern it. Directors must juggle three forces: the imperative to reduce cost and improve performance; the duty to protect people and comply with law; and the expectation that technology reflect societal values. Regulatory regimes have sharpened these duties. The European Union’s AI Act requires providers of high‑risk systems to perform risk assessments, use high‑quality datasets, maintain logs and documentation, ensure human oversight and guarantee robustness and cybersecurity[1]. It also mandates post‑market monitoring—providers must collect and analyse real‑world performance data, report serious incidents and register high‑risk systems[2][3]. Global principles align with these obligations: the OECD AI Principles (adopted by 47 jurisdictions) call for inclusive growth, human rights, transparency, robustness and accountability[4][5], and the 2024 update emphasises safety, privacy and interoperability[6][7].

Meanwhile, AI’s economics are shifting dramatically. According to the Stanford AI Index 2025, the cost to generate text at GPT‑3.5 accuracy plummeted from $20 per million tokens in November 2022 to $0.07 by October 2024—a 280‑fold reduction[8][9]. Hardware price–performance improves roughly 30 % per year and energy efficiency about 40 % per year[8][10], while smaller models achieve frontier‑grade accuracy (Phi‑3‑mini matches PaLM‑level benchmark performance with roughly 1/142 of the parameters[11]). The result is that organisations can deploy powerful models cheaply—but with great risk if oversight lags behind. AI incidents are increasing, and standardised evaluations remain rare[12][13]. Boards cannot assume that lower costs justify cutting corners on safety.

This paper presents a board‑grade framework for governing AI at hyperscale. It translates normative principles, regulatory requirements and technical best practices into measurable Service Level Objectives (SLOs) and provides case lessons from hyperscalers. The goal is to make trust auditable and actionable, enabling organisations to innovate responsibly at planet scale.

From principles to practice: building an AI governance stack

Aligning with global norms

The governance stack begins with aligning the organisation’s AI charter to widely accepted values. The OECD AI Principles—inclusive growth, human rights, transparency, robustness and accountability—are now a global baseline[4][5]. The World Economic Forum’s Digital Trust framework stresses ethical use, user rights and system resilience. UNESCO’s recommendations add cultural diversity and environmental sustainability. By embedding these norms in corporate policy, boards communicate that AI will be used to enhance society rather than merely to cut costs.

Operationalising risk management

Norms alone do not mitigate risk. Boards need structured frameworks. The NIST AI Risk Management Framework (AI‑RMF) introduces four functions—map, measure, manage and govern. It emphasises quantitative and qualitative assessment of risks before deployment and regular evaluation afterwards[14]. Measurements must be objective, repeatable and scientifically sound[15]. The UK’s Department for Science, Innovation & Technology (DSIT) crystallises assurance into a measure–evaluate–communicate pattern: organisations must gather performance and impact data, benchmark risks against standards, and share results through dashboards or certification[16]. DSIT cautions that no single technique suffices; effective assurance combines risk assessments, bias audits, compliance audits, formal verification and conformity assessments[17][18].

Institutionalising governance via management systems

Governance frameworks should be integrated into organisational processes. ISO/IEC 42001, released in late 2023, is the first certifiable AI management system. It applies to any organisation using AI and aligns with information‑security (ISO 27001) and privacy (ISO 27701) standards[19]. ISO 42001 requires leadership commitment, identification of stakeholder needs (Clause 4), measurable AI objectives and risk assessments (Clause 6) and continuous performance evaluation through internal audits and management reviews (Clause 9)[20][21]. Certification signals to regulators and partners that AI governance is rigorous and externally audited.

Meeting regulatory duties

Regulators are setting the floor. The EU AI Act categorises systems by risk; high‑risk applications—such as employment, education, critical infrastructure and law enforcement—must implement comprehensive risk management, maintain documentation and logs, provide human oversight and guarantee robustness and cybersecurity[1]. Providers must establish post‑market monitoring systems to collect and analyse real‑world performance[2] and report serious incidents[3]. Globally, the OECD principles and the 2024 update stress safety, privacy and interoperability[6][7], while the UK’s AI White Paper applies cross‑cutting principles (safety, transparency, fairness, accountability and contestability). Boards operating across borders should adopt the strictest applicable standard to avoid fragmentation.

Building an assurance ecosystem

Assurance is no longer optional. The UK’s AI assurance roadmap envisions a market of third‑party auditors and service providers. DSIT emphasises that assurance must draw on multiple techniques and be proportionate to risk[17][18]. Standards underpin this ecosystem: ISO 22989 (terminology), ISO 42006 (audit and certification of AI management systems), ISO TR 24027 (bias measurement), ISO 42001 (management systems) and related performance requirements[22]. Boards should engage independent auditors to conduct bias audits, fairness assessments, security evaluations and energy‑efficiency tests. Transparency frameworks such as CLeAR (Comparable, Legible, Actionable, Robust) make documentation of datasets, models and systems a mandatory, first‑class artefact, enabling reproducibility and accountability[23][24]. The Partnership on AI’s deployment guidance categorises risks by capability and release type, providing a playbook for customised governance[25].

These layers—principles, risk frameworks, management systems, regulation and assurance—form the AI governance stack. The next step is to make these layers measurable.

Designing SLOs for trustworthy AI

Financial performance has long been managed through key performance indicators (KPIs). Service Level Objectives (SLOs) perform a similar role for AI trust, translating ethical imperatives into measurable targets. Each SLO should link to a control (e.g., ISO 42001 clause), a risk management function (e.g., NIST AI‑RMF) and an assurance artefact (e.g., documentation or audit). Below is a consolidated set of SLO categories; boards should adapt thresholds to context and risk appetite.

Safety and integrity

1.        Harm rate (safety loss): number of policy‑violating outputs per million interactions. Formula: policy‑violating outputs divided by total interactions, multiplied by one million (see the sketch after this list). It ties to ISO 42001 risk assessments and the EU AI Act’s duty to minimise harm[1]. Red‑team reports and user feedback logs provide the evidence of compliance.

2.        Hallucination rate (reliability): percentage of factual claims that are incorrect. This measures reliability and should trend downward quarter‑over‑quarter. Independent fact‑checking audits provide assurance.

3.        Incident mean time to recovery (MTTR): average hours to resolve safety incidents, from detection to remediation. Rapid recovery demonstrates operational maturity and meets post‑market monitoring obligations[26].
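
To make these three measures concrete, here is a minimal computation sketch in Python, assuming interaction logs flagged by reviewers and an incident log of detection and remediation timestamps; the Interaction fields are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Interaction:
    harmful: bool          # flagged as policy-violating (e.g. by reviewers or red-team tooling)
    factual_claims: int    # number of checkable factual claims in the output
    incorrect_claims: int  # claims judged incorrect by independent fact-checkers

def harm_rate_per_million(interactions: list[Interaction]) -> float:
    """Policy-violating outputs per one million interactions."""
    harmful = sum(1 for i in interactions if i.harmful)
    return harmful / len(interactions) * 1_000_000

def hallucination_rate(interactions: list[Interaction]) -> float:
    """Percentage of factual claims judged incorrect."""
    claims = sum(i.factual_claims for i in interactions)
    wrong = sum(i.incorrect_claims for i in interactions)
    return wrong / claims * 100 if claims else 0.0

def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> float:
    """Average hours from detection to remediation across safety incidents."""
    return mean((resolved - detected).total_seconds() / 3600
                for detected, resolved in incidents)
```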

Performance and reliability

1.        p95 latency and availability: the 95th percentile of response times and the proportion of time the service is up. Boards may require p95 latency below 300 ms and availability above 99.9 % (a calculation sketch follows this list). MLPerf benchmarks inform these thresholds and provide independent verification.

2.        Task quality: model accuracy relative to state‑of‑the‑art benchmarks (e.g., MMLU, GPQA). Open‑weight models have narrowed the performance gap with closed models from 8 % to 1.7 %[8][27]; boards can benchmark accordingly.

3.        Change failure rate: proportion of model updates requiring rollback due to degraded performance or safety issues. This measure aligns with ISO 42001 Clause 8 on operational control and change management.
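
The reliability SLOs above can be computed from routine telemetry. The sketch below uses a nearest‑rank percentile and illustrative figures; both the method and the numbers are assumptions for demonstration, not prescribed values.

```python
import math

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (e.g. pct=95 for p95 latency)."""
    ranked = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    return ranked[rank - 1]

def availability(total_minutes: int, downtime_minutes: int) -> float:
    """Percentage of the reporting period during which the service was up."""
    return (total_minutes - downtime_minutes) / total_minutes * 100

def change_failure_rate(deployments: int, rollbacks: int) -> float:
    """Percentage of model updates rolled back for performance or safety regressions."""
    return rollbacks / deployments * 100

# Illustrative quarterly figures only.
latencies_ms = [120, 180, 240, 295, 150, 290, 205, 175, 260, 280]
print(f"p95 latency: {percentile(latencies_ms, 95):.0f} ms (target <= 300 ms)")
print(f"availability: {availability(131_400, 53):.3f}% (target >= 99.9%)")
print(f"change failure rate: {change_failure_rate(24, 1):.1f}% of releases")
```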

Cost and sustainability

1.        Unit inference cost: cost per 1 000 tokens or per API call. As inference costs fall, boards can set ceilings and target continuous improvement. The AI Index’s 280‑fold cost drop[8][9] provides a benchmark.

2.        Energy per request (J/req) and carbon intensity (gCO₂e/req): energy used and associated emissions per inference. MLPerf Power measures energy consumption across compute, memory and cooling[28][29]. Boards should set energy budgets and track improvements using quantisation and hardware optimisation (a worked conversion follows this list).

3.        Expected cost of harm: modelled cost of safety failures, including fines, legal liability and brand damage. Scenario analysis can estimate this cost; regulatory reporting requirements make it more salient[3].
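
The first two SLOs in this list reduce to simple arithmetic once metering is in place. The sketch below converts cluster power and request volume into J/req and gCO₂e/req; the cluster size, request rate and grid intensity are invented figures for illustration.

```python
def unit_inference_cost(total_cost_gbp: float, tokens_served: int) -> float:
    """Serving cost per 1,000 tokens."""
    return total_cost_gbp / tokens_served * 1_000

def energy_per_request_j(cluster_power_w: float, window_s: float, requests: int) -> float:
    """Average energy per inference request in joules (power x time / requests)."""
    return cluster_power_w * window_s / requests

def carbon_per_request_g(energy_j: float, grid_intensity_g_per_kwh: float) -> float:
    """Convert joules per request into grams of CO2e via the grid's carbon intensity."""
    kwh = energy_j / 3_600_000  # 1 kWh = 3.6 million joules
    return kwh * grid_intensity_g_per_kwh

# Invented example: a 20 kW serving cluster handling 1.2 million requests per hour
# on a grid emitting 200 gCO2e per kWh.
e = energy_per_request_j(20_000, 3_600, 1_200_000)          # 60 J/req
print(f"{e:.0f} J/req, {carbon_per_request_g(e, 200) * 1000:.2f} mgCO2e/req")
```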

Transparency and accountability

1.        Documentation coverage: percentage of mandated artefacts—model cards, data sheets, evaluation plans, red‑team reports, post‑market monitoring plans—completed (a coverage‑tracking sketch follows this list). CLeAR calls for documentation that is comparable, legible, actionable and robust[23][24].

2.        Third‑party assurance: maintain a current ISO 42001 certification with no major non‑conformities. Regular bias and fairness audits ensure accountability.
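
Documentation coverage is straightforward to track automatically once the mandated artefact list is fixed. The sketch below mirrors the artefacts named above; the identifiers are illustrative and would map onto the organisation’s own document registry.

```python
REQUIRED_ARTEFACTS = {
    "model_card",
    "data_sheet",
    "evaluation_plan",
    "red_team_report",
    "post_market_monitoring_plan",
}

def documentation_coverage(completed: set[str]) -> float:
    """Percentage of mandated artefacts completed for a given AI system."""
    return len(REQUIRED_ARTEFACTS & completed) / len(REQUIRED_ARTEFACTS) * 100

# A system with three of the five artefacts scores 60% and fails a 100% target.
print(documentation_coverage({"model_card", "data_sheet", "evaluation_plan"}))
```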

Privacy and rights

1.        Data retention: maximum days to retain personal data without legal justification. Align with privacy laws and human rights principles[4].

2.        Training data provenance: percentage of training data with documented licensing or consent. ISO 42001 mandates that impact assessments include data provenance[20].

Risk management and change control

1.        Red‑team coverage: proportion of identified risks and abuse cases addressed through adversarial testing. This links to NIST AI‑RMF’s emphasis on measurement and independent review[14].

2.        Risk closure SLA: maximum days to remediate identified risks. The EU AI Act’s post‑market monitoring requires prompt corrective actions[26].

By adopting these SLOs, boards make trust measurable. Thresholds should be reviewed quarterly and adjusted based on industry benchmarks, regulatory developments and the organisation’s risk appetite.
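
One practical way to support that quarterly review is to keep the SLOs in a machine‑readable register from which dashboards and breach reports are generated. The register below is a sketch: the metric names, thresholds and comparison directions are examples a board might adopt, not recommended values.

```python
# Example SLO register; thresholds must be tuned to context and risk appetite.
SLO_REGISTER = {
    "harm_rate_per_million":      {"target": 50.0,  "direction": "max"},
    "hallucination_rate_pct":     {"target": 2.0,   "direction": "max"},
    "incident_mttr_hours":        {"target": 24.0,  "direction": "max"},
    "p95_latency_ms":             {"target": 300.0, "direction": "max"},
    "availability_pct":           {"target": 99.9,  "direction": "min"},
    "documentation_coverage_pct": {"target": 100.0, "direction": "min"},
}

def quarterly_review(observed: dict[str, float]) -> list[str]:
    """Return the SLO breaches (or missing measurements) to escalate to the board."""
    breaches = []
    for name, slo in SLO_REGISTER.items():
        value = observed.get(name)
        if value is None:
            breaches.append(f"{name}: no measurement reported")
        elif slo["direction"] == "max" and value > slo["target"]:
            breaches.append(f"{name}: {value} exceeds target {slo['target']}")
        elif slo["direction"] == "min" and value < slo["target"]:
            breaches.append(f"{name}: {value} below target {slo['target']}")
    return breaches
```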

Planet‑scale economics: hidden costs and trade‑offs

Falling inference costs and rising expectations

The dramatic reduction in inference costs has two effects. First, it lowers the barrier to entry: start‑ups can deploy models previously reserved for tech giants. Second, it compresses margins, making efficiency a competitive advantage. The AI Index estimates that the cost per million tokens has fallen from $20 to $0.07[8][9], while hardware price–performance improves by 30 % and energy efficiency by 40 % annually[8][10]. Smaller models—like Phi‑3‑mini—match the accuracy of massive models at a fraction of the parameter count[11]. However, lower costs do not eliminate the need for rigorous assurance. As AI becomes ubiquitous, any failure can scale instantly to millions of users.

Energy budgets and environmental impact

AI’s environmental footprint is significant. Training GPT‑3 emitted about 588 tonnes of CO₂e, GPT‑4 roughly 5 184 tonnes, and Llama 3.1 405B about 8 930 tonnes[30]. MLPerf Power shows that scaling up accelerators reduces training time but increases energy consumption due to communication overheads; optimising for 99.9 % accuracy instead of 99 % can double energy use[31][32]. Quantisation and mixed‑precision techniques can mitigate this, but boards must set and monitor energy budgets. Measuring J/req and gCO₂e/req using MLPerf Power methodologies[28][29] integrates environmental sustainability into governance.

The Pareto frontier: balancing cost, safety and performance

Improvements along one dimension often come at the expense of another. A single large model may maximise accuracy but increase latency and energy consumption; a smaller model may reduce cost but raise hallucination rates; safety filtering may reduce harm but slow responses. SLOs enable boards to navigate this frontier: for example, by routing 80 % of traffic to small models and sending only high‑risk queries to larger models, organisations can improve safety and performance while controlling cost. Continuous post‑market monitoring ensures that trade‑offs remain transparent and adjustments are data‑driven.
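
As an illustration of such routing, the sketch below sends routine queries to a small model and escalates high‑risk ones to a larger, more heavily safeguarded model. The risk classifier and threshold are placeholders; in production the score would come from a dedicated safety model and the threshold would itself be a governed parameter reviewed against the SLOs.

```python
import random

def risk_score(query: str) -> float:
    """Placeholder risk classifier: a real system would use a lightweight
    safety model to score sensitivity and potential for harm."""
    sensitive_terms = ("medical", "legal", "payment", "self-harm")
    if any(term in query.lower() for term in sensitive_terms):
        return 0.9
    return random.random() * 0.5  # stand-in for a low-risk score distribution

def route(query: str, risk_threshold: float = 0.7) -> str:
    """Two-tier routing: cheap small model for routine traffic,
    larger model with stronger safeguards for high-risk queries."""
    return "large_model" if risk_score(query) >= risk_threshold else "small_model"

# Under this policy the bulk of everyday traffic stays on the small model,
# holding down cost and latency while reserving capacity for risky queries.
```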

The hidden cost of harm

If boards focus solely on inference costs, they risk overlooking the expected cost of harm. Bias, discrimination and misinformation can trigger fines, legal liability and reputational damage. The EU AI Act mandates reporting of serious incidents[3], and authorities may impose penalties or suspend systems. Scenario analysis can estimate the financial impact of failures; investing in red‑team testing and bias audits may reduce long‑term harm and insurance premiums. Quantifying harm costs provides a counterweight to low inference costs and justifies investment in safety.
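
A minimal version of that scenario analysis multiplies each failure scenario’s annual probability by its estimated loss and sums the results. The probabilities and loss figures below are invented for illustration; a risk team would substitute its own estimates, loss data and insurance assumptions.

```python
# Invented scenarios for illustration only.
scenarios = [
    {"name": "regulatory fine",               "annual_probability": 0.02, "loss_gbp": 10_000_000},
    {"name": "discrimination lawsuit",        "annual_probability": 0.05, "loss_gbp": 3_000_000},
    {"name": "major misinformation incident", "annual_probability": 0.10, "loss_gbp": 1_500_000},
]

expected_cost_of_harm = sum(s["annual_probability"] * s["loss_gbp"] for s in scenarios)
print(f"Expected annual cost of harm: £{expected_cost_of_harm:,.0f}")  # £500,000
# A figure of this size is weighed against the cost of red-team testing,
# bias audits and additional safety controls when setting the budget.
```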

Regulation and standards: turning rules into operations

EU AI Act: a risk‑based compliance regime

The EU AI Act is the world’s most comprehensive regulation. It classifies systems by risk level: unacceptable applications (e.g., social scoring) are prohibited; high‑risk systems—including critical infrastructure, employment, education, law enforcement and essential private services—must implement a risk management system, high‑quality datasets, logs and documentation, transparency to users, human oversight and robustness[1]. Providers must establish post‑market monitoring to collect and analyse lifecycle performance[2] and report serious incidents[3]. Article 72 requires that monitoring plans be documented and integrated with quality management[26]. Boards should ensure that high‑risk systems are registered and that incident reporting protocols are clear.

ISO/IEC 42001: a certifiable management system

The ISO/IEC 42001 standard provides a management‑system framework. Its clauses mirror other ISO standards, simplifying integration. Boards should ensure the organisation identifies stakeholder needs (Clause 4), commits leadership (Clause 5), sets measurable objectives and conducts AI impact and risk assessments (Clause 6), manages operational change (Clause 8) and conducts regular audits and management reviews (Clause 9)[20][21]. Certification evidences due diligence and can be a differentiator in procurement.

UK guidance and global alignment

The UK’s DSIT guidance encourages a pro‑innovation approach; it emphasises measuring, evaluating and communicating AI system performance and impacts[16]. It lists assurance techniques—risk assessments, impact assessments, bias audits, formal verification and conformity assessments—and stresses that assurance must be proportionate to context and risk[17][18]. The OECD principles and their 2024 update highlight safety, privacy and global interoperability[6][7]. Boards operating globally should adopt these as a compliance floor, mitigating regulatory fragmentation. Participating in international standard‑setting and industry consortia can influence emerging norms and reduce future compliance costs.

Lessons from hyperscale

Content moderation at scale

A major social platform deployed a unified language model to detect harmful content across billions of posts. Initially, a single large model delivered high accuracy but incurred high latency and cost. The board defined SLOs: harm rate ≤ 50 per million interactions, p95 latency ≤ 300 ms and unit cost ≤ £0.005 per 1 000 interactions. Engineers implemented a two‑tier routing strategy, using a small model for routine traffic and escalating ambiguous content to a larger model. Gated rollouts and shadow deployment allowed testing on a subset of users before full release. A dedicated red‑team simulated adversarial attacks and tested evasion strategies. Within two quarters, the harm rate dropped by 60 %, p95 latency improved by 30 %, and unit cost fell by 40 %. Mean time to recovery for safety incidents decreased to 20 hours due to an around‑the‑clock safety operations centre and automated rollback. Quarterly board reports included metrics and red‑team findings, enabling data‑driven adjustments.

Fraud detection copilot

A global bank introduced an AI copilot to assist fraud analysts. The board required false positives ≤ 5 % and p95 latency ≤ 250 ms. Privacy SLOs limited data retention to 30 days and mandated 100 % documented provenance for training data. The system incorporated human‑in‑the‑loop review: analysts could override AI decisions, and these overrides fed back into model retraining. The bank pursued ISO/IEC 42001 certification and engaged independent auditors for quarterly bias audits. After one year, false positives fell by 15 % relative to the previous system, and analyst productivity improved. No major non‑conformities were identified in surveillance audits. Regulators lauded the bank’s proactive risk management.

Automotive edge vision

An automotive supplier developed vision models for semi‑autonomous driving. The board defined SLOs for p95 latency and energy per frame, aligning with ISO 26262 functional safety standards. Using MLPerf Automotive benchmarks, engineers evaluated models with pruning, quantisation and knowledge distillation. Quantised models delivered equivalent accuracy with 50 % lower energy consumption[32], meeting energy budgets derived from MLPerf Power methodology. Comprehensive documentation—including model cards, data sheets and monitoring plans—facilitated regulatory approval and reassured automotive OEM customers. Post‑deployment, the company captured on‑road performance data and incident reports, updating models through over‑the‑air updates governed by change‑control SLOs.

Public‑sector genAI deployment

A government agency deployed generative AI to summarise policy documents and draft responses. Procurement followed the Ada Lovelace Institute’s guidance on buying AI responsibly, requiring suppliers to provide evidence of bias audits, privacy assessments and adherence to ISO 42001. The agency used the CLeAR documentation framework to produce model cards, data sheets, evaluation plans, red‑team reports and post‑market monitoring plans[23]. A human‑in‑the‑loop process ensured that all high‑impact communications were reviewed by civil servants. Documentation coverage reached 100 %, enabling independent auditors to verify compliance. Quarterly transparency reports enhanced public trust and demonstrated alignment with democratic values.

These cases illustrate that by defining SLOs, adopting routing and gating strategies, integrating human oversight, and engaging independent assurance, organisations can improve safety and performance while controlling cost. They also show that documentation and monitoring enable continuous improvement and regulatory compliance.

The governance advantage: a board agenda for trust

Trustworthy AI is not simply a regulatory burden; it is a competitive differentiator. Organisations that embed governance into strategy can innovate more rapidly, attract talent and capital, and mitigate costly failures. Boards should adopt a three‑lines‑of‑defence model: (1) management builds and operates AI systems, (2) risk and compliance functions set policy and monitor adherence, and (3) internal audit and external assurance validate effectiveness. ISO 42001 and DSIT guidance provide structures for this model.

The following board agenda turns this into action:

1.        Set and approve SLOs: Establish harm‑rate, hallucination‑rate, latency, availability, cost, energy, documentation and privacy targets. Review dashboards each quarter; require justification for threshold changes; link executive incentives to SLO performance.

2.        Review post‑market monitoring and incidents: Ensure that high‑risk systems have monitoring plans (EU AI Act Article 72)[26]. Require regular reports on fairness metrics, user feedback and incidents. Validate that serious incidents are reported to authorities[3] and that corrective actions are timely.

3.        Commission independent assurance: Engage accredited auditors to conduct bias and fairness audits, security assessments, energy evaluations and management‑system audits. Require ISO 42001 certification and remediate findings.

4.        Ensure transparency and stakeholder engagement: Mandate comprehensive documentation (CLeAR, model cards) and publish transparency reports where appropriate. Engage with civil society, academia and regulators to incorporate feedback, especially for high‑impact use cases.

5.        Integrate sustainability and climate considerations: Set energy and carbon budgets, report J/req and gCO₂e/req using MLPerf Power methodologies[28][29]. Invest in hardware and software optimisations (quantisation, sparsity) to meet these budgets. Align disclosures with climate frameworks.

6.        Monitor regulatory evolution: Track updates to the EU AI Act, U.S. executive orders, UK AI Bill and OECD guidelines. Participate in standard‑setting bodies and industry coalitions to shape and anticipate emerging requirements.

7.        Foster a culture of ethical innovation: Provide training for board members and executives on AI risks and opportunities. Encourage engineers and business leaders to raise concerns. Tie compensation to ethical outcomes and SLO compliance.

This agenda transforms governance from a compliance checklist into a mechanism for creating value. It aligns fiduciary duty with societal expectations, ensuring that cost and performance gains do not come at the expense of safety and rights.

Conclusion: making trust auditable and actionable

AI’s disruptive power creates immense potential and significant risk. To harness the former while mitigating the latter, boards must treat trust not as a slogan but as a measurable commitment. This paper has proposed a governance stack that begins with global principles, operationalises risk through frameworks like NIST AI‑RMF, embeds governance in management systems via ISO 42001, meets regulatory obligations such as the EU AI Act, and leverages documentation and assurance ecosystems. It has translated these into Service Level Objectives across safety, performance, cost, transparency, privacy and risk management. It has examined cost curves, energy budgets and the hidden cost of harm, and presented case lessons showing how organisations can meet these SLOs in practice. Finally, it has offered a board agenda to embed trust into fiduciary duty.

The path forward is clear: boards must insist on transparency, measurement and independent assurance. They must integrate sustainability and human rights into AI strategies and adapt to evolving regulation. By doing so, they will transform trust from an aspiration into an auditable reality, unlock innovation responsibly and lead the next era of AI adoption without hand‑waving.


[1] [2] AI Act | Shaping Europe’s digital future

https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

[3] Understanding the EU AI Act - ISACA White Paper (2024)

https://www.isaca.org/resources/white-papers/2024/understanding-the-eu-ai-act

[4] [5] AI principles | OECD

https://www.oecd.org/en/topics/sub-issues/ai-principles.html

[6] [7] Evolving with innovation: The 2024 OECD AI Principles update - OECD.AI

https://oecd.ai/en/wonk/evolving-with-innovation-the-2024-oecd-ai-principles-update

[8] [9] [10] [11] [12] [13] [27] [30] Artificial Intelligence Index Report 2025

https://hai.stanford.edu/assets/files/hai_ai_index_report_2025.pdf

[14] [15] Artificial Intelligence Risk Management Framework (AI RMF 1.0)

https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf

[16] [17] [18] [22]  Introduction to AI assurance - GOV.UK

https://www.gov.uk/government/publications/introduction-to-ai-assurance/introduction-to-ai-assurance

[19] [20] [21] ISO 42001 Certification: Step-by-Step Guide to Achieve - Centraleyes

https://www.centraleyes.com/iso-42001-certification-step-by-step-guide-to-achieve/

[23] [24] The CLeAR Documentation Framework for AI Transparency - Shorenstein Center

https://shorensteincenter.org/wp-content/uploads/2024/05/CleAR_KChmielinski_FINAL.pdf

[25] Partnership on AI Releases Guidance for Safe Foundation Model Deployment, Takes the Lead to Drive Positive Outcomes and Help Inform AI Governance Ahead of AI Safety Summit in UK

https://www.businesswire.com/news/home/20231024901268/en/Partnership-on-AI-Releases-Guidance-for-Safe-Foundation-Model-Deployment-Takes-the-Lead-to-Drive-Positive-Outcomes-and-Help-Inform-AI-Governance-Ahead-of-AI-Safety-Summit-in-UK

[26] Article 72: Post-Market Monitoring by Providers and Post-Market Monitoring Plan for High-Risk AI Systems | EU Artificial Intelligence Act

https://artificialintelligenceact.eu/article/72/

[28] [29] [31] [32] MLCommons Power Working Group Presents MLPerf Power benchmark at IEEE HPCA Symposium - MLCommons

https://mlcommons.org/2025/03/ml-commons-power-hpca/

Kostakis Bouzoukas

London, UK