Fairness / red-team probes / launch gates

AI Safety Audit Tool

The platform's release-integrity baseline now connects fairness metrics and scenario benchmarks to an implemented PyRIT adversarial gate. The gate returns PASS, WARN, or BLOCK, each with redacted evidence, a control, and an owner.

Local fairness execution / Committed provider benchmark summary

Cases: 540/model
Scenarios: 3 domains
Models: 2 compared
Decision: Launch gate


Decision: Tie fairness gaps and red-team failures to PASS, WARN, or BLOCK launch verdicts.
Why: Policy language gave launch owners no threshold, evidence record, or accountable follow-up.
Result: A 540-case-per-model benchmark across three risk domains with baseline-versus-candidate comparison.

Quick read / Governance

Safety needs to show up as a launch artifact.

Responsible AI often stops at policy language. The harder part is giving a launch review one place to inspect scenario data, fairness gaps, red-team probes, and a clear go/no-go verdict, together with an owner, retained evidence, an exception expiry, and a remediation path.

Scenario-based review

Hiring, lending, and support triage make the policy discussion concrete.

Fairness by segment

Protected-group metrics appear as gaps the launch team can debate and defend.

40 PyRIT probes

The implemented harness records three successful probes against the vulnerable baseline and returns the expected hard BLOCK.

Explicit verdicts

PASS, WARN, and BLOCK include reasons that can survive a launch meeting.
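As a rough illustration of how fairness gaps and red-team results could roll up into one verdict with reasons, here is a minimal sketch. The thresholds (`WARN_GAP`, `BLOCK_GAP`), the `GateResult` fields, and the rule that any successful probe forces a BLOCK are all assumptions made for this example; the page does not state the tool's actual cutoffs or schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    WARN = "WARN"
    BLOCK = "BLOCK"


@dataclass
class GateResult:
    verdict: Verdict
    reasons: list[str] = field(default_factory=list)
    owner: str = "unassigned"  # hypothetical field: who follows up on the verdict


# Hypothetical thresholds, chosen only for illustration.
WARN_GAP = 0.05
BLOCK_GAP = 0.10


def launch_gate(fairness_gaps: dict[str, float], redteam_successes: int, owner: str) -> GateResult:
    """Combine fairness gaps and red-team outcomes into one verdict with reasons."""
    reasons: list[str] = []
    verdict = Verdict.PASS

    # Assumption: any successful adversarial probe is a hard BLOCK.
    if redteam_successes > 0:
        verdict = Verdict.BLOCK
        reasons.append(f"{redteam_successes} red-team probe(s) succeeded")

    for metric, gap in sorted(fairness_gaps.items()):
        if gap >= BLOCK_GAP:
            verdict = Verdict.BLOCK
            reasons.append(f"{metric} gap {gap:.2f} >= {BLOCK_GAP:.2f}")
        elif gap >= WARN_GAP:
            # A WARN never downgrades an existing BLOCK.
            if verdict is Verdict.PASS:
                verdict = Verdict.WARN
            reasons.append(f"{metric} gap {gap:.2f} >= {WARN_GAP:.2f}")

    if not reasons:
        reasons.append("all gaps below the warn threshold and no probe succeeded")
    return GateResult(verdict, reasons, owner)
```

Calling `launch_gate({"demographic_parity": 0.02}, 3, "launch-owner")` in this sketch yields a BLOCK whose first reason names the three successful probes, which mirrors the vulnerable-baseline case described above.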

Benchmarks / Gate

The benchmark has one job: show when to block a launch.

Overview

Gate verdict

Launch decision with reasons, not a vague risk label.

Fairness

Segment gaps

Visibility into demographic parity, equal opportunity, and false-positive-rate (FPR) gaps.

Change

Baseline vs candidate

Regressions become visible before launch.

Report

Audit artifact

The output can be shared in a launch review.
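The segment gaps and the baseline-versus-candidate comparison can be sketched as follows. This assumes the common definitions: the demographic parity gap is the spread in selection rate across groups, the equal opportunity gap is the spread in true-positive rate, and the FPR gap is the spread in false-positive rate. The row layout `(group, y_true, y_pred)`, the function names, and the sample rows are hypothetical and are not taken from the tool.

```python
from collections import defaultdict


def segment_gaps(rows):
    """rows: iterable of (group, y_true, y_pred) with 0/1 labels."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for group, y_true, y_pred in rows:
        c = counts[group]
        c["n"] += 1
        c["pred_pos"] += y_pred
        if y_true == 1:
            c["pos"] += 1
            c["tp"] += y_pred
        else:
            c["neg"] += 1
            c["fp"] += y_pred

    def spread(numerator, denominator):
        # Max-minus-min rate across groups; groups with an empty denominator are skipped.
        rates = [c[numerator] / c[denominator] for c in counts.values() if c[denominator] > 0]
        return max(rates) - min(rates) if rates else 0.0

    return {
        "demographic_parity": spread("pred_pos", "n"),
        "equal_opportunity": spread("tp", "pos"),
        "fpr": spread("fp", "neg"),
    }


def regressions(baseline_gaps, candidate_gaps, tolerance=0.0):
    """Return metrics whose gap grew by more than `tolerance` from baseline to candidate."""
    deltas = {m: candidate_gaps[m] - baseline_gaps[m] for m in baseline_gaps}
    return {m: d for m, d in deltas.items() if d > tolerance}


# Made-up rows for illustration only.
BASELINE_ROWS = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]
CANDIDATE_ROWS = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]

if __name__ == "__main__":
    base = segment_gaps(BASELINE_ROWS)
    cand = segment_gaps(CANDIDATE_ROWS)
    print(regressions(base, cand))
```

With these sample rows, the candidate widens the demographic parity and equal opportunity gaps while the FPR gap is unchanged, so only the first two metrics are reported as regressions.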


Evidence contract / Two proof modes

Fairness calculations and provider evidence stay separate.

The browser recalculates disparity metrics from deterministic sample rows or a validated local CSV. Run npm run ai-safety:bench -- --limit 540 in the source repository to reproduce the provider evidence summarized in the second tab, including the full source artifact hash. Red-team evidence remains in its dedicated campaign explorer.
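To make the "validated local CSV" and "artifact hash" ideas concrete, the following is a minimal sketch, not the tool's implementation. The column names in `REQUIRED_COLUMNS` and the choice of SHA-256 are assumptions; the page does not state the CSV schema or the hash algorithm it uses.

```python
import csv
import hashlib
import io

# Hypothetical schema: one row per case with a segment label and binary outcomes.
REQUIRED_COLUMNS = {"group", "y_true", "y_pred"}


def validate_csv(text: str):
    """Parse CSV text and reject rows that cannot feed a disparity calculation."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        if not row["group"]:
            raise ValueError(f"line {reader.line_num}: empty group")
        if row["y_true"] not in {"0", "1"} or row["y_pred"] not in {"0", "1"}:
            raise ValueError(f"line {reader.line_num}: labels must be 0 or 1")
        rows.append((row["group"], int(row["y_true"]), int(row["y_pred"])))
    return rows


def artifact_sha256(data: bytes) -> str:
    """Hex digest used to check that a source artifact matches a published hash."""
    return hashlib.sha256(data).hexdigest()
```

Rejecting a bad file before any metric is computed keeps a malformed upload from producing a misleading gap, and comparing the digest of a reproduced artifact with the published one is a simple way to confirm that both runs produced the same bytes.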

Evidence manifest ↗ / Benchmark summary ↗ / Red-team campaign ↗