Acquisition-State Reliability · Physics-Grounded · Imaging AI

You know when your scanners are failing.
Do you know when your AI is?

AI models do not read DICOM metadata — they read pixels.

Reconstruction conditions, dose drift, and slice thickness changes that standard metadata pipelines cannot detect materially affect what your model actually sees. GammaMetric characterizes the imaging environment your AI encounters — so you know when a study is inside or outside its validated envelope.

Real paired NLST CT: the same nodule is confidently detected under both reconstruction kernels, but the kernel alone shifts the AI-measured diameter across the 6mm Fleischner threshold, flipping the follow-up recommendation.
Credentials
Yale School of Medicine Diagnostic Physics Residency
ABR Board-Eligible Diagnostic Medical Physicist
Full Study Under Review — Academic Radiology
arXiv:2603.26785 — AI Sensitivity Preprint
Interval-to-Diameter Ratio Governs Reconstruction-Phase Sensitivity — Physics in Medicine & Biology (submitted)
154-Case LIDC-IDRI Perturbation Study
1 in 6 patients received a different Lung-RADS follow-up recommendation between full-dose and quarter-dose reconstructions of the same scan — even though the DICOM metadata was identical.
AAPM Mayo Clinic · Real Projection-Domain Data · n=45 · Replicated on LIDC-IDRI n=183
Live Engine Output

Real inputs. Real outputs.
No mock data.

Three studies run through the sensitivity engine. Each score is what the API returns — your system decides what to do with it.

■ RED: Immediate Alert
Siemens SOMATOM · 5.0mm · B40f · 8.2 mGy
Est. Relative Sensitivity: 56.2% (baseline 78.2%) · CI [51.2–61.2%]
Degradation: −22pp below validated baseline
Drivers
  • Slice thickness 5.0mm: −13.2pp
  • CTDIvol 5.0 mGy: −2.3pp
Diameter Uncertainty
Mean shift ↑ +1.7mm · 95% CI width 9.9mm. Nodule sizes may be overestimated under these conditions.
Score returned: RED. Nodules of 3–6mm are 28% less likely (in relative terms) to be detected under current acquisition conditions. Diameter measurements may be overestimated by 1.7mm on average. Your system decides what to surface.
Example downstream action — hospital-side email notification built on the RED score
■ YELLOW: Daily Digest
GE Revolution · 3.75mm · B30f · 7.5 mGy
Est. Relative Sensitivity: 68.9% (baseline 78.2%) · CI [63.9–73.9%]
Degradation: −9.3pp below validated baseline
Drivers
  • Slice thickness 3.75mm: −8.1pp
  • Dose 7.5 mGy: −1.2pp
Score returned: YELLOW. Acquisition conditions are moderately outside the validated envelope. Your system decides whether to flag or surface the result normally.
■ GREEN: No Action
Philips IQon · 1.25mm · B30f · 9.0 mGy
Est. Relative Sensitivity: 77.7% (baseline 78.2%) · CI [72.7–82.7%]
Degradation: −0.5pp, within normal range
Drivers
  • Dose 9.0 mGy: −0.5pp
Score returned: GREEN. Acquisition parameters within the characterized validation envelope. Proceed normally.
Note — These are real outputs from the live engine. Sensitivity deltas are derived from 154-case LIDC-IDRI perturbation experiments (arXiv:2603.26785). Baseline: MONAI RetinaNet, LUNA16-trained, v1.0.0.
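
The cards above quote degradation in percentage points against the validated baseline, and the RED card also quotes a relative reduction. A minimal sketch of that arithmetic, using only the figures shown on the cards; the function and variable names are illustrative and are not part of the GammaMetric API.

```python
def degradation(baseline_pct: float, estimated_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    drop_pp = baseline_pct - estimated_pct
    relative_pct = 100.0 * drop_pp / baseline_pct
    return round(drop_pp, 1), round(relative_pct, 1)


BASELINE = 78.2  # validated baseline quoted on each card
cards = {"RED": 56.2, "YELLOW": 68.9, "GREEN": 77.7}  # estimated sensitivities

results = {tier: degradation(BASELINE, est) for tier, est in cards.items()}
for tier, (pp, rel) in results.items():
    print(f"{tier}: -{pp}pp absolute, {rel}% relative")
```

For the RED card this gives a 22.0pp absolute drop, which is about 28% of the 78.2% baseline, matching the relative figure quoted in the card text.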
View Sample Detectability Report →
Capabilities

One engine. Two buyers.
Same acquisition-state signal.

The underlying engine is shared. AI vendors use it per-scan before surfacing a result. Health systems use it site-wide for governance and protocol QA. Same physics. Different surfaces.

AI Vendors — Available Now

Reliability API

  • Per-scan reliability score (GREEN / YELLOW / RED) via REST API or DICOM webhook — your system decides how to use it
  • Physics-grounded sensitivity prediction from acquisition metadata — slice thickness, kernel, dose, reconstruction state
  • Pixel-based acquisition fingerprinting: detects conditions standard DICOM metadata cannot expose, such as FBP versus iterative reconstruction when ConvolutionKernel reads identically for both (AUC 0.995 on independent phantom validation)
  • Detection-aware scoring — tier elevates when AI confidence falls in the acquisition-sensitive regime under mismatch
  • Per-nodule comparability scoring across prior and current reconstruction conditions
  • Full audit log — every study classified and timestamped
  • PDF reliability report on demand — suitable for post-market surveillance documentation
  • Based on published research: arXiv:2603.26785, under review at Academic Radiology
Health Systems — Available Now

Site Reliability Monitoring

  • Passive Orthanc DICOM listener — every study classified automatically, no workflow change
  • Per-study reliability record — acquisition parameters, sensitivity estimate, full audit trail
  • Automated site reliability report — acquisition trends, protocol drift, sensitivity impact over time
  • Designed for post-market surveillance, Joint Commission QA, and CHAI governance programs
  • Answers the question regulators are starting to ask: is your AI performing as validated at this site, with these protocols?
  • Diameter uncertainty quantification — mean shift and 95% CI per acquisition state
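
One way a consumer of the diameter-uncertainty output might use it is to check whether a measured diameter could plausibly sit on either side of the 6mm threshold. The sketch below assumes the mean shift is a bias to subtract and that the 95% CI width is centered on the bias-corrected value. Both are assumptions about how to read those two numbers, not a description of GammaMetric's method, and every name here is hypothetical.

```python
from dataclasses import dataclass

THRESHOLD_MM = 6.0  # size threshold referenced on this page


@dataclass
class DiameterUncertainty:
    mean_shift_mm: float  # +1.7 means diameters read 1.7mm large on average
    ci_width_mm: float    # full width of the 95% interval


def plausible_range(measured_mm: float, u: DiameterUncertainty) -> tuple[float, float]:
    """Bias-correct the measurement, then widen by half the CI width each way."""
    corrected = measured_mm - u.mean_shift_mm
    half = u.ci_width_mm / 2.0
    return max(0.0, corrected - half), corrected + half


def straddles_threshold(measured_mm: float, u: DiameterUncertainty,
                        threshold_mm: float = THRESHOLD_MM) -> bool:
    """True when the plausible range contains the threshold."""
    low, high = plausible_range(measured_mm, u)
    return low < threshold_mm <= high


red_state = DiameterUncertainty(mean_shift_mm=1.7, ci_width_mm=9.9)    # RED card values
tight_state = DiameterUncertainty(mean_shift_mm=0.0, ci_width_mm=1.0)  # made-up reference

print(straddles_threshold(7.0, red_state))
print(straddles_threshold(7.0, tight_state))
```

Under the RED-card uncertainty a 7.0mm measurement cannot be confidently placed above the threshold, while under the made-up tight state it can.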
Site dashboard — coming soon
Also Available — Free Tool
CT Dose Analytics & Leapfrog Reporting
Leapfrog Section 8B, ACR DIR benchmarking, protocol outlier detection. Free at dose.gammametric.com.
Try It Free →
Case Studies

The research behind the work.

Two analyses showing exactly what GammaMetric measures — and what it finds.

Protocol Optimization

Your Protocols Are Costing You on Three Fronts Simultaneously

Dose compliance. Image quality. AI performance. Most CT protocol reviews address one. This analysis shows how the parameters interact — and which ones actually matter.

  • 5mm slice thickness: −13.2pp AI sensitivity loss
  • Soft reconstruction kernel: −10.5pp AI sensitivity loss
  • mAs reduction: only −4pp — the least destructive lever
  • Leapfrog compliance and AI performance are different problems
Read the case study →
AI Validation

How Your AI Degrades After Deployment

Post-deployment validation of a CT lung nodule detection algorithm across six real-world imaging perturbations. Based on LIDC-IDRI (154 cases). Methodology: arXiv:2603.26785.

  • Baseline sensitivity: 84.8% under reference protocol
  • Combined perturbation: ~65–68%, a gap of roughly 17–20pp from baseline
  • Effect most pronounced in the 3–6mm nodule range
  • Vendor benchmarks do not reflect site-specific conditions
Read the case study →
The Problem

AI models are validated in one imaging environment
and deployed into another.

The gap between validation conditions and real-world deployment is where AI performance quietly degrades — and where accountability belongs to whoever ships the model.

01

DICOM Metadata Is Not Enough

ConvolutionKernel reads identically for FBP and iterative reconstruction on major scanners. Slice thickness varies across sites. Dose drifts without notice. Standard pipelines are blind to acquisition conditions that materially affect model performance.

02

Vendor Benchmarks Don't Reflect Deployment

FDA-cleared models are validated at controlled dose levels and standard protocols. Real-world sites run lower doses, thicker slices, and varied reconstruction. The gap between "cleared" and "deployed" is rarely measured, until now.

03

Failures Get Blamed on the Model

When acquisition conditions push a study outside the validated envelope, the AI result is unreliable — but the model gets blamed. GammaMetric quantifies which studies are operating outside that envelope before a result is surfaced.

Integration

One API call.
Before you surface a result.

GammaMetric runs passively in your pipeline. Every study gets scored before your AI result is surfaced — your system decides what to do with the signal.

01

Send the Study

Forward DICOM headers (or pixel data) to the GammaMetric API via webhook or REST. Only acquisition parameters and pixel patches are used — no PHI transmitted, no image storage.
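
As an illustration of forwarding acquisition parameters only, the sketch below filters a header dictionary down to an allow-list before anything leaves the site. The keys are standard DICOM attribute keywords. The allow-list, the function name, and the payload shape are assumptions for illustration, not the documented GammaMetric request format.

```python
import json

# Acquisition-related DICOM keywords (assumed allow-list, for illustration only).
ACQUISITION_KEYS = (
    "Manufacturer",
    "ManufacturerModelName",
    "SliceThickness",
    "ConvolutionKernel",
    "KVP",
    "CTDIvol",
)


def build_payload(header: dict) -> str:
    """Keep only allow-listed acquisition fields and serialize them as JSON."""
    params = {key: header[key] for key in ACQUISITION_KEYS if key in header}
    return json.dumps({"acquisition": params}, sort_keys=True)


# Made-up header: the patient fields must never be forwarded.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Manufacturer": "SIEMENS",
    "SliceThickness": 5.0,
    "ConvolutionKernel": "B40f",
    "CTDIvol": 8.2,
}

payload = build_payload(header)
print(payload)
```

An allow-list is used rather than a block-list so that an unexpected patient-identifying tag is dropped by default instead of leaking through.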

02

Get a Reliability Score

Each study is scored against your model's characterized acquisition envelope. Slice thickness, kernel, dose, and reconstruction state are all assessed. You get a GREEN / YELLOW / RED score plus a sensitivity delta.

03

Your System Decides

Show the result. Suppress it. Flag it. Route it to secondary review. GammaMetric returns the signal — you own the decision. No clinical logic baked in, no radiologist-facing UI required.
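
Because the decision stays on your side, the mapping from score to action is site policy. A minimal sketch of one such policy follows; the action names, the default mapping, and the choice to send unknown scores to secondary review are all hypothetical and would be set by the integrating system.

```python
from enum import Enum


class Action(Enum):
    SHOW = "show"
    FLAG = "flag"
    SECONDARY_REVIEW = "secondary_review"
    SUPPRESS = "suppress"


# Hypothetical default policy; each site would define its own.
DEFAULT_POLICY = {
    "GREEN": Action.SHOW,
    "YELLOW": Action.FLAG,
    "RED": Action.SECONDARY_REVIEW,
}


def decide(score: str, policy: dict = DEFAULT_POLICY) -> Action:
    """Map a reliability score to a downstream action under the given policy."""
    try:
        return policy[score.upper()]
    except KeyError:
        # Unrecognized score: fall back to human review rather than showing it.
        return Action.SECONDARY_REVIEW


strict_policy = {**DEFAULT_POLICY, "RED": Action.SUPPRESS}

print(decide("GREEN"))
print(decide("RED", strict_policy))
```

Keeping the policy as plain data makes it easy to audit and to change per site without touching the integration code.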

On Demand

PDF Reliability Report

Generate a 7-page site-specific reliability report from any study — acquisition profile, sensitivity degradation analysis, and prioritized protocol recommendations. Suitable for post-market surveillance documentation.

Request a Demo →
Also Available

CT Dose Analytics

Self-serve CT dose monitoring at dose.gammametric.com. Leapfrog Section 8B reporting, ACR DIR benchmarking, drift alerts. Free to use.

Try It Free →
Report Contents

Everything your quality
program needs

Compliance

Leapfrog Section 8B Reporting

Median DLP for routine head and abdomen-pelvis CT across all five Leapfrog pediatric age groups (<1, 1–4, 5–9, 10–14, 15–17) — formatted and ready for Section 8B reference.
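
The grouping behind this table can be sketched in a few lines. The example assumes ages are recorded in completed years and uses made-up DLP values; the function names and the input tuple layout are illustrative, not the tool's actual import format.

```python
from collections import defaultdict
from statistics import median

# (label, lowest age, highest age) in completed years, per the five age groups above.
AGE_GROUPS = [("<1", 0, 0), ("1-4", 1, 4), ("5-9", 5, 9), ("10-14", 10, 14), ("15-17", 15, 17)]


def age_group(age_years: int):
    """Return the pediatric age-group label, or None for adults or invalid ages."""
    for label, low, high in AGE_GROUPS:
        if low <= age_years <= high:
            return label
    return None


def median_dlp_by_group(exams):
    """exams: iterable of (exam_type, age_years, dlp_mgy_cm) tuples."""
    buckets = defaultdict(list)
    for exam_type, age, dlp in exams:
        label = age_group(age)
        if label is not None:  # adult exams are excluded from the pediatric table
            buckets[(exam_type, label)].append(dlp)
    return {key: median(values) for key, values in buckets.items()}


# Made-up exams for illustration only.
exams = [
    ("head", 0, 210.0), ("head", 0, 250.0), ("head", 3, 300.0),
    ("abdomen-pelvis", 12, 180.0), ("abdomen-pelvis", 12, 220.0),
    ("abdomen-pelvis", 13, 200.0), ("head", 42, 900.0),
]

medians = median_dlp_by_group(exams)
for key in sorted(medians):
    print(key, medians[key])
```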

Benchmarking

ACR DIR Benchmark Comparisons

Your facility's dose percentiles compared against ACR Dose Index Registry national reference levels. Clear status flags — Excellent, Acceptable, or Above Benchmark — for every body region.

Quality

Outlier Detection

Automatic identification of exams with unusually high DLP — repeat acquisitions, wrong protocols, or multi-phase studies — with transparent methodology notes for your physics team.
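
The page does not state which outlier rule is used. As a stand-in, the sketch below applies a common interquartile-range fence to DLP values within one protocol; the rule, the multiplier, and the sample values are assumptions for illustration.

```python
from statistics import quantiles


def high_dlp_outliers(dlps, k=1.5):
    """Return DLP values above Q3 + k * IQR (upper Tukey fence)."""
    q1, _, q3 = quantiles(dlps, n=4)  # quartiles, default 'exclusive' method
    upper_fence = q3 + k * (q3 - q1)
    return [d for d in dlps if d > upper_fence]


# Made-up DLP values (mGy·cm) for a single protocol.
dlps = [400, 420, 450, 460, 480, 500, 510, 1500]

outliers = high_dlp_outliers(dlps)
print(outliers)
```

A flagged exam is only a candidate for review: a repeat acquisition or a multi-phase study can legitimately produce a high DLP.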

Optimization

Protocol Observations

Physicist observations on protocol consistency, scanner variability, and dose reduction opportunities — useful context for your quality improvement program beyond compliance reporting.

Trend

Dose Trends Across Reporting Period

Dose trends visualized across your full reporting period. Identify protocol changes, scanner drift, or technologist variability — supporting ongoing QA program development beyond Leapfrog season.

Deliverable

Professional PDF Report

Publication-quality output with percentile tables, benchmark charts, methodology documentation, and your facility name — suitable for quality committee presentation or Leapfrog submission reference.

Context

The performance gap
is measurable

GammaMetric's own pilot study quantifies how acquisition variability affects imaging AI — and why protocol optimization matters beyond compliance.

1 in 6

Patients who receive a different AI-derived Lung-RADS follow-up recommendation between full-dose and quarter-dose reconstructions of the same scan (n=183, LIDC-IDRI; replicated on real projection-domain data, AAPM Mayo).

~19pp

Sensitivity drop at 5mm slice thickness versus standard. The gap between your protocol and the vendor's validation conditions is rarely measured.

0.995 AUC

Domain separability between FBP and iterative reconstruction on phantom data — with identical DICOM ConvolutionKernel tags. Standard metadata pipelines cannot detect this condition. Pixel analysis can.

Pricing

Built for vendors.
Priced per site.

AI monitoring is the primary product. CT dose analytics runs alongside it, free.

CT Dose Analytics
$0
free tool · always available
  • Self-serve at dose.gammametric.com
  • Adult CT DLP percentiles — all body regions
  • Pediatric CT — all five Leapfrog age strata
  • ACR DIR national benchmark comparisons
  • Drift alerts and QA acknowledgment workflow
  • Physicist-reviewed reports available — $1,500/facility/year
Try It Free →
FAQ

Common questions

What data format do you accept?
CSV exports from Radimetrics (Bayer), DoseWatch (GE), or any dose monitoring system. Manual PACS query exports are also accepted. Common column naming conventions are auto-detected. Non-standard formats are welcome — format mapping is handled before analysis begins.
Is my data secure?
De-identifying patient data before sending is strongly recommended — the analysis only requires dose metrics, exam descriptions, and patient age. No PHI is needed or requested. Raw data files are not retained after analysis is complete.
What Leapfrog section does this cover?
Section 8B: Pediatric Computed Tomography (CT) Radiation Dose. This requires reporting median DLP for routine head and abdomen-pelvis CT across five pediatric age groups. Reports provide exactly those data points, plus adult CT analysis as a value-add for your quality program.
Is this a replacement for a medical physicist?
No. Every report includes review by a diagnostic medical physicist, but final interpretation, regulatory compliance, and clinical protocols remain the responsibility of your institution and its qualified physics staff. GammaMetric is a reporting and analytics service, not a substitute for physics oversight.
How are benchmarks determined?
Dose percentiles are compared against ACR Dose Index Registry (DIR) national reference levels. The benchmark values used for comparison are maintained and updated by a diagnostic medical physicist to reflect current national practice.
What does an AI validation engagement look like?
A validation engagement is standalone and separate from the continuous monitoring product. You provide de-identified DICOM data or model outputs across your facility's acquisition conditions (dose levels, slice thicknesses, protocols). GammaMetric applies systematic degradation and runs inference to quantify sensitivity loss and failure modes under each condition. The result is delivered as a physicist-reviewed report with methodology documentation suitable for quality committee or regulatory review.
What does the API actually return?
A per-scan reliability score (GREEN / YELLOW / RED), estimated sensitivity delta relative to your model's validation baseline, acquisition state characterization (dose, slice thickness, kernel, reconstruction conditions), and any pixel-fingerprinting findings the DICOM metadata did not expose. Your system decides what to do — show the AI result, suppress it, flag it, or route it. Every scored study is logged with its full parameter set for audit trail purposes.
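
To make the answer above concrete, the sketch below parses a response into a typed record. The JSON field names are hypothetical stand-ins for the four items described, not the documented schema; the example values are taken from the YELLOW card on this page.

```python
import json
from dataclasses import dataclass, field

VALID_SCORES = {"GREEN", "YELLOW", "RED"}


@dataclass
class ReliabilityResult:
    score: str                   # GREEN / YELLOW / RED
    sensitivity_delta_pp: float  # relative to the model's validation baseline
    acquisition: dict            # dose, slice thickness, kernel, reconstruction
    fingerprint_findings: list = field(default_factory=list)


def parse_result(raw: str) -> ReliabilityResult:
    """Parse a JSON response (hypothetical field names) and validate the score."""
    data = json.loads(raw)
    score = data["score"]
    if score not in VALID_SCORES:
        raise ValueError(f"unexpected score: {score!r}")
    return ReliabilityResult(
        score=score,
        sensitivity_delta_pp=float(data["sensitivity_delta_pp"]),
        acquisition=dict(data.get("acquisition", {})),
        fingerprint_findings=list(data.get("fingerprint_findings", [])),
    )


example = json.dumps({
    "score": "YELLOW",
    "sensitivity_delta_pp": -9.3,
    "acquisition": {"slice_thickness_mm": 3.75, "kernel": "B30f", "dose_mgy": 7.5},
})

result = parse_result(example)
print(result.score, result.sensitivity_delta_pp)
```

Rejecting unknown scores at parse time keeps a schema change from silently passing through to the display logic.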
Get Started

See it
working.

15-minute demo. Live API, real DICOM parameters, real reliability score. No slides.

Request a Demo →
CT Dose Tool →

Or email directly: dan@gammametric.com