MedTech · AI/ML · SaMD · Clinical Validation

Clinical Validation Pathways for AI/ML SaMD: A Step-by-Step Map

Pelican Tech · 6 min read

Clinical validation is where many AI/ML SaMD programmes discover that they are six to twelve months behind their planned launch. The technical work was on schedule. The clinical evidence package was not. By the time the team realises the regulator wants more, or different, evidence than was prepared, the calendar has already slipped past the next viable filing window.

This piece is the pathway map we use with vendors planning the clinical validation phase for an AI/ML SaMD. It is opinionated about which pathways are appropriate for which classes of system, and where the choice between them is more flexible than the published guidance suggests.

The two questions that determine the pathway

Before deciding what evidence to generate, answer the two questions that determine which evidence framework applies.

1. What kind of clinical claim is the device making? A diagnostic device claims to identify a condition. A triage device claims to prioritise cases. A monitoring device claims to detect changes over time. A therapeutic decision-support device claims to inform a treatment choice. Each claim type has a corresponding evidence expectation, and the strongest version of the claim is the one the regulators evaluate. Vendors who hedge their claims to reduce the evidence burden often produce documentation that does not support the claim they actually want to make commercially.

2. How autonomous is the device's role in the clinical workflow? A device that produces a finding for a clinician to review has a different evidence bar from a device that produces a finding the clinician acts on without independent verification, which is different again from a device that takes action autonomously. The regulators care intensely about this; vendors often underestimate how much it matters.

These two questions, answered honestly, narrow the pathway dramatically. We have seen vendors lose six months because they entered the validation phase without resolving them.

The four pathways most AI/ML SaMD takes

For most clinical AI/ML SaMD, one of four validation pathways applies. The differences are not trivial, and the choice is partly strategic.

Pathway A: Retrospective performance evaluation against a curated reference standard. The system is evaluated on a held-out dataset with ground truth established by expert review or a gold-standard test. This is sufficient for many low-to-moderate risk diagnostic and triage applications, particularly where the comparator (clinician judgement) is the standard the system is replacing or augmenting. Evidence required: dataset characterisation (size, demographics, source diversity, prevalence), reference standard methodology, statistical performance metrics with confidence intervals, subgroup analysis. Time to evidence: 3–6 months once data is in hand.
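As a concrete illustration of the statistical core of Pathway A, the sketch below computes sensitivity and specificity with Wilson score confidence intervals on a toy held-out set. The data, operating threshold, and choice of metrics are illustrative assumptions; the real protocol defines the reference standard and the pre-specified metrics.

```python
# Minimal sketch of Pathway A performance metrics with confidence intervals.
# The labels, threshold, and metric selection are illustrative assumptions.
import math


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if total == 0:
        return (0.0, 0.0)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return (centre - half, centre + half)


def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> dict:
    """Point estimates and Wilson intervals from binary truth/prediction pairs."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Illustrative held-out labels: ground truth vs. model output at a fixed threshold.
truth = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
preds = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
print(sensitivity_specificity(truth, preds))
```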

Pathway B: Multi-reader multi-case (MRMC) study. Used when the claim is about clinician performance with the device versus without it. Multiple clinicians read multiple cases under both conditions; statistical inference uses methods like the Dorfman-Berbaum-Metz framework. Standard for radiology AI claims about diagnostic accuracy improvement. Evidence required: MRMC protocol with statistical pre-specification, reader recruitment plan, case selection that represents the intended-use population, ground-truth methodology, statistical analysis plan. Time to evidence: 6–9 months.
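To make the estimand concrete, the sketch below computes the reader-averaged AUC difference between the with-device and without-device conditions, with a simple bootstrap over cases for an interval. This is a simplification for illustration only: an actual MRMC submission uses the Dorfman-Berbaum-Metz or Obuchowski-Rockette analysis, which also accounts for reader variance, and the simulated reader scores here are assumptions.

```python
# Simplified sketch of the MRMC estimand: reader-averaged AUC difference between
# conditions, with a case bootstrap. Not a substitute for DBM/OR analysis.
import numpy as np


def auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties


def reader_averaged_auc_diff(with_dev, without_dev, labels) -> float:
    """Mean over readers of AUC(with device) minus AUC(without device)."""
    diffs = [auc(w, labels) - auc(wo, labels) for w, wo in zip(with_dev, without_dev)]
    return float(np.mean(diffs))


rng = np.random.default_rng(0)
n_readers, n_cases = 6, 200
labels = rng.integers(0, 2, n_cases)
# Illustrative simulated reader scores; the device condition shifts positives upward.
without_dev = [rng.normal(labels * 0.8, 1.0) for _ in range(n_readers)]
with_dev = [s + labels * 0.3 + rng.normal(0, 0.2, n_cases) for s in without_dev]

observed = reader_averaged_auc_diff(with_dev, without_dev, labels)
boot = []
for _ in range(1000):
    idx = rng.integers(0, n_cases, n_cases)  # resample cases with replacement
    boot.append(reader_averaged_auc_diff(
        [s[idx] for s in with_dev], [s[idx] for s in without_dev], labels[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"reader-averaged AUC difference = {observed:.3f}, "
      f"95% case-bootstrap CI ({lo:.3f}, {hi:.3f})")
```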

Pathway C: Prospective clinical study. Used when retrospective evidence is insufficient: the device introduces a workflow change beyond decision-support, the claim is about clinical outcomes (length of stay, mortality, treatment outcomes), or the regulator has explicitly asked for prospective data. Evidence required: protocol with ethics approval, prospective enrolment, defined primary and secondary endpoints, pre-specified statistical analysis. Time to evidence: 12–24 months.
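One pre-specification step a prospective study cannot skip is the sample-size calculation for the primary endpoint. The sketch below uses the standard two-proportion normal approximation; the endpoint, effect size, alpha, and power are illustrative assumptions that in practice come from the protocol and the biostatistician.

```python
# Illustrative sample-size sketch for a prospective primary endpoint expressed as
# a difference in proportions. All numbers here are assumptions for demonstration.
from statistics import NormalDist


def n_per_arm(p_control: float, p_device: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Two-sided two-proportion sample size per arm (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_device) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p_control * (1 - p_control) + p_device * (1 - p_device)) ** 0.5) ** 2
    return int(num / (p_control - p_device) ** 2) + 1


# Assumed endpoint: reducing an adverse-outcome rate from 12% to 8%.
print(n_per_arm(0.12, 0.08))  # roughly 880 patients per arm at 80% power
```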

Pathway D: Pivotal randomised controlled trial. Used for high-risk therapeutic decision-support, novel mechanisms of action, or claims about clinical superiority over existing standard of care. Required for some Class III indications, sometimes appropriate for Class IIb. Time to evidence: 18–36 months and substantial budget.

Most vendors land on Pathway A or B. Pathway C is increasingly common as regulators tighten expectations for AI systems that influence clinical workflow. Pathway D remains the exception; reserve it for claims that genuinely require it.

The evidence-quality factors regulators look for

Whichever pathway you choose, four evidence-quality factors determine whether the validation holds up under regulatory review. They apply across all pathways, and weakness in any of them creates problems.

Population representativeness. The validation data must reflect the population the device will encounter in deployment. This is the area where AI/ML SaMD most often fails: training data was sourced from one or two academic centres, and the validation reuses that population, but deployment will be in community hospitals across multiple regions. The regulator will ask. The honest answer is rarely the one that smooths the review.
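A simple way to make representativeness auditable is to tabulate the validation cohort against the intended deployment population stratum by stratum and flag material gaps, as in the sketch below. The strata, figures, and 5-point flagging threshold are illustrative assumptions, not regulatory thresholds.

```python
# Hedged sketch of a representativeness check: compare the demographic mix of the
# validation cohort against the intended deployment population and flag strata
# that are materially under-represented. All figures are illustrative.
validation_cohort = {"age_65_plus": 0.18, "female": 0.41, "community_site": 0.05}
deployment_population = {"age_65_plus": 0.34, "female": 0.52, "community_site": 0.60}

for stratum, target in deployment_population.items():
    observed = validation_cohort.get(stratum, 0.0)
    gap = target - observed
    flag = "UNDER-REPRESENTED" if gap > 0.05 else "ok"
    print(f"{stratum:16s} validation {observed:.0%} vs deployment {target:.0%}  {flag}")
```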

Subgroup performance. Statistical performance broken down by clinically relevant subgroups: age, sex, ethnicity, comorbidities, disease severity. Disparities must be reported and addressed. The FDA's recent draft guidance on AI/ML in SaMD makes this explicit; the European MDR has always implied it through Article 61 clinical evaluation requirements.
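In practice this means the same headline metric reported per stratum, not only in aggregate. A minimal sketch, assuming per-case records labelled with a subgroup key:

```python
# Minimal sketch of a subgroup performance breakdown: the same metric reported per
# clinically relevant stratum. Records and subgroup labels are illustrative.
from collections import defaultdict

# Each record: (subgroup label, ground truth, model prediction) at a fixed threshold.
records = [
    ("age<65", 1, 1), ("age<65", 1, 1), ("age<65", 0, 0), ("age<65", 1, 1),
    ("age>=65", 1, 0), ("age>=65", 1, 1), ("age>=65", 0, 0), ("age>=65", 1, 0),
]

counts = defaultdict(lambda: {"tp": 0, "fn": 0})
for group, truth, pred in records:
    if truth == 1:
        counts[group]["tp" if pred == 1 else "fn"] += 1

for group, c in counts.items():
    sens = c["tp"] / (c["tp"] + c["fn"])
    print(f"{group:8s} sensitivity {sens:.0%} (n positives = {c['tp'] + c['fn']})")
```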

Robustness to realistic noise. Performance under conditions the system will actually encounter: imaging artefacts, missing data fields, out-of-distribution inputs. A device that performs perfectly on curated test cases but degrades on routine clinical inputs will not retain its claimed performance in practice; regulators are increasingly testing for this.
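Operationally, this means rerunning the validation metric on deliberately corrupted copies of the inputs and reporting the degradation. The sketch below shows the pattern; the corruption functions, the placeholder model call, and the metric are assumptions standing in for the programme's real artefact models and evaluation pipeline.

```python
# Hedged sketch of a robustness check: re-evaluate the validation metric after
# applying realistic corruptions to the inputs and report the degradation.
# The corruption functions and model call are placeholders, not a real pipeline.
import numpy as np

rng = np.random.default_rng(0)


def add_acquisition_noise(images: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Simulate routine acquisition noise (placeholder for real artefact models)."""
    return images + rng.normal(0.0, sigma, images.shape)


def drop_fields(tabular: np.ndarray, fraction: float = 0.1) -> np.ndarray:
    """Randomly blank a fraction of fields in a float array to mimic missing data."""
    corrupted = tabular.copy()
    mask = rng.random(tabular.shape) < fraction
    corrupted[mask] = np.nan
    return corrupted


def evaluate(model, inputs, labels) -> float:
    """Placeholder metric (simple accuracy here); the protocol defines the real one."""
    return float(np.mean(model(inputs) == labels))


# Usage pattern (model and data supplied by the actual validation pipeline):
# clean_score = evaluate(model, images, labels)
# noisy_score = evaluate(model, add_acquisition_noise(images), labels)
# print(f"degradation under noise: {clean_score - noisy_score:.3f}")
```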

Comparator appropriateness. The performance comparison must be against a meaningful clinical comparator, not a strawman. "Faster than a radiologist" is meaningful if speed has clinical value; "more accurate than no test at all" rarely is.

These four are independently important. Programmes that ace one and ignore another produce evidence packages that look strong on paper and fail under review.

The order of operations that prevents the six-month slip

The reason vendors lose time in clinical validation is usually the order of operations, not the work itself. The sequence that prevents the slip:

Months 0–3: Resolve the claim and the role. Before any data work, decide what claim you are making and what the device's role in the clinical workflow is. Document both with the precision the regulator will use to interpret them. Claim drift after this point is a major source of rework.

Months 3–6: Pre-specify the validation plan. With a competent biostatistician, write the validation protocol before generating data. Define the reference standard, the population, the endpoints, the analysis plan, the success criteria. Pre-specification is what distinguishes evidence from data dredging in the regulator's eyes.
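Pre-specification is most useful when it exists as a frozen artefact the analysis can later be checked against mechanically. A minimal sketch of that idea, with field names and thresholds that are assumptions for demonstration rather than a regulatory template:

```python
# Illustrative sketch of pre-specification as an artefact: success criteria are
# written down, frozen, and later checked against results. Fields are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class PreSpecifiedPlan:
    reference_standard: str
    primary_endpoint: str
    min_sensitivity: float
    min_specificity: float
    subgroups: tuple[str, ...]


PLAN = PreSpecifiedPlan(
    reference_standard="two-reader consensus with adjudication",
    primary_endpoint="sensitivity/specificity vs. reference standard",
    min_sensitivity=0.90,
    min_specificity=0.80,
    subgroups=("age<65", "age>=65", "female", "male"),
)


def meets_success_criteria(results: dict, plan: PreSpecifiedPlan) -> bool:
    """Check observed lower confidence bounds against the frozen criteria."""
    return (results["sensitivity_lcb"] >= plan.min_sensitivity
            and results["specificity_lcb"] >= plan.min_specificity)


# Example: results produced later by the analysis pipeline.
print(meets_success_criteria({"sensitivity_lcb": 0.91, "specificity_lcb": 0.83}, PLAN))
```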

Months 6–12: Generate the data and execute the analysis. With a pre-specified plan, this is execution, not exploration. The pathway determines the duration; pathways A and B fit comfortably in this window, while pathway C does not.

Months 12–15: Build the clinical evaluation report. Synthesise the validation evidence into a clinical evaluation report (MDR vocabulary) or a comparable submission section (FDA vocabulary). This is a writing exercise if the previous phases were done correctly; it is a salvage exercise if they were not.

Months 15–18: Submission and review. The package goes to the regulator. Review cycles typically run 90–270 days depending on the regulator and submission type, with one or more rounds of additional information likely.

The total: 18 months from claim definition to clearance for a typical Pathway A or B programme. Vendors who attempt to compress this by skipping the claim/plan phase routinely add the time back during the review cycle when the gaps become visible.

Where this connects to our practice

Pelican Tech's MedTech practice builds clinical validation programmes for AI/ML SaMD with the order-of-operations discipline above, working alongside the engineering and statistical teams to produce evidence rather than retrospective documentation. We work with our regulatory affairs team when the clinical evaluation needs to be assembled into FDA, MDR, or IVDR submissions, and with our AI Solutions practice when the underlying ML engineering needs to support the evidence requirements identified during validation planning.

If you are within 12 months of a planned submission for an AI/ML SaMD and the clinical evidence package is not yet structured around a pre-specified validation plan, that is the engagement to start with before the data work begins.