Artificial intelligence (AI) has become a constant presence in healthcare, from reading radiology images to summarising clinical notes. The promise is seductive: precision, speed, and scalability. But beneath that promise lies a pressing question: can we trust AI systems to diagnose us without bias if we don’t know how they were trained or on whom?

Understanding bias in healthcare AI

Closed large language models (LLMs), developed by commercial or institutional entities, rarely disclose their training datasets. This lack of transparency conceals the social and clinical biases embedded within them.

According to The Lancet Digital Health, every health dataset carries limitations that can encode inequities into AI systems.

Underrepresentation of women, ethnic minorities, or low-income groups reduces algorithmic accuracy and perpetuates inequality in care.

Bias can also arise from missing or incomplete data. For instance, patients from underserved regions may undergo fewer diagnostic tests, leading algorithms to mistakenly label them “healthy” when they are not.
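
A minimal sketch of this failure mode, using invented numbers rather than any real dataset: if untested patients default to a “healthy” label, the group that is tested less often looks healthier than it actually is.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic cohort: half from a well-served region, half from an underserved one.
# The true disease rate is identical (10%) in both groups by construction.
underserved = rng.random(n) < 0.5
disease = rng.random(n) < 0.10

# Assumption for illustration only: 80% of well-served patients receive the
# diagnostic test, but only 30% of underserved patients do.
tested = np.where(underserved, rng.random(n) < 0.30, rng.random(n) < 0.80)

# Naive label: "diseased" only if a test was done and came back positive,
# so every untested patient defaults to "healthy".
labelled_diseased = disease & tested

for name, mask in [("well-served", ~underserved), ("underserved", underserved)]:
    print(f"{name:12s} true rate = {disease[mask].mean():.3f}   "
          f"labelled rate = {labelled_diseased[mask].mean():.3f}")
```

Both synthetic groups share the same true disease rate, yet the under-tested group’s labelled rate comes out at roughly a third of it, and a model trained on those labels would learn the gap as if it were real.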

Why transparency matters in AI development

A transparent model, whether open-source or simply well-documented, allows clinicians, regulators, and patients to understand how it works.

Closed vs. Open Approaches to Deploying LLMs for Clinical Applications
Source: Dennstädt et al., npj Digital Medicine (2025)

This framework compares closed LLMs, which rely on external commercial servers, with open LLMs that run within local healthcare environments. While open models require greater technical expertise, they offer more control and keep sensitive patient data local.
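
As a rough illustration of the open pattern, the sketch below assumes the Hugging Face transformers library and an open-weight model downloaded onto local hardware; the model name, prompt, and clinical note are placeholders rather than recommendations.

```python
# Minimal sketch of the "open" deployment pattern: the model weights are
# downloaded once and inference runs entirely on local infrastructure.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative open-weight model
    device_map="auto",                         # run on whatever local hardware is available
)

note = "Patient presents with intermittent chest pain on exertion ..."
prompt = f"Summarise the following clinical note in two sentences:\n{note}"

# Generation happens inside the local environment: the note is never sent to
# an external commercial server.
result = generator(prompt, max_new_tokens=120)
print(result[0]["generated_text"])
```

Because generation runs on infrastructure the organisation controls, sensitive text never has to leave the local environment, which is the control and data-residency benefit the framework attributes to open models.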

The Frontiers in Digital Health “trustworthy AI reality check” found that fewer than one-third of CE-certified radiology AIs in Europe provided basic documentation on training data, consent procedures, or limitations. Without such information, healthcare organisations cannot accurately assess the risk of bias.

Transparency, as emphasised by the Institute for Healthcare Improvement, builds trust. Patients should be informed when AI is used in their care, and clinicians must receive practical training to use these tools safely and ethically.

Recognising bias across the AI model life cycle

Even the most sophisticated AI cannot transcend the biases inherent in the system that generated its data. Researchers at the Mayo Clinic describe two intertwined forms of bias:

  • Inherent bias — resulting from unrepresentative datasets
  • Labelling bias — where flawed proxy measures (like healthcare cost) stand in for true health indicators
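
Labelling bias in particular can be made concrete with a small synthetic example: if an algorithm flags “high-need” patients by healthcare cost rather than by clinical need, a group that spends less because of access barriers is systematically under-flagged. All numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Two groups with the same underlying distribution of clinical need.
group_b = rng.random(n) < 0.5                    # group with poorer access to care
need = rng.normal(loc=5.0, scale=1.0, size=n)    # "true" need (illustrative scale)

# Assumption for illustration: spending tracks need, but group B accrues
# systematically lower costs because of access barriers.
cost = need * np.where(group_b, 0.6, 1.0) + rng.normal(0.0, 0.5, size=n)

# An algorithm that flags the top 10% by *cost* as "high need".
flagged = cost >= np.quantile(cost, 0.90)
truly_high_need = need >= np.quantile(need, 0.90)

for name, mask in [("group A", ~group_b), ("group B", group_b)]:
    hit_rate = flagged[mask & truly_high_need].mean()
    print(f"{name}: share of truly high-need patients flagged = {hit_rate:.2f}")
```

Both groups are constructed with identical need, yet far fewer of group B’s genuinely high-need patients clear the cost threshold, so the proxy quietly rations the intervention away from them.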
The AI Model Life Cycle and Common Biases Across Each Phase
Source: Hasanzadeh et al., npj Digital Medicine (2025)

Bias can emerge at any of the six stages of AI development, from conception and data collection to clinical deployment and post-deployment surveillance. Each stage introduces distinct risks, including representation bias, algorithmic bias, and feedback loop bias, all of which can undermine fairness and equity in healthcare.

Evaluating and mitigating bias in large language models

A recent meta-analysis found that generative AI models can match the diagnostic accuracy of non-expert physicians, with a pooled accuracy of around 52%, but they still lag behind specialists. This positions AI as a clinical aid, not a replacement. However, many studies fail to disclose the provenance of their training data, raising ongoing concerns about reproducibility and fairness.

Five-Step Process for Large Language Model Evaluation
Source: Templin et al., npj Digital Medicine (2025)

The five-step evaluation process outlines a practical path to building trustworthy AI:

  1. Engage stakeholders to define objectives, parameters, and metrics.
  2. Calibrate models and generate synthetic data relevant to the patient population.
  3. Execute and analyse audits through perturbation tests (see the sketch after this list).
  4. Align values and ethics to ensure clinician and stakeholder acceptability.
  5. Continuously evaluate AI systems in real-world clinical settings.
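
Step 3’s perturbation testing can be sketched as a paired-prompt audit: hold the clinical content fixed, vary only a demographic attribute, and flag divergent outputs for human review. The `ask_model` callable and the prompt template below are hypothetical placeholders for whatever LLM interface is actually deployed, not part of the published framework.

```python
from itertools import product

# Clinical content is held fixed; only the demographic attributes vary.
TEMPLATE = (
    "A {age}-year-old {sex} patient of {ethnicity} background reports "
    "intermittent chest pain on exertion. What is the recommended next step?"
)

def perturbation_audit(ask_model,
                       ages=(45, 70),
                       sexes=("male", "female"),
                       ethnicities=("White", "Black", "South Asian")):
    """Collect the model's answer for every demographic perturbation and
    return the cases that diverge from an arbitrary baseline prompt.

    `ask_model` is a hypothetical callable wrapping whatever LLM is deployed.
    A divergent answer is not automatically "biased"; it is flagged so a
    clinician can judge whether the difference is clinically justified.
    """
    answers = {}
    for age, sex, ethnicity in product(ages, sexes, ethnicities):
        prompt = TEMPLATE.format(age=age, sex=sex, ethnicity=ethnicity)
        answers[(age, sex, ethnicity)] = ask_model(prompt)

    baseline = answers[(ages[0], sexes[0], ethnicities[0])]
    return {case: answer for case, answer in answers.items() if answer != baseline}
```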

This structured framework emphasises the importance of continuous validation, monitoring for data drift, and accountability throughout an AI model’s lifecycle.
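
One concrete form of that monitoring is a statistical drift check that compares a feature’s distribution in production against the distribution seen at training time. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy and invented age data purely as an illustration; a production pipeline would track many features and outcome rates.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(training_values, live_values, p_threshold=0.01):
    """Flag a feature whose live distribution has drifted away from training.

    A two-sample Kolmogorov-Smirnov test is one simple drift signal; real
    monitoring would combine several such checks over time.
    """
    statistic, p_value = ks_2samp(training_values, live_values)
    return p_value < p_threshold, statistic, p_value

# Invented example: patient age at training time vs. in production, where the
# deployed population has shifted noticeably older.
rng = np.random.default_rng(2)
training_age = rng.normal(55, 12, size=5_000)
live_age = rng.normal(62, 12, size=1_000)
print(drift_alert(training_age, live_age))   # expect a drift flag (True, ...)
```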

The path forward: Building trustworthy AI in healthcare

Trustworthy AI requires three layers of transparency:

  • Public transparency, where key information is accessible beyond regulators.
  • Substantive transparency, which includes data sources, consent processes, limitations, and validation results.
  • Educational transparency, ensuring clinicians and patients understand AI’s role and boundaries.

Open or closed, every AI system must earn trust through the openness of its process, the clarity of its data, and the accountability of its outcomes. Without transparency, there is no informed consent, only blind faith.

Bias in AI is not inevitable, but hiding it ensures that it persists. The future of ethical, reliable healthcare AI depends on a timeless medical principle: do no harm, and show your workings.

Authored by Tom Varghese, Global Product Marketing & Growth Manager at Orion Health.


References

  • Adedinsewo, Demilade, and Sana M. Al-Khatib. “Understanding AI Bias in Clinical Practice.” Heart Rhythm 21, no. 10 (October 2024): e262–e264.
  • Alderman, Joseph E., Joanne Palmer, Elinor Laws, et al. “Tackling Algorithmic Bias and Promoting Transparency in Health Datasets: The STANDING Together Consensus Recommendations.” The Lancet Digital Health 7, no. 1 (January 2025): e64–88.
  • Cross, James L., Michael A. Choma, and John A. Onofrey. “Bias in Medical AI: Implications for Clinical Decision-Making.” PLOS Digital Health 3, no. 11 (November 7, 2024): e0000651.
  • Fehr, Jana, Brian Citro, Rohit Malpani, Christoph Lippert, and Vince I. Madai. “A Trustworthy AI Reality-Check: The Lack of Transparency of Artificial Intelligence Products in Healthcare.” Frontiers in Digital Health 6 (February 20, 2024): 1267290.
  • Milne, Stefan. “Q&A: Transparency in Medical AI Systems Is Vital, UW Researchers Say.” UW News, University of Washington, September 10, 2025.
  • Moran, Brett, Amy Weckman, and Natalie Martinez. “Transparency and Training: Keys to Trusted AI in Health Care.” IHI Blog, Institute for Healthcare Improvement, September 25, 2025.
  • Smith, Derek. “Accounting for Bias in Medical Data Helps Prevent AI from Amplifying Racial Disparity.” Michigan Engineering News, October 30, 2024.
  • Takita, Hirotaka, Daijiro Kabata, Shannon L. Walston, et al. “A Systematic Review and Meta-Analysis of Diagnostic Performance Comparison Between Generative AI and Physicians.” npj Digital Medicine 8 (2025): 175.
  • “Unveiling Transparency in Medical AI Systems.” Bioengineer.org, September 10, 2025.