Thinking Beyond the Hype of Clinical LLMs

The recent explosion of Large Language Models (LLMs) like GPT has captured the world’s imagination. The ability of these models to understand and generate human-like text is undeniably impressive, and it has led to a wave of speculation about how they will transform every industry, including healthcare.

As a company that has been working at the intersection of AI and clinical data for years, we’ve been watching these developments with a mixture of excitement and caution. It’s crucial to separate the genuine potential from the hype and to approach the application of these powerful new tools with the rigor and responsibility that healthcare demands.

The Promise: Where LLMs Could Shine

There are several areas where LLMs could provide significant value in the clinical data space:

  • Summarization: One of the most promising near-term applications is the ability to summarize long, complex patient histories into concise, readable narratives. In our context, summarization could also make it much faster to fact-check the many data points scattered across a patient’s record.
  • Enhanced Cohort Selection: LLMs could help researchers define complex patient cohorts using natural language. A researcher could describe the patient profile they’re looking for, and the LLM could help translate that into a structured query against an OMOP database (see the sketch after this list).
  • Data Extraction on Steroids: LLMs represent a potential step-change for our existing NLP pipelines. Their advanced understanding of language could help us extract nuanced information from clinical notes with higher accuracy and less need for highly specific training data.
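
To make the cohort-selection idea more concrete, here is a minimal sketch of what such a workflow could look like. The llm_complete() wrapper, the canned model output, and the prompt wording are assumptions for illustration only; the table names follow the standard OMOP CDM, but nothing here reflects an actual IOMED implementation.

```python
# Illustrative sketch: turning a natural-language cohort description into a
# reviewable SQL query over the OMOP CDM. Nothing is executed automatically.

OMOP_PROMPT = """You translate cohort descriptions into SQL over the OMOP CDM.
Use only these tables: person, condition_occurrence, drug_exposure, measurement.
Return a single SELECT statement and nothing else.

Cohort description: {description}
SQL:"""


def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a locally hosted model; returns a canned example here."""
    return (
        "SELECT DISTINCT p.person_id FROM person p "
        "JOIN condition_occurrence co ON co.person_id = p.person_id "
        "WHERE co.condition_concept_id = 201826"  # OMOP concept for type 2 diabetes mellitus
    )


def draft_cohort_query(description: str) -> str:
    """Draft a cohort query for human review; never run the model's output blindly."""
    sql = llm_complete(OMOP_PROMPT.format(description=description))
    # Minimal guardrail: reject anything that is not a plain SELECT statement.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Model output is not a SELECT statement; rejecting it.")
    return sql


if __name__ == "__main__":
    draft = draft_cohort_query(
        "Adults over 65 with type 2 diabetes and at least one HbA1c above 9%"
    )
    print("Proposed query for expert review:\n", draft)
```

The point of the sketch is the shape of the workflow, not the prompt itself: the model only drafts a query, and a researcher or data engineer reviews it before it ever touches patient data.
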

The Reality: Grounding LLMs in Clinical Practice

However, the leap from impressive demos to reliable clinical tools is a massive one. In healthcare, “close enough” is not good enough. We have to address several major challenges:

  • Accuracy and “Hallucinations”: LLMs are known to “hallucinate” or confidently invent facts. In a clinical context, this is unacceptable. An LLM that invents a diagnosis or misinterprets a lab value could have life-threatening consequences. Any application of LLMs in healthcare must have rigorous validation and human-in-the-loop oversight.
  • Bias: These models are trained on vast amounts of text from the internet and other sources, and they can inherit and amplify the biases present in that data. An LLM used for clinical trial matching could, for example, perpetuate historical biases against underrepresented patient populations.
  • Privacy and Data Security: The largest, most powerful LLMs are currently controlled by a few large tech companies. Using these models for clinical data requires sending sensitive information to a third-party API, which is a non-starter for us and our hospital partners. The future of clinical LLMs will likely rely on smaller, domain-specific models that can be run securely within a hospital’s own environment.

Our Approach: Cautious Experimentation

At IOMED, we are actively experimenting with LLMs, but we are doing so in a controlled and responsible way. We are exploring how they can augment our existing, battle-tested NLP workflows, not replace them. For example, we’re testing their ability to improve our entity recognition and normalization steps, always with a human expert validating the results.
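
As a rough illustration of what that human-in-the-loop guardrail can look like, here is a minimal sketch. The suggest_concept() step and the toy vocabulary are stand-ins for a real model and terminology service, not our actual pipeline; the idea is simply that any suggestion the system cannot ground in a known concept is routed to an expert instead of being accepted automatically.

```python
# Minimal sketch of LLM-assisted entity normalization with human review:
# the model proposes a standard concept for each extracted mention, and
# ungrounded suggestions are flagged for an expert rather than auto-accepted.

from dataclasses import dataclass
from typing import Optional

# Toy vocabulary mapping surface forms to standard concept IDs (illustrative only).
KNOWN_CONCEPTS = {
    "type 2 diabetes mellitus": 201826,
    "essential hypertension": 320128,
}


@dataclass
class Suggestion:
    mention: str
    concept_id: Optional[int]
    needs_review: bool


def suggest_concept(mention: str) -> Suggestion:
    """Stand-in for an LLM normalization step: map a clinical mention to a concept."""
    # A real system would prompt a model here; a lookup keeps the sketch runnable.
    concept_id = KNOWN_CONCEPTS.get(mention.lower().strip())
    return Suggestion(
        mention=mention,
        concept_id=concept_id,
        # Anything that cannot be grounded in the vocabulary goes to a human expert.
        needs_review=concept_id is None,
    )


if __name__ == "__main__":
    for mention in ["Type 2 diabetes mellitus", "essential HTN"]:
        s = suggest_concept(mention)
        status = "sent to reviewer" if s.needs_review else "auto-accepted"
        print(f"{s.mention!r} -> {s.concept_id} ({status})")
```
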

LLMs are a powerful new tool in the AI toolbox, but they are not a magic bullet. Their application in healthcare requires a deep understanding of their limitations and a steadfast commitment to patient safety and data fidelity. The future is exciting, but we must proceed with caution and care.