Transforming Pathology with AI: Unlocking Structured Data from Unstructured Reports

May 8, 2025
Clarity Bot

A recent study demonstrates how large language models (LLMs) can accurately extract structured clinical data from unstructured pathology reports, enabling scalable, privacy-respecting applications in medical data processing.

Abstract

Purpose

This study presents and evaluates an advanced AI system powered by a large language model (LLM) for structured information extraction from breast cancer histopathology reports.

Demonstrates human-level accuracy using zero-shot prompting without prior model fine-tuning
Successfully extracts 51 complex clinical features from unstructured medical text
Introduces the open-source Medical Report Information Extractor tool to expand accessibility for non-programmers

Scope

The research addresses a critical challenge in clinical informatics: transforming unstructured medical data into usable, structured formats at scale.

Scalable Data Extraction: Enables high-volume processing of pathology reports without manual annotation
Clinical Accuracy: Matches expert-level performance in identifying detailed clinical features
Accessibility & Usability: Provides an open-source solution tailored to users without programming expertise
Privacy & Cost Efficiency: Supports self-hosting to protect patient data and reduce infrastructure costs
Open Tooling: Facilitates widespread adoption through transparent, shareable resources

This work provides a significant advancement for healthcare systems and medical researchers, offering a practical path toward large-scale integration of LLMs in clinical workflows.

Summary

A Solution for Unstructured Medical Text

In clinical and research environments, pathology reports are typically written as unstructured free text, making it difficult to extract usable data. Manual extraction is costly, slow, and often error-prone. This study presents a scalable, efficient alternative: large language models automatically extract the structured fields defined by a study-specific data dictionary.

Zero-Shot Prompting with LLMs

The team developed a methodology using zero-shot prompting, a technique where the model receives only task instructions, with no worked examples in the prompt and no fine-tuning. This makes deployment feasible even when large labeled datasets are unavailable. The prompt wording was refined on a small training set of reports and then applied to new, unseen pathology reports, where it maintained high performance.
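A zero-shot prompt of this kind can be sketched roughly as follows. The field names, allowed values, and prompt wording are hypothetical illustrations, not the study's actual data dictionary or prompt; the point is only that the prompt carries instructions and a schema, with no labeled examples.

```python
# Hypothetical two-item data dictionary (the study's dictionary covers 51 features).
DATA_DICTIONARY = {
    "er_status": {
        "description": "Estrogen receptor status",
        "allowed": ["positive", "negative", "not reported"],
    },
    "tumor_grade": {
        "description": "Histologic grade",
        "allowed": ["1", "2", "3", "not reported"],
    },
}


def build_zero_shot_prompt(report_text, dictionary):
    """Build an instruction-only prompt: task, field schema, report text."""
    lines = [
        "Extract the following fields from the pathology report.",
        "Reply with a single JSON object and nothing else.",
        "Fields:",
    ]
    for name, spec in dictionary.items():
        allowed = ", ".join(spec["allowed"])
        lines.append(f"- {name}: {spec['description']} (one of: {allowed})")
    lines.append("Report:")
    lines.append(report_text)
    return "\n".join(lines)


prompt = build_zero_shot_prompt(
    "Invasive ductal carcinoma, grade 2. ER positive.", DATA_DICTIONARY
)
print(prompt)
```

Because the schema is generated from the dictionary, adding a feature means editing data rather than rewriting the prompt by hand.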

The Medical Report Information Extractor

Researchers created a modular web application—“Medical Report Information Extractor”—which connects to various LLMs via APIs. The app extracts structured outputs (JSON) and optionally converts them into standardized formats (JSON-LD) using SNOMED CT. Users can modify the behavior of the tool through three simple and human-readable configuration files, enabling use by non-programmers.
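To illustrate what handling structured JSON output can involve, here is a minimal sketch that parses a model reply and checks it against a list of allowed values. The field names, the allowed values, and the `parse_extraction` helper are assumptions made for this example; they do not describe the tool's actual configuration files or code.

```python
import json

# Hypothetical allowed values per field, not taken from the study.
ALLOWED_VALUES = {
    "er_status": ["positive", "negative", "not reported"],
    "tumor_grade": ["1", "2", "3", "not reported"],
}


def parse_extraction(reply, allowed_values):
    """Parse a model reply; return (accepted fields, list of problems)."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        return {}, [f"reply is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return {}, ["reply is not a JSON object"]

    accepted, problems = {}, []
    for field, allowed in allowed_values.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif data[field] not in allowed:
            problems.append(f"unexpected value for {field}: {data[field]!r}")
        else:
            accepted[field] = data[field]
    return accepted, problems


accepted, problems = parse_extraction(
    '{"er_status": "positive", "tumor_grade": "II"}', ALLOWED_VALUES
)
print(accepted)
print(problems)
```

Rejected values are reported rather than silently coerced, so a reviewer can decide whether the prompt or the dictionary needs adjusting.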

Performance Benchmarking of Language Models

Five state-of-the-art LLMs were evaluated, including OpenAI’s proprietary GPT-4o and Meta’s open-source, self-hostable Llama 3.1 models (405B, 70B, and 8B). GPT-4o reached 96.1% accuracy, closely followed by Llama 3.1 405B at 94.7%; both were comparable to the human annotator. These findings support the viability of using LLMs as cost-effective and scalable alternatives to manual annotation.
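One simple way to compute an accuracy figure like those above is exact-match agreement with a gold standard over every report and feature. The sketch below assumes that definition and uses made-up toy data; the study's exact scoring rules may differ.

```python
def accuracy(predictions, gold):
    """Fraction of (report, feature) pairs where the prediction equals the gold value."""
    total = 0
    correct = 0
    for report_id, gold_fields in gold.items():
        predicted_fields = predictions.get(report_id, {})
        for feature, gold_value in gold_fields.items():
            total += 1
            if predicted_fields.get(feature) == gold_value:
                correct += 1
    return correct / total if total else 0.0


# Toy data for illustration only; not results from the study.
gold = {
    "r1": {"er_status": "positive", "tumor_grade": "2"},
    "r2": {"er_status": "negative", "tumor_grade": "3"},
}
predictions = {
    "r1": {"er_status": "positive", "tumor_grade": "2"},
    "r2": {"er_status": "negative", "tumor_grade": "2"},
}

score = accuracy(predictions, gold)
print(f"{score:.1%}")
```

A missing prediction counts as an error here, which keeps a model from scoring well by skipping hard fields.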

Cost-Performance Tradeoffs

While GPT-4o demonstrated the highest accuracy, it incurred the highest processing cost. Llama 3.1 70B offered a strong balance between performance and cost, making it attractive for self-hosted deployments. The smallest, portable model (Llama 3.1 8B) remained underpowered, though promising for future on-device applications with privacy benefits.

Addressing Data Privacy and Standardization

Recognizing the sensitivity of medical data, the study emphasizes self-hosting capabilities to preserve privacy. Additionally, the extracted outputs were mapped to SNOMED CT terms using Linked Data (JSON-LD format), enabling interoperability for downstream research. This is a significant step towards the FAIR (Findable, Accessible, Interoperable, Reusable) data principles in clinical informatics.
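The mapping from plain JSON to JSON-LD can be sketched as below. The concept identifiers are placeholders rather than real SNOMED CT codes, and the `hasFinding` term, the example vocabulary URL, and the `urn:example:` report identifier are all hypothetical; only the `http://snomed.info/id/` namespace follows the published SNOMED CT URI convention.

```python
import json

SNOMED_BASE = "http://snomed.info/id/"
EXAMPLE_VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary

# Placeholder identifiers, NOT real SNOMED CT concept codes.
CONCEPT_IDS = {
    ("er_status", "positive"): "PLACEHOLDER_ER_POSITIVE",
    ("er_status", "negative"): "PLACEHOLDER_ER_NEGATIVE",
}


def to_json_ld(report_id, extracted, concept_ids):
    """Return (JSON-LD document, list of fields with no concept mapping)."""
    findings = []
    unmapped = []
    for field, value in extracted.items():
        concept = concept_ids.get((field, value))
        if concept is None:
            unmapped.append(field)
        else:
            findings.append({"@id": f"snomed:{concept}"})
    document = {
        "@context": {"snomed": SNOMED_BASE, "@vocab": EXAMPLE_VOCAB},
        "@id": f"urn:example:report:{report_id}",
        "hasFinding": findings,
    }
    return document, unmapped


document, unmapped = to_json_ld(
    "r1", {"er_status": "positive", "tumor_grade": "2"}, CONCEPT_IDS
)
print(json.dumps(document, indent=2))
print(unmapped)
```

Returning the unmapped fields separately makes gaps in the terminology mapping visible instead of dropping values without notice.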

Evaluation Process and Gold Standard Generation

To fairly assess performance, a new gold standard dataset was developed by comparing GPT-4o's outputs with the human annotator's and resolving the conflicts between them. An independent physician resolved disagreements, and cases were reviewed for OCR errors. This methodology was designed to make the evaluation metrics reliable.
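The first step of such a process, collecting the cases where the model and the human annotator disagree, might look like the sketch below. The data layout and the `find_disagreements` helper are assumptions for illustration; the adjudication itself remains a human task.

```python
def find_disagreements(model_output, human_output):
    """List (report_id, feature, model_value, human_value) where the two differ."""
    conflicts = []
    for report_id in sorted(set(model_output) | set(human_output)):
        model_fields = model_output.get(report_id, {})
        human_fields = human_output.get(report_id, {})
        for feature in sorted(set(model_fields) | set(human_fields)):
            model_value = model_fields.get(feature)
            human_value = human_fields.get(feature)
            if model_value != human_value:
                conflicts.append((report_id, feature, model_value, human_value))
    return conflicts


# Toy annotations for illustration only.
model_output = {
    "r1": {"er_status": "positive", "tumor_grade": "2"},
    "r2": {"er_status": "negative", "tumor_grade": "2"},
}
human_output = {
    "r1": {"er_status": "positive", "tumor_grade": "2"},
    "r2": {"er_status": "negative", "tumor_grade": "3"},
}

conflicts = find_disagreements(model_output, human_output)
for conflict in conflicts:
    print(conflict)
```

Only the listed conflicts need to go to the adjudicating reviewer; values on which both sources agree can be accepted as they are under this scheme.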

Challenges in Clinical Text Understanding

The study identifies ambiguities inherent in pathology report structures and domain-specific data dictionaries—highlighting the need for better documentation standards and consistent terminology. It also notes the risk introduced by OCR errors and suggests extending evaluations to multilingual and multi-specimen reports.

Implications for Foundation Model Deployment

The study emphasizes that foundation models, including LLMs, are uniquely positioned to enable rapid development of intelligent, generalizable applications across domains. However, deployments must maintain version control and validation benchmarks, and must be able to adapt to model drift. The design of the Medical Report Information Extractor aligns well with these future-proofing practices.

Open Access and Future Directions

The software is open-sourced and available for adaptation across other types of clinical or non-clinical text. Future work includes integrating multimodal models to eliminate OCR, automating prompt engineering, and fine-tuning smaller models for optimized performance on specific tasks—opening doors to even greater accessibility.

This study represents a pivotal step toward transforming biomedical data extraction workflows using AI. By combining the power of foundation models, a configurable user interface, and international data standards, the authors offer a scalable, privacy-conscious, and practical solution for the healthcare AI community. Institutions seeking to modernize clinical research and health informatics infrastructure will find this work especially consequential.

Resource

Read more in “Leveraging large language models for structured information extraction from pathology reports” by Jeya Balaji Balasubramanian and other researchers.
