
BiomedGPT: Applications of Multimodal Large Language Models in Various Biomedical Tasks

With the rapid growth of AI, the medical field increasingly relies on multimodal large language models to integrate visual and linguistic data, enabling precise diagnosis, treatment, and patient care. Traditional models often lack flexibility across diverse tasks, but BiomedGPT overcomes these challenges. This lightweight, open-source vision-language model excels in adaptability and performance, achieving breakthroughs in biomedical applications through advanced architecture, pretraining strategies, and fine-tuning methods.



The BiomedGPT framework integrates multimodal inputs, including text and images, using a transformer-based encoder-decoder architecture. It supports tasks like masked language modeling (MLM), image modeling, and fine-tuning for biomedical applications such as VQA, image classification, and medical report generation. This unified design ensures high adaptability and efficiency for diverse medical AI tasks.
Figure: The BiomedGPT framework

Distinct Advantages of BiomedGPT: Architecture and Features of Multimodal Large Language Models

1. Unified Multimodal Representation for Biomedical Tasks

BiomedGPT is a Transformer-based encoder-decoder architecture designed for multimodal biomedical tasks. It processes text, images, and multimodal data seamlessly through a unified input-output representation. Key features include:

  • Text: Processed using BPE (Byte Pair Encoding) tokenization.

  • Images: Encoded with a pretrained VQ-GAN discretization mechanism.
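Because both modalities are discretized, text subwords and image codes can live in a single shared token vocabulary. The sketch below illustrates the idea with assumed vocabulary and codebook sizes (the real BiomedGPT values may differ): VQ-GAN codebook indices are simply offset past the text range, so a multimodal sequence is one flat list of integers.

```python
# Minimal sketch of a unified token space (sizes are assumptions,
# not BiomedGPT's actual configuration).
TEXT_VOCAB_SIZE = 50265     # e.g. a BPE vocabulary
IMAGE_CODEBOOK_SIZE = 8192  # e.g. a VQ-GAN codebook

def image_code_to_token(code: int) -> int:
    """Map a VQ-GAN codebook index into the shared token space."""
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

def token_to_image_code(token: int) -> int:
    """Invert the mapping for tokens in the image range."""
    assert TEXT_VOCAB_SIZE <= token < TEXT_VOCAB_SIZE + IMAGE_CODEBOOK_SIZE
    return token - TEXT_VOCAB_SIZE

# A multimodal sequence is then just one list of integers:
text_ids = [101, 2057, 102]                 # BPE IDs (illustrative)
image_ids = [image_code_to_token(c) for c in (7, 4095)]
sequence = text_ids + image_ids
```

With this layout, the decoder can emit text and image content through the same output head, which is what makes the unified input-output representation possible.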


2. Comprehensive Multimodal Task Support

BiomedGPT supports a wide range of biomedical tasks:

  • Vision tasks: Image classification, masked image modeling (MIM), object detection.

  • Text tasks: Masked language modeling (MLM), summarization, natural language inference.

  • Multimodal tasks: Biomedical visual question answering (VQA), image captioning.


3. Lightweight Design for Multimodal Large Language Models

BiomedGPT offers three versions (small, medium, and base) with 33 million, 93 million, and 182 million parameters, respectively, catering to diverse computational needs. Despite having significantly fewer parameters than the commercial Med-PaLM M model (12 billion parameters), it excels in a variety of biomedical tasks.


BiomedGPT Methodology: Core Technologies of Multimodal Large Language Models in Biomedical Tasks

1. Architecture Design: Encoder-Decoder Model

BiomedGPT pairs a BERT-style bidirectional encoder with a GPT-style autoregressive decoder. To strengthen sequence understanding, it adds learned relative positional biases to the attention scores, so attention depends on how far apart two tokens are rather than on their absolute positions alone.
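The following NumPy sketch shows the general mechanism of a relative positional bias: a small learned table, indexed by clipped token distance, is added to the attention logits before the softmax. This is a simplified illustration; BiomedGPT's exact parameterization (bucketing, per-head tables) may differ.

```python
import numpy as np

def attention_with_relative_bias(q, k, v, bias_table, max_dist):
    """Scaled dot-product attention with a relative-position bias
    added to the logits (simplified single-head sketch)."""
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    # bias_table has 2*max_dist + 1 entries; entry (b) holds the bias
    # for relative distance b - max_dist, with distances clipped.
    idx = np.clip(np.arange(n)[:, None] - np.arange(n)[None, :],
                  -max_dist, max_dist) + max_dist
    logits = logits + bias_table[idx]
    # Row-wise softmax, then weighted sum of values.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the bias depends only on distance, the same table generalizes across sequence positions, which is useful when pretraining and fine-tuning sequences differ in length.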


2. Pretraining Strategies: Large-Scale Biomedical Multimodal Data

BiomedGPT is pretrained on 14 public datasets, encompassing 592,567 images, 183 million text sentences, and 271,804 image-text pairs. Key pretraining tasks include:

  • Masked image modeling (MIM).

  • Masked language modeling (MLM).

  • Image captioning.

  • Biomedical visual question answering (VQA).
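The corruption step behind MLM-style pretraining can be sketched in a few lines: randomly replace a fraction of tokens with a mask symbol and keep the originals as reconstruction targets. The masking ratio and mask token below are illustrative assumptions, not BiomedGPT's exact recipe.

```python
import random

MASK_TOKEN = "<mask>"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens with <mask>, returning
    the corrupted sequence and the positions the model must predict."""
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK_TOKEN)
            targets[i] = tok  # ground truth for the masked position
        else:
            corrupted.append(tok)
    return corrupted, targets
```

Masked image modeling follows the same pattern over the discretized image tokens, which is one reason a unified token representation keeps the pretraining pipeline simple.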


3. Fine-Tuning and Task Adaptation

BiomedGPT adapts readily across tasks, including medical image classification, report generation, and natural language inference, by fine-tuning its pretrained weights directly, without adding task-specific heads or other new components.


4. Instruction Tuning: Enhancing Multimodal Task Performance

Using natural language instructions, BiomedGPT generates accurate answers in zero-shot scenarios. For example:

  • Image description: "What does the image describe?"

  • Text summarization: "What is the summary of the text '{Text}'?"
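In code, routing every task through natural language instructions reduces to simple template rendering. The sketch below uses the two templates quoted above plus a hypothetical VQA entry; the exact strings BiomedGPT uses for other tasks are assumptions.

```python
# Instruction templates in the spirit of the examples above.
# "caption" and "summarize" mirror the quoted prompts; "vqa" is
# a hypothetical addition for illustration.
TEMPLATES = {
    "caption": "What does the image describe?",
    "summarize": "What is the summary of the text '{text}'?",
    "vqa": "{question}",
}

def build_prompt(task: str, **fields) -> str:
    """Render the instruction for a given task from its template."""
    return TEMPLATES[task].format(**fields)
```

Because every task is expressed as an instruction plus a token sequence, a single model can answer new task phrasings in zero-shot settings without any architectural change.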


5. Exceptional Zero-Shot Prediction Capability

BiomedGPT exhibits robust zero-shot reasoning capabilities in biomedical tasks, achieving a 54.7% accuracy rate on the VQA-RAD dataset, surpassing GPT-4V's 53.0%.


Performance Advantages and Application Scenarios of BiomedGPT in Biomedical Tasks

1. Biomedical Visual Question Answering (VQA): Task-Specific Capabilities of Multimodal Large Language Models

BiomedGPT achieved an 86.1% accuracy rate on the SLAKE dataset, setting a new record (previously 85.4%). This highlights its strong ability to interpret medical images and answer related questions, facilitating the rapid extraction of key information.


2. Medical Image Description: Language Generation in Multimodal Tasks

BiomedGPT improved the ROUGE-L score by 8.1% on the Peir Gross dataset and achieved a METEOR score of 15.9% on the MIMIC-CXR dataset. This makes it a powerful tool for radiologists in describing medical images.


3. Medical Image Classification: Enhancing Accuracy in Dermatology and Other Tasks

In seven classification tasks on the MedMNIST-Raw dataset, BiomedGPT outperformed other models in five tasks. Notably, it achieved a 14% higher accuracy rate than baseline models on the dermoscopy dataset, demonstrating its potential in detecting dermatological conditions.


4. Zero-Shot Disease Diagnosis: Generalization Without Task-Specific Training

In zero-shot settings, BiomedGPT successfully handled complex disease diagnosis tasks, performing on par with Med-PaLM M. Its capabilities are particularly valuable for diagnosing rare diseases and emerging conditions.


5. In-Hospital Mortality Prediction: Accurate Evaluation in Medical Tasks

BiomedGPT outperformed other models in predicting in-hospital mortality using the MIMIC-III database, aiding in identifying high-risk patients and optimizing resource allocation.


6. Medical Report Generation and Summarization: Document Creation in Multimodal Tasks

On the MIMIC-CXR dataset, BiomedGPT-generated reports were favorably rated by medical experts, achieving a preference score comparable to expert-authored reports (48% vs. 52%). This reduces the documentation burden in high-intensity scenarios like radiology.


AIExPro: Multimodal Large Language Model Solutions for Biomedical Tasks

AIExPro focuses on the innovative application of multimodal large language models in various biomedical tasks, providing precise and efficient solutions for the healthcare industry. By integrating medical imaging and text data, AIExPro demonstrates exceptional vision-language interaction capabilities, driving breakthroughs in fields such as medical imaging diagnostics, clinical report generation, drug discovery, and medical text analysis. Its unified input-output representation ensures task flexibility, while the lightweight design lowers deployment barriers, enabling healthcare institutions to implement high-performance models without relying on extensive computational resources. AIExPro offers intelligent support for medical teams, advancing healthcare services toward precision and intelligence.

 
 

Dr. Mark Johnson, PhD in Machine Learning

He is a machine learning expert with over 12 years of experience in algorithm development. He earned his PhD from the University of Washington, specializing in deep learning. Dr. Johnson has collaborated with top tech companies to create AI solutions that enhance business operations and customer experiences.
