Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Abstract
Libra, a temporal-aware multimodal large language model with a radiology-specific image encoder and Temporal Alignment Connector, achieves state-of-the-art performance in radiology report generation by effectively integrating temporal and visual information.
Radiology report generation (RRG) is a challenging task, as it requires a thorough understanding of medical images, integration of multiple temporal inputs, and accurate report generation. Effective interpretation of medical images, such as chest X-rays (CXRs), demands sophisticated visual-language reasoning to map visual findings to structured reports. Recent studies have shown that multimodal large language models (MLLMs) can acquire multimodal capabilities by aligning with pre-trained vision encoders. However, current approaches predominantly focus on single-image analysis or utilise rule-based symbolic processing to handle multiple images, thereby overlooking the essential temporal information derived from comparing current images with prior ones. To overcome this critical limitation, we introduce Libra, a temporal-aware MLLM tailored for CXR report generation using temporal images. Libra integrates a radiology-specific image encoder with an MLLM and utilises a novel Temporal Alignment Connector to capture and synthesise temporal information of images across different time points with unprecedented precision. Extensive experiments show that Libra achieves new state-of-the-art performance among MLLMs of the same parameter scale for RRG tasks on the MIMIC-CXR dataset. Specifically, Libra improves the RadCliQ metric by 12.9% and makes substantial gains across all lexical metrics compared to previous models.
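To make the temporal-fusion idea concrete, here is a minimal, hypothetical sketch of how features from a current and a prior CXR could be combined via cross-attention: each current-image patch feature attends over the prior-image patch features, and the attended prior context is added back as a residual. The function name `temporal_cross_attention` and the plain dot-product formulation are illustrative assumptions, not the paper's actual Temporal Alignment Connector implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def temporal_cross_attention(current, prior):
    """Hypothetical sketch of temporal fusion (not the paper's exact module).

    current: list of patch feature vectors from the current image (queries)
    prior:   list of patch feature vectors from the prior image (keys/values)
    Returns current features enriched with attended prior context.
    """
    d = len(current[0])  # feature dimension
    fused = []
    for q in current:
        # Scaled dot-product scores of this query against every prior patch.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in prior]
        weights = softmax(scores)
        # Weighted sum of prior features = temporal context for this patch.
        ctx = [sum(w * prior[j][i] for j, w in enumerate(weights))
               for i in range(d)]
        # Residual connection: keep the current feature, add prior context.
        fused.append([qi + ci for qi, ci in zip(q, ctx)])
    return fused
```

In practice such a connector would use learned query/key/value projections and handle the single-image case (e.g. by attending to a learned dummy prior), but the core operation, conditioning each current-image feature on the prior study, is what the sketch shows.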
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation (2025)
- Multimodal Large Language Models for Medical Report Generation via Customized Prompt Tuning (2025)
- CLARIFID: Improving Radiology Report Generation by Reinforcing Clinically Accurate Impressions and Enforcing Detailed Findings (2025)
- Semantically Informed Salient Regions Guided Radiology Report Generation (2025)
- R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation (2025)
- Taming Vision-Language Models for Medical Image Analysis: A Comprehensive Review (2025)
- MedRegion-CT: Region-Focused Multimodal LLM for Comprehensive 3D CT Report Generation (2025)