EpiLLaMA-3.3-70B: Fine-tuned LLaMA for Epidemiological Information Extraction

Model Description

EpiLLaMA-3.3-70B is a fine-tuned version of meta-llama/Llama-3.3-70B-Instruct specialized for extracting structured epidemiological information from unstructured disease outbreak reports. The model was trained on the WHO Disease Outbreak News (DONs) curated database (Carlson et al., 2023) to automatically extract key epidemiological features including disease classification, geographical locations, case counts, temporal information, and outbreak characteristics.

Model Details

  • Base Model: meta-llama/Llama-3.3-70B-Instruct
  • Base Model License: LLaMA 3.3 Community License Agreement
  • Model Type: Causal Language Model (Decoder-only Transformer)
  • Fine-tuning Method: Parameter-Efficient Fine-Tuning (PEFT) with LoRA (Low-Rank Adaptation)
  • Adapter Weights License: CC0-1.0 (Public Domain Dedication) - Note: Only the LoRA adapter weights are released under CC0. The base model weights remain under the LLaMA 3.3 Community License.
  • Training Data: WHO Disease Outbreak News curated database (3,338 records through 2019)
  • Language: English
  • Application Domain: Public health surveillance, epidemic intelligence, epidemiological information extraction

License

Important Licensing Information

This repository contains LoRA adapter weights only, not the full model weights.

  • Base Model (LLaMA 3.3 70B): Licensed under the LLaMA 3.3 Community License Agreement

    • Copyright © Meta Platforms, Inc. All Rights Reserved.
    • Users must comply with the LLaMA 3.3 Community License to use the base model
    • Acceptable Use Policy and other restrictions apply
  • LoRA Adapter Weights: Released under CC0 1.0 Universal (Public Domain Dedication)

    • The adapter weights can be used without restriction
    • However, to use these adapters, you must have access to and comply with the license of the base LLaMA 3.3 70B model

Attribution Required: When using this model, please include the following notice:

LLaMA 3.3 is licensed under the LLaMA 3.3 Community License,
Copyright © Meta Platforms, Inc. All Rights Reserved.

EpiLLaMA-3.3-70B LoRA adapter weights are released under CC0 1.0 Universal (Public Domain).

Distribution Notes

  • This repository distributes only the fine-tuned LoRA adapter parameters
  • Base model weights are unchanged and must be obtained separately from Meta/Hugging Face
  • Users must agree to Meta's LLaMA 3.3 Community License to use the complete model
  • The LoRA adapters are applied on top of the base model weights at inference time
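As a minimal sketch of what applying the adapters on top of the base weights looks like in practice (using peft; the adapter repository id matches the one used in the Usage section below, and the optional merge_and_unload step folds the adapters into the base weights for adapter-free inference):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the unchanged base model (gated behind Meta's LLaMA 3.3 Community License)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the CC0-licensed LoRA adapters on top of the base weights
model = PeftModel.from_pretrained(base, "jrc-ai/EpiLLaMA-3.3-70B")

# Optional: merge the adapters into the base weights for adapter-free inference
model = model.merge_and_unload()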

Performance

The model achieved the following results on the evaluation set:

Metric       Score
Rouge-1      0.937 ± 0.046
Rouge-2      0.896 ± 0.058
Rouge-L      0.928 ± 0.047
Rouge-Lsum   0.929 ± 0.049

These scores represent overall performance across 5-fold stratified cross-validation, demonstrating very high accuracy in extracting structured epidemiological information from unstructured outbreak reports.
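As an illustration of how such ROUGE scores can be computed, here is a minimal sketch using the Hugging Face evaluate library (an assumption; the paper's exact evaluation harness is not described in this card):

import evaluate

# Compare generated JSON strings against the curated reference records
rouge = evaluate.load("rouge")

predictions = ['[{"DiseaseLevel1": "Cholera", "Country": "Somalia", "CasesTotal": 140}]']
references = ['[{"DiseaseLevel1": "Cholera", "Country": "Somalia", "CasesTotal": 140}]']

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum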

Training Summary

  • Best Training Step: 12,810
  • Best Training Loss: 0.0095
  • Total Training Steps: 13,325
  • Final Training Loss: 0.0105
  • Total Improvement: 1.6752 (from initial loss of 1.6847)

Intended Uses & Limitations

Intended Uses

This model is designed for:

  • Automated extraction of epidemiological information from disease outbreak reports
  • Public health surveillance systems requiring structured data from unstructured sources
  • Epidemic intelligence pipelines for rapid outbreak detection and monitoring
  • Research purposes in computational epidemiology and public health informatics

Limitations

  • The model is trained specifically on WHO DONs format and may require adaptation for other report formats
  • Performance on diseases not well-represented in the training data may vary
  • The model extracts information present in the text and does not generate or infer missing data
  • Designed for English-language outbreak reports only
  • Should be used as a decision-support tool, with human verification for critical public health decisions

Extracted Features

The model extracts the following structured epidemiological information:

Disease Information:

  • DiseaseLevel1 (primary disease classification)
  • DiseaseLevel2 (disease subtype/variant)

Geographical Information:

  • Country
  • ISO country code
  • OutbreakEpicenter (specific location within country)

Case Counts:

  • CasesTotal
  • CasesSuspected
  • CasesProbable
  • CasesConfirmed
  • Deaths

Temporal Information:

  • Outbreak start date (year, month, day)
  • Outbreak detection date (year, month, day)
  • Outbreak verification date (year, month, day)
  • Outbreak end date and status
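For downstream validation, the field names above (as they appear in the Expected Output Format section) can be checked against each extracted record; a small illustrative sketch:

import json

# Field names as they appear in the model's JSON output
EXPECTED_KEYS = {
    "DiseaseLevel1", "DiseaseLevel2",
    "Country", "ISO", "OutbreakEpicenter",
    "CasesTotal", "CasesSuspected", "CasesProbable", "CasesConfirmed", "Deaths",
    "OutbreakStartYear", "OutbreakStartMonth", "OutbreakStartDay",
    "OutbreakDetectionYear", "OutbreakDetectionMonth", "OutbreakDetectionDay",
    "OutbreakVerificationYear", "OutbreakVerificationMonth", "OutbreakVerificationDay",
    "OutbreakEnd", "OutbreakEndYear", "OutbreakEndMonth", "OutbreakEndDay",
}

def validate_outbreaks(json_text: str) -> list[dict]:
    """Parse the model response and flag any missing fields per outbreak record."""
    outbreaks = json.loads(json_text)
    for record in outbreaks:
        missing = EXPECTED_KEYS - record.keys()
        if missing:
            print(f"Missing fields: {sorted(missing)}")
    return outbreaks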

Training Procedure

Training Data

The model was trained on the WHO Disease Outbreak News curated database (Carlson et al., 2023), which contains:

  • 3,338 structured records of disease outbreaks (data through 2019)
  • Curated epidemiological information manually extracted from WHO DONs reports
  • Standardized format for disease classifications, geographical locations, case counts, and temporal data

Training Approach

The training followed an instruction-tuning paradigm where unstructured outbreak report text is paired with structured JSON output containing extracted epidemiological features. The prompt format used was:

Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request.

### Instruction:
Extract disease outbreak information from the given text and format it as JSON.
Return a list containing one JSON object per outbreak mentioned.
Use "None" for missing information. Never invent or guess data.

### Input:
[Outbreak report text]

### Response:
[Extracted JSON with epidemiological features]
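A minimal sketch of how a training example might be assembled in this format (the helper function and the JSON serialization details are illustrative, not the authors' exact preprocessing code):

import json

PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

### Instruction:
Extract disease outbreak information from the given text and format it as JSON.
Return a list containing one JSON object per outbreak mentioned.
Use "None" for missing information. Never invent or guess data.

### Input:
{report_text}

### Response:
{target_json}"""

def build_training_example(report_text: str, curated_record: dict) -> str:
    # Pair the unstructured DON text with its curated JSON record as the target
    return PROMPT_TEMPLATE.format(
        report_text=report_text.strip(),
        target_json=json.dumps([curated_record], ensure_ascii=False),
    )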

Fine-tuning Configuration

LoRA (Low-Rank Adaptation) Parameters:

  • Rank (r): 16
  • Alpha (α): 16
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Dropout: 0.05
  • Task type: CAUSAL_LM
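Expressed as a peft configuration, these parameters correspond to roughly the following (a sketch; only the values documented above are filled in):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # rank
    lora_alpha=16,             # alpha
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)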

Training Hyperparameters:

  • Learning rate: 1e-5
  • Optimizer: AdamW (8-bit paged)
  • Training batch size: 4 per device (8 GPUs)
  • Gradient accumulation steps: 4
  • Number of epochs: 5
  • Warmup steps: Adaptive (10% of training steps, max 10)
  • FP16 mixed precision training
  • Weight decay: 0.01
  • LR scheduler: Linear
  • Seed: 41
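A rough transformers TrainingArguments equivalent of the values above (a sketch; warmup and save steps were computed adaptively during training, so fixed placeholders are shown here):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="epillama-3.3-70b-lora",
    learning_rate=1e-5,
    optim="paged_adamw_8bit",          # AdamW, 8-bit paged
    per_device_train_batch_size=4,     # on 8 GPUs
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    fp16=True,                         # mixed precision training
    warmup_steps=10,                   # adaptive in practice (10% of steps, max 10)
    logging_steps=10,
    seed=41,
)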

Evaluation Strategy:

  • 5-fold stratified cross-validation
  • Evaluation metric: Training loss (model selection based on lowest training loss)
  • Early stopping: After 6 consecutive evaluations without improvement
  • Logging steps: 10
  • Save steps: Adaptive (10% of training steps)
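The 5-fold stratified split can be illustrated with scikit-learn (a sketch with toy data; the stratification variable is assumed to be the primary disease label, which is not stated explicitly in this card):

from sklearn.model_selection import StratifiedKFold

# Toy stand-ins: curated records and their stratification labels (e.g. DiseaseLevel1)
records = [{"id": i} for i in range(10)]
labels = ["Cholera", "Yellow fever"] * 5

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=41)
for fold, (train_idx, eval_idx) in enumerate(skf.split(records, labels)):
    print(f"Fold {fold}: {len(train_idx)} train / {len(eval_idx)} eval records")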

Hardware:

  • Infrastructure: JRC Big Data Analytics Platform
  • System: Linux cluster, Ubuntu 22.04.5 LTS
  • CPU: Intel Xeon Platinum 8470 (208 CPUs)
  • RAM: 1TB
  • GPUs: 8x NVIDIA H100
  • Training time: ~30 hours per fold

Quantization

The model uses 8-bit quantization with LoRA during training:

  • Load in 8-bit: True
  • Quantization type: Standard 8-bit
  • Compute dtype: bfloat16
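In transformers, this loading configuration corresponds roughly to the following sketch (the exact arguments used in training are not reproduced in this card):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Standard 8-bit quantization; non-quantized modules kept in bfloat16
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)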

Usage

Installation

pip install transformers==4.52.4
pip install torch==2.3.1
pip install peft==0.12.0
pip install accelerate==1.7.0
pip install bitsandbytes==0.43.3

Basic Usage

Important: You must have access to the base LLaMA 3.3 70B model and accept Meta's license terms before using these adapter weights.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model (requires LLaMA 3.3 license acceptance)
base_model_id = "meta-llama/Llama-3.3-70B-Instruct"
adapter_model_id = "jrc-ai/EpiLLaMA-3.3-70B"  # LoRA adapters
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load tokenizer from base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load and apply LoRA adapters
model = PeftModel.from_pretrained(base_model, adapter_model_id)

# Example outbreak report
outbreak_text = """
WHO has reported 3 suspected cases of yellow fever in Maryland county, 
in the south-eastern part of the country. One case with disease onset on 
1 August has been confirmed (IgM positive) by the Institut Pasteur in 
Abidjan, Côte d'Ivoire. All three cases have died.
"""

# Format prompt
prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Extract disease outbreak information from the given text and format it as JSON.
Return a list containing one JSON object per outbreak mentioned.
Always return a list of JSON objects, even for single outbreaks.
Use "None" for missing information. If no outbreak information is found, return an empty list [].
Never invent or guess data.

### Input:
{outbreak_text}

### Response:
"""

# Tokenize and generate
inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=600,
        do_sample=False,  # greedy decoding; extraction should be deterministic
        pad_token_id=tokenizer.eos_token_id
    )

# Decode output
extracted_info = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(extracted_info)

Expected Output Format

[{
  "DiseaseLevel1": "Yellow fever",
  "DiseaseLevel2": "",
  "Country": "Liberia",
  "ISO": "LBR",
  "OutbreakEpicenter": "Maryland county",
  "CasesTotal": 3,
  "CasesSuspected": 2,
  "CasesProbable": null,
  "CasesConfirmed": 1,
  "Deaths": 3,
  "OutbreakStartYear": 2001,
  "OutbreakStartMonth": 8,
  "OutbreakStartDay": 1,
  "OutbreakDetectionYear": null,
  "OutbreakDetectionMonth": null,
  "OutbreakDetectionDay": null,
  "OutbreakVerificationYear": null,
  "OutbreakVerificationMonth": null,
  "OutbreakVerificationDay": null,
  "OutbreakEnd": null,
  "OutbreakEndYear": null,
  "OutbreakEndMonth": null,
  "OutbreakEndDay": null
}]
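Because the decoded text contains the prompt followed by the generated JSON, a small post-processing step is usually needed; a minimal sketch assuming the "### Response:" delimiter from the prompt above:

import json

def parse_response(decoded_text: str) -> list[dict]:
    """Extract and parse the JSON list that follows the '### Response:' marker."""
    response = decoded_text.split("### Response:")[-1].strip()
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        # Fall back to an empty list if the generation is not valid JSON
        return []

# Example: `extracted_info` as returned by the Basic Usage snippet above
extracted_info = '... ### Response:\n[{"DiseaseLevel1": "Yellow fever", "Country": "Liberia", "CasesTotal": 3}]'
for outbreak in parse_response(extracted_info):
    print(outbreak["DiseaseLevel1"], outbreak["Country"], outbreak["CasesTotal"])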

Comparison with Other Approaches

In-Context Learning vs Fine-Tuning

This fine-tuned model significantly outperforms in-context learning (iCL) approaches:

Approach                         Rouge-1   Rouge-2   Rouge-L   Rouge-Lsum
EpiLLaMA 3.3-70B (fine-tuned)    0.937     0.896     0.928     0.929
LLaMA 3.3-70B (16-shot iCL)      0.840     0.698     0.824     0.841
Qwen 2.5-7B (16-shot iCL)        0.819     0.682     0.801     0.819

Performance gain from fine-tuning: roughly 9-10 percentage points on Rouge-1, Rouge-L, and Rouge-Lsum, and nearly 20 points on Rouge-2.

Comparison with Smaller Models

Model              Parameters   Rouge-1   Rouge-2   Rouge-L
EpiLLaMA 3.3-70B   70B          0.937     0.896     0.928
EpiQwen 2.5-7B     7B           0.918     0.864     0.908
EpiMistral-7B      7B           0.899     0.853     0.889

All pairwise comparisons are statistically significant (p < 0.001, Nemenyi post-hoc test with Bonferroni correction).

Citation

If you use this model in your research, please cite:

@article{consoli2025generative,
  title={Generative AI for Structured Epidemiological Information Extraction: Comparing In-Context Learning and Fine-Tuning Approaches},
  author={Consoli, Sergio and Bertolini, Lorenzo and Stefanovitch, Nicolas and Spagnolo, Luigi and Espinosa, Laura and Stilianakis, Nikolaos I.},
  journal={Epidemiology and Infection},
  volume={submitted, currently under revision},
  year={2025},
  publisher={Cambridge University Press}
}

Please also acknowledge the base model:

@article{llama3.3,
  title={The Llama 3 Herd of Models},
  author={Meta AI},
  year={2024},
  url={https://ai.meta.com/research/publications/the-llama-3-herd-of-models/}
}

Ethical Considerations & Dual-Use Implications

Upon evaluation, we identified no dual-use implications for this model. The model is designed specifically for public health surveillance and epidemic intelligence applications to support global health initiatives.

Important Notes:

  • The model should be used as a decision-support tool with appropriate human oversight
  • Extracted information should be verified by public health professionals before making critical decisions
  • The model does not replace human expertise in epidemiological analysis
  • Privacy and data protection regulations should be followed when processing outbreak reports
  • Users must review and comply with Meta's Acceptable Use Policy included in the LLaMA 3.3 Community License

Acknowledgments

We acknowledge:

  • Meta Platforms, Inc. for developing and releasing LLaMA 3.3 70B under the LLaMA 3.3 Community License
  • The GPT@JRC initiative for providing access to LLMs
  • The JRC Big Data Analytics Platform for computational infrastructure
  • The WHO Epidemic Intelligence from Open Sources (EIOS) initiative for support
  • Colleagues at the European Commission Joint Research Centre (JRC) and the European Centre for Disease Prevention and Control (ECDC)

Framework Versions

  • Transformers: 4.52.4
  • PyTorch: 2.3.1
  • PEFT: 0.12.0
  • Accelerate: 1.7.0
  • BitsAndBytes: 0.43.3
  • Datasets: 2.20.0

Disclaimer: The views expressed are purely those of the authors and may not in any circumstance be regarded as stating an official position of the European Commission.
