---
license: mit
language:
- en
metrics:
- f1
- accuracy
base_model:
- bilalzafar/cb-bert-mlm
pipeline_tag: text-classification
library_name: transformers
tags:
- CBDC
- Central Bank Digital Currencies
- Central Bank Digital Currency
- Sentiment Analysis
- Central Bank
- Tone
- Finance
- NLP
- Finance NLP
- BERT
- Transformers
- Digital Currency
---
# **CBDC-Sentiment: A Domain-Specific BERT for CBDC-Related Sentiment Analysis**
**CBDC-Sentiment** is a **3-class** (*negative / neutral / positive*) sentence-level BERT-based classifier built for **Central Bank Digital Currency (CBDC)** communications. It is trained to identify overall sentiment in central-bank style text such as consultations, speeches, reports, and reputable news.
**Base Model:** [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) — **CentralBank-BERT** is a domain-adapted **BERT base (uncased)**, pretrained on **66M+ tokens** across **2M+ sentences** from central-bank speeches published via the **Bank for International Settlements (1996–2024)**. It is optimized for *masked-token prediction* within the specialized domains of **monetary policy, financial regulation, and macroeconomic communication**, enabling better contextual understanding of central-bank discourse and financial narratives.
**Training data:** The dataset consists of **2,405** custom, *manually annotated* sentences related to Central Bank Digital Currencies (CBDCs), extracted from **BIS speeches**. The class distribution is **neutral**: *1,068* (44.41%), **positive**: *1,026* (42.66%), and **negative**: *311* (12.93%). The data is split **row-wise**, stratified by label, into **train**: *1,924*, **validation**: *240*, and **test**: *241* examples.
**Intended usage:** Use this model to **classify sentence-level sentiment** in **CBDC** texts (reports, consultations, speeches, research notes, reputable news). It is **domain-specific** and *not intended* for generic or informal sentiment tasks.
## Preprocessing & class imbalance
Sentences were **lowercased** (no stemming/lemmatization) and tokenized with the base tokenizer from [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) using **max_length=320** with truncation and **dynamic padding** via `DataCollatorWithPadding`. To address imbalance, training used *Focal Loss (γ=1.0)* with **class weights** computed from the *train* split (`class_weight="balanced"`) applied in the loss, plus a *WeightedRandomSampler* with √(inverse-frequency) *per-sample weights*.
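The weighting scheme above can be sketched in plain Python. This is an illustration, not the training code: the released model computed `balanced` weights on the train split, while the class counts below are the full-dataset counts reported earlier, and the single-example focal-loss term stands in for the batched PyTorch implementation.

```python
import math

# Illustrative class counts (full annotated dataset; the actual class
# weights were computed on the train split).
counts = {"neutral": 1068, "positive": 1026, "negative": 311}
n_total, n_classes = sum(counts.values()), len(counts)

# "balanced" class weights, as in sklearn's class_weight="balanced":
# w_c = n_total / (n_classes * n_c) -- rarer classes get larger weights.
class_weights = {c: n_total / (n_classes * n) for c, n in counts.items()}

# Square-root inverse-frequency per-sample weights for WeightedRandomSampler:
sampler_weights = {c: math.sqrt(n_total / n) for c, n in counts.items()}

def focal_loss_term(p_true, weight, gamma=1.0):
    """Focal loss for one example: -w * (1 - p)^gamma * log(p).
    With gamma=0 and weight=1 this reduces to plain cross-entropy."""
    return -weight * (1.0 - p_true) ** gamma * math.log(p_true)

# A confident correct prediction is down-weighted relative to an
# uncertain one, which is the point of the focal term:
print(focal_loss_term(0.9, class_weights["negative"]))
print(focal_loss_term(0.5, class_weights["negative"]))
```

The √(inverse-frequency) sampler weights are deliberately milder than full inverse-frequency, so minority-class sentences are oversampled without being repeated too aggressively.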
## Training procedure
Training used **[`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT)** as the base, with a 3-label `AutoModelForSequenceClassification` head. Optimization was *AdamW* (HF Trainer) with *learning rate 2e-5*, *batch size 16* (train/eval), and up to *8 epochs* with early stopping (*patience = 2*); the best checkpoint came at epoch ~6. A *warmup ratio of 0.06*, *weight decay 0.01*, and *fp16* precision were applied. Runs were seeded (*42*) and executed on *Google Colab (T4)*.
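The hyperparameters above map onto a HF `TrainingArguments` configuration roughly as follows. This is a hedged sketch, not the exact training script: `output_dir` and `metric_for_best_model` are assumed, and early stopping is attached via the standard `EarlyStoppingCallback`.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Sketch of the reported configuration (output_dir and the
# best-model metric are assumptions, not from the model card).
args = TrainingArguments(
    output_dir="cbdc-sentiment",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=8,
    warmup_ratio=0.06,
    weight_decay=0.01,
    fp16=True,
    seed=42,
    eval_strategy="epoch",   # `evaluation_strategy` in older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

# Passed to Trainer(..., callbacks=[...]) to stop after 2 stagnant epochs:
early_stopping = EarlyStoppingCallback(early_stopping_patience=2)
```

With patience 2 and 8 maximum epochs, a best epoch around 6 means validation performance stopped improving shortly before the epoch cap.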
## Evaluation
On the **validation split** (~10% of data), the model achieved **accuracy** *0.8458*, **macro-F1** *0.8270*, and **weighted-F1** *0.8453*.
On the **held-out test split** (~10%), performance was **accuracy** *0.8216*, **macro-F1** *0.8121*, and **weighted-F1** *0.8216*.
**Per-class (test):**
| Class | Precision | Recall | F1 | Support |
| -------- | --------- | ------ | ------ | ------- |
| negative | 0.8214 | 0.7419 | 0.7797 | 31 |
| neutral | 0.7857 | 0.8224 | 0.8037 | 107 |
| positive | 0.8614 | 0.8447 | 0.8529 | 103 |
Note: On the **entire annotated dataset** (in-domain evaluation, no hold-out), the model reaches ~**0.95 accuracy / weighted-F1**. These figures should be considered upper bounds; the **test split** above is the main reference for generalization.
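For reference, the aggregate F1 scores reported above can be recovered directly from the per-class test table: macro-F1 is the unweighted mean of the per-class F1s, and weighted-F1 is the support-weighted mean.

```python
# Per-class F1 and support from the test table above.
f1 = {"negative": 0.7797, "neutral": 0.8037, "positive": 0.8529}
support = {"negative": 31, "neutral": 107, "positive": 103}

macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(round(macro_f1, 4))     # 0.8121
print(round(weighted_f1, 4))  # 0.8216
```

The gap between the two reflects the small negative class: its weaker F1 drags the macro average down more than the support-weighted one.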
---
## Other CBDC Models
This model is part of the **CentralBank-BERT / CBDC model family**, a suite of domain-adapted classifiers for analyzing central-bank communication.
| **Model** | **Purpose** | **Intended Use** | **Link** |
| ------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT** | Binary classifier: CBDC vs. Non-CBDC. | Flagging CBDC-related discourse in large corpora. | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT) |
| **bilalzafar/CBDC-Stance** | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance) |
| **bilalzafar/CBDC-Sentiment** | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment) |
| **bilalzafar/CBDC-Type** | Classifies Retail, Wholesale, General CBDC mentions. | Distinguishing policy focus (retail vs wholesale). | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type) |
| **bilalzafar/CBDC-Discourse** | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse) |
| **bilalzafar/CentralBank-NER** | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER) |
## Repository and Replication Package
All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:
🔗 **[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**
---
## Usage
```python
from transformers import pipeline

# Load the sentiment pipeline
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Sentiment")

# Example sentences
sentences = [
    "CBDCs will revolutionize payment systems and improve financial inclusion.",
]

# Predict
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})")

# Example output:
# CBDCs will revolutionize payment systems and improve financial inclusion.
#  → positive (score=0.9789)
```
---
## Citation
If you use this model, please cite as:
**Zafar, M. B. (2025). *CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse*. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)**
```bibtex
@article{zafar2025centralbankbert,
title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
author={Zafar, Muhammad Bilal},
year={2025},
journal={SSRN Electronic Journal},
url={https://papers.ssrn.com/abstract=5404456}
}
```