---
license: mit
language:
- en
metrics:
- f1
- accuracy
base_model:
- bilalzafar/cb-bert-mlm
pipeline_tag: text-classification
library_name: transformers
tags:
- CBDC
- Central Bank Digital Currencies
- Central Bank Digital Currency
- Sentiment Analysis
- Central Bank
- Tone
- Finance
- NLP
- Finance NLP
- BERT
- Transformers
- Digital Currency
---

# **CBDC-Sentiment: A Domain-Specific BERT for CBDC-Related Sentiment Analysis**

**CBDC-Sentiment** is a **3-class** (*negative / neutral / positive*) sentence-level BERT-based classifier built for **Central Bank Digital Currency (CBDC)** communications. It is trained to identify overall sentiment in central-bank style text such as consultations, speeches, reports, and reputable news.

**Base Model:** [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) — **CentralBank-BERT** is a domain-adapted **BERT base (uncased)**, pretrained on **66M+ tokens** across **2M+ sentences** from central-bank speeches published via the **Bank for International Settlements (1996–2024)**. It is optimized for *masked-token prediction* within the specialized domains of **monetary policy, financial regulation, and macroeconomic communication**, enabling better contextual understanding of central-bank discourse and financial narratives.

**Training data:** The dataset consists of **2,405** custom, *manually annotated* sentences related to Central Bank Digital Currencies (CBDCs), extracted from **BIS speeches**. The class distribution is **neutral**: *1,068* (44.41%), **positive**: *1,026* (42.66%), and **negative**: *311* (12.93%). The data is split **row-wise**, stratified by label, into **train**: *1,924*, **validation**: *240*, and **test**: *241* examples.
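
A stratified 80/10/10 row-wise split with these proportions can be sketched with scikit-learn. The two-stage approach and seed below are illustrative assumptions, not the authors' script, but the resulting sizes match the reported *1,924 / 240 / 241*:

```python
from sklearn.model_selection import train_test_split

# Class counts as reported in the card
labels = ["neutral"] * 1068 + ["positive"] * 1026 + ["negative"] * 311
idx = list(range(len(labels)))

# First carve out 20% for validation+test, then split that pool in half,
# stratifying by label at each stage (seed chosen for illustration)
train_idx, tmp_idx, y_train, y_tmp = train_test_split(
    idx, labels, test_size=0.2, stratify=labels, random_state=42
)
val_idx, test_idx, _, _ = train_test_split(
    tmp_idx, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

print(len(train_idx), len(val_idx), len(test_idx))  # 1924 240 241
```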

**Intended usage:** Use this model to **classify sentence-level sentiment** in **CBDC** texts (reports, consultations, speeches, research notes, reputable news). It is **domain-specific** and *not intended* for generic or informal sentiment tasks.

## Preprocessing & class imbalance
Sentences were **lowercased** (no stemming or lemmatization) and tokenized with the base tokenizer from [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) using `max_length=320` with truncation and **dynamic padding** via `DataCollatorWithPadding`. To address class imbalance, training used *Focal Loss (γ=1.0)* with **class weights** computed from the *train* split (`class_weight="balanced"`) applied in the loss, plus a `WeightedRandomSampler` whose per-sample weights are the square root of the inverse class frequency.
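
As an illustrative sketch (not the training script itself), the two imbalance devices reduce to a few lines of plain Python:

```python
import math
from collections import Counter

# Per-sample weights: square root of inverse class frequency
labels = ["neutral"] * 1068 + ["positive"] * 1026 + ["negative"] * 311
freq = Counter(labels)
sample_weights = [1.0 / math.sqrt(freq[y]) for y in labels]

# Focal loss for one example, given the predicted probability of the
# true class; gamma=1.0 as in the card, alpha stands in for the class weight
def focal_loss(p_true: float, gamma: float = 1.0, alpha: float = 1.0) -> float:
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# Confident correct predictions are down-weighted relative to plain
# cross-entropy, while poorly classified examples keep most of their loss
print(focal_loss(0.9), focal_loss(0.3))
```

In training, such per-sample weights would feed a `WeightedRandomSampler` so that rare *negative* sentences are drawn more often per epoch.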

## Training procedure
Training used **[`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT)** as the base, with a 3-label `AutoModelForSequenceClassification` head. Optimization was *AdamW* (HF Trainer) with *learning rate 2e-5*, *batch size 16* (train/eval), and up to *8 epochs* with early stopping (*patience=2*); the best checkpoint was reached around epoch *6*. A *warmup ratio of 0.06*, *weight decay 0.01*, and *fp16* precision were applied. Runs were seeded (*42*) and executed on *Google Colab (T4)*.

## Evaluation

On the **validation split** (~10% of the data), the model achieved **accuracy** *0.8458*, **macro-F1** *0.8270*, and **weighted-F1** *0.8453*.
On the **held-out test split** (~10%), performance was **accuracy** *0.8216*, **macro-F1** *0.8121*, and **weighted-F1** *0.8216*.

**Per-class (test):**

| Class    | Precision | Recall | F1     | Support |
| -------- | --------- | ------ | ------ | ------- |
| negative | 0.8214    | 0.7419 | 0.7797 | 31      |
| neutral  | 0.7857    | 0.8224 | 0.8037 | 107     |
| positive | 0.8614    | 0.8447 | 0.8529 | 103     |

Note: On the **entire annotated dataset** (in-domain evaluation, no hold-out), the model reaches ~**0.95 accuracy / weighted-F1**. These should be considered upper bounds; the **test split** above is the main reference for generalization.
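
The aggregate scores follow directly from the per-class table: macro-F1 averages the three class F1s equally, while weighted-F1 weights each class by its support. A quick check against the reported numbers:

```python
# Per-class F1 and support from the test table above
f1 = {"negative": 0.7797, "neutral": 0.8037, "positive": 0.8529}
support = {"negative": 31, "neutral": 107, "positive": 103}

macro_f1 = sum(f1.values()) / len(f1)              # unweighted mean
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(round(macro_f1, 4), round(weighted_f1, 4))  # 0.8121 0.8216
```

The gap between the two reflects the small *negative* class: its weaker F1 pulls the macro average down more than the support-weighted one.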

---

## Other CBDC Models

This model is part of the **CentralBank-BERT / CBDC model family**, a suite of domain-adapted classifiers for analyzing central-bank communication.

| **Model**                       | **Purpose**                                                         | **Intended Use**                                                    | **Link**                                                               |
| ------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024).      | Base encoder for CBDC downstream tasks; fill-mask tasks.            | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT**        | Binary classifier: CBDC vs. Non-CBDC.                               | Flagging CBDC-related discourse in large corpora.                   | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT)               |
| **bilalzafar/CBDC-Stance**      | 3-class stance model (Pro, Wait-and-See, Anti).                     | Research on policy stances and discourse monitoring.                | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance)           |
| **bilalzafar/CBDC-Sentiment**   | 3-class sentiment model (Positive, Neutral, Negative).              | Tone analysis in central bank communications.                       | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment)     |
| **bilalzafar/CBDC-Type**        | Classifies Retail, Wholesale, General CBDC mentions.                | Distinguishing policy focus (retail vs wholesale).                  | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type)               |
| **bilalzafar/CBDC-Discourse**   | 3-class discourse classifier (Feature, Process, Risk-Benefit).      | Structured categorization of CBDC communications.                   | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse)     |
| **bilalzafar/CentralBank-NER**  | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER)   |


## Repository and Replication Package

All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:

🔗 **[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**

---

## Usage

```python
from transformers import pipeline

# Load the sentiment classification pipeline
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Sentiment")

# Example sentences
sentences = [
    "CBDCs will revolutionize payment systems and improve financial inclusion."
]

# Predict sentiment for each sentence
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output:
# CBDCs will revolutionize payment systems and improve financial inclusion.
#  → positive (score=0.9789)
```
---

## Citation

If you use this model, please cite as:

**Zafar, M. B. (2025). *CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse*. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)**

```bibtex
@article{zafar2025centralbankbert,
  title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
  author={Zafar, Muhammad Bilal},
  year={2025},
  journal={SSRN Electronic Journal},
  url={https://papers.ssrn.com/abstract=5404456}
}
```