Training Dataset: The model was fine-tuned on a custom dataset of financial communication texts. The dataset was split into training, validation, and test sets as follows:

- Training Set: 10,918,272 tokens
- Validation Set: 1,213,184 tokens
- Test Set: 1,347,968 tokens
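For reference, these counts correspond to roughly an 81/9/10 train/validation/test ratio; a minimal sketch deriving the ratios from the token counts listed above:

```python
# Fine-tuning split sizes, in tokens, as listed in this README.
splits = {
    "train": 10_918_272,
    "validation": 1_213_184,
    "test": 1_347_968,
}

total = sum(splits.values())  # 13,479,424 tokens overall

# Express each split as a share of the full fine-tuning dataset.
for name, tokens in splits.items():
    print(f"{name}: {tokens / total:.1%}")
# train: 81.0%
# validation: 9.0%
# test: 10.0%
```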

Pre-training Dataset: FinBERT was pre-trained on a large financial corpus totaling 4.9 billion tokens, including:

- Corporate Reports (10-K & 10-Q): 2.5 billion tokens
- Earnings Call Transcripts: 1.3 billion tokens
- Analyst Reports: 1.1 billion tokens
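The three sources can likewise be checked against the stated 4.9-billion-token total and expressed as corpus shares (a small sketch using the counts above):

```python
# Pre-training corpus composition, in billions of tokens, as listed in this README.
corpus = {
    "Corporate Reports (10-K & 10-Q)": 2.5,
    "Earnings Call Transcripts": 1.3,
    "Analyst Reports": 1.1,
}

total = sum(corpus.values())
print(f"total: {total:.1f}B tokens")  # matches the stated 4.9 billion

# Share of the pre-training corpus contributed by each source.
for source, billions in corpus.items():
    print(f"{source}: {billions / total:.0%} of the corpus")
```

Corporate filings dominate at about half the corpus, with transcripts and analyst reports contributing roughly a quarter each.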
## Evaluation