# NeoLLM
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.4840
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0006
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
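The linear schedule with a 10% warmup ratio listed above can be sketched in plain Python. This is a minimal illustration of the shape of the schedule, not the exact Trainer internals; `total_steps` and the function name are hypothetical:

```python
def lr_at_step(step: int, total_steps: int,
               base_lr: float = 6e-4, warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to base_lr over the first warmup_ratio of
    training, then linear decay back to 0 at total_steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: learning rate rises linearly to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: learning rate falls linearly from base_lr to 0.
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)
```

For example, with 1000 total steps the rate peaks at 6e-4 at step 100 and reaches 0 at step 1000.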
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.4612 | 0.0337 | 5000 | 5.3719 |
| 5.0925 | 0.0674 | 10000 | 5.0228 |
| 4.9633 | 0.1011 | 15000 | 4.8946 |
| 4.909 | 0.1347 | 20000 | 4.8268 |
| 4.844 | 0.1684 | 25000 | 4.7804 |
| 4.8204 | 0.2021 | 30000 | 4.7456 |
| 4.7826 | 0.2358 | 35000 | 4.7157 |
| 4.7616 | 0.2695 | 40000 | 4.6921 |
| 4.7328 | 0.3032 | 45000 | 4.6735 |
| 4.7271 | 0.3368 | 50000 | 4.6575 |
| 4.7147 | 0.3705 | 55000 | 4.6423 |
| 4.7072 | 0.4042 | 60000 | 4.6325 |
| 4.6978 | 0.4379 | 65000 | 4.6206 |
| 4.6824 | 0.4716 | 70000 | 4.6131 |
| 4.6754 | 0.5053 | 75000 | 4.6040 |
| 4.6769 | 0.5389 | 80000 | 4.5978 |
| 4.6631 | 0.5726 | 85000 | 4.5908 |
| 4.6596 | 0.6063 | 90000 | 4.5845 |
| 4.654 | 0.6400 | 95000 | 4.5789 |
| 4.6503 | 0.6737 | 100000 | 4.5746 |
| 4.6454 | 0.7074 | 105000 | 4.5697 |
| 4.6497 | 0.7411 | 110000 | 4.5653 |
| 4.6363 | 0.7747 | 115000 | 4.5563 |
| 4.6209 | 0.8084 | 120000 | 4.5399 |
| 4.6091 | 0.8421 | 125000 | 4.5266 |
| 4.5895 | 0.8758 | 130000 | 4.5117 |
| 4.5762 | 0.9095 | 135000 | 4.5010 |
| 4.5778 | 0.9432 | 140000 | 4.4914 |
| 4.5552 | 0.9768 | 145000 | 4.4840 |
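Assuming the reported loss is mean token-level cross-entropy in nats (the usual convention for causal language modeling), the final validation loss of 4.4840 corresponds to a perplexity of roughly 88.6:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy loss
    (measured in nats per token)."""
    return math.exp(loss)

# Final validation loss from the table above.
print(round(perplexity(4.4840), 1))  # → 88.6
```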
### Framework versions

- Transformers 4.57.3
- PyTorch 2.8.0+cu128
- Datasets 4.4.2
- Tokenizers 0.22.1