RAG_general/rerank/models/BAAI-bge-m3-ft
This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-m3
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Language: en
- License: apache-2.0
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
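Because the final Normalize() module L2-normalizes every embedding, the dot product of two embeddings equals their cosine similarity; this is why the cosine_* and dot_* metrics in the Evaluation section below coincide. A minimal sanity check (assuming the model ID used in the Usage section):
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rjnClarke/BAAI-bge-m3-fine-tuned")
emb = model.encode(["Tennis-balls, my liege."])
# The Normalize() module gives every vector unit L2 norm,
# so dot-product and cosine scores are interchangeable.
print(np.linalg.norm(emb[0]))  # ~1.0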
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("rjnClarke/BAAI-bge-m3-fine-tuned")
sentences = [
'What is the significance of the tennis balls in the excerpt from the play?',
"Says that you savour too much of your youth,\n And bids you be advis'd there's nought in France That can be with a nimble galliard won; You cannot revel into dukedoms there. He therefore sends you, meeter for your spirit, This tun of treasure; and, in lieu of this, Desires you let the dukedoms that you claim Hear no more of you. This the Dauphin speaks. KING HENRY. What treasure, uncle? EXETER. Tennis-balls, my liege. KING HENRY. We are glad the Dauphin is so pleasant with us; His present and your pains we thank you for. When we have match'd our rackets to these balls, We will in France, by God's grace, play a set Shall strike his father's crown into the hazard. Tell him he hath made a match with such a wrangler That all the courts of France will be disturb'd With chaces. And we understand him well, How he comes o'er us with our wilder days, Not measuring what use we made of them. We never valu'd this poor seat of England; And therefore, living hence, did give ourself To barbarous licence; as 'tis ever common That men are merriest when they are from home. But tell the Dauphin I will keep my state, Be like a king, and show my sail of greatness, When I do rouse me in my throne of France; For that I have laid by my majesty And plodded like a man for working-days; But I will rise there with so full a glory That I will dazzle all the eyes of France, Yea, strike the Dauphin blind to look on us. And tell the pleasant Prince this mock of his Hath turn'd his balls to gun-stones, and his soul Shall stand sore charged for the wasteful vengeance\n That shall fly with them; for many a thousand widows\n",
"YORK. From Ireland thus comes York to claim his right\n And pluck the crown from feeble Henry's head: Ring bells aloud, burn bonfires clear and bright, To entertain great England's lawful king. Ah, sancta majestas! who would not buy thee dear? Let them obey that knows not how to rule; This hand was made to handle nought but gold. I cannot give due action to my words Except a sword or sceptre balance it.\n A sceptre shall it have, have I a soul\n On which I'll toss the flower-de-luce of France.\n Enter BUCKINGHAM [Aside] Whom have we here? Buckingham, to disturb me?\n The King hath sent him, sure: I must dissemble. BUCKINGHAM. York, if thou meanest well I greet thee well. YORK. Humphrey of Buckingham, I accept thy greeting. Art thou a messenger, or come of pleasure? BUCKINGHAM. A messenger from Henry, our dread liege, To know the reason of these arms in peace; Or why thou, being a subject as I am, Against thy oath and true allegiance sworn, Should raise so great a power without his leave, Or dare to bring thy force so near the court. YORK. [Aside] Scarce can I speak, my choler is so great. O, I could hew up rocks and fight with flint, I am so angry at these abject terms; And now, like Ajax Telamonius, On sheep or oxen could I spend my fury. I am far better born than is the King, More like a king, more kingly in my thoughts; But I must make fair weather yet awhile, Till Henry be more weak and I more strong.- Buckingham, I prithee, pardon me That I have given no answer all this while; My mind was troubled with deep melancholy. The cause why I have brought this army hither Is to remove proud Somerset from the King, Seditious to his Grace and to the state. BUCKINGHAM. That is too much presumption on thy part; But if thy arms be to no other end, The King hath yielded unto thy demand:\n The Duke of Somerset is in the Tower.\n",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
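Beyond pairwise similarity, the embeddings can rank passages against a query. A minimal retrieval sketch (the query and passages are illustrative, drawn from the excerpts above; model.similarity returns cosine scores):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rjnClarke/BAAI-bge-m3-fine-tuned")

query = "What gift does the Dauphin send King Henry?"
passages = [
    "EXETER. Tennis-balls, my liege.",
    "YORK. From Ireland thus comes York to claim his right.",
]

# Encode query and passages separately, then score with cosine similarity.
scores = model.similarity(model.encode([query]), model.encode(passages))  # shape [1, 2]
best = int(scores[0].argmax())
print(passages[best], float(scores[0][best]))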
Evaluation
Metrics
Information Retrieval
| Metric | Value |
|:--------------------|:-------|
| cosine_accuracy@3 | 0.5356 |
| cosine_precision@1 | 0.4209 |
| cosine_precision@3 | 0.1785 |
| cosine_precision@5 | 0.1142 |
| cosine_precision@10 | 0.0619 |
| cosine_recall@1 | 0.4209 |
| cosine_recall@3 | 0.5356 |
| cosine_recall@5 | 0.5708 |
| cosine_recall@10 | 0.6186 |
| cosine_ndcg@10 | 0.5184 |
| cosine_mrr@200 | 0.4916 |
| cosine_map@100 | 0.4914 |
| dot_accuracy@3 | 0.5356 |
| dot_precision@1 | 0.4209 |
| dot_precision@3 | 0.1785 |
| dot_precision@5 | 0.1142 |
| dot_precision@10 | 0.0619 |
| dot_recall@1 | 0.4209 |
| dot_recall@3 | 0.5356 |
| dot_recall@5 | 0.5708 |
| dot_recall@10 | 0.6186 |
| dot_ndcg@10 | 0.5184 |
| dot_mrr@200 | 0.4916 |
| dot_map@100 | 0.4914 |
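These figures are consistent with Sentence Transformers' InformationRetrievalEvaluator (the m3-dev prefix in the training logs below suggests that evaluator name). A hedged sketch with placeholder queries, corpus, and relevance judgments, since the evaluation dataset is unnamed:
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("rjnClarke/BAAI-bge-m3-fine-tuned")

# Placeholder data: query IDs -> queries, doc IDs -> passages,
# and query IDs -> sets of relevant doc IDs.
queries = {"q1": "What gift does the Dauphin send King Henry?"}
corpus = {
    "d1": "EXETER. Tennis-balls, my liege.",
    "d2": "YORK. From Ireland thus comes York to claim his right.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs,
    name="m3-dev",
    mrr_at_k=[200],  # matches the mrr@200 reported above
)
results = evaluator(model)
print(results)  # accuracy@k, precision@k, recall@k, ndcg@k, mrr@k, map@k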
Training Details
Training Dataset
Unnamed Dataset
Evaluation Dataset
Unnamed Dataset
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: epoch
gradient_accumulation_steps: 2
learning_rate: 1e-05
weight_decay: 5e-05
warmup_steps: 50
fp16: True
half_precision_backend: True
load_best_model_at_end: True
fp16_backend: True
batch_sampler: no_duplicates
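The MultipleNegativesRankingLoss cited at the end of this card, together with the no_duplicates batch sampler, suggests training on (anchor, positive) pairs. A hedged reconstruction of the setup from the non-default values above; the dataset contents, column names, and output_dir are placeholders, since the training data is unnamed:
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

model = SentenceTransformer("BAAI/bge-m3")

# Placeholder (anchor, positive) pairs; the real dataset is unnamed above.
train_dataset = Dataset.from_dict({
    "anchor": ["What gift does the Dauphin send King Henry?"],
    "positive": ["EXETER. Tennis-balls, my liege."],
})
eval_dataset = train_dataset  # placeholder; a held-out split was used in practice

args = SentenceTransformerTrainingArguments(
    output_dir="BAAI-bge-m3-ft",  # placeholder
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    weight_decay=5e-5,
    warmup_steps=50,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # must match eval_strategy when loading the best model
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()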
All Hyperparameters
overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 8
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 2
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 1e-05
weight_decay: 5e-05
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 50
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: True
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: True
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
eval_use_gather_object: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | loss | m3-dev_cosine_map@100 |
|:-----------|:---------|:--------------|:-----------|:----------------------|
| 0.7722 | 500 | 1.1966 | - | - |
| 1.0008 | 648 | - | 0.8832 | 0.4814 |
| 1.5436 | 1000 | 0.8492 | - | - |
| 2.0008 | 1296 | - | 0.8582 | 0.4855 |
| 2.3151 | 1500 | 0.6805 | - | - |
| **2.9961** | **1941** | **-** | **0.8607** | **0.4914** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.43.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
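To reproduce this environment, the versions above can be pinned at install time (a sketch; nearby compatible versions should also work):
pip install sentence-transformers==3.0.1 transformers==4.43.4 accelerate==0.32.1 datasets==2.19.1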
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}