ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 9 days ago • 17
Bharat-NanoBEIR: Indian Language Retrieval Benchmarks Collection NanoBEIR retrieval benchmarks translated into 22 Indian languages across 13 datasets. • 22 items • Updated Dec 13, 2025 • 5
CoRNStack Collection State-of-the-art code retrieval and re-ranking models and datasets • 9 items • Updated Mar 26, 2025 • 20
NanoBEIR datasets Collection These datasets are compatible with the NanoBEIREvaluator and SparseNanoBEIREvaluator in Sentence Transformers v5.2+, and also with the CrossEncoderNanoBEIREvaluator if they include a bm25 column • 16 items • Updated 10 days ago • 14
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 164
Article Provence: efficient and robust context pruning for retrieval-augmented generation Jan 28, 2025 • 25
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning Oct 27, 2025 • 75