rvo committed on
Commit 33a8c0c · verified · 1 Parent(s): b8e68a7

Upload README.md

Files changed (1)
  1. README.md +6 -7
README.md CHANGED
@@ -1,7 +1,6 @@
  ---
  license: apache-2.0
- base_model:
- - microsoft/MiniLM-L6-v2
+ base_model: microsoft/MiniLM-L6-v2
  tags:
  - transformers
  - sentence-transformers
@@ -24,7 +23,7 @@ language:

  `mdbr-leaf-ir` is a compact high-performance text embedding model specifically designed for **information retrieval (IR)** tasks, e.g., the retrieval part of RAGs.

- Enabling even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl).
+ To enable even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl).

  If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, summarization, please check out our [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model.

@@ -37,9 +36,9 @@ A technical report detailing our proposed `LEAF` training procedure is [availabl

  # Highlights

- * **State-of-the-Art Performance**: `mdbr-leaf-ir` achieves new state-of-the-art results for compact embedding models, ranking <span style="color:red">#TBD</span> on the public BEIR benchmark leaderboard for models <30M parameters with an average nDCG@10 score of <span style="color:red">[TBD HERE]</span>.
+ * **State-of-the-Art Performance**: `mdbr-leaf-ir` achieves new state-of-the-art results for compact embedding models, ranking <span style="color:red">#TBD</span> on the public [BEIR benchmark leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for models <100M parameters with an average nDCG@10 score of <span style="color:red">[TBD HERE]</span>.
  * **Flexible Architecture Support**: `mdbr-leaf-ir` supports asymmetric retrieval architectures enabling even greater retrieval results. [See below](#asymmetric-retrieval-setup) for more information.
- * **MRL and quantization support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and/or are stored using more efficient types like `int8` and `binary`. [See below](#mrl) for more information.
+ * **MRL and Quantization Support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and/or can be stored using more efficient types like `int8` and `binary`. [See below](#mrl) for more information.

  # Quickstart

@@ -103,9 +102,9 @@ document_embeddings = doc_model.encode(documents)
  # Compute similarities
  scores = query_model.similarity(query_embeddings, document_embeddings)
  ```
- Retrieval results from asymmetric mode are usually superior to the [standard mode above](#sentence-transformers).
+ Retrieval results in asymmetric mode are often superior to the [standard mode above](#sentence-transformers).

- ## MRL
+ ## MRL Truncation

  Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
  ```python