LCO-Embedding-Omni-3B-GGUF
GGUF quantizations of LCO-Embedding/LCO-Embedding-Omni-3B for use with llama.cpp.
Converted using ht-llama.cpp, a llama.cpp fork that adds support for the Qwen2_5OmniThinkerForConditionalGeneration architecture.
About the model
LCO-Embedding-Omni-3B is a multimodal embedding model based on the Thinker component of Qwen 2.5 Omni, fine-tuned with LoRA and contrastive learning to produce 2048-dimensional embeddings from text, images, audio, and video. The model uses last-token pooling.
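To make the pooling distinction concrete, here is a minimal sketch of last-token pooling next to mean pooling on toy per-token vectors. The function names and the right-padding assumption are illustrative only; this is not the code used inside llama.cpp or the original model.

```python
from typing import List, Sequence

Vector = List[float]


def last_token_pool(token_states: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Return the hidden state of the last real (non-padded) token.

    Assumes right padding, so real tokens occupy the first sum(mask) positions.
    """
    n_real = sum(attention_mask)
    if n_real == 0:
        raise ValueError("sequence has no real tokens")
    return list(token_states[n_real - 1])


def mean_pool(token_states: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the hidden states of all real tokens (shown only for contrast)."""
    real = [state for state, keep in zip(token_states, attention_mask) if keep]
    if not real:
        raise ValueError("sequence has no real tokens")
    return [sum(column) / len(real) for column in zip(*real)]


# Toy example: three token positions, the last one is padding.
states = [[0.0, 1.0], [4.0, 5.0], [9.0, 9.0]]
mask = [1, 1, 0]
last_vec = last_token_pool(states, mask)
mean_vec = mean_pool(states, mask)
```

The two strategies generally give different vectors for the same input, which is why a runtime configured for the wrong pooling type would not reproduce the model's intended embeddings.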
See Scaling Language-Centric Omnimodal Representation Learning (NeurIPS 2025) for details.
Available files
Standard quantizations
| File | Quant | Size | Description |
|---|---|---|---|
| LCO-Embedding-Omni-3B-BF16.gguf | BF16 | -- | Full precision, no quality loss |
| LCO-Embedding-Omni-3B-Q8_0.gguf | Q8_0 | -- | Near-lossless quantization |
| LCO-Embedding-Omni-3B-Q4_K_M.gguf | Q4_K_M | -- | Good balance of quality and size |
| LCO-Embedding-Omni-3B-Q3_K_M.gguf | Q3_K_M | -- | Smaller, some quality loss |
| LCO-Embedding-Omni-3B-Q2_K.gguf | Q2_K | -- | Smallest, more quality loss |
Importance matrix (imatrix) quantizations
Quantized with an importance matrix computed from WikiText-2 calibration data for improved quality at low bit widths.
| File | Quant | Size | Description |
|---|---|---|---|
| LCO-Embedding-Omni-3B-IQ4_XS.gguf | IQ4_XS | -- | 4.25 bpw, imatrix-optimized |
| LCO-Embedding-Omni-3B-IQ3_M.gguf | IQ3_M | -- | 3.66 bpw, imatrix-optimized |
| LCO-Embedding-Omni-3B-IQ3_XS.gguf | IQ3_XS | -- | 3.3 bpw, imatrix-optimized |
| LCO-Embedding-Omni-3B-IQ2_M.gguf | IQ2_M | -- | 2.7 bpw, imatrix-optimized |
Multimodal projection
| File | Quant | Size | Description |
|---|---|---|---|
| mmproj-LCO-Embedding-Omni-3b-F16.gguf | F16 | -- | Vision + audio projection (required for multimodal) |
For text-only embedding, you only need one of the text model GGUFs. For multimodal (image/audio/video), you also need the mmproj file.
Quantization quality
Measured on 8 diverse text sentences (2048-dim embeddings). BF16 is the reference.
Embedding quality vs BF16
Results will be added after quantization testing.
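Until results are posted, one plausible way to run this kind of comparison yourself is to embed the same sentences with the BF16 file and with a quantized file, then average the per-sentence cosine similarity. The sketch below assumes you already have both sets of vectors as Python lists; the function names are hypothetical and this is not necessarily the procedure the author will use.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_cosine_vs_reference(reference, candidate) -> float:
    """Average cosine similarity between paired reference and candidate embeddings."""
    if len(reference) != len(candidate):
        raise ValueError("both sets must contain the same number of embeddings")
    sims = [cosine_similarity(r, c) for r, c in zip(reference, candidate)]
    return sum(sims) / len(sims)


# Toy 2-dimensional stand-ins for BF16 and quantized embeddings.
bf16_vectors = [[1.0, 0.0], [0.0, 1.0]]
quant_vectors = [[1.0, 0.0], [1.0, 1.0]]
score = mean_cosine_vs_reference(bf16_vectors, quant_vectors)
```

A score close to 1.0 would indicate that the quantized model's embeddings point in nearly the same directions as the BF16 reference.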
pgvector retrieval quality (query with quant, corpus in BF16)
Results will be added after quantization testing.
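The retrieval setup described in the heading (quantized queries against a BF16 corpus) can be approximated in memory without pgvector. The sketch below ranks a corpus by cosine similarity and reports how much of the BF16 query's top-k survives when the query comes from a quantized model. It is an in-memory stand-in under assumed names, not the pgvector benchmark itself.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: Sequence[float], corpus: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k corpus vectors most similar to the query, best first."""
    ranked = sorted(range(len(corpus)), key=lambda i: _cosine(query, corpus[i]), reverse=True)
    return ranked[:k]


def top_k_overlap(ref_query, quant_query, corpus, k: int) -> float:
    """Fraction of the reference top-k that the quantized query also retrieves."""
    reference_hits = set(top_k(ref_query, corpus, k))
    quant_hits = set(top_k(quant_query, corpus, k))
    return len(reference_hits & quant_hits) / k


# Toy corpus standing in for BF16 document embeddings.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
close_overlap = top_k_overlap([1.0, 0.0], [0.95, 0.05], corpus, k=2)
drifted_overlap = top_k_overlap([1.0, 0.0], [0.0, 1.0], corpus, k=2)
```

With pgvector, the ranking step would instead be an `ORDER BY` on a distance operator, but the overlap metric would be computed the same way.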
Usage
Build llama.cpp
git clone https://github.com/heiervang-technologies/ht-llama.cpp
cd ht-llama.cpp
cmake -B build
cmake --build build --target llama-embedding llama-server -j$(nproc)
Text embeddings (CLI)
./build/bin/llama-embedding \
-m LCO-Embedding-Omni-3B-Q8_0.gguf \
--pooling last \
-p "Your text here"
Text embeddings (server)
./build/bin/llama-server \
-m LCO-Embedding-Omni-3B-Q8_0.gguf \
--embedding --pooling last
curl -s http://localhost:8080/embeddings \
-d '{"content": "Your text here"}'
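The same request can be issued from Python with only the standard library. This is a sketch, assuming the server from the command above is listening on localhost:8080. The response layout handled in `extract_vector` (a list of objects with an `embedding` field, possibly wrapped in one extra list) is an assumption about the server's output and may differ between builds, so inspect a real response before relying on it.

```python
import json
import urllib.request
from typing import List


def build_embedding_request(text: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request with the same body as the curl example."""
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
    )


def extract_vector(payload) -> List[float]:
    """Pull one embedding out of the decoded response (assumed layout, see above)."""
    item = payload[0] if isinstance(payload, list) else payload
    embedding = item["embedding"]
    # Assumption: some builds wrap the pooled vector in an outer list; unwrap it.
    if embedding and isinstance(embedding[0], list):
        embedding = embedding[0]
    return [float(x) for x in embedding]


def embed_text(text: str, base_url: str = "http://localhost:8080") -> List[float]:
    """Send the request and return the embedding; requires a running server."""
    with urllib.request.urlopen(build_embedding_request(text, base_url)) as response:
        return extract_vector(json.load(response))


# Usage (needs the server started as shown above):
# vector = embed_text("Your text here")
# print(len(vector))
```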
Multimodal embeddings (vision + audio)
Requires the mmproj file:
./build/bin/llama-server \
-m LCO-Embedding-Omni-3B-Q8_0.gguf \
--mmproj mmproj-LCO-Embedding-Omni-3b-F16.gguf \
--embedding --pooling last
# Image embedding (base64-encoded image)
curl -s http://localhost:8080/embeddings \
-d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-image-data>"]}]}'
# Audio embedding (base64-encoded WAV)
curl -s http://localhost:8080/embeddings \
-d '{"content": [{"prompt_string": "<__media__>", "multimodal_data": ["<base64-audio-data>"]}]}'
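The `<base64-image-data>` and `<base64-audio-data>` placeholders have to be filled with the base64 encoding of the raw file bytes. A small sketch of building that request body in Python follows; the helper names are hypothetical, and the JSON layout simply mirrors the curl examples above rather than any separately documented schema.

```python
import base64
import json
from pathlib import Path


def build_media_payload(media_bytes: bytes, prompt: str = "<__media__>") -> dict:
    """Build a request body with the same layout as the curl examples."""
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return {"content": [{"prompt_string": prompt, "multimodal_data": [encoded]}]}


def build_media_payload_from_file(path: str) -> dict:
    """Read an image or WAV file from disk and wrap it in the request body."""
    return build_media_payload(Path(path).read_bytes())


# Serialize the payload for use as the POST body (toy bytes instead of a real file).
body = json.dumps(build_media_payload(b"abc"))
```

The resulting `body` string can be sent to the `/embeddings` endpoint in place of the hand-written `-d` argument.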
JSON output (for programmatic use)
./build/bin/llama-embedding \
-m LCO-Embedding-Omni-3B-Q8_0.gguf \
--pooling last \
--embd-output-format json \
-p "Your text here"
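For programmatic use, the JSON printed by the command above can be parsed in Python. The sketch below assumes an OpenAI-style layout (a top-level `data` list whose items carry `index` and `embedding`); that layout is an assumption about this build's output, so check it against what your binary actually prints. The sample string is made up for illustration.

```python
import json
from typing import List


def parse_embedding_json(stdout_text: str) -> List[List[float]]:
    """Return embeddings ordered by index from the assumed JSON layout."""
    document = json.loads(stdout_text)
    rows = sorted(document["data"], key=lambda row: row["index"])
    return [[float(x) for x in row["embedding"]] for row in rows]


def check_dimension(vectors: List[List[float]], expected: int = 2048) -> bool:
    """True if every vector has the expected length (2048 for this model)."""
    return all(len(vector) == expected for vector in vectors)


# Made-up sample in the assumed layout, shortened to two dimensions.
sample = '{"object": "list", "data": [{"object": "embedding", "index": 0, "embedding": [0.5, -0.25]}]}'
vectors = parse_embedding_json(sample)
```

In practice the input string would come from capturing the command's standard output, for example with `subprocess.run(..., capture_output=True, text=True).stdout`.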
Notes
- This is a quantization of LCO-Embedding/LCO-Embedding-Omni-3B -- see the original model card for benchmarks, training details, and licensing
- The `--pooling last` flag is required -- this model uses last-token pooling, not mean pooling
- Embedding dimensions: 2048
- Contributions and bug reports welcome at ht-llama.cpp
Citations
LCO-Embedding
@article{xiao2025scaling,
title={Scaling Language-Centric Omnimodal Representation Learning},
author={Xiao, Chenghao and Chan, Hou Pong and Zhang, Hao and Xu, Weiwen and Aljunied, Mahani and Rong, Yu},
journal={arXiv preprint arXiv:2510.11693},
year={2025}
}
Qwen 2.5 Omni
@article{Qwen2.5-Omni,
title={Qwen2.5-Omni Technical Report},
author={Jin Xu and Zhifang Guo and Jinzheng He and Hangrui Hu and Ting He and Shuai Bai and Keqin Chen and Jialin Wang and Yang Fan and Kai Dang and Bin Zhang and Xiong Wang and Yunfei Chu and Junyang Lin},
journal={arXiv preprint arXiv:2503.20215},
year={2025}
}