Jina-Clip-v2

This version of jina-clip-v2 has been converted to run on the Axera NPU using w8a16 quantization. It is compatible with Pulsar2 version 4.2.

To learn how to convert the MobileCLIP2 model into an axmodel that runs on an Axera NPU board, please read this link in detail.

Supported Platform

AX650

On-device inference time

Stage	Time
image encoder	592.231 ms
text encoder	15.482 ms

Note: The image encoder takes input images at a resolution of 512 x 512.
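
As a rough guide to what these timings mean in practice, the arithmetic below estimates the latency of scoring one image against several captions. It is a sketch under two assumptions that this card does not state: the encoders run one after the other, and preprocessing, model loading, and the similarity computation are ignored. The function name `pair_latency_ms` is hypothetical.

```python
# Latencies reported in the table above, in milliseconds.
IMAGE_ENCODER_MS = 592.231
TEXT_ENCODER_MS = 15.482


def pair_latency_ms(num_texts: int = 1) -> float:
    """Estimated latency for one image scored against num_texts captions.

    Assumes sequential execution: one image-encoder pass followed by
    one text-encoder pass per caption.
    """
    return IMAGE_ENCODER_MS + num_texts * TEXT_ENCODER_MS


if __name__ == "__main__":
    print(f"1 image + 1 text:  {pair_latency_ms(1):.3f} ms")
    print(f"1 image + 3 texts: {pair_latency_ms(3):.3f} ms")
    print(f"image encoder alone: {1000.0 / IMAGE_ENCODER_MS:.2f} images/s")
```

The image encoder dominates the total, so adding more captions per image costs comparatively little.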

How to use

Download all files from this repository to the device

Run the following command:

python3 run_axmodel.py -i beach1.jpg -t "beautiful sunset over the beach" -iax ./image_encoder.axmodel -tax ./text_encoder.axmodel --hf_path ./jina-clip-v2
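
To score one image against several captions, the command above can be repeated once per caption. The sketch below builds those command lines using only the flags shown above. The helper `build_command` is hypothetical, and whether `run_axmodel.py` accepts more than one `-t` value per call is not stated here, so the sketch assumes one caption per invocation.

```python
import shlex


def build_command(image, text,
                  image_model="./image_encoder.axmodel",
                  text_model="./text_encoder.axmodel",
                  hf_path="./jina-clip-v2"):
    """Return the argument list for one run_axmodel.py invocation."""
    return [
        "python3", "run_axmodel.py",
        "-i", image,
        "-t", text,
        "-iax", image_model,
        "-tax", text_model,
        "--hf_path", hf_path,
    ]


if __name__ == "__main__":
    captions = [
        "beautiful sunset over the beach",
        "一群人在沙滩上散步",
    ]
    for caption in captions:
        cmd = build_command("beach1.jpg", caption)
        # Print a shell-safe version of the command for inspection.
        print(shlex.join(cmd))
        # On the device, run it with:
        #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with captions that contain spaces or non-ASCII text.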

Example model inputs and outputs:

Input image:
Text descriptions of the image content:

"beautiful sunset over the beach" or "蓝蓝的天空和海面，在夕阳的照射下，显得非常美丽" (the blue sky and the sea look very beautiful in the light of the setting sun) or "一群人在沙滩上散步" (a group of people walking on the beach)
Similarity between the image encoder output and the text encoder output, in the same order:

0.3373627 or 0.34764117 or 0.12660183
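
CLIP-style models usually compare embeddings with cosine similarity, and the scores above are consistent with that, although this card does not confirm how `run_axmodel.py` computes them. The sketch below shows the computation on plain Python lists and then ranks the three captions by the scores reported above. The toy vectors are illustrative only, not real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Scores reported above for the example image, keyed by caption.
reported = {
    "beautiful sunset over the beach": 0.3373627,
    "蓝蓝的天空和海面，在夕阳的照射下，显得非常美丽": 0.34764117,
    "一群人在沙滩上散步": 0.12660183,
}

# The best-matching caption is the one with the highest similarity.
best_caption = max(reported, key=reported.get)

if __name__ == "__main__":
    # Toy vectors: identical directions score 1.0, orthogonal ones 0.0.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
    print("best match:", best_caption)
```

The two sunset descriptions score close to each other despite being in different languages, while the unrelated caption scores much lower.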

