Jina-Clip-v2

This version of jina-clip-v2 has been converted to run on the Axera NPU using w8a16 quantization. It is compatible with Pulsar2 version 4.2.

To learn how to convert the MobileCLIP2 model into an axmodel that runs on an Axera NPU board, see this link.

Supported Platforms

  • AX650

On-board inference time

Stage            Time
image encoder    592.231 ms
text encoder     15.482 ms

Note: the image encoder takes input images at a resolution of 512 x 512.
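The table gives per-call encoder latency, so a rough NPU time budget for one query can be estimated by adding up the calls it needs. Below is a minimal Python sketch. It assumes encoder calls run one after another with batch size 1 and that the measured times exclude image preprocessing, tokenization, model loading, and the similarity step. The card does not say this explicitly. The helper names `pipeline_latency_ms` and `throughput_per_s` are hypothetical.

```python
# Per-call latencies copied from the table above (milliseconds, AX650).
IMAGE_ENCODER_MS = 592.231
TEXT_ENCODER_MS = 15.482


def pipeline_latency_ms(num_images: int, num_texts: int) -> float:
    """Estimated NPU time to encode the given numbers of images and texts.

    Assumption: calls run sequentially with batch size 1; preprocessing,
    model loading, and similarity computation are not counted.
    """
    return num_images * IMAGE_ENCODER_MS + num_texts * TEXT_ENCODER_MS


def throughput_per_s(latency_ms: float) -> float:
    """Calls per second implied by a per-call latency in milliseconds."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # One image scored against three descriptions, matching the usage example.
    print(f"{pipeline_latency_ms(1, 3):.3f} ms")  # 638.677 ms
    print(f"{throughput_per_s(IMAGE_ENCODER_MS):.2f} images/s")
    print(f"{throughput_per_s(TEXT_ENCODER_MS):.2f} texts/s")
```

Image encoding dominates. Adding more candidate descriptions for the same image costs only about 15 ms each.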

How to use

Download all files from this repository to the device.

Run the following command:

python3 run_axmodel.py -i beach1.jpg -t "beautiful sunset over the beach" -iax ./image_encoder.axmodel -tax ./text_encoder.axmodel --hf_path ./jina-clip-v2
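The command above passes a single description with `-t`. The card does not say whether the script accepts several, so one way to score multiple descriptions is to invoke it once per description. Below is a minimal Python sketch that only uses the flags shown in the command. `build_command` and `run_for_captions` are hypothetical helpers, and the paths are copied from the example.

```python
from __future__ import annotations

import subprocess

# Paths and flags copied from the example command; adjust to your layout.
SCRIPT = "run_axmodel.py"
IMAGE_AXMODEL = "./image_encoder.axmodel"
TEXT_AXMODEL = "./text_encoder.axmodel"
HF_PATH = "./jina-clip-v2"


def build_command(image: str, text: str) -> list[str]:
    """Return the argv list for one run_axmodel.py invocation."""
    return [
        "python3", SCRIPT,
        "-i", image,
        "-t", text,
        "-iax", IMAGE_AXMODEL,
        "-tax", TEXT_AXMODEL,
        "--hf_path", HF_PATH,
    ]


def run_for_captions(image: str, captions: list[str]) -> None:
    """Run the script once per caption; raises CalledProcessError on failure."""
    for caption in captions:
        # Passing argv as a list (no shell) avoids quoting problems with
        # spaces and non-ASCII text such as the Chinese example descriptions.
        subprocess.run(build_command(image, caption), check=True)
```

For example, on the board: `run_for_captions("beach1.jpg", ["beautiful sunset over the beach", "一群人在沙滩上散步"])`. Each call starts a new process and reloads both models. This keeps the sketch simple but repeats the image encoding every time.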

Model input and output examples are as follows:

  1. The input image:

  2. Text descriptions of the image content (one per run):

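    "beautiful sunset over the beach" or "蓝蓝的天空和海面,在夕阳的照射下,显得非常美丽" ("the blue sky and sea look beautiful in the light of the setting sun") or "一群人在沙滩上散步" ("a group of people walking on the beach")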

  3. The similarity between the image encoder output and the text encoder output for each description, respectively:

    0.3373627 or 0.34764117 or 0.12660183
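The card does not state how `run_axmodel.py` computes the score. CLIP-style models usually report the cosine similarity of the two embeddings, meaning the dot product of the L2-normalized image and text vectors. Below is a minimal pure-Python sketch under that assumption. `l2_normalize` and `cosine_similarity` are illustrative helpers, not functions taken from the repository.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Dot product of the two L2-normalized embeddings; result lies in [-1, 1]."""
    if len(image_emb) != len(text_emb):
        raise ValueError("embedding dimensions differ")
    a = l2_normalize(image_emb)
    b = l2_normalize(text_emb)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings come from the two encoders.
    print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 1.0, 0.0]))  # ≈ 0.7071
```

Under this assumption, the scores above mean the first two descriptions match the image about equally well (≈0.34), and the third matches it noticeably less (≈0.13).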

