Paper: Long-CLIP: Unlocking the Long-Text Capability of CLIP (arXiv:2403.15378)
The original CLIP model accepts at most 77 input tokens, but its effective length is only about 20 tokens. See the original Long-CLIP paper for details. HunyuanVideo demo:
69 tokens, normal scene:
52 tokens, out-of-distribution (OOD) scene: better consistency and prompt following despite the OOD concept.
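To put the two demo prompt lengths in context, here is a minimal sketch that compares a token count against the two figures quoted above (the 77-token input limit and the roughly 20-token effective length). The function name `describe_prompt_length` and the constant names are hypothetical and chosen only for illustration; the token counts are assumed to come from whatever CLIP tokenizer you already use.

```python
# Figures quoted in this card for the original CLIP text encoder.
CLIP_MAX_TOKENS = 77        # hard input limit
CLIP_EFFECTIVE_TOKENS = 20  # approximate effective length (rough figure, not a hard cutoff)


def describe_prompt_length(n_tokens: int) -> str:
    """Classify a prompt's token count relative to the original CLIP limits.

    Hypothetical helper for illustration; it does no tokenization itself.
    """
    if n_tokens < 0:
        raise ValueError("token count must be non-negative")
    if n_tokens > CLIP_MAX_TOKENS:
        return "exceeds the 77-token input limit"
    if n_tokens > CLIP_EFFECTIVE_TOKENS:
        return "fits the input limit but exceeds the ~20-token effective length"
    return "within the ~20-token effective length"


if __name__ == "__main__":
    # The two demo prompts in this card are 69 and 52 tokens long.
    for n in (69, 52):
        print(f"{n} tokens: {describe_prompt_length(n)}")
```

Both demo prompts fall in the middle category: short enough for the original CLIP input window, yet well past the length it is said to use effectively, which is the range where a long-text model would be expected to help.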
Base model: BeichenZhang/LongCLIP-L