Improve model card for Vision-SR1

by nielsr HF Staff - opened Aug 31, 2025



This PR significantly improves the model card for Vision-SR1 by:

- Adding pipeline_tag: image-text-to-text to the metadata, which enhances discoverability at https://huggingface.co/models?pipeline_tag=image-text-to-text.
- Including library_name: transformers in the metadata, based on explicit evidence in config.json and the GitHub README, enabling the automated "Use in Transformers" widget.
- Incorporating the full paper abstract to provide a comprehensive overview of the model's methodology and capabilities.
- Linking to the official Hugging Face paper page: Self-Rewarding Vision-Language Model via Reasoning Decomposition.
- Providing a direct link to the official GitHub repository for code and further details: https://github.com/zli12321/Vision-SR1.
- Populating the main content with a detailed description adapted from the GitHub README, including information about the model, datasets, and related Hugging Face artifacts.
- Including the recommended citation for the codebase.
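
For reviewers unfamiliar with how the two metadata fields are stored: Hugging Face model cards carry metadata in a YAML block delimited by `---` lines at the top of README.md. The sketch below only illustrates what that header could look like with the two fields this PR adds. It assumes plain scalar values that need no YAML quoting, and the helper name `render_front_matter` is hypothetical (it is not part of `huggingface_hub` or `transformers`).

```python
# Minimal sketch (hypothetical helper, not a Hub API): render the
# '---'-delimited YAML front matter that a model card README.md starts with.
# Assumes every value is a plain scalar that needs no YAML quoting.


def render_front_matter(metadata: dict[str, str]) -> str:
    """Return a YAML front-matter block built from simple key/value pairs."""
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# The two metadata fields described in this PR.
card_metadata = {
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}

header = render_front_matter(card_metadata)
print(header)
```

A real model card would normally be edited with a YAML-aware tool rather than string formatting, so that values needing quoting or lists (such as `tags`) are serialized correctly.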

Please review and merge this PR if these changes align with your expectations.


