Model Card for ResNet-50 Text Detector
This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~70k images: 50% contain text and 50% contain no legible text.
Model Details
How to Get Started with the Model
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned binary classifier.
model = AutoModelForImageClassification.from_pretrained(
    "miguelcarv/resnet-50-text-detector",
)
# Reuse the base ResNet-50 processor; resizing is done manually below.
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

# Fetch an example image and resize it to the 256x256 training resolution.
url = "http://images.cocodataset.org/train2017/000000044520.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((256, 256))

inputs = processor(image, return_tensors="pt").pixel_values
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(inputs)
logits_per_image = outputs.logits
probs = logits_per_image.softmax(dim=1)  # one probability per class
print(probs)
# tensor([[0.1149, 0.8851]])
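The snippet above stops at the class probabilities. As a minimal sketch of turning them into a yes/no decision, the helper below thresholds the probability of the "text" class. `TEXT_INDEX`, `has_text`, and the 0.5 threshold are hypothetical names and values chosen for illustration. In particular, which index means "contains text" is an assumption here and should be checked against `model.config.id2label` before relying on it.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# ASSUMPTION: index 1 is the "contains legible text" class.
# Verify with model.config.id2label; the model card does not state it.
TEXT_INDEX = 1

def has_text(probs, threshold=0.5):
    """Return True if the assumed text-class probability reaches the threshold."""
    return probs[TEXT_INDEX] >= threshold

# Probabilities taken from the example output above.
example_probs = [0.1149, 0.8851]
print(has_text(example_probs))
```

With real model output, `probs[0].tolist()` would give the list passed to `has_text`. Raising the threshold trades recall for precision if false positives are costly.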
Training Details
- Epochs: 3
- Resolution: 256x256
- Learning rate: 5e-5
- Optimizer: AdamW
- Batch size: 64
- Precision: FP32
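To give a rough sense of the training budget these settings imply, the sketch below collects them in a small config object and estimates the number of optimizer steps. `TrainingConfig` and `DATASET_SIZE` are hypothetical names. The figure of exactly 70,000 training images is an assumption: the card only says "~70k" in total and does not say how the data was split.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed in the model card."""
    epochs: int = 3
    resolution: int = 256          # images are 256x256
    learning_rate: float = 5e-5
    optimizer: str = "AdamW"
    batch_size: int = 64
    precision: str = "fp32"

# ASSUMPTION: all ~70k images were used for training, rounded to 70,000.
DATASET_SIZE = 70_000

config = TrainingConfig()
steps_per_epoch = math.ceil(DATASET_SIZE / config.batch_size)
total_steps = steps_per_epoch * config.epochs

print(steps_per_epoch)  # 1094
print(total_steps)      # 3282
```

Under these assumptions the run amounts to roughly 3.3k optimizer steps. The real count would be lower if part of the data was held out for evaluation.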