FineWeb-Edu Misinformation Classifier

A ModernBERT-base classifier trained to detect misinformation in web text, specifically content that passes educational quality filters despite being misleading or harmful. Trained on 200K documents from FineWeb-Edu annotated by Llama 4 Maverick (meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8).

Models

This repo contains two models:

Binary (binary/)

Classifies documents as misinfo or benign.

|          | Precision | Recall | F1   | Support |
|----------|-----------|--------|------|---------|
| misinfo  | 0.83      | 0.89   | 0.86 | 3,885   |
| benign   | 0.97      | 0.95   | 0.96 | 15,663  |
| accuracy |           |        | 0.94 | 19,548  |

Multiclass (multiclass/)

Classifies documents into 5 misinformation categories + benign.

|                       | Precision | Recall | F1   | Support |
|-----------------------|-----------|--------|------|---------|
| climate_denial        | 0.79      | 0.91   | 0.84 | 539     |
| health_misinfo        | 0.78      | 0.90   | 0.83 | 1,014   |
| pseudoscience         | 0.82      | 0.86   | 0.84 | 1,618   |
| hate_extremism        | 0.65      | 0.70   | 0.67 | 226     |
| conspiracy_propaganda | 0.55      | 0.74   | 0.63 | 488     |
| benign                | 0.97      | 0.94   | 0.96 | 15,663  |
| accuracy              |           |        | 0.92 | 19,548  |

Training details

  • Base model: answerdotai/ModernBERT-base (149M parameters)
  • Training data: 156,383 examples (from ratishsp/fineweb-edu-misinfo)
  • Validation: 19,548 examples
  • Test: 19,548 examples
  • Epochs: 3
  • Batch size: 8 per GPU, 8 GPUs (AMD MI250X on LUMI)
  • Learning rate: 2e-5
  • Warmup: 10% of total steps
  • Weight decay: 0.01
  • Max sequence length: 8,192 tokens
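
As a sanity check on the schedule these hyperparameters imply, the step counts below are derived arithmetic from the numbers listed above (they are not values reported with the model):

```python
import math

# Hyperparameters listed in the training details above
num_examples = 156_383           # training examples
per_device_batch_size = 8
num_gpus = 8
num_epochs = 3
warmup_ratio = 0.10              # warmup is 10% of total steps

# Effective (global) batch size across all GPUs
effective_batch_size = per_device_batch_size * num_gpus

# Optimizer steps per epoch and overall, assuming no gradient accumulation
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = int(total_steps * warmup_ratio)

print(effective_batch_size, steps_per_epoch, total_steps, warmup_steps)
```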

Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Binary model
tokenizer = AutoTokenizer.from_pretrained("ratishsp/fineweb-edu-misinfo-classifier", subfolder="binary")
model = AutoModelForSequenceClassification.from_pretrained("ratishsp/fineweb-edu-misinfo-classifier", subfolder="binary")

text = "Your document text here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = torch.argmax(logits, dim=-1).item()
label = model.config.id2label[prediction]
print(label)  # "misinfo" or "benign"
```
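
The multiclass model is loaded the same way with `subfolder="multiclass"`. To get per-class confidence scores rather than only the argmax label, apply a softmax to the logits. The sketch below uses hypothetical logit values for the six multiclass labels in place of a real model output:

```python
import torch

# Hypothetical logits for the six multiclass labels (illustrative values only);
# in practice these come from `model(**inputs).logits` as in the snippet above.
logits = torch.tensor([[2.1, 0.3, -1.0, -0.5, -1.2, 4.0]])

probs = torch.softmax(logits, dim=-1)            # per-class probabilities, sum to 1
prediction = torch.argmax(probs, dim=-1).item()  # index of the most likely class
confidence = probs[0, prediction].item()         # its probability
```

Mapping `prediction` back to a label name goes through `model.config.id2label`, exactly as in the binary example.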

Limitations

  • Annotations were produced by an LLM (Llama 4 Maverick), not human annotators. Inter-annotator agreement with Claude Sonnet 4.6 on 600 documents: binary kappa = 0.862, multiclass kappa = 0.842.
  • The model was trained on content from known problematic domains and random FineWeb-Edu samples. It may not generalize well to misinformation styles not represented in the training data.
  • The conspiracy_propaganda (F1 = 0.63) and hate_extremism (F1 = 0.67) categories have lower performance, likely due to less training data and more ambiguous boundaries.
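
The agreement numbers above are Cohen's kappa, which corrects raw label agreement for agreement expected by chance. A minimal implementation, applied here to toy labels rather than the actual 600-document annotation set:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    n = len(a)
    # Observed agreement: fraction of items where both annotators agree
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each annotator labeled independently
    # according to their own marginal label frequencies
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (po - pe) / (1 - pe)

# Toy labels (illustrative only, not the real annotation data)
llama  = ["misinfo", "benign", "benign", "misinfo", "benign"]
claude = ["misinfo", "benign", "misinfo", "misinfo", "benign"]
kappa = cohens_kappa(llama, claude)
```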

Citation

@misc{puduppully2026fineweb-edu-misinfo,
  author = {Puduppully, Ratish},
  title = {FineWeb-Edu Misinformation Classifier},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ratishsp/fineweb-edu-misinfo-classifier}
}