thomasrenault/topic

A multi-label political topic classifier fine-tuned on US tweets, campaign speeches, and congressional speeches. It is built on distilbert-base-uncased, with training labels produced by GPT-4o-mini via the OpenAI Batch API.
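Annotation with the OpenAI Batch API works by uploading a JSONL file of chat-completion requests, one per document. A minimal sketch of building one request line — the envelope format (`custom_id`, `method`, `url`, `body`) is the documented Batch API schema, but the prompt text and ID scheme here are illustrative assumptions, not the ones used for this model:

```python
import json

TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]

def batch_request(doc_id: str, text: str) -> str:
    """Build one JSONL line for the OpenAI Batch API (/v1/chat/completions)."""
    body = {
        "model": "gpt-4o-mini",
        "temperature": 0,  # deterministic labels, as stated in the model card
        "messages": [
            # Hypothetical prompt: the actual annotation prompt is not published.
            {"role": "system",
             "content": "Label the text with any of: " + ", ".join(TOPICS)
                        + ". Reply with a JSON list of matching topics."},
            {"role": "user", "content": text},
        ],
    }
    return json.dumps({"custom_id": doc_id,
                       "method": "POST",
                       "url": "/v1/chat/completions",
                       "body": body})

line = batch_request("doc-0001", "We must secure the border.")
```

Each such line becomes one row of the uploaded batch file; results come back keyed by `custom_id`.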

Labels

The model predicts 7 independent topic indicators (sigmoid, threshold 0.5).
A document can belong to zero or multiple topics simultaneously.

| Label | Description |
|-------|-------------|
| abortion | Abortion rights and reproductive policy |
| democracy | Elections, voting rights, democratic institutions |
| gender equality | Gender rights, feminism, LGBTQ+ issues |
| gun control | Firearms regulation, Second Amendment |
| immigration | Immigration policy, border control, citizenship |
| tax and inequality | Tax policy, economic inequality, redistribution |
| trade | Trade policy, tariffs, international commerce |

Documents that match none of the seven labels are implicitly classified as "other topic".
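For multi-label training, each document's topic list is typically encoded as a multi-hot float vector in the label order above. A minimal sketch — the exact preprocessing for this model is not documented, so this is an assumption:

```python
TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]

def to_multi_hot(doc_topics):
    """Encode a list of topic names as a 7-dim multi-hot float vector."""
    present = set(doc_topics)
    return [1.0 if t in present else 0.0 for t in TOPICS]

vec = to_multi_hot(["immigration", "trade"])
# A document with no matching topic ("other topic") is the all-zero vector.
empty = to_multi_hot([])
```

The all-zero case is why "other topic" needs no column of its own: it is simply the absence of every label.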

Training

| Setting | Value |
|---------|-------|
| Base model | distilbert-base-uncased |
| Architecture | DistilBertForSequenceClassification (multi-label) |
| Problem type | multi_label_classification |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Classification threshold | 0.5 |
| Domain | US tweets about policy, campaign speeches, and congressional floor speeches |
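With `problem_type` set to `multi_label_classification`, Transformers trains the classification head with `BCEWithLogitsLoss`: an independent sigmoid cross-entropy per label, averaged over labels. A pure-Python sketch of that per-label loss (illustrative of the objective, not the actual training code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# One document: a confident logit for "immigration", low logits elsewhere.
loss = bce_with_logits([-3.0, -3.0, -3.0, -3.0, 4.0, -3.0, -3.0],
                       [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
```

Because each label has its own sigmoid, the probabilities need not sum to one — which is what lets a document carry zero, one, or several topics.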

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/topic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

TOPICS    = ["abortion", "democracy", "gender equality", "gun control",
             "immigration", "tax and inequality", "trade"]
THRESHOLD = 0.5

def predict(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # One independent sigmoid per label (multi-label), not a softmax.
        probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
    matched = [t for t, p in zip(TOPICS, probs) if p >= THRESHOLD]
    return matched or ["other topic"]

print(predict("We need stronger border security and immigration reform."))
# ["immigration"]

print(predict("Tax cuts for the wealthy only increase inequality in America."))
# ["tax and inequality"]
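The same thresholding extends row-wise to batches. A sketch of the decode step, assuming `batch_probs` is a list of per-document probability lists (e.g. from `torch.sigmoid(logits).tolist()` on a padded batch):

```python
TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]
THRESHOLD = 0.5

def decode_batch(batch_probs):
    """Map each row of per-label probabilities to its topic labels."""
    out = []
    for probs in batch_probs:
        matched = [t for t, p in zip(TOPICS, probs) if p >= THRESHOLD]
        out.append(matched or ["other topic"])
    return out

labels = decode_batch([
    [0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1],  # clears threshold on "immigration"
    [0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],  # nothing above threshold
])
```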

Intended Use

  • Academic research on political agenda-setting and issue salience
  • Topic trend analysis across congressional speeches and social media
  • Cross-platform comparison of elite vs. citizen political communication

Limitations

  • Trained on US English political text — may not generalise to other political systems or languages
  • Annotation by GPT-4o-mini introduces model-specific biases in topic boundaries
  • Topics reflect the specific research agenda of the parent project; other salient topics (healthcare, climate, etc.) are out of scope

Citation

If you use this model, please cite:

@misc{renault2025topic,
  author    = {Renault, Thomas},
  title     = {thomasrenault/topic: Multi-label political topic classifier for US political text},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/thomasrenault/topic}
}
Model size: 67M parameters (F32, Safetensors)