thomasrenault/topic

A multi-label political topic classifier fine-tuned on US tweets, campaign speeches, and congressional speeches. It is built on distilbert-base-uncased, with training labels produced by GPT-4o-mini via the OpenAI Batch API.
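Annotation with the OpenAI Batch API works by uploading a JSONL file of chat-completion requests, one per document. A minimal sketch of building one request line — the envelope format (`custom_id`, `method`, `url`, `body`) is the documented Batch API schema, but the prompt text and ID scheme here are illustrative assumptions, not the ones used for this model:

```python
import json

TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]

def batch_request(doc_id: str, text: str) -> str:
    """Build one JSONL line for the OpenAI Batch API (/v1/chat/completions)."""
    body = {
        "model": "gpt-4o-mini",
        "temperature": 0,  # deterministic labels, as stated in the model card
        "messages": [
            # Hypothetical prompt: the actual annotation prompt is not published.
            {"role": "system",
             "content": "Label the text with any of: " + ", ".join(TOPICS)
                        + ". Reply with a JSON list of matching topics."},
            {"role": "user", "content": text},
        ],
    }
    return json.dumps({"custom_id": doc_id,
                       "method": "POST",
                       "url": "/v1/chat/completions",
                       "body": body})

line = batch_request("doc-0001", "We must secure the border.")
```

Each such line becomes one row of the uploaded batch file; results come back keyed by `custom_id`.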

Labels

The model predicts 7 independent topic indicators (sigmoid, threshold 0.5).
A document can belong to zero or multiple topics simultaneously.

| Label | Description |
|-------|-------------|
| abortion | Abortion rights and reproductive policy |
| democracy | Elections, voting rights, democratic institutions |
| gender equality | Gender rights, feminism, LGBTQ+ issues |
| gun control | Firearms regulation, Second Amendment |
| immigration | Immigration policy, border control, citizenship |
| tax and inequality | Tax policy, economic inequality, redistribution |
| trade | Trade policy, tariffs, international commerce |

Documents that match none of the seven labels are implicitly classified as "other topic".
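For multi-label training, each document's topic list is typically encoded as a multi-hot float vector in the label order above. A minimal sketch — the exact preprocessing for this model is not documented, so this is an assumption:

```python
TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]

def to_multi_hot(doc_topics):
    """Encode a list of topic names as a 7-dim multi-hot float vector."""
    present = set(doc_topics)
    return [1.0 if t in present else 0.0 for t in TOPICS]

vec = to_multi_hot(["immigration", "trade"])
# A document with no matching topic ("other topic") is the all-zero vector.
empty = to_multi_hot([])
```

The all-zero case is why "other topic" needs no column of its own: it is simply the absence of every label.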

Training

| Setting | Value |
|---------|-------|
| Base model | distilbert-base-uncased |
| Architecture | DistilBertForSequenceClassification (multi-label) |
| Problem type | multi_label_classification |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Classification threshold | 0.5 |
| Domain | US tweets about policy, campaign speeches, and congressional floor speeches |
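With `problem_type` set to `multi_label_classification`, Transformers trains the classification head with `BCEWithLogitsLoss`: an independent sigmoid cross-entropy per label, averaged over labels. A pure-Python sketch of that per-label loss (illustrative of the objective, not the actual training code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# One document: a confident logit for "immigration", low logits elsewhere.
loss = bce_with_logits([-3.0, -3.0, -3.0, -3.0, 4.0, -3.0, -3.0],
                       [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
```

Because each label has its own sigmoid, the probabilities need not sum to one — which is what lets a document carry zero, one, or several topics.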

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/topic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model     = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

TOPICS    = ["abortion", "democracy", "gender equality", "gun control",
             "immigration", "tax and inequality", "trade"]
THRESHOLD = 0.5

def predict(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # One independent sigmoid per label (multi-label), not a softmax.
        probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
    matched = [t for t, p in zip(TOPICS, probs) if p >= THRESHOLD]
    return matched or ["other topic"]

print(predict("We need stronger border security and immigration reform."))
# ["immigration"]

print(predict("Tax cuts for the wealthy only increase inequality in America."))
# ["tax and inequality"]
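The same thresholding extends row-wise to batches. A sketch of the decode step, assuming `batch_probs` is a list of per-document probability lists (e.g. from `torch.sigmoid(logits).tolist()` on a padded batch):

```python
TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]
THRESHOLD = 0.5

def decode_batch(batch_probs):
    """Map each row of per-label probabilities to its topic labels."""
    out = []
    for probs in batch_probs:
        matched = [t for t, p in zip(TOPICS, probs) if p >= THRESHOLD]
        out.append(matched or ["other topic"])
    return out

labels = decode_batch([
    [0.1, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1],  # clears threshold on "immigration"
    [0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],  # nothing above threshold
])
```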

Intended Use

  • Academic research on political agenda-setting and issue salience
  • Topic trend analysis across congressional speeches and social media
  • Cross-platform comparison of elite vs. citizen political communication

Limitations

  • Trained on US English political text — may not generalise to other political systems or languages
  • Annotation by GPT-4o-mini introduces model-specific biases in topic boundaries
  • Topics reflect the specific research agenda of the parent project; other salient topics (healthcare, climate, etc.) are out of scope

Citation

If you use this model, please cite:

@misc{renault2025topic,
  author    = {Renault, Thomas},
  title     = {thomasrenault/topic: Multi-label political topic classifier for US political text},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/thomasrenault/topic}
}
Model size: 67M parameters (F32, Safetensors)