# thomasrenault/topic

A multi-label political topic classifier fine-tuned on US tweets, campaign speeches, and congressional speeches. Built on `distilbert-base-uncased` with GPT-4o-mini annotations produced via the OpenAI Batch API.
## Labels

The model predicts 7 independent topic indicators (sigmoid outputs, threshold 0.5). A document can belong to zero, one, or multiple topics simultaneously.
| Label | Description |
|---|---|
| abortion | Abortion rights and reproductive policy |
| democracy | Elections, voting rights, democratic institutions |
| gender equality | Gender rights, feminism, LGBTQ+ issues |
| gun control | Firearms regulation, Second Amendment |
| immigration | Immigration policy, border control, citizenship |
| tax and inequality | Tax policy, economic inequality, redistribution |
| trade | Trade policy, tariffs, international commerce |
Documents that match none of the above are implicitly classified as *other topic*.
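As a hedged illustration of the label scheme (the names `TOPICS` and `multi_hot` below are ours, not from the released training code), "zero or multiple topics" amounts to a 7-slot multi-hot vector per document:

```python
# Illustrative sketch: the 7 labels form a multi-hot target vector,
# one slot per row of the table above.
TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]

def multi_hot(active):
    """Map a set of topic names to a 0/1 vector, one slot per label."""
    return [1 if t in active else 0 for t in TOPICS]

# A speech about border policy and tariffs belongs to two topics at once.
print(multi_hot({"immigration", "trade"}))  # [0, 0, 0, 0, 1, 0, 1]
# No match at all -> the all-zeros vector, i.e. the implicit "other topic".
print(multi_hot(set()))                     # [0, 0, 0, 0, 0, 0, 0]
```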
## Training
| Setting | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Architecture | DistilBertForSequenceClassification (multi-label) |
| Problem type | multi_label_classification |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Classification threshold | 0.5 |
| Domain | US tweets about policy, campaign speeches and congressional floor speeches |
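With `problem_type="multi_label_classification"`, `DistilBertForSequenceClassification` optimises `BCEWithLogitsLoss`: an independent sigmoid cross-entropy per label rather than a softmax over the 7 topics. A minimal sketch of that objective (the logits and targets here are dummy values for illustration, not from the actual training run):

```python
import torch
from torch.nn import BCEWithLogitsLoss

# Dummy values for one document labelled abortion + immigration (multi-hot).
logits = torch.tensor([[2.0, -1.5, -3.0, -2.0, 1.0, -0.5, -2.5]])
labels = torch.tensor([[1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]])

# Mean binary cross-entropy across the 7 independent label slots.
loss = BCEWithLogitsLoss()(logits, labels)
print(round(loss.item(), 4))  # ~0.1957
```

Because each label gets its own binary loss term, the model can raise the probability of several topics at once without them competing for probability mass.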
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/topic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]
THRESHOLD = 0.5

def predict(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
    matched = [t for t, p in zip(TOPICS, probs) if p >= THRESHOLD]
    return matched or ["other topic"]

print(predict("We need stronger border security and immigration reform."))
# ["immigration"]

print(predict("Tax cuts for the wealthy only increase inequality in America."))
# ["tax and inequality"]
```
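For batched inference, the same threshold-or-fallback rule vectorises over a `(batch, 7)` logit tensor. A sketch under that assumption (`labels_from_logits` is our helper name, and the logits below are dummy values so this runs without downloading the model):

```python
import torch

TOPICS = ["abortion", "democracy", "gender equality", "gun control",
          "immigration", "tax and inequality", "trade"]
THRESHOLD = 0.5

def labels_from_logits(logits):
    """Map a (batch, 7) logit tensor to per-document label lists."""
    probs = torch.sigmoid(logits)
    results = []
    for row in probs:
        matched = [t for t, p in zip(TOPICS, row.tolist()) if p >= THRESHOLD]
        results.append(matched or ["other topic"])
    return results

# Dummy logits: doc 1 fires "immigration" (index 4), doc 2 fires nothing.
logits = torch.tensor([[-3.0, -3.0, -3.0, -3.0, 2.0, -3.0, -3.0],
                       [-3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0]])
print(labels_from_logits(logits))  # [['immigration'], ['other topic']]
```

With the real model, you would obtain the logits from a padded batch, e.g. `enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")` followed by `model(**enc).logits` under `torch.no_grad()`.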
## Intended Use
- Academic research on political agenda-setting and issue salience
- Topic trend analysis across congressional speeches and social media
- Cross-platform comparison of elite vs. citizen political communication
## Limitations
- Trained on US English political text — may not generalise to other political systems or languages
- Annotation by GPT-4o-mini introduces model-specific biases in topic boundaries
- Topics reflect the specific research agenda of the parent project; other salient topics (healthcare, climate, etc.) are out of scope
## Citation

If you use this model, please cite:
```bibtex
@misc{renault2025topic,
  author    = {Renault, Thomas},
  title     = {thomasrenault/topic: Multi-label political topic classifier for US political text},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/thomasrenault/topic}
}
```