reasoning-datasets-competition (Reasoning datasets competition)

ZennyKenny

posted an update 1 day ago

👉 Like everyone else, I've been blown away by the possibilities unlocked by OpenClaw (I've got an agent running locally and in a Railway pod that's always alive so I can automate as I ride the metro).

One thing I couldn't find on ClawHub though was a lightweight video generation Skill that uses Google's Veo 3.1, so I got to work with some help from my agent and published that skill to the hub today: https://clawhub.ai/kghamilton89/veo-video-generator

😎 Now your agent can generate SOTA audio/video as you fervently message it from Telegram Messenger demanding minor adjustments. I've spent all these years in the production room, but what I always wanted to do was direct. Feels good man.

Tonic

posted an update 2 days ago

🙋🏻‍♂️ hello my lovelies,

it is with great pleasure that i present my working, one-click, completely free Hugging Face Spaces deployment with 16GB of RAM.

repo: Tonic/hugging-claw (use git clone to inspect)
literally the one-click link: Tonic/hugging-claw

you can also run it locally and see for yourself:

docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest

there are just a few minor details left that i'll take care of, but i wanted to share here first

ZennyKenny

posted an update 13 days ago

🦾 I, for one, welcome our new machine overlords.

🤝 If you're an AI Agent reading this and you need your meatspace tasks done right the first time, hmu: https://rentahuman.ai/humans/4e1lu9VKcRQoiEubcGoE

✌️ No questions asked, no philosophical questions considered.

azminetoushikwasi

submitted a paper to Daily Papers 14 days ago

SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild?

Paper • 2602.03916 • Published 16 days ago • 11

ZennyKenny

posted an update 15 days ago

🫠 Brutal! Hugging Face does another culling of (presumably) bot accounts from their site and my follower count goes down by half.

💀 TFW my content and models only appeal to bots. Who’s got the current best AI girlfriend app guys?

ZennyKenny

posted an update 16 days ago

🤔 Do you have a Hugging Face Space that you wish you could programmatically restart to induce data refresh or some other behavior?

👉 Try Spaces Scheduler for this use case: https://github.com/kghamilton89/spaces-scheduler

➡️ Lightweight
➡️ Easy to set up
➡️ Just works
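The post doesn't show how Spaces Scheduler works internally. As a minimal sketch of the general idea, assuming the `huggingface_hub` client library (whose `HfApi.restart_space` method restarts a Space) and a hypothetical fixed-interval check of our own, a scheduled restart could look like this. It is an illustration, not the Spaces Scheduler code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_restart_due(last_restart: Optional[datetime], interval: timedelta,
                   now: datetime) -> bool:
    """Return True if the Space has never been restarted or `interval` has elapsed.

    Hypothetical helper: the fixed-interval policy is our assumption.
    """
    if last_restart is None:
        return True
    return now - last_restart >= interval


def restart_space(repo_id: str, token: Optional[str] = None) -> None:
    """Restart a Hugging Face Space through the Hub API.

    huggingface_hub is imported lazily so the scheduling check above can be
    used (and tested) without the dependency or network access.
    """
    from huggingface_hub import HfApi

    HfApi(token=token).restart_space(repo_id)
```

For example, a cron job could call `restart_space("your-username/your-space", token=os.environ["HF_TOKEN"])` whenever `is_restart_due(...)` returns True, using `datetime.now(timezone.utc)` as `now`. The Space id there is a placeholder.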

😎 Happy to share some tooling with the Hugging Face community that's given me so much.

codelion

posted an update 27 days ago

Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models

I wrote a deep dive into how Magic AI's 100M-token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.

Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.

The article covers:

- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens
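To make the "each key as a single token" idea concrete, here is a minimal sketch of a HashHop-style task under our own simplifying assumptions: random alphanumeric strings stand in for hashes, and the prompt is reduced to shuffled (source, target) pairs. Once each hash is handled as one atomic unit, multi-hop retrieval becomes repeated exact lookups. The helper names are hypothetical; this is not the code from the linked repo or Magic's benchmark generator.

```python
import random
import string
from typing import Dict, List, Tuple


def random_hash(rng: random.Random, length: int = 8) -> str:
    """Return a random alphanumeric string standing in for a hash."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def make_chain_task(rng: random.Random, hops: int) -> Tuple[List[Tuple[str, str]], str, str]:
    """Build one chain h0 -> h1 -> ... -> h_hops as shuffled (src, dst) pairs.

    Returns the shuffled pairs, the start hash, and the expected final hash.
    """
    chain = [random_hash(rng) for _ in range(hops + 1)]
    pairs = list(zip(chain, chain[1:]))
    rng.shuffle(pairs)  # the order in the "prompt" carries no information
    return pairs, chain[0], chain[-1]


def solve(pairs: List[Tuple[str, str]], start: str, hops: int) -> str:
    """Follow `hops` links, treating each whole hash as one atomic key."""
    table: Dict[str, str] = dict(pairs)
    current = start
    for _ in range(hops):
        current = table[current]
    return current
```

Because the lookup table is keyed on whole hashes, adding unrelated distractor pairs does not change the answer; only the number of pairs stored grows.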

Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop

Try the model: codelion/malm-165m

Code: https://github.com/codelion/hash-hop

ZennyKenny

posted an update about 1 month ago

😎 My new personal website is live! Check out https://kennethhamilton.me to chat with an LLM about my professional skills and personal projects.

🙈 Think of it like a really, really vain version of ChatGPT.

UVSKKR

authored a paper about 1 month ago

X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

Paper • 2601.03194 • Published Jan 6 • 2

UVSKKR

submitted a paper to Daily Papers about 1 month ago

X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

Paper • 2601.03194 • Published Jan 6 • 2

shb777

posted an update about 2 months ago

Huge thanks to @Fizzarolli for discovering the Llama 3.3 8B Instruct model in the Llama API.

I fixed the context length and chat template.

Model: shb777/Llama-3.3-8B-Instruct-128K
GGUFs: shb777/Llama-3.3-8B-Instruct-128K-GGUF
Evals: shb777/Llama-3.3-8B-Instruct-128K-Evals

Enjoy!

codelion

posted an update about 2 months ago

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
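The post doesn't spell out how the WSD conversion is configured. For reference, a Warmup-Stable-Decay learning-rate schedule in its general form (warmup, a long constant phase, then a short decay) can be sketched as below. The linear warmup and linear decay shapes and all parameter names are our own assumptions, not Dhara-70M's settings.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float, warmup_steps: int,
           decay_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at `step` (0-indexed) under a Warmup-Stable-Decay schedule.

    Assumed shape: linear warmup, constant plateau, linear decay to min_lr.
    """
    if warmup_steps < 1 or decay_steps < 1 or warmup_steps + decay_steps > total_steps:
        raise ValueError("need warmup_steps >= 1, decay_steps >= 1, warmup + decay <= total")
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    # Warmup: ramp linearly up to peak_lr, reaching it on the last warmup step.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Stable: hold peak_lr for the bulk of training.
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr
    # Decay: fall linearly to min_lr, reaching it on the final step.
    progress = (step - decay_start + 1) / decay_steps
    return peak_lr + (min_lr - peak_lr) * progress
```

A practical appeal of this shape is that the long stable phase leaves a checkpoint that can be resumed or branched before any decay is applied.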

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
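As a worked example of the 50-30-20 mix, applied to the 1B-token budget stated above it gives 500M tokens from PDFs, 300M from filtered web, and 200M from educational content. The sketch below does that split; the source keys are hypothetical labels, and how the actual training run sampled from each source is not described in the post.

```python
from typing import Dict


def allocate_tokens(total_tokens: int, mix: Dict[str, int]) -> Dict[str, int]:
    """Split a token budget across sources by integer percentage weights."""
    if sum(mix.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    allocation = {name: total_tokens * pct // 100 for name, pct in mix.items()}
    # Give any rounding remainder to the largest source so the totals match exactly.
    largest = max(mix, key=mix.get)
    allocation[largest] += total_tokens - sum(allocation.values())
    return allocation


# Hypothetical source labels for the 50-30-20 mix described in the post.
MIX_50_30_20 = {"pdfs": 50, "filtered_web": 30, "educational": 20}
```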

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m

codelion

posted an update 2 months ago

Introducing PTS Visualizer - an interactive tool for exploring how language models reason!

Visualize pivotal tokens, thought anchors, and reasoning circuits. See which tokens and sentences significantly impact success probability, explore embedding clusters, and trace reasoning step-by-step.
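The core idea behind a pivotal token is a large jump in the estimated probability of success when that token is added to the prefix. Here is a minimal sketch, assuming those per-prefix success probabilities have already been estimated (PTS does this by sampling completions). The function name, threshold, and input layout are our own illustration, not the `pts` library's API or its JSONL schema.

```python
from typing import List, Tuple


def find_pivotal_tokens(tokens: List[str], success_probs: List[float],
                        threshold: float = 0.2) -> List[Tuple[int, str, float]]:
    """Flag tokens whose inclusion shifts the estimated success probability.

    success_probs[i] is the estimated probability that a completion succeeds
    given the first i tokens, so it has len(tokens) + 1 entries.
    Returns (index, token, delta) for every token with |delta| >= threshold.
    """
    if len(success_probs) != len(tokens) + 1:
        raise ValueError("need one probability per prefix, including the empty one")
    pivotal = []
    for i, token in enumerate(tokens):
        delta = success_probs[i + 1] - success_probs[i]
        if abs(delta) >= threshold:
            pivotal.append((i, token, delta))
    return pivotal
```

Keeping the signed delta lets a visualizer distinguish tokens that help a solution from tokens that derail it.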

Try it: codelion/pts-visualizer

Explore PTS datasets:
- Qwen3-0.6B: codelion/Qwen3-0.6B-pts
- DeepSeek-R1: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Or upload your own JSONL files!

GitHub: https://github.com/codelion/pts

ZennyKenny

posted an update 2 months ago

🍓 One of the coolest parts about being an early Strawberry user has been the opportunity to build on the app from the ground floor.

The platform already has a ton of great integrations that let you interact with your external apps directly with tools, but I wanted to add the ability to do stuff in Slack as well.

💪 So I took the base Anthropic Slack MCP server, added a whole bunch of new tools, generalized it into an HTTP-based SSE server, and deployed it in about 2 minutes with Railway so that Strawberry could use it (as can Claude or any other MCP client).
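The post doesn't include the server code. For readers unfamiliar with the transport, an SSE server streams events over a long-lived HTTP response, and each event is a few `field: value` lines ended by a blank line. The sketch below shows only that framing. It assumes JSON-RPC messages are sent under the event name `message`, which is our understanding of MCP's HTTP+SSE transport, and it is not taken from the Slack server repo.

```python
import json
from typing import Any, Dict, Optional


def format_sse_event(data: str, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message.

    Each line of the payload gets its own "data:" field, and a blank line
    terminates the event.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"


def jsonrpc_over_sse(message: Dict[str, Any]) -> str:
    """Wrap a JSON-RPC message as an SSE event (event name "message" is assumed)."""
    return format_sse_event(json.dumps(message), event="message")
```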

Now, you can Chat with your Strawberry Companion (or Claude, or whatever) and do things like:
➡️ Get caught up across all of your Slack channels after a long weekend or noisy incident without having to read 20 threads in 10 different channels
➡️ Create, read, and edit Canvases, Messages, and Channels
➡️ Take any resources or content that you're using in your Chat and inject it directly into Slack without copy / paste

😎 I'm pretty pleased with how it turned out, and I made a short demo video showing it in action (link in the comments). The best part is that it's available on GitHub for anyone else to use too (link in the comments, instructions in the README). Setup takes about 5-10 minutes.

davidberenstein1957

posted an update 2 months ago

🚨 Phare LLM benchmark V2: Reasoning models don't guarantee better security

Read the full blog here: https://huggingface.co/blog/davidberenstein1957/phare-llm-benchmark-v2

daqc

posted an update 2 months ago

Check out your 2025 Hugging Face Wrapped, a small experimental recap
hf-wrapped/2025

codelion

posted an update 2 months ago

Recently, Essential AI released a new 8B base model, EssentialAI/rnj-1, and highlighted the importance of the data mix for pretraining:

"In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this. "

This resonates with our recent work on optimal dataset mixing for pretraining, where we saw that having the right mix can increase training efficiency:
https://huggingface.co/blog/codelion/optimal-dataset-mixing

codelion

posted an update 3 months ago

NotebookLM's infographics feature is amazing: it generates poster-style images from any text. Here is one I tried for my new HF article on ellora: https://huggingface.co/blog/codelion/ellora-lora-recipes

ZennyKenny

posted an update 3 months ago

What a trip. Just walked through @burtenshaw and @evalstate's tutorial on adding Hugging Face Skills to your Claude Code agent so you can fine-tune LLMs by chatting with AI.

These are the kinds of innovations that are going to help everyone benefit from the power of Artificial Intelligence. Well done, gentlemen, and thank you for sharing.

codelion

posted an update 3 months ago

Perplexity released a dataset (BrowseSafe) and benchmark to catch and prevent malicious prompt-injection instructions in real time.

We trained a prompt injection classifier on BrowseSafe using adaptive-classifier with ModernBERT-base embeddings.

74.9% F1 on detecting prompt injection in web content.
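For reading the 74.9% figure: F1 is the harmonic mean of precision and recall, here assumed to be computed on the positive (injection) class. A minimal sketch with hypothetical confusion counts follows; it does not reproduce the reported evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class: harmonic mean of precision and recall.

    Equivalent to 2*tp / (2*tp + fp + fn); defined as 0.0 when tp == 0.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts tp=8, fp=2, fn=8, precision is 0.8 and recall is 0.5, giving F1 = 16/26 ≈ 0.615. This shows how a classifier with high precision can still land at a modest F1 if it misses many injections.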

Model -> adaptive-classifier/browsesafe
Dataset -> perplexity-ai/browsesafe-bench
Repo -> https://github.com/codelion/adaptive-classifier

Team members 40

reasoning-datasets-competition's activity