The moment we've been waiting for: ACE-Step dropped their new model, Ace-Step 1.5. Model: ACE-Step/Ace-Step1.5 And the best part? It's released under the MIT license. We've already started integrating it into our project. Let's go!
Think you know which AI papers go viral? Test your instincts! I built a little game where you try to guess the popularity of AI research papers from the Hugging Face Daily Papers feed.
How it works: You'll see two papers side by side. Read the titles, check the abstracts, and pick which one you think got more upvotes from the HF community.
It's a great way to discover trending AI research while having fun, and it tests your intuition about what the ML community finds interesting.
The core idea: instead of treating physics as a soft condition the model can work around during optimization, enforce it strictly via reinforcement learning. The paper focuses on rigid body dynamics - collisions, pendulums, free fall, rolling.
We Built a Music App with ACE-Step: Looking for Feedback
Hey everyone,
We've been building AceSteps, a platform where anyone can create music using the ACE-Step model (ACE-Step/ACE-Step-v1-3.5B). You can mint your tracks as NFTs, tokenize them into 100,000 fractional shares, and trade them on Uniswap V4. When your song gets popular, token holders earn from ad revenue automatically. It's a Farcaster Mini-App on Base Network.
But we want to make it better, and we'd love your input:
What's the one feature that would make you actually use an AI music tool regularly? And do you have any suggestions on how we can make this model better? We're mainly sharing here to ask this question.
Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine-tuning? We ran head-to-head tests on Qwen3-4B (10k+ high-quality instruction rows) to find out.
Short story: Pure Muon converged fastest at the start, but its gradient-norm spikes made training unstable. MuonClip (Kimi K2's clipping) stabilizes long pretraining runs, yet in our small-scale fine-tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
Next step: scale to larger models/datasets to see if Muon's spikes become catastrophic or if clipping wins out.
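For anyone wanting to try the hybrid setup, here is a minimal sketch of the parameter split it implies: rank-2 weight tensors go to Muon, everything else (biases, norm weights) goes to AdamW. The function name `split_params_by_rank` and the example parameter names and shapes are hypothetical and purely illustrative; they are not taken from our training code or from Qwen3-4B. The sketch only does the partitioning and leaves the choice of optimizer implementations to you.

```python
from typing import Dict, List, Tuple


def split_params_by_rank(
    shapes: Dict[str, Tuple[int, ...]]
) -> Tuple[List[str], List[str]]:
    """Partition parameter names by tensor rank.

    Rank-2 tensors (weight matrices) are assigned to the Muon group;
    all other tensors (biases, norm scales, etc.) go to the AdamW group.
    """
    muon_names: List[str] = []
    adamw_names: List[str] = []
    for name, shape in shapes.items():
        if len(shape) == 2:
            muon_names.append(name)
        else:
            adamw_names.append(name)
    return muon_names, adamw_names


# Hypothetical parameter names and shapes, for illustration only.
example_shapes = {
    "layers.0.attn.q_proj.weight": (64, 64),
    "layers.0.attn.q_proj.bias": (64,),
    "layers.0.norm.weight": (64,),
    "layers.0.mlp.up_proj.weight": (256, 64),
}

muon_names, adamw_names = split_params_by_rank(example_shapes)
print("Muon :", muon_names)
print("AdamW:", adamw_names)
```

With a real model you would build `shapes` from the model's named parameters and hand each group to its own optimizer instance. A pure rank-based split also sends 2D embedding matrices to Muon; whether to do that or keep them on AdamW is a separate design choice.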
Published a new blog post! In it, I walk through the transformer architecture, emphasizing how tensor shapes propagate through each layer. https://huggingface.co/blog/not-lain/tensor-dims Some interesting takeaways: