In a Training Loop 🔄

527 2269 27194

John Smith PRO

John6666

John6666cat

AI & ML interests

None yet

Recent Activity

reacted to XiangpengYang's post with 🔥 about 7 hours ago

🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)! We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥 🔍 What makes VideoCoF different? 🧠 Chain-of-Frames reasoning , mimic human thinking process like Seeing → Reasoning → Editing to apply edits accurately over time without external masks, ensuring physically plausible results. 📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×). 🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control. ⚡ Fast inference update 🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use. 🔗 Links 📄 Paper: https://arxiv.org/abs/2512.07469 💻 Code: https://github.com/knightyxp/VideoCoF 🤗 Demo: https://huggingface.co/spaces/XiangpengYang/VideoCoF 🧩 Models: https://huggingface.co/XiangpengYang/VideoCoF 🌐 Project Page: https://videocof.github.io/ #VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI

updated a collection about 7 hours ago

Spaces for Image-to-Image / Video

liked a Space about 7 hours ago

XiangpengYang/VideoCoF

View all activity

Organizations

reacted to XiangpengYang's post with 🔥 about 7 hours ago

Post

453

🚀 Introducing VideoCoF: Unified Video Editing with a Temporal Reasoner (Chain-of-Frames)!

We’re excited to introduce VideoCoF, a unified framework for instruction-based video editing that enables temporal reasoning and ~4× video length extrapolation, trained with only 50k video pairs. 🔥

🔍 What makes VideoCoF different?
🧠 Chain-of-Frames reasoning , mimic human thinking process like Seeing → Reasoning → Editing to apply edits accurately over time without external masks, ensuring physically plausible results.
📈 Strong length generalization — trained on 33-frame clips, yet supports multi-shot editing and long-video extrapolation (~4×).
🎯 Unified fine-grained editing — Object Removal, Addition, Swap, and Local Style Transfer, with instance-level & part-level, spatial-aware control.

⚡ Fast inference update
🚀 H100: ~20s / video with 4-step inference, making high-quality video editing far more practical for real-world use.

🔗 Links
📄 Paper: https://arxiv.org/abs/2512.07469
💻 Code: https://github.com/knightyxp/VideoCoF
🤗 Demo: XiangpengYang/VideoCoF
🧩 Models: XiangpengYang/VideoCoF
🌐 Project Page: https://videocof.github.io/

#VideoEditing #DiffusionModels #GenerativeAI #ComputerVision #AI

4 replies

reacted to YatharthS's post with 🔥 about 18 hours ago

Post

829

I just released LayaCodec, a highly efficient neural audio tokenizer/codec for TTS models, far better than most previous audio tokenizers.

🤯 Next-gen TTS models that use this could achieve several 100s of times real-time speed while producing clearer audio!! 🤯

GitHub repo: https://github.com/ysharma3501/LayaCodec
Model: YatharthS/LayaCodec

reacted to etemiz's post with 🚀 about 23 hours ago

Post

256

Today's winner is Ling 1T with a score of 38!

Btw AHA2 is in the works, with more domains, better comparison LLMs and questions, overall better signal.

reacted to sergiopaniego's post with 🔥 about 23 hours ago

Post

751

🎄 last talk of the year about open AI and HF today at Universidad Rey Juan Carlos for undergrad students

always a pleasure to be back at my alma mater

🎅 slides: https://github.com/sergiopaniego/talks

1 reply

reacted to unmodeled-tyler's post with 👍 about 23 hours ago

Post

111

NEW MODEL: vanta-research/apollo-astralis-2

VANTA Research is pleased to share our new Apollo Astralis 2 model! This version improves on the capabilities of the last, increasing performance by 20 points in commonsense Q&A, and 10 points overall.

Apollo Astralis 2 is built on the new Ministral 3 8B architecture, and is perfect for logical reasoning, math/science problem-solving, and multi-step analytical thinking.

Additionally, this model was trained on our previously released vanta-research/human-ai-collaboration-2 dataset, making it perfect for human-AI collaboration.

Check it out!

reacted to prabhatkr's post with 🧠 1 day ago

Post

1511

Language Dexterity Benchmark

I am working on a new benchmark to establish human language dexterity. My hypothesis is that certain language allow for more accurate dexterous behaviour - Pointed, unambigous, and confusion-free references of parts of speech in small and large contexts. There are certain languages with high degree of accurate grammar like Sanskrit, Esperanto, and Turkish. I am native Sanskrit speaker.
I have plans to establish this benchmark and test this hypothesis across 100 langauges. I have created 25 task prompts for text, image, video and robotics manipulation. We can test langauges across multiple popular models. Here is the github link: https://github.com/ParamTatva-org/Linguistic-Dexterity-Benchmark

1 reply

reacted to martinsu's post with 🔥 1 day ago

Post

1564

I wasted days on a GPU node on a bug that shouldn't exist

So I was fine-tuning TildeOPEN-30B and the outputs were... weird. Token ID 179 (<0x00>) kept appearing between almost every token pair. Took me a bit to figure out what was going on.

Turns out I used the fast tokenizer for training, but the model was trained on the slow one. Silent failure.

Well... long story short—TGI uses (forces) the fast tokenizer, no questions asked. And you'll have agile's kryptonite: silent failure. If the model was trained on slow, it's a silent disaster.

I got curious and wrote a quick script to check how common this is. Ran it on 6,014 LLM HF models overnight.

Roughly 10% of HF model downloads have mismatched tokenizers. Not all mismatches are catastrophic, but some are brutal — like chat template markers inflating from 1 token to 3, silently wrecking context windows and causing model act weird.

This wasn't rigorous research, but the drift is real. And the worst part? 968 models(out of 500+ downloads) have both fast and slow tokenizers present, but they still produce different outputs. No missing files, no errors — just silent degradation.

TGI defaults to the fast tokenizer, as does AutoTokenizer.from_pretrained(). If a fast tokenizer doesn't exist, it auto-generates one. If your model was trained on slow, you get silent degradation. Output looks fine; the model just performs worse. Sometimes really worse. You'd never know.

If model was trained on fast tokenizer, its fine, but how do You know?

The root cause? Either model authors run HF conversion and upload both without verifying, or users run TGI, which always forces(converts to) fast .

The result of this fight with tokenizers is martinsu/tildeopen-30b-mu-instruct

It's based on TildeOPEN-30B (a solid EU HPC multilingual base). Nothing fancy—just a proper instruction fine-tune where I didn't mess up the tokenizer this time.

Full article: https://github.com/martins-u/tokenmagedon

1 reply

reacted to jzhang533's post with 🔥 1 day ago

Post

1221

To not waste Hugging Face inference providers credits. I have created a fully automated blog that generates daily city guides: https://see-the-world-by-llm.github.io/ .

The system autonomously picks a city, generates bilingual content (En/Zh) using random text generation models.

reacted to danielhanchen's post with 🔥 1 day ago

Post

1474

Mistral's new SOTA coding models Devstral 2 can now be Run locally! (25GB RAM) 🐱
We fixed the chat template, so performance should be much better now!
24B: unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
123B: unsloth/Devstral-2-123B-Instruct-2512-GGUF

🧡Step-by-step Guide: https://docs.unsloth.ai/models/devstral-2

reacted to etemiz's post with 👍 1 day ago

Post

189

Fine tuning is also important in a RAG system. The LLM will bring its own opinion sometimes and tell a different answer which is contrary to the retrieved knowledge. One should use an aligned LLM to produce the final answer.

2 replies

reacted to anakin87's post with 👀 1 day ago

Post

134

💭 Do thinking traces make Language Models learn better? Curious what others think

𝗦𝗰𝗲𝗻𝗮𝗿𝗶𝗼
You take an instruction-following LM.
You want to train it with a GRPO-style RL algorithm on a task like Tic Tac Toe.
Rewards are outcome-based, applied only at the end of each episode: win/loss/draw, format adherence...

During training, the model could just output answers, but a common choice is to make it also output thinking traces.

𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻
Does forcing the model to produce thinking traces during training actually improve learning❓

💬 I'd like to hear your thoughts. Share ideas and links to relevant papers and resources.

From what I've understood so far, the answer seems to be 𝘆𝗲𝘀.

1️⃣ If you force the model to think during training, it becomes a model that thinks at inference time. It naturally allocates more budget (tokens) to a problem, which tends to improve performance.

2️⃣ While the model's "reasoning" already exists in its activation space, using explicit thinking traces as a scratchpad allows training to steer and shape that reasoning.

3️⃣ As the model produces more traces during training, the RL algorithm can progressively give higher rewards to the reasoning patterns that lead to better outcomes.

reacted to sergiopaniego's post with 🔥 1 day ago

Post

1469

TRL now includes agent training support for GRPO‼️

Train 🕵️ agents with 🔧 tools, enabling interaction with external functions and APIs.

And of course, a new notebook and scripts to get you up to speed

📘 notebook tutorial: https://github.com/huggingface/trl/blob/main/examples/notebooks/grpo_agent.ipynb

📂 script examples: https://github.com/huggingface/trl/blob/main/examples/scripts/grpo_agent.py

📦 TRL v0.26.0 release: https://github.com/huggingface/trl/releases/tag/v0.26.0

2 replies

reacted to tomaarsen's post with 🔥🔥 1 day ago

Post

2275

🐦‍🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support and more. Details:

- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls.

- Multilingual NanoBEIR Support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.

- Similarity scores in Hard Negatives Mining: When mining for hard negatives to create a strong training dataset, you can now pass output_scores=True to get similarity scores returned. This can be useful for some distillation losses!

- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!

- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.

Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0

I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!

reacted to toshas's post with 🔥 1 day ago

Post

2000

Introducing 🇨🇭WindowSeat🇨🇭 –– our new method for removing reflections from photos taken through windows, on planes, in malls, offices, and other glass-filled environments.

Finetuning a foundation diffusion transformer for reflection removal quickly runs up against the limits of what existing datasets and techniques can offer. To fill that gap, we generate physically accurate examples in Blender that simulate realistic glass and reflection effects. This data enables strong performance on both established benchmarks and previously unseen images.

To make this practical, the open-source Apache-2 model builds on Qwen-Image-Edit-2509, a 20B image-editing diffusion transformer that runs on a single GPU and can be fine-tuned in about a day. WindowSeat keeps its use of the underlying DiT cleanly separated from the data and training recipe, allowing future advances in base models to be incorporated with minimal friction.

Try it out with your own photos in this interactive demo:
🤗 toshas/windowseat-reflection-removal

Other resources:
🌎 Website: huawei-bayerlab/windowseat-reflection-removal-web
🎓 Paper: Reflection Removal through Efficient Adaptation of Diffusion Transformers (2512.05000)
🤗 Model: huawei-bayerlab/windowseat-reflection-removal-v1-0
🐙 Code: https://github.com/huawei-bayerlab/windowseat-reflection-removal

Team: Daniyar Zakarin ( @daniyarzt )*, Thiemo Wandel ( @thiemo-wandel )*, Anton Obukhov ( @toshas ), Dengxin Dai.
*Work done during internships at HUAWEI Bayer Lab

reacted to YerbaPage's post with 🔥 1 day ago

Post

664

How to compress long code context? 📚

Check out our LongCodeZip [ASE 2025] 🔥

Code & Demo: https://github.com/YerbaPage/LongCodeZip
Paper: LongCodeZip: Compress Long Context for Code Language Models (2510.00446)

reacted to ronantakizawa's post with 👍 1 day ago

Post

201

Introducing the trending-stocks-yahoo-finance dataset: a compilation of the most trending stocks on Yahoo Finance from July 2024 to October 2025.

This dataset captures each trending stock's max price, max market cap, best rank on Yahoo Finance, PE ratio, and trading volume.

#stocks #investing #trading

ronantakizawa/trending-stocks-yahoo-finance

2 replies

reacted to KingNish's post with 🔥 1 day ago

Post

2064

Muon vs MuonClip vs Muon+Adamw

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1

reacted to ovi054's post with 🚀 1 day ago

Post

2102

Z-Image Turbo + LoRA ⚡

ovi054/Z-Image-LORA

Z-Image Turbo is the No. 1 trending Text-to-Image model right now. You can add a custom LoRA and generate images with this Space.

👉 Try it now: ovi054/Z-Image-LORA

3 replies

reacted to unmodeled-tyler's post with 👀 1 day ago

Post

428

Mistral Vibe is awesome!

I've been playing around with it a bit today, and have really been enjoying myself. I'm somewhat picky with CLI tools, so design is important to me. I think Mistral nailed it there. It's easy, clean, intuitive, and Mistral provides free API access to their devstral models to get started which is great.

Yesterday I built an interactive model selector for ollama in Aider. I decided to also configure one for Vibe because simple, clean integrations make my brain happy.

I've included a screenshot below of Mistral Vibe running GLM-4.6 through Ollama Cloud and the model selector!

2 replies

John Smith PRO

AI & ML interests

Recent Activity

Organizations

John6666's activity