huggingPartyParis

community

https://partiful.com/e/oWOMGoPxB5D37qw5F8yN


AI & ML interests

None defined yet.

Recent Activity

altndrr authored a paper 19 days ago

Specificity-aware reinforcement learning for fine-grained open-world classification

altndrr authored a paper 19 days ago

Large Multimodal Models as General In-Context Classifiers

nicoboou authored a paper 25 days ago

XFACTORS: Disentangled Information Bottleneck via Contrastive Supervision


fffiloni

posted an update 7 days ago

Post

3849

I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄

3 replies

fffiloni

posted an update 12 days ago

Post

450

A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)

Ujjwal-Tyagi

posted an update 18 days ago

Post

389

LTX 2.3 is now available, with better visual quality and richer sound. Check it out: Lightricks/LTX-2.3

Ujjwal-Tyagi

posted an update 29 days ago

Post

2892

Public reports allege that Anthropic gobbled up trillions of tokens of copyrighted material and public data to build their castle. 🏰📄 Now that they're sitting on top, they're begging for special laws to protect their profits while pulling the ladder up behind them. 🪜🚫

But the hypocrisy meter just broke! 📉 They are accusing Chinese labs like DeepSeek, Minimax, and Kimi of "huge distillation attacks." The reality is that you can't just loot the entire internet's library, lock the door, and then sue everyone else for reading through the window. Stop trying to gatekeep tech you didn't own in the first place. Read the complete article: https://huggingface.co/blog/Ujjwal-Tyagi/the-dark-underbelly-of-anthropic

3 replies

Ujjwal-Tyagi

posted an update about 1 month ago

Post

224

The Qwen 3.5 model is here! It supports a 1M context length by default and delivers strong performance, competitive with Claude Opus 4.6: Qwen/Qwen3.5-397B-A17B. Here is its GGUF: unsloth/Qwen3.5-397B-A17B-GGUF. Follow me and turn on notifications for the latest news!

Ujjwal-Tyagi

posted an update about 1 month ago

Post

3027

GLM 5 is insane: it ranks #4 globally!

4 replies

Ujjwal-Tyagi

posted an update about 2 months ago

Post

1370

Finally, we have a benchmark and research paper on AI safety. I am very excited to see what comes next on protecting AGI.
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security (2601.18491)

AI45Research/ATBench

stevenbucaille

posted an update about 2 months ago

Post

233

LWDetr is available in 🤗 transformers!
Check out the collection to find the original paper, model weights, and a demo Space: https://huggingface.co/collections/stevenbucaille/lwdetr

malper

authored 9 papers about 2 months ago

Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

Paper • 2303.12513 • Published Mar 21, 2023

Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

Paper • 2310.16781 • Published Oct 25, 2023

WAFFLE: Multimodal Floorplan Understanding in the Wild

Paper • 2412.00955 • Published Dec 1, 2024

Ujjwal-Tyagi

posted an update 2 months ago

Post

1817

There is a new open-source music generation model called HeartMuLa. It offers strong, competitive performance compared to Suno and supports English, Chinese, Japanese, Korean, and Spanish. It is optimized to run easily on RTX GPUs and other consumer-grade hardware. HeartMuLa/HeartMuLa-oss-3B
https://github.com/HeartMuLa/heartlib

1 reply

Ujjwal-Tyagi

posted an update 2 months ago

Post

2789

Korean labs are also making great progress, right behind the Chinese ones.
Here are two of their open-source AI models that are actually good at coding: upstage/Solar-Open-100B and skt/A.X-K1

1 reply

Ujjwal-Tyagi

posted an update 2 months ago

Post

220

Finally, we have a powerful open-source music generation model rivaling Suno v5: https://heartmula.github.io/

Team members 970