10x demo speedup
#1
Opened by cbensimon (HF Staff)
PR contents
- Remove CPU offloading (not needed)
- Load pre-compiled FLUX blocks (built with FlashAttention-3)
Before: 130s
After: 13s
(including ZeroGPU init)
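For reference, the "10x" in the title follows directly from the two timings above. The snippet below is only an illustrative sketch of that arithmetic, plus one way such wall-clock numbers could be collected. The helper names `timed` and `speedup` are hypothetical and not part of this PR or the demo code.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds).

    Hypothetical helper: the PR does not say how its timings were measured.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(before_s, after_s):
    """Ratio of the old duration to the new one (higher is better)."""
    if after_s <= 0:
        raise ValueError("after_s must be positive")
    return before_s / after_s


# Timings quoted in the PR description: 130 s before, 13 s after.
print(f"{speedup(130, 13):.1f}x")  # prints "10.0x"
```

To time a real run, the demo's inference call could be wrapped as `timed(infer, prompt)`, where `infer` stands in for whatever function the Space actually exposes.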
cbensimon changed pull request title from Accelerate demo to 10x demo speedup
cbensimon changed pull request status to open
wanghaofan changed pull request status to merged