Commit History
fix cmd
6c8a230
fix cmd
a1a7aac
fix build
1b9b118
fix build
d937c3a
fix dockerfile path
03ff3a5
add meta
36ff0ea
chore: track binaries with git-lfs
aa000f7
chore: track binaries with git-lfs
f33d63d
add sync task
46ebeba
Handle negative value in padding (#3389)
6e115ac
Treboko
committed on
models : update `./models/download-ggml-model.cmd` to allow for tdrz download (#3381)
0b65831
talk-llama : sync llama.cpp
4321600
sync : ggml
a0af6fc
ggml: Add initial WebGPU backend (llama/14521)
4b3da1d
Reese Levine
committed on
ggml : initial zDNN backend (llama/14975)
6dd510c
common : handle mxfp4 enum
fd4c0e1
ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (llama/15379)
a575f57
vulkan: disable spirv-opt for bfloat16 shaders (llama/15352)
cf24af7
vulkan: Use larger workgroups for mul_mat_vec when M is small (llama/15355)
054584a
vulkan: support sqrt (llama/15370)
e5406c0
Dong Won Kim
committed on
vulkan: Optimize argsort (llama/15354)
80a188c
vulkan: fuse adds (llama/15252)
ad199b1
vulkan: Support mul_mat_id with f32 accumulators (llama/15337)
41a76e6
vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (llama/15334)
a6fa78e
OpenCL: add initial FA support (llama/14987)
8ece1ee
opencl: add initial mxfp4 support via mv (llama/15270)
1a0281c
lhez
shawngu-quic
committed on
vulkan : fix out-of-bounds access in argmax kernel (llama/15342)
78a1865
vulkan : fix compile warnings on macos (llama/15340)
e3107ff
ggml: initial IBM zDNN backend (llama/14975)
449e1a4
CUDA: fix negative KV_max values in FA (llama/15321)
6e3a7b6
HIP: Cleanup hipification header (llama/15285)
7cdf9cd
vulkan: perf_logger improvements (llama/15246)
d48d508
ggml: fix ggml_conv_1d_dw bug (ggml/1323)
4496862
cuda : fix GGML_CUDA_GRAPHS=OFF (llama/15300)
59c694d
Sigbjørn Skjæret
committed on
finetune: SGD optimizer, more CLI args (llama/13873)
f585fe7
HIP: bump requirement to rocm 6.1 (llama/15296)
58a3802
ggml : update `ggml_rope_multi` (llama/12665)
b4896dc
ggml : repack block_iq4_nlx8 (llama/14904)
db4407f
CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132)
c768824
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors over RPC (macOS & others) (llama/15188)
c8284f2
HIP: disable sync warp shuffel operators from clr amd_warp_sync_functions.h (llama/15273)
8fca6dd
sycl: Fix and disable more configurations of mul_mat (llama/15151)
7b868ed
Romain Biessy
committed on
opencl: allow mixed f16/f32 `add` (llama/15140)
345810b
CUDA cmake: add `-lineinfo` for easier debug (llama/15260)
008e169
CANN: GGML_OP_CPY optimization (llama/15070)
73e90ff
Chenguang Li
committed on
musa: fix failures in test-backend-ops for mul_mat_id op (llama/15236)
4168dda
CANN: Add broadcast for softmax and FA (llama/15208)
db87c9d
kleidiai: fix unsigned overflow bug (llama/15150)
9d5f58c
Charles Xu
committed on
cuda: refactored ssm_scan and use CUB (llama/13291)
7a187d1
David Zhao
committed on