Commit History

ggml-impl.h: fix build on POWER9 (llama/12855)
3a1d5ca

Piotr Kubaj committed on

CANN: Support Opt CONV_TRANSPOSE_1D and ELU (llama/12786)
3b46fdc

Chenguang Li committed on

vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (llama/12833)
4b7a407

jeffbolznv committed on

vulkan: Use fp16 for the flash attention P*V multiplication (llama/12783)
4e46f41

jeffbolznv committed on

cuda : add f32 to bf16 copy op (llama/12806)
9dcb047

Sigbjørn Skjæret committed on

llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)
e7cb2dc

ggerganov committed on

ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187)
87f1ea3

cmdr2 committed on

ggml : add bilinear upscale support (ggml/1185)
4c5e449

Diego Devesa committed on

ggml : add more generic custom op, remove deprecated custom ops (ggml/1183)
ba7a5f8

Diego Devesa committed on

Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (llama/12812)
3d4b079

Neo Zhang Jianyu committed on

opencl: better identify Adreno GPU (llama/12760)
5560cd6

lhez committed on

cuda : fix HIP and MUSA BF16 (llama/0)
6dc5583

ggerganov committed on

sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (llama/12734)
7d3e668

jeffzhou2000 committed on

CANN: fix typo in ggml-cann (llama/12733)
65ced74

jeffzhou2000 committed on

CANN: Refactor to reduce duplicate code (llama/12731)
44ac81c

hipudding committed on

musa: fix compilation warnings in mp_22/31 (llama/12780)
090ad80

R0CKSTAR committed on

vulkan: fix NaN issue in flash attention shader (llama/12776)
77d7613

jeffbolznv committed on

vulkan: Use unclamped loads for flash attention mask (llama/12720)
a76ef69

jeffbolznv committed on

Vulkan: Tune Vulkan mmq int dot shader for performance (llama/12767)
b3bf710

OccamRazor committed on

sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (llama/12625)
27cbcc9

Nicolò Scipione committed on

cmake: fix ggml-shaders-gen compiler paths containing spaces (llama/12747)
1c89b7d

Ronny Brendel committed on

vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (llama/12630)
ee422be

jeffbolznv committed on

vulkan: set cmake minimum and project name in vulkan-shaders (llama/12744)
2459781

jeffbolznv committed on

CUDA: Prefer vector flash decoding kernel for Gemma models (llama/12738)
5d7a13f

Gaurav Garg and JohannesGaessler committed on

vulkan: Fix missing cmake logic for dot product extension (llama/12721)
7a1e8f8

jeffbolznv committed on

fix MUSA compiler warning (llama/12704)
8d43aa6

a3sh committed on

CANN: Support operator SIN COS ARGMAX (llama/12709)
904aaf5

Chenguang Li and noemotiovon committed on

Simplify and improve CUDA graphs through use of indirect copy pointers (llama/9017)
a2fdbe6

Alan Gray and slaren committed on

CANN: Fix failed test cases (llama/12708)
7d5f3d4

hipudding committed on

opencl: use `max_alloc_size` in backend ctx instead of querying again (llama/12705)
3847456

lhez committed on

vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)
5ab06d6

jeffbolznv committed on

cmake: remove caching from vulkan coopmat checks (llama/12719)
fac18c1

bandoti committed on

vulkan: Implement grouped query attention in the coopmat2 FA shader (llama/12559)
e7bebe6

jeffbolznv committed on

Vulkan: Fix mmq int dot float cache size (llama/12722)
1cecf5d

OccamRazor committed on

llama : add option to override model tensor buffers (llama/11397)
3d000b6

Diego Devesa committed on

ggml : simplify Arm fp16 CPU logic (ggml/1177)
fb13b88

ggerganov committed on

CUDA: don't convert BF16 weights to FP32 (ggml/1174)
332bcaf

Sigbjørn Skjæret committed on

coreml : set convert_to="mlprogram" in convert
d41b883

danbev committed on

ci : disable freeBSD job in build.yml (#3064)
9374466

danbev committed on

examples : add HEAPU8 to exported runtime methods (#3062)
2339555

danbev committed on

ruby : make Ruby bindings installed with build options (#3056)
8d0a50d

KitaitiMakoto committed on

whisper : add no_context parameter to whisper_params (#3045)
0e991f8

sachaarbonel committed on

examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp (#3038)
880d905

fujimotos committed on

ruby: use CMake in build process (#3043)
470918e

KitaitiMakoto committed on

docs : update README.md to note newer nvidia gpus (#3031)
9401dde

Jeff Klassen committed on

addon.node : support max_context api for addon.node (#3025)
6c51a9b

Lin Xiaodong (linxiaodong) committed on

whisper : reduce delta_min from 1000ms to 100ms (#3028)
d3e767a

ggerganov committed on

docs : document how to use 'WHISPER_FFMPEG' build option (#3029)
aa64fa0

fujimotos committed on

docs : fix README.md (#3024)
db55b1e

Ekaitz Zárraga committed on

xcf : use check for visionos build version (#3021)
919663c

danbev committed on