Commit History
ggml : full ALiBi support (llama/7192)
192bda4
Introduction of CUDA Graphs to LLama.cpp (llama/6766)
08fc76d
agray3
slaren
committed on
Add an option to build without CUDA VMM (llama/7067)
38b1143
ggml : add Flash Attention (llama/5021)
34d3b03
ggml : group all experts in a single ggml_mul_mat_id (llama/6505)
f0b5c67
CUDA: fix matrix multiplication logic for tests (llama/6667)
6ccb5a5
feat: implemented sigmoid function (ggml/806)
cd0c122
Justina Cho
committed on
llama : add Command R Plus support (llama/6491)
8cf7097
unverified
ggml : mul_mat_id use the same tensor for all the experts (llama/6387)
26fdc9f
unverified
ggml: bypass code incompatible with CUDA < 11.1 (#2020)
32f4e35
unverified
sync : ggml (#2001)
cbbfa9e
unverified
ggml : reuse quantum structs across backends (llama/5943)
bb0625f
unverified
1.5 bit: we can do even better (llama/5999)
36cc71e
unverified
Better 1.5 bit quantization (llama/5971)
f3a62cc
unverified
ggml : add ggml-common.h to deduplicate shared code (llama/5940)
0a37735
unverified
ggml : introduce ggml_status (ggml/750)
151c676
unverified
cuda : fix data race in soft max (llama/5853)
d1b60e4
unverified
slaren
committed on
ggml : IQ3_S improvements (llama/5829)
06a8e30
unverified
ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (llama/5760)
9a07f42
unverified
IQ4_XS: a 4.25 bpw quantization (llama/5747)
0ee1bfb
unverified
cuda : replace remaining shfl_xor with calls to warp_reduce functions (llama/5744)
753b30d
unverified
Engininja2
committed on
CUDA: fix DEBUG_CUDA_MALLOC (llama/5729)
f18f386
unverified
code : normalize enum names (llama/5697)
93e0830
unverified
IQ3_S: a much better alternative to Q3_K (llama/5676)
32589c9
unverified
Introduce backend GUIDs (ggml/743)
a7eb9f6
unverified
UEXTM.com
slaren
committed on
ggml : always define ggml_fp16_t as uint16_t (llama/5666)
bc567d3
unverified
sync : llama.cpp (ggml/0)
f8e8d34
unverified
cuda : ignore peer access already enabled errors (llama/5597)
a817d85
unverified
slaren
committed on
ci : enable -Werror for CUDA builds (llama/5579)
df03a10
unverified
cuda, metal : fix nans in soft_max (llama/5574)
44164ac
unverified
1.5 bit quantization (llama/5453)
9c3aa6a
unverified
ggml : add ALiBi support for ggml_soft_max_ext (llama/5488)
26c019a
unverified
cuda : print message when initialization fails (llama/5512)
1f047ca
unverified
slaren
committed on
CUDA: mul_mat_vec_q tiling, refactor mul mat logic (llama/5434)
c0cfa9b
unverified
CUDA: more warps for mmvq on NVIDIA (llama/5394)
7ab774c
unverified
CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (llama/5386)
3ff7660
unverified
CUDA: mul_mat_vec_q max. batch size 8 -> 4 (llama/5370)
7aa3216
unverified
CUDA: mul_mat_vec_q for batch sizes > 1 (llama/5351)
ae45b38
unverified
cuda : fix LLAMA_CUDA_F16 (llama/5262)
5fd8fb7
unverified
slaren
committed on
llava : add MobileVLM support (llama/5132)
f17a416
unverified
JidongZhang-THU
slaren
committed on
sync : ggml (llama/0)
cdb7964
unverified
SOTA 3-bit quants (llama/5196)
4649943
unverified
`ggml_cuda_cpy` support for 4d tensors and float16->float32 upcasting (ggml/686)
75d438c
unverified
John Balis
slaren
committed on
ggml : add Vulkan backend (llama/2059)
5a97aba
unverified
cuda : fix tensor size calculation for non-split buffer (llama/5145)
8f3eb65
unverified
slaren
committed on
cuda : fix 2-bit quants on amd hip (llama/5105)
aadbd67
unverified
Engininja2
committed on