[HN] Llama.cpp: Full CUDA GPU Acceleration
github.com: CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggerganov/llama.cpp
This PR adds GPU acceleration for all remaining ggml tensors that didn't yet have it. Especially for long generations, this makes a large difference because the KV cache is still CPU-only on master ...
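The gist, paraphrased from the PR description: keep the KV cache resident in VRAM so each new token's attention over past tokens runs entirely on the GPU, instead of shuttling the cache between host RAM and the device on every step. Below is a minimal illustrative sketch in CUDA of that idea; it is not the PR's actual code, and all names (kv_cache_t, kv_cache_init, kv_cache_append) are hypothetical.

```cuda
// Sketch: a KV cache allocated once in device memory (VRAM), appended to with
// device-to-device copies so per-token attention never round-trips through the host.
#include <cuda_runtime.h>
#include <cuda_fp16.h>

struct kv_cache_t {
    half *k;       // [n_ctx, n_embd] keys, device memory
    half *v;       // [n_ctx, n_embd] values, device memory
    int   n_ctx;   // maximum context length
    int   n_embd;  // embedding dimension
    int   n_used;  // tokens currently stored
};

static bool kv_cache_init(kv_cache_t *c, int n_ctx, int n_embd) {
    c->n_ctx = n_ctx; c->n_embd = n_embd; c->n_used = 0;
    size_t bytes = (size_t)n_ctx * n_embd * sizeof(half);
    // Allocate once in VRAM; before this change the equivalent buffers lived in host RAM.
    if (cudaMalloc((void **)&c->k, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&c->v, bytes) != cudaSuccess) { cudaFree(c->k); return false; }
    return true;
}

// Append the current token's K/V rows (already computed on the GPU) to the cache.
// A device-to-device copy keeps everything on the GPU.
static void kv_cache_append(kv_cache_t *c, const half *d_k_row, const half *d_v_row) {
    size_t row_bytes = (size_t)c->n_embd * sizeof(half);
    size_t off       = (size_t)c->n_used * c->n_embd;
    cudaMemcpy(c->k + off, d_k_row, row_bytes, cudaMemcpyDeviceToDevice);
    cudaMemcpy(c->v + off, d_v_row, row_bytes, cudaMemcpyDeviceToDevice);
    c->n_used++;
}
```

For long generations the cache grows with every token, so keeping it device-resident avoids an ever-larger per-token transfer, which is why the PR notes the biggest wins on long outputs.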