[HN] Llama.cpp: Full CUDA GPU Acceleration
github.com: CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggerganov/llama.cpp
This PR adds GPU acceleration for all remaining ggml tensors that didn't yet have it. Especially for long generations, this makes a large difference because the KV cache is still CPU-only on master ...
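The gist, paraphrased from the PR description: keep the KV cache resident in VRAM so each new token's attention over past tokens runs entirely on the GPU, instead of shuttling the cache between host RAM and the device on every step. Below is a minimal illustrative sketch in CUDA of that idea; it is not the PR's actual code, and all names (kv_cache_t, kv_cache_init, kv_cache_append) are hypothetical.

```cuda
// Sketch: a KV cache allocated once in device memory (VRAM), appended to with
// device-to-device copies so per-token attention never round-trips through the host.
#include <cuda_runtime.h>
#include <cuda_fp16.h>

struct kv_cache_t {
    half *k;       // [n_ctx, n_embd] keys, device memory
    half *v;       // [n_ctx, n_embd] values, device memory
    int   n_ctx;   // maximum context length
    int   n_embd;  // embedding dimension
    int   n_used;  // tokens currently stored
};

static bool kv_cache_init(kv_cache_t *c, int n_ctx, int n_embd) {
    c->n_ctx = n_ctx; c->n_embd = n_embd; c->n_used = 0;
    size_t bytes = (size_t)n_ctx * n_embd * sizeof(half);
    // Allocate once in VRAM; before this change the equivalent buffers lived in host RAM.
    if (cudaMalloc((void **)&c->k, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&c->v, bytes) != cudaSuccess) { cudaFree(c->k); return false; }
    return true;
}

// Append the current token's K/V rows (already computed on the GPU) to the cache.
// A device-to-device copy keeps everything on the GPU.
static void kv_cache_append(kv_cache_t *c, const half *d_k_row, const half *d_v_row) {
    size_t row_bytes = (size_t)c->n_embd * sizeof(half);
    size_t off       = (size_t)c->n_used * c->n_embd;
    cudaMemcpy(c->k + off, d_k_row, row_bytes, cudaMemcpyDeviceToDevice);
    cudaMemcpy(c->v + off, d_v_row, row_bytes, cudaMemcpyDeviceToDevice);
    c->n_used++;
}
```

For long generations the cache grows with every token, so keeping it device-resident avoids an ever-larger per-token transfer, which is why the PR notes the biggest wins on long outputs.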