TechNews @radiation.party irradiated @radiation.party

[HN] Llama.cpp: Full CUDA GPU Acceleration

github.com CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggerganov/llama.cpp

This PR adds GPU acceleration for all remaining ggml tensors that didn't yet have it. Especially for long generations this makes a large difference because the KV cache is still CPU only on master ...
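To see why the KV cache matters more as generations get longer, here is a rough back-of-the-envelope sketch. It is not taken from the PR. It assumes standard multi-head attention with one key and one value vector of `n_embd` elements stored per token per layer, 16-bit cache entries, and 7B-class LLaMA dimensions (32 layers, embedding size 4096). The function name `kv_cache_bytes` and its parameters are hypothetical, chosen only for illustration.

```python
def kv_cache_bytes(n_ctx: int, n_layer: int, n_embd: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes.

    Assumption: each layer stores one K and one V vector of n_embd
    elements for every token in the context (hence the factor of 2).
    """
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem


if __name__ == "__main__":
    # Illustrative 7B-class dimensions (assumed, not read from any model file).
    for n_ctx in (512, 1024, 2048):
        size = kv_cache_bytes(n_ctx, n_layer=32, n_embd=4096)
        print(f"n_ctx={n_ctx:5d}: {size / 2**20:.0f} MiB")
```

Under these assumptions the cache grows linearly with context length and reaches 1 GiB at 2048 tokens, so keeping it in CPU memory means every generated token has to work against a buffer of that size outside VRAM.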

Machine Learning - Training | Fine Tuning @lemmy.intai.tech manitcor @lemmy.intai.tech

CUDA full GPU acceleration, KV cache in VRAM
