Machine Learning - Training | Fine Tuning @lemmy.intai.tech manitcor @lemmy.intai.tech

CUDA full GPU acceleration, KV cache in VRAM

github.com CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggerganov/llama.cpp

This PR adds GPU acceleration for all remaining ggml tensors that didn't yet have it. This makes a large difference, especially for long generations, because the KV cache is still CPU-only on master ...
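For a rough sense of why the KV cache matters more as generations get longer, here is a back-of-the-envelope size estimate. It is only a sketch: the helper name `kv_cache_bytes` is made up for illustration, and the model shape (32 layers, embedding width 4096, 16-bit cache entries) is an assumed LLaMA-7B-like configuration, not something taken from the PR.

```python
def kv_cache_bytes(n_layers, n_ctx, n_embd, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    Assumes one key vector and one value vector of width n_embd
    per layer per token (plain multi-head attention).
    """
    return 2 * n_layers * n_ctx * n_embd * bytes_per_elem


# Assumed LLaMA-7B-like shape: 32 layers, n_embd = 4096, f16 cache entries.
for n_ctx in (512, 2048):
    size = kv_cache_bytes(n_layers=32, n_ctx=n_ctx, n_embd=4096)
    print(f"n_ctx={n_ctx}: {size / 2**20:.0f} MiB")
```

Under these assumptions the cache grows linearly with the number of tokens, at about 0.5 MiB per token, so the longer the generation, the more data the attention step has to read from wherever the cache lives.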

TechNews @radiation.party irradiated @radiation.party

Llama.cpp: Full CUDA GPU Acceleration
