Discovering Locally Run Language Models: Share Your Favorites/Not So Favorites!
Let's talk about our experiences working with different models, whether well-known or lesser-known.
Which locally run language models have you tried out? Share your insights, challenges, or anything you found interesting during your encounters with those models.
With a quantized GGML version you can just run it on CPU if you have 64GB RAM. It is fairly slow though: I get about 800ms/token on a 5900X. Basically you start it generating something and come back in 30 minutes or so. Can't really carry on a conversation.
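For a rough sense of what 800ms/token means in practice, here is a quick back-of-the-envelope sketch. The function names are made up for illustration, and it assumes the per-token latency stays constant, which it probably won't exactly (prompt processing and longer contexts change things).

```python
def generation_minutes(n_tokens: int, ms_per_token: float = 800.0) -> float:
    """Wall-clock minutes to generate n_tokens at a fixed per-token latency."""
    return n_tokens * ms_per_token / 1000.0 / 60.0


def tokens_in(minutes: float, ms_per_token: float = 800.0) -> int:
    """How many tokens fit into the given number of minutes at that latency."""
    return int(minutes * 60.0 * 1000.0 / ms_per_token)


if __name__ == "__main__":
    # Tokens produced in a 30-minute wait at 800 ms/token.
    print(tokens_in(30))
    # Minutes needed to fill a 2,048-token context at the same speed.
    print(round(generation_minutes(2048), 1))
```

So a 30-minute wait works out to roughly 2,250 tokens, and filling a full 2,048-token context takes a bit over 27 minutes at that speed.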
I was pretty impressed by guanaco-65B, especially how it was able to remain coherent even way past the context limit (with llama.cpp's context wrapping thing). You can see the second story is definitely longer than 2,048 tokens.
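As far as I understand the context wrapping, when the window fills up llama.cpp keeps some number of tokens from the start of the prompt, throws away the older half of everything after that, and carries on generating from the newer half. Below is a simplified sketch of that idea, not llama.cpp's actual code; `wrap_context`, `n_ctx`, and `n_keep` are names picked for this illustration.

```python
from typing import List


def wrap_context(tokens: List[int], n_ctx: int = 2048, n_keep: int = 0) -> List[int]:
    """Return the tokens to keep once the context window is full.

    Assumed behaviour (simplified): keep the first n_keep tokens, drop the
    older half of the remaining tokens, and keep the most recent half.
    """
    if len(tokens) < n_ctx:
        # Window not full yet, nothing to discard.
        return tokens
    n_left = len(tokens) - n_keep
    recent = tokens[len(tokens) - n_left // 2:]
    return tokens[:n_keep] + recent


if __name__ == "__main__":
    full = list(range(2048))          # stand-in token ids for a full window
    kept = wrap_context(full, n_ctx=2048, n_keep=4)
    print(len(kept), kept[:6], kept[-1])
```

With a scheme like this the model loses the middle of the story every time the window wraps, so staying coherent past 2,048 tokens depends on how well it picks the thread back up from the retained half.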