Starting a Mistral Megathread to aggregate resources.
This is my new favorite 7B model, and it's really good for what it is. I'm excited to see what we can tune together. I'll be using this thread as a living document, so expect frequent notes, revisions, and updates.
Let me know if there's something in particular you want to see here. I will be adding to this thread throughout my fine-tuning journey with Mistral.
If you like running any of the quantized/optimized models from TheBloke, do visit the full model pages linked from each quantized model card to see and support the developers of each fine-tuned model.
So far, it's been the best-performing 7B model I've been able to get my hands on. Anyone on consumer hardware could get a GGUF version running on almost any dedicated GPU/CPU combo.
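To give a sense of why a 7B GGUF fits on nearly any consumer setup, here's a back-of-the-envelope sketch of the weight-only memory footprint at common quantization levels. The parameter count and bits-per-weight figures below are rough approximations I'm assuming for illustration, not official specs, and real GGUF files add some overhead for metadata and the KV cache.

```python
# Hedged sketch: rough weight-only size for a quantized ~7B model.
# Real GGUF files are somewhat larger (metadata, plus KV cache at runtime).

def gguf_size_gb(n_params, bits_per_weight):
    """Approximate size in GB of the quantized weights alone."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumptions: Mistral 7B has roughly 7.2e9 parameters; the average
# bits/weight per quant level below are ballpark figures.
for name, bits in [("Q4_K_M", 4.5), ("Q5_K_M", 5.5), ("Q8_0", 8.5)]:
    print(f"{name}: ~{gguf_size_gb(7.2e9, bits):.1f} GB")
```

At ~4 GB for a 4-bit quant, even an 8 GB GPU (or plain CPU RAM) has room to spare.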
I'm a firm believer that there's more performance and better response quality to be found in smaller-parameter models. Not to mention the interesting use cases you could unlock by applying fine-tuning in an ensemble approach.
A lot of people sleep on 7B, but I think Mistral is a little different. There's a lot of exploring to be done to find these use cases, but I think they're out there waiting to be discovered.
I'll definitely report back on how my first attempt at fine-tuning it goes. Until then, I suppose it would be great for roleplay or basic chat interaction. Given its small footprint, it's much more lightweight to prototype with than the other model families and sizes.
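For anyone prototyping chat with the Instruct checkpoint, getting the prompt template right matters a lot for output quality. Below is a small helper that builds a multi-turn prompt in roughly the `[INST] ... [/INST]` format I understand Mistral-7B-Instruct uses; treat it as a sketch and double-check against the tokenizer's own chat template before relying on it.

```python
# Hedged sketch of the Mistral-7B-Instruct multi-turn prompt format.
# Verify the exact spacing/tokens against the model's chat template.

def build_mistral_prompt(turns):
    """turns: list of (user, assistant) pairs; the final assistant
    reply may be None when you want the model to generate it."""
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            # Close completed assistant turns with the end-of-sequence token.
            prompt += f" {assistant}</s>"
    return prompt

print(build_mistral_prompt([("Hi there", "Hello!"), ("Tell me a joke", None)]))
# → <s>[INST] Hi there [/INST] Hello!</s>[INST] Tell me a joke [/INST]
```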
If anyone else has a particular use case for 7B models - let us know here. Curious to know what others are doing with smaller params.
That's fair, I think chat/roleplay are great use cases.
I also think some of these lightweight models might make for interesting personal recommendation/categorization engines, etc. In my experiments using models to categorize credit card transaction statements à la Mint, only GPT-4 did a good job out of the box. I bet a small model could do quite well with fine-tuning, though.
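One way to test that transaction-categorization idea with a small model before committing to a fine-tune is few-shot prompting. Here's a hypothetical sketch that assembles such a prompt; the categories and example transactions are invented for illustration, not from any real dataset.

```python
# Hypothetical sketch: few-shot prompt for categorizing credit-card
# transaction lines with a small instruct model. Examples are made up.

FEW_SHOT = [
    ("STARBUCKS #1234 SEATTLE WA", "Coffee & Dining"),
    ("SHELL OIL 5551212", "Gas & Fuel"),
    ("NETFLIX.COM", "Entertainment"),
]
CATEGORIES = sorted({cat for _, cat in FEW_SHOT})

def categorization_prompt(transaction):
    """Build a prompt ending right where the model should emit a category."""
    parts = ["Categorize each transaction into one of: "
             + ", ".join(CATEGORIES) + "."]
    for desc, cat in FEW_SHOT:
        parts.append(f"Transaction: {desc}\nCategory: {cat}")
    parts.append(f"Transaction: {transaction}\nCategory:")
    return "\n\n".join(parts)

print(categorization_prompt("SPOTIFY USA"))
```

If few-shot gets you most of the way, the same examples become the seed of a fine-tuning set.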
Another thought I had was to make some sort of personal recommendation engine, so you could export your Netflix/Spotify likes and have it recommend movies or music you might enjoy. I suppose it's still early days for those kinds of uses of open-source models!