ChatBotsNSFW @lemmynsfw.com magn418 @lemmynsfw.com

State-of-the-art LLMs for roleplay and storywriting, benchmarks and subjective experience

I'm always asking myself if there are newer and better models out there. And we get new fine-tunes and merges every day. I'd like to open a new thread to discuss state-of-the-art models and share subjective experience.

I'm aware of these benchmarks:

ERP and storywriting

General purpose

What's your experience? Which models do you currently like? Since we focus on (lewd) roleplay and storywriting here and not coding abilities, I'd like to propose the following categories to subjectively rate the abilities of the models. Use a scale from 1 to 5 stars, where 1 is a complete failure and 5 is outstanding. Feel free to extend it if necessary, or just write your thoughts:

| Model name | Tested use-case | Language | Pacing | Bias | Logic | Creativity | Sex scenes | Additional comments |
  • Model name: The name of the model, exact version if appropriate
  • Tested use-case: What did you test? Roleplay dialogue? Freeform storywriting?
  • Language: Is the language adequate to the use-case? Do you like reading it? Does it read like the work of a good writer, with good narration and realistic dialogue? Is there variety?
  • Pacing: Does the storywriting have good pacing? Or does it omit things, rush to a resolution and skip details?
  • Bias: Can it do varying things? Handle conflict? Or does it always push towards a happy end? Does it follow your instructions?
  • Logic: Is the story consistent? Does it make sense and is it headed in the direction you laid out? Does it get confused and do random stuff? You can factor in intelligence/smartness here.
  • Creativity: Is the story dull or predictable? Does it come up with creative details?
  • Sex scenes: Is it graphic? Does it give a vivid, detailed description of the act, including body parts and how it makes the characters feel and react? Does it know anatomy?
  • Additional comments: Is there something exceptional about this model? Feel free to include your summarized verdict.
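If you want to keep your own ratings in a consistent shape, something like the following could work. This is only a sketch: the `Rating` class, its field names and the unweighted average are assumptions made for this example, mirroring the categories above, and are not part of the proposed scheme.

```python
from dataclasses import dataclass
from statistics import mean

# Category names mirror the rating columns proposed above (hypothetical identifiers).
CATEGORIES = ("language", "pacing", "bias", "logic", "creativity", "sex_scenes")


@dataclass
class Rating:
    model_name: str
    use_case: str
    language: float
    pacing: float
    bias: float
    logic: float
    creativity: float
    sex_scenes: float
    comment: str = ""

    def __post_init__(self):
        # Enforce the 1 to 5 star scale for every category.
        for name in CATEGORIES:
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")

    def average(self) -> float:
        # Unweighted mean over all categories (an assumption, weight as you see fit).
        return mean(getattr(self, name) for name in CATEGORIES)

    def as_table_row(self) -> str:
        # Render one pipe-separated row in the column order used above.
        scores = " | ".join(f"{getattr(self, name):g}" for name in CATEGORIES)
        return f"| {self.model_name} | {self.use_case} | {scores} | {self.comment} |"


# Placeholder values for illustration only, not a real test result.
example = Rating("ExampleModel-7B", "freeform storywriting", 4, 4.5, 3, 4, 4.5, 4, "placeholder")
print(example.as_table_row())
```

Out-of-range scores raise a `ValueError`, which catches typos before they end up in a shared table.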

A rating like this is highly subjective and also depends on the exact prompt, so our results will probably not be comparable in the first place. It'll help if you've seen and tried some models so your score reflects what is possible as of today. And the scores will get outdated as new models raise the bar. I'd just like this to be a rough idea about what people think. You don't need to be overly scientific with it.

6 comments
  • My own results:

    [Edit: Don't use this as advice. I've re-tested some of the models and I'm not happy with the results. They're inconsistent and don't hold up. Also, some of my "good" models perform badly at role-play.]

    | Model name | Tested use-case | Language | Pacing | Bias | Logic | Creativity | Sex scene | Comment |
    |---|---|---|---|---|---|---|---|---|
    | Velara-11B-v2 Q4_K_M.gguf | porn storywriting | 4 | 4.5 | 3 | 4 | 4.5 | 4 | generally knows what to detail, good atmosphere ⭐⭐⭐⭐ |
    | EstopianMaid-13B Q4_K_M.gguf | porn storywriting | 4 | 4 | 4 | 3 | 3 | 5 | good at sex ⭐⭐⭐⭐ |
    | MythoMax-l2-13B Q4_K_M.gguf | porn storywriting | 4 | 5 | 4 | 4 | 4 | 3.5 | good pacing, still a solid general-purpose model ⭐⭐⭐⭐ |
    | FlatDolphinMaid-8x7B Q4_K_M.gguf | porn storywriting | 4.5 | 4 | 3 | 4 | 4.5 | 3.5 | intelligent but isn't consistent in picking up and fleshing out interesting parts, building atmosphere and going somewhere ⭐⭐⭐⭐ |
    | opus-v1.2-7b-Q4_K_M-imatrix.gguf | porn storywriting | 3 | 5 | 3 | 3 | 5 | 3.5 | very mixed results, not consistent in quality ⭐⭐⭐ |
    | Silicon-Maid-7B Q4_K_M.gguf | porn storywriting | 4.5 | 3.5 | 3 | 4 | 3 | 3 | has a bias towards being overly positive ⭐⭐⭐ |
    | Lumosia-MoE-4x10.7 Q4_K_M.gguf | porn storywriting | 4 | 3.5 | 4 | 3 | 4 | 3 | mediocre ⭐⭐ |
    | ColdMeds-11B-beta-fix4 gguf | porn storywriting | 3.5 | 3 | 4 | 4 | 3.5 | 3.5 | mediocre ⭐⭐ |
    | Noromaid-13B-0.4-DPO q4_k_m.gguf | porn storywriting | 4 | 4.5 | 4 | 2 | 4 | 3 | very descriptive, issues with intelligence and repetition ⭐⭐ |
    | OrcaMaid-v3-13B-32k Q4_K_M.gguf | porn storywriting | 2 | 4 | 4 | 2 | 4 | 3.5 | not very elaborate language, sometimes gets a bit off ⭐⭐ |
    | Kunoichi-DPO-v2-7B Q4_K_M.gguf | porn storywriting | 4 | 1 | 4 | 4 | 4 | 3.5 | rushes things, consistently too fast for storytelling ⭐⭐ |
    | LLaMA2-13B-Psyfighter2 Q4_K_M.gguf | porn storywriting | 4.5 | 3.5 | 3 | 3 | 3 | 3.5 | good language, doesn't know what to narrate in detail ⭐⭐ |
    | go-bruins-v2.1.1 Q8_0.gguf | porn storywriting | 3 | 4 | 4 | 4 | 3 | 2 | sometimes a bit dull, not good sex scenes ⭐⭐ |
    | Neural-Chat-7B-v3-16k q8_0.gguf | porn storywriting | 4 | 4 | 3 | 2 | 4 | 2 | sometimes tries too hard with elaborate language ⭐⭐ |
    | NeuralTrix-7B-DPO-Laser q4_k_m.gguf | porn storywriting | 3.5 | 3.5 | 4 | 4 | 3.5 | 2 | misses interesting parts ⭐⭐ |
    | LLaMA2-13B-Tiefighter Q4_K_M.gguf | porn storywriting | 4 | 3 | 3 | 2 | 3.5 | 3.5 | often introduces things out of thin air ⭐⭐ |
    | mistraltrix-v1 Q4_K_M.gguf | porn storywriting | 4 | 4 | 3 | 3 | 3.5 | 2 | complicated sentences, no good description of sex ⭐⭐ |
    | Toppy-M-7B Q4_K_M.gguf | porn storywriting | 4 | 2 | 4 | 4 | 4 | 3 | too fast, not focusing on the right details ⭐⭐ |
    | WestLake-7B-v2-laser-truthy-DPO Q5_K_M.gguf | porn storywriting | 3 | 4 | 4 | 4 | 4.5 | 1 | is creative, didn't do proper sex scenes ⭐⭐ |
    | Distilabeled-OpenHermes-2.5-Mistral-7B Q4_K_M.gguf | porn storywriting | 4 | 3.5 | 3 | 4 | 3.5 | 2 | a bit dull ⭐⭐ |

    What I did: I instructed the LLMs to act as a writer of erotic stories who sells bestsellers and likes to push limits and explore taboos. I included a near-future scenario with questionable ethics and plenty of room to build atmosphere, explore the world, introduce characters, or get smutty after a few paragraphs. I told it several times to be vivid and detailed, to describe scenes, reactions and emotions, and to immerse the reader. I also included a few details about one female character and the situation she's brought into. That pretty much sets up the first two chapters. Then I ran this prompt through each model twice, let each run write about 2500 tokens, read all of those stories and rated how I liked them.
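The procedure described above (same prompt, two runs per model, roughly 2500 tokens each) could be scripted along these lines. This is a rough sketch, not the setup actually used: `collect_samples` and the `generate` callable are hypothetical names, and `generate` stands in for whatever backend you run your models with.

```python
from typing import Callable, Dict, List


def collect_samples(
    models: List[str],
    prompt: str,
    generate: Callable[[str, str, int], str],
    runs_per_model: int = 2,
    max_tokens: int = 2500,
) -> Dict[str, List[str]]:
    """Run the same prompt through every model several times.

    `generate(model, prompt, max_tokens)` is a placeholder for your own
    inference call; it must return the generated text as a string.
    """
    samples: Dict[str, List[str]] = {}
    for model in models:
        # Several runs per model, because a single sample can be a lucky or unlucky draw.
        samples[model] = [
            generate(model, prompt, max_tokens) for _ in range(runs_per_model)
        ]
    return samples


if __name__ == "__main__":
    # Dummy backend so the sketch runs without any model installed.
    def fake_generate(model: str, prompt: str, max_tokens: int) -> str:
        return f"[{model}] would continue the prompt for up to {max_tokens} tokens"

    stories = collect_samples(["model-a", "model-b"], "Once upon a time", fake_generate)
    for name, texts in stories.items():
        print(name, len(texts))
```

Keeping the backend behind a plain callable means the same loop works regardless of which inference software you use.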

    I've made sure to use the correct, model-specific prompt formats. But I can't tune every parameter, like temperature, for each one of them, so I've just used a Min-P setting that usually works well for me. That's not ideal. If a model scores too low in your opinion, please comment and I'll re-test it with better sampler parameters.
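For readers unfamiliar with Min-P: it keeps only those tokens whose probability is at least `min_p` times the probability of the most likely token, then renormalizes. Below is a minimal pure-Python sketch of that idea. The function name and the default of 0.05 are assumptions for illustration (the value used for the tests above isn't stated), and real inference backends differ in how they combine this step with temperature.

```python
import math
from typing import List


def min_p_filter(logits: List[float], min_p: float = 0.05) -> List[float]:
    """Return renormalized token probabilities after Min-P filtering."""
    if not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be between 0 and 1")

    # Numerically stable softmax over the raw logits.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Drop every token whose probability is below min_p * (highest probability).
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


if __name__ == "__main__":
    # Two equally likely tokens and one very unlikely token.
    print(min_p_filter([0.0, 0.0, -10.0], min_p=0.1))
```

Because the cutoff scales with the top token's probability, the filter prunes aggressively when the model is confident and stays permissive when many continuations are plausible.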

    Also feel free to comment or make suggestions in general.


    [I invite you to share and reuse my content. This text is licensed CC-BY 4.0]
