You can try entering some words or emojis in this SD tokenizer and you will see that they match the vocab.json list: https://sd-tokenizer.rocker.boo/
The vocab.json list can be divided into two categories:
Suffix tokens = tokens that end with </w>, which represents the whitespace after a word. Like "banana</w>".
Prefix tokens = tokens without the </w> marker. Like "post", used in words like "postapocalypse" or "postman".
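The two categories above are easy to separate programmatically. A minimal sketch, using a tiny illustrative vocab dict instead of the real vocab.json (in practice you would json.load the file shipped with the model):

```python
# Split a CLIP-style vocab into suffix and prefix tokens.
# sample_vocab is a made-up stand-in for the real vocab.json contents.
sample_vocab = {
    "banana</w>": 0,       # suffix token: "</w>" marks the end of a word
    "post": 1,             # prefix token: continues into the next token
    "man</w>": 2,
    "apocalypse</w>": 3,
}

suffix_tokens = [t for t in sample_vocab if t.endswith("</w>")]
prefix_tokens = [t for t in sample_vocab if not t.endswith("</w>")]

print(suffix_tokens)  # tokens that close a word
print(prefix_tokens)  # tokens that attach in front of other tokens
```

The same two list comprehensions work unchanged on the full vocab once it is loaded from disk.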
Prefix tokens give properties to suffix tokens when placed in front of them.
If you have a means to invoke random prefix, suffix, and emoji tokens, you can mix them to create crazy results.
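One way to do that mixing is to glue a random prefix token onto a random suffix token. A small sketch, assuming you have already built prefix_tokens and suffix_tokens lists (the samples below are illustrative):

```python
import random

prefix_tokens = ["post", "mega", "cyber"]               # illustrative prefixes
suffix_tokens = ["banana</w>", "man</w>", "city</w>"]   # illustrative suffixes

random.seed(42)  # fixed seed only so the demo is repeatable

# Strip the "</w>" marker so the result is the word as you'd type it in a prompt.
word = random.choice(prefix_tokens) + random.choice(suffix_tokens).replace("</w>", "")
print(word)
```

Run this in a loop and paste the outputs into your prompt to get the "crazy results" described above.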
I recommend using Notepad++ to sort the tokens.
Follow the cross attention rule: "SD reads the prompt left to right, one token at a time, finding associations from the previous token to the current token (and the image rendered thus far)."