I've successfully generated my own dataset and finetuned a LLM on it!
Holy crap! I've finally done it. I've generated a dataset of conversations (all locally), cleaned them up and then finetuned it on open-llama-7b (just to test) and IT WORKED! AHHHHH! happy noises
Okay I gotta go to sleep now. I have to get up for work in less than five hours. I'll be cleaning up everything, commiting code and doing the write up this week.
I'm just too tired to deal with writing up everything until Friday, since I have a lot of hours at my day job this week.
But, if by chance anyone is watching this, you can see the code that I checked into the dev branch of Kontour and the configuration file used to generate the datasets I used as well as the scripts in data-cleaning to reformat everything.