I wonder how much variation there would be if you got a single model to produce a couple of gigabytes of tiny children's stories.
Might be an interesting research project.
There is one already: https://arxiv.org/abs/2305.07759 https://huggingface.co/datasets/roneneldan/TinyStories
6.5GB of tiny stories, as requested. ;)
My comment was, in fact, a subtle reference to this.
The best opening I got from my own TinyStories-trained model was:
Once upon a time, in a small town, there was a large town.
Which I just love as an evocative idea.
SimpleStories is a more diverse version: https://huggingface.co/datasets/SimpleStories/SimpleStories
The texts in Project Gutenberg come to about 20GB, and the full English Wikipedia text to 80-110GB.
So LLM-generating 6.5GB of tiny stories is quite a permutation in action :)