Looking for NLP datasets with small vocabularies
I'm looking for NLP datasets/corpora with small vocabularies: fewer than 5K unique words, and the smaller the better. I've tried looking for, e.g., datasets of children's books, but I haven't found anything that combines a reasonably long corpus with a small vocabulary (any children's book with a very small vocabulary also tends to be very short!). If anyone can share anything fitting this description, I'd be really interested to hear about it.
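To make the "fewer than 5K unique words" criterion concrete, here is a rough sketch of how a candidate corpus could be checked. The function name `vocabulary_stats` and the tokenization rule (lowercasing, then keeping runs of letters and apostrophes) are assumptions for illustration only; a different tokenizer would give different counts.

```python
import re
from collections import Counter


def vocabulary_stats(text):
    """Return (total_tokens, unique_words) for a piece of text.

    Assumption: a "word" is a lowercased run of ASCII letters and
    apostrophes. This is only one possible definition of vocabulary.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return len(tokens), len(counts)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran. The dog sat."
    total, unique = vocabulary_stats(sample)
    print(f"{total} tokens, {unique} unique words")
```

A corpus would fit the description if the second number stays under roughly 5,000 while the first is large.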
Topic data dataset nlp machine-learning
Category Data Science