MIT CSAIL researchers used a natural-language logical inference (textual entailment) dataset to train smaller language models that outperformed much larger counterparts.
It's striking that they got a model with 350M parameters to outperform others with 175B parameters, roughly 500 times its size.
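To give a feel for the underlying idea, here is a minimal sketch of entailment repurposed as classification: an off-the-shelf NLI model scores whether a text (the premise) entails a label phrased as a hypothesis. This is only an illustration of the general technique, not the CSAIL team's actual training pipeline; the checkpoint (`roberta-large-mnli`, itself a ~355M-parameter entailment model) and the label wording are assumptions.

```python
# Sketch: zero-shot classification via an entailment (NLI) model.
# Not the authors' method; checkpoint and labels are illustrative.
from transformers import pipeline

# roberta-large-mnli is a ~355M-parameter model fine-tuned on the
# MultiNLI textual entailment dataset.
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

text = "The new laptop's battery lasts twelve hours on a single charge."
labels = ["positive review", "negative review"]

# Each candidate label is wrapped in a hypothesis template
# ("This example is {label}.") and the model scores whether the
# premise (text) entails that hypothesis.
result = classifier(text, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])
```

Because the entailment task forces the model to judge whether one statement logically follows from another, a comparatively small model trained this way can transfer to many classification tasks without task-specific fine-tuning.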