The Weekly Blip: August 13th 2023
How to best measure LLM performance + Getting copied & kicked by Shopify
It can be hard to measure the performance of language models with benchmark datasets, because those same datasets often end up in the models' training data.
There are some off-beat tests that do better: