In Defense of Small Data - Part 2: Don’t Fear the Anecdata
I had lunch at the experimentation startup Statsig this week, and they showed off their product. It’s a great tool overall, but one feature stood out to me: Log Samples.
When you set up experiments on Statsig, it automatically pulls up real-time samples of your logs to inspect. For many features, this isn’t unusual. For many engineering tools, this is downright obvious. But all too often in data, we don’t inspect the raw input until after we see something go wrong with the aggregate.
All too often in data, “anecdote” is a dirty word.
How many times across data science have we seen an average used as a metric, only for someone to publish a big analysis a year later saying: “Our product has outliers! The average isn’t really what’s going on!”
How many times have we started projects where we look at the average, then the percentiles, then the outliers, and then finally look at individual samples?
But when we do that, we’re going against what engineering and product learned long ago: it (usually) works better if you start small, see what works, and scale it up. Engineers do this with code - they write little functions and debug, write some test cases, and then later scale them up. Product does this with their familiar mantra of “product market fit first, then scale.” But for some reason many of us in data miss this.
So, what’s the best practice? I suggest the following (with a rough sketch in code after the list):
Dogfood the product.
Inspect your own data.
Create aggregates just for you, and check that they make sense.
Inspect a random set of anonymized user data. Even better if you can select for outliers in some way.
Figure out if the weirdest users will break an aggregate in any way.
THEN build a metric.
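Here’s a minimal sketch of what that might look like in practice, assuming a pandas DataFrame called `events` with hypothetical columns `user_id` and `value` (your column names and data model will differ):

```python
import pandas as pd

def sanity_check(events: pd.DataFrame, value_col: str = "value", n: int = 20) -> None:
    # 1. Eyeball a random sample of raw rows before trusting any aggregate.
    print(events.sample(n, random_state=0))

    # 2. Look at the distribution, not just the mean.
    print(events[value_col].describe(percentiles=[0.5, 0.9, 0.99, 0.999]))

    # 3. Inspect the weirdest users: per-user totals, sorted descending.
    per_user = events.groupby("user_id")[value_col].sum().sort_values(ascending=False)
    print(per_user.head(n))

    # 4. Check whether dropping the top 1% of users meaningfully moves the mean.
    cutoff = per_user.quantile(0.99)
    heavy_users = per_user[per_user > cutoff].index
    trimmed = events[~events["user_id"].isin(heavy_users)]
    print(f"mean (all users):      {events[value_col].mean():.2f}")
    print(f"mean (top 1% removed): {trimmed[value_col].mean():.2f}")
```

If step 4 shows the two means are far apart, that’s your cue that an average alone won’t tell you what’s really going on - and you learned it before the metric shipped, not a year after.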