In Defense of Small Data - Part 1: Statistical Power Is Boring

Jul 14

When I went from Facebook to working at Oculus, people would ask me: "Aren't you worried about having too little data? How will you get results from experiments?" When I focused on Oculus Enterprise, they asked the same. And when I moved from Meta to a startup (which I'll be writing about soon), they asked the same, again.

My answer was always the same: No, I'm not worried. Because statistical power is boring.

Let me explain why.

In stats, we learn that to get value out of an experiment, we need to believe the effect size will be large enough to rise above the noise. In data science, we learn that a larger n means lower variance in our estimates, which makes real effects more likely to be detected. But to me, that misses the point of experiments. The point is to find out something about the product, and about what users want.

And so when you move from a mature space with 1B rows of data to a growing space with 10k rows of data - it's true, you do make a trade-off: you will only get results on tests with larger effect sizes. But you are also making a second trade-off, which is that in a new space you don't know anything about the product or customer yet. So you get the opportunity to test much bigger swings, things with much larger effect sizes, things like which entire product lines work or which entire segments of users fit. Things that, to me at least, are more interesting.
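To put rough numbers on that first trade-off, here is a minimal sketch of a textbook power calculation. It assumes a two-sided, two-sample z-test with rows split evenly between control and treatment, a 5% significance level, and 80% power. None of those choices come from a real experiment of mine, and the function name `minimum_detectable_effect` is just something I made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.8):
    """Approximate smallest standardized effect (in standard deviations)
    that a two-sided, two-sample z-test with equal group sizes can detect.

    Assumption: normal approximation, equal variance in both groups.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


# Compare a "small data" experiment with a "big data" one,
# splitting the rows evenly between control and treatment.
for rows in (10_000, 1_000_000_000):
    mde = minimum_detectable_effect(rows // 2)
    print(f"{rows:>13,} rows -> detectable effect of about {mde:.5f} standard deviations")
```

Under these assumptions, 10k rows can pick up effects of roughly 0.056 standard deviations, while 1B rows can pick up effects of roughly 0.00018. The detectable effect shrinks with the square root of n, so 100,000 times more data only buys you about 316 times more sensitivity. In a new space, the swings worth testing tend to be well above that first threshold anyway.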

So if you ever find yourself worried about working in analytics in a space of "only" thousands of rows of data - don't be. Much of statistics was developed in exactly this space, long before the hyper-modern era of Big Data. And there's that second trade-off: small data usually means you'll get to work on interesting problems.

Edmund Helmer
