When I was in graduate school, economists discovered clustered standard errors. Or so I assume, because it became a running joke that the first question in any seminar was "did you cluster your standard errors?"

Lately I've been getting the same question from referees on my field experiments, and to the best of my knowledge, this is wrong, wrong, wrong.

So, someone please tell me if I'm mistaken. And if I'm not, a plea to my colleagues: this is not something to write in your referee reports. Please stop.

I guess I should explain what clustering means (though if you don't know already there's a good chance you don't care and it's not relevant to your life). Imagine people in a village who experience a change in rainfall or in the national price of the crop they grow. If you want to know how employment or violence or something responds to that shock, you have to account for the fact that people in the same village are subject to the same unobserved forces of all varieties. If you don't, your regression will tend to overstate the precision of the estimated link between the rainfall change and employment. In Stata, this is mindlessly and easily accomplished by putting something like ", cluster(village)" at the end of your regression, and we all do it.
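To make "overstate the precision" concrete, here is a minimal Python sketch. It is an illustration under made-up assumptions, not anyone's actual analysis: 30 villages of 20 people, a rainfall-style shock that is identical for everyone in a village, no true effect of the shock, and an unobserved village-level force. All function names are hypothetical. It compares the conventional OLS standard error and a basic cluster-robust one (with no small-sample correction) against how much the slope estimate really varies across simulated datasets.

```python
import random
import statistics

def simulate_village_shock(n_villages=30, n_per_village=20, beta=0.0,
                           sd_village=1.0, sd_person=1.0, rng=None):
    """One dataset: the shock and an unobserved effect both vary by village."""
    rng = rng or random.Random()
    x, y, village = [], [], []
    for v in range(n_villages):
        shock = rng.gauss(0, 1)           # e.g. rainfall, same for everyone in v
        u_v = rng.gauss(0, sd_village)    # unobserved village-level force
        for _ in range(n_per_village):
            x.append(shock)
            y.append(beta * shock + u_v + rng.gauss(0, sd_person))
            village.append(v)
    return x, y, village

def ols_slope_with_ses(x, y, village):
    """Slope of y on x, with the naive SE and a cluster-robust (by village) SE."""
    n = len(x)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    xd = [xi - xbar for xi in x]
    sxx = sum(d * d for d in xd)
    slope = sum(d * (yi - ybar) for d, yi in zip(xd, y)) / sxx
    resid = [(yi - ybar) - slope * d for d, yi in zip(xd, y)]
    # Naive SE treats all n residuals as independent draws.
    naive_se = (sum(e * e for e in resid) / (n - 2) / sxx) ** 0.5
    # Cluster-robust "sandwich": add up x * residual within each village first.
    scores = {}
    for d, e, v in zip(xd, resid, village):
        scores[v] = scores.get(v, 0.0) + d * e
    cluster_se = sum(s * s for s in scores.values()) ** 0.5 / sxx
    return slope, naive_se, cluster_se

def compare(n_sims=300, seed=0):
    """Average each SE over many datasets and compare to the true spread."""
    rng = random.Random(seed)
    results = [ols_slope_with_ses(*simulate_village_shock(rng=rng))
               for _ in range(n_sims)]
    slopes, naive, clustered = zip(*results)
    return {
        "true_sd_of_estimates": statistics.stdev(slopes),
        "mean_naive_se": statistics.fmean(naive),
        "mean_cluster_se": statistics.fmean(clustered),
    }
```

Under these assumptions, compare() should report a naive standard error far smaller than the true spread of the estimates, and a clustered one much closer to it. The exact numbers depend on the seed and the variances chosen.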

This makes sense if you have observational data (at least sometimes). But if you have randomized a program at the individual level, you do not need to cluster at the village level, or at some other higher unit of analysis. Because you randomized: each person's treatment status is, by design, unrelated to whatever unobserved forces their village shares.

Or so I believe. But I don't have a proof or citation to one. I have asked some of the very best experimentalists in the land this week, and all agree with me, but none have a citation or a proof. I could run a simulation to prove it, but surely someone smarter and less lazy than me has attacked this problem?
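For what it's worth, such a simulation might look something like the Python sketch below. It is not a proof, and it leans on assumptions I am making up for illustration: 20 villages of 30 people, a shared village-level shock, a constant treatment effect (zero here), and treatment assigned to exactly half of all individuals without regard to village. The function names are hypothetical. It checks whether the ordinary, unclustered standard error for a difference in means matches the actual spread of the estimates, and how often a nominal 5% test falsely rejects.

```python
import random
import statistics

def one_experiment(rng, n_villages=20, n_per_village=30, tau=0.0,
                   sd_village=1.0, sd_person=1.0):
    """Randomize individuals (ignoring village); return diff in means, naive SE."""
    outcomes = []
    for _ in range(n_villages):
        u_v = rng.gauss(0, sd_village)    # unobserved force shared by the village
        outcomes.extend(u_v + rng.gauss(0, sd_person)
                        for _ in range(n_per_village))
    n = len(outcomes)
    treated_ids = set(rng.sample(range(n), n // 2))   # individual-level assignment
    y1 = [y + tau for i, y in enumerate(outcomes) if i in treated_ids]
    y0 = [y for i, y in enumerate(outcomes) if i not in treated_ids]
    diff = statistics.fmean(y1) - statistics.fmean(y0)
    # Unclustered SE for a difference in means (unequal-variance form).
    naive_se = (statistics.variance(y1) / len(y1)
                + statistics.variance(y0) / len(y0)) ** 0.5
    return diff, naive_se

def run(n_sims=500, seed=1):
    """Repeat the experiment and summarize how the naive SE performs."""
    rng = random.Random(seed)
    diffs, ses = zip(*(one_experiment(rng) for _ in range(n_sims)))
    # Share of experiments where a 5% two-sided test rejects a true null.
    reject = sum(abs(d) > 1.96 * s for d, s in zip(diffs, ses)) / n_sims
    return {
        "true_sd_of_estimates": statistics.stdev(diffs),
        "mean_naive_se": statistics.fmean(ses),
        "false_rejection_rate": reject,
    }
```

If the claim holds in this setup, run() should show the unclustered standard error tracking the true spread closely and a false rejection rate near 5%, even though outcomes are strongly correlated within villages. A simulation like this covers only the cases you feed it; treatment effects that vary across villages, for example, are outside what it tests.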

While I'm on the subject, my related but nearly opposite pet peeves:

  • Reviewing papers that randomize at the village or higher level and do not account for this through clustering or some other method. This too is wrong, wrong, wrong, and I see it happen all the time, especially in political science and public health.
  • Maybe worse are the political scientists who persist in interviewing people on either side of a border and treating some historical change as a treatment, ignoring that they basically have a sample size of two. This is not a valid method of causal inference.