Divide by less

From Slime Mold Time Mold’s “On the Hunt for Ginormous Effect Sizes”:

But here’s something they don’t always tell you: p-hacking is only an issue if you’re doing research in the narrow range where inferential statistics are actually called for. No p-values, no p-hacking. And while inferential statistics can be handy, you want to avoid doing research in that range whenever possible. If you keep finding yourself reaching for those p-values, something is wrong.

…

Studying an effect that is truly ginormous makes p-hacking a non-issue. You either see it or you don’t. So does having a sufficiently large sample size. If you have both, fuggedaboudit. Studies like these don’t need pre-registration, because they don’t need inferential statistics. If the suspected effect is really strong, and the study is well-powered, then any finding will be clearly visible in the plots.

The point of science isn’t to achieve perfect correctness in the global knowledge game, getting a high score on the List of True Things. It’s to figure out things that will improve people’s lives when that knowledge is acted upon. People are imperfect detectors of signal, of course, and statistics have their use in some situations. But grubbing in the soil for p-values is an unhelpful default we should wean ourselves off of. If the effect is a subtle nudge that disappears when you look at it in the wrong light, we could spend a lot of time arguing whether that wrong light is the “real light” or if there’s “really something there”. Or we could just stop and say: well, whether or not there’s something there, it’s not that much of something, and there are plenty of Muches of Somethings out in the world, so why not look at one of those instead?

Take scurvy. If you go months without vitamin C, you get Much of Bad Things Happening to You. Get some vitamin C in you, and Much Of Return To Health Happens To You. This is what a deficiency disease actually looks like. It’s Much! No one is running a meta-analysis or looking at the p-value to determine if vitamin C has an effect on scurvy; the analysis is You Won’t Believe What Happened When These Sailors Ate Some Watercress! If you’re looking into a particular nutrient deficiency and you need to tease apart a maybe-effect with statistics, then whether it lands on the side of “significant” or “not significant”, it’s clearly not that significant.

You don’t need to be a nutrient truther to default to being skeptical about science that doesn’t ground out in anecdote.¹ But what about the other way around: anecdote that doesn’t replicate in science? Supposedly one of the chief purposes of science is to soberly look at a presumed trend and see whether it fails to replicate. But what does it actually mean when an anecdotal giant effect doesn’t hold up in general?

Well, what does “in general” mean? All people, everywhere? “Unprotected sex results in childbirth” fails to generalize for everyone, but it’s obviously true and a ginormous effect. If your volunteers came from a community made up mostly of people incapable of giving birth, you could probably design a study where the sex→childbirth link wasn’t statistically significant. But all that means is that your representations were wrong.

This is obvious because “capable of giving birth” is something we can easily recognize and are used to knowing about ourselves. But every effect of the form “highly meaningful for a few, largely unnoticeable in others” could be governed by similar dynamics! If you have a scientific study showing that an intervention was a great help for 5% of people and irrelevant for the other 95%, your job isn’t to open a Jupyter notebook and run a significance test to figure out if that intervention is “real science”; it’s to figure out what language describes the commonalities of that 5% such that anyone reading along at home will know if it’s worth trying for them. Our modern conception of science often gets this exactly backwards: by “removing outliers” and bounding findings in some scoring system, effects that are obviously ginormous when you see them in front of you get dully labeled as “failure to replicate” because your sample also included a lot of people it wasn’t true for. But that’s your fault for assuming your ontologies were already adequate! It doesn’t make the effect less ginormous!
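To make the arithmetic concrete, here is a minimal sketch with made-up numbers: a hypothetical study of 1,000 people where 5% get a 10-point improvement and everyone else gets nothing. The function name, the population size, and the size of the effect are all assumptions chosen for illustration, not data from any real study, and there is deliberately no noise in it.

```python
from statistics import mean


def simulate_effects(n_people=1000, responder_share=0.05, responder_effect=10.0):
    """Build a hypothetical population: a few big responders, everyone else at zero.

    All parameters are illustrative assumptions, not estimates from real data.
    """
    n_responders = round(n_people * responder_share)
    # Mark the first n_responders people as the ones the intervention works for.
    is_responder = [i < n_responders for i in range(n_people)]
    effects = [responder_effect if r else 0.0 for r in is_responder]
    return effects, is_responder


effects, is_responder = simulate_effects()

# Dividing the total effect by everyone in the study.
pooled_mean = mean(effects)

# Dividing the same total by only the people it actually happened to.
responder_mean = mean(e for e, r in zip(effects, is_responder) if r)

print(f"pooled mean effect:    {pooled_mean}")
print(f"responder mean effect: {responder_mean}")
```

The total amount of effect is identical in both lines; only the denominator changes. The catch is that in a real study you don't get the `is_responder` column for free. Finding the description that reproduces it is the actual work.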

When an effect is anecdotally ginormous, all that a study can ever truly disprove is “The granularity of description I’ve chosen is enough to predict who ought to expect an effect”. If you see high magnitude for small numbers of people, then you know there’s something for you to find, and finding it just takes the proper ontological work to create the new conceptual handle you need. It’s a damn shame that so much of science is focused instead on small-magnitude effects that are presumed to hold for large numbers of people. There’s no universal person, and the sooner we accept that and stop centering tests that equate evidence against universality with evidence against effect, the sooner we can get back to proper ginormous science.

¹ I am like 85% a nutrient truther.

Links to this post

Collin’s Razor: Look at the biggest and smallest results

Date

July 27, 2022

Author

Collin Lysford
