Everyone is not the same

by Suspended Reason

Pulled this together from old notes circa 2021. Reading Collin’s Representation & Uncertainty a few months later helped me clarify my thinking on some of these issues. As a wordcel, I no doubt make some mistakes here, but I wanted to lay out a crude/naive version of the argument, because (1) it helps me understand why early stages of the rigorizing pipeline need to be performed before statistical analysis becomes useful, and (2) my writing the clumsy version makes it more likely that @collin.lysford will be motivated to respond with a more sophisticated version and correct my errors… It’s also very possible that he’s already written about these ideas elsewhere, but perhaps it’s useful for them to be in the TIS ecosystem and written up in slightly different language. I have no problem if this doesn’t end up on the blog if we don’t think it’s rigorous enough or a meaningful contribution.

Isaac Sapphire writes:

Everyone is not the same

As I keep poking at the gordian knot that is nutrition, weight, diet, exercise, and all the cultural shit wrapped up around them, the biggest thing that jumps out is that different people are different. The plan that’s good for a 6’ 1” male movie star who’s prepping for a role under the guidance of the trainer he’s been with for a decade and literally goes to the gym as his job is going to be WAY different than the advice that’s good for a 5’ 3” mom whose binge-eating yo-yo diet habit is clearly ballooning her weight, destroying her ability to walk, and going to land her in an electric scooter and probably a grave earlier than necessary, which is different advice than is good for an anorexic with a BMI of 16, which is different than advice that is good for a young person who just started a physically intense job or a person packing for a month-long back country hike.

Culture, class, race, religion/ethics, and gender are packed into this so densely it’s insane.

Indexicality is everywhere, and yet the inexact sciences’ over-reliance on statistics leads them to search for universal insights.

You know those studies of new medications, where they find it only improved outcomes for like 12% of people in their target demographic, so they abandon it? Sometimes that 12% constitutes its own meaningful subdemographic, for whom a cure with a near-universal success rate was just discovered and trashbinned because researchers couldn’t figure out what properties defined that subdemographic, or how to reliably identify its members. There are no statistics in the Kingdom of God: As we develop an increasingly comprehensive list of preconditions, the success rate of our interventions should approach 100%.
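A quick simulation makes the arithmetic vivid. All the numbers below are invented for illustration (the placebo rate, the subgroup’s cure rate): a hidden subgroup making up 12% of the sample responds almost universally, everyone else responds at placebo levels, and the aggregate number looks unimpressive.

```python
import random

random.seed(0)

# Hypothetical trial: 12% of the sample belongs to an unidentified
# subgroup for whom the drug almost always works; for everyone else it
# does nothing beyond a placebo-level response. All rates are made up.
N = 10_000
PLACEBO_RATE = 0.30        # assumed background response rate
SUBGROUP_FRACTION = 0.12   # the "12%" from the text
SUBGROUP_CURE_RATE = 0.95  # near-universal success within the subgroup

responses = []
for _ in range(N):
    in_subgroup = random.random() < SUBGROUP_FRACTION
    rate = SUBGROUP_CURE_RATE if in_subgroup else PLACEBO_RATE
    responses.append((in_subgroup, random.random() < rate))

overall = sum(r for _, r in responses) / N
subgroup = [r for s, r in responses if s]
subgroup_rate = sum(subgroup) / len(subgroup)

print(f"overall response rate:  {overall:.2f}")       # ~0.38, looks marginal
print(f"subgroup response rate: {subgroup_rate:.2f}")  # ~0.95, a near-cure
```

The aggregate rate (~0.38 against a 0.30 placebo baseline) is exactly the kind of number that gets a drug shelved, even though a near-certain cure exists for an eighth of the population; if you can’t conceptualize the subgroup, you can’t see it in the table.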

Race, gender, etc are only the most legible, conspicuous axes that people vary on. Most of the iceberg of interpersonal (psychological, neurological, biological, environmental) variation we don’t even have words to describe yet. This is why conceptual engineering—a mode of philosophy, taking place early in the rigorizing pipeline—is so essential to good statistical practices further down the pipeline. You’ll never get statistically significant results if your sampled population is too diluted, and you won’t know whether it’s diluted unless you’ve properly conceptualized the space.

A psychiatrist friend mentioned that the crappy statistics for SSRI efficacy are largely the result of SSRIs getting prescribed to lots of people who are depressed in very different ways for different reasons. You can imagine there being an SSRI-treatable neurochemical imbalance whose symptoms are superficially similar to those of radically different sorts of problems (e.g. people who are depressed because their lives are genuinely out of control and they never developed a strong sense of agency). So SSRIs get used on a buncha people with Y condition, and a buncha people with !Y condition, so that the overall efficacy looks comparable to placebo; as a result, many rationalistic commentators dismiss SSRIs, and think we need to abandon neurochemical theories of depression. OTOH, actual practitioners in the field continue to use them, and to find them useful, likely in part because they themselves are already doing the informal, “soft” conceptualization work up front of identifying which patient profiles are more or less amenable to SSRIs.

Another, more cutting example: the same dose of Lexapro can lead to radically different blood/brain levels of Lexapro, depending on an individual’s liver metabolism. This means that many people in a trial of 10mg Lexapro are functionally getting 2.5mg in their bloodstreams, but they’re lumped in as interchangeable for the purposes of the study. No wonder such a study might show poor results.
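Here is a sketch of what that looks like as a data-generating process. The metabolizer categories, their fractions, and their exposure multipliers are invented for illustration, not real Lexapro pharmacokinetics; the point is only that a single recorded dose fans out into a wide range of functional doses.

```python
import random

random.seed(1)

# Hypothetical metabolizer types: (name, fraction of population,
# multiplier on effective blood-level exposure). Numbers are made up.
METABOLIZERS = [
    ("ultrarapid", 0.25, 0.25),  # clears the drug fast: 10mg acts like 2.5mg
    ("normal",     0.60, 1.00),
    ("poor",       0.15, 1.75),  # clears slowly: higher effective exposure
]

def effective_dose(nominal_mg: float) -> float:
    """Draw a metabolizer type at random and return the functional dose."""
    r = random.random()
    cum = 0.0
    for _, frac, mult in METABOLIZERS:
        cum += frac
        if r < cum:
            return nominal_mg * mult
    return nominal_mg  # float-rounding fallback

doses = [effective_dose(10.0) for _ in range(1000)]
print(f"everyone is recorded as 10mg; functional exposure runs "
      f"{min(doses):.1f}mg to {max(doses):.1f}mg")
```

Averaging over this trial treats the 2.5mg-equivalent patients and the 17.5mg-equivalent patients as replicates of the same condition, which is precisely the dilution the essay is describing.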

In conversation with RIPDCB and Neil last year about fantasy football, we talked about sports analytics, and how you’ll get radically different analytic claims depending on how you decide to represent a given situation. Neil called this “the central postrationalist insight.” For instance, a naive statistical analysis would say: player X has injury Y, only 18% of players ever recover to peak performance after injury Y, don’t bet on this guy. But an analyst with a better understanding of the game would put together a finer-grained category for comparison: player X is under 30 years old, he plays a position where muscle group Y isn’t as important, and we know he has a lot of grit, so he’ll be in the gym grinding through his PT, which expedites recovery. Crispy Chicken called this “conceptualization,” which I think is a good word for it.
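The splitting move above is just conditioning on a finer-grained reference class. A toy version, with a wholly invented recovery table rigged so the aggregate matches the 18% figure from the text, shows how the base rate changes once you condition:

```python
# Hypothetical injury-recovery records:
# (under_30, position_stresses_muscle_Y, recovered). All counts invented,
# arranged so the aggregate recovery rate is 18%.
players = (
    [(True,  False, True)]  * 32 +  # young, muscle Y not critical: mostly recover
    [(True,  False, False)] * 8  +
    [(True,  True,  True)]  * 2  +  # young, but position stresses muscle Y
    [(True,  True,  False)] * 58 +
    [(False, True,  True)]  * 2  +  # older and position stresses muscle Y
    [(False, True,  False)] * 98
)

def recovery_rate(rows):
    """Fraction of rows whose final field (recovered) is True."""
    return sum(recovered for *_, recovered in rows) / len(rows)

overall = recovery_rate(players)
split = recovery_rate([p for p in players if p[0] and not p[1]])

print(f"all players with injury Y: {overall:.0%} recover")        # 18%
print(f"under 30, muscle Y not critical: {split:.0%} recover")    # 80%
```

Same data, two reference classes, and the betting implication flips; which number is “the” recovery rate depends entirely on how you conceptualize the comparison group.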

NB: This dynamic must already be well understood in theoretical statistics, but it seems under-accounted for in actual practice, when it comes to statistical analyses performed by non-mathematicians in fields like psychology and nutrition. Perhaps this is partly out of a fear of (being accused of) p-hacking. (Have preregistration practices exacerbated the problem?) If anyone can point me to more information about these dynamics, I’d be indebted.

I’ll close with a quote from Slime Mold Time Mold responding to Maciej Cegłowski’s 2010 essay “Scott and Scurvy.” Tracing the intellectual history back a bit through Collin’s, SMTM’s, and Mastroianni’s writing, “Scott and Scurvy” seems to have played an important role in making these ideas intuitive:

We’re taught to see splitting — coming up with weird special cases or new distinctions between categories — as a tactic that people use to save their pet theories from contradictory evidence. You can salvage any theory just by saying that it only works sometimes and not others — it only happens at night, you need to use a special kind of wire, the vitamin D supplements from one supplier aren’t the same as from a different supplier, etc. Splitting has gotten a reputation as the sort of thing scientific cheats do to draw out the con as long as possible.

But as we see from the history of scurvy, sometimes splitting is the right answer! In fact, there were meaningful differences in different kinds of citrus, and meaningful differences in different animals. Making a splitting argument to save a theory — “maybe our supplier switched to a different kind of citrus, we should check that out” — is a reasonable thing to do, especially if the theory was relatively successful up to that point.

Splitting is perfectly fair game, at least to an extent — doing it a few times is just prudent, though if you have gone down a dozen rabbitholes with no luck, then maybe it is time to start digging elsewhere.

Scurvy isn’t the only case where splitting was the right call. Maybe there’s more than one kind of fat. Maybe there are different kinds of air. Maybe there are different types of blood. It turns out, there are! So give splitting a chance.

Reality is weird, and you need to be prepared for that.