# Pragmatic information theory

Back in the day, I spent a lot of time thinking about information theory, and Suspended asked me to share some thoughts on what a more “pragmatic” version of information theory might look like—one that describes actual human or animal communication, rather than just a passing of digital messages.

I’ll start with a little preamble, which is that, basically, this isn’t a weakness of information theory, just a lack of ambition from the field. Claude Shannon didn’t get around to it because it was hard and he was working on other things, and most of those who have taken up information theory have either gone deeper into its use for digital protocols, or applied it to human language in a much more macro way than this. There are exceptions, but I haven’t seen anyone want to actually do something like “Make linguistic ethnomethodology rigorous with information theory,” and that feels like money being left on the table.

Information theory today is about sending a fixed set of symbols and making sure the same symbols come out on the other end. That’s not what we usually think of as communication.

What we need more than anything is a definition that doesn’t force us to define all the psychological ways communication happens, but is closer to reality than “just transfer this string of symbols.” I’d be willing to go as far as saying that science has only helped us understand humans insofar as it has given us mathematical abstractions that are better than studying humans directly. So what’s the mathematical abstraction that will take us one step further than strings of symbols over a noisy channel?

I’d argue it’s a combination of information theory, causality, and statistics. Information theory introduces the idea of a noisy channel, where I write a bunch of symbols that can be corrupted or flipped in the transfer. Now in communication, what we actually care about is this: I send you symbols, and what I get back is what you end up doing with those symbols.

And what we want to establish is the capacity of that channel. Now how do we establish the capacity of the channel, given that I don’t necessarily know what I’m doing to you with those symbols? I could, for example, try out all the ways in which I can manipulate you, see how reliably each one works, and use that as the capacity of the channel. This definition is precisely causality, right? It’s what I can do, by flipping different bits or symbols, that ends up causing something. I’m trying to deconstruct the system into a graphical model that tells me which nodes are causing which other ones, and which ones just happen to be correlated with each other because they are caused by some other, underlying variable. So what we want is information theory where we discover the capacity of different channels by probing the causal framework. We also have to think about what errors might arise from different kinds of channels and observations, so we’ll need some classical statistical machinery.
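
To make the probing idea a bit more concrete, here is a rough sketch, not a worked-out method. It treats each symbol I send as an intervention (I choose it, so nothing upstream can confound it), tallies what the listener does in response, and then runs the standard Blahut-Arimoto iteration on the estimated table to get a capacity in bits. The `listener` function, the symbols `"come"` and `"sit"`, the 90% compliance rate, and the trial count are all made up for illustration.

```python
import math
import random
from collections import Counter


def estimate_channel(respond, symbols, trials, rng):
    """Estimate P(action | symbol) by intervention: we pick each symbol ourselves."""
    counts = {s: Counter(respond(s, rng) for _ in range(trials)) for s in symbols}
    actions = sorted({a for c in counts.values() for a in c})
    matrix = [[counts[s][a] / trials for a in actions] for s in symbols]
    return matrix, actions


def capacity_bits(matrix, iterations=500):
    """Blahut-Arimoto estimate of capacity (bits per use) for rows P(action | symbol)."""
    n, m = len(matrix), len(matrix[0])
    p = [1.0 / n] * n  # start from a uniform distribution over symbols

    def divergences(p):
        # KL divergence (in nats) of each row from the induced action distribution.
        q = [sum(p[x] * matrix[x][y] for x in range(n)) for y in range(m)]
        return [
            sum(row[y] * math.log(row[y] / q[y]) for y in range(m) if row[y] > 0)
            for row in matrix
        ]

    for _ in range(iterations):
        d = divergences(p)
        # Reweight symbols toward those whose effects are most distinguishable.
        weights = [p[x] * math.exp(d[x]) for x in range(n)]
        total = sum(weights)
        p = [w / total for w in weights]

    d = divergences(p)
    # Mutual information under the final input distribution, converted to bits.
    return sum(p[x] * d[x] for x in range(n)) / math.log(2)


def listener(symbol, rng):
    """Hypothetical listener: does what the symbol asks 90% of the time."""
    if rng.random() < 0.9:
        return symbol
    return "sit" if symbol == "come" else "come"


rng = random.Random(0)
matrix, actions = estimate_channel(listener, ["come", "sit"], trials=2000, rng=rng)
print(actions, round(capacity_bits(matrix), 3))
```

With a listener like this, the estimate should land near the textbook value for a binary symmetric channel with a 10% flip rate, about 0.53 bits per symbol. The statistical machinery comes in exactly here: the table is estimated from a finite number of trials, so the capacity figure carries sampling error that a real analysis would have to bound.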

What we need is a description of communication that is a bit more nuanced than this, one that describes some of the boundary conditions of human social coordination. From there, we can look at what kind of second-order effects those conditions might lead to, then see whether we can simulate them with LMs and look for their traces in people.