Learning to manipulate

by Possible Modernist

In his book, Human Compatible, Stuart Russell makes what seems like a very strong claim about content-selection algorithms for social media. Russell writes:

Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user’s preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. … Like any rational entity, the algorithm learns how to modify the state of its environment–in this case, the user’s mind–in order to maximize its own reward.

The likening of recommendation systems to rational entities aside, the idea that we are being influenced by algorithmic content selection is not a particularly controversial claim. But suggesting that these algorithms are explicitly learning to modify our preferences, rather than learning to present things we’re likely to enjoy, is not something I had heard before.

Unfortunately, it’s not clear from the context how seriously we are meant to take this claim. Russell is one of the most notable people in the field of AI research, and runs a lab devoted to such topics, so it’s entirely possible that this claim is based on research being conducted by his own group. On the other hand, the claim is being made in a book for popular consumption, so there is no citation given, and it could perhaps just be speculation.

Searching around, I eventually found a video (“Social Media Algorithms Are Manipulating Human Behavior”) in which Russell repeats the claim, in a slightly more extreme, but also slightly more hedged, form:

You can get more long-run clickthroughs if you change the person into someone who is more predictable, who’s, for example, addicted to a certain kind of violent pornography. And so YouTube can make you into that person by, you know, gradually sending you the gateway drugs and then more and more extreme content. … And so it learns to change people into more extreme, more predictable mainly, but it turns out probably more extreme versions of themselves.

The host, however, quite sensibly asks for clarification: “Why is the person that’s extreme more predictable?”

Only then does Russell explain:

Well, I think this is an empirical hypothesis on my part, that if you’re more extreme, you have a higher emotional response to content that affirms your current views of the world. … What I’ve described to you seems to be a logical consequence of how the algorithms operate and what they’re trying to maximize, but I don’t have hard empirical evidence that this is really what’s happening to people, because the platforms are pretty opaque.

We can forgive Russell some of his imprecision, given that he is speaking extemporaneously, but this nevertheless makes it absolutely clear that at least some of what he is saying is conjecture on his part, which is information that is sorely missing from the claim in his book (and other places where he has made similar statements).

Breaking this down a bit, I am not aware of any evidence that content recommendation algorithms have somehow systematically learned to progressively get people “addicted to a certain kind of violent pornography”. Indeed, the whole “radicalization” hypothesis seems to be quite weakly supported, with at least one study of actual user behavior on YouTube finding no trend towards more extreme content or any evidence of recommendations leading to more extreme preferences.

Nevertheless, upon further reflection, I think the core claim that Russell is making (that recommendation systems learn to modify our preferences) is actually true, but in quite a trivial sense.

If we imagine a new user coming to a platform, the recommendation system knows nothing about them (except a weak prior), but there is also a huge amount that the user does not know. For any given person, there is potentially a whole universe of content that they have no preferences about, specifically because they are not even aware that it exists! Thus, at least initially, a good way for a recommendation engine to maximize rewards is to get the user to click on something that a) they are not familiar with, and b) they will watch much more of if they like it. Moreover, this is something that should be fairly easy to learn.

Assuming a new user views something they were previously unfamiliar with, the system can reasonably be said to have changed their preferences, in the sense that they now have at least some preference about it. And if it’s the type of content that has a huge back catalog (such as, say, a prolific podcaster), it will be especially easy to make future recommendations.

In other words, I think it’s fair to say that recommendation systems do learn to modify user preferences, at least to the extent that they learn to suggest content from channels that have the potential to be unfamiliar to some users, and which some number of such people are likely to enjoy. This clearly does not exhaust the space of possibilities, but seems almost certain to be some part of it.
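The back-catalog argument above can be made concrete with a toy calculation. This is only a sketch of the intuition, not a description of how any real platform works: the function `expected_clicks`, its parameters, and every number below are my own hypothetical choices for illustration.

```python
def expected_clicks(p_first_click, p_like, catalog_size, p_follow_up):
    """Expected total clicks from recommending one item from a channel.

    All parameters are hypothetical modeling assumptions:
      p_first_click: chance the user clicks the first recommended item
      p_like:        chance the user likes the channel, given that they clicked
      catalog_size:  number of further items from that channel available to recommend
      p_follow_up:   chance of clicking each further item, given that they like the channel
    """
    return p_first_click * (1 + p_like * catalog_size * p_follow_up)


# A familiar, well-liked creator with almost nothing left to recommend.
familiar_one_off = expected_clicks(
    p_first_click=0.5, p_like=0.9, catalog_size=2, p_follow_up=0.5
)

# An unfamiliar, prolific creator: a long shot, but with a huge back catalog.
unfamiliar_prolific = expected_clicks(
    p_first_click=0.1, p_like=0.3, catalog_size=500, p_follow_up=0.5
)

print(f"familiar one-off:    {familiar_one_off:.2f}")
print(f"unfamiliar prolific: {unfamiliar_prolific:.2f}")
```

Under these made-up numbers, the long shot on the unfamiliar channel has the higher expected payoff, simply because a success opens up hundreds of easy follow-up recommendations. Different numbers would give a different answer, which is rather the point: nothing here requires the system to be making anyone more extreme.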

It still seems bizarre to me to suggest, as Russell does, that the recommendation systems don’t learn to suggest content that people will enjoy, as there is also almost certainly at least some of that happening. We can also conclude that Russell is at least a little bit cavalier in how he presents his hypotheses. But modifying preferences? Absolutely. That’s something that happens every time we engage with any content. It’s just happened to you.