Things which trend downwards
When long-term participation or payment is gated by short, high-stakes selection games, selection candidates engage in seduction and advertising. Whether they are winning attention, proving their merit, or enticing a selector into commitment, the candidates are on their best behavior.
As a result, we can expect that most situations which are gated this way will degrade over time. This holds for brands, personalities, and products alike: Uber’s pricing will always go up; in-flight amenities will slowly drop off after the introduction of a new airline.
Certain things have lifespans. It is more expensive to repair than replace them, or repair is little more than compositional replacement. When you create or purchase such objects, they will be in their best shape at the very beginning, and everything afterwards will perform more poorly.
Many new cultural practices and technologies are “solution fads” to specific types of interpersonal strategy games. Their efficacy is directly bound up in how many people have adopted and actively deploy or recognize them. The anthropic principle for discovering any social fad, be it through recommendation (social, algorithmic) or cultural exposure, is that, cet par, you are late to it, or at the peak of its adoption, because this is when most individuals who are exposed will be exposed. Gentrifying neighborhoods, popular travel destinations, hip bars: these things will typically be in worse shape a few years after you discover them than when you first discovered them.
I said, cet par, things you encounter will be at the halfway point of their lifespans, a la normal distribution. Trends will be peaking and mainstreaming. The empire will be middle-aged. The tipping point—the peak of the hill, the spot where the rollercoaster starts tipping into gravity, where entropy takes its toll and structure falls apart.
Ceteris is rarely the paribus, of course—but this lets us establish a baseline against which to understand deviations, and the dynamics which drive them.
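The baseline can be sketched numerically. The snippet below is a toy illustration, not a model of any real fad: it assumes (my assumption, not something established above) a symmetric bell-shaped adoption curve over a hypothetical ten-year lifespan, and that the chance of first encountering the fad at a given moment is proportional to adoption at that moment.

```python
import math

# Hypothetical fad: a ten-year lifespan sampled in tenths of a year,
# with adoption peaking at year 5. Shape and numbers are illustrative only.
STEPS = 100          # grid points 0..100 correspond to years 0.0..10.0
PEAK_STEP = 50       # year 5.0

def adoption(step):
    """Bell-shaped adoption level at a grid step (assumed, not measured)."""
    years_from_peak = (step - PEAK_STEP) / 10
    return math.exp(-(years_from_peak ** 2) / 2)

weights = [adoption(s) for s in range(STEPS + 1)]
total = sum(weights)

# Expected moment of first exposure, if exposure is proportional to adoption.
expected_year = sum((s / 10) * w for s, w in enumerate(weights)) / total

# Share of all exposures that land within one year of the peak.
near_peak_share = sum(
    w for s, w in enumerate(weights) if abs(s - PEAK_STEP) <= 10
) / total

print(f"expected year of discovery: {expected_year:.2f}")
print(f"share of discoveries within a year of the peak: {near_peak_share:.2f}")
```

Under these assumptions the expected encounter lands at the midpoint of the lifespan and most encounters cluster around the peak; a skewed adoption curve would shift both, which is exactly the kind of deviation the baseline is meant to expose.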
solution fads fashion stagnation selection games marketing seduction
Caught in the torque
Suspended Reason recently wrote a post about the concept of torque policy, which we can summarize as recommending a course of action that exceeds what is strictly necessary, in part because you assume that most people will fail to fully adhere to the recommendation.
There’s a bunch of great examples in that post, but the one that immediately came to mind for me was the question of drinking alcohol during pregnancy.
This is sort of a complicated one for all the reasons that pregnancy is complicated: it involves truly dramatic structural and hormonal changes to the body, hopes and fears about the future, and a real possibility of things going badly wrong. Unsurprisingly, there are multiple industries built up around stoking and satisfying the desires of expecting parents to prepare for, celebrate, and safeguard the future of their unborn child.
Although they don’t quite benefit in the same ways as other players, doctors are an instrumental part of this ecosystem, as they will provide the expert advice on what women should and should not do during pregnancy. For many women in the U.S., part of the advice they will get from their doctor will be to completely avoid alcohol while pregnant.
We know enough to know that heavy drinking during pregnancy can lead to profound impairments for unborn children, as in fetal alcohol syndrome, and for that reason, it seems important to make sure pregnant women know that they should avoid excessive alcohol consumption. But what is excessive? Well, that’s where things get complicated.
First off, it’s going to depend on the person (e.g., body weight, etc.). But more importantly, we just don’t have a very good scientific understanding of this. Overall, there seems to be very little evidence that mild alcohol consumption will have detrimental effects, but it’s also a case where running a randomized controlled trial is probably impossible. On the other hand, it is usually reasonable to assume that most things have some sort of dose response, where the more you drink the worse it will be. Unfortunately, we just don’t know where the “safe” line is, if there is one.
All of the above is probably familiar to most people, but here’s the interesting question: where is the torque?
The obvious answer is that the doctor is torquing their patient: they know that if they say an occasional glass of wine is fine, then some women might drink more than that, which could be getting into dangerous territory. So they just tell everyone that they shouldn’t drink anything, in hopes that those who would cheat will end up sticking closer to zero than they otherwise would have.
But is that actually how doctors are thinking about this? Perhaps the real torque is on the insurance side of things. Perhaps there are liability issues, where doctors know that they could be held responsible for even things that are not strictly their fault, and so they are being torqued (across all diagnoses) into being extra cautious.
Alternatively, perhaps the torque is being conducted by medical advisory boards or scientific agencies putting out broad recommendations which they hope that doctors will follow. Perhaps they expect that if they tell doctors to explain all of the nuances to their patients, then some doctors will be excessively lenient in their recommendations, and so those writing the policy are attempting to torque the doctors.
Or is it possible some patients are in some sense torquing themselves? Perhaps some women fear that if they internalized the fact that there is little or no evidence of harmful effects from mild alcohol consumption during pregnancy, then they might get careless at some point, and so prefer to receive and believe the recommendation to remain teetotal?
To be honest, I don’t know enough about how the medical system works behind the scenes to know which of these might be relevant. Nevertheless, it is interesting to think that torque policy may in some cases be quite distributed, and hard to pin down to a single source.
torque policy alcohol diet pregnancy self-experimentation ACiM
Honesty is the best policy
Saoirse Ryan, DAO’n’Out: A History of Crypto Parties 2016-2020:
To the properly house-trained individual, the truth becomes an excellent source of excuses. Dallas was not yet house-trained, but he was getting there. And contra story logic, those instances of laziness and shirked duty do not, inevitably, eclipse those instances of observance.
“Honesty is the best policy” is strongly contingent on strong cultural technologies for monitoring. (The monitoring constrains, so your self-presentation has to be ballpark with what a third-party observer would witness/tell from monitoring tech.) It is highly socially and culturally indexical as advice, e.g. it is most true if you live in a small community that is ecologically huddled such that everyone is monitoring everyone else, nearly all the time. The more distant, and less huddled or ecologically proximate, individuals are, the less monitoring capacity. (The more costly it is to monitor.)
Monitoring makes it costlier to lie on several dimensions. The liar must change or constrain his behavior to match the lie whenever witnessed, on whatever information channels are monitored. He must invest considerable cognitive overhead in maintaining his story. And the chances that he is discovered and punished retributively are higher.
opticracy deception honesty reputation ledger strategic interaction
Pragmatic truth-seeking leads to correspondences
by Crispy Chicken and Suspended Reason
- It’s not clear that we’ll ever have clear correspondence to a general material truth. Indeed, information theoretically, it seems unlikely that we can document enough about the universe to reveal such patterns.
- However, there’s reason to believe we’re getting at real correspondences when we’re reliably able to get things done.
- This is often called instrumental truth: it’s the direct output of deciding something is “true” if it gets you what you want.
- Imagine a situation in which instrumental truth didn’t derive correspondences: this would imply that you’re somehow tricking yourself exactly right every time, into getting things done.
- It would be like one of those bits in a sitcom where a stupid character, through a bunch of highly unlikely events, manages to pull off something deftly by random chance.
- But that’s exactly it: these things can’t happen a lot, because they become overwhelmingly unlikely the longer the heuristic continues to work.
- The natural rebuttal is: maybe these heuristics only work in certain contexts, and fair enough!
- But the idea that we know the absolute contexts of our statements is insane. This fact is already true about all but constructed mathematical statements.
- Scientific theories often have more robustness, because they are put through processes that are meant to filter out indexical truths.
- However, this leaves us in a strange situation: we call only non-indexical truths “truth” and become estranged from our access to indexical truths whose scope we can’t characterize.
- This is about information theoretic description complexity: you don’t know how to express the contexts of your instrumental truths, because you haven’t been exposed to enough diversity of cases to express the contextual variables within which they hold.
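The “lucky sitcom character” point can be made with a line of arithmetic. As a rough sketch, suppose (an assumption made purely for illustration) that a belief with no real correspondence still succeeds by chance with some fixed probability on each independent use; the probability of an unbroken streak then shrinks geometrically with the number of uses.

```python
def lucky_streak_probability(p_success_by_luck, n_uses):
    """Chance that a non-corresponding heuristic works n times in a row,
    assuming independent trials with a fixed per-trial luck probability."""
    return p_success_by_luck ** n_uses

# Hypothetical numbers: a coin-flip chance of fluking each success.
for n in (1, 10, 50):
    print(n, lucky_streak_probability(0.5, n))
```

Real uses of a heuristic are neither independent nor equally lucky, so this is only a cartoon of the argument, but it shows why a long record of getting things done is hard to explain by self-trickery alone.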
pragmatism truth correspondence theory evolutionary epistemology patterns instrumental truth information theory
40 minute meetings
So, one piece of social tech I’ve found useful lately is the 40 minute meeting. Specifically, these aren’t calls that go for 40 minutes; they’re calls that are planned to go for 40 minutes.
The idea is this: Some conversations, your partner will dread a long, extended call. Maybe they guard their time closely, maybe they aren’t as invested in meeting. Other conversations, your partner might be excited to chat, might be more invested. Maybe you don’t want them to feel short-changed, like you’re being stingy with your time.
So you suggest a 40 minute meeting. In their heads, “mood-wise,” they can either round this up to an hour, or down to thirty minutes. And when it comes time to meet, they can easily call it quits ten minutes early—no harm no foul—or they can participate in extending the call to a full hour without feeling like they’re imposing.
Consider the alternative: if you leave halfway through an hour-long call, or extend a half-hour call to an hour, you can plausibly get called an asshole, or insult/inspire resentment in your conversational partner.
Two interaction patterns I see in play here:
You are providing both yourself and your partner with breathing room to coordinate and optimize on the spot, when the time comes, rather than committing up front to what may be sub-optimal terms. In other words, you are staying empowered.
You are giving both your partner and yourself grounds for justifying or defending either a 30 or 60 minute meeting as “appropriate” or etiquette-respecting behavior. This is essentially a preemptive removal of potential future “attack surface” by playing to some envisioned hypothetical social court’s sense of “reasonableness.”
narrative defensibility justifiability reasonableness empowerment commitment coordination ethnomethods
Drinking steel
In a recent post Feast of Assumption quite correctly points out that we should not unduly conflate scientific showmanship and self-experimentation, with the latter activity being an important avenue for advancing knowledge, and in some cases the only avenue for ethical experimentation.
I mostly agree, and yet, I can’t quite shake the feeling that there is something else going on with a lot of the actual self-experimentation that takes place under the name of science.
Later in the book which I quoted in my original post, the author returns to Hooke, and discusses his extensive attempts to treat his various ailments and protect himself against the onset of gout and “the stone”.
“In Hooke’s view, the treatments provided by contemporary physicians were as likely to kill as to cure…. On the other hand, Hooke was entirely committed himself to a therapeutic regime of drug taking. … Hooke was prepared to swallow any number of potentially dangerous substances, producing violent and disagreeable reactions in the body, with alarming frequency. On 1 September 1672, for example, he ‘drank steel’ (a mixture prepared by quenching hot steel in wine or, alternately, steeping a piece of steel in wine for several days). On 2 September, he ‘tasted tincture of wormwood, eat raw milk, wrapt head warm and slept well after’. On the third, he took three ounces of infusion of crocus metallicus and vomited. (He also had an orgasm, which he regarded as part of his therapeutic regime, and ‘slept pretty well’.) On the fourth, he purged seven times and reported feeling ‘disordered somewhat by physic’. … On the eighth and fifteenth respectively, he drank three pints of Dulwich water (a proprietary medicine laced with metallic compounds), which refreshed him. … The entry for 22 September 1672 runs: ‘Read Serlio’s Treatise on Architecture, took syrum of popys [opium], slept little with sweat and wild frightfull dreams.’ He took opium again on 29 September.”
It’s fascinating to read about one of the founders of the scientific revolution so rampantly experimenting with ingesting various substances. The willingness to experiment clearly exhibits an impressive degree of courage, curiosity, and belief in discovery, and yet the lack of systematicity seems to limit the potential for producing useful knowledge. Indeed, despite his varied intake, Hooke seems to have never discovered that drinking metal is not great for you, as a general rule.
As a contrast to Hooke, consider this description by Joseph Henrich of the process used by indigenous peoples for removing cyanide from manioc (quoted in Scott Alexander’s review):
“In the Americas, where manioc was first domesticated, societies who have relied on bitter varieties for thousands of years show no evidence of chronic cyanide poisoning. In the Colombian Amazon, for example, indigenous Tukanoans use a multistep, multiday processing technique that involves scraping, grating, and finally washing the roots in order to separate the fiber, starch, and liquid. Once separated, the liquid is boiled into a beverage, but the fiber and starch must then sit for two more days, when they can then be baked and eaten. Figure 7.1 shows the percentage of cyanogenic content in the liquid, fiber, and starch remaining through each major step in this processing.”
Surely, one would think, not every step of this is required, and we could find a better way. But the author’s point is partly that deviation from what is known to work is risky, especially when you’re talking about the potential for poisoning yourself. Manioc is an especially pernicious case, as the ill effects of not properly removing cyanide might not be observable for decades.
Clearly the willingness to experiment (even on oneself) has been an important part of the engine which has transformed the world, and those who were willing to experiment on themselves have contributed to our understanding, sometimes at great personal risk or cost. And yet there is something about Hooke’s compulsive self-experimentation which makes me think first and foremost of a kind of Hunter S. Thompson-like obsession with sampling everything, which can’t be entirely separated from a kind of bravado.
With respect to those who developed and self-administered early vaccines for covid, I do tend to agree with Feast of Assumption that they were likely motivated more by their ability to (hopefully) protect themselves and their loved ones in an overly-restrictive regulatory environment than by any performative impulse.
There is an interesting question, however, as to whether there was any path by which their work could have plausibly led to a widely distributed vaccine. Presumably they knew that such an outcome would almost certainly be reserved for those following the traditional government-approved channels, with limited prospect of approval for something only tested in unauthorized trials.
There was certainly some value in their self-experimentation (we at least know that they weren’t immediately killed by the nasal spray), but it’s unclear how much has been learned from their work, and it doesn’t seem to have had much benefit beyond the original recipients. Motives were no doubt mixed, as they almost always are, but the path they chose, combined with the publicity, leads me to think that at least part of that mix was a bit of the old “look what we can do” machismo, rather than just a pure quest for knowledge.
science self-experimentation drugs diet progress Robert Hooke Joseph Henrich Hunter S. Thompson
Short-term v. long-term in selection games
Crossposted from Suspended Reason.
Natural Hazard prompts:
“The biggest problem with misrepresenting who you are in dating is that you might succeed.” Generalized: “The biggest reason to not strategically misrepresent yourself in a selection game is that you might succeed.” What properties of a selection game determine if or when this is good advice?
Selection games, in my factoring, are ~entrance examinations, hosted by one (super)organism to evaluate another (super)organism, prior to admitting the evaluated party past city gates (so to speak). These superorganisms, wrapped as they are by the constructed and constantly maintained boundaries (walls) necessary for preserving homeostasis, are faced with the existential problem of bringing in goal-advancing resources from the outside (energy, minerals, symbiotic agents) while simultaneously preventing entry by adversarial forces (chemicals, fungi, parasites, spies) whose intentions or effects run counter to the superorganism’s. Hence these boundaries’ semi-permeability.
At many levels of organization, this detection system is performed by some combination of intelligent evaluators, automated tests, and opticratic assessment. The evaluated party always wants in, and the selecting party only wishes to admit those evaluateds which will advance its own goals as host and selector. Thus while the selector/evaluator wants “truth,” in an evolutionary epistemology sense of the word, the evaluated is agnostic to truth; his primary interest is entry, and if strategically minded, he will present any version of himself which will secure this entry.
The question “When does the applicant shoot himself in the foot through misrepresentation?” then translates to the question, “When does an applicant not actually want an unconditional admission?” Alternatively, “When are the evaluated party and the evaluating party deeply aligned as to the selection game’s goals and outcome?”
(Because we are dealing with questions of alignment, we must recognize, following Schelling, that motivations are always mixed; there is never a situation of total misalignment and thereby conflict, nor of pure alignment and therefore cooperation. I will speak of what is in reality a spectrum as if it were dichotomous; when I say “aligned” you may substitute “approaches the limit of alignment,” and so on.)
I fear my answers will be somewhat underwhelming, but I’ll lay them out anyways.
One such occasion: when the applicant shares (is aligned with) the larger agenda of the hosting superorganism. That is, the superorganism hosting the selection game does not seek entrants for its own sake, but for the sake of some other goal or project, in which it enlists the entrants. If a volunteer truly and purely believes in the mission of the NGO he applies to, and is relatively indifferent to the possibility of (say) living abroad, working on a team, gaining moral relief, and instilling his life with meaning, then his interests, and the interests of the committee assessing him, are aligned. On a date, or even in a sexual hookup encounter, the more one party is interested in the other party’s future experience and outcomes, the less incentive there is for deception. To generalize: cet par, the more an applicant in a selection game is aligned with the larger interests (beyond the selection game) of its hosting institution, the more the applicant is aligned with the interests of the hosting institution within the selection game, i.e. the less incentive there is for misrepresentation.
A second occasion: when the selection game is only a preliminary stage in an ongoing, always revocable assessment to determine fit, in which a selected party does not receive rewards linearly w/r/t how long he goes without having his entrance permissions revoked. Compare three highly lucrative, prestigious grant awards, each of which covers a four-year project funding period, with recipients’ activity closely monitored. In the first, funding and title are given in toto, up-front, and cannot be revoked. In the second, funding is doled out steadily, month over month, and one’s “title” works much like job history on a CV—cet par, the symbolic capital allocated roughly tracks how long one held the position. Finally, in the third, funding and title are only given retroactively, at the end of the four years. In the first get-up, there is strong incentive for scammers; in the last, there is very little. The constant, intimate surveillance that comes post-entrance (“within the city walls”) allows much closer, harder-to-fake assessment, and a scammer who is not qualified will be expelled; they will have gained nothing, and only wasted time. We can track similar dynamics in sexual hookups vs. dating—someone looking for a long-term partner is much less incentivized to misrepresent himself than someone looking for a one-night stand.
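The three grant structures can be compared with a little arithmetic. The sketch below is illustrative only: the function name, the dollar figure, and the assumption that an unqualified recipient is reliably expelled after a fixed number of months are all mine, and it counts funding alone, ignoring the symbolic capital of the title.

```python
def scammer_payoff(schedule, total_value, months_before_expulsion, term_months=48):
    """Funding an unqualified entrant collects before being expelled.

    schedule: "upfront", "monthly", or "retroactive" (hypothetical labels
    for the three grant structures described above).
    """
    months = min(months_before_expulsion, term_months)
    if schedule == "upfront":
        # Everything is paid on day one and cannot be revoked.
        return total_value
    if schedule == "monthly":
        # Funding accrues in proportion to time survived.
        return total_value * months // term_months
    if schedule == "retroactive":
        # Nothing is paid unless the full term is completed.
        return total_value if months == term_months else 0
    raise ValueError(f"unknown schedule: {schedule}")

# Hypothetical case: a $480,000 grant, scammer caught after six months.
payoffs = {
    s: scammer_payoff(s, 480_000, 6)
    for s in ("upfront", "monthly", "retroactive")
}
print(payoffs)
```

With these made-up numbers the scammer keeps everything under the first structure, an eighth under the second, and nothing under the third, which matches the ordering of incentives described above.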
This brings us to the third occasion, related to the second: when the benefit of entrance is only dispensed when the match in actuality is “correct.” We see constant scamming to gain entrance to universities, because even if one graduates from an elite college with poor grades (e.g. because the program was too academically demanding), a significant earnings advantage, network advantage, etc. over peers who attended less rigorous programs is conferred through title alone. While some especially poor students may be unable to graduate, in a situation similar to that of the second occasion, there are many students who would only gain admission through misrepresentation, but who are still readily able to graduate and receive the title. On the other hand, in a university system where titles held no benefit (and where tuition prices were all equal, where one’s peers were unknown or randomized, etc. etc.), and where the only or primary benefit of university was the education itself, applicants would have strong incentive to find a “correct” match between themselves and the universities, so that the program fits their needs and weaknesses.
The fourth and most obvious occasion is: when misrepresentation may be discovered and caught. Misrepresentation in selection games where honesty is legally required can risk costs far exceeding what is gained by selection.
I think together, these four occasions explain why, in fact, there is relatively high alignment in many games between selected and selector, while highlighting which games we should expect to be most scammy. They are not perfectly distinct categories; many overlap, and could well be re-carved; but together, they hopefully give a sense of the terrain.
long-termism short-termism selection games deception strategic interaction superorganisms opticracy alignment Thomas Schelling principal-agent problem representation symbolic capital
Boing! Or; utility is not a function
This article was originally published in a dream I had, where it was titled simply “Boing!” The current title is chosen as a compromise between legibility and an appreciation of the original work.
Let’s say that you’re Hazard trying to jump as high as possible on an indoor trampoline. If you wanted to, you could conceptualize your jumping as a function. It takes in an input force and it gives you an output height. Then you could, as Hazard did[1], imagine that you are not “jumping as high as possible” but instead “optimizing your jump function”.
This becomes a problem when you remember the trampoline is indoors. If Hazard jumps too high, he’ll bonk his head! Is that what he wants? Well, no; he just never thought about it. So fine, he’s not optimizing his jump function, he’s optimizing his utility function for jumps. It goes up as the jumps get higher and higher, but craters into negative utility if he hits his head:
This is allowed in the technical sense of how functions work. You can pick a different output for every input point, and “optimizing” can mean finding the highest point in the output, even if it doesn’t come from an extreme value of input. But then why did Hazard think that what he was doing was “optimizing a jump function”? Because his ceiling is pretty high and he doesn’t have rocket legs, so by just doing a regular jump, he’s not in serious danger of hitting his head.
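To make the shape of that function concrete, here is a minimal sketch. Every number in it is hypothetical (the ceiling clearance, the linear force-to-height rule, the size of the penalty); the only point is that the best output comes from an interior input rather than from the largest force available.

```python
CEILING_CM = 300  # hypothetical indoor clearance above the trampoline

def jump_height_cm(force):
    """Toy jump function: more input force, more height (assumed linear)."""
    return force // 5

def jump_utility(force):
    """Utility rises with height, then craters once the head meets the ceiling."""
    height = jump_height_cm(force)
    if height >= CEILING_CM:
        return -100  # bonk
    return height

# Candidate input forces, from standing still to rocket legs.
forces = range(0, 3001, 100)
best_force = max(forces, key=jump_utility)

print("best force:", best_force, "utility:", jump_utility(best_force))
print("maximum force utility:", jump_utility(max(forces)))
```

In this toy setup the optimum sits just below the ceiling, and pushing the input to its extreme yields the worst outcome on the grid.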
If he was going to try to REALLY “optimize his jump function”, he could try to make a small explosion happen under his trampoline, or jury-rig a homemade jetpack, or all sorts of other things. And then he’d bump his head and say “wah!” and not actually be happy. “Optimize my jump function” for Hazard actually ended up meaning “Try my hardest to jump high, but only the regular kinds of trying, not any insane horseshit.”
This is a pretty common pattern of human behavior. Rather than trying to “fully explore” a fitness landscape to optimize the mystical “utility”, people will often just pick a frame that chops out the bad extreme values (no jetpacks!), even if it means not having the absolute most fun possible (notice that Hazard would get more utility out of a little jetpacking). Hazard isn’t walking around mathematically optimizing for the perfect jump; he’s creating a space where he can try his hardest at jumps without danger of overshooting, and then playing in that space.
Not all functions are uniformly increasing or decreasing, but when we find value in conceptualizing patterns as functions, it’s often because of those straightforward types of relationships. The more that the landscape is full of critical points, sudden switches, and other weirdness, the more we try to carve out a space within the landscape where we can follow simpler rules. To the extent that “utility” is an attempt to explain why agents actually do the things that they do (and what else could it mean?), we shouldn’t think of it as a single numeric value to be optimized. You need to consider the rules we follow when considering which things can even be thought of as functions. And while you could try to “compile” those rules into a master “utility function”, that’s not how you generated those rules, that’s not how you follow them, and that’s not how you’ll maintain them over time. So what’s the point of conceiving of your behavior as a “utility function”, aside from getting to pretend to yourself that mathematical models of behavior might someday start working better than the abysmal record they’ve had so far?
[1] To be clear, I am not recounting a thing that actually happened, I am attempting to accurately transcribe an article I wrote in the dream where this thing actually happened.
Maui pool party
How the kids in the pool play: Find a base game or paradigm, then explore the nearby space of mechanisms and gimmicks. When local variations are exhausted, change the template and repeat, rotating the cast, switching into different pairing-offs and arrangements, each game giving rise to new alliances and rivalries. Games, or “shticks,” are often discovered through the steady accumulation of mechanics and gimmicks, which through their bundling give rise to emergent patterns and interestingness.
Some shticks are relatively narrow in possibility, and stable in arrangement—like tossing a football—but these activities are also the site of improvisation and variation, in throw style and effort exerted, in subskill explored and worked on. Children, especially children who play together regularly, are able to switch fluently between shticks, often without explicit communication; that is, stigmergically. But you can also tell who feels less comfortable together, more rigid, by the rigidity of their definition of the game, and their reluctance to stray outside the bounds of agreed-upon play, or to renegotiate the contract of play in vivo, or to trespass too much upon one another.
Older participants (older siblings, parents) and, more generally, the more experienced parties (age correlates with experience) tend to facilitate challenges for younger, less-skilled players. Play by these facilitating parties is often geared around creating and maintaining the right level of difficulty for their opponent. Third-party players who are, or feel they are, relatively close to the level of the facilitating party, may momentarily intrude on or participate in (what are informally understood to be) two-party games in order to reaffirm their matched skill status by playing “for real” what, in the two-party setup, is largely mock.
A large group of siblings, mixed age and sex, Von-Trapp style, are playing with a nerf football in the shallow end. Two of the boys, age 14 and 12, begin playing toss. A third, age 10, emerges from the jacuzzi and becomes the “monkey in the middle,” trying to intercept the throws. When he does, the boy whose pass he caught switches out to the middle. Their two younger sisters, 8 year-old twins, begin jumping up on their older brothers, attempting to wrestle the ball, but in vain. To get a better angle over the 12 year old brother, who is now the “monkey,” the 14 year old stands up on a shallow step, so that he is a few feet above the surface, with a better angle to make and catch throws. Now, the young girls try a new game, to pull or wrestle their brother down from the step, so that he falls back into the pool. He dispatches each with ease, but you get the sense that they are not actually trying to push him, more that they enjoy the thrill of being picked up and thrown themselves by their brother. One tries to grab at his waist, and receives a very specific kind of grab’n’toss back into the pool. The next twin, immediately behind her sister, then replicates her sister’s grab, looking to get in on the specific game mechanic, and is treated to the same toss. In the midst of this, the brother is trying to catch and throw the football back and forth to his brother, and prevent its interception by the 12 year old. The 10 year old is getting somewhat tired of his available mechanics, and now tries to climb up the side-stairs, jumping off them into the water to catch his brother’s throws. Meanwhile, one of the lightweight younger sisters, on a particularly high toss into the water, nearly catches the football tossed her 14 year old brother’s way. Now, this becomes the gimmick: she asks him to launch her, and then she tries to complete the throw from her 10 year old brother across the pool.
The goals and provisional relationships—who is competing, who is coordinating—have morphed radically.
Overwhelming legibility
One of the fundamental ways in which academia is productive is by organizing information. One might disagree fervently with the choice of what information is being organized, or even what should count as knowledge, but it is hard to deny that there are many processes at work that involve selection, curation, revision, collection, critique, and discussion. When it is working well, a scholarly community serves to bring together people with shared interests and knowledge, and facilitate discourse that will advance the state of theorizing or understanding. At a more basic level, scholars work to curate resources that tend to become more or less canonical, shaping the slate of information that the next generation will be exposed to. Such processes, however, are not necessarily unique to academia.
For various reasons (which don’t matter here) I’ve recently been poking around various writings related to what people often call AI safety and alignment. Although this is quickly becoming something of a hot topic, even within relatively academic circles, a huge amount of the conversation about these issues seems to have taken place in more informal venues, such as blogs, forums, podcasts, and even books. From an outsider’s perspective, it is somewhat difficult to know how to think about this work. A cynical view might consider it something akin to fan fiction — amateurs spitballing about big ideas and sharing their writings among friends. A more charitable view might look at it more like an emerging scientific community, perhaps even something akin to the republic of letters that flourished during the early stages of the scientific revolution.
Whatever it is, there is something of a paradox here: unlike much of academia, this body of work is incredibly open. The vast majority seems to be posted to publicly accessible parts of the internet. On top of that, a considerable amount of labor has already been done in terms of curating, linking, recommending, and trying to summarize or contextualize key ideas. In principle, this should make it all the easier to dive into. And yet, perhaps in part because of the sheer volume of writing, and the lack of a more traditional academic hierarchy, it seems unusually difficult to find truly accessible entry points into this literature.
Working through some of the chains of citations, many paths seem to eventually pass through Rohin Shah. He is certainly not the most central figure, but he has written quite a lot about this topic, including the long-running Alignment Newsletter. As a day job, he is a Research Scientist at DeepMind, and he holds a PhD from Berkeley, but he obviously doesn’t confine his writing to normal academic channels.
Browsing through Rohin’s website, I came across what I consider to be one of the most interesting passages I’ve seen in this entire space of ideas. On a Frequently Asked Questions page, Rohin provides answers to a number of questions. Under the question of “What should I learn if I don’t know much about AI alignment yet?”, he writes:
“If you’re used to academia, this will be a disconcerting experience. You can’t just follow citations in order to find the important concepts for the field. The concepts are spread across hundreds or thousands of blog posts, that aren’t organized much (though the tagging system on the Alignment Forum does help). They are primarily walls of text, with only the occasional figure. Some ideas haven’t even been written down in a blog post, and have only passed around by word of mouth or in long comment chains under blog posts. People don’t agree about the relative importance of ideas, and so there are actually lots of subparadigms, but this isn’t explicit or clearly laid out, and you have to infer which subparadigm a particular article is working with. This overall structure is useful for the field to make fast progress (especially relative to academia), but is pretty painful when you’re entering the field and trying to learn about it.”
I find this fascinating for several reasons. First, is there actually anything here that is unique to AI safety and alignment, or is this something that can be said about all fields of study? It is certainly not unusual to have knowledge spread across many disparate sources, often in quite impenetrable prose. Others have also previously discussed the importance of “tacit knowledge” in science: things that everyone knows but are not written down, or subtle aspects of how to work with a scientific apparatus that can only be learned by doing. And disagreements about importance are in some sense the entire ballgame in terms of what scholars do! So what exactly is different here? Is it mostly just that this is a relatively new field, and so doesn’t yet have the equivalent of a textbook? Or is it something about the topic, perhaps a kind of mushiness in the ideas themselves?
A second interesting aspect of this is the nature of the community. In contrast to academia, which often relies on heavy doses of prestige and credentials in determining who is taken seriously, the people writing about AI safety and alignment seem to involve a much wider network of amateurs, writing in places where pretty much anyone’s work will be given serious consideration. Clearly various hierarchies nevertheless emerge, with popularity fueled in part by prolific writing and self-promotion. Some of the most notable people in the field, including Rohin himself, do have fairly standard positions within academia and/or corporate research labs, and yet much of their writing on these topics does not get published in formal venues, or even relatively informal places, like arXiv. Instead, this does seem to be a case of working out ideas in public, and in a way that eschews the more formal trappings of academia.
Finally, consider the various infrastructural aspects. As Rohin points out, because so much of this writing is published in places like LessWrong, various features of that website, like tags and upvoting, have the potential to act as tremendous aids in navigating the terrain. Indeed, these are arguably much more informative markings than the typical signals found in much academic work. Moreover, almost all traditional academic fields have overwhelmingly failed at creating any kind of equivalent to this (even something as simple as comments on published work!).
All of this makes it seem like this open, informal, and annotated body of work should be far more legible than the equivalent in most fields of research. There doesn’t even seem to be a particularly large number of necessary foundational ideas to be learned (compared to something like say, biochemistry). And yet, somehow it remains very difficult to sort through, or to find anything like an authoritative voice.
All of this suggests a number of possible explanations for this combination of openness and opacity, but additional case studies would be extremely useful. Regardless, if nothing else, it remains a fascinating example of the emergence of a community, and arguably of an entire field, outside of traditional academic channels, and that by itself makes it worthy of further study.
AI safety alignment community science academia legibility blogosphere knowledge logistics prestige credentialism LessWrong rationalism
The world’s answering machine
Peli Grietzer, Amerikkkkka:
And I said, I said, ‘I spent the week deciding Kant was the first Modernist, then spent the weekend discovering that Clement Greenberg called Kant the first Modernist. Which is exactly what I hated about childhood the first time around: you thought you and the world were having a conversation but actually you were talking back to the recorded message on the world’s answering machine.’
Reading Erving Goffman for the first time was like this for me. It’s not that I thought my ideas about opticratics and selection games were original, but if you don’t know where to find the ideas, or you’re missing a searchable concept handle, your only recourse is plowing forward solo. Concepts may be re-invented at great energy expenditure, but the alternative is staying in the same place, pacing back and forth across your holding pattern. Research helps, but discovering novel frames takes more serendipity than systematicity.
Bryan Caplan:
The classic mistake of the old: Thinking there are no new ideas. The classic mistake of the young: Thinking your ideas are new.
So here we are in the middle-way, trying to strike a balance. Maybe novelty doesn’t even matter.
Peli Grietzer Erving Goffman Bryan Caplan knowledge logistics novelty
Five quick reminders about consciousness
We understand a lot about consciousness: If by consciousness one is referring to the nature of how humans perceive, experience, and attend to the world, there is a huge amount that has been discovered, as well summarized by authors like Dennett, Dehaene, and many others.
We fundamentally don’t understand consciousness: If we look past the details of human experience, we are ultimately confronted with the fact that we have no good explanation for why or how it is possible for there to be anything like “the experience of something” at even the most minimal level, given our current understanding of the nature of the universe, and it’s possible we never will.
Consciousness exists on a spectrum: We experience varying levels of consciousness in ourselves, and it seems reasonable to posit that the same thing holds across different (types of) beings (which we assume are conscious on some level), but there are no obvious places at which to draw hard lines.
There is no test for consciousness: There are any number of reasons why we might theorize something to be conscious, but there is fundamentally no way to test for it.
Consciousness is a process: If nothing else, we can say that consciousness involves an experience of something in time. If we assume that consciousness is some sort of emergent property of information processing (as many do), consciousness cannot exist while information is not being processed.
(see also Consciousness is not strongly emergent by Crispy Chicken)