Overwhelming legibility
One of the fundamental ways in which academia is productive is by organizing information. One might disagree fervently with the choice of what information is being organized, or even what should count as knowledge, but it is hard to deny that there are many processes at work that involve selection, curation, revision, collection, critique, and discussion. When it is working well, a scholarly community serves to bring together people with shared interests and knowledge, and facilitate discourse that will advance the state of theorizing or understanding. At a more basic level, scholars work to curate resources that tend to become more or less canonical, shaping the slate of information that the next generation will be exposed to. Such processes, however, are not necessarily unique to academia.
For various reasons (which don’t matter here) I’ve recently been poking around writings related to what people often call AI safety and alignment. Although this is quickly becoming something of a hot topic, even within relatively academic circles, a huge amount of the conversation about these issues seems to have taken place in more informal venues, such as blogs, forums, podcasts, and even books. From an outsider’s perspective, it is somewhat difficult to know how to think about this work. A cynical view might consider it something akin to fan fiction: amateurs spitballing about big ideas and sharing their writings among friends. A more charitable view might see it as an emerging scientific community, perhaps even something akin to the republic of letters that flourished during the early stages of the scientific revolution.
Whatever it is, there is something of a paradox here: unlike much of academia, this body of work is incredibly open. The vast majority of it seems to be posted to publicly accessible parts of the internet. On top of that, a considerable amount of labor has already gone into curating, linking, recommending, and trying to summarize or contextualize key ideas. In principle, this should make it all the easier to dive into. And yet, perhaps in part because of the sheer volume of writing, and the lack of a more traditional academic hierarchy, it seems unusually difficult to find truly accessible entry points into this literature.
Working through some of the chains of citations, I found that many paths seem to eventually pass through Rohin Shah. He is certainly not the most central figure, but he has written quite a lot about this topic, including the long-running Alignment Newsletter. As a day job, he is a Research Scientist at DeepMind, and he holds a PhD from Berkeley, but he obviously doesn’t confine his writing to normal academic channels.
Browsing through Rohin’s website, I came across what I consider to be one of the most interesting passages I’ve seen in this entire space of ideas. On a Frequently Asked Questions page, under the question “What should I learn if I don’t know much about AI alignment yet?”, he writes:
“If you’re used to academia, this will be a disconcerting experience. You can’t just follow citations in order to find the important concepts for the field. The concepts are spread across hundreds or thousands of blog posts, that aren’t organized much (though the tagging system on the Alignment Forum does help). They are primarily walls of text, with only the occasional figure. Some ideas haven’t even been written down in a blog post, and have only been passed around by word of mouth or in long comment chains under blog posts. People don’t agree about the relative importance of ideas, and so there are actually lots of subparadigms, but this isn’t explicit or clearly laid out, and you have to infer which subparadigm a particular article is working with. This overall structure is useful for the field to make fast progress (especially relative to academia), but is pretty painful when you’re entering the field and trying to learn about it.”
I find this fascinating for several reasons. First, is there actually anything here that is unique to AI safety and alignment, or is this something that could be said about all fields of study? It is certainly not unusual to have knowledge spread across many disparate sources, often in quite impenetrable prose. Others have discussed the importance of “tacit knowledge” in science: things that everyone knows but that are not written down, or subtle aspects of how to work with a scientific apparatus that can only be learned by doing. And disagreements about importance are in some sense the entire ballgame in terms of what scholars do! So what exactly is different here? Is it mostly just that this is a relatively new field, and so doesn’t yet have the equivalent of a textbook? Or is it something about the topic itself, perhaps a kind of mushiness in the ideas?
A second interesting aspect of this is the nature of the community. In contrast to academia, which often relies on heavy doses of prestige and credentials in determining who is taken seriously, the people writing about AI safety and alignment seem to form a much wider network, including many amateurs, writing in places where pretty much anyone’s work will be given serious consideration. Clearly, various hierarchies nevertheless emerge, with popularity fueled in part by prolific writing and self-promotion. Some of the most notable people in the field, including Rohin himself, do have fairly standard positions within academia and/or corporate research labs, and yet much of their writing on these topics does not get published in formal venues, or even in relatively informal ones, like arXiv. Instead, this does seem to be a case of working out ideas in public, in a way that eschews the more formal trappings of academia.
Finally, consider the various infrastructural aspects. As Rohin points out, because so much of this writing is published in places like LessWrong, various features of that website, like tags and upvoting, have the potential to act as tremendous aids in navigating the terrain. Indeed, these are arguably far more informative signals than those attached to most published academic work. Moreover, traditional academic fields have overwhelmingly failed at creating any kind of equivalent (even something as simple as comments on published work!).
All of this makes it seem like this open, informal, and annotated body of work should be far more legible than the equivalent in most fields of research. There doesn’t even seem to be a particularly large set of foundational ideas that one needs to learn (compared to something like, say, biochemistry). And yet, somehow it remains very difficult to sort through, or to find anything like an authoritative voice.
One can imagine a number of possible explanations for this combination of openness and opacity, but additional case studies would be extremely useful in sorting among them. Regardless, if nothing else, this remains a fascinating example of the emergence of a community, and arguably of an entire field, outside of traditional academic channels, and that by itself makes it worthy of further study.