Within certain parts of AI research, the term “agent” seems to have recently become a kind of shibboleth. This likely derives, in part, from the classic Russell and Norvig textbook, which explicitly defines its focus in terms of what they call agents.1 Others might have in mind a specific technical meaning. More generally, however, the term increasingly seems to be a rhetorical way of gesturing towards especially powerful AI systems: systems that are different from traditional AI or ML programs, perhaps even systems capable of independent, and potentially dangerous, action or thought.
Etymologically speaking, “agent”, meaning “one who acts”, comes from the Latin agentem, meaning “effective, powerful”, present participle of agere: “to set in motion, drive forward; to do, perform; keep in movement”. Via a hypothetical Proto-Indo-European root “ag-”, it connects to a whole host of terms like action, agency, assay, exact, mitigate, strategy, etc.2
Within AI, the most commonly cited requirements for calling something an agent are the abilities to “perceive and act in some environment.”3 Others place greater emphasis on something like goal-directed behavior, perhaps even the ability to make and execute plans to achieve those goals. Russell and Norvig push this even further, suggesting that, unlike other types of computer programs, “agents are expected to do more: operate autonomously, perceive their environment, persist over a prolonged time period, adapt to change, and create and pursue goals”, though it is somewhat unclear whether this is meant to be definitional, or whether all these conditions are necessary (or sufficient).
Unsurprisingly, all of this gets into messy philosophical territory very quickly. Perceiving and acting are relatively easy to think about (although ambiguity remains), but what does it mean to have a goal, or to be effective? (Many people are highly ineffective at accomplishing their goals, and yet it seems strange to say that they don’t have agency).
For example, on some level it seems reasonable to think of a thermostat as having a kind of agency. It perceives the world via a temperature sensor, and it can take actions that modify its environment, by turning a heating or cooling system on or off. It’s also fairly natural to think of it as having an implicit goal, namely to keep the temperature of its room within some particular range of values.
Contrast this with something like a chess-playing program. Both the thermostat and the chess player are essentially just mathematical functions, implemented in silicon, that, when called, will determine a particular course of action and respond accordingly. These actions might be said to be “effective” to the extent that they help the system achieve its “goal”.
For the chess program, it seems fairly obvious that it should have the goal of winning the game, but can this be fully disentangled from effectiveness? What about a really bad chess program that is easy to beat? Does it still have the goal of winning?
To some extent it seems so, in that it will make moves that will allow it to win against a novice opponent. On the other hand, it is more generally making moves that will rarely lead to victory. Perhaps it is more sensible to think of its goal first and foremost as making legal moves, some of which will be winning. (We could also contrast this to a program that tries to win by any means necessary, including breaking the rules in ways it thinks its opponent won’t notice).
One difference between a chess program and a thermostat is that the former only responds when told to move, whereas the latter is constantly vigilant, responding with an action as soon as its conditions are satisfied. This also seems to be mirrored by what we think of as background software agents that make up part of an operating system — waiting for some conditions to be triggered, and then calling some subroutine (such as turning off the display).
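To make this contrast concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption of mine rather than a description of any real device: the function names, the action labels, and the temperature thresholds are all hypothetical. The point is only that the "vigilance" of the thermostat lives in the loop that keeps calling it, not in the function that maps a perception to an action.

```python
from typing import Callable, Iterable, List


def thermostat(temperature: float, low: float = 18.0, high: float = 22.0) -> str:
    """Map a perceived temperature to an action (thresholds are made up)."""
    if temperature < low:
        return "heat_on"
    if temperature > high:
        return "cool_on"
    return "idle"


def run_continuously(policy: Callable[[float], str],
                     readings: Iterable[float]) -> List[str]:
    """Call the policy on every reading, like a thermostat that is always watching."""
    return [policy(reading) for reading in readings]


# Called once, on request, the function behaves like the chess program:
# it does nothing until someone asks it for a decision.
single_decision = thermostat(15.0)

# Wrapped in a loop over a stream of readings, the very same function
# looks "constantly vigilant".
stream_of_decisions = run_continuously(thermostat, [15.0, 20.0, 25.0])
```

On this view, a chess program could be made to look just as vigilant by wrapping it in a loop that watches the board, which suggests that the difference is a matter of how the function is wired up rather than of what it is.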
But then what about a rock, or a chunk of glass? In just sitting there, it is “perceiving” the environment (sensing vibrations in the ground and transmitting them through its body). Most of the time it will remain stationary, but if pushed on a slope, it will begin to move and slam into something below it. Why don’t we think of the rock as an agent with the goal of remaining still, until it is pushed? (Certainly I would be unable to remain so still, even if that were my goal).
Language models like GPT-3 are currently more like the chess program than the thermostat. They are mathematical functions that carry out a computation when called, returning some output, though one could of course connect them to some kind of mechanism that would trigger under particular conditions. They are certainly “effective” (to some extent), in terms of their ability to produce text that seems coherent, but does it make sense to think of this as their goal?
I understand why people are eager to mark AI programs in some way, to call attention to what seems like their unusual ability to act “appropriately” in varying circumstances. But the use of the term “agent” seems to do much more work rhetorically than practically, a kind of request that you think of something as possibly being somewhat more than it is.
Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, 4th US ed. Pearson (2021).↩︎
For example, “An Open Letter: Research Priorities for Robust and Beneficial Artificial Intelligence”↩︎