<aside> 💡 Ideally, the below will be co-signed by a small group of relevant academics.

(So far the main authors are Joe Edelman and Tan Zhi Xuan, with input from Atoosa Kasirzadeh, Ryan Lowe, and Manon Revel.)

</aside>

Modeling Socially Embedded Agents & AI

While academic thought has delivered to us many simplified models of social behavior, the 20th century was dominated by a ‘central axis’ of five social theories: microeconomics, game theory, social choice, mechanism design, and welfare economics. These five were all supported from below by rational choice theory[0], and below that, by an understanding of individual actors with fixed preference profiles or utility functions.

We call these the ‘central axis’ because they were the principal social theories that generated institutional designs and justified them. They also provided backing for the dominant political theories, which were freedom- and fairness-based, again focusing on individual actors and the (free or fair) pursuit of their individual preferences.


Something to note about these central axis theories is that they model agents as ‘context-free’. Individuals are supposedly born with their preference profiles, utility functions, or payoff matrices intact, and there’s no place in the models for the shared values, shared norms, shared beliefs, group identities, etc., that shape these individual preferences.[1]
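
To make the contrast concrete, here is a minimal sketch of that context-free picture, in notation we introduce here rather than drawn from any one of the five theories: each agent comes equipped with a fixed utility function and simply maximizes its expectation, with no term anywhere for shared values, norms, beliefs, or identities.

```latex
% Minimal sketch of the context-free, fixed-preference model (our notation).
% Each agent i has a utility function u_i over outcomes X, given once and for all:
\[
  u_i : X \to \mathbb{R} \qquad \text{(fixed; no dependence on norms, values, or other agents)}
\]
% The agent then chooses whichever available action maximizes expected utility:
\[
  a_i^* \;=\; \arg\max_{a \in A_i} \; \mathbb{E}_{x \sim p(\cdot \mid a)} \big[\, u_i(x) \,\big]
\]
% Nothing in u_i or p is itself a function of shared context, which is exactly
% the modeling choice the rest of this piece pushes against.
```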

Despite this lack of context (or because of it), we imagine these theories achieved their centrality because they are mathematically expressive, powerful, and parsimonious, and because they sat well with widespread philosophical intuitions. Plus, they were ‘good enough’ for the institutional design challenges of their day.

But recently, the field of AI alignment has begun to push against this state of affairs. We see six reasons for this.

  1. The problem with revealed preference. The need to find objective functions or reward models which can safely be used to train AI brings us right back to the debates in welfare economics (most closely associated with Amartya Sen) about the limits of revealed preference as a measure of benefit. In the meantime, it has become clearer that businesses, governments, and other entities have indeed learned to exploit individuals under the guise of serving their preferences, using AI.[2] For instance, social media platforms learned to manipulate user engagement, producing addiction-like behaviors and prioritizing platform growth over individual well-being.

    Alignment methods based on explicit values[3] or norms[4] are starting to emerge and are showing clear advantages over approaches based on revealed preference. In general, there’s much more appetite to overcome these problems with revealed preference than there was when these debates happened in the field of development economics.[5]

  2. Socio-technical alignment requires new institutional forms. AI alignment challenges seem to require inventing new forms of governance, and many directions for institutional innovation aren’t supported by the central axis. To give but one example, the central axis version of social choice is blind to the most powerful lever in deliberation: inspiration. The best mechanisms should not just accommodate existing preferences; they should allow for the formation and inspiration of new ones, by creating environments where individuals grow in their understanding of what is good.[6] Can mechanism design catch up with ancient Athens?

    Recent mechanisms by Conitzer[7] and Klingefjord et al.[8] leave the central axis behind and build on different social theories.

  3. The need to model and preserve ‘the social fabric’. One thing we want powerful AI to be careful about is that shared context: the networks of trust, values alignment, and normative cooperation that keep society working. Because the central axis theories don’t model agents as embedded in a social context, they mostly pretend this social fabric doesn’t exist.[9] This means alignment efforts aimed at preserving or enhancing the social fabric are hard to build on these theories!

    We already see the results of this playing out in society: in recent decades, metrics based on preferences or transactions have gone up. Our “wealth” is increasing. But liberals often suspect our capacity for social cognition has declined via misinformation, conspiracies, etc.; conservatives suspect we’ve suffered a decline in morals and aesthetics; both tend to say our norms and channels for cooperation have eroded. Whether such declines are happening or not, it doesn’t seem likely that measures of consumption or engagement/revealed preference would show them. These things are important; they should find a prominent place in our social theories and ideas of welfare.

  4. Cooperating AI agents. Another challenge in alignment is to get a vast ecosystem of AI agents cooperating. This turns out to be a sore point for the central axis theories, and a place where ideas about shared norms or values from outside the central axis have already been imported, to address mismatches between the equilibria predicted by context-free models and those observed among real, cooperating agents.

    We can work to make AI agents that cooperate as we do, but to do so we may need to adopt more sophisticated (yet tractable) norm- and value-embedded models of human cooperation. (A toy illustration of how a shared norm changes the predicted equilibrium appears after this list.)

  5. The challenge of super-wisdom. The central axis assumes agents have fixed preferences, disconnected from one another and from any broader notion of the good. This means AIs, individuals, and societies cannot collectively aspire to ideals beyond preference satisfaction.

    However you understand our social embedding (whether as shared values, norms, or beliefs), acknowledging that it exists seems to suggest kinds of goodness beyond preference satisfaction. A context of shared values suggests moral progress or learning. A context of evolving norms suggests game-theoretic notions of goodness, such as cooperation at higher scales or across diverse ecosystems. A context of shared beliefs suggests the aspiration to discover higher truths.

    Yet, so long as individual preferences are the yardstick of the good, none of these other notions can be admitted, and no one is allowed to know better than anyone else. To make progress, revealed preferences must stop being treated as exogenous and instead be reformulated as a function of underlying values (themselves subject to moral learning), plus social norms and strategic considerations.

  6. New tools. Finally, displacing the central axis theories seems newly feasible. It’s likely that the success of the central axis was partly based on the availability of hard data - behavioral data in the form of votes, purchases, and clicks. This was, for a time, far easier to obtain than intersubjective and qualitative data about shared context.

    LLMs make the systematic investigation of shared norms, values, and beliefs much easier, because they can (1) perform qualitative interviews at scale, (2) simulate people, and (3) crystallize, in their weights, the linguistic and conceptual aspects of our social fabric in a form we can see. (A sketch of (1) appears after this list.)

    This will let new models of socially embedded agents compete on rigor.
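
To illustrate point 4, here is a toy sketch (our own, with made-up payoff numbers) of how importing a shared norm into an otherwise context-free game changes the predicted equilibrium: in a one-shot prisoner’s dilemma, the context-free model predicts mutual defection, but once each agent internalizes a penalty for violating a shared cooperation norm, mutual cooperation becomes the stable outcome.

```python
# Toy sketch (our own example): a one-shot prisoner's dilemma where each agent's
# payoff is reduced by a penalty for violating a shared cooperation norm.
# All numbers are illustrative, not drawn from any particular paper.

from itertools import product

ACTIONS = ["cooperate", "defect"]

# Standard (context-free) payoffs: (row player, column player). The game is
# symmetric, so we can reuse the row player's payoff function for both players.
BASE_PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def payoff(my_action, their_action, norm_weight=0.0):
    """Payoff to the acting player, minus a penalty for breaking the shared norm."""
    base = BASE_PAYOFFS[(my_action, their_action)][0]
    norm_penalty = norm_weight if my_action == "defect" else 0.0
    return base - norm_penalty

def nash_equilibria(norm_weight=0.0):
    """Return pure-strategy profiles where neither player gains by deviating."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        a_best = all(payoff(a, b, norm_weight) >= payoff(alt, b, norm_weight) for alt in ACTIONS)
        b_best = all(payoff(b, a, norm_weight) >= payoff(alt, a, norm_weight) for alt in ACTIONS)
        if a_best and b_best:
            equilibria.append((a, b))
    return equilibria

print(nash_equilibria(norm_weight=0.0))  # context-free prediction: [('defect', 'defect')]
print(nash_equilibria(norm_weight=3.0))  # with an internalized norm: [('cooperate', 'cooperate')]
```

The same move, treating an agent’s effective payoff as its underlying value plus a norm term, is also one concrete reading of the reformulation called for in point 5.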
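
And to illustrate capability (1) from point 6, here is a minimal sketch of an LLM-driven interview loop. `call_llm` is a hypothetical stand-in for whatever chat-completion API one uses, and the prompts are illustrative only.

```python
# Minimal sketch of capability (1) from point 6: using an LLM to run qualitative
# interviews at scale and summarize the values that surface. `call_llm` is a
# hypothetical stand-in for any chat-completion API; prompts are illustrative only.

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to an LLM and return its reply as text."""
    raise NotImplementedError("Wire this up to the chat API of your choice.")

def interview(participant_answers, topic: str, n_questions: int = 5):
    """Run a short semi-structured interview, letting the LLM choose follow-ups.

    `participant_answers` is any callable that takes a question string and returns
    the participant's reply (a human in a chat UI, or a simulated persona).
    """
    transcript = []
    question = f"When you think about {topic}, what do you pay attention to, and why?"
    for _ in range(n_questions):
        answer = participant_answers(question)
        transcript.append({"question": question, "answer": answer})
        question = call_llm(
            "You are a careful qualitative interviewer. Given this transcript, "
            f"ask one open-ended follow-up question:\n{transcript}"
        )
    return transcript

def extract_values(transcript) -> str:
    """Ask the LLM to summarize the values, norms, and beliefs in a transcript."""
    return call_llm(
        "List the values, norms, and shared beliefs this participant appeals to, "
        f"with a short supporting quote for each:\n{transcript}"
    )
```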

All these forces suggest a shift away from the context-free, central axis theories.

This is a paradigm shift, significantly bigger than “behavioral economics”.

Exciting!

Towards Allied Theories of Embedded Agency

We’d like to lay out a program for responding to this situation. What would we like to see?

  1. Commensurate reach and integration. We believe there are versions of every component of the central axis based on (socially) embedded agents rather than agents with only individual preferences, utility functions, or payoffs:

    1. Context-aware modeling of human reasons, values, and decision-making would go beyond the “thin” instrumental rationality accounted for by expected utility theory.
    2. The context-aware version of the AI outer alignment problem does double duty as a reformulation of the social welfare function in welfare economics.
    3. A context-aware (especially norm-aware) theory of human cooperation, coordination, and interdependent choice could supplant game theory as we know it.
    4. There will be context-aware ideas of trade equilibria in microeconomics.
    5. Perhaps most importantly, social choice and mechanism design will experience a renaissance by centering context-aware mechanisms and institutions.

    Let’s call these theories together the Allied Theories of Embedded Agency (ATEA). There’ll be a period of divergent experimentation, but ultimately these theories must fit together tightly, just as the current central axis does.

  2. Formal guarantees. While there are alternatives to the central axis models, they tend to be less useful for reasoning about institutions, interventions, and mechanisms.

    What makes the central axis so good in this regard is that it provides ‘formal guarantees’, verifiable on paper, about the behavior and outcomes of mechanisms and institutions: guarantees like Pareto optimality, strategy-proofness, and envy-freeness (compact statements of two of these appear after this list). Via these formal guarantees, the central axis militates for some designs and against others, so designers have results without needing to know a mechanism’s real-world performance.

    We can compare this feature of the central axis against vaguer theories, like those in macroeconomics and Marxism. These lack formal guarantees; instead, the same theoretical apparatus can often be used to argue either for or against something.

    We believe the ATEA will provide new formal guarantees. Some will concern new kinds of goodness a mechanism or institution can have, which haven’t been properly conceptualized yet.

  3. Algorithms, institutions, and mechanisms that fit us well (and solve real-world problems). The ultimate test of a social theory is whether systems built or justified by it fit us well. What we mean by this[10] is: any set of social theories implies certain skills and activities on the part of agents. Current mechanism design implies agents who optimize, calculate, strategize, collude, reduce values to numbers, estimate probabilities, etc. Standard theories of intelligent agency — which guide the design of both AI systems and their models of humans — do the same. These activities aren’t foreign to human nature, but don’t seem central either. So, we have to put in a kind of effort to participate in the institutions that emerged from this view of agency.

    Other social theories have similarly poor fits: people aren’t natural advocates for class or race interests, nor natural status maximizers, etc. We hope that the ATEA will be different — that norm-following, norm-intuiting, inspiration, and pluralist value-pursuit will be a better account of human nature and rationality than the above, and that therefore algorithms, institutions and mechanisms built on the ATEA will better fit our lives. Ultimately, the proof here is experiential: as new context-aware institutions, algorithms, mechanisms, platforms, and laws develop, we’ll see what it feels like to participate in them.

  4. Social change along an unexpected but optimistic path. If all of the above goes well, we’ll see new political theories and even new political rhetoric emerging. These theories will go beyond freedom and fairness, aiming at notions of the good beyond preference satisfaction, as gestured to in point #5 above, and they will legitimate the reorganization of society to achieve these new notions of the good.
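
For reference, here are compact statements of two of the guarantees named in point 2 above; these are the standard textbook definitions, written in notation we introduce here.

```latex
% Standard statements of two central-axis guarantees (notation introduced here).
% A direct mechanism f maps reported preference profiles to outcomes.

% Strategy-proofness: no agent i can ever do better by misreporting.
\[
  \forall i,\; \forall (\succ_i, \succ_{-i}),\; \forall \succ_i' :\quad
  f(\succ_i, \succ_{-i}) \;\succeq_i\; f(\succ_i', \succ_{-i})
\]

% Envy-freeness (for an allocation x_1, ..., x_n): no agent prefers
% another agent's bundle to their own.
\[
  \forall i, j :\quad u_i(x_i) \;\geq\; u_i(x_j)
\]

% Both are "on paper" properties: they quantify over all preference profiles,
% so a designer can verify them without field data.
```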

The Task Ahead

This will require a ragtag band of geniuses, collaborating closely. You’re reading this because we think your work is exceptional, and you might belong in this academic movement.