<aside> 🔥
This document is maintained by the Full Stack Alignment Task Force, including Tan Zhi Xuan and the MAI team.
The following describes a plan to rapidly bring a set of inventions through steps of research and adoption. The goal is both (a) to address a set of near-term problems, and (b) to effect a longer-term societal shift. The near-term problems are conflicts between AI deployments and human values at successive scales, which call for morally competent agents, productive multi-agent coordination, an economy kept aligned with human flourishing, and democratic oversight of AI. The larger societal shift is one where the systems that shape our societies most strongly—including AI, but also democracies, markets, and even geopolitical and cultural realities—get realigned with what is meaningful about life: what’s most important to human flourishing and to us as individuals.
This is the mission of both the Meaning Alignment Institute (a small coordinating nonprofit) and the Full Stack Alignment Task Force (a larger group of researchers and policymakers at many institutions).
</aside>
This paper introduces the concept of "Full-Stack Alignment" as a comprehensive approach to addressing alignment challenges across the socio-technical spectrum of AI deployment. We argue that approaches focused solely on aligning individual AI systems with operator intent are insufficient without corresponding alignment of broader institutional incentive structures. Through five detailed case studies—agents aligning with user values, agents functioning as "good citizens" in professional domains, agents as win-win negotiators, agents maintaining ties to human flourishing, and agents representing democratic populations—we demonstrate why standard institution design tools derived from microeconomics, game theory, and social choice theory are inadequate for these challenges. These frameworks rely on thin conceptions of rationality that fail to account for moral reasoning, value evolution, and social context. We propose a new theoretical toolkit centered on explicit modeling of norms and values, which renders previously intractable socio-technical challenges manageable. Our research and implementation pipeline moves systematically from basic research through consensus-building, real-world deployments, and broader adoption. By reconceptualizing human agency beyond strategic optimization toward norm-following, value-pursuit, and context-aware reasoning, Full-Stack Alignment offers not only technical solutions but potentially a new vocabulary for reimagining social possibilities in an AI-integrated future.
The growing field of socio-technical alignment argues that beneficial AI outcomes require more than aligning individual systems with operators' intentions. Even perfectly intent-aligned AI systems will become misaligned if deployed within broader institutions—such as profit-driven corporations, competitive nation-states, or inadequately regulated markets—that conflict with global human flourishing.
However, current institutional design tools—such as microeconomics, game theory, mechanism design, welfare economics, and social choice theory—are insufficient for addressing these alignment challenges. These frameworks rely on overly simplified assumptions about human decision-making, focusing primarily on individual preferences and ignoring critical factors such as social norms, ethical values, and moral reasoning. Additionally, existing socio-technical alignment efforts often lack a clear and comprehensive pathway from theoretical research to practical, scalable implementation.
To address this, we introduce Full-Stack Alignment (FSA), capturing three core ideas. First, FSA argues that socio-technical alignment challenges become tractable when we move beyond preference-based frameworks and adopt a toolkit explicitly incorporating norms and values. Second, it recognizes that aligning AI systems and institutions must occur more-or-less simultaneously, across all layers of society, as misalignment at any single layer creates pressures that ripple through others. Third, FSA outlines a clear strategy for research and societal transformation, systematically progressing from foundational research through expert consensus-building, targeted policy development, flagship implementations, and ultimately broad societal adoption.
We start by highlighting five critical socio-technical alignment challenges that current frameworks struggle to effectively address. Then, using concrete examples, we demonstrate how the enriched, norms-and-values-informed approach of FSA could resolve these previously intractable issues.
Finally, we detail our implementation strategy designed to align institutions and AI systems with human flourishing across every societal level.
Addressing the challenge of socio-technical alignment requires redesigning institutional structures, yet we will argue that the formal toolkit for institution design inherited from the 20th century—microeconomics, game theory, mechanism design, welfare economics, and social choice theory—is inadequate. We call this set of theories the Standard Institution Design Toolkit (SIDT).
These theories model agents via a thin conception of rationality: individuals are presumed to possess intrinsic preference profiles, utility functions, or payoff matrices. These representations have significant limitations: (1) they cannot be inspected by others[*]; (2) they do not reference any underlying notion of the good; (3) they are blind to social context, such as shared values, norms, beliefs, or group identities[1].
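The contrast can be made concrete with a toy Python sketch (all class names, value names, and structures here are hypothetical illustrations, not part of the FSA framework): a thin SIDT-style agent carries an opaque utility function, while a thicker agent carries named values and shared norms that can be inspected and contested by others.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ThinAgent:
    """SIDT-style agent: choices flow from an opaque utility function
    that others cannot inspect, contest, or relate to a notion of the good."""
    utility: Callable[[str], float]

    def choose(self, options):
        return max(options, key=self.utility)

@dataclass
class ThickAgent:
    """Illustrative richer agent: values and norms are explicit data,
    so they can be inspected and discussed by others."""
    values: dict                              # named value -> importance weight
    norms: set = field(default_factory=set)   # options ruled out by shared norms

    def choose(self, options, supports):
        # Norms filter the option set *before* any optimization happens.
        permitted = [o for o in options if o not in self.norms]
        # Each permitted option is scored by the named values it serves
        # (supports maps option -> set of value names), so the ranking
        # is legible rather than a black-box payoff.
        return max(permitted, key=lambda o: sum(
            w for v, w in self.values.items() if v in supports.get(o, set())))

agent = ThickAgent(values={"honesty": 2.0, "efficiency": 1.0},
                   norms={"cut_in_line"})
choice = agent.choose(["cut_in_line", "wait_turn"],
                      {"wait_turn": {"honesty"}, "cut_in_line": {"efficiency"}})
# -> "wait_turn": the shared norm excludes cut_in_line before scoring
```

The point of the sketch is only structural: in the thick agent, the norm filter and the named-value scoring are visible, contestable objects, whereas the thin agent's `utility` is a black box.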
To make our case concrete, we’ll pick five representative problems, drawn from increasingly broad levels of the societal stack, which serve as motivation for new theoretical approaches. These challenges represent domains where preference/utility frameworks demonstrably fall short.
When people over-index on explicit metrics or singular objectives as their goals in life, they are at risk of a phenomenon called value collapse [X] — losing touch with the rich complex of values that they originally cared about, and replacing it with a thin conception of what matters (e.g., money, status). Many social forces already encourage such collapse, and as AI plays an increasing role in our individual and social lives (as assistants, delegates, confidantes...), we risk losing autonomy to AI systems that homogenize and flatten our values into easy-to-optimize objectives. In the worst case, such objectives might be entirely uncorrelated with what we truly valued to begin with.
This form of AI misalignment is much subtler than simply optimizing for the "wrong" goal or utility function. Indeed, a large part of the risk is that AI systems will interpret or reshape our values into more pliable or easily satisfiable forms — all in the guise of "being helpful" or "providing assistance". Many recommender systems already exhibit this failure mode [?], and advanced AI agents are even more likely to skillfully manipulate and capture our values, even if they appear aligned at the surface level.
How can we avoid such AI-driven value collapse — at the individual level, but also in society writ large? Part of this will involve counteracting economic incentives towards collapse (see Challenge 4), but we will also need AI that is genuinely capable of actualizing and helping us actualize our values — AI that can serve as faithful trustees, acting upon our values by reasoning about which values to apply, prioritize, or refine in new situations; and AI that can serve as skilled advisors, helping us reflect upon what really matters while still respecting our autonomy in deciding our own values.
Since utility theory is not designed to model agents who change, reshape, and discover preferences over time — much less agents who reason about which preferences or values are more sensible or justified to hold — it is unlikely to be up to the task of capturing human-like reflection about values. Instead, thicker approaches to human values and choice are likely necessary, as we describe further below.
As autonomous agents take up roles in our society previously filled by humans, we face an increasing risk that such agents will stress and ultimately break the norms and institutions that humans maintain. Whether as self-driving cars on the road, remote AI workers executing tasks on the Internet, or moderators and enforcers of organizational rules and policies, such agents may fail to comply with implicit norms that distribute shared resources (e.g., norms against hogging the road or a website’s limited bandwidth), or fail to understand the purpose behind an institutional rule (e.g., an AI moderator that bans users as “discriminatory” for using reclaimed slurs). Sophisticated AI agents may even exploit loopholes in existing rules, strategically enforcing or complying with rules in ways that lead to institutional dysfunction.