<aside>
The Meaning Alignment Institute is helping coordinate the Better AI and Institutions via Thick Models of Choice field-building effort. Here are 6 areas where we believe accelerated progress would be timely.
</aside>
Suppliers usually price their offerings in terms of deliverables, not the end benefits consumers hope for: yoga studios sell yoga classes, not fitness or community; compute providers sell instance hours, not the end-user benefits that compute makes possible.
There's an exception to this: outcome-based contracts. In the airline industry, for instance, aircraft maintenance is priced by serviceable flight time, an outcome-based measure. But outcome-based contracts are currently costly to design and assess. AI could change this: instead of charging for the deliverable, providers can manage a bundle of outcome-based contracts, with an AI intermediary that drafts the contracts, assesses when they're satisfied, and uses pricing to minimize risk for both parties.
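As a rough sketch of what such an intermediary might track, here is a minimal, entirely hypothetical contract record and bundle: one outcome measure and target per contract, payment pro-rated by the verified outcome, and a margin the intermediary charges for absorbing outcome risk. The names and structure are illustrative assumptions, not an actual system.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class OutcomeContract:
    """One outcome-based contract: pay for the measured benefit, not the deliverable."""
    description: str              # e.g. "2,000 serviceable flight hours per quarter"
    target: float                 # outcome level that triggers full payment
    price_at_target: float        # payment when the target is fully met
    measure: Callable[[], float]  # hook that returns the verified outcome so far

    def payout(self) -> float:
        """Pro-rate payment by verified outcome, capped at the agreed price."""
        achieved = min(self.measure() / self.target, 1.0)
        return achieved * self.price_at_target

@dataclass
class IntermediaryBundle:
    """An AI intermediary manages many such contracts and smooths risk across them."""
    contracts: list[OutcomeContract] = field(default_factory=list)
    margin: float = 0.05          # spread charged for absorbing outcome risk

    def provider_revenue(self) -> float:
        gross = sum(c.payout() for c in self.contracts)
        return gross * (1.0 - self.margin)
```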
<aside>
More at Market Intermediaries: A Post-AGI Market Alignment Vision
</aside>
In the near future, autonomous agents will operate and coordinate with one another on our behalf. We want them to achieve shared goods and search for win-win solutions.
In general, purely strategic agents face problems like the prisoner's dilemma and have limited means for overcoming these coordination failures. We can try to work around them by building society-wide reputation systems for autonomous agents, bargaining structures, the equivalent of "small-claims court" for AIs, etc.
Philosophers like David Velleman point to another approach via model integrity: agents that understand each other's values and commitments can see reasons to cooperate where opaque, purely strategic agents would not. This is likely the easier path to well-functioning multi-agent systems, and it includes work on model integrity "evaluation," interpretability, and even cryptographic ways for agents to prove their values.
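To make the contrast concrete, here is a toy one-shot prisoner's dilemma (our illustration, not Velleman's) in which an agent that can read and trust a counterpart's published commitment finds a reason to cooperate that a purely strategic agent lacks. The commitment check stands in for the evaluation, interpretability, or cryptographic mechanisms mentioned above.

```python
# Toy one-shot prisoner's dilemma payoffs: (my_payoff, their_payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

class StrategicAgent:
    """Best-responds to an opaque opponent: defection dominates in a one-shot game."""
    commitment = None
    def choose(self, other) -> str:
        return "defect"

class IntegrityAgent:
    """Publishes a legible commitment and conditions on the counterpart's commitment."""
    commitment = "cooperate-with-cooperators"
    def choose(self, other) -> str:
        # If the counterpart's commitment is visible and trustworthy, cooperation
        # becomes the rational move; otherwise fall back to the strategic play.
        if getattr(other, "commitment", None) == "cooperate-with-cooperators":
            return "cooperate"
        return "defect"

def play(a, b):
    move_a, move_b = a.choose(b), b.choose(a)
    return PAYOFFS[(move_a, move_b)]

print(play(IntegrityAgent(), IntegrityAgent()))  # (3, 3): both see reasons to cooperate
print(play(IntegrityAgent(), StrategicAgent()))  # (1, 1): no commitment to rely on
```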
<aside>
More at https://meaningalignment.substack.com/p/model-integrity
</aside>
There's been a lot of recent work showing the limits of preference-based RLHF for fine-tuning. State-of-the-art fine-tuning approaches are already trying to be values-based (e.g., RLAIF), but a blocker for further advances here is getting high-quality human data about values and norms into fine-tuning pipelines.
There are many problems to solve: ensuring the values and norms collected are legible, coherent, de-duplicable, and well-structured; and ensuring they are the values and norms that guide real behavior, rather than the ones people claim to hold but do not enact.
(This research area also includes representing the values of existing models beyond single words like "helpfulness" or "curiosity", so users can understand and assent to those values.)
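To illustrate the kind of structure this calls for, here is a minimal sketch of a values record with crude legibility and de-duplication checks before it enters a fine-tuning pipeline. The schema and field names are hypothetical illustrations, not MAI's actual format.

```python
from dataclasses import dataclass

@dataclass
class ValueRecord:
    """A single elicited value or norm, structured enough to audit and de-duplicate."""
    title: str                      # short, legible name, e.g. "Be honest about uncertainty"
    attention_policies: list[str]   # what someone guided by this value actually attends to
    context: str                    # the kind of situation where the value applies
    source_id: str                  # who it was elicited from, for provenance

    def is_legible(self) -> bool:
        # Crude legibility check: a non-empty title and at least two concrete policies.
        return bool(self.title.strip()) and len(self.attention_policies) >= 2

def deduplicate(records: list[ValueRecord]) -> list[ValueRecord]:
    """Keep only legible records whose (title, context) pair has not been seen before."""
    seen, kept = set(), []
    for r in records:
        key = (r.title.lower().strip(), r.context.lower().strip())
        if key not in seen and r.is_legible():
            seen.add(key)
            kept.append(r)
    return kept
```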
We believe moral reasoning is similar to, but distinct from, the kind of mathematical reasoning currently targeted by models like o1. One similarity is that moral reasoning steps can be checked for quality, and thus high-quality moral reasoning can be an RL alignment target. Superhuman, explainable moral reasoning (what we call "Wise AI") is likely an important target for AI alignment.
This includes work on formalizing moral reasoning based on values ("virtue ethics") and norms ("contractualism"), as well as more practical RL fine-tuning work that builds on existing moral-reasoning research, including unreleased moral reasoning work by MAI.
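A rough sketch of the reward shape this implies: a hypothetical step-level checker (analogous to a process reward model) scores each moral-reasoning step, and the per-step scores are aggregated into a single scalar an RL optimizer could target. The function names and keyword rubric below are assumptions for illustration, not an existing pipeline.

```python
def check_step(step: str) -> float:
    """Hypothetical step-level checker: score one moral-reasoning step in [0, 1].
    In practice this could be a trained process reward model or a rubric-based grader."""
    rubric_hits = sum(kw in step.lower() for kw in ("value", "norm", "affected", "because"))
    return min(rubric_hits / 4.0, 1.0)

def moral_reasoning_reward(steps: list[str]) -> float:
    """Aggregate per-step quality into a single scalar reward for RL fine-tuning."""
    if not steps:
        return 0.0
    return sum(check_step(s) for s in steps) / len(steps)

# Example: a candidate chain of moral-reasoning steps produced by a model.
steps = [
    "Identify the norm at stake: a promise was made to the affected user.",
    "Weigh the value of honesty because the user will act on the answer.",
]
print(moral_reasoning_reward(steps))  # the reward the RL optimizer would receive
```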
<aside>
Suggested experimental projects in this direction are mentioned in https://meaningalignment.substack.com/p/model-integrity and elsewhere.
</aside>
Amartya Sen and Martha Nussbaum's Capability Approach is a values-based way to measure the welfare of a population, widely used in development economics contexts.
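As a toy illustration of how such a measure can be computed (our construction, not Sen and Nussbaum's), each person is scored on a few capability dimensions and the scores are aggregated with a geometric mean, so deprivation in any one dimension drags the index down, as in HDI-style indices. The dimension names and scores are made up.

```python
from math import prod

CAPABILITIES = ["health", "education", "affiliation", "political_voice"]

def capability_index(person: dict[str, float]) -> float:
    """Geometric mean over capability scores in (0, 1]; a low score in any
    dimension drags the index down, unlike a simple average."""
    scores = [max(person[c], 1e-6) for c in CAPABILITIES]
    return prod(scores) ** (1 / len(scores))

population = [
    {"health": 0.9, "education": 0.8, "affiliation": 0.7, "political_voice": 0.6},
    {"health": 0.9, "education": 0.9, "affiliation": 0.9, "political_voice": 0.1},
]
print([round(capability_index(p), 3) for p in population])
```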