<aside> 🌀

The Meaning Alignment Institute is helping coordinate the Better AI and Institutions via Thick Models of Choice field-building effort. Here are 6 areas where we believe accelerated progress would be timely.

</aside>

(1) AI-Mediated Outcome-Based Contracts

Suppliers usually price their offerings in terms of deliverables, not the end benefit consumers hope for: yoga studios sell yoga classes, not fitness or community; compute providers sell instance hours, not whatever end-user benefits they hope to enable.

There's an exception to this: outcome-based contracts. In the airline industry, for instance, aircraft maintenance is priced using serviceable flight time, an outcome-based measure. But currently, outcome-based contracts are costly to design and assess. AI could change this: instead of charging for the deliverable, providers could manage a bundle of outcome-based contracts. An AI intermediary drafts the contracts, assesses when they're satisfied, and uses pricing to minimize risk for both parties.
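As a rough illustration, here's a minimal Python sketch of the bookkeeping such an intermediary might do. The class names, fields, and the outcome-assessment hook are hypothetical stand-ins, not a description of an existing system:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class OutcomeContract:
    """One outcome-based contract: the buyer pays for a measured outcome,
    not for the deliverable that produces it."""
    outcome: str                    # e.g. "aircraft serviceable for >= 180 flight hours"
    price: float                    # what the buyer pays if the outcome is met
    assess: Callable[[dict], bool]  # hypothetical hook: observed data -> met / not met

@dataclass
class Intermediary:
    """Hypothetical AI intermediary holding a bundle of outcome contracts.
    Drafting happens elsewhere; here it only settles contracts and tracks exposure."""
    contracts: list[OutcomeContract] = field(default_factory=list)

    def settle(self, observations: dict) -> float:
        """Pay out on every contract whose outcome the observations satisfy."""
        return sum(c.price for c in self.contracts if c.assess(observations))

    def exposure(self) -> float:
        """Worst-case payout across the bundle, a crude input to risk pricing."""
        return sum(c.price for c in self.contracts)

# Usage: one aircraft-maintenance contract priced on serviceable flight time.
maintenance = OutcomeContract(
    outcome="aircraft serviceable for >= 180 flight hours this quarter",
    price=50_000.0,
    assess=lambda obs: obs.get("flight_hours", 0) >= 180,
)
broker = Intermediary(contracts=[maintenance])
print(broker.settle({"flight_hours": 192}))  # 50000.0
```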

<aside> 👉

More at Market Intermediaries: A Post-AGI Market Alignment Vision

</aside>

(2) Model Integrity & Multi-Agent Negotiation

In the near future, autonomous agents will operate and coordinate with one another on our behalf. We want them to achieve shared goods and search for win-win solutions.

In general, purely strategic agents face problems like the prisoner's dilemma and have limited means for overcoming these coordination failures. We can try to work around these failures by building society-wide reputation systems for autonomous agents, bargaining structures, the equivalent of "small-claims court" for AIs, and so on.

Philosophers like David Velleman have shown another approach via model integrity: agents that understand each other's values and commitments can see reasons to cooperate where opaque or purely strategic agents would not. This is likely the easier path to cooperative multi-agent systems, and it includes work in model integrity "evaluation", interpretability, and even cryptographic ways for agents to prove their values.
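To make the contrast concrete, here's a toy Python sketch (our own illustrative assumptions, not Velleman's argument or any MAI implementation): in a one-shot prisoner's dilemma, purely strategic agents defect, while agents whose commitments are legible to each other can reach the cooperative outcome:

```python
# Payoff table from each agent's own perspective: (my move, their move) -> my payoff.
PAYOFFS = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

class StrategicAgent:
    commitments: frozenset = frozenset()  # nothing legible for others to inspect
    def move(self, other) -> str:
        return "D"                        # defection dominates in a one-shot game

class IntegrityAgent:
    # A legible, inspectable commitment (hypothetical label).
    commitments = frozenset({"cooperate-with-verified-cooperators"})
    def move(self, other) -> str:
        # Cooperate only if the counterparty's commitments are inspectable and match.
        if "cooperate-with-verified-cooperators" in getattr(other, "commitments", frozenset()):
            return "C"
        return "D"

def play(a, b):
    ma, mb = a.move(b), b.move(a)
    return PAYOFFS[(ma, mb)], PAYOFFS[(mb, ma)]

print(play(StrategicAgent(), StrategicAgent()))  # (1, 1): mutual defection
print(play(IntegrityAgent(), IntegrityAgent()))  # (3, 3): cooperation unlocked
print(play(IntegrityAgent(), StrategicAgent()))  # (1, 1): no exploitation either
```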

<aside> 👉

More at https://meaningalignment.substack.com/p/model-integrity

</aside>

(3) Human Data Pipelines for Values and Norms

There's been a lot of recent work showing the limits of preference-based RLHF for fine-tuning. State-of-the-art fine-tuning approaches are already trying to be values-based (e.g., RLAIF), but a blocker for further advances here is getting quality human data about values and norms into fine-tuning pipelines.

There are many problems to solve: ensuring the values and norms collected are legible, coherent, de-duplicatable, well-structured, and so on; and ensuring they are the values and norms that guide real behavior, rather than the ones people claim to have but do not enact.

(This research area also includes representing the values of existing models beyond single words like 'helpfulness' or 'curiosity', so users can understand and assent to those values.)
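As a sketch of what a single record in such a pipeline might look like, here's a hypothetical schema in Python; the field names and the deduplication key are illustrative assumptions, not an existing MAI format:

```python
from dataclasses import dataclass
from hashlib import sha256

@dataclass
class ValueRecord:
    """One collected value/norm, structured so it can be checked and merged."""
    title: str                      # short, legible name, e.g. "honesty under pressure"
    attention_policies: list[str]   # what someone attends to when acting on this value
    context: str                    # the situation in which the value guides choice
    source: str                     # "enacted" (inferred from behavior) vs "stated"

    def dedup_key(self) -> str:
        """Canonical hash so near-identical submissions can be de-duplicated."""
        canon = "|".join([self.title.strip().lower(),
                          *sorted(p.strip().lower() for p in self.attention_policies)])
        return sha256(canon.encode()).hexdigest()

record = ValueRecord(
    title="Honesty under pressure",
    attention_policies=["whether I'm shading the truth to avoid discomfort"],
    context="giving critical feedback",
    source="enacted",
)
print(record.dedup_key()[:12])
```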

(4) Wise AI & Advanced Moral Reasoning

We believe moral reasoning is similar to, but distinct from, the kind of mathematical reasoning currently targeted by models like o1. One similarity is that moral reasoning steps can be checked for quality, and thus high-quality moral reasoning can be an RL alignment target. Superhuman, explainable moral reasoning (what we call "Wise AI") is likely an important target for AI alignment.

This includes work in formalizing moral reasoning based on values ("virtue ethics") and norms ("contractualism"), and also more practical work in RL fine-tuning that builds on moral reasoning research already done, including unreleased work by MAI.
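For instance, here's a minimal Python sketch of how checked moral-reasoning steps could become an RL reward signal, in the spirit of process reward models; the step verifier below is a toy stand-in for a trained model, and the tags it looks for are purely hypothetical:

```python
from typing import List

def check_step(step: str) -> float:
    """Toy verifier: score one reasoning step in [0, 1].
    In practice this would be a trained model judging, e.g., whether the step
    correctly identifies the values and norms at stake."""
    has_value = "value:" in step
    has_norm = "norm:" in step
    return 0.5 * has_value + 0.5 * has_norm

def moral_reasoning_reward(steps: List[str]) -> float:
    """Aggregate step scores into one scalar reward for RL fine-tuning.
    Taking the minimum penalizes chains that contain any weak step."""
    if not steps:
        return 0.0
    return min(check_step(s) for s in steps)

trajectory = [
    "value: the patient's autonomy matters here (norm: seek informed consent)",
    "norm: clinicians should present options rather than decide unilaterally "
    "(value: respect for autonomy)",
]
print(moral_reasoning_reward(trajectory))  # 1.0 when every step names a value and a norm
```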

<aside> 👉

Suggested experimental projects in this direction are mentioned in https://meaningalignment.substack.com/p/model-integrity and elsewhere.

</aside>

(5) Values-Based Measures of Welfare

Amartya Sen and Martha Nussbaum's Capability Approach is a values-based way to measure the welfare of a population, used in development economics contexts.
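As a toy illustration of the idea, here's a Python sketch that summarizes welfare over several capability dimensions rather than a single income figure; the dimensions, weights, and aggregation rule are our own illustrative assumptions, not the formal Sen/Nussbaum framework:

```python
def capability_index(person: dict, weights: dict) -> float:
    """Weighted average of a person's normalized capability scores in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[d] * person.get(d, 0.0) for d in weights) / total

# Illustrative capability dimensions and weights.
weights = {"health": 0.4, "education": 0.3, "political_voice": 0.3}

population = [
    {"health": 0.9, "education": 0.7, "political_voice": 0.4},
    {"health": 0.6, "education": 0.9, "political_voice": 0.8},
]

scores = [capability_index(p, weights) for p in population]
print(sum(scores) / len(scores))  # a population-level welfare summary, here 0.72
```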