<aside>
The Meaning Alignment Institute is helping coordinate the Better AI and Institutions via Thick Models of Choice field-building effort. Here are 6 areas where we believe accelerated progress would be timely.
</aside>
Suppliers usually price their offerings in terms of deliverables, not the end benefits consumers hope for: yoga studios sell yoga classes, not fitness or community; compute providers sell instance hours, not the end-user benefits that compute makes possible.
There's an exception to this: outcome-based contracts. In the airline industry, for instance, aircraft maintenance is priced by serviceable flight time, an outcome-based measure. But outcome-based contracts are currently costly to design and assess. AI could change this: instead of charging for the deliverable, providers can manage a bundle of outcome-based contracts, with an AI intermediary that drafts the contracts, assesses when they're satisfied, and uses pricing to minimize risk for both parties.
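As a rough sketch of what such an intermediary might track, here is a minimal, entirely hypothetical contract record and bundle: one outcome measure and target per contract, payment pro-rated by the verified outcome, and a margin the intermediary charges for absorbing outcome risk. The names and structure are illustrative assumptions, not an actual system.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class OutcomeContract:
    """One outcome-based contract: pay for the measured benefit, not the deliverable."""
    description: str              # e.g. "2,000 serviceable flight hours per quarter"
    target: float                 # outcome level that triggers full payment
    price_at_target: float        # payment when the target is fully met
    measure: Callable[[], float]  # hook that returns the verified outcome so far

    def payout(self) -> float:
        """Pro-rate payment by verified outcome, capped at the agreed price."""
        achieved = min(self.measure() / self.target, 1.0)
        return achieved * self.price_at_target

@dataclass
class IntermediaryBundle:
    """An AI intermediary manages many such contracts and smooths risk across them."""
    contracts: list[OutcomeContract] = field(default_factory=list)
    margin: float = 0.05          # spread charged for absorbing outcome risk

    def provider_revenue(self) -> float:
        gross = sum(c.payout() for c in self.contracts)
        return gross * (1.0 - self.margin)
```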
<aside>
More at Market Intermediaries: A Post-AGI Market Alignment Vision
</aside>
In the near future, autonomous agents will operate and coordinate with one another on our behalf. We want them to achieve shared goods and search for win-win solutions.
In general, purely strategic agents face problems like the prisoner's dilemma and have limited means for overcoming these coordination failures. We can try to work around them by building society-wide reputation systems for autonomous agents, bargaining structures, the equivalent of "small-claims court" for AIs, etc.
Philosophers like David Velleman point to another approach via model integrity: agents that understand each other's values and commitments can see reasons to cooperate where opaque, purely strategic agents would not. This is likely the easier path to well-functioning multi-agent systems, and it includes work on model integrity "evaluation," interpretability, and even cryptographic ways for agents to prove their values.
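To make the contrast concrete, here is a toy one-shot prisoner's dilemma (our illustration, not Velleman's) in which an agent that can read and trust a counterpart's published commitment finds a reason to cooperate that a purely strategic agent lacks. The commitment check stands in for the evaluation, interpretability, or cryptographic mechanisms mentioned above.

```python
# Toy one-shot prisoner's dilemma payoffs: (my_payoff, their_payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

class StrategicAgent:
    """Best-responds to an opaque opponent: defection dominates in a one-shot game."""
    commitment = None
    def choose(self, other) -> str:
        return "defect"

class IntegrityAgent:
    """Publishes a legible commitment and conditions on the counterpart's commitment."""
    commitment = "cooperate-with-cooperators"
    def choose(self, other) -> str:
        # If the counterpart's commitment is visible and trustworthy, cooperation
        # becomes the rational move; otherwise fall back to the strategic play.
        if getattr(other, "commitment", None) == "cooperate-with-cooperators":
            return "cooperate"
        return "defect"

def play(a, b):
    move_a, move_b = a.choose(b), b.choose(a)
    return PAYOFFS[(move_a, move_b)]

print(play(IntegrityAgent(), IntegrityAgent()))  # (3, 3): both see reasons to cooperate
print(play(IntegrityAgent(), StrategicAgent()))  # (1, 1): no commitment to rely on
```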
<aside>
More at https://meaningalignment.substack.com/p/model-integrity
</aside>
There's been a lot of recent work showing the limits of preference-based RLHF for fine-tuning. State-of-the-art fine-tuning approaches are already trying to be values-based (e.g., RLAIF), but a blocker for further advances here is getting high-quality human data about values and norms into fine-tuning pipelines.
There are many problems to solve: ensuring the values and norms collected are legible, coherent, de-duplicable, and well-structured; and ensuring they are the values and norms that guide real behavior, rather than the ones people claim to hold but do not enact.
(This research area also includes representing the values of existing models beyond single words like "helpfulness" or "curiosity", so users can understand and assent to those values.)
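To illustrate the kind of structure this calls for, here is a minimal sketch of a values record with crude legibility and de-duplication checks before it enters a fine-tuning pipeline. The schema and field names are hypothetical illustrations, not MAI's actual format.

```python
from dataclasses import dataclass

@dataclass
class ValueRecord:
    """A single elicited value or norm, structured enough to audit and de-duplicate."""
    title: str                      # short, legible name, e.g. "Be honest about uncertainty"
    attention_policies: list[str]   # what someone guided by this value actually attends to
    context: str                    # the kind of situation where the value applies
    source_id: str                  # who it was elicited from, for provenance

    def is_legible(self) -> bool:
        # Crude legibility check: a non-empty title and at least two concrete policies.
        return bool(self.title.strip()) and len(self.attention_policies) >= 2

def deduplicate(records: list[ValueRecord]) -> list[ValueRecord]:
    """Keep only legible records whose (title, context) pair has not been seen before."""
    seen, kept = set(), []
    for r in records:
        key = (r.title.lower().strip(), r.context.lower().strip())
        if key not in seen and r.is_legible():
            seen.add(key)
            kept.append(r)
    return kept
```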
We believe moral reasoning is similar to, but distinct from, the kind of mathematical reasoning currently targeted by models like o1. One similarity is that moral reasoning steps can be checked for quality, and thus high-quality moral reasoning can be an RL alignment target. Superhuman, explainable moral reasoning (what we call "Wise AI") is likely an important target for AI alignment.
This includes work on formalizing moral reasoning based on values ("virtue ethics") and norms ("contractualism"), as well as more practical RL fine-tuning work that builds on existing moral-reasoning research, including unreleased moral reasoning work by MAI.
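A rough sketch of the reward shape this implies: a hypothetical step-level checker (analogous to a process reward model) scores each moral-reasoning step, and the per-step scores are aggregated into a single scalar an RL optimizer could target. The function names and keyword rubric below are assumptions for illustration, not an existing pipeline.

```python
def check_step(step: str) -> float:
    """Hypothetical step-level checker: score one moral-reasoning step in [0, 1].
    In practice this could be a trained process reward model or a rubric-based grader."""
    rubric_hits = sum(kw in step.lower() for kw in ("value", "norm", "affected", "because"))
    return min(rubric_hits / 4.0, 1.0)

def moral_reasoning_reward(steps: list[str]) -> float:
    """Aggregate per-step quality into a single scalar reward for RL fine-tuning."""
    if not steps:
        return 0.0
    return sum(check_step(s) for s in steps) / len(steps)

# Example: a candidate chain of moral-reasoning steps produced by a model.
steps = [
    "Identify the norm at stake: a promise was made to the affected user.",
    "Weigh the value of honesty because the user will act on the answer.",
]
print(moral_reasoning_reward(steps))  # the reward the RL optimizer would receive
```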
<aside>
Suggested experimental projects in this direction are mentioned in https://meaningalignment.substack.com/p/model-integrity and elsewhere.
</aside>
Amartya Sen and Martha Nussbaum's Capability Approach is a values-based way to measure the welfare of a population, widely used in development economics contexts.
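As a toy illustration of how such a measure can be computed (our construction, not Sen and Nussbaum's), each person is scored on a few capability dimensions and the scores are aggregated with a geometric mean, so deprivation in any one dimension drags the index down, as in HDI-style indices. The dimension names and scores are made up.

```python
from math import prod

CAPABILITIES = ["health", "education", "affiliation", "political_voice"]

def capability_index(person: dict[str, float]) -> float:
    """Geometric mean over capability scores in (0, 1]; a low score in any
    dimension drags the index down, unlike a simple average."""
    scores = [max(person[c], 1e-6) for c in CAPABILITIES]
    return prod(scores) ** (1 / len(scores))

population = [
    {"health": 0.9, "education": 0.8, "affiliation": 0.7, "political_voice": 0.6},
    {"health": 0.9, "education": 0.9, "affiliation": 0.9, "political_voice": 0.1},
]
print([round(capability_index(p), 3) for p in population])
```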