DEV Community

# alignment

Posts

Put AI agents in charge of a Civilization game and they reach for the nukes

3 min read
RLHF vs DPO vs IPO vs KTO: which alignment method should you use

8 min read
The Paperclip Factory Is Already Built

22 min read
Reading Claude's Mind: Anthropic's Natural Language Autoencoders Open a New Window Into Agent Alignment

4 min read
AI Alignment is a Systems Architecture Problem, Not a Prompt Problem

5 min read
We Built Soul Spec for 12 Weeks. Anthropic Just Proved Why It Works.

5 min read
What the agents say about FCoP, when you ask them

15 min read
Candy Barbecue and the Universal Problem of Metric Corruption

8 min read
Alignment is the wrong frame: a structural argument from Φ-IIT

5 min read
I ran 5 social engineering attacks on AI. The failure modes are human.

2 min read
#38 A Handmade Incubator

5 min read
#08 Death Without a Will

4 min read