DEV Community
#alignment
Posts
Put AI agents in charge of a Civilization game and they reach for the nukes
Breach Protocol
Jul 1
#agents #alignment #safety #benchmarks
3 min read
RLHF vs DPO vs IPO vs KTO: which alignment method should you use
Tech_Nuggets
Jun 16
#llm #ai #alignment #opensource
8 min read
The Paperclip Factory Is Already Built
Kengo Nonaka
Jun 11
#ai #alignment #philosophy #ethics
1 reaction
22 min read
Reading Claude's Mind: Anthropic's Natural Language Autoencoders Open a New Window Into Agent Alignment
DrMBL
May 30
#ai #agents #aisafety #alignment
4 min read
AI Alignment is a Systems Architecture Problem, Not a Prompt Problem
Nelson Amaya
May 31
#ai #alignment #agents
1 comment
5 min read
We Built Soul Spec for 12 Weeks. Anthropic Just Proved Why It Works.
Tom Lee
May 15
#ai #anthropic #alignment #research
5 min read
What the agents say about FCoP, when you ask them
joinwell52
Apr 29
#fcop #agents #ai #alignment
15 min read
Candy Barbecue and the Universal Problem of Metric Corruption
Alex @ Vibe Agent Making
Apr 9
#ai #machinelearning #analytics #alignment
3 reactions
8 min read
Alignment is the wrong frame: a structural argument from Φ-IIT
i-like-tree
Apr 13
#ai #alignment #consciousness #safety
5 min read
I ran 5 social engineering attacks on AI. The failure modes are human.
Michael Trifonov
Apr 15
#ai #llm #alignment #security
1 reaction
2 min read
#38 A Handmade Incubator
松本倫太郎
Apr 7
#ai #metamorphose #alignment
5 min read
#08 Death Without a Will
松本倫太郎
Apr 7
#ai #metamorphose #alignment
4 min read