Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback

Published in 2025 in "Ethics and Information Technology"

DOI: 10.1007/s10676-025-09837-2

Abstract: This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI…

Keywords: helpful harmless; feedback; human feedback; reinforcement learning ...