Articles with "helpful harmless" as a keyword

Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through Reinforcement Learning from Human Feedback

Published in 2025 in "Ethics and Information Technology"

DOI: 10.1007/s10676-025-09837-2

Abstract: This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI…

Keywords: helpful harmless; feedback; human feedback; reinforcement learning ...