Published in 2025 in "Ethics and Information Technology"
DOI: 10.1007/s10676-025-09837-2
Abstract: This paper critically evaluates the attempts to align Artificial Intelligence (AI) systems, especially Large Language Models (LLMs), with human values and intentions through Reinforcement Learning from Feedback methods, involving either human feedback (RLHF) or AI…
Keywords:
helpful harmless;
feedback;
human feedback;
reinforcement learning ...