Optimal decision making under uncertainty is of increasing importance in artificial intelligence, machine learning, signal processing, and control. Partially observed Markov decision processes (POMDPs) are a central paradigm for real-world sequential decision making. The framework involves a Markov chain observed in noise by a sensor, where the Markov chain and sensor probabilities can be controlled to minimize a cumulative cost over a (possibly infinite) time horizon. It builds on the theory of Markov decision processes (MDPs) and hidden Markov models (HMMs) and yields stochastic optimal control policies that map the estimated state into actions. The estimated state, the so-called belief state, is a probability distribution over the underlying states; the HMM filter and the Kalman filter are the only general optimal state estimators with finite-dimensional representations. Policy optimization for POMDPs is much more difficult than for MDPs, since the space of stochastic policies is continuous. This book discusses filtering and controlled sensing in great detail and covers adaptive methods, such as reinforcement learning algorithms for POMDPs.

Examples of POMDPs include smart radar systems that automatically adjust their operating mode based on noisy observations of a moving target, and reactive social networks that adapt network connectivity and control the bias of estimates at various nodes, in real time, to suppress the propagation of fake news. POMDPs arise in numerous applications, including optimal search problems, controlled sensing (where sensors adapt their behavior in real time), mobile robotics, multiagent systems, computer vision, active hypothesis testing, radar resource management, dynamic spectrum allocation, Bayesian social learning, behavioral economics, and sequential change detection.
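To make the belief-state idea concrete, here is a minimal sketch of one step of the HMM filter, which recursively updates the belief (a probability distribution over the hidden states) from a transition matrix and observation likelihoods. The two-state matrices below are hypothetical illustration values, not taken from the book:

```python
import numpy as np

def hmm_filter_update(belief, P, B, obs):
    """One HMM filter step: predict the next-state distribution
    through the transition matrix P, then apply a Bayesian
    correction using the likelihood of the observed symbol."""
    predicted = P.T @ belief              # prior over the next state
    unnormalized = B[:, obs] * predicted  # multiply by observation likelihoods
    return unnormalized / unnormalized.sum()

# Hypothetical 2-state example.
# P[i, j] = probability of moving from state i to state j.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# B[i, y] = probability of observing symbol y in state i.
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])

belief = np.array([0.5, 0.5])             # uniform initial belief
belief = hmm_filter_update(belief, P, B, obs=0)
```

In a POMDP, a policy maps this belief vector to an action; because the belief lives on the probability simplex (a continuous space), policy optimization is far harder than in a fully observed MDP, which is the difficulty the text emphasizes.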
There is substantial need for a textbook that covers models, algorithms, and structural analysis of POMDPs in a rigorous yet accessible way, and this text provides a timely and welcome contribution to the area. The intended readership includes graduate students and researchers in electrical engineering, computer science, operations research, and applied mathematics. Overall, the book offers a detailed and careful treatment of POMDPs, including coverage of structural results that are crucially important for solving large-scale problems.