
Reinforcement Learning & Optimal Control: Theorems, Proofs, and Python Implementations (Computational Mathematics Library)

Limited Time Sale

$47.99 cheaper than the price of a new copy!

Free shipping for purchases over $99
Free cash-on-delivery fees for purchases over $99
Please note that the sale price and tax displayed may differ between the online store and physical stores, and the product may be out of stock in-store.
New: $79.99

Product details

Management number 219223570
Release date 2026/05/03
List price $32.00
Model number 219223570
A rigorous graduate-level reference unifying dynamic programming, stochastic control, and modern reinforcement learning. Twenty-four focused chapters build from measure-theoretic probability and operator theory to cutting-edge algorithms, with complete proofs, sharp assumptions, and tight error bounds.

Mathematical foundations. Borel-space MDPs, stochastic kernels, measurable policies, ergodicity, and trajectory measures; contraction mappings, span seminorms, and fixed-point theory.
Dynamic programming at scale. Value and policy iteration, modified/asynchronous variants, linear programming duality and occupancy measures, performance difference lemmas, finite-time stopping rules.
Long-run criteria. Average cost, the Poisson equation, ACOE, relative value iteration, Blackwell optimality, drift conditions for stability.
Episodic control. Stochastic shortest path models with proper policies, boundary conditions, and convergence without discounting.
Partial observability. Belief-MDPs, piecewise-linear convex value functions, Bayes filters, and stability of belief dynamics.
Continuous time. HJB PDEs, verification theorems, viscosity solutions, and convergent numerical schemes.
Linear systems. LQR and algebraic Riccati equations; LQG, Kalman filtering, and the separation principle.
Nonlinear optimal control. Differential dynamic programming, iLQR, and model predictive control with recursive feasibility and Lyapunov stability.
Exploration fundamentals. Stochastic and contextual bandits, UCB, Thompson sampling, and information-theoretic lower bounds.
Stochastic approximation. Robbins–Monro, the ODE method, two-time-scale analysis, Polyak–Ruppert averaging, and Markovian noise.
Temporal-difference learning. TD(λ), LSTD, GTD-family methods, emphatic weighting, and off-policy stability.
Function approximation theory. Projected Bellman equations, MSPBE, fitted value and Q iteration, Rademacher complexity, concentrability, and Bellman rank.
Control algorithms. Q-learning, SARSA, double Q-learning, finite-time tabular rates, and divergence with approximation.
Policy optimization. The policy gradient theorem, variance reduction, natural gradients, trust-region methods, monotonic improvement guarantees.
Actor–critic and entropy regularization. GAE, PPO, SAC, mirror descent and primal–dual views, two-time-scale convergence.
Model-based RL. System identification, adaptive LQR, OFU and PSRL, regret bounds, Dyna-style planning, and simulation lemmas.
Offline evaluation and control. Importance sampling, doubly robust estimators, FQE/FQI, high-confidence bounds, and pessimism for reliability.
Safety and robustness. Constrained MDPs, Lagrangian methods, CVaR and risk envelopes, control barrier functions, robust and distributionally robust RL.
Multi-agent settings. Zero-sum and general-sum Markov games, Shapley operators, equilibrium computation, and decentralized learning dynamics.
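The value-iteration and contraction-mapping material listed above can be illustrated with a minimal tabular sketch. The 2-state, 2-action MDP below (transition kernel P, rewards R, discount gamma) is an invented toy example, not one taken from the book; it shows the sup-norm stopping rule and the greedy policy extraction that the dynamic-programming chapters analyze.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the book).
P = np.array([                  # P[a, s, s'] = transition probability
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.0, 1.0]],   # action 1
])
R = np.array([[1.0, 0.0],       # R[a, s] = expected one-step reward
              [0.5, 2.0]])
gamma = 0.9                     # discount factor

def bellman_opt(V):
    """Bellman optimality operator: (TV)(s) = max_a [R(a,s) + gamma * E_{s'}[V(s')]].
    T is a gamma-contraction in the sup norm, so iteration converges to V*."""
    return (R + gamma * P @ V).max(axis=0)

# Value iteration with a sup-norm stopping rule.
V = np.zeros(2)
for _ in range(500):
    V_new = bellman_opt(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

greedy_policy = (R + gamma * P @ V).argmax(axis=0)  # greedy policy w.r.t. V*
```

The stopping rule exploits the contraction property: once successive iterates agree to within the tolerance in sup norm, the iterate is within tolerance/(1 - gamma) of the true fixed point V*.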

ISBN13 979-8273616219
Language English
Publisher Independently published
Dimensions 8.5 x 0.86 x 11 inches
Item Weight 2.38 pounds
Print length 380 pages
Part of series Computational Mathematics Library
Publication date November 8, 2025

Correction of product information

If you notice any omissions or errors in the product information on this page, please use the correction request form below.

Correction Request Form
