RL:
Value iteration:
1. Bertsekas, D. P., & Tsitsiklis, J. N. (1989). Parallel and Distributed Computation: Numerical Methods. Prentice Hall. Republished by Athena Scientific in 1997.
2. Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1), 103-130.
3. Peng, J., & Williams, R. J. (1993). Efficient learning and planning within the Dyna framework. In Proceedings of the Second International Conference on Simulation of Adaptive Behavior, pp. 281-290.