Markov Decision Processes#

The mathematical formalism that we will use to develop in particular reinforcement learning algorithms is the Markov Decision Process, or MDP.

The MDP comprises a state representation (that satisfies the Markov property), an action space, a transition function, a reward function, and a discount factor. In the case that we are estimating the state through sensor data, this becomes a partially observable MDP, or POMDP. In this context, our objective is to find a good policy.