# Q Learning

Through the Bellman equation, we can formulate a bootstrapping objective for estimating the value function (more specifically, the Q-value function). This objective involves minimizing the temporal difference (TD) error. We discuss strategies for doing so, distinguishing between on-policy and off-policy approaches. Finally, for high-dimensional states, such as image inputs, we show how a neural network can be used to approximate the Q-value function, a method called Deep Q Learning, or DQN (Deep Q-Network).
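As a rough sketch of the bootstrapped objective described above, the tabular (off-policy) Q-learning update moves each Q-value toward a target built from the Bellman equation. The function name and toy state/action sizes below are illustrative, not from the text:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One off-policy TD update: nudge Q[s, a] toward the bootstrapped target.

    The max over next-state actions is what makes this off-policy; an
    on-policy variant (SARSA) would instead use the action actually taken.
    """
    td_target = r + gamma * np.max(Q[s_next])  # bootstrap from current estimates
    td_error = td_target - Q[s, a]             # temporal difference error
    Q[s, a] += alpha * td_error                # gradient-free tabular update
    return td_error

# Toy example: 2 states, 2 actions, Q initialized to zero.
Q = np.zeros((2, 2))
err = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

With all Q-values at zero, the target is just the reward, so the TD error is 1.0 and `Q[0, 1]` moves to 0.1. DQN replaces the table `Q` with a neural network and minimizes the squared TD error by gradient descent.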