Policy Iteration
A general framework for finding a good policy for an MDP starts with the value function of the current policy: the value associated with each state (or each state-action pair) under that policy.
This value is our estimate of the discounted return we would obtain by starting in a given state and following the policy forever after. Policy iteration alternates between two steps: estimating the value function of the current policy, and improving the policy based on that estimate.
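The alternation between estimating values and improving the policy can be sketched in a few lines of Python. This is a minimal illustration, not a Duckietown implementation: the four-cell corridor MDP, the reward of 1 for reaching the goal, the discount factor of 0.9, and all names (`step`, `evaluate_policy`, `improve_policy`, `policy_iteration`) are assumptions made for this example.

```python
# Hypothetical toy MDP: a 1-D corridor with 4 cells; the last cell is a terminal goal.
N_STATES = 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # assumed discount factor
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic transition model: returns (next_state, reward)."""
    if state == GOAL:
        return state, 0.0  # terminal state is absorbing and gives no further reward
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def evaluate_policy(policy, tol=1e-8):
    """Estimate V(s) for a fixed policy by repeated in-place sweeps."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            next_state, reward = step(s, policy[s])
            new_value = reward + GAMMA * values[next_state]
            delta = max(delta, abs(new_value - values[s]))
            values[s] = new_value
        if delta < tol:
            return values


def improve_policy(values):
    """Return the policy that is greedy with respect to the value estimates."""
    def action_value(s, a):
        next_state, reward = step(s, a)
        return reward + GAMMA * values[next_state]

    return [max(ACTIONS, key=lambda a: action_value(s, a)) for s in range(N_STATES)]


def policy_iteration():
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * N_STATES  # arbitrary starting policy: always move left
    while True:
        values = evaluate_policy(policy)
        new_policy = improve_policy(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


if __name__ == "__main__":
    final_policy, final_values = policy_iteration()
    print("policy:", final_policy)
    print("values:", final_values)
```

In this toy setting the loop stops once a greedy improvement step leaves the policy unchanged, at which point every non-terminal cell should prefer moving right, toward the goal. A real task would replace `step` with a learned or simulated model and would typically have stochastic transitions.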