Masking in Deep Reinforcement Learning

Introduction

When I started with deep reinforcement learning, I worked on an environment where some actions are not available at every timestep \(t\). To illustrate the concept of an impossible or unavailable action concretely, suppose you want to develop an agent that plays Mario Kart, and assume the agent's inventory is empty (no banana 🍌 or anything else). In that state, the agent cannot execute the action “use the object in the inventory”. Restricting the agent to the actions that are meaningful in the current state keeps it from wasting exploration on actions it cannot execute, which helps it learn a better policy.
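One common way to restrict the agent's choice is to mask the policy's output before sampling: the scores (logits) of unavailable actions are replaced with \(-\infty\), so the softmax assigns them probability exactly zero. The sketch below illustrates this idea in plain Python under stated assumptions. The action list, `action_mask`, and `masked_softmax` are hypothetical names chosen for this illustration and are not taken from any specific library or from a particular implementation. In a real deep RL setup, the same operation would be applied to the policy network's output tensor.

```python
import math

# Hypothetical discrete action set for a Mario Kart-style agent (illustrative only).
ACTIONS = ["accelerate", "brake", "steer_left", "steer_right", "use_item"]


def action_mask(inventory_empty: bool) -> list[bool]:
    """Return True for every action that is available in the current state."""
    # Assumption for this sketch: only "use_item" depends on the state,
    # and it is available only when the inventory holds an object.
    return [True, True, True, True, not inventory_empty]


def masked_softmax(logits: list[float], mask: list[bool]) -> list[float]:
    """Softmax over the logits where unavailable actions get probability 0."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be available")
    # Replace the logits of unavailable actions with -inf so that exp() maps them to 0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    # Subtract the largest valid logit for numerical stability.
    m = max(x for x, ok in zip(logits, mask) if ok)
    exps = [math.exp(x - m) for x in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Raw scores in which "use_item" has the highest logit.
    logits = [1.0, 0.5, 0.2, 0.2, 3.0]
    probs = masked_softmax(logits, action_mask(inventory_empty=True))
    for name, p in zip(ACTIONS, probs):
        print(f"{name:12s} {p:.3f}")
```

With an empty inventory, "use_item" receives probability 0 even though its raw logit is the largest, so neither sampling nor greedy selection can pick it. The remaining probability mass is redistributed over the actions the agent can actually execute.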
