Date of Graduation
Master of Science in Computer Science (MS)
Computer Science & Computer Engineering
Second Committee Member
Deep Learning, Exploration Strategy, Machine Learning, Reinforcement Learning
We propose a simple and efficient modification to the Asynchronous Advantage Actor Critic (A3C)
algorithm that improves training. In 2016 Google’s DeepMind set a new standard for state-of-the-art
reinforcement learning performance with the introduction of the A3C algorithm. The goal of
this research is to show that A3C can be improved by a novel exploration strategy we
call “Follow then Forage Exploration” (FFE). FFE forces the agent to follow the best known path
at the beginning of a training episode; later in the episode, the agent is forced to “forage”
and explore randomly. In tests against A3C implemented with OpenAI’s Universe-Starter-Agent,
FFE reached the maximum score faster on average.
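The follow-then-forage idea described above can be sketched as a per-step action rule: exploit the best known action early in the episode, then switch to random exploration. This is a minimal illustrative sketch, not the thesis implementation; the function name `ffe_action` and the switch-point parameter `follow_steps` are assumptions introduced here.

```python
import random


def ffe_action(q_values, step, follow_steps):
    """Hypothetical sketch of Follow then Forage Exploration (FFE).

    Early in the episode (step < follow_steps) the agent "follows"
    the best known action; afterward it "forages" by choosing
    uniformly at random. `follow_steps` is an assumed parameter.
    """
    if step < follow_steps:
        # Follow phase: exploit the action with the highest value estimate.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Forage phase: explore uniformly at random over all actions.
    return random.randrange(len(q_values))
```

In a full A3C setting the "best known path" would come from the learned policy rather than a static value table, but the episode-position-dependent switch between exploitation and exploration is the core of the strategy.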
Holliday, James B., "Improving Asynchronous Advantage Actor Critic with a More Intelligent Exploration Strategy" (2018). Theses and Dissertations. 2689.