Date of Graduation

5-2018

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Gashler, Michael S.

Committee Member

Beavers, M. Gordon

Second Committee Member

Wu, Xintao

Keywords

Deep Learning; Exploration Strategy; Machine Learning; Reinforcement Learning

Abstract

We propose a simple and efficient modification to the Asynchronous Advantage Actor Critic (A3C) algorithm that improves training. In 2016, Google’s DeepMind set a new standard for state-of-the-art reinforcement learning performance with the introduction of the A3C algorithm. The goal of this research is to show that A3C can be improved by a novel exploration strategy we call “Follow then Forage Exploration” (FFE). FFE forces the agent to follow the best known path at the beginning of a training episode; later in the episode, the agent is forced to “forage” by exploring randomly. In tests against A3C implemented using OpenAI’s Universe-Starter-Agent, FFE reached the maximum score faster on average.
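The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `ffe_action`, the fixed switch point `follow_steps`, the representation of the best known path as a list of discrete action indices, and the uniform random choice during foraging are all assumptions made for illustration, and the abstract does not say how the switch point is chosen or how the path is stored.

```python
import random
from typing import Optional, Sequence


def ffe_action(
    step: int,
    best_path: Sequence[int],
    follow_steps: int,
    num_actions: int,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action for one time step under a hypothetical FFE schedule.

    step:         index of the current step within the episode (0-based)
    best_path:    actions of the best episode seen so far (assumed format)
    follow_steps: number of initial steps spent following (assumed parameter)
    num_actions:  size of the discrete action space
    """
    if rng is None:
        rng = random.Random()
    # Follow phase: replay the best known path while it still has actions left.
    if step < follow_steps and step < len(best_path):
        return best_path[step]
    # Forage phase: explore by sampling an action uniformly at random.
    return rng.randrange(num_actions)


if __name__ == "__main__":
    rng = random.Random(0)
    best_path = [2, 1, 0, 3]
    # The first three actions replay best_path; the rest are random.
    episode = [
        ffe_action(t, best_path, follow_steps=3, num_actions=4, rng=rng)
        for t in range(6)
    ]
    print(episode)
```

In an actor-critic setting the forage phase might instead sample from the learned policy with added noise; uniform sampling is used here only because it is the simplest reading of "explores randomly".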
