1. Thesis Proposal
Using Reinforcement Learning to Learn Self-Motivation
Shikha Prashad
Thesis Advisor: Dr. Doug Blank
Reinforcement learning is a technique that can be incorporated into algorithms to allow learning independent of supervision. In this model, the agent develops a behavior by learning to earn rewards and avoid punishments. In doing so, the agent determines, on its own, a way to optimize reward. This also allows the agent to adapt to changes in the environment or in the goal. This differs from supervised learning, in which the agent is trained on correct input and output pairs. While the reinforcement signal can come from a variety of sources, Marshall et al. have proposed a model in which the agent is driven by "self-motivation, that is, the system's own internally-generated representations and goals" (Marshall et al., 2004).
While the agent is exploring its environment, it tries to anticipate future states. It also tries to find new states that it has not predicted. However, a system that can predict all future states will have no new states to find, and a system for which every state is new will not be able to make predictions. Marshall et al. suggest that the competition between these two drives leads the agent to a balance from which learning occurs. They propose a model in which "after performing an action and observing the results, the robot’s prediction is compared with the actual outcome, and a representation of the prediction error is created" (Marshall et al., 2004).
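The idea of comparing a prediction with the actual outcome can be illustrated with a minimal sketch. This is not the model of Marshall et al.; it only assumes that predictions and observations are numeric sensor vectors, that the error is their Euclidean distance, and that an internal reward could be taken as the drop in a smoothed error. The names prediction_error, RunningErrorTracker, and smoothing are hypothetical and chosen for illustration only.

```python
import math


def prediction_error(predicted, actual):
    """Euclidean distance between a predicted and an observed sensor vector."""
    if len(predicted) != len(actual):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))


class RunningErrorTracker:
    """Keeps a smoothed prediction error (hypothetical helper, not from the paper)."""

    def __init__(self, smoothing=0.5):
        # Weight given to the previous average; an assumed parameter.
        self.smoothing = smoothing
        self.average_error = None

    def update(self, error):
        """Record a new error and return the drop in the smoothed error.

        A positive return value means predictions are improving, which this
        sketch treats as an internal reward. That choice is an assumption.
        """
        previous = self.average_error
        if previous is None:
            self.average_error = error
            return 0.0
        self.average_error = (
            self.smoothing * previous + (1 - self.smoothing) * error
        )
        return previous - self.average_error
```

In such a sketch, the agent would call prediction_error after each action and pass the result to update. A reward near zero would indicate either that predictions are already good or that they are not improving, which loosely mirrors the balance between predictability and novelty described above.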
In their experiment, there are two robots. One is the developing robot, which sits in the center of the environment. It cannot move, but it can rotate. The other is the target robot, which moves around the environment while avoiding obstacles. The aim is for the developing robot to "learn to track given only an internal reinforcement signal based on the error of its own predictions" (Marshall et al., 2004). For comparison, an external reinforcement signal was also used. The data showed that the developing robot was able to track the target robot, although it took more time using internal reinforcement signals.
I propose to do experiments based on this model but in more complex environments. I would like to investigate how the developing robot's behavior changes (if at all) when another target robot is added to the environment. It is mentioned in the paper that "as the predictive system becomes better at anticipating the consequences of the control system's actions, novelty decreases" (Marshall et al., 2004). Thus, the developing robot gets bored of tracking the target robot once it can do so well. What would happen if another moving robot were introduced at this point? The addition of the second robot creates a more complicated environment, because the two moving robots now interact with each other as well, making the original target robot's behavior less predictable. The developing robot is thus presented with a novel scenario. It could keep focusing on the original target robot, shift its focus to the new target robot, or try to keep track of both. However, would it even know that there are two robots present?
A series of experiments, similar to those outlined in the paper, needs to be done to answer these questions. The architecture and algorithm of the model also need to be analyzed to determine whether the developing robot is able to differentiate between the two target robots. If it cannot, perhaps the algorithm can be enhanced to add that capability. My final result will be a paper outlining the experiments I conduct, the data collected and its analysis, and any conclusions that can be drawn.
