Robot Learning Using Neural Networks

The examples we have used so far are useful for understanding how neural networks work, but ultimately we want to create a neural network brain for a robot. That is, how does a robot learn to perform a task? The simplest robot learning tasks are navigational ones, such as learning to move about without bumping into obstacles, following a wall, or detecting a wall.

As we have seen from the earlier examples, to train a neural network the input data is presented at the input layer and, through feedforward propagation, the network produces values at the output layer. In the case of robot learning, typically in navigational behaviors, the inputs to the neural network are the robot's sensor values and the outputs are commands for the robot's motors. Depending on the environment you are using, you may do the learning experiments on real robots or on simulated ones. Additionally, you may do the learning offline or online.
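
For example, using the conx package that is used throughout this module, a single feedforward pass from sensor readings to a motor command might look like the following sketch (the readings are made-up values, and the network here is untrained, so its output is not yet meaningful):

from pyrobot.brain.conx import *

net = Network()
net.addLayers(2, 1, 1)                      # two sensor inputs, one hidden unit, one motor output
net['input'].copyActivations([0.9, 0.7])    # hypothetical scaled sonar readings
net.propagate()                             # feedforward pass
translate = net['output'].activation[0]     # interpret the output as a translate command
print "translate command:", translate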

Offline Learning/Training

In the case of offline learning, you first collect a training data set by running a robot using a program that exhibits the desired behavior. That is, while the robot is actually carrying out the desired task, you record its sensor and motor values. These will then become the input and target output values during training. Since all the data is collected prior to training, we call this offline training. Once the network has been trained, you can deploy the network on the robot to examine and test its behavior. Thus, for offline training, one uses the following steps:

  1. Write a behavior (most likely using direct control) that accomplishes the training task

  2. Extract a data set from the robot carrying out the task

  3. Scale and sample the data to create a training data set

  4. Train a network with the data

  5. Deploy and test the network on the robot

  6. Analyze the results

Online Learning

In the case of online learning, you train the robot in real time, on the fly. That is, you get the robot's sensor readings, feed them to the neural network, propagate to get an output, compare the output with the desired output (i.e., what the correct response should have been), compute the error and perform one step of neural network learning, move the robot, and repeat the process until the total error is within an acceptable range. Once the network has learned, you can turn off learning and then use the network to control the robot. The following steps are performed for online learning:

  1. Write a network control program with a built-in trainer to compute the correct outputs

  2. Repeat the following until the error reduces to an acceptable level:

    1. Get sensor readings

    2. Scale them

    3. Present them to the input

    4. Propagate and compute the network's output

    5. Use the trainer to compute correct output

    6. Compute the error and use it for a learning step

  3. Turn learning off

  4. Deploy and test the network on the robot

  5. Analyze the results

Example: Detecting an obstacle using offline learning

Let us take a very simple navigational task: the robot goes forward and stops when it is about to run into an obstacle. This behavior is easily accomplished using the following direct control program:

# robot goes forward and then slows to a stop when it detects something 

from pyrobot.brain import Brain
from pyrobot.brain.conx import *
from time import *

class NNBrain(Brain):

   # Given the front two sensor readings, decide the next move 
   def determineMove(self, front):
      if front[0] < 0.5 or front[1] < 0.5:   # about to hit soon, STOP 
         print "collision imminent, stopped"
         return(0)
      elif front[0] < 0.8 or front[1] < 0.8:  # detecting something, SLOWDOWN 
         print "object detected"
         return(0.1)
      else:
         print "clear"      # all clear, FORWARD 
         return(0.3)

   def step(self):
      front = [s.distance() for s in self.robot.range["front"]]
      translation = self.determineMove(front)
      print "front sensors", front[0], front[1]
      self.move(translation, 0.0)

def INIT(engine):
   return NNBrain('NNBrain', engine)

In fact, the above program itself can be used to collect data for offline training. But first, we must decide on the architecture of the neural network. We will use two inputs, into which we will feed the front two sensor values (scaled). Let us have 1 unit in the hidden layer and 1 in the output layer. Thus, the training data set will consist of two front sensor values and one target translation value.

Collecting Data

The above program is modified to collect the data set. We will collect 1000 input-output pairs. The input values will be stored in a file called frontsensors.dat and the target output values in a file called translatetargets.dat. Once 1000 values have been collected, the robot stops and the program can be terminated.

# brain to collect data for offline learning
# robot goes forward and then slows to a stop when it detects something

from pyrobot.brain import Brain
from pyrobot.brain.conx import *
from time import *

def saveListToFile(ls, file):
   for i in range(len(ls)):
      file.write(str(ls[i]) + " ")
   file.write("\n")

class CollectDataBrain(Brain):

   def setup(self):
      self.counter = 0
      self.countstopping = 0
      self.datafile1 = open("frontsensors.dat", "w")
      self.datafile2 = open("translatetargets.dat", "w")
      self.maxvalue = self.robot.range.getMaxvalue()
      print "max sensor value is ", self.maxvalue

   def determineMove(self, front):
      if front[0] < 0.5 or front[1] < 0.5:
         print "collision imminent"
         self.countstopping = self.countstopping+1
         return(0)
      elif front[0] < 0.8 or front[1] < 0.8:
         print "object detected"
         return(0.08)
      else:
         print "clear"
         return(0.15)
   def scale(self, val):
      x = val / self.maxvalue
      if x > 1:
         print "scaled > 1"
         return 1
      else:
         return x
   def step(self):
      front = [s.distance() for s in self.robot.range["front"]]
      translation = self.determineMove(front)
      print "front sensors", front[0], front[1]
      if self.counter > 1000:
         self.datafile1.close()
         self.datafile2.close()
         print "done collecting data"
         self.stop()
      elif self.countstopping > 5:
         self.move(-0.3, 0.2)
         sleep(0.5)
         self.countstopping = 0
      else:
         print "move", self.counter, translation
         saveListToFile([self.scale(front[0]), self.scale(front[1])],
                        self.datafile1)
         saveListToFile([translation] , self.datafile2)
         self.move(translation, 0.0)
         self.counter += 1

def INIT(engine):
   return CollectDataBrain('CollectDataBrain', engine)

Training the Network

Next, we will create the network and train it on the data set we just collected. This is shown in the program below:

# Train a network offline
# Inputs: two scaled front sensor readings 
# Outputs: one translate reading (unscaled) 

from pyrobot.brain.conx import *
from pyrobot.system.log import *

def setFromFile(filename, cols = None, delim = ' '):
   fp = open(filename, "r")
   line = fp.readline()
   lineno = 1
   lastLength = None
   data = []
   while line:
      linedata = [float(x) for x in line.strip().split(delim)]
      if cols == None: # get em all 
         newdata = linedata
      else: # just get some cols 
         newdata = []
         for i in cols:
            newdata.append( linedata[i] )
      if lastLength == None or len(newdata) == lastLength:
         data.append( newdata )
      else:
         raise "DataFormatError", "line = %d" % lineno
      lastLength = len(newdata)
      lineno += 1
      line = fp.readline()
   fp.close()
   print "length of data array is", len(data)
   return data

def saveListToFile(ls, file):
   for i in range(len(ls)):
      file.write(str(ls[i]) + " ")
   file.write("\n")

# Create the network 
n = Network()
n.addLayers(2,1,1)
# Set learning parameters 
n.setEpsilon(0.3)
n.setMomentum(0.0)
n.setTolerance(0.05)
# set inputs and targets (from collected data set) 
n.setInputs(setFromFile('frontsensors.dat'))
n.setTargets(setFromFile('translatetargets.dat'))
# Logging 
log = Log(name = 'E05M01.txt')
best = 0
for i in xrange(0,1000,1):
   tssError, totalCorrect, totalCount, totalPCorrect = n.sweep()
   correctpercent = totalCorrect / float(totalCount)
   log.writeln( "Epoch # "+ str(i)+ " TSS ERROR: "+ str(tssError)+
                " Correct: "+ str(totalCorrect)+ " Total Count: "+
                str(totalCount)+ " %correct = "+ str(correctpercent))
   if best < correctpercent:
      n.saveWeightsToFile("E05M01.wts")
      best = correctpercent
print "done"

As you can see from the above, the training procedure here is similar to the earlier examples, except that the inputs and targets come from the data set collected on the robot. Whenever the percentage of correct patterns improves, the weights are saved to the file E05M01.wts (the network here was trained with Epsilon = 0.3 and Momentum = 0.0). Once the training is complete, you can also record and plot the total sum-squared error to see how the learning progresses. Notice that we expect the learned network to produce outputs within the tolerance of 0.05 set above. Learning may take anywhere from 500 to 50,000 epochs depending on the parameter settings, so you may need to adjust the number of epochs in the training loop. You should experiment with these values as before to get a good idea of the robustness of the end results.
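
For instance, a small standalone script along the following lines could pull the epoch number and TSS error out of the log file for plotting (this is a sketch: it assumes each log line contains the "Epoch # ... TSS ERROR: ..." text written by the training loop above, and that any prefix the Log class adds does not itself contain a "#"):

# extract epoch number and TSS error from the training log for plotting
infile = open("E05M01.txt", "r")
outfile = open("tss.dat", "w")
for line in infile:
   tokens = line.split()
   if "ERROR:" in tokens:
      epoch = tokens[tokens.index("#") + 1]
      tss = tokens[tokens.index("ERROR:") + 1]
      outfile.write(epoch + " " + tss + "\n")
infile.close()
outfile.close()

The resulting tss.dat file has one epoch/error pair per line and can be plotted with any plotting program.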

Deploying and Testing

Once the network has learned, it is time to actually use it as a brain to control the robot and observe its behavior. Since we saved the final weights above, we can easily reconstruct the network and use it. This is shown below:

# Load in saved weights from offline training
# Inputs are the two front sensor readings
# Output is a translate value, used to control the robot

from pyrobot.brain import Brain
from pyrobot.brain.conx import *
from time import *

class NNBrain(Brain):
   def setup(self):
      self.n = Network()
      self.n.addLayers(2,1,1)
      self.maxvalue = self.robot.range.getMaxvalue()
      self.doneLearning = 1
      self.n.loadWeightsFromFile("E05M01.wts")
      self.n.setLearning(0)
   def scale(self, val):
      x = val / self.maxvalue
      if x > 1:
         return 1.0
      else:
         return x
   def step(self):
      # Set inputs
      front = [s.distance() for s in self.robot.range["front"]]
      self.n['input'].copyActivations([self.scale(front[0]),
                                       self.scale(front[1])])
      self.n.propagate()
      translateActual = self.n['output'].activation[0]
      print "move", translateActual
      self.move(translateActual, 0.0)

def INIT(engine):
   return NNBrain('NNBrain', engine)

Exercises

  1. Try out the above programs as described. Depending on the robot you choose, you may need to adjust the motor values and scaling.

  2. Once you have all the components, try varying the network parameters (such as epsilon, momentum, tolerance, and the number of hidden units) and observe their effects on the behavior of the robot.

  3. Perform the same steps for training a robot to follow a wall.

Example: Detecting an obstacle using online learning

Next, we will carry out the same training task as above, but using online learning. Instead of collecting a data set first, the brain contains a built-in trainer that computes the correct translation at every step, and the network is trained while the robot moves.

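The brain sketched below is one way to do this (it is a sketch, not the module's own program: the teacher rule is copied from the direct-control program above, while the learning parameters and the weights file name online.wts are illustrative choices). The built-in trainer determineMove computes the correct translation at every step; while learning is on, the teacher's move drives the robot and the network is trained on each sensor/translation pair, and once doneLearning is set to 1 the network's own output takes over.

# Sketch: an online-learning brain for the stop-at-obstacle task.
# The teacher rule is copied from the direct-control program above; the
# learning parameters and the weights file name are illustrative choices.

from pyrobot.brain import Brain
from pyrobot.brain.conx import *
from time import *

class NNBrain(Brain):
   def setup(self):
      self.net = Network()
      self.net.addLayers(2,1,1)     # two scaled front sensors in, one translation out
      self.net.setEpsilon(0.3)
      self.net.setMomentum(0.0)
      self.maxvalue = self.robot.range.getMaxvalue()
      self.counter = 0
      self.doneLearning = 0         # set to 1 (e.g. from the Pyro command line) to stop training

   def scale(self, val):
      return min(val / self.maxvalue, 1.0)

   # the built-in trainer: computes the correct translation for the current readings
   def determineMove(self, front):
      if front[0] < 0.5 or front[1] < 0.5:
         return 0.0                 # about to hit, stop
      elif front[0] < 0.8 or front[1] < 0.8:
         return 0.1                 # something ahead, slow down
      else:
         return 0.3                 # all clear, go forward

   def step(self):
      front = [s.distance() for s in self.robot.range["front"]]
      inputs = [self.scale(front[0]), self.scale(front[1])]
      target = self.determineMove(front)
      if self.doneLearning:
         self.net.setLearning(0)
         self.net['input'].copyActivations(inputs)
         self.net.propagate()
         translation = self.net['output'].activation[0]   # network drives the robot
      else:
         self.net.setLearning(1)
         self.net.step(input=inputs, output=[target])     # one online training step
         translation = target                             # teacher drives the robot while learning
         self.net.saveWeightsToFile("online.wts")         # keep the latest weights
      print "step", self.counter, "target", target, \
            "network", self.net['output'].activation[0]
      self.move(translation, 0.0)
      self.counter += 1

def INIT(engine):
   return NNBrain('NNBrain', engine)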

You can start and stop the program using the Pyro window controls. The weights are saved as learning proceeds. You can either turn learning off (by setting the variable doneLearning to 1) and then use the same program to deploy and test on the robot, or use the deployment program from the offline learning section above, loading the saved weights file instead.

Example: Avoiding Obstacles Using Online Learning

In this example, we will try to train a robot to move around without hitting obstacles. The teacher for this brain will be very similar to the obstacle avoidance method used in PyroModuleDirectControl.

We will create a neural network with an input layer of size three, a hidden layer of size two, and an output layer of size two. The three inputs will be the minimum range values on the front-left, front, and front-right sensors. We will scale these values so that they are in the range [0,1] and we will also reverse them so that a 1 means the robot is contacting an obstacle and a 0 means the robot senses nothing in its way. The two outputs will be the translate and rotate values for the robot's next move. The neural network will produce outputs in the range [0,1], but we would like our move values to be in the range [-0.5, 0.5]. There are two methods provided to do the conversion between robot-based units and network-based units.
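
As a quick sanity check of those conversions, here is a small standalone snippet (not part of the brain) that shows the mapping in both directions:

def toNetworkUnits(val): return val + 0.5    # robot [-0.5, 0.5] -> network [0, 1]
def toRobotUnits(val): return val - 0.5      # network [0, 1] -> robot [-0.5, 0.5]

print toNetworkUnits(-0.5), toNetworkUnits(0.5)   # 0.0 1.0
print toRobotUnits(toNetworkUnits(0.3))           # 0.3 (round trip)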

In the program given below, the setup method creates the neural network, and sets the parameters for learning. It also creates some other useful class variables. The counter keeps track of how many steps have been executed. The flag doneLearning can be used to turn learning on and off.

The method determineTargets is the teacher for this network. It is a very simple obstacle avoidance strategy. When it detects an obstacle in front or to the right, it tells the robot to turn in place to the left. When it detects an obstacle to the left, it tells the robot to turn in place to the right. Otherwise, it tells the robot to go straight. It returns a list containing the appropriate translate and rotate values.

The method step takes care of running the network by doing the following tasks:

  1. Turn learning on or off, depending on the doneLearning flag

  2. Get the minimum front-left, front, and front-right range readings and scale them to form the inputs

  3. Ask the teacher (determineTargets) for the desired translate and rotate values and convert them to network units to form the targets

  4. Perform one training step on this input/target pair

  5. Convert the network's outputs back to robot units

  6. Move the robot: while learning, the teacher's move is executed; once learning is done, the network's own outputs are used

In addition, this method checks to see if the robot is stuck. If so, it does not execute the move, but instead tries to back out of the jam.

from pyrobot.brain import Brain
from pyrobot.brain.conx import *
from time import *

class NNBrain(Brain):
   def setup(self):
      self.net = Network()
      self.net.addLayers(3,2,2)
      self.net.setEpsilon(0.25)
      self.net.setMomentum(.1)
      self.maxvalue = self.robot.range.getMaxvalue()
      self.counter = 0
      self.doneLearning = 0

   # Scale the range readings to be between 0 and 1.  Also make close 
   # obstacles register high readings, and distant obstacles low readings. 
   def scale(self, val):
      return (1 - (val / self.maxvalue))

   # The robot will get translate and rotate values in the range [-0.5,0.5],  
   # but the neural network will generate outputs in the range [0,1].  
   def toNetworkUnits(self, val):
      return (val + 0.5)

   def toRobotUnits(self, val):
      return (val - 0.5)

   def determineTargets(self, left, front, right):
      if front < 1.0:
         print "front"
         return([0.0, 0.5])
      elif left < 1.0:
         print "left"
         return([0.0, -0.5])
      elif right < 1.0:
         print "right"
         return([0.0, 0.5])
      else:
         print "clear"
         return([0.5, 0.0])

   def step(self):
      if self.doneLearning:
          self.net.setLearning(0)
      else:
          self.net.setLearning(1)
      # Set inputs 
      left  = min([s.distance() for s in self.robot.range["front-left"]])
      front = min([s.distance() for s in self.robot.range["front"]])
      right = min([s.distance() for s in self.robot.range["front-right"]])
      inputs = map(self.scale, [left, front, right])
      trnTarget,rotTarget = self.determineTargets(left, front, right)
      targets = [self.toNetworkUnits(trnTarget), self.toNetworkUnits(rotTarget)]
      # Learn 
      self.net.step(input=inputs,output=targets)  # input and output are layer names
      trnActual = self.toRobotUnits(self.net['output'].activation[0])
      rotActual = self.toRobotUnits(self.net['output'].activation[1])
      # Check if the robot is stuck and give it a kick to unjam it 
      if self.robot.stall:
          print "stuck--reversing"
          self.move(-0.5, 0.0)
          sleep(0.5)
      else:
          if self.doneLearning:
              print "move", trnActual, rotActual
              self.move(trnActual, rotActual)
          else:
              print "step", self.counter, "target", trnTarget, rotTarget, \
                    "network", trnActual, rotActual
              self.move(trnTarget, rotTarget)
      self.counter += 1

def INIT(engine):
   return NNBrain('NNBrain', engine)

After training for a while, you might want to see how closely the network's outputs match the teacher; the "network" values printed at every step show exactly this.

A neural network robot controller

Now that you have a neural network accurately predicting what the innate teacher controller will do, we can use its output to actually control the robot. To test this, enter the following line at the Pyro command line:

brain.doneLearning = 1

This turns learning off and causes the motor commands to be taken from the network's output layer instead of from the teacher.

  1. How does the neural network controller compare to the teacher?

  2. How could you test it?

A more complex example

To see an example of a more complex neural network application go to PyroModuleNeuralNetworksAdvanced.

Further Reading

  1. Omidvar, O., & van der Smagt, P. (Eds.). (1997). Neural Systems for Robotics. New York: Academic Press.
