
Autoassociative and Recurrent Networks


Auto-association Networks

Sometimes it is useful to build a network that is trained to reproduce its input as its output. Typically this type of network is given a hidden layer that is smaller than the input layer. This forces the network to represent the input patterns in fewer dimensions, creating a compressed representation. These compressed representations often reveal interesting generalizations about the data. Below is an example of an auto-associative network that tries to reproduce every nonzero three-bit binary pattern. Here we use the method addLayers() to create input, hidden, and output layers of the given sizes and connect them up, rather than creating and connecting each layer separately as we did in the previous example. Since the inputs and outputs are identical in this kind of network, we can specify this by calling the method associate() on the appropriate layers.

from pyrobot.brain.conx import *

n = Network()
n.addLayers(3,2,3)            # input of 3, hidden of 2, output of 3
n.setInputs([[1,0,0],[0,1,0],[0,0,1],[1,1,0],[1,0,1],[0,1,1],[1,1,1]])
n.associate('input','output') # targets are the same as the inputs
n.setReportRate(25)           # report progress every 25 epochs
n.setEpsilon(0.5)             # learning rate
n.setMomentum(0.7)
n.setTolerance(0.2)           # an output within 0.2 of its target counts as correct
n.setResetEpoch(500)          # maximum epochs per training trial
n.setResetLimit(2)            # maximum number of restarts
n.train()
n.setLearning(0)              # turn learning off for testing
n.setInteractive(1)           # step through the patterns, showing activations
n.sweep()                     # one pass through all of the patterns

Above we specified the maximum number of epochs to run in one learning trial, using the method setResetEpoch. If the network has not found a solution within that many epochs (500 in this case), it will re-initialize the weights and begin again. You can specify the number of restarts allowed using the method setResetLimit.
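In outline, the behavior of setResetEpoch and setResetLimit is something like the following plain-Python sketch. This is only an illustration of the logic described above, not the actual conx implementation; the helper names are hypothetical.

# Illustration of the restart logic described above (hypothetical helpers, not conx code).
def train_with_restarts(run_one_epoch, all_correct, reinitialize,
                        reset_epoch=500, reset_limit=2):
    restarts = 0
    while True:
        for epoch in range(reset_epoch):
            run_one_epoch()              # one pass of weight updates
            if all_correct():            # every output within tolerance of its target
                return True
        if restarts >= reset_limit:
            return False                 # give up after the allowed restarts
        reinitialize()                   # re-randomize the weights and start over
        restarts += 1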

Recurrent Networks

Certain problems require a memory of the recent past to be solved, but feed-forward networks are purely reactive and cannot succeed at such problems. Unlike feed-forward networks, recurrent networks allow backward connections that can be used to build up a memory of previous inputs. One popular style of recurrent network was developed by Elman and is called an Elman network or an SRN (Simple Recurrent Network). In an SRN, after forward propagating through the network, the hidden layer activations are copied to a context layer. The context layer, which is always the same size as the hidden layer, has weighted connections back to the hidden layer, so on the next step the hidden layer sees both the current input and a trace of its own previous state. Next we will demonstrate how to create an Elman-style recurrent network for dealing with a time-dependent problem.
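To make the copy-back idea concrete, here is a minimal numpy sketch of a single Elman-style forward step. This is an illustration only; conx handles all of this bookkeeping for you, and the weight names, the 0.5 resting value for the context, and the omission of biases are simplifying assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sizes match the example below: 3 inputs, 2 hidden/context units, 3 outputs.
n_in, n_hid, n_out = 3, 2, 3
rng = np.random.default_rng(0)
W_ih = rng.uniform(-0.5, 0.5, (n_hid, n_in))    # input  -> hidden
W_ch = rng.uniform(-0.5, 0.5, (n_hid, n_hid))   # context -> hidden
W_ho = rng.uniform(-0.5, 0.5, (n_out, n_hid))   # hidden -> output

context = np.full(n_hid, 0.5)                   # assumed resting value for the context

def step(x, context):
    hidden = sigmoid(W_ih @ x + W_ch @ context) # hidden sees input and previous context
    output = sigmoid(W_ho @ hidden)
    return output, hidden.copy()                # hidden activations become the new context

x = np.array([1.0, 0.0, 0.0])                   # the letter "A"
output, context = step(x, context)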

Suppose we have two sequences of symbols that we would like the network to remember, such as A, B, C and A, C, B. We will randomly choose a sequence, then show the network the sequence, one item at a time, and ask it to predict the next item. We will do this repeatedly without any breaks as shown here:

Input:  A, B, C, A, C, B, A, B, C, A, B, C ...
Output: B, C, A, C, B, A, B, C, A, B, C, A ...

Notice that certain positions in the stream of inputs are predictable, but other positions are not. For example, after an A and a B, a C must follow. Similarly, after an A and a C, a B must follow. Also, at the end of either sequence an A must follow. But it is impossible to predict what will come after an A; it could be either a B or a C with equal probability. We'll need a way to encode the letters for the neural network. An "A" will be represented as the bit string 1 0 0, a "B" as 0 1 0, and a "C" as 0 0 1.
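To see what this prediction task looks like as training data, here is a small sketch that builds such a stream and pairs each input with the next symbol as its target. This is only an illustration of the data; conx constructs the stream itself from setPatterns and setInputs, and the helper below is hypothetical.

import random

patterns  = {"A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}
sequences = [["A", "B", "C"], ["A", "C", "B"]]

def make_stream(n_sequences=4):
    # Concatenate randomly chosen sequences into one continuous stream.
    stream = []
    for _ in range(n_sequences):
        stream.extend(random.choice(sequences))
    inputs  = [patterns[s] for s in stream[:-1]]
    targets = [patterns[s] for s in stream[1:]]   # target is simply the next symbol
    return inputs, targets

inputs, targets = make_stream()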

In the code below, we create a simple recurrent network. The call to addSRNLayers() creates the input, hidden, context, and output layers; the input and output layers are both of length 3 in order to hold our bit string encoding, and the hidden and context layers are of length 2. Then we define the patterns we will be using, followed by the two sequences built from those patterns. The network will automatically randomize the order in which the two sequences are presented. Since the output of the network should equal the next input, we specify this by calling the method predict() on the appropriate layers. Because we want the patterns to be treated as one continuous sequence, we should not initialize the context layer between sub-sequences.

from pyrobot.brain.conx import *

n = SRN()
n.addSRNLayers(3,2,3)         # input of 3, hidden/context of 2, output of 3
n.setPatterns({"A":[1, 0, 0], # symbols we will use during training
               "B":[0, 1, 0],
               "C":[0, 0, 1]})
n.setInputs([['A','B','C'],   # sequences will be presented in random order
             ['A','C','B']])
n.predict('input','output')   # task is to predict next input
n.setReportRate(100)
n.setEpsilon(0.1)
n.setMomentum(0)
n.setTolerance(0.2)
n.setStopPercent(0.75)        # not all inputs are predictable
n.setResetEpoch(8000)
n.setResetLimit(0)
n.setSequenceType("random-segmented")
n.train()

n.setLearning(0)
n.setInteractive(1)
n.sweep()

After training is complete, look carefully at the interactive results. Notice that the network has learned to correctly predict the first and last positions of every sub-sequence. For the unpredictable middle position in each sub-sequence, the network has learned that either a B or a C will appear, but never an A. This is evident because the activations of the second and third positions of the output are around 0.5, while the activation of the first position of the output is close to 0.0.
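Here is a small sketch of how a tolerance of 0.2 judges one such output, assuming a pattern counts as correct only when every output unit is within the tolerance of its target. The activation values are hypothetical, but typical of a trained network after it has just seen an A.

# Hypothetical activations after an "A"; the actual next symbol was a "B".
output = [0.05, 0.52, 0.48]
target = [0, 1, 0]
within_tolerance = all(abs(o - t) < 0.2 for o, t in zip(output, target))
print(within_tolerance)   # False: the middle position cannot be predicted

Because such unpredictable positions can never come within tolerance, the stopping criterion is set below 100% with setStopPercent(0.75) rather than requiring every output to be correct.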

The above program will run until the network's performance on the two sequences is greater than 75%. Because the weights are changing during the epoch, the program can break out of the training loop and yet, if tested on the sequences again, not perform quite the same. To make sure that the network really does get at least 75% of the outputs correct, we can use the cross validation mechanism to verify it. Usually, cross validation is used to measure the performance of a system on a subset of the data held out for testing. We will, however, use the same set for testing and training in this example. You need only specify the inputs and targets for the input and output layers, respectively; since this is a prediction task, the targets are simply the next inputs, so only the input sequences need to be given, as in the following example:

from pyrobot.brain.conx import *

n = SRN()
n.addSRNLayers(3,2,3)         # input of 3, hidden/context of 2, output of 3
n.setPatterns({"A":[1, 0, 0], # symbols we will use during training
               "B":[0, 1, 0],
               "C":[0, 0, 1]})
n.setInputs([['A','B','C'],   # sequences will be presented in random order
             ['A','C','B']])
n.crossValidationCorpus = ({"input" : ['A','B','C','A','C','B']},  # test on the same
                           {"input" : ['A','C','B','A','B','C']} ) # sequences used to train
n.predict('input','output')   # task is to predict next input
n.setReportRate(100)
n.setEpsilon(0.1)
n.setMomentum(0)
n.setTolerance(0.2)
n.setStopPercent(0.75)        # not all inputs are predictable
n.setResetEpoch(8000)
n.setResetLimit(0)
n.setSequenceType("random-segmented")
n.train()

n.setLearning(0)
n.setInteractive(1)
n.sweep()

If you run the above program, you will likely notice that the final percent correct on the cross validation set (CV) is less than 75%. However, if you add this line:

n.setUseCrossValidationToStop(1)

then the program will use the cross validation percentage as the stopping criterion (but it only checks every reportRate epochs). See Conx Implementation Details for more information.

Combining a Recurrent Network with an Autoassociative Network

The Recursive Auto Associative Memory (or RAAM) was developed by Jordan Pollack. See if you can figure out what this network does: PyroRAAMExample.

Further Reading

  1. Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179-211.

  2. Pollack, J. (1990). Recursive distributed representations. Artificial Intelligence, 46, 77-105.
