
Incremental Neural Networks


This page has been deprecated. Please see the new Cascade Correlation Network page.

1. Incremental Neural Networks

You will need version 1.161, or later, of conx.py to run these experiments.

All of the neural networks we have looked at in this module have had static structures. That is, none of the models change their layers or connections during training; they only adjust weights (one could claim that setting a weight to zero effectively disconnects the incoming node from the network). We will now look at some models that do dynamically alter their architectures.

1.1. IncrementalNetwork class

This network class begins with no hidden layers and incrementally adds them as it needs them. The class has a special "candidate" layer of units from which it draws new nodes to serve as hidden layers (or as part of a hidden layer). When a candidate unit is recruited into the main network, its input weights are frozen. The idea is that each candidate node represents some feature detector, which is then prevented from ever changing afterwards; this saves time, since those weights never need to be retrained. (This is similar to the RAVQ and SOM models examined in PyroModuleRAVQ, GovernorForNeuralNetworks, and PyroModuleSelfOrganizingMap.)
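The freeze-on-recruit idea, in a rough conceptual sketch (plain Python, not the conx internals; all names here are illustrative):

import random

n_inputs = 2
n_candidates = 8

# One incoming weight vector per candidate unit; recruited[i] == True
# means unit i has been added to the network and its weights are frozen.
candidate_weights = [[random.uniform(-1, 1) for j in range(n_inputs)]
                     for i in range(n_candidates)]
recruited = [False] * n_candidates

def update_candidates(gradients, lr=0.1):
    # Only candidates that have not yet been recruited keep adapting.
    for i in range(n_candidates):
        if not recruited[i]:
            for j in range(n_inputs):
                candidate_weights[i][j] -= lr * gradients[i][j]

def recruit(i):
    # Freezing the unit preserves the feature detector it has learned,
    # so its incoming weights never need to be retrained.
    recruited[i] = True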

New hidden units can be added to the network in one of two ways: cascade or parallel. In the cascade style, each recruited hidden unit feeds its output into all later hidden units; in the parallel style, all hidden units sit on a single level.
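The style is chosen when the network is constructed. A minimal sketch using the same calls as the example below:

from pyrobot.brain.conx import *

# Cascade: each newly recruited hidden unit also receives input from
# the previously recruited hidden units.
cascadeNet = IncrementalNetwork("cascade")
cascadeNet.addLayers(2, 1)

# Parallel: all recruited hidden units sit on a single level between
# the input and output layers.
parallelNet = IncrementalNetwork("parallel")
parallelNet.addLayers(2, 1)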

Candidate nodes are trained using standard backpropagation of error. This is not cascade correlation, in which candidates are instead trained to maximize the correlation between their activations and the network's residual error, but the two approaches have many similarities. In this network, the candidate nodes are trained on the error signal, but they do not actually contribute to the activation of the output layer (or anywhere else) until they are recruited.
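For comparison, cascade correlation scores each candidate by the covariance between its activation and the network's remaining output error, rather than by backpropagated error. A minimal sketch of that score for a single output unit (illustrative only; this is not part of conx):

def correlation_score(activations, errors):
    # Cascade-correlation candidate score: the magnitude of the
    # covariance, over all training patterns, between a candidate's
    # activation and the residual error at one output unit.
    n = float(len(activations))
    mean_act = sum(activations) / n
    mean_err = sum(errors) / n
    return abs(sum((a - mean_act) * (e - mean_err)
                   for a, e in zip(activations, errors)))

# Example: a candidate whose activation tracks the error gets a high score.
print correlation_score([0.1, 0.9, 0.9, 0.1], [0.0, 0.5, 0.5, 0.0])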

Let's take a look at an example of an incrementally-growing neural network:

from pyrobot.brain.conx import *
net = IncrementalNetwork("cascade") # "parallel" or "cascade"
net.addLayers(2, 1) # sizes
net.addCandidateLayer(8) # size

We will build an incremental network to learn the XOR problem. The net.addLayers() call adds layers automatically, given their sizes; here it creates a 2-input, 1-output network with no hidden layer. The constructor argument tells the network to add new hidden layers in a cascading (rather than parallel) style. Finally, we create a candidate layer of 8 units; these units are trained by backpropagation of error, but do not contribute to the output layer's activations.

Now, we define the XOR problem as before, and set a low tolerance:

net.setInputs( [[0, 0], [0, 1], [1, 0], [1, 1]])
net.setTargets([[0], [1], [1], [0]])
net.tolerance = .25

Finally, we train the network:

net.reportRate = 100
cont = 0
while True:
    net.train(750, cont = cont)  # train for (at most) another 750 epochs
    if not net.complete:         # patterns not yet learned to tolerance
        net.recruitBest()        # recruit the best candidate into the network
        cont = 1                 # later calls continue training rather than start over
    else:
        break

First, we train for 750 epochs on the two-layer network. Recall that a two-layer network cannot learn a problem that is not linearly separable, such as XOR. However, the candidate nodes are being trained at the same time.

If the network has not yet learned the patterns, we recruit the best node from the candidate layer and continue training for another 750 epochs, repeating until training is complete.

When the network has finished learning, we can examine the final network and check that it has indeed learned the problem:

net["candidate"].active = 0 # make sure it is not effecting outputs
net.displayConnections()
net.interactive = 1
net.sweep()

Here is the whole program:

from pyrobot.brain.conx import *

net = IncrementalNetwork("cascade") # "parallel" or "cascade"
net.addLayers(2, 1) 
net.addCandidateLayer(8) 
net.setInputs( [[0, 0], [0, 1], [1, 0], [1, 1]])
net.setTargets([[0], [1], [1], [0]])
net.tolerance = .25
net.reportRate = 100
cont = 0
while True:
    net.train(750, cont = cont)
    if not net.complete:
        net.recruitBest()
        cont = 1
    else:
        break
net["candidate"].active = 0
net.displayConnections()
net.interactive = 1
net.sweep()

This should learn within 500 to 1000 epochs. We know that a non-batch learning network can learn XOR in fewer than 100 epochs. Why does IncrementalNetwork take longer? What good is IncrementalNetwork?

1.2. Variations

Parallel or cascade?
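
One way to start exploring this question is to run the same experiment with both styles and compare how many candidate units each has to recruit (a sketch reusing only the calls from the program above; results will vary from run to run):

from pyrobot.brain.conx import *

for style in ["cascade", "parallel"]:
    net = IncrementalNetwork(style)
    net.addLayers(2, 1)
    net.addCandidateLayer(8)
    net.setInputs( [[0, 0], [0, 1], [1, 0], [1, 1]])
    net.setTargets([[0], [1], [1], [0]])
    net.tolerance = .25
    net.reportRate = 100
    cont = 0
    recruits = 0                 # count how many hiddens get recruited
    while True:
        net.train(750, cont = cont)
        if not net.complete:
            net.recruitBest()
            recruits += 1
            cont = 1
        else:
            break
    print style, "style recruited", recruits, "hidden unit(s)"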

What does it mean to be the best candidate?

Cascade correlation?