1. Conx Implementation Details: Quickprop
Quickprop is a slight variation of the standard backpropagation of error algorithm. Quickprop and a set of additional speed-up tricks for backprop were explored by Scott Fahlman in the paper
Faster-Learning Variations on Back-Propagation: An Empirical Study. This page describes the changes made to conx, the backprop simulator of the Pyrobot project.
In conx, you can trigger this set of enhancements with a single command:
net = Network()
net.quickprop = 1
These enhancements are broken down into the following 5 changes.
1.1. Sigmoid Prime Offset
This change has actually always been in conx. Fahlman noticed that some weights get stuck with normal backprop. He says:
This problem is due to the "flat spots" where the derivative of the sigmoid function approaches zero. In the standard back-propagation algorithm, as we back-propagate the error through the network, we multiply the error seen by each unit j by the derivative of the sigmoid function at o[j], the current output of unit j. This derivative is equal to o[j] * (1 - o[j]); I call this the sigmoid-prime function. Note that the value of the sigmoid-prime function goes to zero as the unit's output approaches 0.0 or to 1.0.
Fahlman's solution to this is simple: add a small value (0.1) to the sigmoid-prime result. In conx, that looks like:
def ACTPRIME(activation):
    return (activation * (1.0 - activation)) + sigmoid_prime_offset
Most of the time this doesn't interfere with backprop and only helps to increase the rate of learning. However, in at least one case it was known to cause some oscillation. You can set this value with:
net.sigmoid_prime_offset = 0.0 # removes the effect
net.sigmoid_prime_offset = 0.1 # the default
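To see why the offset helps, here is a minimal standalone sketch (not conx code) that compares the sigmoid-prime value with and without the offset as a unit's output saturates. The function name sigmoid_prime and its offset argument are illustrative assumptions.

```python
def sigmoid_prime(activation, offset=0.0):
    """Sigmoid derivative expressed in terms of the unit's output, plus an optional offset."""
    return activation * (1.0 - activation) + offset


# As the output approaches 1.0 the plain derivative shrinks toward zero,
# while the offset version never drops below the offset itself.
for a in (0.5, 0.9, 0.99, 0.999):
    print(a, sigmoid_prime(a), sigmoid_prime(a, offset=0.1))
```

With the offset, a saturated unit still passes back a usable error signal instead of one that is nearly zero.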
1.2. Error Function
Prior to conx version 1.187, the error at a unit was the straightforward difference between target and activation. Fahlman suggests using a function that exaggerates the difference nonlinearly as the error grows. One such function is the hyperbolic arctangent:
def errorFunction(self, target, activation):
    if self.hyperbolicError:
        return Numeric.arctanh(target - activation)
    else:
        return target - activation
You can change this by simply setting hyperbolicError to 0 (will give you regular difference error) or by replacing the error function with one of your own:
net.hyperbolicError = 0
# OR:
net.errorFunction = lambda t, a: t - a # linear difference
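The following standalone sketch illustrates how the hyperbolic arctangent exaggerates large errors while leaving small ones nearly unchanged. It uses Python's math.atanh rather than Numeric. The function name hyperbolic_error and the clamping limit are assumptions added here, because atanh is undefined at exactly -1 and +1; this is not necessarily how conx handles that case.

```python
import math


def hyperbolic_error(target, activation, limit=0.9999999):
    """atanh of the raw difference, clamped so the result stays finite (assumed limit)."""
    diff = target - activation
    diff = max(-limit, min(limit, diff))  # keep diff strictly inside (-1, 1)
    return math.atanh(diff)


# Small differences are almost unchanged; large ones are stretched.
for activation in (0.9, 0.5, 0.1, 0.0):
    linear = 1.0 - activation
    print(linear, hyperbolic_error(1.0, activation))
```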
1.3. Symmetric Offset Representations
Fahlman reports that people have had success in speeding up backprop by using so-called "symmetric representations" for inputs and targets. These representations go from -0.5 to 0.5 rather than 0.0 to 1.0.
You can turn this on and off with:
net.symmetricOffset = 0.5 # turns on symmetric reps
net.symmetricOffset = 0.0 # turns off symmetric reps
There is also a net.autoSymmetric flag that automatically converts inputs and targets in the .setInputs and .setTargets methods. You need to set .symmetricOffset and .autoSymmetric before calling these methods. The conversion in .setInputs and .setTargets is currently very simple and does not handle patterns: it does a one-time replacement of each value with (value - symmetricOffset).
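The one-time replacement described above amounts to subtracting the offset from every value. A minimal sketch, using a hypothetical helper to_symmetric that is not part of conx:

```python
def to_symmetric(patterns, symmetric_offset=0.5):
    """Shift every value in a list of patterns by the symmetric offset."""
    return [[value - symmetric_offset for value in pattern] for pattern in patterns]


inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(to_symmetric(inputs))
# prints [[-0.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]]
```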
1.4. Split Epsilon
Fahlman noted that the effect of the learning parameter epsilon varies with the number of weights coming into a unit. To counter this, he suggests dividing epsilon by the unit's fan-in.
You can change this in conx with:
net.splitEpsilon = 1 # turns on epsilon divided by fan-in
net.splitEpsilon = 0 # uses epsilon directly as the learning rate
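As a rough illustration of the idea (not conx's implementation), the effective learning rate for a unit could be computed as below. The helper name effective_epsilon is hypothetical, and whether the bias counts toward fan-in is left open here.

```python
def effective_epsilon(epsilon, fan_in, split_epsilon=True):
    """Learning rate applied to the weights coming into one unit."""
    if split_epsilon and fan_in > 0:
        return epsilon / fan_in
    return epsilon


# A unit with 2 incoming weights versus one with 20, both with epsilon = 4.0
print(effective_epsilon(4.0, 2))
print(effective_epsilon(4.0, 20))
print(effective_epsilon(4.0, 20, split_epsilon=False))
```

Units with many incoming connections take proportionally smaller steps per weight, so one epsilon value behaves more consistently across layers of different sizes.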
1.5. Quickprop Weight Update Procedure
This is the most radical departure from backprop, but is quite simple.
In regular backprop, the change in a particular weight is:
dweight[j][i] = epsilon * wed[j][i] + momentum * dweight[j][i]
where wed[j][i] is the first derivative (slope) of the error with respect to the weight. Fahlman approximates the second derivative from the current and previous slopes, fits a parabola, and jumps directly to its minimum:
dweight[j][i] = dweight[j][i] * wed[j][i] / (wedLast[j][i] - wed[j][i])
It is a little more complicated in that it needs to take care of a couple of special cases. Two additional parameters are introduced here. The first is called net.mu, and sets the maximum growth factor. Also, a net.decay value is used to keep weights from growing too large.
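The sketch below shows one way the update and its special cases could fit together for a single weight. It is a simplified illustration based on the description above and on Fahlman's paper, not the conx source: the function name quickprop_step, the default values, and the exact handling of the special cases are assumptions, and the decay term is omitted. It uses the same sign convention as the backprop formula above, so a positive wed means the weight should increase.

```python
def quickprop_step(dweight, wed, wed_last, epsilon=0.5, mu=1.75):
    """Next change for one weight, given the previous change and slopes."""
    if dweight == 0.0:
        # No previous step to extrapolate from: fall back to a gradient step.
        return epsilon * wed

    shrink = mu / (1.0 + mu)
    sign = 1.0 if dweight > 0 else -1.0

    if sign * wed > sign * shrink * wed_last:
        # The parabola would ask for more than mu times the last step
        # (or would point backwards): cap the growth at mu.
        step = mu * dweight
    elif wed_last == wed:
        # Degenerate case (assumed handling): no quadratic information.
        step = 0.0
    else:
        # Jump to the minimum of the fitted parabola.
        step = dweight * wed / (wed_last - wed)

    if sign * wed > 0:
        # Slope still points the same way as the last step:
        # add an ordinary gradient term as well.
        step += epsilon * wed
    return step


# On E = (w - 3)^2 the descent slope is wed = -2 * (w - 3).
# Moving from w = 0 (slope 6) to w = 2 (slope 2), the quadratic step
# lands on the minimum at w = 3 when the gradient term is switched off.
print(quickprop_step(dweight=2.0, wed=2.0, wed_last=6.0, epsilon=0.0))
```

On an exactly quadratic error surface the parabola fit is exact, which is why the example reaches the minimum in one step. On real error surfaces the step is only an estimate, which is where the mu cap matters.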
In conx, you can turn on all of the above and quickprop with:
net.quickprop = 1
Quickprop seems fairly sensitive to the following parameters:
net.mu # typical ranges between 1.75 and 2.50
net.maxRandom # typical ranges between 0.5 and 4.0
net.epsilon # typical ranges between 0.01 and 5.0
Recall that net.maxRandom sets the range for random weight initialization: weights start between -maxRandom and +maxRandom. Quickprop likes the initial weights much further from zero than is typical for regular backprop.
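For illustration only, initializing a weight matrix within that range might look like the sketch below. The helper init_weights is hypothetical, and the uniform distribution is an assumption; the text above only states the bounds.

```python
import random


def init_weights(n_from, n_to, max_random=1.0, rng=None):
    """Matrix of random weights drawn between -max_random and +max_random (uniform assumed)."""
    rng = rng or random.Random()
    return [[rng.uniform(-max_random, max_random) for _ in range(n_to)]
            for _ in range(n_from)]


# A 2-by-2 input-to-hidden weight matrix with a wide range, as quickprop tends to prefer.
weights = init_weights(2, 2, max_random=4.0, rng=random.Random(0))
print(weights)
```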
1.6. Variations
You can use each of these tricks independently of one another. However, all of them seem to be additive and all work well together.
1.7. Example
from pyrobot.brain.conx import Network
net = Network()
net.quickprop = 1 # set first, before any changes to default qp settings or loading of inputs or targets
net.maxRandom = 1.0 # set before layers are created, or after a net.initialize()
net.addLayers(2, 2, 1)
net.mu = 2.25
net.epsilon = 4.0
net.setInputs( [[0, 0], [0, 1], [1, 0], [1, 1]] )
net.setTargets( [[0], [1], [1], [0]] )
net.resetLimit = 5
net.resetEpoch = 40
net.train()
# to do it again:
# net.initialize()
# net.train()
