
CascadeCorNetIssues


At this point, the cascade-correlation network class seems to work in many cases. However, sometimes the weights overflow. I don't know exactly what to do about this. It seems like a bug, but Fahlman mentioned this issue in his paper on Quickprop, and since cascade-correlation uses quickprop, it might just be that certain parameters are set to unsuitable values. Initially, this problem happened on every run, but I changed the default value of self.epsilon from 4.0 to 0.55 (which is what Fahlman seems to use in many cases). Setting the patience parameter to 12 (Fahlman's default) also makes the issue rarer but does not eliminate it. Still, the weights really should not overflow, since there is a weight decay term, especially on such a well-tested problem as XOR.
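To make the roles of epsilon, the maximum growth factor mu, and weight decay concrete, here is a minimal sketch of a per-weight Quickprop step as Fahlman describes it. The function name, the shrink factor mu / (1 + mu), the default mu = 1.75 (a commonly used value), and the explicit guard against a zero denominator are assumptions of this sketch; it is not the code in cascor.py. For candidate training, where correlation is maximized rather than error minimized, the slope would be passed in with its sign flipped.

```python
def quickprop_update(w, slope, prev_slope, prev_step,
                     epsilon=0.55, mu=1.75, decay=0.0):
    """One Quickprop step for a single weight (illustrative sketch).

    slope      -- dE/dw for this epoch, where E is being minimized
    prev_slope -- the decayed slope returned by the previous call
    prev_step  -- the weight change made on the previous call
    Returns (new_weight, step, decayed_slope).
    """
    s = slope + decay * w            # fold the weight-decay term into the slope
    shrink = mu / (1.0 + mu)         # past this slope ratio, take the capped step
    step = 0.0
    if prev_step < 0.0:
        if s > 0.0:
            step -= epsilon * s      # gradient term while the slope keeps its sign
        if s >= shrink * prev_slope or s == prev_slope:
            step += mu * prev_step   # cap growth at mu times the last step
        else:
            step += prev_step * s / (prev_slope - s)  # jump toward parabola minimum
    elif prev_step > 0.0:
        if s < 0.0:
            step -= epsilon * s
        if s <= shrink * prev_slope or s == prev_slope:
            step += mu * prev_step
        else:
            step += prev_step * s / (prev_slope - s)
    else:
        step = -epsilon * s          # no previous step: plain gradient descent
    return w + step, step, s
```

Minimizing E = (w - 3)^2 with this rule converges in a handful of steps, because the quadratic jump is exact once two slopes on a parabola are known. Note that epsilon only scales the linear term, while mu bounds how fast the step can grow from one epoch to the next.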

Does anyone have any ideas about what is going on? I have committed cascor.py to Pyro; if it is run as main, it has a good chance of producing a crash caused by overflow. Uncommenting the setup function in the cascade-correlation network class sets the random seed to a fixed value, which makes the runs reproducible. The crash always seems to happen while training the candidates, which could mean that my correlation derivatives are computed incorrectly, or that the weight changes are somehow not applied correctly there.
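For reference, Fahlman and Lebiere define the candidate score as S = sum_o | sum_p (V_p - Vbar)(E_po - Ebar_o) | and its derivative as dS/dw_i = sum_{p,o} sigma_o (E_po - Ebar_o) f'_p I_ip, where sigma_o is the sign of the correlation with output o, f'_p is the derivative of the candidate's activation function with respect to its net input on pattern p, and I_ip is input i on pattern p. The NumPy sketch below assumes a logistic sigmoid candidate and the unnormalized score; the function names are hypothetical. Comparing the analytic gradient against finite differences is one cheap way to check whether correlation derivatives are being computed correctly.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def candidate_score_and_grad(w, inputs, errors):
    """Score S and dS/dw for one sigmoid candidate unit (sketch).

    w      -- (I,) incoming weights of the candidate
    inputs -- (P, I) values feeding the candidate, one row per pattern
    errors -- (P, O) residual output errors, one row per pattern
    """
    net = inputs @ w                 # candidate net input per pattern
    v = sigmoid(net)                 # candidate activation V_p
    fprime = v * (1.0 - v)           # f'(net), computed from the activation
    v_c = v - v.mean()               # V_p - Vbar
    e_c = errors - errors.mean(axis=0)  # E_po - Ebar_o
    corr = v_c @ e_c                 # (O,) covariance with each output
    score = np.abs(corr).sum()
    sigma = np.sign(corr)            # sign of each output's correlation
    grad = inputs.T @ (fprime * (e_c @ sigma))
    return score, grad
```

The Vbar term does not appear in the gradient because sum_p (E_po - Ebar_o) is zero, so its contribution cancels exactly.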

The patience parameter seems to have a strong effect on the frequency of crashes. With epsilon at 0.55 and patience at 12 epochs, overflow occurs in about 1 of 200 trials on XOR. Doubling the patience to 24 raises this to around 182 of 200 trials. An epsilon smaller than 0.55 seems to make the network less likely to learn the problem and more likely to crash. Crashes usually happen while training the candidate weights, so something subtle could be wrong with that process; perhaps a small error traps the weights in a pattern of exponential growth.
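The patience parameter controls a stagnation test: training phase ends once the score has gone `patience` epochs without improving by some relative threshold. The sketch below is modeled loosely on the quit rule Fahlman describes; the class name, the 0.03 default threshold, and the assumption that scores are positive are illustrative and not taken from cascor.py. Under a rule like this, a larger patience lets candidate training run for more epochs before it is cut off, which would give any runaway weight growth more time to develop.

```python
class PatienceStopper:
    """Stop once the score fails to improve by a relative change_threshold
    for `patience` consecutive epochs (hypothetical helper; assumes scores > 0)."""

    def __init__(self, patience=12, change_threshold=0.03, maximize=True):
        self.patience = patience
        self.change_threshold = change_threshold
        self.maximize = maximize
        self.reference = None        # score at the last significant improvement
        self.stale = 0               # epochs since that improvement

    def _significant(self, score):
        if self.maximize:            # e.g. candidate correlation
            return score > self.reference * (1.0 + self.change_threshold)
        return score < self.reference * (1.0 - self.change_threshold)  # e.g. error

    def update(self, score):
        """Record this epoch's score; return True when training should stop."""
        if self.reference is None or self._significant(score):
            self.reference = score
            self.stale = 0
            return False
        self.stale += 1
        return self.stale >= self.patience
```

Comparing against the score at the last significant improvement, rather than the previous epoch, keeps slow creeping gains from resetting the counter indefinitely.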

I tried turning off quickprop and using normal backprop. The weights still overflow, and of course, training is less effective. This almost suggests that something serious is wrong. Once again, crashing is not especially frequent.

UPDATE: I think I have fixed the problem. When I computed the derivative of the correlation with respect to the weights, I used the conx function ACTPRIME, thinking that it was the derivative of the activation function. However, ACTPRIME is the derivative of the activation function expressed IN TERMS OF the activation value, so it expects a neuron's activation, not its net input. I applied ACTPRIME to the net input of a neuron instead, which meant the squashing function was never applied. As a result, a factor in the weight-change formula that should stay bounded instead grows with the net input, and therefore with the weights. This is why the weights grew without bound and eventually overflowed even with momentum==0 backprop instead of quickprop.

The feedback loop works like this: the larger the absolute value of the net input, the larger the magnitude of ACTPRIME of the net input; the larger that magnitude, the larger the weight changes; the larger the weight changes, the larger abs(weights) becomes; and weights with large absolute values produce even larger net inputs on the next iteration. Under certain conditions, then, the weights grow exponentially. If the magnitudes of the weights stay below 1 for most of training, they don't overflow, because this exponential regime is never entered.
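The mistake is easy to reproduce in isolation. This sketch assumes a logistic sigmoid activation, for which the derivative written in terms of the activation value a = f(x) is a(1 - a). Here `actprime` is a hypothetical stand-in for conx's ACTPRIME, whose exact definition may differ in details. The correct derivative squashes the net input first; the buggy version feeds the raw net input straight in.

```python
import math


def activation(net):
    """Logistic sigmoid (assumed activation for this sketch)."""
    return 1.0 / (1.0 + math.exp(-net))


def actprime(act):
    """Sigmoid derivative expressed in terms of the activation value."""
    return act * (1.0 - act)


def correct_derivative(net):
    return actprime(activation(net))  # squash first, then apply ACTPRIME


def buggy_derivative(net):
    return actprime(net)              # ACTPRIME applied to the raw net input


for net in (0.5, 2.0, 10.0):
    print(net, correct_derivative(net), buggy_derivative(net))
```

For the sigmoid, the correct value never exceeds 0.25, while the buggy value is net * (1 - net), whose magnitude grows roughly like net**2 for large net inputs. That unbounded factor is what feeds the growth loop described in the update.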

Anyway, at this point, Cascade correlation seems to be working. I am still doing tests and cleaning up code, comments, and docstrings.