QuickPropSummary


QuickProp Algorithm
Sample network nodes:       2 input, 1 hidden, 1 output   
Total nodes:                5 (1 bias)
Connections:                from every input (0-2) to hidden (3)
                            from every input and hidden (0-3) to output (4)

1)  Matrix of connections is built and random weights assigned
        
        Layer to        Layer from      Weight
        3                    0          -0.494200
        3                    1           0.072800
        3                    2           0.638400
        4                    0           0.120400
        4                    1          -0.501200
        4                    2          -0.147000
        4                    3           0.106400
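The connection table above could be held in memory roughly as follows. This is only a sketch: the names `connections`, `weights` and `TABLE` are made up here, node 0 is assumed to be the bias node, and the weights are copied from the table instead of being drawn at random so that the later numbers can be reproduced.

```python
# Assumed node layout for the sample network:
# 0 = bias, 1-2 = inputs, 3 = hidden, 4 = output.
N_NODES = 5

# (layer to, layer from, weight), copied from the table above.
TABLE = [
    (3, 0, -0.4942),
    (3, 1, 0.0728),
    (3, 2, 0.6384),
    (4, 0, 0.1204),
    (4, 1, -0.5012),
    (4, 2, -0.1470),
    (4, 3, 0.1064),
]

# connections[n] lists the source nodes feeding node n;
# weights[n][c] is the weight on the connection from c to n.
connections = {}
weights = {}
for to_node, from_node, weight in TABLE:
    connections.setdefault(to_node, []).append(from_node)
    weights.setdefault(to_node, {})[from_node] = weight
```

Indexing as `weights[n][c]` mirrors the Weights[n][c] notation used in the rest of these notes.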



2)  Training Loop - 1 Epoch:

        Clear Slopes
        
        Read 1 training line
             Pass Forward
             Pass Backward
        Repeat for all training lines

        Update Weights
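The order of operations in one epoch can be sketched as below. The four callables are hypothetical stand-ins for the steps described in the following sections, and the training lines in the demonstration are invented purely to show the call order.

```python
def train_epoch(training_lines, clear_slopes, pass_forward, pass_backward,
                update_weights):
    """Run one epoch: clear slopes, process every line, then update once."""
    clear_slopes()
    for inputs, goals in training_lines:
        output = pass_forward(inputs)
        pass_backward(goals, output)
    update_weights()


# Demonstration that only records the order in which the steps run.
calls = []
train_epoch(
    [([0.0, 1.0], [0.5]), ([1.0, 1.0], [-0.5])],  # made-up training lines
    clear_slopes=lambda: calls.append("clear"),
    pass_forward=lambda inputs: calls.append("forward") or inputs,
    pass_backward=lambda goals, output: calls.append("backward"),
    update_weights=lambda: calls.append("update"),
)
```

Note that the weights change only once per epoch, after the slopes from all training lines have been accumulated.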

Pass Forward:
        Create Output Vector size = number of nodes
                   initialize with bias and training line  [1,0,1]

        Values of all remaining Output vector elements:

        From the first hidden node through all nodes in the network [n]:
                For each connection [c] to that node:
                Output[n] = Activation(sum(Output[c] * Weights[n][c]))

            eg:  Output[3] = Activation(sum(1*-0.4942  + 0*0.0728  + 1*0.6384))
                 Output[4] = Activation(sum(1* 0.1204  + 0*-0.5012 + 1*-0.1470 + Output[3]*0.1064))

        Sigmoid Activation function:
                if sum < -15 return -.5
                if sum >  15 return  .5
                else return (1/(1.0 + e^(-sum)) -.5) 

        AsymSigmoid Activation function:
                if sum < -15 return 0
                if sum >  15 return 1
                else return (1/(1.0 + e^(-sum))) 


        Output vector at end of first forward pass: [1, 0, 1, 0.035988, -0.005692]
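A minimal sketch of the forward pass, using the symmetric Sigmoid and the sample weights from step 1. The function names and the dictionary layout are assumptions for illustration, as is the choice of node 0 for the bias.

```python
import math


def sigmoid(total):
    """Symmetric sigmoid from the notes, range -0.5 to 0.5."""
    if total < -15.0:
        return -0.5
    if total > 15.0:
        return 0.5
    return 1.0 / (1.0 + math.exp(-total)) - 0.5


def pass_forward(training_inputs, connections, weights, n_nodes):
    output = [0.0] * n_nodes
    output[0] = 1.0  # bias node (assumed to be node 0)
    for i, value in enumerate(training_inputs, start=1):
        output[i] = value
    # Ascending node order guarantees every source is already computed.
    for n in sorted(connections):
        total = sum(output[c] * weights[n][c] for c in connections[n])
        output[n] = sigmoid(total)
    return output


connections = {3: [0, 1, 2], 4: [0, 1, 2, 3]}
weights = {
    3: {0: -0.4942, 1: 0.0728, 2: 0.6384},
    4: {0: 0.1204, 1: -0.5012, 2: -0.1470, 3: 0.1064},
}
output = pass_forward([0.0, 1.0], connections, weights, n_nodes=5)
print([round(v, 6) for v in output])
```

Swapping `sigmoid` for an AsymSigmoid variant would only change the clamp values and drop the `- 0.5` offset.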

Pass Backward:
1)  For all Output nodes:
 
        Calculate ErrorSums[node]
                  diff = goal - output
                  ErrorSums[node] = ln((1 + diff) / (1 - diff))

2)  Working backwards through all nodes[n] to the first hidden node:

        Error[n] = Activation_Prime(Output[n]) * ErrorSums[n]
                 For each connection [c] to that node:
                   ErrorSums[c]  += Error[n] * Weights[n][c]
                   Slopes[n][c]  += Error[n] * Output[c]

        Activation_Prime for Sigmoid:      SigmoidPrimeOffset (0.1) + (0.25 - value*value)
        Activation_Prime for AsymSigmoid:  SigmoidPrimeOffset (0.1) + (value * (1.0 - value))
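The two backward steps might look like this in code. It is a sketch under several assumptions: the symmetric Sigmoid is in use, the goal value of 0.5 is invented (the notes do not give the goal for the sample line), the output values are recomputed from the sample weights, and no guard is included for a diff of exactly 1 or -1, where the logarithm is undefined.

```python
import math

SIGMOID_PRIME_OFFSET = 0.1


def sigmoid_prime(value):
    """Derivative term for the symmetric sigmoid, as given in the notes."""
    return SIGMOID_PRIME_OFFSET + (0.25 - value * value)


def pass_backward(goals, output, connections, weights, slopes, output_nodes):
    error_sums = [0.0] * len(output)
    # Step 1: error sum at each output node (assumes -1 < diff < 1).
    for node, goal in zip(output_nodes, goals):
        diff = goal - output[node]
        error_sums[node] = math.log((1.0 + diff) / (1.0 - diff))
    # Step 2: walk back from the last node to the first hidden node.
    for n in sorted(connections, reverse=True):
        error = sigmoid_prime(output[n]) * error_sums[n]
        for c in connections[n]:
            error_sums[c] += error * weights[n][c]
            slopes[n][c] += error * output[c]
    return error_sums


connections = {3: [0, 1, 2], 4: [0, 1, 2, 3]}
weights = {
    3: {0: -0.4942, 1: 0.0728, 2: 0.6384},
    4: {0: 0.1204, 1: -0.5012, 2: -0.1470, 3: 0.1064},
}
slopes = {n: {c: 0.0 for c in connections[n]} for n in connections}
# Forward-pass values recomputed from the sample weights and input line.
output = [1.0, 0.0, 1.0, 0.035988, -0.005692]
error_sums = pass_backward([0.5], output, connections, weights, slopes,
                           output_nodes=[4])
```

Because the slopes are accumulated with `+=`, calling `pass_backward` once per training line sums the contributions over the whole epoch.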

Update Weights:
          Variables:
                Decay:                  -0.0001
                Momentum:                0.9
                MaxFact(mu):             1.75
                ModeSwitchThreshold:     0.0
                Epsilon:                 0.5
                SplitEpsilon:            1.0
                ShrinkFactor:            0.6363    (MaxFact / (1 + MaxFact))


From First Hidden node [n] through all:
       For each connection to that node [c]:
               
        Calculate next step:

           If DeltaWeights[n][c] > ModeSwitchThreshold or < -ModeSwitchThreshold
                (last step was significantly positive or negative)

                     if Slopes[n][c] has the same sign as the last step
                     (positive after a positive step, negative after a negative one)
                     and SplitEpsilon is 1:
                        next_step += (Epsilon * Slopes[n][c]) / # connections to [n]

                     if Slopes[n][c] > ShrinkFactor * PrevSlopes[n][c] after a positive step,
                     or < ShrinkFactor * PrevSlopes[n][c] after a negative step
                            (the slope is still close to the previous slope)

                        next_step += MaxFactor * DeltaWeights[n][c]     (largest step)
                     else
                        next_step += (Slopes[n][c] / (PrevSlopes[n][c] - Slopes[n][c])) * DeltaWeights[n][c]

           Else - normal gradient descent:

             if SplitEpsilon:
                next_step += ((Epsilon * Slopes[n][c]) / # connections to [n])
                          + (Momentum * DeltaWeights[n][c])
    
        Update vectors with next step:
             
              DeltaWeights[n][c] = next_step
              Weights[n][c]      += next_step
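The update rule can be sketched as a per-connection function plus a loop over all connections. This is one reading of the steps above, not a definitive implementation: it treats the positive and negative cases separately, and it assumes that Epsilon is applied undivided when SplitEpsilon is off, which the notes do not spell out. The function and parameter names are made up for the sketch.

```python
def quickprop_step(slope, prev_slope, delta, n_conn, epsilon=0.5,
                   momentum=0.9, max_factor=1.75, mode_switch_threshold=0.0,
                   split_epsilon=True):
    """Return next_step for one connection."""
    shrink_factor = max_factor / (1.0 + max_factor)
    # Assumption: without SplitEpsilon the gradient term is not divided.
    gradient_term = epsilon * slope / n_conn if split_epsilon else epsilon * slope
    step = 0.0
    # Note: no guard here for prev_slope == slope in the quadratic branch.
    if delta > mode_switch_threshold:        # last step significantly positive
        if slope > 0.0:
            step += gradient_term
        if slope > shrink_factor * prev_slope:
            step += max_factor * delta       # largest allowed step
        else:
            step += slope / (prev_slope - slope) * delta
    elif delta < -mode_switch_threshold:     # last step significantly negative
        if slope < 0.0:
            step += gradient_term
        if slope < shrink_factor * prev_slope:
            step += max_factor * delta
        else:
            step += slope / (prev_slope - slope) * delta
    else:                                    # normal gradient descent
        step += gradient_term + momentum * delta
    return step


def update_weights(connections, weights, delta_weights, slopes, prev_slopes):
    for n in sorted(connections):
        for c in connections[n]:
            step = quickprop_step(slopes[n][c], prev_slopes[n][c],
                                  delta_weights[n][c], len(connections[n]))
            delta_weights[n][c] = step
            weights[n][c] += step


# Made-up values: first update, so every previous step is zero.
connections = {3: [0, 1, 2]}
weights = {3: {0: 0.0, 1: 0.0, 2: 0.0}}
delta_weights = {3: {0: 0.0, 1: 0.0, 2: 0.0}}
prev_slopes = {3: {0: 0.0, 1: 0.0, 2: 0.0}}
slopes = {3: {0: 0.3, 1: 0.0, 2: -0.6}}
update_weights(connections, weights, delta_weights, slopes, prev_slopes)
```

With the default ModeSwitchThreshold of 0.0, the gradient descent branch is taken only when the previous step was exactly zero, as it is on the first update.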

Clear Slopes:

        From first hidden node [n] through all:
                 for each connection [c] to that node:

                 PrevSlopes[n][c] = Slopes[n][c]
                 Slopes[n][c]     = (Decay * Weights[n][c])
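A short sketch of the slope reset, with hypothetical dictionary names and a single made-up slope value. It shows that the slopes are not zeroed but seeded with the weight-decay term.

```python
DECAY = -0.0001


def clear_slopes(connections, weights, slopes, prev_slopes, decay=DECAY):
    for n in sorted(connections):
        for c in connections[n]:
            prev_slopes[n][c] = slopes[n][c]       # remember last epoch's slope
            slopes[n][c] = decay * weights[n][c]   # start from the decay term


connections = {3: [0]}
weights = {3: {0: -0.4942}}
slopes = {3: {0: 0.25}}        # made-up slope from a previous epoch
prev_slopes = {3: {0: 0.0}}
clear_slopes(connections, weights, slopes, prev_slopes)
```

Since Decay is negative, the seeded slope always has the opposite sign to the weight, which nudges each weight back toward zero.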