Sample network nodes: 2 input, 1 hidden, 1 output
Total nodes: 5 (including 1 bias node, index 0)
Connections: from bias and every input (0-2) to hidden (3)
    from bias, every input and hidden (0-3) to output (4)
1) Matrix of connections is built and random weights assigned
To node   From node   Weight
3         0           -0.494200
3         1            0.072800
3         2            0.638400
4         0            0.120400
4         1           -0.501200
4         2           -0.147000
4         3            0.106400
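A minimal sketch of step 1, not the original implementation. The function name build_connections, the weight_range parameter and the dict-of-dicts layout are assumptions made for illustration; the notes do not say what range the random weights are drawn from. The wiring rule (hidden nodes fed by all earlier nodes, outputs fed by every non-output node) is also an assumption that happens to reproduce the sample network.

```python
import random

def build_connections(n_inputs, n_hidden, n_outputs, weight_range=1.0, seed=None):
    """Return {to_node: {from_node: weight}} with random initial weights.

    Node 0 is the bias, then the inputs, then hidden nodes, then outputs.
    weight_range is a hypothetical parameter, not taken from the notes.
    """
    rng = random.Random(seed)
    first_hidden = 1 + n_inputs
    first_output = first_hidden + n_hidden
    total = first_output + n_outputs
    weights = {}
    for to in range(first_hidden, total):
        # Hidden nodes see all earlier nodes; outputs see every non-output node.
        sources = range(min(to, first_output))
        weights[to] = {frm: rng.uniform(-weight_range, weight_range) for frm in sources}
    return weights

# The sample table above, written in the same layout.
sample_weights = {
    3: {0: -0.4942, 1: 0.0728, 2: 0.6384},
    4: {0: 0.1204, 1: -0.5012, 2: -0.1470, 3: 0.1064},
}
```

Calling build_connections(2, 1, 1) gives the same 7 connections as the table, with different random weights.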
2) Training Loop - 1 Epoch:
    Clear Slopes
    For each training line:
        Pass Forward
        Pass Backward
    Update Weights
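The order of operations in one epoch can be sketched as below. The four callables are placeholders (hypothetical names and signatures) for the steps described in the rest of these notes; only the ordering comes from the outline above.

```python
def train_epoch(training_lines, clear_slopes, pass_forward, pass_backward, update_weights):
    """Run one epoch: slopes accumulate over all lines, weights change once."""
    clear_slopes()                       # reset slopes before accumulating
    for inputs, goals in training_lines:
        outputs = pass_forward(inputs)   # compute every node's output
        pass_backward(outputs, goals)    # add this line's contribution to the slopes
    update_weights()                     # single batch update at the end
```

Because the weights are updated only after every training line has been seen, this is a batch method: the slopes hold the summed contribution of the whole training set.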
Pass Forward:
    Create Output vector, size = number of nodes
    Initialize with bias and training line: [1, 0, 1]
    Compute all remaining Output vector elements:
    From first hidden node through all nodes in network [n]:
        For each connection [c] to that node:
            Output[n] = Activation(sum(Output[c] * Weight[n][c]))
    eg: Output[3] = Activation(1*-0.4942 + 0*0.0728 + 1*0.6384)
        Output[4] = Activation(1*0.1204 + 0*-0.5012 + 1*-0.1470 + Output[3]*0.1064)
Sigmoid Activation function:
    if sum < -15 return -0.5
    if sum > 15 return 0.5
    else return (1/(1.0 + e^(-sum)) - 0.5)
AsymSigmoid Activation function:
    if sum < -15 return 0
    if sum > 15 return 1
    else return (1/(1.0 + e^(-sum)))
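The two activation functions translate almost directly into code. This is a sketch with hypothetical function names; the cutoff at 15 and the 0.5 shift are taken from the definitions above.

```python
import math

def sigmoid(total):
    """Symmetric sigmoid, output range -0.5 to 0.5."""
    if total < -15.0:
        return -0.5
    if total > 15.0:
        return 0.5
    return 1.0 / (1.0 + math.exp(-total)) - 0.5

def asym_sigmoid(total):
    """Asymmetric sigmoid, output range 0 to 1."""
    if total < -15.0:
        return 0.0
    if total > 15.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-total))
```

The early returns skip the exponential for sums beyond +/-15, where the function is already within about 3e-7 of its limit.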
Output vector at end of first forward pass: [1, 0, 1, 0.035988, -0.005692]
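The forward pass can be checked with a short sketch. The names pass_forward and sample_weights and the {to_node: {from_node: weight}} layout are assumptions for illustration; the weights are the ones from the connection table and the activation is the symmetric Sigmoid.

```python
import math

def sigmoid(total):
    """Symmetric sigmoid from the notes, output range -0.5 to 0.5."""
    if total < -15.0:
        return -0.5
    if total > 15.0:
        return 0.5
    return 1.0 / (1.0 + math.exp(-total)) - 0.5

def pass_forward(weights, training_line, activation=sigmoid):
    """Return the Output vector: bias, inputs, then each computed node."""
    outputs = [1.0] + list(training_line)      # node 0 is the bias
    for node in sorted(weights):               # first hidden node onward
        total = sum(outputs[src] * w for src, w in weights[node].items())
        outputs.append(activation(total))
    return outputs

sample_weights = {
    3: {0: -0.4942, 1: 0.0728, 2: 0.6384},
    4: {0: 0.1204, 1: -0.5012, 2: -0.1470, 3: 0.1064},
}
outputs = pass_forward(sample_weights, [0, 1])
```

For the sample weights and inputs [0, 1] this gives Output[3] of about 0.035988 and Output[4] of about -0.005692.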
Pass Backward:
1) For all Output nodes:
    Calculate ErrorSums[node]:
        diff = goal - Output[node]
        ErrorSums[node] = ln((1 + diff) / (1 - diff))
2) Working backwards through all nodes [n] to the first hidden node:
    Error[n] = Activation_Prime(Output[n]) * ErrorSums[n]
    For each connection [c] to that node:
        ErrorSums[c] += Error[n] * Weight[n][c]
        Slopes[n][c] += Error[n] * Output[c]
Activation_Prime for Sigmoid: SigmoidPrimeOffset (0.1) + (0.25 - value*value)
Activation_Prime for AsymSigmoid: SigmoidPrimeOffset (0.1) + (value * (1.0 - value))
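A sketch of the backward pass under the same assumed data layout ({to_node: {from_node: value}}). The names pass_backward and first_output are hypothetical. Note that the log error is only defined for -1 < diff < 1; this sketch does not clamp diff, and the notes do not say how the original handles that case.

```python
import math

SIGMOID_PRIME_OFFSET = 0.1

def sigmoid_prime(value):
    """Derivative term for the symmetric Sigmoid, from the node's output."""
    return SIGMOID_PRIME_OFFSET + (0.25 - value * value)

def asym_sigmoid_prime(value):
    """Derivative term for AsymSigmoid, from the node's output."""
    return SIGMOID_PRIME_OFFSET + value * (1.0 - value)

def pass_backward(weights, outputs, goals, slopes, first_output,
                  activation_prime=sigmoid_prime):
    """Accumulate slopes for one training line; returns the ErrorSums vector."""
    error_sums = [0.0] * len(outputs)
    # 1) Error at each output node (assumes -1 < diff < 1).
    for node, goal in zip(range(first_output, len(outputs)), goals):
        diff = goal - outputs[node]
        error_sums[node] = math.log((1.0 + diff) / (1.0 - diff))
    # 2) Walk from the last node back to the first hidden node.
    for node in sorted(weights, reverse=True):
        error = activation_prime(outputs[node]) * error_sums[node]
        for src, w in weights[node].items():
            error_sums[src] += error * w
            slopes[node][src] += error * outputs[src]
    return error_sums
```

Slopes are added to rather than overwritten, so calling this once per training line sums the contributions of the whole epoch.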
Update Weights:
Variables:
    Decay:               -0.0001
    Momentum:            0.9
    MaxFact (mu):        1.75
    ModeSwitchThreshold: 0.0
    Epsilon:             0.5
    SplitEpsilon:        1.0
    ShrinkFactor:        0.6363 (= MaxFact / (1 + MaxFact))
From first hidden node [n] through all:
    For each connection [c] to that node:
        Calculate next step:
            If DeltaWeights[n][c] > ModeSwitchThreshold or < -ModeSwitchThreshold
            (last step was significantly positive or negative):
                If Slopes[n][c] is positive (after a positive step) or negative
                (after a negative step) and SplitEpsilon is 1:
                    next_step += (Epsilon * Slopes[n][c]) / # connections to [n]
                If Slopes[n][c] is greater than (positive step) or less than
                (negative step) ShrinkFactor * PrevSlopes[n][c]
                (the slope was close to the previous slope):
                    next_step += MaxFact * DeltaWeights[n][c] (largest step)
                Else:
                    next_step += (Slopes[n][c] / (PrevSlopes[n][c] - Slopes[n][c])) * DeltaWeights[n][c]
            Else - normal gradient descent:
                If SplitEpsilon:
                    next_step += ((Epsilon * Slopes[n][c]) / # connections to [n])
                                 + (Momentum * DeltaWeights[n][c])
        Update vectors with next step:
            DeltaWeights[n][c] = next_step
            Weights[n][c] += next_step
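A sketch of the weight update for a single connection, plus the loop that applies it. Function and parameter names are hypothetical. Two points are assumptions rather than something the notes state: the positive and negative cases are read as mirror images of each other, and when SplitEpsilon is off the sketch uses Epsilon undivided. There is no guard for PrevSlopes equal to Slopes in the quadratic branch.

```python
def quickprop_step(slope, prev_slope, delta, n_connections,
                   epsilon=0.5, momentum=0.9, max_factor=1.75,
                   mode_switch_threshold=0.0, split_epsilon=True):
    """Return next_step for one connection."""
    shrink_factor = max_factor / (1.0 + max_factor)
    # Assumption: Epsilon is used undivided when split_epsilon is False.
    eps = epsilon / n_connections if split_epsilon else epsilon
    next_step = 0.0
    if delta > mode_switch_threshold:            # last step was positive
        if slope > 0.0:
            next_step += eps * slope
        if slope > shrink_factor * prev_slope:   # slope close to previous slope
            next_step += max_factor * delta      # largest allowed step
        else:
            next_step += (slope / (prev_slope - slope)) * delta
    elif delta < -mode_switch_threshold:         # last step was negative
        if slope < 0.0:
            next_step += eps * slope
        if slope < shrink_factor * prev_slope:
            next_step += max_factor * delta
        else:
            next_step += (slope / (prev_slope - slope)) * delta
    else:                                        # normal gradient descent
        next_step += eps * slope + momentum * delta
    return next_step

def update_weights(weights, slopes, prev_slopes, delta_weights, **params):
    """Apply quickprop_step to every connection, first hidden node onward."""
    for node in sorted(weights):
        n_conn = len(weights[node])
        for src in weights[node]:
            step = quickprop_step(slopes[node][src], prev_slopes[node][src],
                                  delta_weights[node][src], n_conn, **params)
            delta_weights[node][src] = step
            weights[node][src] += step
```

With the default ModeSwitchThreshold of 0.0, the gradient descent branch only runs when the previous step was exactly zero, which is the case on the first epoch.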
Clear Slopes:
From first hidden node [n] through all:
    For each connection [c] to that node:
        PrevSlopes[n][c] = Slopes[n][c]
        Slopes[n][c] = (Decay * Weights[n][c])
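Clearing the slopes is short enough to sketch in full. The function name and the {to_node: {from_node: value}} layout are assumptions for illustration; the Decay value is the one listed under Update Weights.

```python
DECAY = -0.0001

def clear_slopes(weights, slopes, prev_slopes, decay=DECAY):
    """Save the current slopes, then restart each slope at the decay term."""
    for node in weights:
        for src, w in weights[node].items():
            prev_slopes[node][src] = slopes[node][src]
            slopes[node][src] = decay * w   # weight decay seeds the new slope
```

The slopes do not restart at zero: each one starts at Decay * Weight, so with a negative Decay a large weight is pulled slightly back toward zero on every epoch.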
