1. Introduction to Connectionism
This is a brief introduction to connectionist networks (also called artificial neural networks). It follows the text of Chapter 3 of Learning to See Analogies: A Connectionist Exploration. The pseudocode is written in Python, and the working code examples are from PyroRobotics.org.
1.1. History of Artificial Neural Networks
Highlights through the history of artificial neural networks:
- 1943 Warren McCulloch (a neurophysiologist) and Walter Pitts (a mathematician) explored how a neuron might work by building a model with electrical circuits.
- 1949 Donald Hebb wrote The Organization of Behavior, which described how neural pathways could be strengthened over time.
- 1956 Dartmouth Summer Research Project on Artificial Intelligence.
- 1960s Frank Rosenblatt (a neurobiologist) began work on the Perceptron, a two-layer model.
- 1969 Marvin Minsky and Seymour Papert wrote the book Perceptrons, which criticized the lack of rigour in the field and proved that a simple problem (XOR) could not be solved with a two-layer network like the Perceptron.
- 1974 Paul Werbos developed the backpropagation learning method.
- 1975 Fukushima developed the Cognitron, a trainable multilayered network model for recognizing handwritten characters.
- 1982 John Hopfield (a physicist) presented a paper to the National Academy of Sciences showing how such models could be used in practical ways.
- 1986 David Rumelhart, Jay McClelland and their research team published the Parallel Distributed Processing (PDP) books (two volumes and a handbook).
- 1988 Steve Grossberg and Gail Carpenter developed Adaptive Resonance Theory.
- 1990s - 2000s A variety of variations have been explored; many have left the field.
- "...our intuitive judgment that the extension (to multilayer systems) is sterile" (Minsky and Papert, Perceptrons)
Personally, I feel that AI has split into two distinct paradigms:
- rational models - based on a centralized database of facts, rules, symbols and logic engines
- emergent models - based on a decentralized network of activation, flow, and numbers
These two paradigms, in my opinion, have little to do with one another. Emergent models can certainly show rational, rule-like behavior, but the implementation of an emergent model has nothing to do with how rational models operate.
Let's explore this issue.
1.2. Machine Learning
Given an input A, the system can respond back with output B. This could be done with a simple table:
| Input | Output |
| A | B |
| C | D |
| E | F |
| G | H |
| I | J |
| ... | ... |
Do you see any problems with this approach?
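One way to probe the question is to build the table and query it. The sketch below is only an illustration: the letter pairs come from the table above, while the numeric patterns and the helper name `respond` are assumptions made for the example. It is written for Python 3.

```python
# A lookup table "learns" by storing each input/output pair exactly.
# The letter pairs come from the table above; the numeric patterns are
# illustrative assumptions.
letter_table = {"A": "B", "C": "D", "E": "F", "G": "H", "I": "J"}
number_table = {(0.0, 1.0): 1.0, (1.0, 1.0): 0.0}

def respond(table, stimulus):
    """Return the stored output, or None if this exact input was never stored."""
    return table.get(stimulus)

print(respond(letter_table, "A"))          # stored input: prints B
print(respond(letter_table, "K"))          # unseen input: prints None
print(respond(number_table, (0.1, 0.9)))   # near a stored input, still None
```

The table answers only the inputs it has memorized, and it grows with every new pair. It has no notion of one input being similar to another.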
1.3. Network Mechanics
A neural network is:
- A Neural Network is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal neuron. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.
Items:
- Network, composed of
  - layers, composed of
    - units/nodes
    - an activation function
  - weights between layers
- Training patterns, composed of
  - input patterns
  - target patterns
- Testing patterns, set aside to test the result of training
These training patterns are presented to the network repeatedly (in epochs).
The network can be thought of as computing a single function, g(A) -> B. But it may be more accurate to think of the pattern A being associated with output B.
Let's see how this would work.
1.3.1. Forward propagation of activation
Networks are often grouped into layers. This makes the activations easy to compute, one layer at a time.
Proof: ANNs are Turing Machine equivalent. See Franklin and Garzon.
Proof: A three-layer network can compute anything that an N-layer can compute; if a many-layered network can compute something, then there is a three-layer network that can compute the same thing. However, that doesn't say anything about whether that computation can be learned!
The node:
The net input is a weighted sum of all the incoming activations plus the node's bias value:
for m in toNodes:
    netInput[m] = bias[m]
    for i in fromNodes:
        netInput[m] += (weight[m][i] * activation[i])
where weight[m][i] is the weight, or connection strength, from the i-th node to the m-th node, activation[i] is the activation signal of the i-th node, and bias[m] is the bias value of the m-th node.
After computing the net input, each node has to compute its output activation. The activation function used in backprop networks is generally:
from math import exp
def activationFunction(netInput):
    # logistic (sigmoid) function: squashes any net input into the range (0, 1)
    return 1.0 / (1.0 + exp(-netInput))
for m in toNodes:
    activation[m] = activationFunction(netInput[m])
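The two steps above can be combined into a small runnable sketch. The weights below are hand-picked for illustration, not learned and not taken from the text. They are chosen so that a 2-2-1 network computes XOR, and the function names are assumptions for this example. It is written for Python 3.

```python
from math import exp

def activation_function(net_input):
    """Logistic (sigmoid) activation, as in the text above."""
    return 1.0 / (1.0 + exp(-net_input))

def propagate_layer(activation, weight, bias):
    """Compute one layer; weight[m][i] connects from-node i to to-node m."""
    output = []
    for m in range(len(weight)):
        net_input = bias[m]
        for i in range(len(activation)):
            net_input += weight[m][i] * activation[i]
        output.append(activation_function(net_input))
    return output

# Hand-picked (assumed) weights for a 2-2-1 network that computes XOR.
HIDDEN_WEIGHT = [[20.0, 20.0],   # hidden node 0 behaves like OR
                 [20.0, 20.0]]   # hidden node 1 behaves like AND
HIDDEN_BIAS = [-10.0, -30.0]
OUTPUT_WEIGHT = [[20.0, -20.0]]  # on when OR is on and AND is off
OUTPUT_BIAS = [-10.0]

def propagate(inputs):
    """Forward pass: input layer -> hidden layer -> output layer."""
    hidden = propagate_layer(inputs, HIDDEN_WEIGHT, HIDDEN_BIAS)
    return propagate_layer(hidden, OUTPUT_WEIGHT, OUTPUT_BIAS)

for pattern in [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]:
    print(pattern, round(propagate(pattern)[0], 3))
```

With these weights the output is close to 0 for [0, 0] and [1, 1] and close to 1 for the other two patterns. The sketch shows only that such weights exist; finding them is the job of training.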
However, without training, this can't do much. You already know one method of getting some weights. How would that work?
Hint: Darwin.
That would work, but would be slow. Why? A better method is the backpropagation of error.
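The "Darwin" hint can be read as evolutionary search over the weights. What follows is a minimal sketch of that idea under stated assumptions, not a method given in the text: a 2-2-1 network for XOR, a population of weight vectors, truncation selection, and Gaussian mutation. The population size, mutation size, and all names are chosen for illustration. It is written for Python 3.

```python
import random
from math import exp

PATTERNS = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
GENOME_LENGTH = 9  # 2-2-1 network: 6 hidden parameters + 3 output parameters

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + exp(-x))

def output(genome, inputs):
    """Forward pass of a 2-2-1 network whose 9 parameters are the genome."""
    h0 = sigmoid(genome[0] * inputs[0] + genome[1] * inputs[1] + genome[2])
    h1 = sigmoid(genome[3] * inputs[0] + genome[4] * inputs[1] + genome[5])
    return sigmoid(genome[6] * h0 + genome[7] * h1 + genome[8])

def error(genome):
    """Sum of squared errors over the XOR patterns (lower is fitter)."""
    return sum((target - output(genome, inputs)) ** 2
               for inputs, target in PATTERNS)

def evolve(generations=200, population_size=30, sigma=0.5, seed=0):
    """Return the fittest genome and the best error seen at each generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(GENOME_LENGTH)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        # The fitter half survives unchanged; each survivor also
        # leaves one randomly mutated child.
        survivors = population[:population_size // 2]
        children = [[w + rng.gauss(0.0, sigma) for w in parent]
                    for parent in survivors]
        population = survivors + children
    population.sort(key=error)
    history.append(error(population[0]))
    return population[0], history

best, history = evolve()
print("best error at start:", history[0])
print("best error at end:  ", history[-1])
```

Because the fittest genome always survives, the best error never gets worse. The mutations are blind, however: every candidate must be evaluated on every pattern, and nothing tells the search which direction to move a weight. That is one reason to expect this approach to be slow compared with a method that uses the error signal directly.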
1.3.2. Backpropagation of Error
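for m in toNodes:
    # error at an output node: target minus actual activation
    error[m] = (desiredOutput[m] - actualOutput[m])
    # scale the error by the slope of the sigmoid at the node's output
    delta[m] = error[m] * actualOutput[m] * (1 - actualOutput[m])
    for i in fromNodes:
        # actualOutput[i] is the activation of from-node i; the momentum term
        # adds a fraction of the previous update
        weightUpdate[m][i] = (EPSILON * delta[m] * actualOutput[i]) + (MOMENTUM * weightUpdate[m][i])
        weight[m][i] += weightUpdate[m][i]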
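The pseudocode above covers the output layer only. The following is a hedged, runnable sketch of a full training loop for a 2-2-1 network on XOR, written for Python 3. Several parts are assumptions not found in the text: the hidden-layer delta (the standard rule, which passes output deltas back through the output weights), the bias updates, the MOMENTUM value, the initial weight range, and the number of epochs. EPSILON matches the setEpsilon(0.5) call in NNxor.py.

```python
import random
from math import exp

EPSILON = 0.5    # learning rate; matches setEpsilon(0.5) in NNxor.py
MOMENTUM = 0.9   # assumed value; the text does not give one
PATTERNS = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + exp(-x))

def make_layer(n_to, n_from, rng):
    """Random small weights and biases, plus zeroed previous updates."""
    return {
        "weight": [[rng.uniform(-0.5, 0.5) for _ in range(n_from)]
                   for _ in range(n_to)],
        "bias": [rng.uniform(-0.5, 0.5) for _ in range(n_to)],
        "weight_update": [[0.0] * n_from for _ in range(n_to)],
        "bias_update": [0.0] * n_to,
    }

def forward(layer, activation):
    """Net input (bias plus weighted sum) passed through the sigmoid."""
    return [sigmoid(layer["bias"][m] +
                    sum(w * a for w, a in zip(layer["weight"][m], activation)))
            for m in range(len(layer["bias"]))]

def update(layer, delta, from_activation, epsilon, momentum):
    """Apply the update rule from the pseudocode, then change the weights."""
    for m in range(len(delta)):
        for i in range(len(from_activation)):
            layer["weight_update"][m][i] = (
                epsilon * delta[m] * from_activation[i]
                + momentum * layer["weight_update"][m][i])
            layer["weight"][m][i] += layer["weight_update"][m][i]
        # Assumption: the bias is treated as a weight from a node that is always 1.
        layer["bias_update"][m] = (epsilon * delta[m]
                                   + momentum * layer["bias_update"][m])
        layer["bias"][m] += layer["bias_update"][m]

def train_pattern(hidden, output, inputs, target,
                  epsilon=EPSILON, momentum=MOMENTUM):
    """One forward and backward pass; returns the pattern's squared error."""
    hidden_act = forward(hidden, inputs)
    actual = forward(output, hidden_act)
    # Output deltas: error times the slope of the sigmoid.
    output_delta = [(target[m] - actual[m]) * actual[m] * (1 - actual[m])
                    for m in range(len(actual))]
    # Hidden deltas: output deltas passed back through the output weights
    # (computed before those weights are changed).
    hidden_delta = [
        sum(output_delta[m] * output["weight"][m][i]
            for m in range(len(actual)))
        * hidden_act[i] * (1 - hidden_act[i])
        for i in range(len(hidden_act))]
    update(output, output_delta, hidden_act, epsilon, momentum)
    update(hidden, hidden_delta, inputs, epsilon, momentum)
    return sum((t - a) ** 2 for t, a in zip(target, actual))

rng = random.Random(0)
hidden = make_layer(2, 2, rng)
output = make_layer(1, 2, rng)
for epoch in range(2000):  # each epoch presents every training pattern once
    total_error = sum(train_pattern(hidden, output, i, t) for i, t in PATTERNS)
print("total squared error in the last epoch:", total_error)
```

Whether a given run solves XOR depends on the random starting weights; with only two hidden nodes, some runs settle on a poor solution. Re-initializing and training again is the usual remedy.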
1.3.3. Example Representations
1.3.4. Example Problem
Building Neural Networks using Conx
1.4. Related Networks
1.5. Why Neural Networks?
- They can learn a function that we may not know how to program.
- When they learn, they generalize.
- They are the easiest way to show different levels of computation.
# File: NNxor.py
# import all the conx API
from pyrobot.brain.conx import *
# create the network
n = Network()
# add layers in the order they will be connected
n.addLayer('input',2) # The input layer has two nodes
n.addLayer('hidden', 2) # ADD HIDDEN
n.addLayer('output',1) # The output layer has one node
n.connect('input','hidden','output') # ADD HIDDEN
# provide training patterns (inputs and outputs)
n.setInputs([[0.0,0.0],[0.0,1.0],[1.0,0.0],[1.0,1.0]])
n.setOutputs([[0.0],[1.0],[1.0],[0.0]])
# set learning parameters
n.setEpsilon(0.5)
n.setTolerance(0.2)
n.setReportRate(1)
# learn
n.train()
How does this generalize? Run the file with python -i NNxor.py, then try an input the network was not trained on:
>>> n.propagate(input = [.5, .5])
Is that what you would expect? How does the network generalize overall? Add the following to your file:
def symbol(n):
    return ".123456789#"[int(round(n * 10))]
def test(net):
    resolution = 50.0
    for i1 in range(0, int(resolution)):
        print " ",
        for i2 in range(0, int(resolution)):
            output = net.propagate(input = [i1/resolution, i2/resolution])
            print symbol(output[0]),
        print
    print
And try this:
>>> test(n)
>>> n.initialize()
>>> n.train()
>>> test(n)
1.6. Levels of Computation
Example of Holistic Computation: A Case Study of RAAM, and the associated paper.
1.7. Points to Ponder
- Can an artificial neural network learn to do something that it wasn't explicitly trained to do?
- Does an artificial neural network just learn a set of rules?
