1. Introduction to Connectionism
This is a brief introduction to connectionist networks (also called artificial neural networks). It follows the text from Chapter 3 of Learning to See Analogies: A Connectionist Exploration. The pseudocode is written in Python, and the actual code examples use the Conx library from pyrorobotics.com.
1.1. History of Artificial Neural Networks
Highlights through the history of artificial neural networks:

1943 Warren McCulloch (a neurophysiologist) and Walter Pitts (a mathematician) explored how a neuron might work by making a model with electrical circuits.

1949 Donald Hebb wrote The Organization of Behavior, which described how neural pathways could be strengthened over time.

1956 Dartmouth Summer Research Project on Artificial Intelligence

1960s Frank Rosenblatt (a psychologist) began work on the Perceptron, a two-layer model.

1969 Marvin Minsky and Seymour Papert wrote the book Perceptrons, which criticized the lack of rigor in the field and proved that a simple problem (XOR) could not be solved with a two-layer network like the Perceptron.

1974 Paul Werbos developed the backpropagation learning method.

1975 Kunihiko Fukushima developed the Cognitron, a trainable multilayered network model for recognizing handwritten characters.

1982 John Hopfield (a physicist) presented a paper to the National Academy of Sciences showing how such models could be used in practical ways.

1986 David Rumelhart, Jay McClelland, and their research team published the Parallel Distributed Processing (PDP) books (two volumes and a handbook).

1988 Steve Grossberg and Gail Carpenter developed Adaptive Resonance Theory.

1990s-2000s Many variations have been explored; many researchers have left the field.

"...our intuitive judgment that the extension (to multilayer systems) is sterile" (Minsky and Papert, Perceptrons)
Personally, I feel that AI has split into two distinct paradigms:

rational models: based on a centralized database of facts, rules, symbols, and logic engines

emergent models: based on a decentralized network of activation, flow, and numbers
These two paradigms, in my opinion, have little to do with one another. That is, emergent models can certainly show rational, rule-like behavior, but the implementation of emergent models has nothing to do with how rational models operate.
Let's explore this issue.
1.2. Machine Learning
Given an input A, the system can respond with output B. This could be done with a simple table:
Input | Output
------|-------
  A   |   B
  C   |   D
  E   |   F
  G   |   H
  I   |   J
 ...  |  ...
Do you see any problems with this approach?
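The table idea is easy to implement, and its weakness is just as easy to see. Here is a tiny sketch (the letters are simply the placeholder pairs from the table above):

```python
# the lookup table from above: each known input maps to exactly one output
table = {"A": "B", "C": "D", "E": "F", "G": "H", "I": "J"}

print(table["A"])                # a stored input works fine
# but the table cannot generalize: an unseen input has no answer at all
print(table.get("Z", "no entry"))
```

A network, by contrast, produces some output for every input, and (with luck) a sensible one for inputs near its training patterns.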
1.3. Network Mechanics
A neural network is:

A Neural Network is an interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal neuron. The processing ability of the network is stored in the interunit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns.
Items:

Network, composed of

layers, composed of

units/nodes

an activation function

weights between layers

Training patterns, composed of

input patterns

target patterns

Testing patterns, held out to evaluate the training
These training patterns are presented to the network repeatedly (in epochs).
The network can be thought of as computing a single function, g(A) -> B. But it may be more accurate to think of the pattern A as being associated with the output B.
Let's see how this would work.
1.3.1. Forward propagation of activation
Networks are often grouped into layers. This makes the implementation easy to compute.
Proof: ANNs are Turing Machine equivalent. See Franklin and Garzon.
Proof: A three-layer network can compute anything that an N-layer network can compute; if a many-layered network can compute something, then there is a three-layer network that can compute the same thing. However, that says nothing about whether that computation can be learned!
The node:
The net input is a weighted sum of all the incoming activations plus the node's bias value:
for m in toNodes:
    netInput[m] = bias[m]
    for i in fromNodes:
        netInput[m] += (weight[m][i] * activation[i])
where weight[m][i] is the weight, or connection strength, from the ith node to the mth node, activation[i] is the activation signal of the ith node, and bias[m] is the bias value of the mth node.
After computing the net input, each node has to compute its output activation. The activation function used in backprop networks is generally:
def activationFunction(netInput):
    return 1.0 / (1.0 + exp(-netInput))

for m in toNodes:
    activation[m] = activationFunction(netInput[m])
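The two loops above can be collected into a runnable sketch of forward propagation. The layer shapes, example weights, and function names below are illustrative assumptions, not part of the Conx library:

```python
from math import exp

def activation_function(net_input):
    # logistic sigmoid: squashes any net input into the range (0, 1)
    return 1.0 / (1.0 + exp(-net_input))

def propagate(activation, weight, bias):
    # weight[m][i] is the connection strength from from-node i to to-node m;
    # each to-node sums its weighted inputs plus its bias, then squashes.
    output = []
    for m in range(len(bias)):
        net_input = bias[m]
        for i in range(len(activation)):
            net_input += weight[m][i] * activation[i]
        output.append(activation_function(net_input))
    return output

# a hypothetical layer with two from-nodes and one to-node
print(propagate([1.0, 0.0], [[0.5, -0.5]], [0.0]))
```

With a zero bias and these made-up weights, the net input is 0.5, so the output is sigmoid(0.5), about 0.62.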
However, without training, this can't do much. You already know one method of getting some weights. How would that work?
Hint: Darwin.
That would work, but it would be slow. Why? A better method is the backpropagation of error.
1.3.2. Backpropagation of Error
for m in toNodes:
    error[m] = desiredOutput[m] - actualOutput[m]
    delta[m] = error[m] * actualOutput[m] * (1 - actualOutput[m])
    for i in fromNodes:
        weightUpdate[m][i] = (EPSILON * delta[m] * actualOutput[i]) + (MOMENTUM * weightUpdate[m][i])
        weight[m][i] += weightUpdate[m][i]
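The pseudocode above handles a single layer of weights: a delta for each to-node, then a momentum-smoothed weight change. As a minimal runnable sketch, here is one sigmoid node learning logical OR, which, unlike XOR, is learnable without a hidden layer. The EPSILON and MOMENTUM values (and all the names) are arbitrary choices for this sketch, not taken from the text:

```python
import random
from math import exp

random.seed(0)
EPSILON = 0.5    # learning rate (arbitrary choice for this sketch)
MOMENTUM = 0.5   # momentum term (arbitrary choice for this sketch)

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# one output node: two incoming weights and a bias, randomly initialized
weight = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)
weight_update = [0.0, 0.0]
bias_update = 0.0

def propagate(x):
    return sigmoid(bias + weight[0] * x[0] + weight[1] * x[1])

# logical OR: solvable by a two-layer (no hidden) network
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

for epoch in range(2000):
    for x, desired in patterns:
        actual = propagate(x)
        error = desired - actual
        # delta = error scaled by the derivative of the sigmoid
        delta = error * actual * (1 - actual)
        for i in range(2):
            weight_update[i] = EPSILON * delta * x[i] + MOMENTUM * weight_update[i]
            weight[i] += weight_update[i]
        bias_update = EPSILON * delta + MOMENTUM * bias_update
        bias += bias_update

for x, desired in patterns:
    print(x, round(propagate(x), 2))
```

After training, the output should be near 0 for (0, 0) and near 1 for the other three patterns. XOR would additionally require a hidden layer and a back-propagated hidden delta.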
1.3.3. Example Representations
1.3.4. Example Problem
Building Neural Networks using Conx
1.4. Related Networks
1.5. Why Neural Networks?

They can learn a function that we may not know how to program.

When they learn, they generalize.

They are the easiest way to show different levels of computation.
# File: NNxor.py
# import all the conx API
from pyrobot.brain.conx import *
# create the network
n = Network()
# add layers in the order they will be connected
n.addLayer('input', 2)   # the input layer has two nodes
n.addLayer('hidden', 2)  # ADD HIDDEN
n.addLayer('output', 1)  # the output layer has one node
n.connect('input', 'hidden')   # ADD HIDDEN
n.connect('hidden', 'output')
# provide training patterns (inputs and outputs)
n.setInputs([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
n.setOutputs([[0.0], [1.0], [1.0], [0.0]])
# set learning parameters
n.setEpsilon(0.5)
n.setTolerance(0.2)
n.setReportRate(1)
# learn
n.train()
How does this generalize? Run it like this: python -i NNxor.py, then try:
>>> n.propagate(input = [.5, .5])
Is that what you would expect? How does the network generalize overall? Add the following to your file:
def symbol(n):
    return ".123456789#"[int(round(n * 10))]

def test(net):
    resolution = 50.0
    for i1 in range(0, int(resolution)):
        print " ",
        for i2 in range(0, int(resolution)):
            output = net.propagate(input = [i1/resolution, i2/resolution])
            print symbol(output[0]),
        print
    print
And try this:
>>> test(n)
>>> n.initialize()
>>> n.train()
>>> test(n)
1.6. Levels of Computation
Example of Holistic Computation: A Case Study of RAAM, and the associated paper.
1.7. Points to Ponder

Can an artificial neural network learn to do something that it wasn't explicitly trained to do?

Does an artificial neural network just learn a set of rules?