UserPreferences

TwoSpiralsProblem


1. Two Spirals Problem

The two-spiral problem is a tough classification and generalization problem for many types of learning systems.

The problem is composed of two interwoven spirals, one training to be a member of Class A, the other to Class B.

Here it is graphically:

http://bubo.brynmawr.edu/~dblank/images/two-spirals.png

Can you create a neural network that is able to learn the task, and generalize to points not trained on?

See also: http://www.cs.swarthmore.edu/~meeden/cs81/s06/lab02.html

Here is a test program at a given resolution:

def test(net, resolution = 30):
    for x in range(resolution):
        row = ""
        for y in range(resolution):
            input = (x/float(resolution), y/float(resolution))
            results = net.propagate(input = input)
            c = round(results[0], 1)
            if c == 1.0:
                c = "#"
            else:
                c = str(c * 10)[0]
            row += "%s" % c
        print row

Here is a short example, given that the following data is in a file called "two-spiral.dat":

from pyrobot.brain.conx import *
fp = open("two-spiral.dat", "r")
inputs = []
targets = []
for line in fp:
    data = map(float, line.split())
    inputs.append( data[:2] )
    targets.append( data[2:] )
net = Network()
net.addLayers(2, 10, 10, 1)
net.setInputs( inputs )
net.setTargets( targets)
net.tolerance = 0.4
net.reportRate = 1
cont = 0
while not net.complete:
    net.train(100, cont=cont)
    test(net)
    cont = 1

Here are the 194 data points:

 1.00000  0.50000 1.0
 0.00000  0.50000 0.0
 0.98568  0.60466 1.0
 0.01432  0.39534 0.0
 0.95306  0.70330 1.0
 0.04694  0.29670 0.0
 0.90374  0.79225 1.0
 0.09626  0.20775 0.0
 0.83995  0.86829 1.0
 0.16005  0.13171 0.0
 0.76443  0.92873 1.0
 0.23557  0.07127 0.0
 0.68030  0.97156 1.0
 0.31970  0.02844 0.0
 0.59098  0.99550 1.0
 0.40902  0.00450 0.0
 0.50000  1.00000 1.0
 0.50000  0.00000 0.0
 0.41089  0.98528 1.0
 0.58911  0.01472 0.0
 0.32705  0.95232 1.0
 0.67295  0.04768 0.0
 0.25159  0.90274 1.0
 0.74841  0.09726 0.0
 0.18724  0.83882 1.0
 0.81276  0.16118 0.0
 0.13623  0.76331 1.0
 0.86377  0.23669 0.0
 0.10024  0.67938 1.0
 0.89976  0.32062 0.0
 0.08034  0.59043 1.0
 0.91966  0.40957 0.0
 0.07692  0.50000 1.0
 0.92308  0.50000 0.0
 0.08977  0.41160 1.0
 0.91023  0.58840 0.0
 0.11801  0.32859 1.0
 0.88199  0.67141 0.0
 0.16022  0.25404 1.0
 0.83978  0.74596 0.0
 0.21444  0.19064 1.0
 0.78556  0.80936 0.0
 0.27831  0.14056 1.0
 0.72169  0.85944 0.0
 0.34914  0.10542 1.0
 0.65086  0.89458 0.0
 0.42403  0.08623 1.0
 0.57597  0.91377 0.0
 0.50000  0.08333 1.0
 0.50000  0.91667 0.0
 0.57410  0.09645 1.0
 0.42590  0.90355 0.0
 0.64351  0.12468 1.0
 0.35649  0.87532 0.0
 0.70567  0.16655 1.0
 0.29433  0.83345 0.0
 0.75837  0.22011 1.0
 0.24163  0.77989 0.0
 0.79981  0.28298 1.0
 0.20019  0.71702 0.0
 0.82869  0.35251 1.0
 0.17131  0.64749 0.0
 0.84422  0.42583 1.0
 0.15578  0.57417 0.0
 0.84615  0.50001 1.0
 0.15385  0.49999 0.0
 0.83479  0.57215 1.0
 0.16521  0.42785 0.0
 0.81092  0.63953 1.0
 0.18908  0.36047 0.0
 0.77582  0.69966 1.0
 0.22418  0.30034 0.0
 0.73117  0.75044 1.0
 0.26883  0.24956 0.0
 0.67895  0.79015 1.0
 0.32105  0.20985 0.0
 0.62142  0.81759 1.0
 0.37858  0.18241 0.0
 0.56096  0.83204 1.0
 0.43904  0.16796 0.0
 0.49999  0.83333 1.0
 0.50001  0.16667 0.0
 0.44090  0.82182 1.0
 0.55910  0.17818 0.0
 0.38593  0.79833 1.0
 0.61407  0.20167 0.0
 0.33706  0.76416 1.0
 0.66294  0.23584 0.0
 0.29602  0.72097 1.0
 0.70398  0.27903 0.0
 0.26415  0.67072 1.0
 0.73585  0.32928 0.0
 0.24238  0.61560 1.0
 0.75762  0.38440 0.0
 0.23123  0.55791 1.0
 0.76877  0.44209 0.0
 0.23077  0.49999 1.0
 0.76923  0.50001 0.0
 0.24066  0.44411 1.0
 0.75934  0.55589 0.0
 0.26015  0.39236 1.0
 0.73985  0.60764 0.0
 0.28814  0.34663 1.0
 0.71186  0.65337 0.0
 0.32323  0.30849 1.0
 0.67677  0.69151 0.0
 0.36378  0.27914 1.0
 0.63622  0.72086 0.0
 0.40801  0.25940 1.0
 0.59199  0.74060 0.0
 0.45405  0.24969 1.0
 0.54595  0.75031 0.0
 0.50001  0.25000 1.0
 0.49999  0.75000 0.0
 0.54409  0.25991 1.0
 0.45591  0.74009 0.0
 0.58464  0.27866 1.0
 0.41536  0.72134 0.0
 0.62020  0.30513 1.0
 0.37980  0.69487 0.0
 0.64958  0.33796 1.0
 0.35042  0.66204 0.0
 0.67189  0.37558 1.0
 0.32811  0.62442 0.0
 0.68655  0.41629 1.0
 0.31345  0.58371 0.0
 0.69333  0.45835 1.0
 0.30667  0.54165 0.0
 0.69231  0.50001 1.0
 0.30769  0.49999 0.0
 0.68390  0.53963 1.0
 0.31610  0.46037 0.0
 0.66878  0.57574 1.0
 0.33122  0.42426 0.0
 0.64790  0.60707 1.0
 0.35210  0.39293 0.0
 0.62238  0.63259 1.0
 0.37762  0.36741 0.0
 0.59348  0.65157 1.0
 0.40652  0.34843 0.0
 0.56255  0.66361 1.0
 0.43745  0.33639 0.0
 0.53095  0.66857 1.0
 0.46905  0.33143 0.0
 0.49999  0.66667 1.0
 0.50001  0.33333 0.0
 0.47092  0.65835 1.0
 0.52908  0.34165 0.0
 0.44480  0.64435 1.0
 0.55520  0.35565 0.0
 0.42254  0.62558 1.0
 0.57746  0.37442 0.0
 0.40481  0.60312 1.0
 0.59519  0.39688 0.0
 0.39207  0.57812 1.0
 0.60793  0.42188 0.0
 0.38451  0.55182 1.0
 0.61549  0.44818 0.0
 0.38212  0.52540 1.0
 0.61788  0.47460 0.0
 0.38462  0.50000 1.0
 0.61538  0.50000 0.0
 0.39155  0.47663 1.0
 0.60845  0.52337 0.0
 0.40228  0.45615 1.0
 0.59772  0.54385 0.0
 0.41606  0.43923 1.0
 0.58394  0.56077 0.0
 0.43201  0.42634 1.0
 0.56799  0.57366 0.0
 0.44925  0.41772 1.0
 0.55075  0.58228 0.0
 0.46689  0.41338 1.0
 0.53311  0.58662 0.0
 0.48406  0.41316 1.0
 0.51594  0.58684 0.0
 0.50000  0.41667 1.0
 0.50000  0.58333 0.0
 0.51407  0.42338 1.0
 0.48593  0.57662 0.0
 0.52576  0.43263 1.0
 0.47424  0.56737 0.0
 0.53473  0.44370 1.0
 0.46527  0.55630 0.0
 0.54080  0.45581 1.0
 0.45920  0.54419 0.0
 0.54397  0.46817 1.0
 0.45603  0.53183 0.0
 0.54442  0.48007 1.0
 0.45558  0.51993 0.0
 0.54244  0.49086 1.0
 0.45756  0.50914 0.0
 0.53846  0.50000 1.0
 0.46154  0.50000 0.0

1.1. Additional Information from Neuro Bench at CMU

NAME: Two Spirals

SUMMARY: The task is to learn to discriminate between two sets of training points which lie on two distinct spirals in the x-y plane. These spirals coil three times around the origin and around one another. This appears to be a very difficult task for back-propagation networks and their relatives. Problems like this one, whose inputs are points on the 2-D plane, are interesting because we can display the 2-D "receptive field" of any unit in the network.

SOURCE: Posted to "connectionists" mailing list by Alexis Wieland of MITRE Corporation.

MAINTAINER: neural-bench@cs.cmu.edu

PROBLEM DESCRIPTION:

The accompanying file, two-spirals.c, is a program, based on the code supplied by Alexis Wieland, that generates a data set in the CMU Neural Network Benchmark format. Usage of this program is described in the comments at the top of the program.

METHODOLOGY:

The task is to train on the 194 I/O pairs until the learning system can produce the correct output for all of the inputs. The time required for learning is then recorded. For back-propagation systems, the time is typically recorded in "epochs". Each epoch includes one training presentation for each of the 194 points in the training set. (See the GLOSSARY file.)

The choice of output values for the two sets is up to the experimenter. For uniformity in reporting results, we suggest that the 40-20-40 cirterion be used: an output is considered to be a logical zero if it is in the lower 40% of the output range, a one if it is in the upper 40%, and indeterminate (and therefore incorrect) if it is in the middle 20% of the range.

This task can obviously be solved quickly by methods that essentially do table-lookup, recording the desired result for each point; on the other hand, the problem appears hard for back-propagation networks and is impossible for a linear separator. The interesting question is how learning time varies with the algorithm chosen and the total number of weights (or equivalent) in the network.

VARIATIONS:

It is possible to vary the density of points along the spiral by incrementing the angle by a smaller amount, while still making three complete revolutions in each set. Lang and Witbrock [1] report some results using the 4X version of the problem, with 770 points in all.

RESULTS:

Lang and Witbrock [1] report results obtained on this problem using a 2-5-5-5-1 back-propagation network with short-cut connections: each unit is connected to every unit in all earlier layers, not just to units in the previous layer. Counting the unit thresholds, there are a total of 138 trainable weights in the network.

Three trials were run using standard back-propagation on this network, with a uniform distribution of random initial weights in the range -0.1 to +0.1. Learning rate was initially 0.001 and momentum was initially 0.5; these values were "gradually" increased over 1000 epochs to final values of 0.002 and 0.95. The learning times on the three runs were 18,900 epochs, 22,300 epochs, and 19,000 epochs. (Average = 20,000 epochs.)

Using the same network and starting values, but with a nonlinear "cross entropy" error function, learning times were 16,200 epochs, 8600 epochs, and 7600 epochs, respectively. (Average = 11,000 epochs.)

With the same network and starting values, but using Fahlman's quickprop algorithm and hyperbolic arctangent error [2], the times were 4500 epochs, 12,300 epochs, and 6800 epochs. (Average = 7900 epochs) For this test, epsilon was 0.002, mu was 1.5, and the weight weight decay factor was 0.001.

Stephen A. Frostrom of SAIC reports the following unpublished result, obtained by Dennis Walker of SAIC: Standard back-propagation network, 2-20-10-1 with no short-cut connections (281 weights in all). Learning rate 0.1, momentum 0.7, unit activation running from -0.5 to 0.5, error was set to 0.0 if the output was within 0.15 of target. The task was learned (to within 0.3 tolerance) in 13,900 epochs. With a tighter 2-16-8-1 network, the task was learned, but it required 300,000 epochs.

Fahlman and Lebiere [3] report that the Cascade-Correlation algorithm is able to solve this problem in 1700 epochs (average over 100 trials, all successful). This algorithm builds up a network as it learns. In these 100 trials, the networks constructed ranged from 12 to 19 hidden units, with and average of 15.2 and a median of 15. Thus, the networks are roughly the same size as the 15-hidden-unit network investigated by Lang and Witbrock.

The figure of 1700 epochs for Cascade-Correlation is somewhat misleading, since the Cascade-Correlation epochs require less computation than backprop or quickprop epochs. Fahlman proposes that instead of counting epochs of very different kinds, we measure these algorithms in terms of connection crossings: the number of times a value or error is multiplied by a weight and added to an accumulated value. By this measure, Cascade-Correlation requires about 19 million crossings, while the 8000 quickprop epochs of Lang and Witbrock require about 438 million -- larger by a factor of 23.

Russell Leighton of MITRE reports (unpublished) that with a 2=5=5=5=1 network and more or less standard backprop, he got times of 2680, 4252, and 5097 epochs on three successive runs -- an average of about 4000 epochs. However, he noted a very large variance in other test runs on this problem, with a few runs getting stuck with one or two points misclassified. He used sigmoid units with range -0.5 to +0.5, learning rate of 0.01, momentum of 0.95, hyperbolic arctan error, and added uniform noise in the range of [0.0, 0.05) to the sigmoid-prime calculation to prevent stuck units. Weights were not updated if the absolute value of the output error was less than 0.001. Weights were updated after every epoch and the trial was deemed successful if the output was within 0.4 of the target value for all patterns.

Lang and Witbrock [1] report that the variation with 4X density was learned in 64,000 epochs using the same standard back-propagation setup that they used for the single density case.

Fahlman (unpublished) reports the following results on eight trials (all successful) of the 4X problem using the Cascade-Correlation algorithm: The average time required was 2262 epochs. The nets constructed ranged from 13 to 18 hidden units, with an average of 15.5 and a median of 15. The training parameters used were the same as for the 1X version of this same problem.

REFERENCES:

  1. Kevin J. Lang and Michael J, Witbrock, "Learning to Tell Two Spirals Apart", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.

  2. Scott E. Fahlman, "Faster-Learning Variations on Back-Propagation: An Empirical Study", in Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, 1988.

  3. Scott E. Fahlman and Christian Lebiere, "The Cascade-Correlation Learning Architecture", in Touretzky (ed.) Advances in Neural Information Processing Systems 2, Morgan-Kaufmann, 1990. A slightly more detailed version of the paper is available as CMU tech report CMU-CS-90-100.

COMMENTS:

Code for both the Quickprop and Cascade-Correlation algorithms is available in the code directory accompanying the benchmark collection.

1.2. Conx has Cascade Corrlation!

See Cascade Correlation Network

1.3. Running Their Code

1.3.1. On Robot Data

See http://www.cs.swarthmore.edu/~meeden/cs81/s06/lab03.html

1.3.2. Two Spirals

Their code is in our CVS repository:

cvs co cascor
cd cascor
make cascor
make cascor-orig
make two-spirals
./two-spirals > two.data
./cascor-orig two.data
python plot.py two.data test.results

http://cvs.cs.brynmawr.edu/cgi-bin/viewcvs.cgi/*checkout*/cascor/TwoSpiralsResults.png?rev=1.1&content-type=image/png&name=.png