1. RAVQ
A RAVQ is an alternative to SOMs. A RAVQ can dynamically create new model vectors to describe novel stable input sequences.
Check the Pyro CVS Repository for the most recent version of ravq.py. http://bubo.brynmawr.edu/cgi-bin/viewcvs.cgi/pyrobot/brain/ravq.py.
1.1. RAVQ Buffer Types
Here is a graphical version of the RAVQ architectures described below and in the governor implementation section:
* Figure a shows the RAVQ with input histories associated with each model vector. Every time a model vector wins, the current input vector is stored in that model vector's history buffer. Figure b shows the RAVQ with a critical input history. The critical input history stores the inputs at the time of a change in the model vector winner. We consider these critical events in the input space, and would work well as training vectors for a network. Figure c shows the critical segment history, an adaptation of the critical input history to capture entire segments of the input sequence surrounding a change in the model vector.
The only version implemented in the RAVQ code is that shown in Figure a. The others must be implemented externally to the RAVQ.
1.2. RAVQ Description
* The RAVQ has three parameters. The first is a window size. This determines how many previous inputs will be used to generate the moving average. The second is an epsilon value. If the moving average is not within epsilon (by a version of the Euclidean metric) of the set of inputs currently in the buffer that moving average is disqualified from being a model vector. The moving average does not reflect a 'typical' input given the window size. This indicates that the inputs are volatile. The third is a delta value. If the moving average is not at least delta from all existing model vectors, it is disqualified from being a model vector. The moving average in this case does not represent something 'new'. In an ARAVQ, there is an additional parameter, alpha, which specifies the learning rate. The ARAVQ algorithm updates existing winning model vectors to better match current inputs provided that the winning model vectors are close enough to the current moving average.
* I've implemented a modified version of the ARAVQ algorithm. This ExperimentalRAVQ determines model vectors using the moving average described above, but determines winning model vectors using the most recent input. This may be more responsive in governing situations where the current input is being stored for later training.
* The ARAVQ and RAVQ algorithms don't specify a neighborhood that can be used for updating purposes as in Kohonen's SOM algorithm. Distance relationships among the model vectors can be calculated. This implementation currently supports a distance map showing the distance between any two model vectors. As with any set of highly dimensional data points, certain techniques can be used to capture the most important correlative features of the data.
1.3. Sample Brain
A sample RAVQ brain is available in the Pyro CVS Repository. http://bubo.brynmawr.edu/cgi-bin/viewcvs.cgi/pyrobot/plugins/brains/RAVQ.py.
1.4. Experimental Setup: Repeating Nolfi and Tani
* Here is the most current version of the experiment I am running. I am attempting to duplicate Nolfi and Tani's results using the techniques of the RAVQ as described in Linaker and Niklasson within the Pyro framework. The current implementation uses a RAVQ to classify both the scaled input vectors and the trained hidden layer activations of a prediction (motor values only) network. The robot is controlled by a prebuilt wall following algorithm. It is hoped that the trained network will be able to control the robot as well.
The current brain (be sure to change the paths!):
| 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
|
# imported modules
from pyrobot.brain import Brain
from pyrobot.brain.conx import *
from pyrobot.gui.plot.scatter import *
from pyrobot.gui.plot.hinton import *
import pyrobot.brain.ravq
import os
import time
import sys
# log file directories
rootDirectory = "/home/stober/ravq/experiments/"
currentExperiment = "Nolfi_RAVQ_Long/"
currentBrain = "/home/stober/ravq/Nolfi_Ravq.py"
class RAVQBrain(Brain):
"""A brain that uses a RAVQ to classify hidden layer activations in a prediction network."""
def setup(self):
# robot parameters
self.robot.range.units = 'ROBOTS'
self.maxvalue = self.robot.range.getMaxvalue()
# status variables
self.verbosity = 1
self.direction = 1
self.blockedFront = 0
self.wasStalled = 0
self.counter = 0
self.previous = [0.0, 0.0]
# tweakable params
self.timer = 50000 # turns on hiddenRavq
self.stop = 100000
self.sleepTime = 0.1
self.backupTimer = 10000
# create network
self.net = SRN()
self.net.setSequenceType("ordered-continuous")
self.inSize = self.robot.range.count + 2
self.net.addLayers(self.inSize, self.inSize/2, 2)
# defaults - but here explicit
self.net.setBatch(0)
self.net.setInteractive(0)
self.net.setVerbosity(0)
self.net.setInitContext(0)
# initialize network
self.net.initialize()
# learning parameters
self.net.setEpsilon(0.2)
self.net.setMomentum(0.1)
self.net.setTolerance(0.05)
# set learning
self.net.setLearning(1)
# hidden ravq (tweakable parameters)
self.hiddenRavq = pyro.brain.ravq.ARAVQ(10, .2, .5, .03)
self.hiddenRavq.setHistory(0)
self.hiddenRavq.setAddModels(0)
self.hiddenRavq.setLearning(0)
# input ravq (tweakable parameters)
self.inputRavq = pyro.brain.ravq.ARAVQ(10, .2, .6, .03)
self.inputRavq.setHistory(0)
self.inputRavq.setAddModels(1)
self.inputRavq.setLearning(1)
# file IO
self.path = rootDirectory + currentExperiment
if(os.path.isfile(self.path + "exp.lock")):
raise "Lock error!"
else:
try:
os.mkdir(self.path)
except:
pass
lock = open(self.path + "exp.lock", "w")
lock.write("This file locks the experiment directory to prevent overwriting experimental data.")
lock.close()
# archive brain
os.system("cp " + currentBrain + " " + self.path + "archive.py")
self.netInfo = open(self.path + 'nn.dat', 'w')
self.hiddenRavq.openLog(self.path + 'hidden_history.log')
self.hiddenRavqInfo = open(self.path + 'hidden_ravq.dat', 'w')
self.inputRavq.openLog(self.path + 'input_history.log')
self.inputRavqInfo = open(self.path + 'input_ravq.dat', 'w')
self.repositionLog = open(self.path + 'reposition.dat', 'w')
# gui displays
self.plot1 = Scatter()
self.plot1.setTitle('Target: Translate x Rotate')
self.plot2 = Scatter()
self.plot2.setTitle('Activation: Translate x Rotate')
self.hidd = Hinton(self.inSize/2, title = 'Hidden Layer')
self.inputs = Hinton(self.inSize, title = 'Input Layer')
def destroy(self):
self.plot1.destroy()
self.plot2.destroy()
self.hidd.destroy()
self.inputs.destroy()
self.netInfo.close()
self.hiddenRavq.closeLog()
self.hiddenRavqInfo.close()
self.inputRavq.closeLog()
self.inputRavqInfo.close()
self.repositionLog.close()
del self.net
del self.hiddenRavq
del self.inputRavq
def scaleSensors(self, val):
"""From Robots (or anything) to [0, 1]"""
return (val / self.maxvalue)
def scaleMotors(self, val):
"""[-1, 1] to [0, 1]"""
return (val + 1) / 2.0
def kick(self):
"""How to get unstuck."""
self.wasStalled += 1
self.repositionLog.write("STALLED " + str(self.counter) + "\n")
self.move(0.1, 0.0)
time.sleep(.2)
if self.robot.stall:
self.move(-0.1, 0.0)
time.sleep(.2)
if self.robot.stall and self.wasStalled > 10:
self.repositionLog.write("Cannot fix stall at " + str(self.counter) + " " + \
str(self.robot.simulation[0].getPose("Pioneer")')) +
" so repositioning to (796, 2211, 92).\n")
# we'll call this the hand of god
self.set('robot/truth/pose', (796, 2211, 92))
else:
self.wasStalled = 0
def controller(self):
# tweakable parameters
minRange = 0.85
maxRange = 1.25
#rand = random.random() * 0.5
rand = 0.25
# important sensors
minFront = min([s.value for s in self.robot.range["front"]])
minLeft = min([s.value for s in self.robot.range["front-left"]])
minRight = min([s.value for s in self.robot.range["front-right"]])
left = min([s.value for s in self.robot.range["left"]])
right = min([s.value for s in self.robot.range["right"]])
# the decision algorithm
if minFront < minRange:
if not self.blockedFront:
if right < left:
self.direction = 1
else:
self.direction = -1
self.blockedFront = 1
return [0, self.direction * rand]
elif minLeft < minRange:
if self.blockedFront:
return [0, self.direction * rand]
else:
return [rand, -rand]
elif minLeft > maxRange:
if self.blockedFront:
return [0, self.direction * rand]
else:
return [rand, rand]
elif minRight < minRange:
if self.blockedFront:
return [0, self.direction * rand]
else:
return [rand, rand]
else:
self.blockedFront = 0
return [0.5, 0.0]
def backup(self, tag):
self.net.saveWeightsToFile(self.path + "predict" + str(tag) + ".wts")
self.inputRavq.saveRAVQToFile(self.path + "input_ravq" + str(tag) + ".pck")
self.hiddenRavq.saveRAVQToFile(self.path + "hidden_ravq" + str(tag) + ".pck")
def step(self):
"""
1. Determine new motor values.
2. Determine inputs (old sensors + old motors).
3. Step through with new motor values as targets.
4. Move robot.
5. New motor values become old motor values.
"""
# display count
if self.verbosity > 0: print self.counter
# switch between training network and training ravq
if self.counter == self.timer:
self.net.setLearning(0)
self.hiddenRavq.setAddModels(1)
self.hiddenRavq.setLearning(1)
self.net.saveNetworkToFile(self.path + "predict")
self.net.saveWeightsToFile(self.path + "predict.wts")
elif self.counter == self.stop:
self.hiddenRavq.logHistory(0, str(self.robot.simulation[0],getPose('Pioneer')))
self.hiddenRavqInfo.write(str(self.hiddenRavq))
self.hiddenRavq.saveRAVQToFile(self.path + "hidden_ravq.pck")
self.inputRavq.logHistory(0, str(self.robot.simulation[0].getPose('Pioneer')))
self.inputRavqInfo.write(str(self.inputRavq))
self.inputRavq.saveRAVQToFile(self.path + "input_ravq.pck")
self.pleaseStop()
self.destroy()
else:
# train network or ravq
motors = self.controller()
inputs = [self.scaleSensors(s.value) for s in self.robot.range["all"] + map(self.scaleMotors, self.previous)
(error, correct, total) = self.net.step( input = inputs, \
output = map(self.scaleMotors, motors))
self.inputRavq.input(inputs)
# hidden ravq
if self.counter > self.timer:
self.hiddenRavq.input(self.net['hidden'].activation)
self.hiddenRavq.logHistory(0, \
str(self.robot.simulation[0].getPose('Pioneer')))
# backup sometimes
if self.counter % self.backupTimer == 0:
self.backup(self.counter)
# other IO
self.inputRavq.logHistory(0, \
str(self.robot.simulation[0].getPose('Pioneer')))
self.netInfo.write(str(self.counter) + " " + str(error) + "\n")
self.plot1.addPoint(self.scaleMotors(motors[0]), self.scaleMotors(motors[1]))
self.plot2.addPoint(self.net['output'].activation[0], \
self.net['output'].activation[1])
self.hidd.update(self.net['hidden'].activation)
self.inputs.update(self.net['input'].activation)
if self.verbosity > 0: print (error, correct, total)
# move robot
self.move(motors[0], motors[1])
self.previous = motors
# kick if things get bad
if self.robot.stall:
self.kick()
# sleep and increment counter
time.sleep(self.sleepTime)
self.counter += 1
def INIT(engine):
return RAVQBrain('RAVQBrain', engine) |
|
The current world file:
# Desc: 1 robot with player, laser, sonar
and gps # CVS: $Id: nolfi2.world,v 1.1 2003/05/15 21:45:55 yeelin Exp $
# the resolution of Stage's raytrace model in meters
resolution 0.02
# GUI settings
gui ( size [ 502.000 506.000 ] origin [5.018 4.950 0] scale 0.021 #
the size of each bitmap pixel in meters )
# load a bitmapped environment from a file
bitmap ( file "nolfi2.pnm" #resolution 0.1 #resolution .044 resolution
0.07 )
include "/usr/local/stage/worlds/usc_pioneer.inc"
# create a robot, setting its start position and Player port, and
# equipping it with a laser range scanner # #position #( port 6665
# pose [1.0 1.0 20] #laser() #)
usc_pioneer ( color "green" name "robot" port 6665 pose [.796 2.211
92] #(1101, 1113, 48) truth() )
# coordinates are defined from the center of the box box ( size [0.75
#0.75] color "blue" pose [2.5 3.5 0.000] sonar_return "visible" ) box
#( size [0.75 0.75] color "red" pose [2.5 1.5 0.000] sonar_return
#"visible" )
The nolfi2.pnm file can be found here: http://www.cs.swarthmore.edu/~stober/images/nolfi2.pnm
1.5. Results
* The first level is certainly a success in this limited context. The network learned the primitive behavior, and the hidden RAVQ classified the hidden layer activations into roughly three categories, corner, wall, and corridor. By changing the delta value I was able to change the number of model vectors. Three however, seemed appropriate.
* I ran an experiment to see how well the model vectors captured the basic behavior of the robot. The setup was as follows:
1. Propagate level one network using sensors + previous motors. 2. Classify hidden layer using RAVQ. 3. Copy RAVQ winner into hidden layer. 4. Propagate from the hidden layer. 5. Use the outputs of this second propagation as motor values.
* I trained an L2 network that would predict the sequence of RAVQ winners asynchronously. This asynchronous learning is an extension of the governor idea and will be discussed separately below.
* L2 prediction for anything other than the asynchronous case proved to be too difficult.
* Final Conclusion: The RAVQ proved comparable to the segmentation network described in Nolfi and Tani and the SOM used in Andrew and Yeelin's experiment.
1.6. Related Topics
1. Adaptive Resonance Theory 2. Self Organizing Maps
-
DevelopmentalRoboticsSummer2003 . . . . 1 match
2. On Demand Dynamic Balancing
2.1. Problem Description
* Networks being trained on robot control tasks often see large contiguous segments of nearly identical training data. This leads to catastrophic forgetting of previously learned behaviors. Collecting data from a robot then balancing the data by hand before training a network off-line solves the problem by excising large contiguous segments from the training set. Our goal is to automate this process in such a way as to make it possible to train online with a data set balanced as in off-line training.
2.2. Requirements
* Regulate learning according to the importance of the input. Isolate and repeat rare events so that they are weighted against other, more common inputs. Eliminate large contiguous blocks of data. Break data into smaller contiguous blocks to avoid over training. Implicit is the use of some vector quantization/classification device.
2.3. Governor Overview
* The following is a graphical representation of the Governor Architecture. The vector quantizer classifies the input sequence in hopes of identifying the critical input target pairs that best facilitate network learning. We have focused on using the RAVQ exclusively, but other vector quantization methods (such as SOM or LVQ) may provide similar results.
2.4. Governor Implementations
1. Use RAVQ to classify inputs + targets. When the classification changes, put the current input target pair into a buffer. At the same time, have the network train through each input target pair currently in the buffer. This implementation is sensitive to how well changes in the RAVQ reflect changes in the input stream.
2. Use RAVQ to classify inputs + targets. For each new input + target, have the RAVQ store that pair in a history associated with the winning model vector. At the same time, have the network iterate through all the model vector histories in some predefined order. This implementation is sensitive how well to the precision of the RAVQ model vectors in capturing essential input patterns.
3. Use the RAVQ to classify inputs + targets and then train on the model vectors. This method would require model vectors that are the best possible representatives of specific input sequences. We would also need to rely on network generalization quite a bit. The RAVQ would probably need to by adaptive.
4. Use the RAVQ to classify inputs + targets then train on a buffer filled with episodes surrounding the important changes. This should be compatible with the SRN where the SRN needs to retain local information.
2.5. Some Thoughts
* The original motivation was to create a system that would balance data on the fly. Balancing data, however, destroys time series information that may be important when using an SRN. This is the reason behind implementation 4 above. Matt reports that the results seem to be positive, or at least comparable to the feed forward results/success on a general avoidance behavior in a complicated world. When an SRN only needs local information approach 4, while crude, may be adequate.
* Balancing the data seems to require knowledge of the teacher, and of critical situations where the teacher changes. An automatic governor will have no a priori knowledge of the teacher, and so cannot balance with respect to critical events (as determined by an outside observer). What we as experimenters have control of are the parameters which abstractly specify important subsequences. The delta value is particularly important. Using a different base metric may also serve to capture particular kinds of sequences. Masking, or artificially weighting certain inputs, is another method by which the experimenter can attempt to influence the RAVQs partitioning of the input stream. This kind of tweaking effectively serves as a priori knowledge for the RAVQ.
2.6. Exploratory Experimental Results
* Implementations 1 and 2 have shown improved behavior over a network trained without the governor, but the training size is 200,000 steps, which is still quite large. Any success shows remarkable variation when trials are rerun with even slightly different parameter settings, indicating that the current solution may not yet not be robust enough for general use. The large training size and unstabability between trials are most likely due to the complexity of the environment and the teacher used.
* Changing the task and the environment do demonstrate that the governor provides some advantage over an un-governed network. Lisa tested governing on a simple wall follow task in a less complicated world (the Nolfi and Tani world). The governed network did quantifiably better. Andrew is running a series of head to head experiments between an ungoverned network and a governed network which will verify this claim on a real robot.
* Lisa and Andrew are using implementation 1. I've run a series of experiments using the wall follow task in the Nolfi and Tani world and found that implementation 2 does equally well. I suspect that on certain tasks implementation 2 would perform better since rare but functionally important inputs may enter the implementation 1 buffer infrequently and therefore limit the network's ability to learn the full functionality required during certain infrequent events. In implementation 2, such rare events would stay in their associated model vector histories and would be trained equally with other, more common input vectors. This assumes that the RAVQ allocates new model vectors for these rare events. In short, implementation 2 amplifies rare inputs more than implementation 1.
* When trained on the wall follower task in the familiar Nolfi and Tani world (10,000 steps), the model vector buffer method yielded a controller network which successfully navigated the lip separating the two rooms. The critical buffer method stalled on the turn (overturning). I believe that the model vector buffer method proved superior in this instance. More experiments need to be done in a wide range of environments and under a wide range of primitive controllers.
2.7. An Experimental Brain
| 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
| # imported modules
from pyrobot.brain import Brain
from pyrobot.brain.VisConx.VisRobotConx import *
import pyrobot.brain.ravq
import os
import time
import random
# log file directories
rootDirectory = "/local/"
currentExperiment = "data/"
currentBrain = "/local/GovernorBrain.py"
class GovernorBrain(Brain):
"""A brain that uses a RAVQ to govern network learning."""
def setup(self):
# for use with player/stage
#self.startService('truth')
#self.startService('bumper')
# robot parameters
self.robot.range.units = 'ROBOTS'
self.maxvalue = self.robot.range.getMaxvalue()
self.maxvalue += 0.075
# status variables
self.verbosity = 1
self.direction = 1
self.blockedFront = 0
self.wasStalled = 0
self.counter = 0
self.previous = [0.0, 0.0]
# tweakable params
self.sleepTime = 0.10
self.stopTime = 10000
# choose the governor method
self.method = 0
# create network
self.net = VisRobotNetwork() # could use VisRobotSRN()
self.inSize = self.robot.range.count
self.net.addLayers(self.inSize, self.inSize/2, 2)
# defaults - but here explicit
self.net.setBatch(0)
self.net.setInteractive(0)
self.net.setVerbosity(0)
# initialize network
self.net.initialize()
# learning parameters
self.net.setEpsilon(0.2)
self.net.setMomentum(0.9)
self.net.setTolerance(0.05)
# set learning
self.net.setLearning(1)
# input ravq (tweakable parameters)
self.ravq = pyro.brain.ravq.ExperimentalRAVQ(5, .3, .2, .02)
self.ravq.setHistory(1)
self.ravq.setAddModels(1)
self.ravq.setLearning(1)
self.ravq.setMask([1] * self.inSize + [self.inSize / 2] * 2)
# buffer for governor
self.buffer = []
self.bufferSize = 100
self.bufferIndex = 0
# file IO
self.path = rootDirectory + currentExperiment
if(os.path.isfile(self.path + "exp.lock")):
raise "Lock error!"
else:
try:
os.mkdir(self.path)
except:
pass
lock = open(self.path + "exp.lock", "w")
lock.write("This file locks the experiment directory to" + \
"prevent overwriting experimental data.")
lock.close()
# archive brain for future reference
os.system("cp " + currentBrain + " " + self.path + "archive.py")
self.netInfo = open(self.path + 'nn.dat', 'w')
self.ravq.openLog(self.path + 'ravq.log')
self.ravqInfo = open(self.path + 'ravq.dat', 'w')
self.repositionLog = open(self.path + 'reposition.dat','w')
self.data = open(self.path + 'input_target.dat', 'w')
self.balancedData = open(self.path + 'balanced.dat', 'w')
def destroy(self):
self.netInfo.close()
self.ravq.closeLog()
self.ravqInfo.close()
self.repositionLog.close()
self.data.close()
self.balancedData.close()
self.net.destroy()
def saveListToFile(self, ls, file):
for i in range(len(ls)):
file.write(str(ls[i]) + " ")
file.write("\n")
def scaleSensors(self, val):
"""From Robots (or anything) to [0, 1]"""
return (val / self.maxvalue)
def scaleMotors(self, val):
"""[-1, 1] to [0, 1]"""
return (val + 1) / 2.0
def kick(self):
"""How to get unstuck."""
self.repositionLog.write("STALLED " + str(self.counter) + "\n")
self.move(0.5 * random.random(), 0.0)
time.sleep(1)
self.update()
if self.get('robot/stall'):
self.move(-0.5 * random.random(), 0.0)
time.sleep(1)
self.update()
if self.get('robot/stall'):
self.move(0.0, 0.5 * random.random())
time.sleep(1)
self.update()
if self.get('robot/stall'):
self.move(0.0, -0.5 * random.random())
time.sleep(1)
self.update()
# this is not the wall follower!
def avoidObstacles(self):
"""
Determines next action, but doesn't execute it.
Returns the translate and rotate values.
When front is blocked, it picks to turn away from the
direction with the minimum reading and maintains that
turn until front is clear.
"""
d = 0.7
ds = 0.3
turn = random.random()
minFront = min(self.get('robot/range/front/value'))
minLeft = min(self.get('robot/range/front-left/value'))
minRight = min(self.get('robot/range/front-right/value'))
sideLeft = self.get('robot/range/0/value')
sideRight = self.get('robot/range/7/value')
if minFront < d:
if not self.blockedFront:
if minRight < minLeft:
self.direction = 1
else:
self.direction = -1
self.blockedFront = 1
return [0, self.direction * turn]
elif minLeft < d:
if self.blockedFront:
return [0, self.direction * turn]
else:
return [0,-turn]
elif minRight < d:
if self.blockedFront:
return [0, self.direction * turn]
else:
return [0,turn]
else:
if sideLeft < ds:
return [0,-turn]
elif sideRight < ds:
return [0,turn]
else:
self.blockedFront = 0
return [.2,0]
def wallFollower(self):
# tweakable parameters
frontRange = 0.7
minRange = .5
maxRange = .7
amount = 0.1
# important sensors
minFront = min(self.get('robot/range/front/value'))
minLeft = min(self.get('robot/range/front-left/value'))
minRight = min(self.get('robotrange/front-right/value'))
left = min(self.get('robot/range/left/value'))
right = min(self.get('robot/range/right/value'))
# the decision algorithm
if minFront < frontRange:
if not self.blockedFront:
self.direction = -1
self.blockedFront = 1
return [0, self.direction * amount]
else:
self.blockedFront = 0
if minLeft < minRange:
if self.blockedFront:
return [0, self.direction * amount]
else:
return [amount/2.0, -amount]
elif minLeft > maxRange:
if self.blockedFront:
return [0, self.direction * amount]
else:
return [amount/2.0, amount]
elif minRight < minRange:
if self.blockedFront:
return [0, self.direction * amount]
else:
return [amount, amount]
else:
self.blockedFront = 0
return [0.1, 0.0]
def step(self):
# display count
if self.verbosity > 0: print self.counter
if self.counter > self.stopTime:
self.net.saveWeightsToFile(self.path + 'network.wts')
self.ravq.saveRAVQToFile(self.path + 'ravq.pck')
self.ravqInfo.write(str(self.ravq))
self.destroy() # closes files
self.pleaseStop()
# use self.avoidObstacles() to change primitive behavior
motors = self.avoidObstacles()
# scale values that the network will use
inputs = map(self.scaleSensors, self.get('robot/range/all/value'))
targets = map(self.scaleMotors, motors)
# record the data for later offline learning
self.saveListToFile(inputs + targets, self.data)
# classify the data using the ravq
self.ravq.input(inputs + targets)
# autolabel the ravq models (slow)
self.ravq.autoLabel('decimal')
if self.verbosity > 0:
print " RAVQ Winner: ", self.ravq.getLabel(self.ravq.winner)
print " Number of Models: ", len(self.ravq.models)
print " MovingAvgDistance: ", self.ravq.movingAverageDistance
print " ModelVectorDistance: ", self.ravq.modelVectorsDistance
# kick if things get bad
if self.get('robot/stall'):
self.wasStalled += 1
if self.wasStalled > 10:
print 'Kicking!'
self.kick()
self.wasStalled = 0
if self.method:
# this method uses a buffer populated with input target pairs
# that occur at model vector changes
if self.ravq.getNewWinner(): # 1 if the winner is new
if len(self.buffer) >= self.bufferSize:
self.buffer = self.buffer[1:] + [inputs + targets]
else:
self.buffer.append(inputs + targets)
self.ravq.logHistory() # record of RAVQ winners
if len(self.buffer) > 0: # cycle through current buffer
array = self.buffer[self.bufferIndex]
self.bufferIndex = (self.bufferIndex + 1) % len(self.buffer)
error, correct, total, totalPCorrect = self.net.step(input = array[:self.inSize], \
output = array[self.inSize:])
self.netInfo.write(str(self.counter) + "\t" + str(error) + "\n")
self.saveListToFile(array, self.balancedData)
else:
# this method uses buffers associated with individual model
# vectors. these buffers are implemented in ravq.py
if self.ravq.getHistoryLength() > 0:
array = self.ravq.getHistory(self.bufferIndex)
self.bufferIndex = (self.bufferIndex + 1) % self.ravq.getHistoryLength()
self.net.step(input = array[:self.inSize], output = array[self.inSize:])
self.saveListToFile(array, self.balancedData)
# move the robot according to the primitive controller
self.move(motors[0], motors[1])
# sleep, record motor values, increment counter
time.sleep(self.sleepTime)
# optional additional input of motor values
self.previous = motors[:]
self.counter += 1
def INIT(engine):
return GovernorBrain('GovernorBrain', engine)
if __name__ == '__main__':
os.system("pyro -r Khepera -b /local/GovernorBrain.py") |
|
2.8. Brain for Testing Behavior
| 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
|
from pyrobot.brain import Brain
from pyrobot.brain.VisConx.VisRobotConx import *
import pickle
import pyrobot.brain.ravq
import os
import time
import random
# log file directories
rootDirectory = "/local/"
currentExperiment = "data2/"
currentBrain = "/local/GovernorBrainTest.py"
class GovernorBrain(Brain):
"""A brain that uses a RAVQ to govern network learning."""
def setup(self):
# for use with player/stage
#self.startService('truth')
#self.startService('bumper')
# robot parameters
self.robot.range.units = 'ROBOTS'
self.maxvalue = self.robot.range.getMaxvalue()
self.maxvalue += 0.10
# status variables
self.verbosity = 1
self.direction = 1
self.blockedFront = 0
self.wasStalled = 0
self.counter = 0
self.previous = [0.0, 0.0]
# tweakable params
self.sleepTime = 0.05
self.stopTime = 10000
# choose the governor method
self.method = 0
# file IO
self.path = rootDirectory + currentExperiment
# create network
self.net = VisRobotNetwork() # could use VisRobotSRN()
self.inSize = self.robot.range.count
self.net.addLayers(self.inSize, self.inSize/2, 2)
self.net.loadWeightsFromFile(self.path + 'network.wts')
# defaults - but here explicit
self.net.setBatch(0)
self.net.setInteractive(0)
self.net.setVerbosity(0)
# learning parameters
self.net.setEpsilon(0.2)
self.net.setMomentum(0.9)
self.net.setTolerance(0.05)
# set learning (no learning during testing)
self.net.setLearning(0)
# input ravq (tweakable parameters)
fp = open(self.path + 'ravq.pck')
self.ravq = pickle.load(fp)
fp.close()
self.ravq.setHistory(0)
self.ravq.setAddModels(0)
self.ravq.setLearning(0)
self.ravq.setMask([1] * self.inSize + [self.inSize / 2] * 2)
self.ravq.autoLabel()
def destroy(self):
self.net.destroy()
def scaleSensors(self, val):
"""From Robots (or anything) to [0, 1]"""
return (val / self.maxvalue)
def scaleMotors(self, val):
"""[-1, 1] to [0, 1]"""
return (val + 1) / 2.0
def kick(self):
"""How to get unstuck."""
self.wasStalled += 1
self.move(1.0 * random.random(), 0.0)
time.sleep(1)
self.update()
if self.robot.stall:
self.move(-1.0 * random.random(), 0.0)
time.sleep(1)
self.update()
if self.robot.stall:
self.move(0.0, 1.0 * random.random())
time.sleep(1)
self.update()
if self.robot.stall:
self.move(0.0, -1.0 * random.random())
time.sleep(1)
self.update()
# this is not the wall follower!
def avoidObstacles(self):
"""
Determines next action, but doesn't execute it.
Returns the translate and rotate values.
When front is blocked, it picks to turn away from the
direction with the minimum reading and maintains that
turn until front is clear.
"""
d = 0.7
ds = 0.3
turn = random.random()
minFront = min([s.value for s in self.robot.range["front"]])
minLeft = min([s.value for s in self.robot.range["front-left"]])
minRight = min([s.value for s in self.robot.range["front-right"]])
sideLeft = self.robot.range[0].value
sideRight = self.robot.range[7].value
if minFront < d:
if not self.blockedFront:
if minRight < minLeft:
self.direction = 1
else:
self.direction = -1
self.blockedFront = 1
return [0, self.direction * turn]
elif minLeft < d:
if self.blockedFront:
return [0, self.direction * turn]
else:
return [0,-turn]
elif minRight < d:
if self.blockedFront:
return [0, self.direction * turn]
else:
return [0,turn]
else:
if sideLeft < ds:
return [0,-turn]
elif sideRight < ds:
return [0,turn]
else:
self.blockedFront = 0
return [.2,0]
def wallFollower(self):
# tweakable parameters
frontRange = 0.7
minRange = .5
maxRange = .7
amount = 0.1
# important sensors
minFront = min(self.get('robot/range/front/value'))
minLeft = min(self.get('robot/range/front-left'))
minRight = min(self.get('robot/range/front-right/value'))
left = min(self.get('robot/range/left/value'))
right = min(self.get('robot/range/right/value'))
self.score += left
# the decision algorithm
if minFront < frontRange:
if not self.blockedFront:
self.direction = -1
self.blockedFront = 1
return [0, self.direction * amount]
else:
self.blockedFront = 0
if minLeft < minRange:
if self.blockedFront:
return [0, self.direction * amount]
else:
return [amount/2.0, -amount]
elif minLeft > maxRange:
if self.blockedFront:
return [0, self.direction * amount]
else:
return [amount/2.0, amount]
elif minRight < minRange:
if self.blockedFront:
return [0, self.direction * amount]
else:
return [amount, amount]
else:
self.blockedFront = 0
return [0.1, 0.0]
def step(self):
# display count
if self.verbosity > 0: print self.counter
if self.counter > self.stopTime:
self.destroy() # closes files
self.pleaseStop()
# use self.avoidObstacles() to change primitive behavior
motors = self.avoidObstacles()
# scale values that the network will use
inputs = map(self.scaleSensors, self.get('robot/range/all/value'))
targets = map(self.scaleMotors, motors)
# classify the data using the ravq
self.ravq.input(inputs + targets)
if self.verbosity > 0:
print " RAVQ Winner: ", self.ravq.getLabel(self.ravq.winner)
print " Number of Models: ", len(self.ravq.models)
# kick if things get bad
if self.get('robot/stall'):
print "Kicking"
self.kick()
error, correct, total = self.net.step(input = inputs, output = targets)
# move the robot according to the network
self.move((self.net.getLayer('output').getActivations()[0] * 2.0) - 1.0,\
(self.net.getLayer('output').getActivations()[1] * 2.0) - 1.0)
# sleep, record motor values, increment counter
time.sleep(self.sleepTime)
# optional additional input of motor values
self.previous = motors[:]
self.counter += 1
def INIT(engine):
return GovernorBrain('GovernorBrain', engine)
if __name__ == '__main__':
os.system("pyro -r Khepera -b /local/GovernorBrainTest.py") |
|
2.9. The Nofli Tani World File
The World Files:
# Desc: 1 robot with player, laser, sonar and
gps # CVS: $Id: nolfi2.world,v 1.1 2003/05/15 21:45:55 yeelin Exp $
# the resolution of Stage's raytrace model in meters
resolution 0.02
# GUI settings
gui ( size [ 502.000 506.000 ] origin [5.018 4.950 0] scale 0.021 #
the size of each bitmap pixel in meters )
# load a bitmapped environment from a file
bitmap ( file "nolfi2.pnm" #resolution 0.1 #resolution .044 resolution
0.07 )
include "/usr/local/stage/worlds/usc_pioneer.inc"
# create a robot, setting its start position and Player port, and
# equipping it with a laser range scanner # #position #( port 6665
# pose [1.0 1.0 20] #laser() #)
usc_pioneer ( color "green" name "robot" port 6665 pose [.796 2.211
92] #(1101, 1113, 48) truth() )
# coordinates are defined from the center of the box box ( size [0.75
#0.75] color "blue" pose [2.5 3.5 0.000] sonar_return "visible" ) box
#( size [0.75 0.75] color "red" pose [2.5 1.5 0.000] sonar_return
#"visible" )
The nolfi2.pnm file can be found here: http://www.cs.swarthmore.edu/~stober/nolfi2.pnm
* The more complicated cave world can be found in the pyrobot/plugins/worlds directory. It is called simple.world.
2.10. Off-line Approach
* A better approach might be to take sample sensor target values and then use the governor approach off-line. Here is a file that does that using data from a file of previously gathered sensor/target pairs.
| 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
|
from pyrobot.brain.conx import *
from pyrobot.brain.ravq import *
n = SRN()
n.setSequenceType("ordered-continuous")
n.addLayers(16,2,2)
n.loadDataFromFile('input_target.dat')
n.setEpsilon(0.2)
n.setMomentum(0.9)
n.setTolerance(0.05)
n.setLearning(1)
ravq = ARAVQ(3, .2, 1.6, .05)
ravq.setAddModels(1)
ravq.setHistory(1)
ravq.setMask([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,8,8])
fp = open('balanced.dat','w')
counter = 0
buffer = []
bufferIndex = 0
method = 1
def saveListToFile(ls, file):
for i in range(len(ls)):
file.write(str(ls[i]) + " ")
file.write("\n")
for x in n.loadOrder:
inputs = n.inputs[x]
targets = n.targets[x]
ravq.input(inputs + targets)
if method:
if ravq.getNewWinner(): # is 1 if the winner is a new winner, 0 otherwise
if len(buffer) >= 100:
buffer = buffer[1:] + [inputs + targets]
else:
buffer.append(inputs + targets)
if len(buffer) > 0: # cycle through current buffer
array = buffer[bufferIndex]
bufferIndex = (bufferIndex + 1) % len(buffer)
n.step(input = array[:16], output = array[16:])
saveListToFile(array, fp)
if x > 50000: # train for 50000 steps
break
else:
if ravq.getHistoryLength() > 0:
array = ravq.getHistory(bufferIndex)
bufferIndex = (bufferIndex + 1) % ravq.getHistoryLength()
n.step(input = array[:16], output = array[16:])
saveListToFile(array, fp)
if x > 50000:
break
print " Count: ", x
print " Steps: ", n.count
print " Number of model vectors: ", len(ravq.models)
n.saveWeightsToFile('network.wts')
fp.close() |
|
2.11. SRN Offline Approach
* This approach is similar to the approach that uses a buffer to store input target pairs at changes in the model vector. This approach differs, however, in that entire sequences leading up to the model vector are stored in the buffer. This allows an SRN network to be trained on sequences of input data, and thus preserve the contiguity of events while benefiting from the balancing of the governor architecture.
| 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
|
from pyrobot.brain.VisConx.VisRobotConx import *
from pyrobot.brain.ravq import *
import math
def saveListToFile(ls, file):
for val in ls:
file.write(str(val) + " ")
file.write("\n")
# log file directories
rootDirectory = "/local/"
currentExperiment = "Data_Wander/"
dataOutput = "SRNBuffer/"
currentBrain = "/home/GovWander.py"
n = VisRobotSRN()
n.setSequenceType("ordered-continuous")
n.addLayers(16,5,2)
n.loadDataFromFile(rootDirectory+currentExperiment+'input_target.dat')
n.setEpsilon(0.3)
n.setMomentum(0.0)
n.setTolerance(0.05)
n.setLearning(1)
ravq = RAVQ(1, .2, 1.6)
ravq.setAddModels(1)
ravq.setMask([1,]*len(n.inputs[0]) + [8,]*len(n.targets[0]))
ravq.setHistory(0)
bufferLength = 10
historyLength = 9
historyBuffer = []
contextBuffer = []
trainingBuffer = []
trainingSeq = 0
seqLoc = 0 #location within current training pattern
for x in n.loadOrder:
inputs = n.inputs[x]
targets = n.targets[x]
ravq.input(inputs+targets)
#maintain input/target sequences and context layer
if len(historyBuffer) < historyLength:
historyBuffer = [inputs+targets] + historyBuffer
contextBuffer = [n.getLayer('context').getActivationsList()] + contextBuffer
else:
historyBuffer = [inputs+targets] + historyBuffer[0:-1]
contextBuffer = [n.getLayer('context').getActivationsList()] + contextBuffer[0:-1]
if trainingSeq >= len(trainingBuffer)-1:
trainingSeq = 0
else:
trainingSeq += 1
seqLoc = 0
#if model vector changes, put tuple of history and starting context into training buffer
if ravq.newWinnerIndex != ravq.previousWinnerIndex:
if len(trainingBuffer) < bufferLength:
trainingBuffer = [(historyBuffer, contextBuffer[-1])] + trainingBuffer
else:
trainingBuffer = [(historyBuffer, contextBuffer[-1])] + trainingBuffer[0:-1]
print " Winner #: ", ravq.newWinnerIndex
print " Current step: ", x
if len(trainingBuffer) > 0:
if seqLoc >= len(trainingBuffer[trainingSeq]):
if trainingSeq >= len(trainingBuffer)-1:
trainingSeq = 0
else:
trainingSeq += 1
seqLoc = 0
n.getLayer('context').resetFlags()
n.getLayer('context').copyActivations(trainingBuffer[trainingSeq][1])
n.step(input = trainingBuffer[trainingSeq][0][-1 - seqLoc][0:len(inputs)],
output = trainingBuffer[trainingSeq][0][-1 - seqLoc][len(inputs):len(inputs)+2])
seqLoc += 1
if x % 10000 == 0:
print " Count: ", x
print " Num Models: ", len(ravq.models)
print "Training Buffer Length: ", len(trainingBuffer)
for i in xrange(len(trainingBuffer)):
print "Entry 3%d: %d" % (i, len(trainingBuffer[i][0]))
n.saveWeightsToFile(rootDirectory+currentExperiment+dataOutput+'offline.hidden5_10_9_.3.wts')
print " Num Models: ", len(ravq.models)
for i in xrange(len(ravq.counters)):
print "Total Count for model %u : " % (i,), ravq.counters[i] |
|
2.12. Feature Extraction in General
* We've been discussing problems that require a prediction network to predict the sequence of RAVQ model vector states (or the sequence of states of any vector quantizer). This feature extraction when combined with asynchronous prediction learning makes the vector quantizer more than just a training aid. The vector quantizer cannot be removed after training, since it is required during functioning to break up input sequences into predictable, but asynchronous events with respect to real time.
2.13. Sources
1. Linaker, Niklasson. "Sensory Flow Segmentation using a Resource Allocating Vector Quantizer". http://citeseer.nj.nec.com/285911.html.
2. Linaker, Niklasson. "Extraction and inversion of abstract sensory flow representations". http://citeseer.nj.nec.com/353646.html.
3. Bergfeldt, Linaker. "Self-Organized Modulation of a Neural Robot Controller". http://citeseer.nj.nec.com/542716.html.
4. Nolfi, Tani. "Extracting Regularities in Space and Time Through a Cascade of Prediction Networks" http://citeseer.nj.nec.com/nolfi99extracting.html.
5. Kohonen. "Self Organizing Maps"
6. French, R. (1999). Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences, Vol. 3 ( 4), 128 – 135. The French review is what grabbed my initial attention while I was searching for a topic. This review is fairly comprehensive up to about 1995 in terms of current research.
7. McClelland, J., McNaughton, B. and O’Reilly, R. (1995). Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102, 419-457. At some level of implementation the brain has to deal with catastrophic interference. This article ties in the structure and function of the brain with what we now know about catastrophic interference.
8. McCloskey, M., and Cohen, N. (1989). Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation, 24, 109 – 165. McCloskey and Cohen initially articulated the problem of catastrophic forgetting and demonstrated such interference in some simple networks.
9. Ratcliff, R. (1990). Connectionist models of recognition memory: constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308. This is the other seminal paper investigating the property of catastrophic forgetting.
10. Robins, A. (1995). Catastrophic forgetting, rehearsal and pseudorehearsal. Connection Science. 7, 123-146. The Robins paper tries to connect the observed behavior of catastrophic forgetting to certain human rehearsal behaviors.
2.14. Other Approaches
1. French, R. (1991). Using semi-distributed representations to overcome catastrophic forgetting in connectionist networks. Proceedings of the 15th Annual Conference of the Cognitive Science Society, 723 – 728. French outlines a back propagation algorithm which reduces catastrophic interference. This combined with the Hamker article typify the dual approaches to resolving this problem, either with structural or algorithmic modifications.
2. Hamker, F. (2001) Life-long learning cell structures -- continuously learning without catastrophic interference, Neural Networks, Volume 14 (4-5), 551-573. The Hamker article is the most current research I could find dealing with this issue. In this article he develops a model for a cell structures network that avoids catastrophic interference.
3. Ring, M. B. (1994). Continual Learning in Reinforcement Environments. PhD thesis, University of Texas at Austin, Austin, Texas. http://citeseer.nj.nec.com/ring94continual.html
4. R. French, "Dynamically constraining connectionist networks to produce distributed, orthogonal representations to reduce catastrophic interference," in Proceedings of the 16th Annual Cognitive Science Society Conference, vol. 5, pp. 207--220, 1994. http://citeseer.nj.nec.com/french94dynamically.html
5. A. Edalat. Domain theory in learning processes. In S. Brookes, M. Main, A. Melton, and M. Mislove, editors, Proceedings of the Eleventh International Conference on Mathematical Foundations of Programming Semantics, volume one of Electronic Notes in Theoretical Computer Science. Elsevier, 1995. http://citeseer.nj.nec.com/article/edalat98domain.html