Governor


1. Ideas

1. Use RAVQ to classify inputs + targets. When the classification changes, put the current input target pair into a buffer. At the same time, have the network train through each input target pair currently in the buffer.

2. Use RAVQ to classify inputs + targets. For each new input + target, have the RAVQ store that pair in a history associated with the winning model vector. At the same time, have the network iterate through all the model vector histories in some predefined order.

3. Use the RAVQ to classify inputs + targets and then train on the model vectors.

4. Use the RAVQ to classify inputs + targets then train on a buffer filled with episodes surrounding the important changes. This should be compatible with the SRN.
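Ideas 1 and 2 differ mainly in what gets stored. The following is a minimal sketch of both storage schemes, assuming a toy nearest-prototype classifier in place of the RAVQ. The names nearest_prototype, CriticalBuffer and ModelHistories are hypothetical and not part of pyrobot, and the sketch assumes that the very first input counts as a change of winner.

```python
from collections import deque

def nearest_prototype(vector, prototypes):
    """Toy stand-in for the RAVQ: index of the closest fixed prototype."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(vector, p))
    return min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i]))

class CriticalBuffer:
    """Idea 1: keep only the pairs seen when the winner changes."""
    def __init__(self, size):
        self.pairs = deque(maxlen=size)  # oldest pair drops out when full
        self.last_winner = None

    def observe(self, pair, winner):
        if winner != self.last_winner:
            self.pairs.append(pair)
        self.last_winner = winner

class ModelHistories:
    """Idea 2: keep a short history of pairs for each winning model vector."""
    def __init__(self, per_model):
        self.per_model = per_model
        self.histories = {}

    def observe(self, pair, winner):
        history = self.histories.setdefault(winner, deque(maxlen=self.per_model))
        history.append(pair)

    def flattened(self):
        """All stored pairs, model by model, in a fixed order."""
        return [p for w in sorted(self.histories) for p in self.histories[w]]

# Hypothetical prototypes and input/target stream (illustration only).
prototypes = [(0.0, 0.0), (1.0, 1.0)]
stream = [(0.1, 0.0), (0.0, 0.1), (0.9, 1.0), (1.0, 0.9), (0.1, 0.1)]

critical = CriticalBuffer(size=100)
histories = ModelHistories(per_model=2)
for pair in stream:
    winner = nearest_prototype(pair, prototypes)
    critical.observe(pair, winner)
    histories.observe(pair, winner)

critical_pairs = list(critical.pairs)
history_pairs = histories.flattened()
```

In both cases the network would then cycle through the stored pairs, one training step per incoming input, rather than train on the raw stream.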

The following is a graphical representation of the Governor Architecture. The vector quantizer classifies the input sequence in hopes of identifying the critical input target pairs that best facilitate network learning. We have focused on using the RAVQ exclusively, but other vector quantization methods (such as SOM or LVQ) may provide similar results.


http://www.cs.swarthmore.edu/~stober/GovernorArch.jpg

2. Some Thoughts

1. The original motivation was to create a system that would balance data on the fly. Balancing data, however, destroys time series information that may be important when using an SRN.

2. What are the memory limitations of an SRN? Should we expect an SRN to hold traces of information for 10, 50, 100 time steps?

3. Balancing the data seems to require knowledge of the teacher, and of critical situations where the teacher changes. An automatic governor will have no a priori knowledge of the teacher, and so cannot balance with respect to critical events (as determined by an outside observer).
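The cost of balancing noted in thought 1 can be made concrete with a small constructed example. This is only a sketch on a hypothetical labelled series (the labels "turn" and "straight" and the helper contiguous_fraction are invented for illustration): it balances the two classes by subsampling and then measures how many neighbouring samples were still consecutive time steps.

```python
# Hypothetical labelled series: every tenth step is a rare "turn".
series = [(t, "turn" if t % 10 == 9 else "straight") for t in range(50)]

turns = [s for s in series if s[1] == "turn"]
straights = [s for s in series if s[1] == "straight"]

# Balance by keeping every turn plus an equal number of evenly spaced straights.
stride = len(straights) // len(turns)
kept_straights = straights[::stride][:len(turns)]
balanced = sorted(turns + kept_straights)

def contiguous_fraction(samples):
    """Fraction of neighbouring samples that were consecutive time steps."""
    steps = [t for t, _ in samples]
    pairs = list(zip(steps, steps[1:]))
    return sum(1 for a, b in pairs if b == a + 1) / len(pairs)

original_fraction = contiguous_fraction(series)    # every neighbour is consecutive
balanced_fraction = contiguous_fraction(balanced)  # most neighbours are not
```

In this toy series fewer than half of the neighbouring samples remain consecutive after balancing, which is the kind of gap an SRN's context layer would have to bridge.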

3. Progress

Ideas 1 and 2 have shown improved behavior over a network trained without the governor, but training still takes 200,000 steps, which is quite large. Results also vary considerably when trials are rerun with even slightly different parameter settings, indicating that the current solution may not yet be robust enough for general use. We would also like to reduce the number of steps required to train on this behavior. Changing the teacher and the task are also being considered as further extensions.

Matt is working on a maze experiment similar to one we found in a paper on the RAVQ.

Changing the task and the environment does demonstrate that the governor provides some advantage over an un-governed network. The results of recent work are described below.

4. Some Recent Developments

Lisa has shown that using the critical buffer method (recording inputs when the winner changes) has yielded improved learning over the non-governed neural network. I have been doing head-to-head comparisons between different types of governors. The critical buffer method depends on when model vectors change. The model vector buffer method depends on what the model vectors actually are. I'd say that the model vector buffer method amplifies rare inputs more than the critical buffer method (which is more sensitive to changes than to rarity). This gives the model vector buffer a slight advantage in my mind.
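The difference between the two buffers can be illustrated on a made-up stream of winner labels. This is a sketch of the intuition only, not a measurement: the labels, the stream, and the per-model cap of 2 are all assumptions, and the real per-model histories live in ravq.py.

```python
from collections import Counter, defaultdict

# Hypothetical winner stream: "wall" and "corner" alternate often,
# "doorway" shows up exactly once for two steps.
winners = (["wall"] * 8 + ["corner"] * 2) * 9 + ["doorway"] * 2 + ["wall"] * 8

# Critical buffer: one entry each time the winner changes.
critical = [w for prev, w in zip([None] + winners, winners) if w != prev]

# Model vector buffer: at most `cap` entries per distinct winner.
cap = 2
per_model = defaultdict(list)
for w in winners:
    if len(per_model[w]) < cap:
        per_model[w].append(w)
model_buffer = [w for history in per_model.values() for w in history]

def share(buffer, label):
    """Fraction of the buffer occupied by the given label."""
    return Counter(buffer)[label] / len(buffer)

raw_share = share(winners, "doorway")
critical_share = share(critical, "doorway")
model_share = share(model_buffer, "doorway")
```

On this stream the rare label makes up a small share of the raw data, a somewhat larger share of the critical buffer, and an equal share with every other label in the model vector buffer. Frequent switching between common situations fills the critical buffer, whereas the per-model buffer is indifferent to how often each model wins.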

When trained on the wall follower task in the familiar Nolfi and Tani world (10,000 steps), the model vector buffer method yielded a controller network which successfully navigated the lip separating the two rooms. The critical buffer method stalled on the turn (overturning). I believe that the model vector buffer method proved superior in this instance. More experiments need to be done in a wide range of environments and under a wide range of primitive controllers.

5. An Example Brain

# imported modules
from pyrobot.brain import Brain
from pyrobot.brain.VisConx.VisRobotConx import *
import pyrobot.brain.ravq
import os
import time
import random


# log file directories
rootDirectory = "/local/"
currentExperiment = "data/"
currentBrain = "/local/GovernorBrain.py"

class GovernorBrain(Brain):
    """A brain that uses a RAVQ to govern network learning."""
    def setup(self):

        # for use with player/stage
        #self.startService('truth')
        #self.startService('bumper')

        # robot parameters
        self.robot.range.units = 'ROBOTS'
        self.maxvalue = self.robot.range.getMaxvalue()
        self.maxvalue += 0.075

        # status variables
        self.verbosity = 1
        self.direction = 1
        self.blockedFront = 0
        self.wasStalled = 0
        self.counter = 0
        self.previous = [0.0, 0.0]

        # tweakable params
        self.sleepTime = 0.10
        self.stopTime = 10000

        # choose the governor method
        self.method = 0

        # create network
        self.net = VisRobotNetwork() # could use VisRobotSRN()
        self.inSize = self.robot.range.count
        self.net.addLayers(self.inSize, self.inSize/2, 2)

        # defaults - but here explicit
        self.net.setBatch(0)
        self.net.setInteractive(0)
        self.net.setVerbosity(0)

        # initialize network
        self.net.initialize()

        # learning parameters
        self.net.setEpsilon(0.2)
        self.net.setMomentum(0.9)
        self.net.setTolerance(0.05)

        # set learning
        self.net.setLearning(1)

        # input ravq (tweakable parameters)
        self.ravq = pyrobot.brain.ravq.ExperimentalRAVQ(5, .3, .2, .02)
        self.ravq.setHistory(1)
        self.ravq.setAddModels(1)
        self.ravq.setLearning(1)
        self.ravq.setMask([1] * self.inSize + [self.inSize / 2] * 2)

        # buffer for governor
        self.buffer = []
        self.bufferSize = 100
        self.bufferIndex = 0

        # file IO
        self.path = rootDirectory + currentExperiment
        if(os.path.isfile(self.path + "exp.lock")):
            raise IOError("Lock error!")
        else:
            try:
                os.mkdir(self.path)
            except:
                pass
            lock = open(self.path + "exp.lock", "w")
            lock.write("This file locks the experiment directory to " + \
                       "prevent overwriting experimental data.")
            lock.close()
            # archive brain for future reference
            os.system("cp " + currentBrain + " " + self.path + "archive.py")
            self.netInfo = open(self.path + 'nn.dat', 'w')
            self.ravq.openLog(self.path + 'ravq.log')
            self.ravqInfo = open(self.path + 'ravq.dat', 'w')
            self.repositionLog = open(self.path + 'reposition.dat','w')
            self.data = open(self.path + 'input_target.dat', 'w')
            self.balancedData = open(self.path + 'balanced.dat', 'w')

    def destroy(self):
        self.netInfo.close()
        self.ravq.closeLog()
        self.ravqInfo.close()
        self.repositionLog.close()
        self.data.close()
        self.balancedData.close()
        self.net.destroy()

    def saveListToFile(self, ls, file):
        for i in range(len(ls)):
            file.write(str(ls[i]) + " ")
        file.write("\n")

    def scaleSensors(self, val):
        """From Robots (or anything) to [0, 1]"""
        return (val / self.maxvalue)

    def scaleMotors(self, val):
        """[-1, 1] to [0, 1]"""
        return (val + 1) / 2.0

    def kick(self):
        """How to get unstuck."""
        self.repositionLog.write("STALLED " + str(self.counter) + "\n")
        self.move(0.5 * random.random(), 0.0)
        time.sleep(1)
        self.update()
        if self.get('robot/stall'):
            self.move(-0.5 * random.random(), 0.0)
            time.sleep(1)
            self.update()
            if self.get('robot/stall'):
                self.move(0.0, 0.5 * random.random())
                time.sleep(1)
                self.update()
                if self.get('robot/stall'):
                    self.move(0.0, -0.5 * random.random())
                    time.sleep(1)
                    self.update()

    # this is not the wall follower!

    def avoidObstacles(self):
        """
        Determines next action, but doesn't execute it.
        Returns the translate and rotate values.
        
        When front is blocked, it picks to turn away from the
        direction with the minimum reading and maintains that
        turn until front is clear.
        """
        d = 0.7
        ds = 0.3
        turn = random.random()
        minFront = min(self.get('robot/range/front/value'))
        minLeft  = min(self.get('robot/range/front-left/value'))
        minRight = min(self.get('robot/range/front-right/value'))
        sideLeft = self.get('robot/range/0/value')
        sideRight = self.get('robot/range/7/value')
        if minFront < d:
            if not self.blockedFront:
                if minRight < minLeft:
                    self.direction = 1
                else:
                    self.direction = -1
            self.blockedFront = 1
            return [0, self.direction * turn]
        elif minLeft < d:
            if self.blockedFront:
                return [0, self.direction * turn]
            else:
                return [0,-turn]
        elif minRight < d:
            if self.blockedFront:
                return [0, self.direction * turn]
            else:
                return [0,turn]
        else:
            if sideLeft < ds:
                return [0,-turn]
            elif sideRight < ds:
                return [0,turn]
            else:
                self.blockedFront = 0
                return [.2,0]

    def wallFollower(self):
        # tweakable parameters
        frontRange = 0.7
        minRange = .5
        maxRange = .7
        amount = 0.1

        # important sensors
        minFront = min(self.get('robot/range/front/value'))
        minLeft  = min(self.get('robot/range/front-left/value'))
        minRight = min(self.get('robot/range/front-right/value'))
        left =  min(self.get('robot/range/left/value'))
        right = min(self.get('robot/range/right/value'))

        # the decision algorithm
        if minFront < frontRange:
            if not self.blockedFront:
                self.direction = -1
            self.blockedFront = 1
            return [0, self.direction * amount]
        else:
            self.blockedFront = 0
        if minLeft < minRange:
            if self.blockedFront:
                return [0, self.direction * amount]
            else:
                return [amount/2.0, -amount]
        elif minLeft > maxRange:
            if self.blockedFront:
                return [0, self.direction * amount]
            else:
                return [amount/2.0, amount]
        elif minRight < minRange:
            if self.blockedFront:
                return [0, self.direction * amount]
            else:
                return [amount, amount]
        else:
            self.blockedFront = 0
            return [0.1, 0.0]

    def step(self):

        # display count
        if self.verbosity > 0: print self.counter
        if self.counter > self.stopTime:
            self.net.saveWeightsToFile(self.path + 'network.wts')
            self.ravq.saveRAVQToFile(self.path + 'ravq.pck')
            self.ravqInfo.write(str(self.ravq))
            self.destroy() # closes files
            self.pleaseStop()

        # primitive controller; swap in self.wallFollower() to change behavior
        motors = self.avoidObstacles()

        # scale values that the network will use
        inputs = map(self.scaleSensors, self.get('robot/range/all/value'))
        targets =  map(self.scaleMotors, motors)

        # record the data for later offline learning
        self.saveListToFile(inputs + targets, self.data)

        # classify the data using the ravq
        self.ravq.input(inputs + targets)
        # autolabel the ravq models (slow)
        self.ravq.autoLabel('decimal')

        if self.verbosity > 0:
            print " RAVQ Winner: ", self.ravq.getLabel(self.ravq.winner)
            print " Number of Models: ", len(self.ravq.models)
            print " MovingAvgDistance: ", self.ravq.movingAverageDistance
            print " ModelVectorDistance: ", self.ravq.modelVectorsDistance

        # kick if things get bad
        if self.get('robot/stall'):
            self.wasStalled += 1
            if self.wasStalled > 10:
                print 'Kicking!'
                self.kick()
                self.wasStalled = 0

        if self.method:
            # this method uses a buffer populated with input target pairs
            # that occur at model vector changes
            if self.ravq.getNewWinner(): # 1 if the winner is new
                if len(self.buffer) >= self.bufferSize:
                    self.buffer = self.buffer[1:] + [inputs + targets]
                else:
                    self.buffer.append(inputs + targets)
            self.ravq.logHistory() # record of RAVQ winners
            if len(self.buffer) > 0: # cycle through current buffer
                array = self.buffer[self.bufferIndex]
                self.bufferIndex = (self.bufferIndex + 1) % len(self.buffer)
                error, correct, total, totalPCorrect = self.net.step(input = array[:self.inSize], \
                                                                     output = array[self.inSize:])
                self.netInfo.write(str(self.counter) + "\t" + str(error) + "\n")
                self.saveListToFile(array, self.balancedData)
        else:
            # this method uses buffers associated with individual model
            # vectors. these buffers are implemented in ravq.py
            if self.ravq.getHistoryLength() > 0:
                array = self.ravq.getHistory(self.bufferIndex)
                self.bufferIndex = (self.bufferIndex + 1) % self.ravq.getHistoryLength()
                self.net.step(input = array[:self.inSize], output = array[self.inSize:])
                self.saveListToFile(array, self.balancedData)

        # move the robot according to the primitive controller 
        self.move(motors[0], motors[1])

        # sleep, record motor values, increment counter
        time.sleep(self.sleepTime)
        # optional additional input of motor values
        self.previous = motors[:]
        self.counter += 1

def INIT(engine):
    return GovernorBrain('GovernorBrain', engine)

if __name__ == '__main__':
    os.system("pyro -r Khepera -b /local/GovernorBrain.py")

6. Test Brain

from pyrobot.brain import Brain
from pyrobot.brain.VisConx.VisRobotConx import *
import pickle
import pyrobot.brain.ravq
import os
import time
import random


# log file directories
rootDirectory = "/local/"
currentExperiment = "data2/"
currentBrain = "/local/GovernorBrainTest.py"

class GovernorBrain(Brain):
    """A brain that uses a RAVQ to govern network learning."""
    def setup(self):

        # for use with player/stage
        #self.startService('truth')
        #self.startService('bumper')

        # robot parameters
        self.robot.range.units = 'ROBOTS'
        self.maxvalue = self.robot.range.getMaxvalue()
        self.maxvalue += 0.10

        # status variables
        self.verbosity = 1
        self.direction = 1
        self.blockedFront = 0
        self.wasStalled = 0
        self.counter = 0
        self.previous = [0.0, 0.0]
        self.score = 0.0 # accumulated by wallFollower()

        # tweakable params
        self.sleepTime = 0.05
        self.stopTime = 10000

        # choose the governor method
        self.method = 0

        # file IO
        self.path = rootDirectory + currentExperiment

        # create network
        self.net = VisRobotNetwork() # could use VisRobotSRN()
        self.inSize = self.robot.range.count
        self.net.addLayers(self.inSize, self.inSize/2, 2)
        self.net.loadWeightsFromFile(self.path + 'network.wts')

        # defaults - but here explicit
        self.net.setBatch(0)
        self.net.setInteractive(0)
        self.net.setVerbosity(0)

        # learning parameters
        self.net.setEpsilon(0.2)
        self.net.setMomentum(0.9)
        self.net.setTolerance(0.05)

        # set learning (no learning during testing)
        self.net.setLearning(0)

        # input ravq (tweakable parameters)
        fp = open(self.path + 'ravq.pck')
        self.ravq = pickle.load(fp)
        fp.close()
        self.ravq.setHistory(0)
        self.ravq.setAddModels(0)
        self.ravq.setLearning(0)
        self.ravq.setMask([1] * self.inSize + [self.inSize / 2] * 2)
        self.ravq.autoLabel()

    def destroy(self):
        self.net.destroy()

    def scaleSensors(self, val):
        """From Robots (or anything) to [0, 1]"""
        return (val / self.maxvalue)

    def scaleMotors(self, val):
        """[-1, 1] to [0, 1]"""
        return (val + 1) / 2.0

    def kick(self):
        """How to get unstuck."""
        self.wasStalled += 1
        self.move(1.0 * random.random(), 0.0)
        time.sleep(1)
        self.update()
        if self.robot.stall:
            self.move(-1.0 * random.random(), 0.0)
            time.sleep(1)
            self.update()
            if self.robot.stall:
                self.move(0.0, 1.0 * random.random())
                time.sleep(1)
                self.update()
                if self.robot.stall:
                    self.move(0.0, -1.0 * random.random())
                    time.sleep(1)
                    self.update()

    # this is not the wall follower!    
    def avoidObstacles(self):
        """
        Determines next action, but doesn't execute it.
        Returns the translate and rotate values.
        
        When front is blocked, it picks to turn away from the
        direction with the minimum reading and maintains that
        turn until front is clear.
        """
        d = 0.7
        ds = 0.3
        turn = random.random()
        minFront = min([s.value for s in self.robot.range["front"]])
        minLeft  = min([s.value for s in self.robot.range["front-left"]])
        minRight = min([s.value for s in self.robot.range["front-right"]])
        sideLeft = self.robot.range[0].value
        sideRight = self.robot.range[7].value
        if minFront < d:
            if not self.blockedFront:
                if minRight < minLeft:
                    self.direction = 1
                else:
                    self.direction = -1
            self.blockedFront = 1
            return [0, self.direction * turn]
        elif minLeft < d:
            if self.blockedFront:
                return [0, self.direction * turn]
            else:
                return [0,-turn]
        elif minRight < d:
            if self.blockedFront:
                return [0, self.direction * turn]
            else:
                return [0,turn]
        else:
            if sideLeft < ds:
                return [0,-turn]
            elif sideRight < ds:
                return [0,turn]
            else:
                self.blockedFront = 0
                return [.2,0]


    def wallFollower(self):
        # tweakable parameters
        frontRange = 0.7
        minRange = .5
        maxRange = .7
        amount = 0.1

        # important sensors
        minFront = min(self.get('robot/range/front/value'))
        minLeft  = min(self.get('robot/range/front-left/value'))
        minRight = min(self.get('robot/range/front-right/value'))
        left =  min(self.get('robot/range/left/value'))
        right = min(self.get('robot/range/right/value'))
        self.score += left

        # the decision algorithm
        if minFront < frontRange:
            if not self.blockedFront:
                self.direction = -1
            self.blockedFront = 1
            return [0, self.direction * amount]
        else:
            self.blockedFront = 0
        if minLeft < minRange:
            if self.blockedFront:
                return [0, self.direction * amount]
            else:
                return [amount/2.0, -amount]
        elif minLeft > maxRange:
            if self.blockedFront:
                return [0, self.direction * amount]
            else:
                return [amount/2.0, amount]
        elif minRight < minRange:
            if self.blockedFront:
                return [0, self.direction * amount]
            else:
                return [amount, amount]
        else:
            self.blockedFront = 0
            return [0.1, 0.0]


    def step(self):
        # display count
        if self.verbosity > 0: print self.counter
        if self.counter > self.stopTime:
            self.destroy() # closes files
            self.pleaseStop()

        # primitive controller; swap in self.wallFollower() to change behavior
        motors = self.avoidObstacles()

        # scale values that the network will use
        inputs = map(self.scaleSensors, self.get('robot/range/all/value'))
        targets =  map(self.scaleMotors, motors)

        # classify the data using the ravq
        self.ravq.input(inputs + targets)

        if self.verbosity > 0:
            print " RAVQ Winner: ", self.ravq.getLabel(self.ravq.winner)
            print " Number of Models: ", len(self.ravq.models)

        # kick if things get bad
        if self.get('robot/stall'):
            print "Kicking"
            self.kick()

        # return values are unused here; the training brain unpacks four of them
        self.net.step(input = inputs, output = targets)

        # move the robot according to the network 
        self.move((self.net.getLayer('output').getActivations()[0] * 2.0) - 1.0,\
                             (self.net.getLayer('output').getActivations()[1] * 2.0) - 1.0)

        # sleep, record motor values, increment counter
        time.sleep(self.sleepTime)

        # optional additional input of motor values
        self.previous = motors[:]
        self.counter += 1

def INIT(engine):
    return GovernorBrain('GovernorBrain', engine)

if __name__ == '__main__':
    os.system("pyro -r Khepera -b /local/GovernorBrainTest.py")
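The test brain drives the robot with (activation * 2.0) - 1.0, which is the inverse of scaleMotors. A quick standalone check of that round trip, written as plain functions (scale_motors and unscale_motors are names chosen for this sketch, not pyrobot calls):

```python
def scale_motors(val):
    """Map a motor command in [-1, 1] to a network target in [0, 1]."""
    return (val + 1) / 2.0

def unscale_motors(activation):
    """Map a network output in [0, 1] back to a motor command in [-1, 1]."""
    return activation * 2.0 - 1.0

commands = [-1.0, -0.5, 0.0, 0.2, 1.0]
targets = [scale_motors(c) for c in commands]
round_trip = [unscale_motors(t) for t in targets]

# Floating point makes the round trip approximate, so compare with a tolerance.
max_error = max(abs(a - b) for a, b in zip(commands, round_trip))
```

Because the network's output activations lie in [0, 1], the commands it produces after unscaling always fall within [-1, 1].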

The World Files:

# Desc: 1 robot with player, laser, sonar and gps
# CVS: $Id: nolfi2.world,v 1.1 2003/05/15 21:45:55 yeelin Exp $

# the resolution of Stage's raytrace model in meters
#

resolution 0.02

# GUI settings
#

gui
(
  size [ 502.000 506.000 ]
  origin [5.018 4.950 0]
  scale 0.021 # the size of each bitmap pixel in meters
)

# load a bitmapped environment from a file
#

bitmap
(
  file "nolfi2.pnm"
  #resolution 0.1
  #resolution .044
  resolution 0.07
)

include "/usr/local/stage/worlds/usc_pioneer.inc"

# create a robot, setting its start position and Player port, 
# and equipping it with a laser range scanner
#
#position
#(
# port 6665 
# pose [1.0 1.0 20]
#laser()
#)
 
usc_pioneer
(
  color "green" 
  name "robot" 
  port 6665
  pose [.796 2.211 92] 
  #(1101, 1113, 48)
  truth()
)

# coordinates are defined from the center of the box
#box ( size [0.75 0.75] color "blue" pose [2.5 3.5 0.000] sonar_return "visible" ) 
#box ( size [0.75 0.75] color "red" pose [2.5 1.5 0.000] sonar_return "visible" ) 

The nolfi2.pnm file can be found here: http://www.cs.swarthmore.edu/~stober/nolfi2.pnm

7. Offline Approach

A better approach might be to record sensor/target pairs first and then apply the governor offline. The following script does that, reading previously gathered sensor/target pairs from a file.

from pyrobot.brain.conx import *
from pyrobot.brain.ravq import *

n = SRN()
n.setSequenceType("ordered-continuous")
n.addLayers(16,2,2)
n.loadDataFromFile('input_target.dat')

n.setEpsilon(0.2)
n.setMomentum(0.9)
n.setTolerance(0.05)
n.setLearning(1)

ravq = ARAVQ(3, .2, 1.6, .05)
ravq.setAddModels(1)
ravq.setHistory(1)
ravq.setMask([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,8,8])

fp = open('balanced.dat','w')

counter = 0
buffer = []
bufferIndex = 0

method = 1

def saveListToFile(ls, file):
    for i in range(len(ls)):
        file.write(str(ls[i]) + " ")
    file.write("\n")

for x in n.loadOrder:
    inputs = n.inputs[x]
    targets = n.targets[x]
    ravq.input(inputs + targets)
    if method:
        if ravq.getNewWinner(): # is 1 if the winner is a new winner, 0 otherwise
            if len(buffer) >= 100:
                buffer = buffer[1:] + [inputs + targets]
            else:
                buffer.append(inputs + targets)
        if len(buffer) > 0: # cycle through current buffer
            array = buffer[bufferIndex]
            bufferIndex = (bufferIndex + 1) % len(buffer)
            n.step(input = array[:16], output = array[16:])
            saveListToFile(array, fp)
        if x > 50000: # train for 50000 steps
            break
    else:
        if ravq.getHistoryLength() > 0:
            array = ravq.getHistory(bufferIndex)
            bufferIndex = (bufferIndex + 1) % ravq.getHistoryLength()
            n.step(input = array[:16], output = array[16:])
            saveListToFile(array, fp)
        if x > 50000:
            break

print " Count: ", x
print " Steps: ", n.count
print " Number of model vectors: ", len(ravq.models)

n.saveWeightsToFile('network.wts')
fp.close()

8. SRN Offline Approach

This approach is similar to the one that uses a buffer to store input target pairs at changes in the model vector. It differs, however, in that the entire sequence leading up to each model vector change is stored in the buffer, together with the context-layer activations from the start of that sequence. This allows an SRN to be trained on sequences of input data, and thus preserves the contiguity of events while benefiting from the balancing of the governor architecture.

from pyrobot.brain.VisConx.VisRobotConx import *
from pyrobot.brain.ravq import *
import math

def saveListToFile(ls, file):
    for val in ls:
        file.write(str(val) + " ")
    file.write("\n")

# log file directories
rootDirectory = "/local/"
currentExperiment = "Data_Wander/"
dataOutput = "SRNBuffer/"
currentBrain = "/home/GovWander.py"

n = VisRobotSRN()
n.setSequenceType("ordered-continuous")
n.addLayers(16,5,2)
n.loadDataFromFile(rootDirectory+currentExperiment+'input_target.dat')
n.setEpsilon(0.3)
n.setMomentum(0.0)
n.setTolerance(0.05)
n.setLearning(1)
ravq = RAVQ(1, .2, 1.6)
ravq.setAddModels(1)
ravq.setMask([1,]*len(n.inputs[0]) + [8,]*len(n.targets[0]))
ravq.setHistory(0)

bufferLength = 10
historyLength = 9
historyBuffer = []
contextBuffer = []
trainingBuffer = []
trainingSeq = 0
seqLoc = 0 #location within current training pattern
for x in n.loadOrder:
    inputs = n.inputs[x]
    targets = n.targets[x]
    ravq.input(inputs+targets)

    #maintain input/target sequences and context layer
    if len(historyBuffer) < historyLength:
        historyBuffer = [inputs+targets] + historyBuffer
        contextBuffer = [n.getLayer('context').getActivationsList()] + contextBuffer
    else:
        historyBuffer = [inputs+targets] + historyBuffer[0:-1]
        contextBuffer = [n.getLayer('context').getActivationsList()] + contextBuffer[0:-1]
        if trainingSeq >= len(trainingBuffer)-1:
            trainingSeq = 0
        else:
            trainingSeq += 1
        seqLoc = 0

    #if model vector changes, put tuple of history and starting context into training buffer
    if ravq.newWinnerIndex != ravq.previousWinnerIndex:
        if len(trainingBuffer) < bufferLength:
            trainingBuffer = [(historyBuffer, contextBuffer[-1])] + trainingBuffer
        else:
            trainingBuffer = [(historyBuffer, contextBuffer[-1])] + trainingBuffer[0:-1]
        print " Winner #: ", ravq.newWinnerIndex
        print " Current step: ", x

    if len(trainingBuffer) > 0:
        # each entry is a (history, context) tuple, so measure the history itself
        if seqLoc >= len(trainingBuffer[trainingSeq][0]):
            if trainingSeq >= len(trainingBuffer)-1:
                trainingSeq = 0
            else:
                trainingSeq += 1
            seqLoc = 0
            n.getLayer('context').resetFlags()
            n.getLayer('context').copyActivations(trainingBuffer[trainingSeq][1])

        n.step(input = trainingBuffer[trainingSeq][0][-1 - seqLoc][0:len(inputs)],
               output = trainingBuffer[trainingSeq][0][-1 - seqLoc][len(inputs):len(inputs)+2])
        seqLoc += 1

    if x % 10000 == 0:
        print " Count: ", x
        print " Num Models: ", len(ravq.models)
        print "Training Buffer Length: ", len(trainingBuffer)
        for i in xrange(len(trainingBuffer)):
            print "Entry %3d: %d" % (i, len(trainingBuffer[i][0]))

n.saveWeightsToFile(rootDirectory+currentExperiment+dataOutput+'offline.hidden5_10_9_.3.wts')
print " Num Models: ", len(ravq.models)
for i in xrange(len(ravq.counters)):
    print "Total Count for model %u : " % (i,), ravq.counters[i]
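
The sequence capture used in the script above can be isolated as follows. This is a minimal sketch under stated assumptions: it stores each sequence oldest-first (the script stores newest-first and replays from the end), it omits the context-layer snapshot kept alongside each sequence, and the names history, training_buffer and observe are hypothetical. The winner stream is made up.

```python
from collections import deque

HISTORY_LENGTH = 9   # pairs kept per stored sequence (historyLength above)
BUFFER_LENGTH = 10   # stored sequences kept (bufferLength above)

history = deque(maxlen=HISTORY_LENGTH)         # most recent pairs, oldest first
training_buffer = deque(maxlen=BUFFER_LENGTH)  # snapshots taken at winner changes

def observe(pair, winner, previous_winner):
    """Record the pair; on a winner change, snapshot the recent history."""
    history.append(pair)
    if winner != previous_winner:
        training_buffer.append(list(history))  # copy, so later appends don't alter it

# Hypothetical stream: the winner switches from model 0 to model 1 at step 12.
stream = [((t,), 0 if t < 12 else 1) for t in range(20)]

previous = stream[0][1]  # assume the first input is not treated as a change
for pair, winner in stream:
    observe(pair, winner, previous)
    previous = winner

captured = list(training_buffer)
```

Each stored sequence ends at the step where the winner changed, so replaying it in order presents the network with the run-up to the critical event.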