Developmental Robotics group Summer 2005
This page documents the activities of the Developmental Robotics Research Group for summer 2005. The group is composed of faculty members Lisa Meeden and Douglas Blank and some Swarthmore students.
Past summers' research:
Here, we will keep a log of some of the major activities carried out and/or planned for the summer.
Go to DevelopmentalRobotics for a starting point to these activities.
Summer Schedule
We will meet every Monday at 10am
-
Lisa will be away July 4-8
-
Doug will be away around June 20 - 27, and around July 9 - 13
-
Ben will be away June 27-July 1, and ends on August 5
-
Ethan will be away at some point (he will try to overlap with other people's vacations as much as possible) and ends on August 5
         May                     June
Su Mo Tu We Th Fr Sa    Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7              1  2  3  4
 8  9 10 11 12 13 14     5  6  7  8  9 10 11
15 16 17 18 19 20 21    12 13 14 15 16 17 18
22 23 24 25 26 27 28    19 20 21 22 23 24 25
29 30 31                26 27 28 29 30

        July                    August
Su Mo Tu We Th Fr Sa    Su Mo Tu We Th Fr Sa
                1  2        1  2  3  4  5  6
 3  4  5  6  7  8  9     7  8  9 10 11 12 13
10 11 12 13 14 15 16    14 15 16 17 18 19 20
17 18 19 20 21 22 23    21 22 23 24 25 26 27
24 25 26 27 28 29 30    28 29 30 31
31
Ideas to Explore
-
Learn Python/Pyro and build general-purpose visualization tools.
-
Explore a non-iterative learning system such as DistAl. See Yang, Parekh, and Honavar (1997) or (1999) pp. 53-73.
-
Explore Growing Neural Gas (GNG), an alternative to RAVQ that keeps topology.
-
Understand what makes BabybotOffline work.
-
Redo original Babybot with SRN Governor
-
Get it working on Aibo with Aibo TV
-
Explore noisy XOR learning (learns to ignore patterns it can't learn)
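To make the noisy XOR idea concrete, here is a minimal Python sketch of one way such a training set could be generated. This is only an assumption about what "noisy XOR" means here: the third input (a hypothetical "noise flag"), the function names, and the `p_noise` parameter are all made up for illustration and are not taken from the group's code. Patterns with the flag off are ordinary XOR; patterns with the flag on have a coin-flip target, so no network can learn them.

```python
import random

# The four ordinary XOR patterns, with the hypothetical noise flag set to 0.
XOR_PATTERNS = [((a, b, 0), a ^ b) for a in (0, 1) for b in (0, 1)]


def sample_pattern(rng, p_noise=0.5):
    """Draw one (inputs, target) training pair.

    With probability p_noise the pair is unlearnable: the noise flag is 1
    and the target is a coin flip, independent of the inputs.
    Otherwise the pair is one of the four ordinary XOR patterns.
    """
    if rng.random() < p_noise:
        inputs = (rng.randint(0, 1), rng.randint(0, 1), 1)
        return inputs, rng.randint(0, 1)
    return rng.choice(XOR_PATTERNS)


def make_dataset(n, p_noise=0.5, seed=0):
    """Build a list of n training pairs using a seeded random generator."""
    rng = random.Random(seed)
    return [sample_pattern(rng, p_noise) for _ in range(n)]
```

A learner that "ignores patterns it can't learn" would be expected to drive its error toward zero on the flag-0 patterns while giving up on the flag-1 patterns.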
Log
-
A log of weekly/daily events that others should know about.
August 1
Ethan tried a few things on Babybot2 with Governed SRN and found that:
-
If masks on the control and prediction net layers are set equal, governed Babybot does only slightly better than usual. That is:
controlNet.setMask(state=1, context=1, motor_out=1)
predictNet.setMask(motor_in=1, state=1, context=1, predict=1)
-
If the camera's viewscreen is cut to 40x2 by using the two center pixels in the viewscreen, governed Babybot eventually gets 100% reward-centers, but much later than SRN Babybot (trial 400).
-
If masks are set equal AND the viewscreen is cut to 40x2, governed Babybot performs better than SRN Babybot and does extremely well; he hits 100% reward-centers by trial 12, in a sharp jump from about 15%, and then fluctuates between very low and very high rewards semi-periodically. Also, he almost never gets *punished*; when he is not being rewarded, there is no error on the viewscreen (which, note, is when he learns from the MVs).
These files can currently be found at http://www.sccs.swarthmore.edu/~egj/Babybot/ with filenames { MaskTwo.gov, MaskAll.gov, NoMaskTwo.gov } for the data files, and { MaskTwo.gov.pdf, MaskAll.gov.pdf, NoMaskTwo.gov.pdf } for the graphs.
Saturday, July 23
AiboBaby has been running since Friday afternoon. Results-so-far can be seen at: http://www.sccs.swarthmore.edu/~egj/Babybot/aibo.pdf
...they don't look terribly promising so far.
Update from Pomona College
We've put some results from our noisy XOR experiments on the following page: XORNoise
Week of July 11
More runs of the Babybot2 experiment were made; a few significant bugs were fixed, but the experiment still isn't working. Ethan ran the experiment with both robots stationary for the entire run and found that, while prediction error dropped to zero almost immediately, a small box of prediction error would appear in the center of the viewscreen every few trials; this popped up regularly throughout the experiment. I think this is related to the periodic resetting-and-flipping of the target robot: the error blip is absent for the entire first trial (after about three timesteps there is no error at all) and only starts to appear on the second trial, when the target robot is flipped 180° (and so its appearance and position are slightly different). But why would a neural network with a 90-node hidden layer be unable to learn this perfectly predictable change? (The same thing happens with a governed SRN, too.)
It was also determined that Kheperas aren't very useful.
Week of July 4
10 runs of the Babybot2 experiment were made, using these parameters:
-
rewardEpsilon: 0.5
-
punishEpsilon: 0.3
-
punishEpsilonLow: 0.05
-
predictEpsilon: 0.3
-
controlMomentum: 0.0
-
predictMomentum: 0.1
-
regular SRN, not governed
Each run lasted for at least 500000 timesteps; since a trial lasts for 390 timesteps this meant about 1300 trials. (NOTE: Run length was dependent on timesteps instead of # of trials because that's how the code was written for BabybotOffline, and I was too lazy to change it. I figured 500000 timesteps would be way more than we needed since the original Babybot reached its peak performance around trials 35-60, according to the Babybot paper.)
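As a quick check on the run-length arithmetic, here is a minimal Python sketch. The parameter names and values are copied from the list in this entry; the dictionary layout, the `governed` key, and the helper `trials_completed` are hypothetical and are not part of the Babybot2 code.

```python
# Parameter values as listed in this log entry.
# The "governed" key is a made-up stand-in for "regular SRN, not governed".
PARAMS = {
    "rewardEpsilon": 0.5,
    "punishEpsilon": 0.3,
    "punishEpsilonLow": 0.05,
    "predictEpsilon": 0.3,
    "controlMomentum": 0.0,
    "predictMomentum": 0.1,
    "governed": False,
}

TIMESTEPS_PER_TRIAL = 390   # length of one trial, from the log
MIN_TIMESTEPS = 500000      # minimum run length, from the log


def trials_completed(timesteps, per_trial=TIMESTEPS_PER_TRIAL):
    """Number of whole trials that fit into the given number of timesteps."""
    return timesteps // per_trial


print(trials_completed(MIN_TIMESTEPS))  # prints 1282
```

Integer division gives 1282 complete trials for 500000 timesteps, which matches the "about 1300" quoted in the entry.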
Results: In none of the ten runs did Babybot keep the error centroid in the center of its view for an entire trial. (But, is it fair to expect it to?) For each run, the ControlNet and PredictNet TSS errors were plotted vs. time, as was the value of the "Reward Center" counter. In many runs a spike in "Rewards" can be seen around trials 40-80, which seems to be consistent with the original experiment's result. I'm not sure how to interpret the TSS numbers; if there is a discernible pattern in them, it appears to be that CNet TSS rises and falls at the same places Rewards rises and falls.
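For readers unfamiliar with the "error centroid" criterion, here is a minimal Python sketch of one way it could be computed. The log does not say how Babybot2 actually does this, so everything here is an assumption: the viewscreen is taken to be a 2D grid (list of rows) of non-negative prediction errors, the centroid is the error-weighted mean pixel position, and the names `error_centroid`, `is_centered`, and `tolerance` are hypothetical.

```python
def error_centroid(error):
    """Return the (x, y) error-weighted centroid of a 2D grid of errors.

    `error` is a list of rows of non-negative numbers. Returns None when
    the grid contains no error at all, since the centroid is then undefined.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(error):
        for x, e in enumerate(row):
            total += e
            sum_x += x * e
            sum_y += y * e
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


def is_centered(error, tolerance=0.1):
    """True if the centroid's x lies within `tolerance` (as a fraction of
    the grid width) of the horizontal center of the grid."""
    centroid = error_centroid(error)
    if centroid is None:
        return False
    width = len(error[0])
    center_x = (width - 1) / 2.0
    return abs(centroid[0] - center_x) <= tolerance * width
```

Under these assumptions, "keeping the error centroid in the center for an entire trial" would mean `is_centered` holds on every timestep of that trial, which is a strict test and may be part of why no run passed it.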
My overall impression was that Babybot hasn't really learned to follow the target. It is able to start turning in the right direction (as described in "Week of June 20" below) but it always turns too quickly, leaving the target behind. On occasion, Babybot starts turning back in the opposite direction, but I did not get the sense that it was trying to find the target.
It seems like we need to tweak this some more before it's going to work on the AIBO--after all, even supposing Babybot2 worked perfectly, there's no good way to reposition the real robots at the beginning of each trial, as we do in the simulation.
Week of June 20
Tests were run on the Babybot2 code. At least two issues still remain:
-
There is a bug somewhere in the Pyro camera code that causes crashes; Doug says it's probably at the level of the underlying C++ code.
-
Babybot is receiving rewards/punishments both for error caused by the presence of an unexpected object AND the absence of an expected one. In the original experiment, only the first kind of error was used.
Given that, here are some preliminary results:
-
So far there has not been an observable improvement when using a Governed network rather than a standard SRN. Both kinds of brains are able to learn within the first 5 or 10 trials (where a trial is the time from the start of a reset to the start of the next reset) to initially turn in the same direction that the target robot is traveling (i.e. counterclockwise or clockwise). I have not yet seen Babybot keep the target in the center of its vision for a whole trial.
-
Usually we can get at least 50 or 100 trials in before the camera crash kicks in (sometimes several hundred). The program collects data once per trial, so I made an attempt to plot Babybot's rewardNet TSS error, controlNet TSS error, and reward count (all vs. # of trials). So far it looks pretty jumbled/scattered...if there was a discernible trend I couldn't make it out. More trials will hopefully lead to a stronger correlation.
Also, some time was spent last week and at the beginning of this week learning to use the digital camera and iMovie software so that if we want to make a video of the AIBO at the end of the summer, we will know how.
Week of June 6
Ethan & Ben cleaned up code and ran tests on BabybotOfflineTest. Findings:
-
Using a governor instead of an SRN speeds learning 5x.
-
Using RewardEpsilon = PunishEpsilon = 0.5 (rather than 0.3, 0.1 respectively) speeds learning 2x.
-
Tracking error versus tracking state has no significant effect.
-
Any GovernorEpsilon > 1 combined with any GovernorDelta in the range 0.001 <= GovernorDelta <= 0.01 works equally well.
-
Hidden layer learns just as well if it has at least 5 nodes.
-
If the robot isn't able to wrap around the edge of the world, it never seems to learn. This is weird; maybe my non-wrapping code is wrong somewhere? (It's possible that it's getting misleading sensor data when it's near the edge of the world in this case.) UPDATE: this is now fixed; the non-wrapping code was bugged. Testing indicates that preventing the robot from wrapping around the world speeds up learning by about 2x when not using a Governor. When a Governor is used, there's not a significant difference in wrap vs. non-wrap times.
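The wrap vs. non-wrap distinction in the last finding can be sketched in a few lines of Python. This is an illustration only, assuming a one-dimensional world with positions 0 to world_size - 1; the function names are hypothetical and this is not the BabybotOffline code (nor a reconstruction of the bug that was fixed).

```python
def step_wrapping(x, dx, world_size):
    """Move by dx in a world whose edges wrap around (a ring)."""
    # Python's % always returns a non-negative result for a positive
    # modulus, so moving left past 0 lands at the right edge.
    return (x + dx) % world_size


def step_clamped(x, dx, world_size):
    """Move by dx in a world with hard edges: the robot stops at the wall."""
    return max(0, min(world_size - 1, x + dx))
```

For example, in a world of size 10, stepping +2 from position 9 gives 1 with wrapping and 9 with clamping. In the clamped version the robot can sit against a wall while its motor command says it is still moving, which is one place where misleading sensor data could arise.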
