While working on my swarm robot sensor subsystem I was struck by how much hand crafting went into programming a robot controller. The motors never rotate at the same speed, no two sensors respond to stimuli in exactly the same way...
One of my roboticist heroes, Dario Floreano, has written about using genetic algorithms to evolve neural networks in his robotics research. One of his early experiments tackled a familiar robot task, i.e. getting a robot to move and avoid obstacles, a task that has come up many times in my own work with robots. His idea was to use a “simple” feedforward neural network to map IR proximity sensors to the two motors of a differential drive robot, and to use a genetic algorithm to derive the fittest individual to perform the task. Wow! All new territory for me, but I’m hooked.
First on my list was to understand what a feedforward neural network with recurrent connections is and to get more details on how it was implemented in his experiment. Lots of jargon to learn.
Floreano, D. and Mondada, F. (1994). Automatic Creation of an Autonomous Agent: Genetic Evolution of a Neural-Network Driven Robot.
To err is human.
To really foul up, use a computer.
The neural network
So I found some more details on their experiment. The architecture of the neural network seems like a key piece of their success.
In a feedforward neural network a neuron’s output comes from the sum of the products of the input and presynaptic weight vectors, compared against the threshold required to trigger a response (I warned you about the jargon). I’ve come to find out that they used a bias in place of the threshold, and it was a vital part of the architecture because it gave the robot movement even when there was no sensory stimulus.
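In other words, for each output neuron j the calculation is something like:

output[j] = sigmoid( bias[j] + sum over k of input[k] * weight[k][j] )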
A neural network can use an activation function to interpret the results of the neuron calculation. Floreano and Mondada used a sigmoid function on the output neurons, shifted by 0.5 for a range of [-0.5, 0.5]. When supplied with a value for x, the sign of the result indicated the wheel’s direction and the absolute value its speed. Sweet!
1 /( 1 + exp(-x) ) - 0.5
The authors modeled their neural network on Braitenberg vehicle 3c and noted that it can get stuck when two sensors on either side receive the same stimulus at the same time (it’s the classic robot stuck in a corner problem). Their way out of this was to introduce time into the calculation performed by the output neurons with a recurrent connection that supplied each neuron’s output from the previous time step. The best individuals of the last generations never get stuck in such cases because the state of the motors is not defined solely by the current sensory state but also by the previous history of actions. If you’re interested, check out J. Elman’s paper.
Elman J.L. (1990) Finding structure in time. Cognitive Science, 14:179-211.
Here’s my take on their neural network architecture and what I’m implementing. One of my sources mentions that their network design had these synaptic connections: “16 for sensors, 4 recurrent, and 2 for bias unit.” That adds up: 8 sensors times 2 output neurons gives 16 sensor weights, 2 recurrent (context) nodes times 2 outputs gives 4 recurrent weights, plus 2 output biases, for 22 parameters in all. I believe that the 4 recurrent connections implement Elman’s idea of historical actions.
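So with the recurrent connections added in, my mental model of one output neuron at time step t looks something like this (my own notation):

output[j](t) = sigmoid( bias[j] + sum over k of sensor[k] * weight[k][j] + sum over l of output[l](t-1) * recurrentWeight[l][j] )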
I got a chance to write and test a bit of code to execute this neural network. I took out the initialization and print statements for brevity. Eventually the normalized sensor readings will populate the inputLayer array and the weights and bias will come from the genetic algorithm.
/* Neural Network Robot
   This is intended to flesh out a network for inclusion in the simple genetic
   algorithm to learn how to move a robot and avoid obstacles */

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include "pico/stdlib.h"
#include "random.h"

// Simple network that can learn to move and not get stuck
// Features : sigmoid activation function, Elman context (recurrent) neurons, no training, no backprop

double sigmoid(double x) { return 1 / (1 + exp(-x)) - 0.5; }  // logistic function shifted down by 0.5 for a range of [-0.5, 0.5]

double init_weight() {
    double result = ga_random();              // uniform value in [0, 1]
    if (result > 0.5) result = -1 + result;   // fold into the initial weight range of [-0.5, 0.5]
    return result;
}

#define numInputs       8   // 8 sensors
#define numContextNodes 2   // must equal numOutputs: each output feeds one context node
#define numOutputs      2

int main() {
    stdio_init_all();
    sleep_ms(5000);

    double inputLayer[numInputs];                        // holds the normalized sensor readings
    double contextLayer[numContextNodes];                // holds the t-1 context values from outputs 1 & 2
    double outputLayer[numOutputs];                      // holds the sigmoid function results for outputs 1 & 2
    double outputLayerBias[numOutputs];                  // holds the two output neuron bias values
    double sensorWeights[numInputs][numOutputs];         // holds 8 by 2 random weights
    double contextWeights[numContextNodes][numOutputs];  // holds 2 by 2 random weights

    randomize(0.24f);
    contextLayer[0] = init_weight();   // the sigmoid output normally goes here
    contextLayer[1] = init_weight();   // the sigmoid output normally goes here

    for (int n = 1; n < 10000; n++) {
        // I took out some loops that initialized the arrays

        for (int j = 0; j < numOutputs; j++) {
            double activation = outputLayerBias[j];
            for (int k = 0; k < numInputs; k++) {
                activation += inputLayer[k] * sensorWeights[k][j];
            }
            for (int l = 0; l < numContextNodes; l++) {   // loop thru the context nodes and weights
                activation += contextLayer[l] * contextWeights[l][j];
            }
            outputLayer[j] = sigmoid(activation);
        }   // done with movement thru NN

        for (int m = 0; m < numContextNodes; m++)
            contextLayer[m] = outputLayer[m];   // move the sigmoid output to the contextLayer[]

        // At this point the robot would be allowed to move for 300ms,
        // at the end of which the fitness function would be recorded.
    }
    return 0;
}
The interesting thing about this approach of evolving a neural network is that there is neither a training set nor back-propagation. The genetic algorithm is basically an optimization technique that uses a fitness function to evaluate the results of a chromosome’s performance. The fittest survive and their children carry the genes forward. Wild!
To err is human.
To really foul up, use a computer.
All that stuff has been of interest to me over the years. Tuning and designing ANNs of various types can be a bit of a black art. I could talk about neural networks (played with coding some) but have no practical application for an Arduino robot. I first read about genetic evolution in Richard Dawkins’ book "The Blind Watchmaker" where he gave a simple genetic program to evolve biomorphs.
https://www.richarddawkins.net/wp-content/uploads/sites/41/2014/06/Evolution-of-Evolvability.pdf
These produced visual variations rather than variations in sensory/motor behaviors.
I first read about genetic evolution in Richard Dawkins book "The Blind Watchmaker" where he gave a simple genetic program to evolve biomorphs
That's interesting. Did you ever implement the program and create biomorphs?
To err is human.
To really foul up, use a computer.
No, I never did my own version of that program. It is really artificial selection, as it is the human that selects the biomorphs to breed for the next generation, much as we do with our domestic animals and plants.
I do remember playing with some genetic programs.
One thing that fascinates me is self assembly. Biological systems self assemble. They are trying to make little robots that can self assemble.
I’ve been reading about how genetic algorithms use fitness functions and it seems that Dawkins’ work with biomorphs is the classic example of subjective fitness. It’s something that an artist might use to create art or music.
Tom
To err is human.
To really foul up, use a computer.
One thing you might evolve is behaviors. A robot has input states that determine the output states. Part of the input state can be internal states (memory) not just the sensory input states. The mental tools I used to understand all this I first learned after reading Ross Ashby's book An Introduction to Cybernetics. It was written for people with limited math skills like biologists who had to study complex systems. Today the principles would come under the umbrella of Systems Thinking.
http://dspace.utalca.cl/bitstream/1950/6344/2/IntroCyb.pdf
In systems thinking a machine is a way of behaving (changing states) which may be mechanical or biological.
So for example you might mutate a set of robot vacuum cleaners' i/o connections and see what happens. The one with the i/o connections that cover more of the floor in the least amount of time is duplicated in the other robots and then they are all mutated (some connections changed) again. In theory they might become better and better at covering the floor area in the least amount of time.
Swarm behavior is an evolved behavior.
Genetic Evolution
Being new to genetic algorithms I picked up a used copy of Goldberg’s book for about 10USD.
D. E. Goldberg. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.
“Genetic algorithms operate on populations of strings, with the string coded to represent some underlying parameter set. Reproduction, crossover and mutation are applied to successive string populations to create new string populations.”
The book contains Pascal code that covers the high points that Floreano and Mondada write about. Namely,
- Roulette wheel selection
- Linear fitness scaling
- Biased mutations with probability of 0.2
- Single point crossover with probability of 0.1
- Encoding a chromosome with either floating point or binary values
The purpose of a fitness function is to give a numerical value to the utility, profit, or goodness of an individual chromosome. In this case the fitness measures navigation and the avoidance of obstacles as performed by a robot. Floreano and Mondada came up with an interesting fitness function that I describe later.
Roulette wheel selection is a way to visualize the reproductive probability of the individuals in a population. The slots in the wheel represent each individual and the width of the slot is proportional to their reproductive probability (fitness). Spin the wheel and the most fit genomes have a higher probability of being selected.
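This is how I picture roulette wheel selection in C. It's a minimal sketch, assuming ga_random() from my network code returns a uniform value in [0,1) and that all fitness values are non-negative; it's not Goldberg's Pascal.

#include "random.h"   // ga_random(): assumed uniform value in [0,1)

// Roulette wheel selection: pick one individual with probability
// proportional to its share of the total population fitness.
int select_individual(const double fitness[], int popSize) {
    double total = 0.0;
    for (int i = 0; i < popSize; i++) total += fitness[i];

    double spin = ga_random() * total;   // where the ball lands on the wheel
    double running = 0.0;
    for (int i = 0; i < popSize; i++) {
        running += fitness[i];           // walk slot by slot around the wheel
        if (spin <= running) return i;
    }
    return popSize - 1;                  // guard against floating point round-off
}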
Goldberg explains that there are two problems with this technique. The first is that if one individual stands above all others in an early generation then that individual will be selected more often and will dominate our small population (premature convergence).
The second problem is when the fittest individual is only slightly better than the average. This turns the search for the fittest into a random walk among the mediocre. The solution for both cases is to scale the fitness values of a population before selection. This will reel in the outstanding individual and spread out the mediocre for a better search of the ‘genetic space’.
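My rough take on Goldberg's linear fitness scaling is f' = a*f + b, with a and b chosen so the population average stays put while the best individual is pulled in to a small multiple of that average. The multiple C_MULT and the clamp to zero are my own assumptions for the sketch.

// Linear fitness scaling sketch: keep the average fitness unchanged
// and map the maximum onto C_MULT times the average.
#define C_MULT 2.0   // assumed scaling multiple

void scale_fitness(const double raw[], double scaled[], int popSize) {
    double avg = 0.0, max = raw[0];
    for (int i = 0; i < popSize; i++) {
        avg += raw[i];
        if (raw[i] > max) max = raw[i];
    }
    avg /= popSize;

    // Solve f' = a*f + b so that avg -> avg and max -> C_MULT * avg.
    double a = (max > avg) ? (C_MULT - 1.0) * avg / (max - avg) : 1.0;
    double b = avg * (1.0 - a);

    for (int i = 0; i < popSize; i++) {
        scaled[i] = a * raw[i] + b;
        if (scaled[i] < 0.0) scaled[i] = 0.0;   // clamp any negative results
    }
}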
It’s pretty impressive how well this works. These two charts are the result of running a fitness scaling exercise in the book. In the run without scaling one individual was selected 9 times from a population of 30. In the fitness scaling run it was selected only twice.
After the fittest are selected they are randomly paired up and their chromosome strings copied over to two children. If crossover is being performed then a uniformly random position along the string is chosen as the crossing point to divide the chromosome. The substrings of the two children are then swapped, see figure 2.
Mutation in their experiment was accomplished by toggling the bits in the chromosome string, figure 3.
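Here is a minimal sketch of how I picture single point crossover and bit-toggling mutation. The chromosome length (22 parameters at 4 bits each, described next) and the way the probabilities get applied (crossover per pair, mutation per bit) are my assumptions; ga_random() is the uniform [0,1) generator from my network code.

#include "random.h"   // ga_random(): assumed uniform value in [0,1)

#define CHROM_LEN   88    // assuming 22 parameters x 4 bits each
#define P_CROSSOVER 0.1   // from the experiment's parameter table
#define P_MUTATION  0.2

// Mutation: toggle individual bits in the chromosome string.
void mutate(char chrom[]) {
    for (int i = 0; i < CHROM_LEN; i++) {
        if (ga_random() < P_MUTATION)
            chrom[i] = chrom[i] ? 0 : 1;
    }
}

// Breed two selected parents into two children: single point crossover
// with probability P_CROSSOVER, otherwise straight copies, then mutate.
void breed(const char momma[], const char poppa[], char childA[], char childB[]) {
    if (ga_random() < P_CROSSOVER) {
        int point = (int)(ga_random() * CHROM_LEN);   // random crossing point
        for (int i = 0; i < CHROM_LEN; i++) {
            childA[i] = (i < point) ? momma[i] : poppa[i];
            childB[i] = (i < point) ? poppa[i] : momma[i];
        }
    } else {
        for (int i = 0; i < CHROM_LEN; i++) { childA[i] = momma[i]; childB[i] = poppa[i]; }
    }
    mutate(childA);
    mutate(childB);
}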
All 22 of the neural network parameters were encoded as 4-bit precision binary values concatenated into one long chromosome string. Each individual in a generation had its chromosome decoded and applied to the neural network, with the two outputs used as wheel motor speeds. The robot, after setting the speed of each motor, was allowed to run for 300 ms, after which its performance was recorded and stored as a fitness value.
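And a sketch of how I imagine decoding a chromosome back into network parameters. The linear mapping of each 4-bit value onto the +/-0.5 initial weight range is my assumption; the paper doesn't spell out the exact encoding.

#define NUM_PARAMS     22   // 16 sensor weights + 4 recurrent weights + 2 biases
#define BITS_PER_PARAM 4

// Decode a binary chromosome into the 22 neural network parameters,
// assuming each 4-bit gene (0..15) maps linearly onto [-0.5, +0.5].
void decode_chromosome(const char chrom[], double params[]) {
    for (int p = 0; p < NUM_PARAMS; p++) {
        int value = 0;
        for (int b = 0; b < BITS_PER_PARAM; b++)      // rebuild the 4-bit integer
            value = (value << 1) | chrom[p * BITS_PER_PARAM + b];
        params[p] = -0.5 + value / 15.0;              // 0 -> -0.5, 15 -> +0.5
    }
}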
The experimenters used this fitness function.
Phi = V (1 - sqrt(v)) (1 - i)
where 0 <= V <= 1
0 <= v <= 1
0 <= i <= 1
V is the sum of rotation speeds of the two wheels, v is the absolute value of the algebraic difference between the signed speed values of the wheels, and i is the normalized activation value of the infrared sensor with the highest activity. The higher the value the better the performance of the individual.
“These three components encourage - respectively - motion, straight displacement and obstacle avoidance, but do not say in what direction the robot should move.”
Nolfi, S. and Floreano, D. (2000). Evolutionary Robotics: Biology, Intelligence, and Technology of Self-Organizing Machines. MIT Press, Cambridge, MA.
V sounds like meters per second to me and wheel encoders can supply that by counting clicks per time period. This is the formula that I use to determine distance on my swarm robots:
pi * wheel diameter / (gear ratio * encoder clicks per motor revolution); so
pi * 32 mm = 100.53 mm of travel per wheel revolution, and 75.81 * 12 = 909.72 encoder clicks per wheel revolution; for a distance per click of 0.11050653 millimeters.
v uses the absolute value of the algebraic difference between the two signed wheel speeds. E.g., if the left wheel speed is 0.1 m/s and the right wheel speed is -0.2 m/s then the difference is 0.3 m/s.
i uses the normalized activation value, which is determined by dividing each sensor’s reading by the square root of the sum of the squares of all of the sensor outputs.
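To make sure I follow the math, here's a rough sketch of how I might compute one action's fitness components in C. The variable names, the assumption that the wheel speeds are the network outputs in [-0.5, 0.5], and the sensor normalization are my own reading, not the authors' code.

#include <math.h>

#define NUM_SENSORS 8

// Fitness sketch for one action: Phi = V * (1 - sqrt(v)) * (1 - i)
double fitness(double leftSpeed, double rightSpeed, const double sensor[]) {
    // V: motion term, sum of the absolute wheel speeds, lands in [0, 1]
    double V = fabs(leftSpeed) + fabs(rightSpeed);

    // v: straightness term, absolute algebraic difference of the
    // signed wheel speeds, also in [0, 1]
    double v = fabs(leftSpeed - rightSpeed);

    // i: activity of the most active sensor, divided by the vector
    // length of all the sensor readings (my reading of the normalization)
    double sumSquares = 0.0, highest = 0.0;
    for (int k = 0; k < NUM_SENSORS; k++) {
        sumSquares += sensor[k] * sensor[k];
        if (sensor[k] > highest) highest = sensor[k];
    }
    double i = (sumSquares > 0.0) ? highest / sqrt(sumSquares) : 0.0;

    return V * (1.0 - sqrt(v)) * (1.0 - i);
}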
It’s been an interesting study so far. I’m still baffled as to how a genetic algorithm can find a set of neural network parameters that allow a robot to navigate and avoid obstacles. I get how the history of actions provided by the recurrent connections helps with things like getting stuck in a canyon, but the search space just seems so huge…
Tom
To err is human.
To really foul up, use a computer.
That is really all about evolution (learning in the species, not the individual). Learning and adaptation (relearning) are also useful functions. Being born already knowing what to do is good in terms of brain resources and is found in grazing animals. The extra cost of a brain capable of learning pays off in predators, and thus they have a childhood during which they learn things over and above their natural instincts. They are dependent on parenting to keep them alive and teach them things.
Actions need to be goal driven. The robot acts and then checks if the desired result has occurred. If it has not it needs to change its actions. A sensor may fail or a motor may malfunction so other means might be applied to achieve the goal. A mouse may use its legs to navigate a maze but tie its legs up and it will roll around the maze to find the cheese (the goal).
I'm finding your thread here very interesting and I would like to learn more. When I hear AI or neural-network, I see the typical branching logic trees. I would like to get to the level of obstacle avoidance, face or even person recognition, or voice command recognition. I have some ESP32-S3's coming today (I hope) and they seem to make a big deal about the vector processor being able to do this last topic on the board.
With the 20-20 hindsight that you now have, do you have say... a list of links to topics or YT that you'd now recommend in an order that makes sense to a complete novice? OR... do you still recommend the order that you have in this thread?
Thanks!
VBR,
Inq
3 lines of code = InqPortal = Complete IoT, App, Web Server w/ GUI Admin Client, WiFi Manager, Drag & Drop File Manager, OTA, Performance Metrics, Web Socket Comms, Easy App API, All running on ESP8266...
Even usable on ESP-01S - Quickest Start Guide
Well @inq, I don’t know what to tell you. I don’t have a methodology, more like a curiosity about a few roboticists.
Tom
To err is human.
To really foul up, use a computer.
So are you going to try to implement evolving obstacle avoidance in one of your swarm robots?
https://www.sciencedirect.com/science/article/pii/S1877050916308808
"This paper proposes an evolutionary algorithm for autonomous robot navigation. The results obtained in simulation enabled the development of new strategies. The simulator also made possible rapid evaluation of different parameters such as different mutation rates, reproduction, and selection strategies. The simulator produced a satisfactory collision-free behaviour on average after 15 generations. The solution developed by our system was embedded in a real robot and experiments demonstrated that this embedded system was able to successfully complete the autonomous navigation task in a real arena."
https://www.sciencedirect.com/science/article/pii/S1877050916308808
"This paper proposes an evolutionary algorithm for autonomous robot navigation. ..."
Did you happen to notice the first reference that they cite for this article?
Tom
To err is human.
To really foul up, use a computer.
@thrandell wrote:
"Did you happen to notice the first reference that they cite for this article?"
No. But I now see it is the author of the book you referenced.
It might be of interest to others if you can code such an algorithm in one of your swarm robots.
"I took out the initialization and print statements for brevity."
I like complete code I can run, with printouts of what is happening.
Execution
So I had a bit of an epiphany after re-reading Floreano’s publications on their experiment and what they meant by ‘actions’.
My initial understanding of how the genetic algorithm and neural network worked together was wrong. The fitness of a chromosome is a measurement of how much the robot improves its situation after starting from the parameters encoded in the chromosome and completing 80 ‘actions'. An action entails reading the IR sensors, applying them and the recurrent output values to the neural network, running the motors for 300 ms, and collecting V, v, & i. After 80 ‘actions’ the V, v, & i values are averaged and plugged into the fitness function. This makes sense, right? When it’s all said and done the robot starts with a static set of neural network parameters and the only things to guide it are the changes to the IR sensor readings and the recurrent connection values.
Anyway, this is the way I believe the experiment was conducted:
- Start with an individual’s chromosome which is encoded with 16 sensor synapses, 4 recurrent connections and 2 output bias weights.
- The robot pulls in the sensor input and normalizes it; using it along with the synaptic weights specified in the chromosome, the neural network does its magic to determine each motor’s direction and speed.
- The robot is allowed to run with these settings for 300ms, after which the wheel velocity, straight displacement, and highest sensor activation are recorded. After this initial action the IR sensors are read again and the recurrent connection values are fed back into the network. This action is repeated another 79 times, after which the data is summed, averaged, and applied to the fitness function.
- In the early populations a pair of random speeds was applied for 5 seconds to reposition a robot that was probably stuck or facing a wall when the individual was changed (repositioning strategy).
- After all individuals in the population have run, the genetic algorithm takes over to select the fittest individuals. They are paired up and crossover and mutation functions are run to produce two children that will replace their parents in the population.
- This is repeated for 100 generations, or until the fitness converges and stops improving. (A rough sketch of one individual’s evaluation loop follows below.)
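Putting those steps together, here's how I picture one individual's evaluation loop. It's a sketch under my assumptions: read_ir_sensors(), run_network(), and set_motor_speeds() are hypothetical helpers for my own robot, not the authors' code, and the network outputs are taken to be in [-0.5, 0.5].

#include <math.h>
#include "pico/stdlib.h"

#define NUM_ACTIONS 80    // life length from the parameter table
#define ACTION_MS   300   // action duration from the parameter table
#define NUM_SENSORS 8

// Hypothetical helpers for my robot - placeholders, not part of the paper:
void read_ir_sensors(double readings[]);               // fill with normalized IR values in [0,1]
void set_motor_speeds(double left, double right);       // sign sets direction, magnitude sets speed
void run_network(const double chromosome[], const double sensors[],
                 double context[], double outputs[]);   // one forward pass of the net above

// Evaluate one individual: 80 actions of 300 ms each, then average the
// recorded V, v, and i values and plug them into the fitness function.
double evaluate_individual(const double chromosome[]) {
    double sensors[NUM_SENSORS];
    double context[2] = {0.0, 0.0};
    double outputs[2];
    double sumV = 0.0, sumv = 0.0, sumi = 0.0;

    for (int action = 0; action < NUM_ACTIONS; action++) {
        read_ir_sensors(sensors);
        run_network(chromosome, sensors, context, outputs);
        set_motor_speeds(outputs[0], outputs[1]);
        sleep_ms(ACTION_MS);                              // let the robot move for one action

        double highest = 0.0;                             // most active (normalized) sensor
        for (int k = 0; k < NUM_SENSORS; k++)
            if (sensors[k] > highest) highest = sensors[k];

        sumV += fabs(outputs[0]) + fabs(outputs[1]);      // motion term
        sumv += fabs(outputs[0] - outputs[1]);            // straightness term
        sumi += highest;                                  // obstacle term
    }

    double V = sumV / NUM_ACTIONS;
    double v = sumv / NUM_ACTIONS;
    double i = sumi / NUM_ACTIONS;
    return V * (1.0 - sqrt(v)) * (1.0 - i);
}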
The experiment’s parameters:

Population size: 80
Number of generations: 100
Crossover probability: 0.1
Mutation probability: 0.2
Mutation range: +/-0.5
Initial weight range: +/-0.5
Final weight range: not bounded
Life length: 80 actions
Action duration: 300 ms
Their results:
Tom
To err is human.
To really foul up, use a computer.