November 3rd, 2012, 01:32 AM
Using a Genetic Algorithm to Write a Program
This is just a concept right now, I haven't started on it. But that's because I am having difficulties with the conceptualization step.
Lately I have become fascinated with Genetic Algorithm (GA) programs. They are an evolution-based algorithm that stores a 'string of dna' and then does a 'fitness test' to compare the 'dna' to others. Then it attempts to reproduce, mutate, or otherwise optimize the best parent DNA into children DNA, and repeat the process.
I wrote a GA to calculate the shortest route, using the four most basic math operators, to turn x and y into z. For example, if x is 2 and y is 5 and z is 10, the program would return something like x*y=z or y+y=z, not x*y+x-x*y+x+x+x+x=z (which might actually be a first generation result). If you want to read more about that check out my blog eliskan.wordpress.com
Anyways, after experimenting with that, I really want to attempt to write a self-reproducing Genetic Algorithm program. It would do is reproduce a function, with new scripting, test the new program and see if it produces a better result in quicker time.
My big problem is finding out exactly how something like this would be achieved. I mean you can't make a program by 'random' so it would need a way of understanding the syntax without producing major errors or infinite loops. Please note this is just an experiment, I just want to learn. Ohh and I want to make Skynet.
edit: I have found one thread discussing exactly this topic on a different site : stackoverflow.com/questions/5732917/code-generation-by-genetic-algorithms
Right now I am looking over the thread it seems to have some solid ideas. Any further input is appreciated.
edit2: Further consideration leads me to conclude this will be very difficult, but I think the approach in the thread I linked above is a bit crude. They do mention however that making a program write a program usually throws errors, but I think a try/except function could help with that.
I am thinking that you could set the fitness up like this:
if program simply does not work by returning exception or failing a time test: fitness=0
if program works: fitness = some combination of speed of program+length of text in program+results of programs algorithm
Ideally, to write a 'Hello World' program, you could make a bunch of intentionally long and over-done Hello World programs, then run a genetic algorithm to have it filter it down to the shortest and quickest program to output "Hello World". This could be taken to the extremes in a lot of programs so it's something I am highly interested in.
An approach might have it be able to recognize variables and certain chunks of code, then use those as chromosomes. In the Hello World code, for example, it should recognize that the string "Hello World" is constant in all the provided parents and that will be included in all the children, most likely after "print"
Other things are also constants, but in those grey areas the program should be willing to experiment.. within reason. It shouldn't just start tossing random letters together. It should know 'what is required next, according to python rules...'. Which in itself is incredibly complicated but not impossible.
/tldr: This is tricky!
November 3rd, 2012, 05:49 PM
A computer has a great many states, few of which are useful.
I agree with you.
The program would need a syntax-repair algorithm that runs between the GA and compilation or execution. Cornell had an experimental PL/1 compiler that fixed your program. Obviously, it didn't "fix your program by intent". The concept was that it would be useful for education since the program would always produce some output. We used punch cards and the turn-around time from card reader to line printer was pretty quick unless assignments were due... Imagine, 4 years earlier I used APL interactively at an IBM selectric terminal in high school. Quite a regression!
A program can run in an environment with a time limit. You can trap long running programs or those with infinite loops.
Semantics: does the program map inputs to the correct result?
Yeah, even with a whole mess of constraints the space to explore is enormous. Maybe Skynet is easier.
[/code] are essential for python code and Makefiles!
November 3rd, 2012, 06:52 PM
Thank you for the informative and thoughtful post. It gave me an idea for the beginnings of my project.
I am going to attempt to make a 'function writer'. To keep it simple (at first), I am going to just try to manipulate numbers similar to my earlier GA but using functions like cos and abs and all that stuff I barely understand. Later I will attempt to include if/else statements and other programming syntax. So the function writer should make a bunch of simple funcs along the lines of:
Each line could be a chromosome. When reproducing however it would have to 'know' certain things. It would need to know, for instance, that to type "return r" requires information in the r variable. That is where it gets tricksy but again not impossible.
Of course, fitness would be based by the length of the program and the time it takes to run, as well as how close its output is to the expected result.
If that works and I'm able to reproduce two functions, that is the first step. Then the next step will be including more complex operations and some of the cooler things that python programming language has to offer.
We'll see how this goes. It is incredibly ambitious but I do enjoy learning about genetic algorithms quite a bit In my opinion, a self-optimizing Genetic Program will be the thing of the future.. Once people can figure out how to do it!
November 4th, 2012, 04:43 AM
Ok the framework has begun. I have progressed to making the program create actual scripts that work(with extremely limited syntax), and can find semi-optimal codes. It's so far only a hill-climber, and very limited, but it can do if/else statements.
The function-creator output looks something like this, though each one is highly randomized and usually very inefficient:
The first two and last lines are required. Also as I said syntax is incredibly limited with if/else and basic math, I am working my way into more of it but before I continue I have to work on the next stage - fitness and reproduction!
if not x!=result:
Fitness I have figured out. It's essentially length and the result of the script combined... Reproduction is the hard part. Where to begin and end a 'chromosone' is not as simple as linebreaks.
I am thinking that an if/else statement could be considered one entire gene, but that the brackets within could be subject to mutation and variation. In fact, every major 'statement' or block of code should be considered a gene.. like when you're in python interpreter and you type something out, it doesn't respond until it knows the block of code is complete.
Then, through mutation and further reproduction you could eliminate annoying artifacts in the code and allow for actually reaching this goal.
What do you guys think? How should I approach fitness and reproduction? And if SkyNet's possible, do you think they will have mercy on their creator?
November 4th, 2012, 01:34 PM
This project doesn't interest me, there are faster ways to trim away useless code (gcc reports statements that have no effect). I am interested in your GA description but haven't yet finished reading it.
I've learned, I hope correctly, that when given a choice determinism beats random chance.
I studied neural nets. Neural nets map input to output but you don't, without a lot of work, understand the function that you've "trained". Explicit deterministic Bayes nets---I'm quite sure this is what google uses for language translation work quite well and are set up in one direct pass through the data.
[/code] are essential for python code and Makefiles!
November 4th, 2012, 10:40 PM
Yes, GA seems at first like 'monkeys mashing on a keyboard' but very quickly it begins to narrow in on the best results because of the fitness tests. Only the strongest, most fit results survive, like hill climbing. What makes GA so unique is the ability to also mutate randomly and/or reproduce with other fit results. This allows it to work around the local optimum that hill-climbing algorithms always face and hopefully find the global optimum.
Originally Posted by b49P23TIvg
I have never played with neural nets but it does sound interesting. I will have to look into it
For more reading on writing a genetic algorithm, check out this useful post: www.puremango.co.uk/2010/12/genetic-algorithm-for-hello-world/
November 4th, 2012, 10:56 PM
I agree that genetic algorithms are useful. Heck, I know a guy who used a GA to find materials, layer count, and layer thicknesses to design an optical filter. Simulated annealing is another good algorithm, which (I think) has a similar ability to climb out of a local minimum. (that is, I know simulated annealing can, unsure of GA---except that it retains a large population so it seems like it must be less likely to get stuck in local minima.)
I'm saying that if you want to eliminate dead code, a direct approach is a lot faster than a GA approach. Maybe I'm just re-enforcing the concept that along with expression and statement choice by GA comes a whole lot of additional deterministic processing.
[/code] are essential for python code and Makefiles!
November 4th, 2012, 11:05 PM
I agree with you. GA is a very labor intensive process. Every function in my populations (i usually start with 100000 but end up with only about 1000 population) needs to be tested and that alone takes up quite a chunk of cpu.
Originally Posted by b49P23TIvg
Honestly I am just doing this to learn more about how GA work and to stretch the limits of what I personally am capable of. I'm really an amateur programmer and this stuff is (or should I say - was) out of my league. But you really don't improve if you don't attempt! Nothing ventured, nothing gained
GA can be very ineffective. In earlier versions of my first GA, I would need to work with populations of 100+, and do the mutation loop on each individual 1000 times, as well as regular reproduction, to end up with a nice result. That would end up taking about 5 minutes for 1 'optimized' result. Still, that's not too bad, considering how long it might take a human to find the same result.
After tweaking the fitness and learning that higher populations is better than higher mutations, I was able to dramatically trim down processing time.
This is perhaps the biggest problem with writing a GA. It's easy to set up a fitness function, but applying proper weights for it to wind up where you want it can be difficult. This is even more obvious when you don't have an easy way to calculate fitness. For instance, when making music or images. People have made 'human-interactive GA' programs before but it takes ages compared to simulations.
All in all, the field is very interesting and I want to keep improving!
November 5th, 2012, 05:11 AM
I implemented what I thought was a genetic algorithm for a tetris AI a few months back but later learned it was just stochastic optimization, you know, randomly adjusting a hand full of metrics on a hard coded ai function. Although it worked better than I hoped.
I tried a second time to use a full GA approach and failed on that attempt.
While lamenting on that a bit I had a change of thought and I think it's true that GA might not be the best candidate for AI but I think it could be used as one of a bunch of AI techniques together to arrive at something really useful.
This is how I look at it. We have to start with a goal. The ultimate goal is intelligence, possibly consciousness if that can even be defined. Every thing else is a precursor to that. GA and evolutionary programming seems to be modeled after the biological process of evolution. That is, the precursor to complex organisms and also the brain. Once you step inside the domain of what goes on in the brain, you can no longer just assume the same principles of biological evolution. It becomes obvious that there is a powerful process of learning taking place inside the brain that may have allot to do with the genetic make up of the organism, but the definition of what learning is goes beyond the genes that construct the matter of the brain.
I found this video on youtube: The Future of Robotics and Artificial Intelligence (Andrew Ng, Stanford University, STAN 2011)
Anywho, neural networks are still a little difficult for me to understand so I'm gonna give my tetris AI one more shot with a GA using a technique similar to yours.
But maybe a really good AI will combine both GAs and neural networks.
November 5th, 2012, 05:08 PM
Originally Posted by russ123
I have found an interesting paper discussing a tetris AI using GA a couple weeks ago. Now that you mention it, I have to share! It seems the guy had attempted to make it with a normal code first before resorting to a GA, which actually just changed the weights of his original code.
The resulting AI is very impressive and fast. If this is you that wrote the blog, let me know!
As you can see, Genetic Algorithms can even be used to find optimal weights within a non-genetic program. It also demonstrates that the guys original program actually would have worked if he gave his AI very unexpected weights, like giving bonuses for making blockades and punishing getting clears - two ideas that go against what we would consider the 'standard' way to play tetris.
In this way a Genetic Algorithm could even be used to optimize an existing program by giving it better values to produce your desired results.
November 5th, 2012, 05:57 PM
I remember seeing that article too. I didn't really finish reading it though cos I had already did that and was looking for a different approach. But it's definately not my blog, I'm to much of an imbecile to have a blog. I found this approach interesting:
Originally Posted by eliskan
Applying the PageRank Algorithm to TETRIS Stacking
But I found it used too much memory so never really bothered getting it working.
This is my one