Hi all

I wonder if this is allowed, but this isn't really a question. My apologies if it's not. I used to use www.python-forum.org and now their down for over a week as the site was bought by the members and were planning re-renovations, so hoping that's why. Anyways, there it was ok to post and update your projects in a new thread in the "General" or "Bar" sections.

This is my project. I was trying to solve a projecteuler problem related to the ulam spiral which involved the integers that lay on the diagonals of the spiral. After some videos to recap on how to find a sequence function (interesting memories of high school ) I managed to derive a function for one diagonal but the second one had a problem. Anyways long story short... I saw that Mathematica has a module or function or whatever for this called FindSequenceFunction() that attempts to find a simple mathematical function for a given sequence. I thought that would be cool but couldn't find it in Python.

So I've also been interested in Evolutionary Computation, specifically for now, Genetic Algorithms and Genetic Programming. I've used a GA once to optimize a small set of metric effectively and thought I would try and step it up a level. So I'm going to try and use a GA to implement the find_sequence_function().

I've been thinking about and working on this intermittently for a few days so this is my progress so far. These two blocks are from the same script but I separated them here.

Design

Code:Goal: ----- find_sequence_function(seq, runtime) [0, 1, 4, 9, 16] --> 'x^2' Tries to find the sequence function for the given sequence Concept: -------- Putting a mathematical expression in form suitable for the GA. infix notation: 2 + 10x + x^2 prefix notation: (arity of 2) + 2 + * 10 x ^ x 2 like lisp: still with arity of 2 (+ 2 (+ (* 10 x) (^ x 2))) In a python list form: ['+', 2, ['+', ['*', 10, x], ['^', x, 2]]] How expressions get evaluated operator func / \ or / \ / \ / \ operand operand arg arg Eg. evaluate(['+', 2, ['+', ['*', 10, x], ['^', x, 2]]]) add / \ / \ 2 add / \ / \ mul pow / \ / \ / \ / \ 10 x x 2

Prototype (unfinished)code

python Code:from random import randint, choice, shuffle from time import clock from copy import deepcopy import pickle # ----------------------------- Operators ------------------------------------| add = lambda a, b: a + b sub = lambda a, b: a - b mul = lambda a, b: a * b div = lambda a, b: float(a) / b # pow --> builtin symb_to_op = {'+':add, '-':sub, '*':mul, '/':div, '^'<img src="http://images.devshed.com/fds/smilies/tongue.gif" border="0" alt="" title="Stick Out Tongue" class="inlineimg" />ow} op_to_symb = dict([(symb_to_op[symb], symb) for symb in symb_to_op]) # ---------------------- Expression Functions --------------------------------| def evaluate(expr, x): ''' evaluate(['+', 1, 'x'], 2) --> 3 ''' if type(expr) is int: return expr else: try: op, a, b = symb_to_op[expr[0]], expr[1], expr[2] a = evaluate(a, x) if type(a) is list else (x if a is 'x' else a) b = evaluate(b, x) if type(b) is list else (x if b is 'x' else b) if a is 'er' or b is 'er': return 'er' elif (a > 1000 or b > 1000) and expr[0] == '^': return 'er' elif (a > 1000000 or b > 1000000) and expr[0] == '*': return 'er' else: return op(a, b) except (ZeroDivisionError, OverflowError, ValueError): ## print 'error!'###debugline return 'er' def infix(expr): ''' ['+', 1, 'x'] --> '(1 + x)' ''' symb, a, b = expr[0], expr[1], expr[2] a = infix(a) if type(a) is list else a b = infix(b) if type(b) is list else b return '(%s %s %s)' % (a, symb, b) ##def prefix(expr): ## ''' ## '(1 + x) --> ['+', 1, 'x'] ## ''' ## # ? ## return def random_expr(max_depth=6): if max_depth == 0: return [choice(['+', '-', '*', '/', '^']), randint(1, 5), randint(1, 5)] else: return [choice(['+', '-', '*', '/', '^']), choice([random_expr(max_depth-1), randint(1, 5), 'x']), choice([random_expr(max_depth-1), randint(1, 5), 'x'])] def test_expr(expr, max_time=1e-5): ''' tests for viable expr; returns True if passed else False ''' start = clock() a = evaluate(expr, 1.1) end = clock() if end - start > max_time: # instances where max_time exceeded due to outside processes # that have nothing to do with the expression will have to be # considered collatoral or waste for the sake of efficiency. return False else: pass b = evaluate(expr, 1.23456) if a == b: # either 'x' is not in expression or operation on x nullified # since 1.123456 and 1 for x should not produce the same result. return False else: return True def max_depth(expr, cur_depth=0): ''' ['+', 1, ['-', 2, 3]] --> 1 ['+', 1, 2] --> 0 returns depth of nesting in expression ''' if type(expr[1]) is list: a = max_depth(expr[1], cur_depth + 1) else: a = cur_depth if type(expr[2]) is list: b = max_depth(expr[2], cur_depth + 1) else: b = cur_depth return a if a > b else b def get_any_list_at_depth(expr, depth): ''' get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 0) --> ['-', 2, ['*', 3, 4]] get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 1) --> ['*', 3, 4] get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 2) --> False ''' if type(expr[1]) is list: if depth > 0: a = get_any_list_at_depth(expr[1], depth - 1) else: a = expr[1] else: a = False if type(expr[2]) is list: if depth > 0: b = get_any_list_at_depth(expr[2], depth - 1) else: b = expr[2] else: b = False if not a and not b: return False elif not b: return a elif not a: return b else: return a if randint(0, 1) else b #-------------------- Genetic Algorithm Functions ----------------------------| def select(popul, n): ''' returns n individuals from population uses roulette wheel selection method according to individual fitness wheel contains integer corresponding to indexes of popul list. ''' popul.sort() wheel = [] # find min score for elite group m = popul[0][0] for i in range(len(popul) / 2): if popul[i][0] < m: m = popul[i][0] # find minimized total of elite group total = 0 for i in range(len(popul) / 2): score = popul[i][0] - m total += score # add elite group to wheel for i in range(len(popul) / 2): score = popul[i][0] - m perc = int((score / total) * 100) ## wheel += [i] * perc###debugline wheel += [i] * (perc * (len(popul) / 100)) print 'wheel:', len(wheel), 'popul:', len(popul)###debugline # add buffer to wheel wheel += range(len(popul)) print 'wheel:', len(wheel), 'popul:', len(popul)###debugline return###debugline ## # 1. Create elite population ## elite = deepcopy(popul[:len(popul) / 2]) ## ## # 2. Minimize fitness scores in elite. ## m = min(elite)[0] ## for ind in elite: ## end[0] -= m ## ## # 3. Index Wheel for elite group. ## wheel = [] ## total_score = sum([n[0] for n in elite]) ## for ind in elite: ## ind_perc = ## total_score = 0.0 ## for ind in popul: ## try: ## total_score += ind[0] ## except OverflowError: ## ind[0] = 0 ## index_wheel = [] ## for i in range(len(popul)): ## ind_perc = ind[0] / total_score ## print '\tind_perc:', ind_perc ## index_wheel += [i] * int(ind_perc * 100) ## diff = 100 - len(index_wheel) ## print 'wheel length:', len(index_wheel) #### print diff###debugline ## return###debugline ## # Padding ## index_wheel += range(len(popul)) ## return chosen def combine(pair, mutation_prob=2): ''' combine([[score, expr], [score, expr]]) --> [[None, expr], [None, expr]] picks a random mutual depth and makes one random operand swap between parents original parents not affected mutation_prob --> 1 out of n chances. ''' expr1, expr2 = deepcopy(pair[0][1]), deepcopy(pair[1][1]) # gage depth of both expressions to determin max mutual depth depth = max_depth(expr1), max_depth(expr2) depth = depth[0] if depth[0] < depth[1] else depth[1] ## print 'max depth:', depth###debugline depth -= 1 # choose a random depth in range of max depth swap_point = randint(-1, depth) ## print 'swap point:', swap_point###debugline # make one swap between two random operands at swap_point. if swap_point > -1: expr1_swap_point = get_any_list_at_depth(expr1, swap_point) expr2_swap_point = get_any_list_at_depth(expr2, swap_point) else: ## print "!!"###debugline expr1_swap_point = expr1 expr2_swap_point = expr2 operand1 = randint(1, 2) operand2 = randint(1, 2) # mutation if not randint(0, mutation_prob-1): mut_expr = choice([expr1_swap_point, expr2_swap_point]) mut_operand = randint(1, 2) if type(mut_expr[mut_operand]) is list: pass elif mut_expr[mut_operand] is 'x': pass elif type(mut_expr[mut_operand]) is str: mut_expr[mut_operand] = choice(['+', '*', '-', '/', '^']) else: # is int mut_expr[mut_operand] += randint(0, 5) else: pass expr1_swap_point[operand1], expr2_swap_point[operand2] = deepcopy(expr2_swap_point[operand2]), deepcopy(expr1_swap_point[operand1]) return [[0, expr1], [0, expr2]] def fitness(ind, target): ''' fitness([score, expr], target) --> [score, expr] ''' # get a sequence from expr starting at x = 0 trash, expr = ind seq = [evaluate(expr, x) for x in range(len(target))] # check the difference in shape of the curve between target and seq curve_dif = [seq[i] - target[i] if seq[i] != 'er' else 'er' for i in range(len(seq))] # replace 'er' with worst case i.e max difference mx = max([n for n in curve_dif if type(n) is not str]) curve_dif = [mx if i == 'er' else i for i in curve_dif] # find average difference and subtract from all to normalize curve_dif try: av = float(sum(curve_dif)) / len(curve_dif) except OverflowError: av = sum(curve_dif) / len(curve_dif) norm_curve_dif = [i - av for i in curve_dif] # CURVE/SHAPE DIFFERENCE total_curv_dif = sum(norm_curve_dif) # ANGLE OF CURVE DIFFERENCE total_angle_dif = sum(map(abs, norm_curve_dif)) # RAW/MAGNITUDE DIFFERENCE total_raw_dif = sum(map(abs, curve_dif)) return [(total_curv_dif * metrics[0]) + (total_angle_dif * metrics[1]) + (total_raw_dif * metrics[2]), expr] def genetic_algorithm_1(target, a, b, c, num_results=1, max_time=60): ''' a: population size b: offspring per family c: ratio of parents to offspring for next pop selection ''' a = a if a % 2 == 0 else a + 1 b = b if b >= 2 else 2 results = [] # 1. Create initial population of a individuals. # popul -> [[score, expr], ... ] popul = [] while len(popul) < a: expr = random_expr() if expr and test_expr(expr): popul.append([0, expr]) # 2. Test fitness of individuals. popul = [fitness(ind, target) for ind in popul] popul.sort() start = clock() generations = 0###debugline while len(results) < num_results and clock() - start < max_time: # 3. Put population into random pairs. parents = deepcopy(popul) shuffle(parents) parents = zip(parents[:a/2], parents[a/2:]) # 4. Combine each pair to produce b children. offspring = [] for pair in parents: for children in range(b): child1, child2 = combine(pair) offspring.append(child1) offspring.append(child2) ## return offspring###debugline # 5. Test fitness of offspring offspring = [child for child in offspring if test_expr(child[1])] offspring = [fitness(child, target) for child in offspring] offspring.sort() # 6. Select (a * c) parents. popul = popul[:int(a*c)] select(popul, int(a*c))###debugline ## popul = select(popul, int(a*c)) # 7. Select (a - (a * c)) offspring. popul += offspring[:a - int(a * c)] ## popul += select(offspring, a - int(a * c)) select(offspring, (a - int(a*c))) popul.sort() # check for suitable results generations += 1###debugline return popul, generations#results # -------------------------------- Metrics -----------------------------------| try: with open('metrics.pkl', 'r') as f: metrics = pickle.load(f) except IOError: print "GA metrics not found! loading default." metrics = [1, 1, 1] # --------------------------------- Main -------------------------------------| def find_sequence_function(seq): ''' [0, 1, 4, 9, 16] --> 'x^2' ''' # ??? return def tune_metrics(runt_time=60): ''' Runs a simple GA for stochastic optimization of metrics ''' global metrics # ? return # ----------------------------- Workbench -----------------------------------| target = [(2 + (10*x) + (x**2)) for x in range(10)] popul = [[0, random_expr()] for i in range(500)] popul = [fitness(i, target) for i in popul] save = deepcopy(popul) a = select(popul, 300) ##targets = [('x * 2', [x*2 for x in range(10)]), ## ('(x * 2) - 1', [(x*2)-1 for x in range(10)]), ## ('2 + (10*x) + (x**2)', [(2 + (10*x) + (x**2)) for x in range(10)])] ##targets = [('2 + (10*x) + (x**2)', [(2 + (10*x) + (x**2)) for x in range(10)])] ##bucket = [] ## ##for target in targets: ## bucket.append(genetic_algorithm_1(target[1], 500, 5, 3/10.0, max_time=2)) ## result = bucket[-1][0][0][1] ## print '-'*20 ## print 'expression:', target[0] ## print 'GA result: ', infix(result) ## print target[1] ## print [evaluate(result, x) for x in range(10)] ##pair = [[0, ['+', 'a', ['+', 'a', ['+', 'a', 'a']]]], ## [0, ['+', 'b', ['+', 'b', ['+', 'b', 'b']]]]] ## ##print 'parent1', pair[0][1] ##print 'parent2', pair[1][1] ##for i in range(10): ## print '\n\n' ## child = combine(pair) ## print '\n' ## print pair[0][1] ## print child[0][1] ## print '\n' ## print pair[1][1] ## print child[1][1] ##pair = [[0, ['+', 'a', ['+', 'a', 'a']]], ## [0, ['+', 'b', ['+', 'b', ['+', 'b', 'b']]]]] ## ##print '-' * 50 ##print 'parent1', pair[0][1] ##print 'parent2', pair[1][1] ##for i in range(10): ## print '\n\n' ## child = combine(pair) ## print '\n' ## print pair[0][1] ## print child[0][1] ## print '\n' ## print pair[1][1] ## print child[1][1] ##def timer(expr, runs): ## 'returns time to evaluate the expression' ## start = clock() ## for run in xrange(runs): ## evaluate(expr, 7) ## end = clock() ## return (end - start) / runs ##samples = [random_expr() for expr in range(100)] ##samples = [expr for expr in samples if test_expr(expr)] ##samples = [expr for expr in samples if evaluate(expr, 2) != evaluate(expr, 7)] ##for i in range(100): ## samples = [random_expr() for expr in range(100)] ## samples = [expr for expr in samples if test_expr(expr)] ## samples = [expr for expr in samples if evaluate(expr, 2) != evaluate(expr, 7)] ## print '\t', len(samples)

I think it's coming together quite well so far. There is one problem I've identified though. Every 5 or so times I run it it will either hang or it will produce one of the errors that are supposed to be handled in the try except block. The try except does seems to be doing it's job other than the one or two occasions where it just seems not to. Still trying to find out why that might be, if anyone knows please feel free to post, or with suggestions, feedback, criticism, etc.

log

I almost wrote clog for code log but ... you can see how that wouln't be very catchy

Mon | 12-11-2012

- added max_operand=200 argument to evaluate() as temporary fix to prevent overly large calculations hanging the machine.

- refactored test_expr() to also test if 'x' exists in exression and moved off workbench.

- started on code for genetic_algorithm.

Tues | 13-11-2012

- Got little more progress on genetic_algorithm(), almost complete.

- Started thinking about fitness() function.

- Made max_depth() function but wondering if there's a better way.

- Started on combine() function and a little stuck on a nice way to impliment it :s

* obviously didn't have allot of free time today or yesterday. Will use the time away from screen to try and refactor ideas rather than code.

* Wondering if this will help get me closer to future job as a developer... or not, still fun though

Wed | 14-11-2012

- Fixed max_depth() function.

- created get_any_list_at_depth() function (long name I know, it's temporary)

- Completed combine function. Shew that was tricky... and a bit messy.

managed to do it thanks to list pointers.

- removed max_operand=200 parameter from evaluate() and added some conditions in the function

instead that gaurd againsts excessive multiplication and power operations appropriately.

- Couldn't sleep so took a look at fitness function and actually put something together.

Now all I have to do is create the selection function which shouldn't be difficult at all, and finish up the GA

so I can do some tests and finally see some results... if any. Will use another GA (I know ga crazy in here) to

find the right settings for the three metrics used in the fitness function for:

curve shape difference

curve angle difference

raw sequence difference

* Note to self: Brute force approaches in the form of GA's can be more work than they seem

Thur | 15-11-2012

- Got GA to a point were it can at least return some results.

- Thought I had completed fitness() function last night, not really. Had much more free time today and it's only 1 in the afternoon, but most of the time went to debugging.

- Still haven't implemented any kind of mutation and metrics are still at [1, 1, 1]

but managed to get something working with just a random seed population and crossover operations and the fitness function I came up with last night.

Last one was getting thereCode:GA metrics not found! loading default. -------------------- expression: x * 2 GA result: (1 * (x + x)) [0, 2, 4, 6, 8, 10, 12, 14, 16, 18] [0, 2, 4, 6, 8, 10, 12, 14, 16, 18] -------------------- expression: (x * 2) - 1 GA result: ((x + x) - 1) [-1, 1, 3, 5, 7, 9, 11, 13, 15, 17] [-1, 1, 3, 5, 7, 9, 11, 13, 15, 17] -------------------- expression: 2 + (10*x) + (x**2) GA result: ((4 + (x + 4)) * (x + 1)) [2, 13, 26, 41, 58, 77, 98, 121, 146, 173] [8, 18, 30, 44, 60, 78, 98, 120, 144, 170]

Each of the three tests were run for 60 seconds and the population returned in sorted order of fitness. The first individual is taken to print the results above. Remember this is nothing but a random seed population, a simple fitness function that takes three different measurements and a crossover function. Quite pleased.

...later...

- Threw in some random mutation in the combine() function.

- Identified a problem with my algorithm. It's being affected by extreme elitism, causing the resulting population after a few generations to consist of clones of the same individual. In other words, it's been infected by Agent Smith. Will refacter GA1 to incorporate the rouletter wheel selection method while still being monogamous, if that still makes a difference.

Frid | 16-11-2012

Didn't do much today. Tried to impliment selection function which was supposed to be one of the easier tasks but failed twice. At this point due to the number of bugs I'm thinking of how the entire thing can be refactored. Not sure if classes would be suitable or what would be the best way to use them in this situation. Back to drawing board I suppose.

Teus | 20-11-2012

Haven't given up, but now that I'm re-factoring everything with classes I've decided to use the opportunity to finally learn some OOP with python. It's coming together.

You can find the work in progress here.

Tweet This+ 1 thisPost To Linkedin