Forums: » Register « |  Free Tools |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support |

New Free Tools on Dev Shed!

#1
November 11th, 2012, 02:21 PM
 russ123
Registered User

Join Date: Nov 2012
Posts: 19
Time spent in forums: 8 h 21 m 37 sec
Reputation Power: 0
Project | find_sequence_function() for python

Hi all

I wonder if this is allowed, but this isn't really a question. My apologies if it's not. I used to use www.python-forum.org and now their down for over a week as the site was bought by the members and were planning re-renovations, so hoping that's why. Anyways, there it was ok to post and update your projects in a new thread in the "General" or "Bar" sections.

This is my project. I was trying to solve a projecteuler problem related to the ulam spiral which involved the integers that lay on the diagonals of the spiral. After some videos to recap on how to find a sequence function (interesting memories of high school ) I managed to derive a function for one diagonal but the second one had a problem. Anyways long story short... I saw that Mathematica has a module or function or whatever for this called FindSequenceFunction() that attempts to find a simple mathematical function for a given sequence. I thought that would be cool but couldn't find it in Python.

So I've also been interested in Evolutionary Computation, specifically for now, Genetic Algorithms and Genetic Programming. I've used a GA once to optimize a small set of metric effectively and thought I would try and step it up a level. So I'm going to try and use a GA to implement the find_sequence_function().

I've been thinking about and working on this intermittently for a few days so this is my progress so far. These two blocks are from the same script but I separated them here.

Design
Code:
```
Goal:
-----

find_sequence_function(seq, runtime)

[0, 1, 4, 9, 16] --> 'x^2'

Tries to find the sequence function for the given sequence

Concept:
--------

Putting a mathematical expression in form suitable for the GA.

infix notation:

2 + 10x + x^2

prefix notation: (arity of 2)

+ 2 + * 10 x ^ x 2

like lisp: still with arity of 2

(+ 2 (+ (* 10 x) (^ x 2)))

In a python list form:

['+', 2, ['+', ['*', 10, x], ['^', x, 2]]]

How expressions get evaluated

operator                  func
/  \          or        /  \
/    \                  /    \
operand    operand          arg    arg

Eg. evaluate(['+', 2, ['+', ['*', 10, x], ['^', x, 2]]])

/   \
/     \
/   \
/     \
mul       pow
/  \       /  \
/    \     /    \
10     x   x      2

```

Prototype (unfinished)code

python Code:
 Original - python Code
```from random import randint, choice, shufflefrom time import clockfrom copy import deepcopyimport pickle   # ----------------------------- Operators ------------------------------------| add = lambda a, b: a + bsub = lambda a, b: a - bmul = lambda a, b: a * bdiv = lambda a, b: float(a) / b# pow --> builtin symb_to_op = {'+':add,              '-':sub,              '*':mul,              '/':div,              '^':pow} op_to_symb = dict([(symb_to_op[symb], symb)                   for symb in symb_to_op])   # ---------------------- Expression Functions --------------------------------| def evaluate(expr, x):    '''    evaluate(['+', 1, 'x'], 2)  -->  3    '''    if type(expr) is int:        return expr    else:        try:            op, a, b = symb_to_op[expr[0]], expr[1], expr[2]            a = evaluate(a, x) if type(a) is list else (x if a is 'x' else a)            b = evaluate(b, x) if type(b) is list else (x if b is 'x' else b)            if a is 'er' or b is 'er':                return 'er'            elif (a > 1000 or b > 1000) and expr[0] == '^':                return 'er'            elif (a > 1000000 or b > 1000000) and expr[0] == '*':                return 'er'            else:                return op(a, b)        except (ZeroDivisionError, OverflowError, ValueError):##            print 'error!'###debugline            return 'er'  def infix(expr):    '''    ['+', 1, 'x']  -->  '(1 + x)'    '''    symb, a, b = expr[0], expr[1], expr[2]    a = infix(a) if type(a) is list else a    b = infix(b) if type(b) is list else b    return '(%s %s %s)' % (a, symb, b)  ##def prefix(expr):##    '''##    '(1 + x) --> ['+', 1, 'x']##    '''##    # ?##    return  def random_expr(max_depth=6):    if max_depth == 0:        return [choice(['+', '-', '*', '/', '^']),                randint(1, 5),                randint(1, 5)]    else:        return [choice(['+', '-', '*', '/', '^']),                choice([random_expr(max_depth-1), randint(1, 5), 'x']),                choice([random_expr(max_depth-1), randint(1, 5), 'x'])]  def test_expr(expr, max_time=1e-5):    '''    tests for viable expr; returns True if passed else False    '''    start = clock()    a = evaluate(expr, 1.1)    end = clock()     if end - start > max_time:        # instances where max_time exceeded due to outside processes        # that have nothing to do with the expression will have to be        # considered collatoral or waste for the sake of efficiency.        return False    else:        pass    b = evaluate(expr, 1.23456)    if a == b:        # either 'x' is not in expression or operation on x nullified        # since 1.123456 and 1 for x should not produce the same result.        return False    else:        return True  def max_depth(expr, cur_depth=0):    '''    ['+', 1, ['-', 2, 3]]   -->  1    ['+', 1, 2]             -->  0    returns depth of nesting in expression    '''    if type(expr[1]) is list:        a = max_depth(expr[1], cur_depth + 1)    else:        a = cur_depth    if type(expr[2]) is list:        b = max_depth(expr[2], cur_depth + 1)    else:        b = cur_depth    return a if a > b else b  def get_any_list_at_depth(expr, depth):    '''    get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 0)  -->  ['-', 2, ['*', 3, 4]]    get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 1)  -->  ['*', 3, 4]    get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 2)  -->  False    '''    if type(expr[1]) is list:        if depth > 0:            a = get_any_list_at_depth(expr[1], depth - 1)        else:            a = expr[1]    else:        a = False     if type(expr[2]) is list:        if depth > 0:            b = get_any_list_at_depth(expr[2], depth - 1)        else:            b = expr[2]    else:        b = False     if not a and not b:        return False    elif not b:        return a    elif not a:        return b    else:        return a if randint(0, 1) else b   #-------------------- Genetic Algorithm Functions ----------------------------| def select(popul, n):    '''    returns n individuals from population    uses roulette wheel selection method according to individual fitness    wheel contains integer corresponding to indexes of popul list.    '''    popul.sort()    wheel = []     # find min score for elite group    m = popul[0][0]    for i in range(len(popul) / 2):        if popul[i][0] < m:            m = popul[i][0]     # find minimized total of elite group    total = 0    for i in range(len(popul) / 2):        score = popul[i][0] - m        total += score     # add elite group to wheel    for i in range(len(popul) / 2):        score = popul[i][0] - m        perc = int((score / total) * 100)##        wheel += [i] * perc###debugline        wheel += [i] * (perc * (len(popul) / 100))     print 'wheel:', len(wheel), 'popul:', len(popul)###debugline    # add buffer to wheel    wheel += range(len(popul))     print 'wheel:', len(wheel), 'popul:', len(popul)###debugline    return###debugline     ##    # 1. Create elite population##    elite = deepcopy(popul[:len(popul) / 2])####    # 2. Minimize fitness scores in elite.##    m = min(elite)[0]##    for ind in elite:##        end[0] -= m####    # 3. Index Wheel for elite group.##    wheel = []##    total_score = sum([n[0] for n in elite])##    for ind in elite:##        ind_perc =   ##    total_score = 0.0##    for ind in popul:##        try:##            total_score += ind[0]##        except OverflowError:##            ind[0] = 0##    index_wheel = []##    for i in range(len(popul)):##        ind_perc = ind[0] / total_score##        print '\tind_perc:', ind_perc##        index_wheel += [i] * int(ind_perc * 100)##    diff = 100 - len(index_wheel)##    print 'wheel length:', len(index_wheel)####    print diff###debugline##    return###debugline##    # Padding##    index_wheel += range(len(popul))##    return chosen  def combine(pair, mutation_prob=2):    '''    combine([[score, expr], [score, expr]])  -->  [[None, expr], [None, expr]]    picks a random mutual depth and makes one random operand swap between parents    original parents not affected    mutation_prob  -->  1 out of n chances.    '''    expr1, expr2 = deepcopy(pair[0][1]), deepcopy(pair[1][1])     # gage depth of both expressions to determin max mutual depth    depth = max_depth(expr1), max_depth(expr2)    depth = depth[0] if depth[0] < depth[1] else depth[1]##    print 'max depth:', depth###debugline    depth -= 1     # choose a random depth in range of max depth    swap_point = randint(-1, depth) ##    print 'swap point:', swap_point###debugline     # make one swap between two random operands at swap_point.    if swap_point > -1:        expr1_swap_point = get_any_list_at_depth(expr1, swap_point)        expr2_swap_point = get_any_list_at_depth(expr2, swap_point)    else:##        print "!!"###debugline        expr1_swap_point = expr1        expr2_swap_point = expr2    operand1 = randint(1, 2)    operand2 = randint(1, 2)     # mutation    if not randint(0, mutation_prob-1):        mut_expr = choice([expr1_swap_point, expr2_swap_point])        mut_operand = randint(1, 2)        if type(mut_expr[mut_operand]) is list:            pass        elif mut_expr[mut_operand] is 'x':            pass        elif type(mut_expr[mut_operand]) is str:            mut_expr[mut_operand] = choice(['+', '*', '-', '/', '^'])        else: # is int            mut_expr[mut_operand] += randint(0, 5)    else:        pass     expr1_swap_point[operand1], expr2_swap_point[operand2] = deepcopy(expr2_swap_point[operand2]), deepcopy(expr1_swap_point[operand1])    return [[0, expr1], [0, expr2]]  def fitness(ind, target):    '''    fitness([score, expr], target)  -->  [score, expr]    '''    # get a sequence from expr starting at x = 0    trash, expr =  ind    seq = [evaluate(expr, x) for x in range(len(target))]     # check the difference in shape of the curve between target and seq    curve_dif = [seq[i] - target[i] if seq[i] != 'er' else 'er' for i in range(len(seq))]     # replace 'er' with worst case i.e max difference    mx = max([n for n in curve_dif if type(n) is not str])    curve_dif = [mx if i == 'er' else i for i in curve_dif]     # find average difference and subtract from all to normalize curve_dif    try:        av = float(sum(curve_dif)) / len(curve_dif)    except OverflowError:        av = sum(curve_dif) / len(curve_dif)    norm_curve_dif = [i - av for i in curve_dif]     # CURVE/SHAPE DIFFERENCE    total_curv_dif = sum(norm_curve_dif)     # ANGLE OF CURVE DIFFERENCE    total_angle_dif = sum(map(abs, norm_curve_dif))     # RAW/MAGNITUDE DIFFERENCE    total_raw_dif = sum(map(abs, curve_dif))     return [(total_curv_dif * metrics[0]) + (total_angle_dif * metrics[1]) + (total_raw_dif * metrics[2]),            expr]  def genetic_algorithm_1(target, a, b, c, num_results=1, max_time=60):    '''    a: population size    b: offspring per family    c: ratio of parents to offspring for next pop selection    '''    a = a if a % 2 == 0 else a + 1    b = b if b >= 2 else 2    results = []     # 1. Create initial population of a individuals.    # popul  ->  [[score, expr], ... ]    popul = []    while len(popul) < a:        expr = random_expr()        if expr and test_expr(expr):                popul.append([0, expr])     # 2. Test fitness of individuals.    popul = [fitness(ind, target) for ind in popul]    popul.sort()     start = clock()    generations = 0###debugline    while len(results) < num_results and clock() - start < max_time:         # 3. Put population into random pairs.        parents = deepcopy(popul)        shuffle(parents)        parents = zip(parents[:a/2], parents[a/2:])         # 4. Combine each pair to produce b children.        offspring = []         for pair in parents:            for children in range(b):                child1, child2 = combine(pair)                offspring.append(child1)                offspring.append(child2) ##        return offspring###debugline         # 5. Test fitness of offspring        offspring = [child for child in offspring if test_expr(child[1])]        offspring = [fitness(child, target) for child in offspring]        offspring.sort()         # 6. Select (a * c) parents.        popul = popul[:int(a*c)]        select(popul, int(a*c))###debugline##        popul = select(popul, int(a*c))         # 7. Select (a - (a * c)) offspring.        popul += offspring[:a - int(a * c)]##        popul += select(offspring, a - int(a * c))        select(offspring, (a - int(a*c)))        popul.sort()         # check for suitable results         generations += 1###debugline     return popul, generations#results   # -------------------------------- Metrics -----------------------------------| try:    with open('metrics.pkl', 'r') as f:        metrics = pickle.load(f)except IOError:    print "GA metrics not found! loading default."    metrics = [1, 1, 1]  # --------------------------------- Main -------------------------------------| def find_sequence_function(seq):    '''    [0, 1, 4, 9, 16] --> 'x^2'    '''    # ???    return  def tune_metrics(runt_time=60):    '''    Runs a simple GA for stochastic optimization of metrics    '''    global metrics    # ?    return  # ----------------------------- Workbench -----------------------------------|  target = [(2 + (10*x) + (x**2)) for x in range(10)] popul = [[0, random_expr()] for i in range(500)]popul = [fitness(i, target) for i in popul]save = deepcopy(popul)a = select(popul, 300)  ##targets = [('x * 2', [x*2 for x in range(10)]),##           ('(x * 2) - 1', [(x*2)-1 for x in range(10)]),##           ('2 + (10*x) + (x**2)', [(2 + (10*x) + (x**2)) for x in range(10)])] ##targets = [('2 + (10*x) + (x**2)', [(2 + (10*x) + (x**2)) for x in range(10)])]##bucket = []####for target in targets:##    bucket.append(genetic_algorithm_1(target[1], 500, 5, 3/10.0, max_time=2))##    result = bucket[-1][0][0][1]##    print '-'*20##    print 'expression:', target[0]##    print 'GA result: ', infix(result)##    print target[1]##    print [evaluate(result, x) for x in range(10)]  ##pair = [[0, ['+', 'a', ['+', 'a', ['+', 'a', 'a']]]],##        [0, ['+', 'b', ['+', 'b', ['+', 'b', 'b']]]]]####print 'parent1', pair[0][1]##print 'parent2', pair[1][1]##for i in range(10):##    print '\n\n'##    child = combine(pair)##    print '\n'##    print pair[0][1]##    print child[0][1]##    print '\n'##    print pair[1][1]##    print child[1][1]  ##pair = [[0, ['+', 'a', ['+', 'a', 'a']]],##        [0, ['+', 'b', ['+', 'b', ['+', 'b', 'b']]]]]####print '-' * 50##print 'parent1', pair[0][1]##print 'parent2', pair[1][1]##for i in range(10):##    print '\n\n'##    child = combine(pair)##    print '\n'##    print pair[0][1]##    print child[0][1]##    print '\n'##    print pair[1][1]##    print child[1][1]  ##def timer(expr, runs):##    'returns time to evaluate the expression'##    start = clock()##    for run in xrange(runs):##        evaluate(expr, 7)##    end = clock()##    return (end - start) / runs  ##samples = [random_expr() for expr in range(100)]##samples = [expr for expr in samples if test_expr(expr)]##samples = [expr for expr in samples if evaluate(expr, 2) != evaluate(expr, 7)]  ##for i in range(100):##    samples = [random_expr() for expr in range(100)]##    samples = [expr for expr in samples if test_expr(expr)]##    samples = [expr for expr in samples if evaluate(expr, 2) != evaluate(expr, 7)]##    print '\t', len(samples)
```

I think it's coming together quite well so far. There is one problem I've identified though. Every 5 or so times I run it it will either hang or it will produce one of the errors that are supposed to be handled in the try except block. The try except does seems to be doing it's job other than the one or two occasions where it just seems not to. Still trying to find out why that might be, if anyone knows please feel free to post, or with suggestions, feedback, criticism, etc.

log

I almost wrote clog for code log but ... you can see how that wouln't be very catchy

Mon | 12-11-2012
- added max_operand=200 argument to evaluate() as temporary fix to prevent overly large calculations hanging the machine.
- refactored test_expr() to also test if 'x' exists in exression and moved off workbench.
- started on code for genetic_algorithm.

Tues | 13-11-2012
- Got little more progress on genetic_algorithm(), almost complete.
- Started thinking about fitness() function.
- Made max_depth() function but wondering if there's a better way.
- Started on combine() function and a little stuck on a nice way to impliment it :s
* obviously didn't have allot of free time today or yesterday. Will use the time away from screen to try and refactor ideas rather than code.
* Wondering if this will help get me closer to future job as a developer... or not, still fun though

Wed | 14-11-2012
- Fixed max_depth() function.
- created get_any_list_at_depth() function (long name I know, it's temporary)
- Completed combine function. Shew that was tricky... and a bit messy.
managed to do it thanks to list pointers.
- removed max_operand=200 parameter from evaluate() and added some conditions in the function
instead that gaurd againsts excessive multiplication and power operations appropriately.
- Couldn't sleep so took a look at fitness function and actually put something together.
Now all I have to do is create the selection function which shouldn't be difficult at all, and finish up the GA
so I can do some tests and finally see some results... if any. Will use another GA (I know ga crazy in here) to
find the right settings for the three metrics used in the fitness function for:
curve shape difference
curve angle difference
raw sequence difference
* Note to self: Brute force approaches in the form of GA's can be more work than they seem

Thur | 15-11-2012
- Got GA to a point were it can at least return some results.
- Thought I had completed fitness() function last night, not really. Had much more free time today and it's only 1 in the afternoon, but most of the time went to debugging.
- Still haven't implemented any kind of mutation and metrics are still at [1, 1, 1]
but managed to get something working with just a random seed population and crossover operations and the fitness function I came up with last night.

Code:
```GA metrics not found! loading default.
--------------------
expression: x * 2
GA result:  (1 * (x + x))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
--------------------
expression: (x * 2) - 1
GA result:  ((x + x) - 1)
[-1, 1, 3, 5, 7, 9, 11, 13, 15, 17]
[-1, 1, 3, 5, 7, 9, 11, 13, 15, 17]
--------------------
expression: 2 + (10*x) + (x**2)
GA result:  ((4 + (x + 4)) * (x + 1))
[2, 13, 26, 41, 58, 77, 98, 121, 146, 173]
[8, 18, 30, 44, 60, 78, 98, 120, 144, 170]```

Last one was getting there
Each of the three tests were run for 60 seconds and the population returned in sorted order of fitness. The first individual is taken to print the results above. Remember this is nothing but a random seed population, a simple fitness function that takes three different measurements and a crossover function. Quite pleased.
...later...
- Threw in some random mutation in the combine() function.
- Identified a problem with my algorithm. It's being affected by extreme elitism, causing the resulting population after a few generations to consist of clones of the same individual. In other words, it's been infected by Agent Smith. Will refacter GA1 to incorporate the rouletter wheel selection method while still being monogamous, if that still makes a difference.

Frid | 16-11-2012
Didn't do much today. Tried to impliment selection function which was supposed to be one of the easier tasks but failed twice. At this point due to the number of bugs I'm thinking of how the entire thing can be refactored. Not sure if classes would be suitable or what would be the best way to use them in this situation. Back to drawing board I suppose.

Teus | 20-11-2012
Haven't given up, but now that I'm re-factoring everything with classes I've decided to use the opportunity to finally learn some OOP with python. It's coming together.
You can find the work in progress here.

#2
November 11th, 2012, 03:44 PM
 russ123
Registered User

Join Date: Nov 2012
Posts: 19
Time spent in forums: 8 h 21 m 37 sec
Reputation Power: 0
Genetic Algorithm

Wikipedia shows a Simple generational genetic algorithm procedure.

I came up with these two algorithms so far. I plan to implement both just to see if there's any difference. Assuming that either will work that is.

Code:
```Genetic Algorithm 1: Monogamous

parameters:

A   population size
B   Number of offspring per pair (must be >= 2)
C   ratio of parents to offspring for new population.
i.e parents / offspring
eg. 2/8 = 0.25

1. Create initial population of A randomly generated expressions.
2. Put individuals into random pairs. (Population is reduced to half)
3. Combine parents to produce (B) offspring per pair.
4. Evaluate and select (A - (A * C)) best offspring for new population.
5. Select (A * C) best parents for new population.
6. Repeat from step 2.

crossover and mutation occur at step 3.
selection occurs at steps 4 and 5.
Max pop size: step 3 = (A + (A * B))

Genetic Algorithm 2: Polygamous

parameters:

A   population size
B   ratio of parents to offspring for new population.
i.e parents / offspring
eg. 2/8 = 0.25

1. Create initial population of A randomly generated expressions.
2. Evaluate all individuals.
3. Using roulette wheel selection method, pick two parents.
4. Combine parents to produce one offspring, append to new pop.
5. Repeat from step 3 until new pop size == (A - (A * B)).
6. Select (A * B) parents to append to new population.
7. Repeat from step 3.

crossover and mutation occur at step 3.
selection occurs at step 3 and 6.
Max pop size: at step 6 = (A + (A - (A * B)))
```

I was just thinking now, some kind of selection method can also take place at step 2 for GA1. Something like paring according to fitness or likeness or both, or perhaps unlikeness. I don't know. Going to call it a day for now.

#3
November 11th, 2012, 08:07 PM
 b49P23TIvg
Contributing User

Join Date: Aug 2011
Posts: 4,163
Time spent in forums: 1 Month 3 Weeks 2 Days 9 h 25 m 16 sec
Reputation Power: 455
is this problem 28 or 58 or 167 (other?). I haven't yet solved 167. As I recall, quadratic equations fit the diagonals of the Ulam spiral. (Working from center center outward along a diagonal.)

Hey, if you're going to use GA try it for a worthy problem:
http://projecteuler.net/problem=314
That problem, to me, seems suited to GA.
__________________
[code]Code tags[/code] are essential for python code!

#4
November 11th, 2012, 09:34 PM
 russ123
Registered User

Join Date: Nov 2012
Posts: 19
Time spent in forums: 8 h 21 m 37 sec
Reputation Power: 0
Quote:
 Originally Posted by b49P23TIvg is this problem 28 or 58 or 167 (other?). I haven't yet solved 167. As I recall, quadratic equations fit the diagonals of the Ulam spiral. (Working from center center outward along a diagonal.) Hey, if you're going to use GA try it for a worthy problem: http://projecteuler.net/problem=314 That problem, to me, seems suited to GA.

Woah! That's a hard one. I'm only level 2 and I've managed to do the problems consecutively so far, so don't want to tackle that one for now. Still very interesting though.

The problem this was related to was 58 but it's not to solve the problem. I already solved it using a method from prob 28. I can't remember exactly why but I found that even if I found both quadratics (thanks for mentioning that btw) it still wouldn't work.

The GA is only for the find_sequence_function() which I thought might come in handy, but also I just want to code something

#5
November 11th, 2012, 09:49 PM
 russ123
Registered User

Join Date: Nov 2012
Posts: 19
Time spent in forums: 8 h 21 m 37 sec
Reputation Power: 0

#6
November 12th, 2012, 08:16 AM
 russ123
Registered User

Join Date: Nov 2012
Posts: 19
Time spent in forums: 8 h 21 m 37 sec
Reputation Power: 0
Found that annoying bug that causes it to hang. I guess I didn't think too carefully about the mathematics of the whole thing, but it was running into expression made up mostly of power operations that were causing astronomical calculations that would freeze my computer.

I solved it temporarily by implementing a max_operand=200 default parameter in the evaluate() function. But that seems like it will limit the value for the x parameter. So thinking of another way to avoid dangerous calculations.

Started implementing GA1. Thinking of how to implement the combination process pro-grammatically. The idea is that an expression will be considered in levels like depicted in the tree form in the Design/concept. The combination function should randomly choose a level and split each parent at that level, combining opposite halves. There's a name for that technique, forgot what it's called.

 Viewing: Dev Shed Forums > Programming Languages > Python Programming > Project for find_function() | Find the mathematical expression for a given sequence