Thread: Project for find_function() | Find the mathematical expression for a given sequence

1. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Nov 2012
Posts
19
Rep Power
0

Project | find_sequence_function() for python

Hi all

I wonder if this is allowed, but this isn't really a question. My apologies if it's not. I used to use www.python-forum.org and now their down for over a week as the site was bought by the members and were planning re-renovations, so hoping that's why. Anyways, there it was ok to post and update your projects in a new thread in the "General" or "Bar" sections.

This is my project. I was trying to solve a projecteuler problem related to the ulam spiral which involved the integers that lay on the diagonals of the spiral. After some videos to recap on how to find a sequence function (interesting memories of high school ) I managed to derive a function for one diagonal but the second one had a problem. Anyways long story short... I saw that Mathematica has a module or function or whatever for this called FindSequenceFunction() that attempts to find a simple mathematical function for a given sequence. I thought that would be cool but couldn't find it in Python.

So I've also been interested in Evolutionary Computation, specifically for now, Genetic Algorithms and Genetic Programming. I've used a GA once to optimize a small set of metric effectively and thought I would try and step it up a level. So I'm going to try and use a GA to implement the find_sequence_function().

I've been thinking about and working on this intermittently for a few days so this is my progress so far. These two blocks are from the same script but I separated them here.

Design
Code:
Goal:
-----

find_sequence_function(seq, runtime)

[0, 1, 4, 9, 16] --> 'x^2'

Tries to find the sequence function for the given sequence

Concept:
--------

Putting a mathematical expression in form suitable for the GA.

infix notation:

2 + 10x + x^2

prefix notation: (arity of 2)

+ 2 + * 10 x ^ x 2

like lisp: still with arity of 2

(+ 2 (+ (* 10 x) (^ x 2)))

In a python list form:

['+', 2, ['+', ['*', 10, x], ['^', x, 2]]]

How expressions get evaluated

operator                  func
/  \          or        /  \
/    \                  /    \
operand    operand          arg    arg

Eg. evaluate(['+', 2, ['+', ['*', 10, x], ['^', x, 2]]])

/   \
/     \
/   \
/     \
mul       pow
/  \       /  \
/    \     /    \
10     x   x      2

Prototype (unfinished)code

python Code:
from random import randint, choice, shuffle
from time import clock
from copy import deepcopy
import pickle

# ----------------------------- Operators ------------------------------------|

add = lambda a, b: a + b
sub = lambda a, b: a - b
mul = lambda a, b: a * b
div = lambda a, b: float(a) / b
# pow --> builtin

'-':sub,
'*':mul,
'/':div,
'^'<img src="http://images.devshed.com/fds/smilies/tongue.gif" border="0" alt="" title="Stick Out Tongue" class="inlineimg" />ow}

op_to_symb = dict([(symb_to_op[symb], symb)
for symb in symb_to_op])

# ---------------------- Expression Functions --------------------------------|

def evaluate(expr, x):
'''
evaluate(['+', 1, 'x'], 2)  -->  3
'''
if type(expr) is int:
return expr
else:
try:
op, a, b = symb_to_op[expr[0]], expr[1], expr[2]
a = evaluate(a, x) if type(a) is list else (x if a is 'x' else a)
b = evaluate(b, x) if type(b) is list else (x if b is 'x' else b)
if a is 'er' or b is 'er':
return 'er'
elif (a > 1000 or b > 1000) and expr[0] == '^':
return 'er'
elif (a > 1000000 or b > 1000000) and expr[0] == '*':
return 'er'
else:
return op(a, b)
except (ZeroDivisionError, OverflowError, ValueError):
##            print 'error!'###debugline
return 'er'

def infix(expr):
'''
['+', 1, 'x']  -->  '(1 + x)'
'''
symb, a, b = expr[0], expr[1], expr[2]
a = infix(a) if type(a) is list else a
b = infix(b) if type(b) is list else b
return '(%s %s %s)' % (a, symb, b)

##def prefix(expr):
##    '''
##    '(1 + x) --> ['+', 1, 'x']
##    '''
##    # ?
##    return

def random_expr(max_depth=6):
if max_depth == 0:
return [choice(['+', '-', '*', '/', '^']),
randint(1, 5),
randint(1, 5)]
else:
return [choice(['+', '-', '*', '/', '^']),
choice([random_expr(max_depth-1), randint(1, 5), 'x']),
choice([random_expr(max_depth-1), randint(1, 5), 'x'])]

def test_expr(expr, max_time=1e-5):
'''
tests for viable expr; returns True if passed else False
'''
start = clock()
a = evaluate(expr, 1.1)
end = clock()

if end - start > max_time:
# instances where max_time exceeded due to outside processes
# that have nothing to do with the expression will have to be
# considered collatoral or waste for the sake of efficiency.
return False
else:
pass
b = evaluate(expr, 1.23456)
if a == b:
# either 'x' is not in expression or operation on x nullified
# since 1.123456 and 1 for x should not produce the same result.
return False
else:
return True

def max_depth(expr, cur_depth=0):
'''
['+', 1, ['-', 2, 3]]   -->  1
['+', 1, 2]             -->  0
returns depth of nesting in expression
'''
if type(expr[1]) is list:
a = max_depth(expr[1], cur_depth + 1)
else:
a = cur_depth
if type(expr[2]) is list:
b = max_depth(expr[2], cur_depth + 1)
else:
b = cur_depth
return a if a > b else b

def get_any_list_at_depth(expr, depth):
'''
get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 0)  -->  ['-', 2, ['*', 3, 4]]
get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 1)  -->  ['*', 3, 4]
get_..._depth(['+', 1, ['-', 2, ['*', 3, 4]]], 2)  -->  False
'''
if type(expr[1]) is list:
if depth > 0:
a = get_any_list_at_depth(expr[1], depth - 1)
else:
a = expr[1]
else:
a = False

if type(expr[2]) is list:
if depth > 0:
b = get_any_list_at_depth(expr[2], depth - 1)
else:
b = expr[2]
else:
b = False

if not a and not b:
return False
elif not b:
return a
elif not a:
return b
else:
return a if randint(0, 1) else b

#-------------------- Genetic Algorithm Functions ----------------------------|

def select(popul, n):
'''
returns n individuals from population
uses roulette wheel selection method according to individual fitness

wheel contains integer corresponding to indexes of popul list.
'''
popul.sort()
wheel = []

# find min score for elite group
m = popul[0][0]
for i in range(len(popul) / 2):
if popul[i][0] < m:
m = popul[i][0]

# find minimized total of elite group
total = 0
for i in range(len(popul) / 2):
score = popul[i][0] - m
total += score

# add elite group to wheel
for i in range(len(popul) / 2):
score = popul[i][0] - m
perc = int((score / total) * 100)
##        wheel += [i] * perc###debugline
wheel += [i] * (perc * (len(popul) / 100))

print 'wheel:', len(wheel), 'popul:', len(popul)###debugline
# add buffer to wheel
wheel += range(len(popul))

print 'wheel:', len(wheel), 'popul:', len(popul)###debugline
return###debugline

##    # 1. Create elite population
##    elite = deepcopy(popul[:len(popul) / 2])
##
##    # 2. Minimize fitness scores in elite.
##    m = min(elite)[0]
##    for ind in elite:
##        end[0] -= m
##
##    # 3. Index Wheel for elite group.
##    wheel = []
##    total_score = sum([n[0] for n in elite])
##    for ind in elite:
##        ind_perc =

##    total_score = 0.0
##    for ind in popul:
##        try:
##            total_score += ind[0]
##        except OverflowError:
##            ind[0] = 0
##    index_wheel = []
##    for i in range(len(popul)):
##        ind_perc = ind[0] / total_score
##        print '\tind_perc:', ind_perc
##        index_wheel += [i] * int(ind_perc * 100)
##    diff = 100 - len(index_wheel)
##    print 'wheel length:', len(index_wheel)
####    print diff###debugline
##    return###debugline
##    index_wheel += range(len(popul))
##    return chosen

def combine(pair, mutation_prob=2):
'''
combine([[score, expr], [score, expr]])  -->  [[None, expr], [None, expr]]
picks a random mutual depth and makes one random operand swap between parents
original parents not affected
mutation_prob  -->  1 out of n chances.
'''
expr1, expr2 = deepcopy(pair[0][1]), deepcopy(pair[1][1])

# gage depth of both expressions to determin max mutual depth
depth = max_depth(expr1), max_depth(expr2)
depth = depth[0] if depth[0] < depth[1] else depth[1]
##    print 'max depth:', depth###debugline
depth -= 1

# choose a random depth in range of max depth
swap_point = randint(-1, depth)

##    print 'swap point:', swap_point###debugline

# make one swap between two random operands at swap_point.
if swap_point > -1:
expr1_swap_point = get_any_list_at_depth(expr1, swap_point)
expr2_swap_point = get_any_list_at_depth(expr2, swap_point)
else:
##        print "!!"###debugline
expr1_swap_point = expr1
expr2_swap_point = expr2
operand1 = randint(1, 2)
operand2 = randint(1, 2)

# mutation
if not randint(0, mutation_prob-1):
mut_expr = choice([expr1_swap_point, expr2_swap_point])
mut_operand = randint(1, 2)
if type(mut_expr[mut_operand]) is list:
pass
elif mut_expr[mut_operand] is 'x':
pass
elif type(mut_expr[mut_operand]) is str:
mut_expr[mut_operand] = choice(['+', '*', '-', '/', '^'])
else: # is int
mut_expr[mut_operand] += randint(0, 5)
else:
pass

expr1_swap_point[operand1], expr2_swap_point[operand2] = deepcopy(expr2_swap_point[operand2]), deepcopy(expr1_swap_point[operand1])
return [[0, expr1], [0, expr2]]

def fitness(ind, target):
'''
fitness([score, expr], target)  -->  [score, expr]
'''
# get a sequence from expr starting at x = 0
trash, expr =  ind
seq = [evaluate(expr, x) for x in range(len(target))]

# check the difference in shape of the curve between target and seq
curve_dif = [seq[i] - target[i] if seq[i] != 'er' else 'er' for i in range(len(seq))]

# replace 'er' with worst case i.e max difference
mx = max([n for n in curve_dif if type(n) is not str])
curve_dif = [mx if i == 'er' else i for i in curve_dif]

# find average difference and subtract from all to normalize curve_dif
try:
av = float(sum(curve_dif)) / len(curve_dif)
except OverflowError:
av = sum(curve_dif) / len(curve_dif)
norm_curve_dif = [i - av for i in curve_dif]

# CURVE/SHAPE DIFFERENCE
total_curv_dif = sum(norm_curve_dif)

# ANGLE OF CURVE DIFFERENCE
total_angle_dif = sum(map(abs, norm_curve_dif))

# RAW/MAGNITUDE DIFFERENCE
total_raw_dif = sum(map(abs, curve_dif))

return [(total_curv_dif * metrics[0]) + (total_angle_dif * metrics[1]) + (total_raw_dif * metrics[2]),
expr]

def genetic_algorithm_1(target, a, b, c, num_results=1, max_time=60):
'''
a: population size
b: offspring per family
c: ratio of parents to offspring for next pop selection
'''
a = a if a % 2 == 0 else a + 1
b = b if b >= 2 else 2
results = []

# 1. Create initial population of a individuals.
# popul  ->  [[score, expr], ... ]
popul = []
while len(popul) < a:
expr = random_expr()
if expr and test_expr(expr):
popul.append([0, expr])

# 2. Test fitness of individuals.
popul = [fitness(ind, target) for ind in popul]
popul.sort()

start = clock()
generations = 0###debugline
while len(results) < num_results and clock() - start < max_time:

# 3. Put population into random pairs.
parents = deepcopy(popul)
shuffle(parents)
parents = zip(parents[:a/2], parents[a/2:])

# 4. Combine each pair to produce b children.
offspring = []
for pair in parents:
for children in range(b):
child1, child2 = combine(pair)
offspring.append(child1)
offspring.append(child2)

##        return offspring###debugline

# 5. Test fitness of offspring
offspring = [child for child in offspring if test_expr(child[1])]
offspring = [fitness(child, target) for child in offspring]
offspring.sort()

# 6. Select (a * c) parents.
popul = popul[:int(a*c)]
select(popul, int(a*c))###debugline
##        popul = select(popul, int(a*c))

# 7. Select (a - (a * c)) offspring.
popul += offspring[:a - int(a * c)]
##        popul += select(offspring, a - int(a * c))
select(offspring, (a - int(a*c)))
popul.sort()

# check for suitable results

generations += 1###debugline

return popul, generations#results

# -------------------------------- Metrics -----------------------------------|

try:
with open('metrics.pkl', 'r') as f:
except IOError:
metrics = [1, 1, 1]

# --------------------------------- Main -------------------------------------|

def find_sequence_function(seq):
'''
[0, 1, 4, 9, 16] --> 'x^2'
'''
# ???
return

def tune_metrics(runt_time=60):
'''
Runs a simple GA for stochastic optimization of metrics
'''
global metrics
# ?
return

# ----------------------------- Workbench -----------------------------------|

target = [(2 + (10*x) + (x**2)) for x in range(10)]

popul = [[0, random_expr()] for i in range(500)]
popul = [fitness(i, target) for i in popul]
save = deepcopy(popul)
a = select(popul, 300)

##targets = [('x * 2', [x*2 for x in range(10)]),
##           ('(x * 2) - 1', [(x*2)-1 for x in range(10)]),
##           ('2 + (10*x) + (x**2)', [(2 + (10*x) + (x**2)) for x in range(10)])]

##targets = [('2 + (10*x) + (x**2)', [(2 + (10*x) + (x**2)) for x in range(10)])]
##bucket = []
##
##for target in targets:
##    bucket.append(genetic_algorithm_1(target[1], 500, 5, 3/10.0, max_time=2))
##    result = bucket[-1][0][0][1]
##    print '-'*20
##    print 'expression:', target[0]
##    print 'GA result: ', infix(result)
##    print target[1]
##    print [evaluate(result, x) for x in range(10)]

##pair = [[0, ['+', 'a', ['+', 'a', ['+', 'a', 'a']]]],
##        [0, ['+', 'b', ['+', 'b', ['+', 'b', 'b']]]]]
##
##print 'parent1', pair[0][1]
##print 'parent2', pair[1][1]
##for i in range(10):
##    print '\n\n'
##    child = combine(pair)
##    print '\n'
##    print pair[0][1]
##    print child[0][1]
##    print '\n'
##    print pair[1][1]
##    print child[1][1]

##pair = [[0, ['+', 'a', ['+', 'a', 'a']]],
##        [0, ['+', 'b', ['+', 'b', ['+', 'b', 'b']]]]]
##
##print '-' * 50
##print 'parent1', pair[0][1]
##print 'parent2', pair[1][1]
##for i in range(10):
##    print '\n\n'
##    child = combine(pair)
##    print '\n'
##    print pair[0][1]
##    print child[0][1]
##    print '\n'
##    print pair[1][1]
##    print child[1][1]

##def timer(expr, runs):
##    'returns time to evaluate the expression'
##    start = clock()
##    for run in xrange(runs):
##        evaluate(expr, 7)
##    end = clock()
##    return (end - start) / runs

##samples = [random_expr() for expr in range(100)]
##samples = [expr for expr in samples if test_expr(expr)]
##samples = [expr for expr in samples if evaluate(expr, 2) != evaluate(expr, 7)]

##for i in range(100):
##    samples = [random_expr() for expr in range(100)]
##    samples = [expr for expr in samples if test_expr(expr)]
##    samples = [expr for expr in samples if evaluate(expr, 2) != evaluate(expr, 7)]
##    print '\t', len(samples)

I think it's coming together quite well so far. There is one problem I've identified though. Every 5 or so times I run it it will either hang or it will produce one of the errors that are supposed to be handled in the try except block. The try except does seems to be doing it's job other than the one or two occasions where it just seems not to. Still trying to find out why that might be, if anyone knows please feel free to post, or with suggestions, feedback, criticism, etc.

log

I almost wrote clog for code log but ... you can see how that wouln't be very catchy

Mon | 12-11-2012
- added max_operand=200 argument to evaluate() as temporary fix to prevent overly large calculations hanging the machine.
- refactored test_expr() to also test if 'x' exists in exression and moved off workbench.
- started on code for genetic_algorithm.

Tues | 13-11-2012
- Got little more progress on genetic_algorithm(), almost complete.
- Started thinking about fitness() function.
- Made max_depth() function but wondering if there's a better way.
- Started on combine() function and a little stuck on a nice way to impliment it :s
* obviously didn't have allot of free time today or yesterday. Will use the time away from screen to try and refactor ideas rather than code.
* Wondering if this will help get me closer to future job as a developer... or not, still fun though

Wed | 14-11-2012
- Fixed max_depth() function.
- created get_any_list_at_depth() function (long name I know, it's temporary)
- Completed combine function. Shew that was tricky... and a bit messy.
managed to do it thanks to list pointers.
- removed max_operand=200 parameter from evaluate() and added some conditions in the function
instead that gaurd againsts excessive multiplication and power operations appropriately.
- Couldn't sleep so took a look at fitness function and actually put something together.
Now all I have to do is create the selection function which shouldn't be difficult at all, and finish up the GA
so I can do some tests and finally see some results... if any. Will use another GA (I know ga crazy in here) to
find the right settings for the three metrics used in the fitness function for:
curve shape difference
curve angle difference
raw sequence difference
* Note to self: Brute force approaches in the form of GA's can be more work than they seem

Thur | 15-11-2012
- Got GA to a point were it can at least return some results.
- Thought I had completed fitness() function last night, not really. Had much more free time today and it's only 1 in the afternoon, but most of the time went to debugging.
- Still haven't implemented any kind of mutation and metrics are still at [1, 1, 1]
but managed to get something working with just a random seed population and crossover operations and the fitness function I came up with last night.

Code:
--------------------
expression: x * 2
GA result:  (1 * (x + x))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
--------------------
expression: (x * 2) - 1
GA result:  ((x + x) - 1)
[-1, 1, 3, 5, 7, 9, 11, 13, 15, 17]
[-1, 1, 3, 5, 7, 9, 11, 13, 15, 17]
--------------------
expression: 2 + (10*x) + (x**2)
GA result:  ((4 + (x + 4)) * (x + 1))
[2, 13, 26, 41, 58, 77, 98, 121, 146, 173]
[8, 18, 30, 44, 60, 78, 98, 120, 144, 170]
Last one was getting there
Each of the three tests were run for 60 seconds and the population returned in sorted order of fitness. The first individual is taken to print the results above. Remember this is nothing but a random seed population, a simple fitness function that takes three different measurements and a crossover function. Quite pleased.
...later...
- Threw in some random mutation in the combine() function.
- Identified a problem with my algorithm. It's being affected by extreme elitism, causing the resulting population after a few generations to consist of clones of the same individual. In other words, it's been infected by Agent Smith. Will refacter GA1 to incorporate the rouletter wheel selection method while still being monogamous, if that still makes a difference.

Frid | 16-11-2012
Didn't do much today. Tried to impliment selection function which was supposed to be one of the easier tasks but failed twice. At this point due to the number of bugs I'm thinking of how the entire thing can be refactored. Not sure if classes would be suitable or what would be the best way to use them in this situation. Back to drawing board I suppose.

Teus | 20-11-2012
Haven't given up, but now that I'm re-factoring everything with classes I've decided to use the opportunity to finally learn some OOP with python. It's coming together.
You can find the work in progress here.
2. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Nov 2012
Posts
19
Rep Power
0

Genetic Algorithm

Wikipedia shows a Simple generational genetic algorithm procedure.

I came up with these two algorithms so far. I plan to implement both just to see if there's any difference. Assuming that either will work that is.

Code:
Genetic Algorithm 1: Monogamous

parameters:

A   population size
B   Number of offspring per pair (must be >= 2)
C   ratio of parents to offspring for new population.
i.e parents / offspring
eg. 2/8 = 0.25

1. Create initial population of A randomly generated expressions.
2. Put individuals into random pairs. (Population is reduced to half)
3. Combine parents to produce (B) offspring per pair.
4. Evaluate and select (A - (A * C)) best offspring for new population.
5. Select (A * C) best parents for new population.
6. Repeat from step 2.

crossover and mutation occur at step 3.
selection occurs at steps 4 and 5.
Max pop size: step 3 = (A + (A * B))

Genetic Algorithm 2: Polygamous

parameters:

A   population size
B   ratio of parents to offspring for new population.
i.e parents / offspring
eg. 2/8 = 0.25

1. Create initial population of A randomly generated expressions.
2. Evaluate all individuals.
3. Using roulette wheel selection method, pick two parents.
4. Combine parents to produce one offspring, append to new pop.
5. Repeat from step 3 until new pop size == (A - (A * B)).
6. Select (A * B) parents to append to new population.
7. Repeat from step 3.

crossover and mutation occur at step 3.
selection occurs at step 3 and 6.
Max pop size: at step 6 = (A + (A - (A * B)))
I was just thinking now, some kind of selection method can also take place at step 2 for GA1. Something like paring according to fitness or likeness or both, or perhaps unlikeness. I don't know. Going to call it a day for now.
3. is this problem 28 or 58 or 167 (other?). I haven't yet solved 167. As I recall, quadratic equations fit the diagonals of the Ulam spiral. (Working from center center outward along a diagonal.)

Hey, if you're going to use GA try it for a worthy problem:
http://projecteuler.net/problem=314
That problem, to me, seems suited to GA.
4. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Nov 2012
Posts
19
Rep Power
0
Originally Posted by b49P23TIvg
is this problem 28 or 58 or 167 (other?). I haven't yet solved 167. As I recall, quadratic equations fit the diagonals of the Ulam spiral. (Working from center center outward along a diagonal.)

Hey, if you're going to use GA try it for a worthy problem:
http://projecteuler.net/problem=314
That problem, to me, seems suited to GA.
Woah! That's a hard one. I'm only level 2 and I've managed to do the problems consecutively so far, so don't want to tackle that one for now. Still very interesting though.

The problem this was related to was 58 but it's not to solve the problem. I already solved it using a method from prob 28. I can't remember exactly why but I found that even if I found both quadratics (thanks for mentioning that btw) it still wouldn't work.

The GA is only for the find_sequence_function() which I thought might come in handy, but also I just want to code something
5. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Nov 2012
Posts
19
Rep Power
0
6. No Profile Picture
Registered User
Devshed Newbie (0 - 499 posts)

Join Date
Nov 2012
Posts
19
Rep Power
0
Found that annoying bug that causes it to hang. I guess I didn't think too carefully about the mathematics of the whole thing, but it was running into expression made up mostly of power operations that were causing astronomical calculations that would freeze my computer.

I solved it temporarily by implementing a max_operand=200 default parameter in the evaluate() function. But that seems like it will limit the value for the x parameter. So thinking of another way to avoid dangerous calculations.

Started implementing GA1. Thinking of how to implement the combination process pro-grammatically. The idea is that an expression will be considered in levels like depicted in the tree form in the Design/concept. The combination function should randomly choose a level and split each parent at that level, combining opposite halves. There's a name for that technique, forgot what it's called.