#1
  1. No Profile Picture
    I hate nerds
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Posts
    540
    Rep Power
    0

    how to measure speed of python script


    i wrote two functions f1 and f2, and f1 is considerably faster (f2 is about O(f1^5)) than the other. however, f1 in my opinion should be slower!

    f1 and f2 both have a for loop. the difference is that in
    f1 it passes a whole dictionary object to a subfunction and that subfunction sorts the dictionarys keys for every iteration of the loop.

    f2 on the other hand does NOT pass the dictionary object and only sorts once.

    i need some sort of time object that i can insert between statements and can tell me the difference in milliseconds. i tried using the time module, but the lowest it goes is seconds so everything rounds down to 0 seconds.
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    The timeit module should do exactly what you want here, but i have no idea how much detail this goes into.. if this isnt detailed enough for you then i'd probably use the profile or hotshot module!

    http://www.python.org/doc/current/li...le-timeit.html
    http://www.python.org/doc/current/li...e-profile.html
    http://www.python.org/doc/current/li...e-hotshot.html

    I wouldnt mind seeing your code, maybe i could explain why one is faster than the other.

    Mark.
    programming language development: www.netytan.com Hula

  4. #3
  5. No Profile Picture
    I hate nerds
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Posts
    540
    Rep Power
    0

    thanks!


    hey netyan. thanks buddy, you are always very helpful.

    here is the background info. DOMlist is a global list of dictionaries. each dictionary is a tree storing the DOM model of a webpage. the key is a numeric nodeid, and the value is a string that contains the tagtype, i.e. <a>, <table>, etc. treekeys is a global list, that i use to store the sorted keys of a dictionary (i sort it to speed up getnodesinrange2 and gettagsinrange)

    the function is "numcorrect" and the one that is not commented out is the faster one. the faster one calls gettagsinrange which takes a dictionary as argument. the slower one calls getnodesinrange2, which does not take the dictionary argument. you can also see that the faster numcorrect also only sorts treekeys a lot less.

    Code:
    #def numcorrect(tagtype, windowlen):
    #	global treekeys
    #	correct = 0
    #	rules = associations[tagtype]
    #	for d in DOMlist:	#for each DOMtree
    #		treekeys = d.keys()
    #		treekeys.sort(sortbylevel)
    #		#t = time.localtime()
    #		for nodeid, nodetag in d.items():
    #			tir = []
    #			for n in getnodesinrange2(nodeid, windowlen):
    #				tir.append(d[n])
    #			
    #			if nodetag == tagtype and contains(rules, tir) == 1:
    #				correct += 1
    #		#t2 = time.localtime()
    #		#print t2[5] - t[5], "seconds"
    #	return correct
    
    def numcorrect(tagtype, windowlen):
    	correct = 0
    	rules = associations[tagtype]
    	for d in DOMlist:	#for each DOMtree
    		for nodeid, nodetag in d.items(): #for each node in DOMtree
    			if nodetag == tagtype and contains(rules, gettagsinrange(d, nodeid, windowlen)) == 1:
    				correct += 1
    	return correct
    
    def getnodesinrange2(node, windowlen):
    	nodesinrange=[]
    	for i in treekeys:
    		if node.count(".") - i.count(".") > windowlen: #i is too shallow, continue
    			continue
    		if i.count(".") - node.count(".") > windowlen: #i is too deep for window, so return
    			return nodesinrange
    		#node possibly within window, make sure not the same node
    		if node == i:
    			continue
    		#node possibly within window, get the distance
    		d = nodedistance(node, i, node.count(".") + i.count(".") + 2)
    		#append node, and weight.  weight is 
    		if d <= windowlen:
    			nodesinrange.append(i)
    	return nodesinrange
    	
    def gettagsinrange(dict, node, windowlen):
    	keys = dict.keys()
    	keys.sort()
    	nodesinrange=[]
    	for i in keys:
    		if node.count(".") - i.count(".") > windowlen: #i is too shallow, continue
    			continue
    		if i.count(".") - node.count(".") > windowlen: #i is too deep for window, so return
    			break
    		#node possibly within window, make sure not the same node
    		if node == i:
    			continue
    		#node possibly within window, get the distance
    		d = nodedistance(node, i, node.count(".") + i.count(".") + 2)
    		#append node, and weight.  weight is 
    		if d <= windowlen:
    			nodesinrange.append(i)
    	tagsinrange = []
    	for j in nodesinrange:
    		tagsinrange.append(dict[j])
    	return tagsinrange
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Very welcome sad!

    as you said, one function does more than the other, so that can explain some of the preformance differance. Then you have to take into acount that sertain things take longer than others!

    In the commented out version of numcorrect you define a global variable, you should note that globas take considerably longer to access than local variables and are usually not required.. so by passing the dictionary as an argument instead of a global your actually creating a local variable (which as mentioned above are faster - although i dont know why)

    Thats just one example, and i havn't really really read too far into DOM either or i might be able to give u better ways to sort. Oh while i remember, inline if statments where you can, not only will this speed things up but your code will be smaller i.e.

    Code:
    if node.count(".") - i.count(".") > windowlen:
    	continue
    if i.count(".") - node.count(".") > windowlen:
    	break
    if node == i:
    	continue
    could also be written like this..

    Code:
    if node.count(".") - i.count(".") > windowlen or node == i:
        continue
    else:
        break
    This one assumes that if node.cound - i.cound or node == i dont return true, break the hell outa there! If this assumbtion isnt right then you could use an elif instead i.e.

    Code:
    ...
    elif i.count(".") - node.count(".") > windowlen: break
    ...
    Mark.
    programming language development: www.netytan.com Hula

  8. #5
  9. No Profile Picture
    I hate nerds
    Devshed Novice (500 - 999 posts)

    Join Date
    Jul 2003
    Posts
    540
    Rep Power
    0
    thansk mark

    i thought it could be the global variable access...although i still thought that by cutting down so many other things it would be a lot faster.

    thanks for the inline function help, i will take that into consideration.

IMN logo majestic logo threadwatch logo seochat tools logo