#1
    Junior Member (Join Date: Jul 2003; Posts: 10)

    can somebody help me make my script more efficient??


    I wrote this little script that goes to finance.yahoo.com and retrieves a stock quote.

    Later this will be a function in a larger program, and it will need to be executed more often. It seems to take about half a second after entering the symbol before it displays the info. I realize this may be due to network lag, but I'm not sure how efficient my parsing is. I'm not very familiar with Python; is there a more efficient way to parse the information out of the HTML?

    Thanks,

    Corey

    Code:
    #!/usr/bin/python
    import urllib2


    #Get the ticker symbol and build the quote URL
    symbol = raw_input("Please enter the stock's symbol: ")
    web_url = "http://finance.yahoo.com/q?s=" + symbol


    #Retrieve the webpage and store its lines in a list
    #(not named "list" -- that would shadow the built-in type)
    lines = urllib2.urlopen(web_url).readlines()


    #puts the line containing Last Trade price into price_string
    #puts the line containing Trade Time into time_string
    price_string = time_string = ""
    for line in lines:
      if "Last Trade:" in line:
        price_string = line
      if "Trade Time:" in line:
        time_string = line


    #parses html out of price_string; puts the text in quote
    #in_tag is initialised before the loop so the first character
    #can't be tested before the flag has a value
    quote = ""
    in_tag = 0
    for c in price_string:
      if c == '<':
        in_tag = 1
        continue
      if c == '>':
        in_tag = 0
        continue
      if in_tag == 0:
        quote = quote + c
    print quote


    #parses html out of time_string; puts the text in time
    time = ""
    in_tag = 0
    for c in time_string:
      if c == '<':
        in_tag = 1
        continue
      if c == '>':
        in_tag = 0
        continue
      if in_tag == 0:
        time = time + c
    print time
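    For what it's worth, the character-by-character index scan can be replaced with substring tests; here is a sketch in modern Python (the sample HTML lines below are made up for illustration, not taken from the real Yahoo page):

```python
def strip_tags(line):
    """Remove <...> tags from a line, keeping the text between them."""
    out = []
    inside = False
    for ch in line:
        if ch == '<':
            inside = True
        elif ch == '>':
            inside = False
        elif not inside:
            out.append(ch)
    return ''.join(out).strip()

def find_field(lines, label):
    """Return the first line containing label, with HTML tags removed."""
    for line in lines:
        if label in line:       # substring test replaces the inner index loop
            return strip_tags(line)
    return None

# made-up sample lines standing in for the downloaded page
sample = ['<td>Last Trade:</td><td><b>32.91</b></td>\n',
          '<td>Trade Time:</td><td>4:00pm</td>\n']
print(find_field(sample, 'Last Trade:'))   # Last Trade:32.91
```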
#2
    Junior Member (Join Date: Sep 2003; Posts: 10)
    I've been working with Python for a bit and your commands look sound, but I'm tired and there could be an extra bit of code in there that's slowing it down :/ so my suggestion is to see if you can get on a faster line and try it out there; if it's still too slow, then debug it step by step, removing and replacing code.
#3
    Junior Member (Join Date: Oct 2003; Location: Tucson, AZ; Posts: 29)
    I'm not sure if it would be more efficient, as I don't know how Python handles its resources... but you could download the URL into a file, clean up the resources from accessing the web, and then parse the file.

    I don't know if it would be quicker, because you have to store it in a file. However, it's possible that it might be faster as the amount of usage goes up. Again, it depends on how resources are handled.
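    The download-to-file idea can be sketched with urllib's urlretrieve, which saves the page to disk so it can be parsed (or re-parsed) without holding the connection open. A sketch only; the Yahoo URL in the usage comment is the one from the thread:

```python
import os
import tempfile
try:
    from urllib.request import urlretrieve   # modern location
except ImportError:
    from urllib import urlretrieve           # 2003-era location

def fetch_to_file(url, path=None):
    """Download url into a local file and return the filename."""
    if path is None:
        # create a scratch file to hold the page
        fd, path = tempfile.mkstemp(suffix='.html')
        os.close(fd)
    filename, headers = urlretrieve(url, path)
    return filename

# usage sketch: fetch_to_file('http://finance.yahoo.com/q?s=IBM')
# then open() and parse the saved file as many times as needed
```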
#4
    Junior Member (Join Date: Sep 2003; Posts: 10)
    Oh, and what version of Python are you running? Maybe that can make a difference as well... Python 2.3.3 is the best version by far ^_^
#5
    Junior Member (Join Date: Jul 2003; Posts: 10)
    I'm on a fast connection and ping 40 ms to finance.yahoo.com.

    I think urlopen does store it in a temporary file; then I put that into a list for parsing. The website is always changing, though, and I always need the new info, so I can't cache it.

    And I am using the newest version of Python.

    I made it so that after it gets the last piece of info, which will always be below the other piece in the HTML code, it breaks out of the loop... but there's no discernible difference in the execution time. I'm pretty sure that the delay is mostly the time it takes their webserver to respond.

    I could make it skip the first 100 or so lines (which is just CSS junk)... But mostly I'd just like to know, out of curiosity, whether my parsing algorithm is efficient or not.
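    The early-exit idea can be sketched like this (a sketch, with the line-matching done by substring test rather than the original index loop):

```python
def scan(lines):
    """Return (price_line, time_line), stopping as soon as both are found."""
    price_line = time_line = None
    for line in lines:
        if 'Last Trade:' in line:
            price_line = line
        if 'Trade Time:' in line:
            time_line = line
        if price_line is not None and time_line is not None:
            break               # skip the rest of the page
    return price_line, time_line
```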

    Thanks,

    Corey
#6
    Contributing User (Join Date: Oct 2003; Location: Canada; Posts: 185)

    Re: can somebody help me make my script more efficient??


    Originally posted by A|pha_N3rd
    I wrote this little script that goes to finance.yahoo.com and retrieves a stock quote. [...]
    There is nothing wrong with using someone else's code, man. Just make sure you give credit to the person who first wrote it. In this case, Inkdm posted that exact code on a different website. I suggest starting here to get the basics of Python.
#7
    Contributing User (Join Date: Jul 2003; Posts: 133)
    I timed the different parts of your code. Don't worry about the loops; they're fast enough. It's the downloading of the webpage that takes the time.
    Code:
    retrieve: 4.8
    puts * 2: 0.18
    parse price: 0.0005
    parse time: 0.0005
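    A minimal sketch of how per-section timings like these can be taken with the time module (the workload here is a stand-in, not the real Yahoo fetch):

```python
import time

def timed(func, *args):
    """Run func(*args) once; return (result, elapsed seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start

result, elapsed = timed(sum, range(100000))
print(result, elapsed)   # elapsed will vary from run to run
```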
#8
    Hello World :) (Join Date: Mar 2003; Location: Hull, UK; Posts: 2,537)
    You can do the same thing in 3 lines using regular expressions; whether or not this is faster I don't know... but in theory it should/could be.

    Anyway, here's my code (just to promote the power of regex); you might have to make a few changes to it in order to get the formatting you want, though.

    Code:
    #!/usr/bin/env python
    
    import re, urllib
    
    code = raw_input('Please enter the stock\'s symbol:')
    
    tokens = re.compile('(Trade Time:.+?|Last Trade:.+?)\n')
    source = urllib.urlopen('http://finance.yahoo.com/q?s=%s' % code).read()
    source = tokens.findall(re.sub('<.+?>', '', source))
    
    print source
    This will give you a list containing two values, 'Trade Time:value' and 'Last Trade:value'; with a few little changes to the regex you should be able to get any of the other values too!

    Mark.
    programming language development: www.netytan.com Hula

#9
    Contributing User (Join Date: Jul 2003; Posts: 133)
    The problem with regexes is that they're slow. It's almost always faster to parse something in another manner, it's just that regexes are more powerful and easier. Don't use them in this case.
#10
    Junior Member (Join Date: Jul 2003; Posts: 10)
    If you're going to accuse me of plagiarizing a few lines of crappy code, at least cite the original. I can assure you that I wrote it myself...

    percivall, thanks. Executing the line that fetches the website 20 times takes ~10 seconds for me. If I didn't have to wait for the first one to finish, it'd be faster... How do I execute two lines of code at the same time?

    netytan, wow, that's amazing! It took me 30 minutes of sorting through webpages to figure out how to do it, and you did it in 3 lines. I think I'm going to learn more about Python.
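    Running several downloads at the same time is usually done with threads, since each request spends most of its time waiting on the network. A sketch (the fetch argument is a stand-in for the real urlopen call):

```python
import threading

def fetch_all(symbols, fetch):
    """Call fetch(symbol) for every symbol in parallel; return a dict."""
    results = {}
    def worker(sym):
        results[sym] = fetch(sym)
    threads = [threading.Thread(target=worker, args=(s,)) for s in symbols]
    for t in threads:
        t.start()               # all downloads begin at once
    for t in threads:
        t.join()                # wait until every one has finished
    return results

# usage with a dummy fetch function instead of a live request:
quotes = fetch_all(['IBM', 'MSFT'], lambda s: s.lower())
```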
#11
    Contributing User (Join Date: Jul 2003; Posts: 133)
    (continued from my last post) ... Though in this case, the lesser speed of regexes will make absolutely no difference, so use them.
#12
    Hello World :) (Join Date: Mar 2003; Location: Hull, UK; Posts: 2,537)
    Thanks. At first glance your code seems really "OMG lol", but it actually makes a lot of sense if you sit down and read it (of course, having the page source in front of you does help a lot).

    I'm a big regex fan; they are definitely one of the most powerful tools to have in a language, IMO! I'd hate to sit down and write a parser from scratch every time I want something parsed, especially since webpages change from time to time.

    The problem with regexes is that they're slow. It's almost always faster to parse something in another manner, it's just that regexes are more powerful and easier...
    Not sure about this one, perc; I don't really see how using Python's re module could be slower than parsing a webpage with multiple for loops (not that those are slow!), bearing in mind that the re module is written in C/C++. Of course you have the added import time, but how much can that really cost!

    Maybe you could time the two scripts? Oh, just out of interest, what are you using for this, pystone?
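    One way to settle it is the timeit module; here is a sketch comparing a regex tag-strip against a hand-rolled loop on a made-up HTML line (the numbers printed will vary by machine):

```python
import re
import timeit

line = '<td>Last Trade:</td><td><b>32.91</b></td>'
tag = re.compile('<.+?>')

def re_strip():
    return tag.sub('', line)        # regex removes every <...> tag

def loop_strip():
    out, inside = [], False
    for ch in line:
        if ch == '<':
            inside = True
        elif ch == '>':
            inside = False
        elif not inside:
            out.append(ch)
    return ''.join(out)

# both approaches must agree before the timing means anything
assert re_strip() == loop_strip() == 'Last Trade:32.91'
print('regex:', timeit.timeit(re_strip, number=10000))
print('loop: ', timeit.timeit(loop_strip, number=10000))
```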

    Edit: Infinite, we're on Python 2.3.2... 2.3.3 hasn't been released yet, dude.

    Have fun guys,
    Mark.
    Last edited by netytan; October 26th, 2003 at 09:04 AM.
    programming language development: www.netytan.com Hula

