#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    5
    Rep Power
    0

    Index out of range problems


    Hi everyone,

    I'm rather new to Python programming and am teaching myself through YouTube tutorials. One of them showed how to fetch content from websites, which is called screen scraping. The code from the tutorial works fine with its example RSS feed, but when I try it on another feed, it fails with
    "print findPatTitle [i] IndexError: list index out of range"

    Here's the code that produces the error:
    Code:
    from bs4 import BeautifulSoup
    from urllib import urlopen
    import re
    
    webpage = urlopen('http://www.kayak.com/h/rss/deals').read()
    patFinderTitle = re.compile('<title>(.*)</title>')
    findPatTitle = re.findall(patFinderTitle,webpage)
    
    listIterator = []
    listIterator[:] = range(2,10)
    
    for i in listIterator:
        print findPatTitle [i]
    I'd just like to know why the feed from the tutorial works while the one I'm trying does not.

    Thanks in advance!
  2. #2
  3. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,963
    Rep Power
    481
    I think you need a better tutorial. The program is too complicated.

    Code:
    from urllib import urlopen
    import re
    
    webpage = urlopen('http://www.kayak.com/h/rss/deals').read()
    patFinderTitle = re.compile('<title>(.*)</title>')
    
    titles = patFinderTitle.findall(webpage)
    
    for (i,title,) in enumerate(titles):
        print('title %2d: %s'%(i,title,))
    Run this program and you'll see that there are 8 titles, at indexes 0 through 7. Your list iterator covers range(2,10), that is, 2 through 9, so it tried to access the nonexistent indexes 8 and 9, which appropriately raises IndexError.
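    If you still want "items 2 through 9 when they exist", one option is to slice the list instead of indexing it, since a slice is clamped to the list's length. The following is only a minimal sketch: sample_feed is a made-up stand-in for the downloaded page (so nothing is fetched), and the names title_pattern and wanted are illustrative, not from the tutorial.

```python
import re

# Hypothetical stand-in for the downloaded feed (assumption, not real data):
# one channel title followed by three item titles.
sample_feed = (
    "<title>Deals feed</title>\n"
    "<title>Deal A</title>\n"
    "<title>Deal B</title>\n"
    "<title>Deal C</title>\n"
)

title_pattern = re.compile(r"<title>(.*)</title>")
titles = title_pattern.findall(sample_feed)

# titles[8] would raise IndexError here, because only indexes 0..3 exist.
# A slice never raises: it returns whatever part of the range 2..9 is present.
wanted = titles[2:10]

# start=2 keeps the printed numbers aligned with the original list positions.
for index, title in enumerate(wanted, start=2):
    print("title %2d: %s" % (index, title))
```

    With a feed that has fewer titles than expected, the loop simply prints fewer lines instead of stopping with an error.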
    [code]Code tags[/code] are essential for python code and Makefiles!
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    5
    Rep Power
    0

    Perfect


    Thanks a lot!
