#1
  1. ID10T

    urllib download timeout


    Hi, I'm using urllib to download web pages like this:

    Code:
    try:
        web_page = urllib.urlopen(page)
    except:
        print "Couldnt download\t -> ", page
    If a page doesn't exist or isn't available, it triggers the exception, which is fine. However, some pages exist but either hang when downloading or are just too big (I'm on 56k). How can I set a timeout to trigger the exception if the operation isn't completed in, say, 20 seconds?


    Also, the only way I can currently terminate the program is with Ctrl-Z on the Linux command line. This, however, leaves the SQLite database that is in use corrupted or otherwise damaged. I get:

    Code:
    [root@localhost v.1]# sqlite database
    Unable to open database "database": file is encrypted or is not a database
    How can I make the program exit nicely when pressing the Esc key or something similar?
    I'm sure it's on the net somewhere, but I can't seem to find it, and the fact that I'm on a slow connection doesn't help, so any hints are appreciated.

    Cheers
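On the clean-exit question: in a Unix shell, Ctrl-Z suspends the process (SIGTSTP) rather than ending it, while Ctrl-C sends SIGINT, which Python raises as KeyboardInterrupt. Note that a bare `except:` also catches KeyboardInterrupt, so it can swallow the interrupt. Below is a minimal sketch, in Python 3 syntax, of a loop that commits after each page and closes the database on the way out. The function name `crawl`, the `process` callback and the `visited` table are hypothetical, and the standard `sqlite3` module is assumed rather than whatever SQLite binding the original program uses.

```python
import sqlite3


def crawl(conn, urls, process):
    """Process each URL, record it, and always close the database on exit.

    Returns the number of URLs that were fully processed and committed.
    """
    done = 0
    try:
        for url in urls:
            process(url)  # hypothetical download/parse step
            conn.execute("INSERT INTO visited (url) VALUES (?)", (url,))
            conn.commit()  # keep the file consistent after every page
            done += 1
    except KeyboardInterrupt:
        # Raised when the user presses Ctrl-C (SIGINT).
        print("Interrupted, shutting down cleanly")
    finally:
        conn.close()  # runs on normal exit and on Ctrl-C alike
    return done
```

Because each page is committed before the next one starts, an interrupt should lose at most the page in progress.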
  2. #2
    Hey, I have never worked with urllib, but isn't there a timeout variable you can set when you init the class? If not, is there some way to stop the thing?

    The timing could be done with a Thread and sleep(20).
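The thread idea could look roughly like the sketch below: the slow call runs in a worker thread, and the caller waits for at most a fixed number of seconds. This is Python 3 syntax, the name `run_with_deadline` is made up for illustration, and one limitation should be stated plainly: the worker is not killed when the deadline passes, it is only abandoned.

```python
import threading


def run_with_deadline(func, args=(), seconds=20.0):
    """Run func(*args) in a worker thread and wait at most `seconds`.

    Raises TimeoutError if no result is available in time. The worker
    is a daemon thread, so it will not keep the program alive, but it
    is not stopped and may keep running in the background.
    """
    result = {}

    def worker():
        try:
            result["value"] = func(*args)
        except Exception as exc:  # hand errors back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker)
    thread.daemon = True
    thread.start()
    thread.join(seconds)  # blocks for at most `seconds`

    if thread.is_alive():
        raise TimeoutError("no result after %s seconds" % seconds)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Unlike a socket timeout, this bounds the total time the caller waits, which is closer to "give up after 20 seconds".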
  4. #3
  5. ID10T
    OK, so I had another look around the web and found this solution:

    import socket
    and:

    Code:
    socket.setdefaulttimeout(1)
    try:
        web_page = urllib.urlopen(page)
    except:
        print "Couldnt download -> ", page
    This didn't work, and I got the following (edit: when the original error was encountered):

    Code:
    Traceback (most recent call last):
      File "./get_urls", line 205, in ?
        main()
      File "./get_urls", line 203, in main
        main_loop()
      File "./get_urls", line 178, in main_loop
        all = download_new_page(current_page)
      File "./get_urls", line 161, in download_new_page
        main_loop()
      File "./get_urls", line 176, in main_loop
        current_page = get_unchecked_url()
      File "./get_urls", line 144, in get_unchecked_url
        main_loop()
      File "./get_urls", line 178, in main_loop
        all = download_new_page(current_page)
      File "./get_urls", line 161, in download_new_page
        main_loop()
      File "./get_urls", line 176, in main_loop
        current_page = get_unchecked_url()
      File "./get_urls", line 144, in get_unchecked_url
        main_loop()
      File "./get_urls", line 178, in main_loop
        all = download_new_page(current_page)
      File "./get_urls", line 161, in download_new_page
        main_loop()
      File "./get_urls", line 178, in main_loop
        all = download_new_page(current_page)
      File "./get_urls", line 164, in download_new_page
        bing = web_page.read()
      File "/usr/lib/python2.4/socket.py", line 285, in read
        data = self._sock.recv(recv_size)
    socket.timeout: timed out
    The last line says the socket timed out, so the code seems to be going in the right direction.
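Reading the traceback from the bottom up, the timeout is raised by `recv` inside `web_page.read()` (line 164 of `get_urls`), which is a different statement from the `urlopen` call wrapped in the `try`. The small sketch below reproduces that behaviour without any network access, using a connected socket pair. It is Python 3 syntax, and `recv_with_timeout` is a made-up helper name; it sets the timeout on one socket instead of calling `socket.setdefaulttimeout`, to avoid changing global state.

```python
import socket


def recv_with_timeout(sock, seconds, size=4096):
    """Return received bytes, or None if nothing arrives within `seconds`."""
    sock.settimeout(seconds)
    try:
        return sock.recv(size)
    except socket.timeout:
        # Same exception type as the last line of the traceback.
        return None


reader, writer = socket.socketpair()
first = recv_with_timeout(reader, 0.2)   # nothing sent yet, so the wait expires
writer.sendall(b"data")
second = recv_with_timeout(reader, 0.2)  # data is waiting, so recv returns at once
reader.close()
writer.close()
print(first, second)  # None b'data'
```

The timeout applies to each individual socket operation, so it can fire on any `recv` during a download, not only while connecting.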

    I had a look at line 285 of socket.py, which is mentioned at the bottom of the traceback; it looks like this:
    Code:
        def read(self, size=-1):
            data = self._rbuf
            if size < 0:
                # Read until EOF
                buffers = []
                if data:
                    buffers.append(data)
                self._rbuf = ""
                if self._rbufsize <= 1:
                    recv_size = self.default_bufsize
                else:
                    recv_size = self._rbufsize
                while True:
    
                    data = self._sock.recv(recv_size) # this is line 285
    
                    if not data:
                        break
                    buffers.append(data)
                return "".join(buffers)
    Does anyone happen to know what it's doing or trying to do?
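In short, the quoted `read()` keeps calling `recv` until it returns an empty string, which is how a socket signals that the other side has closed the connection, and then joins all the chunks into one result. A stripped-down sketch of the same loop, in Python 3 syntax and with a made-up function name `read_all`, might look like this:

```python
import socket


def read_all(sock, chunk_size=8192):
    """Collect chunks from `sock` until recv() returns empty (EOF)."""
    chunks = []
    while True:
        data = sock.recv(chunk_size)
        if not data:  # empty result means the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


left, right = socket.socketpair()
right.sendall(b"hello ")
right.sendall(b"world")
right.close()  # closing the sender is what produces EOF on the reader
body = read_all(left)
left.close()
print(body)  # b'hello world'
```

Since the only exit from the loop is an empty `recv` result, swallowing an exception inside the loop without returning or breaking leaves it spinning, which would explain the endless loop described next.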

    I then did a try/except around the offending line of code like this:

    Code:
                    try:
                        data = self._sock.recv(recv_size)
                    except:
                        print "Whoops!!!"
    This worked well enough, preventing the program from crashing out with a traceback, but left it in some sort of endless loop.

    So I did this:
    Code:
                    try:
                        data = self._sock.recv(recv_size)
                    except:
                        print "Whoops!!!"
                        return ""
    which seems to be working fine.
    I'm not sure exactly what I'm asking for anymore because I'm just blundering around, but I suppose it would be good to know if anyone knows a way to get it to work without having to edit socket.py.

    Or even better, if someone has a clue as to what's going on. Besides that, the problem is temporarily solved.
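One way to get the same effect without touching socket.py is to put both the open and the `read()` inside the `try` in the calling code, since the timeout can be raised by either. The sketch below is written for Python 3, where the call is `urllib.request.urlopen` and accepts a `timeout` argument; on the Python 2.4 shown in the traceback the equivalent would be `socket.setdefaulttimeout` plus `urllib.urlopen`. The function name `fetch` is made up. It is also worth noting that the patched `read()` above returns an empty string on a timeout, so anything already buffered is thrown away.

```python
import urllib.request


def fetch(url, timeout=20.0):
    """Return the body of `url` as bytes, or None if it cannot be read.

    `timeout` applies to each blocking socket operation, not to the
    download as a whole, so a page that trickles in slowly can still
    take longer than `timeout` seconds in total.
    """
    try:
        response = urllib.request.urlopen(url, timeout=timeout)
        try:
            return response.read()  # a timeout here is caught below too
        finally:
            response.close()
    except OSError as exc:
        # In Python 3, URLError and socket.timeout are OSError subclasses.
        print("Couldn't download -> %s (%s)" % (url, exc))
        return None
```

A caller can then skip a page when `fetch(page)` returns `None`, which matches the original intent of the `try`/`except` around `urlopen`.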
