#1

    urllib2.urlopen() REVISITED


    I am using Python to check whether certain URLs exist.

    The list would be something like:
    http://aaa-6020-sql-425h.qa.lan/php/admin/launch.php
    http://aaa-6020-oci-425h.qa.lan/php/admin/launch.php

    Most URLs open without a problem and the code works as it should, but a few URLs on one particular server make the script hang. I am fairly sure this is a server issue, because wget hangs on them from the command line as well. What I would like is a timeout, but as far as I can tell urllib2 doesn't have one. Is there a way to create my own timeout for urllib2.urlopen(url)?

    Code:
    import urllib2

    url = "http://aaa-6020-oci-425h.qa.lan/php/admin/launch.php"
    try:
        urllib2.urlopen(url)
        site_success_flag = True
    except Exception:  # any failure to open the URL (URLError, HTTPError, ...)
        site_success_flag = False
    Also, I forgot to mention that the server is running Python 2.2.2, so I do not have access to the socket timeout functions (socket.settimeout() and socket.setdefaulttimeout() were only added in Python 2.3).
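On an interpreter newer than the poster's 2.2.2 there is a built-in answer: Python 2.3 added socket.setdefaulttimeout(), Python 2.6 added a timeout argument to urllib2.urlopen(), and in Python 3 the function lives in urllib.request. The sketch below is written for Python 3; url_exists() is a hypothetical helper name, and like the original code it treats any network error, HTTP error status or timeout as "does not exist". Note that the timeout bounds each blocking socket operation (connecting, each read), not the total duration of the request.

```python
import http.client
import urllib.request


def url_exists(url, timeout=5.0):
    """Return True if `url` answers with a non-error HTTP status.

    Returns False on any network error, HTTP error status (4xx/5xx) or
    timeout. `timeout` limits each blocking socket operation, in seconds.
    """
    try:
        # urlopen() raises urllib.error.HTTPError for 4xx/5xx responses
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # HTTPException covers malformed responses from the server
        return False
```

Checking a list then becomes a loop such as `[u for u in urls if not url_exists(u, timeout=10)]`, which never blocks for more than the timeout on any single socket operation.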
    Last edited by Theeggman; June 18th, 2004 at 03:23 PM.
#2
    Your best bet would probably be to look through the urllib module, especially the URLopener class, and see how it works. Unfortunately, since both modules are built on sockets, if there is no way to set a socket timeout then you might have a problem.

    Sorry I couldn't be more help,

    Mark.

#3
    If you can get hold of the socket object, you could try using its setsockopt() method to set the timeout. This calls the low-level C function of the same name, so you will need to read the C documentation for it. You want to set the SO_RCVTIMEO and SO_SNDTIMEO options. These should be available both on UNIX and on Windows with WinSock 2.0, but the value format differs: UNIX expects a struct timeval, while WinSock expects a DWORD timeout in milliseconds.
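A minimal sketch of the setsockopt() approach, in Python 3 syntax. It assumes a UNIX-like system whose struct timeval is two C longs (true on Linux); set_recv_timeout() is a hypothetical helper name. The hard part from the post remains: urllib2 does not hand you its socket, so the demo uses a socket pair created directly.

```python
import socket
import struct


def set_recv_timeout(sock, seconds):
    """Make blocking recv()/send() calls on `sock` give up after `seconds`.

    Assumes a UNIX-style struct timeval of two C longs (tv_sec, tv_usec).
    On Windows, SO_RCVTIMEO/SO_SNDTIMEO take a DWORD of milliseconds instead.
    """
    whole = int(seconds)
    micro = int((seconds - whole) * 1000000)
    timeval = struct.pack("ll", whole, micro)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)
    # SO_SNDTIMEO takes the same value and limits blocking send() calls
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)


if __name__ == "__main__":
    a, b = socket.socketpair()  # nothing is ever written to b
    set_recv_timeout(a, 0.5)
    try:
        a.recv(1)
    except OSError as exc:  # the OS reports EAGAIN/EWOULDBLOCK on timeout
        print("recv timed out:", exc)
```

The socket stays in ordinary blocking mode; the kernel, not Python, enforces the limit, which is why the option could in principle help even with a library that calls recv() internally.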

    You could also spawn multiple threads to check several servers in parallel. This will stop the whole program from hanging if one server is down, and will be much faster overall.
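A minimal sketch of the threaded approach in Python 3: each URL is checked in its own thread, and the main thread waits at most `timeout` seconds in total using Thread.join(timeout). A thread that is still blocked is not killed; it is simply abandoned, and marking it as a daemon thread lets the program exit anyway. check_all() is a hypothetical name, and `check` stands for any function that takes a URL and returns True or False (for example one wrapping urlopen()).

```python
import threading
import time


def check_all(urls, check, timeout):
    """Run check(url) for every URL in its own thread.

    Waits at most `timeout` seconds in total and returns {url: result}.
    A URL whose check is still running when time is up maps to None.
    """
    results = {}
    lock = threading.Lock()

    def worker(url):
        try:
            ok = check(url)
        except Exception:
            ok = False  # treat any error as "does not exist"
        with lock:
            results[url] = ok

    threads = []
    for url in urls:
        # daemon=True so a hung thread does not keep the interpreter alive
        t = threading.Thread(target=worker, args=(url,), daemon=True)
        t.start()
        threads.append(t)

    # Share one deadline across all joins so the total wait is bounded
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))

    with lock:
        return {url: results.get(url) for url in urls}
```

Because every thread waits concurrently, the total run time is roughly the slowest answer (capped at `timeout`) rather than the sum of all of them.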

    Last edited by DevCoach; June 19th, 2004 at 08:04 AM.
#4
    Doh, never thought of that. I like the threaded idea; I wonder if you could run some kind of timer and force the thread to close when the timer reaches a certain level? I'm not sure about the speed increase, though: I've read about a lot of cases where a programmer introduced threads to speed up a program and it ended up running at the same speed. Maybe that was specific to those particular cases?

    Mark.

#5
    Originally Posted by netytan
    Doh, never thought of that. I like the threaded idea; I wonder if you could run some kind of timer and force the thread to close when the timer reaches a certain level? I'm not sure about the speed increase, though: I've read about a lot of cases where a programmer introduced threads to speed up a program and it ended up running at the same speed. Maybe that was specific to those particular cases?

    Mark.

    I think the speed increase will be real. If an application is doing a lot of processing, then splitting it into threads will not speed it up: the computer still has to do the same amount of work (and in CPython the global interpreter lock lets only one thread execute Python code at a time). In this case, however, the flow of the code goes something like this...

    send a request to the server...

    ... wait for a response...

    ... keep waiting....

    ... etc...

    ... get a response and process it (or time out)

    So probably 99% of the time the program is doing nothing except waiting for a response, and it could just as easily be waiting for 20 responses as for 1.
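A small sketch of the point about waiting, in Python 3. Here time.sleep() stands in for a request that spends all its time waiting on the server (an assumption; real requests also do a little work), and the function names are hypothetical. Sequential waits add up, while threads wait at the same time.

```python
import threading
import time


def simulated_request(delay):
    """Stand-in for a network request that only waits for the server."""
    time.sleep(delay)


def run_sequential(n, delay):
    """Make n simulated requests one after another; return elapsed seconds."""
    start = time.monotonic()
    for _ in range(n):
        simulated_request(delay)
    return time.monotonic() - start


def run_threaded(n, delay):
    """Make n simulated requests in parallel threads; return elapsed seconds."""
    start = time.monotonic()
    threads = [threading.Thread(target=simulated_request, args=(delay,))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start
```

With 20 requests of 0.1 seconds each, run_sequential(20, 0.1) takes at least 2 seconds, while run_threaded(20, 0.1) finishes in roughly the time of a single request.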

    Unfortunately there is no way of killing a thread in Python, which is a shame. The usual solution is to have the thread poll an exit flag, but it can't do that here because it is blocked inside the socket call. You could do it if you waited on the socket with select() and a timeout, but since we are going through the urllib2 library we do not have that kind of low-level control over the socket. I have not looked at the urllib2 code; it might be possible to subclass it to do this, although IMHO it would be more trouble than it is worth.
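A sketch of the select()-plus-exit-flag pattern in Python 3, for code that owns its socket (which urllib2 callers do not). recv_or_stop() is a hypothetical helper: instead of blocking in recv() indefinitely, it waits in select() for at most `poll_interval` seconds at a time and checks a threading.Event between waits, so another thread can ask it to give up.

```python
import select
import threading


def recv_or_stop(sock, stop_event, poll_interval=0.1, bufsize=4096):
    """Wait for data on `sock` until it arrives or `stop_event` is set.

    Returns the received bytes, or None if the caller asked us to stop.
    """
    while not stop_event.is_set():
        # Block for at most poll_interval seconds, then re-check the flag
        readable, _, _ = select.select([sock], [], [], poll_interval)
        if readable:
            return sock.recv(bufsize)
    return None
```

The thread never needs to be killed: setting the event from outside makes it return within about `poll_interval` seconds.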

#6
    I appreciate all the feedback. I tried the thread idea, but the script would still hang; I think that was because I had no way to kill the hung thread.

    But I did find a solution using the signal module. I set an alarm before each try/except statement: if the alarm goes off, the URL does not exist; otherwise it exists. It seems sort of sloppy, but it works well.

    Code:
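    import signal
    import urllib2

    def handler(signum, frame):
        # Runs when the alarm fires; raising here interrupts the blocked urlopen()
        raise IOError("Couldn't open URL: timed out (signal %d)" % signum)

    url = "http://aaa-6020-oci-425h.qa.lan/php/admin/launch.php"

    # Set the signal handler and a 5-second alarm (SIGALRM is UNIX-only)
    signal.signal(signal.SIGALRM, handler)
    signal.alarm(5)

    # This urlopen() may hang indefinitely; the alarm breaks it out
    try:
        try:
            urllib2.urlopen(url)
            print "exists"
        except Exception:  # the timeout or any other failure to open the URL
            print "does not exist"
    finally:
        signal.alarm(0)  # cancel the alarm so it cannot fire after a success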
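The same alarm technique, sketched for Python 3 as a reusable helper (call_with_alarm() and AlarmTimeout are hypothetical names). It uses signal.setitimer() so the timeout can be fractional, always cancels the timer and restores the previous handler, and, like the original, only works on UNIX and when called from the main thread.

```python
import signal


class AlarmTimeout(Exception):
    """Raised when the alarm fires before the call finishes."""


def call_with_alarm(func, seconds, *args, **kwargs):
    """Call func(*args, **kwargs), interrupting it after `seconds` seconds.

    UNIX-only; must run in the main thread, because Python only lets the
    main thread install signal handlers.
    """
    def handler(signum, frame):
        raise AlarmTimeout("timed out after %s seconds" % seconds)

    old_handler = signal.signal(signal.SIGALRM, handler)
    # setitimer() accepts fractional seconds, unlike signal.alarm()
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func(*args, **kwargs)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        # Restore whatever handler was installed before (default if unknown)
        signal.signal(signal.SIGALRM,
                      old_handler if old_handler is not None else signal.SIG_DFL)
```

For the URL check this would be used as `call_with_alarm(urllib.request.urlopen, 5, url)`, treating AlarmTimeout (or any other error) as "does not exist".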
