#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    38
    Rep Power
    11

    httplib, urllib: how do I keep a connection open?


    I couldn't find any info on this topic and I need help.

    I have a website that I want to capture webpage source from, but I need to log in to that site first before I gain access to the webpages I want. I'm using Python 2.3.3.

    Code:
    import httplib
    import urllib

    def get_team_center():
        # Form fields for the login POST.
        params = urllib.urlencode({'username': 'penngray1', 'password': 'testpwd'})
        headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
        conn = httplib.HTTPConnection("fantasygames.sportingnews.com")
        conn.request("POST", "/crs/home_check_reg.html", params, headers)
        response = conn.getresponse()
        response.read()  # the body must be read before the connection is reused
        if response.status == 302:
            # The login redirected; now ask for the page I actually want.
            conn.request("GET", "/baseball/fullseason/ultimate/team_center.html")
            r2 = conn.getresponse()
            return r2.msg
        return str(response.msg) + " " + str(response.status)


    The connection doesn't stay open after my POST, and for some reason I can't access team_center.html.

    thanks
    penn
    Last edited by penngray; April 25th, 2004 at 03:50 PM.
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    This may have to do with the way HTTP works (and urllib is built on top of it). In HTTP a connection is opened, the data is sent and the connection is closed; this is exactly what your web browser does. I'm not aware of any way to have a long-running connection with HTTP, sorry.

    What are you trying to do? Maybe there is another solution that you haven't considered.

    Mark.

  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    38
    Rep Power
    11
    Thanks for the reply.

    I'm trying to capture results from a fantasy baseball league and post summaries on my own website. I can already do it with Windows-based programs, but I want to do it dynamically using Python (more specifically mod_python with Apache).

    I was able to figure it out last night. Essentially, I capture the cookies from the response headers and send them with each subsequent page request. I found a great little third-party package called ClientCookie that does all of this for me.
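For anyone reading later, here is a minimal sketch of the cookie approach described above. It is not the original code: it assumes Python 3, where the job ClientCookie did is handled by the standard library's `http.cookiejar`, and it talks to a small hypothetical local server (`DemoSite`, with made-up `/login` and `/team_center.html` paths and a made-up `session` cookie) instead of the real site.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real site: logging in sets a cookie."""

    def _send(self, status, body, extra_headers=()):
        self.send_response(status)
        for name, value in extra_headers:
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Consume the form body, then hand out a session cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._send(200, b"logged in", [("Set-Cookie", "session=abc123; Path=/")])

    def do_GET(self):
        # Only serve the page when the session cookie comes back.
        if "session=abc123" in self.headers.get("Cookie", ""):
            self._send(200, b"team center")
        else:
            self._send(403, b"please log in")

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_after_login(base_url):
    """Log in, then fetch a protected page with the same cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any proxy set in the environment
        urllib.request.HTTPCookieProcessor(jar),  # stores and resends cookies
    )
    form = urllib.parse.urlencode({"username": "demo", "password": "demo"}).encode()
    opener.open(base_url + "/login", form).read()
    return opener.open(base_url + "/team_center.html").read()


server = HTTPServer(("127.0.0.1", 0), DemoSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
page = fetch_after_login("http://127.0.0.1:%d" % server.server_port)
server.shutdown()
server.server_close()
print(page)
```

The point of the sketch is that the session lives in the cookie, not in the TCP connection, so each request can use a fresh connection as long as the same jar is attached to the opener.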


    NOW!! When I run this program from Python, it pulls up the webpage(s) in 2-3 seconds, but when I run it through mod_python, called from a form on a webpage, it takes 20 seconds or gives me an error (see below).

    I suspect it has to do with URL resolution. Running it from Python, I'm a regular user on Linux and the URL resolution is instant. Running it from a webpage means I'm running it as the user 'nobody' set up in Apache's httpd.conf. I have also tried setting httpd.conf to a specific user with all the rights and paths, but no luck there either.

    Any suggestions on how to make it work with mod_python? This is a drag.
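One way to check the name-resolution suspicion above is to time a lookup on its own, once from a shell and once from inside the handler. This is only a sketch assuming Python 3; `time_lookup` is a hypothetical helper, and `127.0.0.1` is a placeholder to be swapped for the real host name.

```python
import socket
import time


def time_lookup(hostname):
    """Return (seconds, addresses) for one name lookup of hostname."""
    start = time.monotonic()
    infos = socket.getaddrinfo(hostname, 80, 0, socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each entry ends with a sockaddr tuple whose first item is the address.
    addresses = sorted(set(info[4][0] for info in infos))
    return elapsed, addresses


elapsed, addresses = time_lookup("127.0.0.1")  # placeholder: use the real host name
print("%.3f s -> %s" % (elapsed, addresses))
```

If the lookup is fast in both environments, the delay is more likely somewhere else, such as the connection itself or the redirect handling.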


    The error....

    Code:
    Mod_python error: "PythonHandler mod_python.publisher"
    
    Traceback (most recent call last):
    
      File "/usr/lib/python2.3/site-packages/mod_python/apache.py", line 299, in HandlerDispatch
        result = object(req)
    
      File "/usr/lib/python2.3/site-packages/mod_python/publisher.py", line 136, in handler
        result = util.apply_fs_data(object, req.form, req=req)
    
      File "/usr/lib/python2.3/site-packages/mod_python/util.py", line 361, in apply_fs_data
        return object(**args)
    
      File "/usr/local/apach2/fantasy/TSN/htmlread.py", line 44, in getwebpage
        resp = getwebpage.testwebpage()
    
      File "getwebpage.py", line 3, in testwebpage
        resp = getwebpage("http://fantasygames.sportingnews.com/crs/home_check_reg.html",'username=dug1&password=nuts1769')
    
      File "getwebpage.py", line 12, in getwebpage
        response = ClientCookie.urlopen(request)
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 829, in urlopen
        return _opener.open(url, data)
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 526, in open
        response = meth(req, response)
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 426, in http_response
        response = self.parent.error(
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 543, in error
        result = apply(self._call_chain, args)
    
      File "/var/tmp/python2.3-2.3.3-root/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 181, in http_error_302
        return self.parent.open(new)
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 520, in open
        response = urllib2.OpenerDirector.open(self, req, data)
    
      File "/var/tmp/python2.3-2.3.3-root/usr/lib/python2.3/urllib2.py", line 326, in open
        '_open', req)
    
      File "/var/tmp/python2.3-2.3.3-root/usr/lib/python2.3/urllib2.py", line 306, in _call_chain
        result = func(*args)
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 754, in http_open
        return self.do_open(httplib.HTTP, req)
    
      File "/usr/lib/python2.3/site-packages/ClientCookie/_urllib2_support.py", line 612, in do_open
        raise URLError(err)
    
    URLError:
    Last edited by penngray; April 26th, 2004 at 02:53 PM.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    You left out the most important part of the traceback! The last line!
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    38
    Rep Power
    11
    The last line IS

    URLError:

    with nothing after it.

    You can run it at www.fantasysportstools.com/TSN/testhtmlread.html

    It may work two or three times, but if you keep going back and clicking the "login" button it will fail.
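When the traceback ends in a bare `URLError:`, the underlying cause may still be available on the exception object. The sketch below assumes Python 3, where `urllib.error.URLError` carries a `reason` attribute; `describe_failure` is a hypothetical helper, and the failure is forced by connecting to a local port with nothing listening.

```python
import socket
import urllib.error
import urllib.request


def describe_failure(url):
    """Try to open url; return a description of the failure, or None on success."""
    # ProxyHandler({}) ignores any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        opener.open(url, timeout=5).close()
    except urllib.error.URLError as err:
        # err.reason holds the underlying socket error or message.
        return "%s: %r" % (type(err).__name__, err.reason)
    return None


# Find a local port with nothing listening, to force a connection failure.
probe = socket.socket()
probe.bind(("127.0.0.1", 0))
closed_port = probe.getsockname()[1]
probe.close()

message = describe_failure("http://127.0.0.1:%d/" % closed_port)
print(message)
```

Logging the reason this way inside the mod_python handler would show whether the failure is a refused connection, a timeout, or a lookup error.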
  10. #6
  11. Banned ;)
    Devshed Supreme Being (6500+ posts)

    Join Date
    Nov 2001
    Location
    Woodland Hills, Los Angeles County, California, USA
    Posts
    9,615
    Rep Power
    4247
    Originally Posted by netytan
    This may have to do with the way HTTP works (and urllib is built on top of it). In HTTP a connection is opened, the data is sent and the connection is closed; this is exactly what your web browser does. I'm not aware of any way to have a long-running connection with HTTP, sorry.

    What are you trying to do? Maybe there is another solution that you haven't considered.

    Mark.
    Actually, there is a Connection: Keep-alive option that you can pass in the headers if you use the HTTP/1.1 protocol. In ancient days (Python 1.5.2), httplib only supported HTTP/1.0, so someone wrote a replacement httplib that supports HTTP/1.1 (http://www.lyra.org/greg/python/). This replacement httplib has been part of the standard distribution since Python 2.0, so you don't need to download it if you have Python 2.0 or later. All you need to do is add an extra item to headers:
    headers = {"Connection": "Keep-alive", "Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}

    Note that the other side may not necessarily respect your Keep-alive request, though.
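To make the Keep-alive suggestion concrete, here is a minimal sketch assuming Python 3, where httplib is named `http.client`. The local server (`EchoPort`) is a hypothetical test fixture that replies with the client's source port, so two identical replies indicate that both requests travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPort(BaseHTTPRequestHandler):
    """Hypothetical test server: replies with the client's source port."""

    protocol_version = "HTTP/1.1"  # lets the server keep the socket open

    def do_GET(self):
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def two_requests_one_connection(host, port):
    """Send two GETs on one HTTPConnection and return both response bodies."""
    conn = http.client.HTTPConnection(host, port)
    ports = []
    for path in ("/first", "/second"):
        conn.request("GET", path, headers={"Connection": "Keep-alive"})
        response = conn.getresponse()
        # The body must be read in full before the connection is reused.
        ports.append(response.read())
    conn.close()
    return ports


server = HTTPServer(("127.0.0.1", 0), EchoPort)
threading.Thread(target=server.serve_forever, daemon=True).start()
seen_ports = two_requests_one_connection("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()
print(seen_ports[0] == seen_ports[1])
```

In HTTP/1.1 connections are persistent by default, so the explicit header mostly documents the intent. As noted above, the server can still close the connection, in which case the client has to reconnect.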
  12. #7
  13. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Wow, you learn something new every day. Thanks for the info, scorp. I'll remember that if it ever comes up again.

    Gotta run, take care guys,

    Mark.

