#1
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 9
    Rep Power: 0

    Having trouble logging into a web site and saving the source code


    Hi guys,

    I'm getting really frustrated by my failed attempts to use the requests module to log in to a web site. Can anyone help me?

    I am able to retrieve and save the source of other sites, but I really, really want to be able to log in to a site, then go to another site that requires me to be logged in and save that page's source code.

    I appreciate the help! My code thus far is below.

    Tony



    Code:
    import cookielib
    import urllib
    import urllib2
    import requests
    import sys
    
    downloaded_data  = urllib2.urlopen('https://login.yahoo.com/config/login?.src=spt&.intl=us&.lang=en-US&.done=http://sports.yahoo.com/fantasy/')
    
    text_file = open("Output.txt", "w")
    
    for line in downloaded_data.readlines():
        print line
        text_file.write(line)
    text_file.close()

#2
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 9
    Rep Power: 0

    Anyone?

#3
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 4
    Rep Power: 0

    Your code is working (though you have unnecessary imports): it reads in the HTML that the Yahoo URL returns and writes it to a text file.

    However, I'm guessing that the login page is not what you're looking for. That URL will only ever return the login page, even if you have already logged in.

    So the problem is that the script is doing *exactly* what you're telling it to do:

    1. Go to the secure Yahoo URL you provided in the script.

    2. Read the raw HTML output into memory (which is always a Yahoo login page, so useless data).

    3. Write the raw HTML response to a text file.

    It is doing that. So your next step is probably to get the right URL, because the one you're using is useless.

    I wish I could help more but unfortunately I'm inexperienced (I found your post while going to ask a question myself!). Good luck!
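
    One quick way to confirm that the saved file is only a login form is to look for a password field in the HTML. The sketch below is a rough heuristic, not a reliable test: the class and function names are made up for illustration, and it is written for Python 3 (in Python 2 the parser lives in the HTMLParser module).

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Records whether the parsed HTML contains an <input type="password">."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.found_password_field = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        if tag == "input":
            attributes = dict(attrs)
            if (attributes.get("type") or "").lower() == "password":
                self.found_password_field = True


def looks_like_login_page(html_text):
    """Heuristic (hypothetical helper): a password input suggests a login form."""
    finder = PasswordFieldFinder()
    finder.feed(html_text)
    return finder.found_password_field
```

    Calling looks_like_login_page() on the text of the downloaded page before writing Output.txt would show whether the script got real content or just the login form.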

#4
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 9
    Rep Power: 0

    Thanks for your response, Mark.

    Yes, it does exactly what I'm telling it to lol, but I really want to add the part that logs me in so that I can THEN go to the other site, which I can only view when logged in.

#5
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 9
    Rep Power: 0

    Here's my updated code guys:

    Code:
    import cookielib
    import urllib
    import urllib2
    import requests
    import sys
    
    s = requests.Session()
    data={"login":"PythonTesting","password":"123Hello123"}
    url="https://login.yahoo.com/config/login?.src=spt&.intl=us&.lang=en-US&.done=http://sports.yahoo.com/fantasy"
    r = s.post(url,data=data)
    
    downloaded_data = urllib2.urlopen('http://basketball.fantasysports.yahoo.com/nba/reg/joinleague/public')
    
    text_file = open("Output.txt", "w")
    
    for line in downloaded_data.readlines():
        text_file.write(line)
    text_file.close()

    It unfortunately doesn't work though.

    Oh and the user id and password in my code are valid.

    Tony
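
    A likely reason the script above still gets the login page: s.post(...) stores any cookies on the requests session s, but urllib2.urlopen(...) is a separate client and sends none of them. Below is a minimal sketch of fetching the second page through the same session. The function name is made up for illustration, and it assumes the form fields from the post ("login", "password") are what the site expects. Real login forms often need extra hidden fields, so a successful login is not guaranteed.

```python
def save_page_after_login(session, login_url, credentials, page_url, out_path):
    """Log in, then fetch page_url with the SAME session and save the HTML.

    `session` is expected to behave like requests.Session (post/get methods).
    This is a hypothetical helper, not part of the requests library.
    """
    # POST the form fields; cookies set by the site are kept on `session`.
    login_response = session.post(login_url, data=credentials)
    login_response.raise_for_status()

    # Reuse the session so the login cookies are sent with this request.
    page_response = session.get(page_url)
    page_response.raise_for_status()

    # Write the raw bytes so no encoding guesswork is needed.
    with open(out_path, "wb") as out_file:
        out_file.write(page_response.content)
    return page_response


# Usage (not run here; needs the requests package and real credentials):
#   import requests
#   save_page_after_login(requests.Session(), url, data,
#                         "http://basketball.fantasysports.yahoo.com/nba/reg/joinleague/public",
#                         "Output.txt")
```

    Checking r.status_code and r.url after the POST also helps to see whether the login was actually accepted before fetching anything else.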

#6
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 4
    Rep Power: 0

    Your code is pointing to a page in which there is no useful data - it's just a login page. From there, even if you logged in, it just takes you to a page to join a league.

    I think you need a direct URL to the data you are trying to scrape.

#7
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 9
    Rep Power: 0

    I'm sorry about that Mark. The code below now contains correct login info and correct URLs. That second URL contains player information that I'd like to be able to scrape.

    Code:
    import cookielib
    import urllib
    import urllib2
    import requests
    import sys
    
    s = requests.Session()
    data={"login":"pythontesting","password":"123Hello123"}
    url="https://login.yahoo.com/config/login?.src=spt&.intl=us&.lang=en-US&.done=http://sports.yahoo.com/fantasy"
    r = s.post(url,data=data)
    
    downloaded_data  = urllib2.urlopen('http://basketball.fantasysports.yahoo.com/nba/204781/research')
    
    text_file = open("Output.txt", "w")
    
    for line in downloaded_data.readlines():
        text_file.write(line)
    text_file.close()
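
    Since the script already imports cookielib and urllib2 without using them together, another option is to stay in the standard library and give the URL opener a cookie jar, so that the login request and the page request share cookies. The sketch below is written for Python 3, where those modules are called http.cookiejar and urllib.request. The helper names are hypothetical, and nothing in this thread confirms that Yahoo accepts a plain form POST like this.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); every request made via `opener` shares `jar`."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, login_url, credentials, page_url):
    """POST the credentials, then GET page_url with the same cookies."""
    form_data = urllib.parse.urlencode(credentials).encode("utf-8")

    # Passing `data` makes this a POST; cookies from the reply go into the jar.
    with opener.open(login_url, data=form_data):
        pass

    # The same opener sends the stored cookies back automatically.
    with opener.open(page_url) as response:
        return response.read()
```

    The important part is that both requests go through the same opener. A bare urlopen() call afterwards would again start without any cookies.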

#8
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 4
    Rep Power: 0

    Did you see this by any chance -

    Code:
    https://github.com/mleveck/YHandler

    Apparently Yahoo has an API for their fantasy data.

#9
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 9
    Rep Power: 0

    Wow, that seems very complex lol, but thank you. I'll take a closer look at it. Am I not kind of close though? I feel like I am.

#10
    Registered User
    Devshed Newbie (0 - 499 posts)
    Join Date: Nov 2013
    Posts: 9
    Rep Power: 0

    Actually Mark, I got it now! I set up my Yahoo league so that you don't HAVE to be logged in!

    I was so happy when my code started working, BUT NOW, a mere hour later, I've started getting all these errors. What the hell is going on? lol. I didn't change anything!

    Code:
    Traceback (most recent call last):
      File "C:\Users\Tony\Desktop\test2.py", line 4, in <module>
        import requests
      File "C:\Python27\lib\requests\__init__.py", line 58, in <module>
        from . import utils
      File "C:\Python27\lib\requests\utils.py", line 24, in <module>
        from .compat import parse_http_list as _parse_list_header
      File "C:\Python27\lib\requests\compat.py", line 7, in <module>
        from .packages import charade as chardet
      File "C:\Python27\lib\requests\packages\__init__.py", line 3, in <module>
        from . import urllib3
      File "C:\Python27\lib\requests\packages\urllib3\__init__.py", line 16, in <module>
        from .connectionpool import (
      File "C:\Python27\lib\requests\packages\urllib3\connectionpool.py", line 38, in <module>
        from .request import RequestMethods
      File "C:\Python27\lib\requests\packages\urllib3\request.py", line 12, in <module>
        from .filepost import encode_multipart_formdata
      File "C:\Python27\lib\requests\packages\urllib3\filepost.py", line 15, in <module>
        from .fields import RequestField
      File "C:\Python27\lib\requests\packages\urllib3\fields.py", line 7, in <module>
        import email.utils
    ImportError: No module named utils
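
    One common cause of "No module named utils" raised from "import email.utils" is a file named email.py (or a leftover email.pyc) sitting in the same folder as the script, which Python finds before the standard library's email package. That is only a guess here, since the thread doesn't show what is in the folder. The sketch below lists files in a directory that would shadow the given module names. The function name is made up for illustration.

```python
import os


def find_shadowing_files(directory, module_names):
    """Return paths in `directory` that would be imported instead of the
    named modules (hypothetical helper; checks .py/.pyc files and packages)."""
    shadows = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        base, extension = os.path.splitext(entry)
        is_module_file = os.path.isfile(path) and extension in (".py", ".pyc")
        is_package_dir = os.path.isdir(path) and os.path.isfile(
            os.path.join(path, "__init__.py"))
        # For a module file compare the name without extension,
        # for a package compare the directory name itself.
        name = base if is_module_file else entry
        if (is_module_file or is_package_dir) and name in module_names:
            shadows.append(path)
    return shadows


# Usage (path taken from the traceback, not run here):
#   print(find_shadowing_files(r"C:\Users\Tony\Desktop", ["email"]))
# A quicker manual check is:  import email; print(email.__file__)
```

    If the printed path for email points at the Desktop folder rather than somewhere under C:\Python27\lib, renaming that file and deleting the matching .pyc would be the thing to try.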
