#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    25
    Rep Power
    0

    Web Scraping the MLB Web Site


    I'm trying to scrape the MLB website and pull the stats for one player. When I open the player's page on the official site and use View Source on my desktop, the player's stats are not in the source. What is strange is that when I view the source on my iPad, I can see the stats.
    Here is my code:
    Code:
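    import urllib2
    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2

    # The part after '#' is a URL fragment. It is not part of the HTTP request
    # (it is meant for the page's scripts), so it does not change the HTML served.
    url = ("http://mlb.mlb.com/team/player.jsp?player_id=429664"
           "#gameType='R'&sectionType=career&statType=1&season=2012&level='ALL'")
    soup = BeautifulSoup(urllib2.urlopen(url).read())

    # Prints the raw HTML as served; urllib2 does not run the page's JavaScript.
    print soup.prettify()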
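The URL above carries its `gameType`, `sectionType`, `statType` and `season` settings after a `#`. That part is a fragment, which is not sent to the server in the HTTP request, so it cannot select which stats the server returns; only scripts running in the page can read it. A quick way to see the split, sketched with Python 3's standard library (in Python 2 the same function is `urlparse.urldefrag`):

```python
from urllib.parse import urldefrag, urlsplit

url = ("http://mlb.mlb.com/team/player.jsp?player_id=429664"
       "#gameType='R'&sectionType=career&statType=1&season=2012&level='ALL'")

# urldefrag separates the part a server can see from the fragment.
base, fragment = urldefrag(url)
print(base)      # http://mlb.mlb.com/team/player.jsp?player_id=429664
print(fragment)  # gameType='R'&sectionType=career&statType=1&season=2012&level='ALL'

# urlsplit shows the same thing field by field: only the path and the query
# string end up in the HTTP request line; the fragment stays on the client.
parts = urlsplit(url)
print(parts.path, parts.query)  # /team/player.jsp player_id=429664
```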
#2
    Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,931
    Rep Power
    481
    Look at the page source. The stats aren't there!
    The page is dynamic: the browser runs the page's JavaScript while rendering, and that script loads the stats from a database (in Cooperstown, no doubt).
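To make that concrete: `urllib2` (or any plain HTTP client) returns the HTML exactly as the server sends it and never runs the page's JavaScript. If the stats are inserted by a script after the page loads, the downloaded HTML only holds an empty placeholder. The sketch below uses Python 3's `html.parser` on made-up markup (the `player-stats` id and the `loadStats` call are hypothetical, not MLB's actual page) to collect the text a non-JavaScript client would actually see:

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only: an empty container that the
# page's JavaScript would fill in after fetching the stats from the server.
SAMPLE_HTML = """
<html><body>
  <h1>Player page</h1>
  <div id="player-stats"></div>
  <script>loadStats("player-stats");</script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collects non-blank text nodes, skipping the contents of <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script source is delivered as data too, so filter it out here.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


parser = VisibleText()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.chunks)  # ['Player page']
```

A browser would run the script and fill the `div`; a plain fetch plus BeautifulSoup only ever sees the empty element.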
    [code]Code tags[/code] are essential for python code and Makefiles!
#3
    Registered User
    Originally Posted by b49P23TIvg
    Look at the page source. The stats aren't there!
    The page is dynamic, that is it evaluates the javascript when rendering and the stats load from a database (in Cooperstown, no doubt.)
    Can you give me an example? I have no idea how to do this, and I can't find a good tutorial online. I would like to get the country where this player was born. Thanks.
#4
    Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,183
    Rep Power
    9398
    MLB is very territorial about its data.

    Terms of Use, Section 1:
    You must not use the MLBAM Properties or Community Features (defined below) to:... (xii) use automated scripts to collect information from or otherwise interact with this Website or the other MLBAM Properties
    Your script is dead in the water.
#5
    Contributing User
    Interesting. Good thing I didn't recommend evaluating the JavaScript to get the information.

    Did you try this link?


    Robinson Jose (Mercedes) Cano (twitter: @RobinsonCano)

    Position: Second Baseman
    Bats: Left, Throws: Right
    Height: 6' 0", Weight: 210 lb.

    Born: October 22, 1982 in San Pedro de Macoris, San Pedro de Macoris, D.R. (Age 30)
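Once you have the bio text (for example, the lines quoted above), pulling out the birth country is plain string parsing. A minimal Python 3 sketch, assuming the birthplace is always written as a comma-separated list that ends with the country and is optionally followed by `(Age N)`; `birth_country` is a hypothetical helper name:

```python
import re

# The "Born:" line as it appears in the player bio above.
BORN_LINE = ("Born: October 22, 1982 in San Pedro de Macoris, "
             "San Pedro de Macoris, D.R. (Age 30)")

# Assumption: "Born: <date> in <City, Region, Country>" with an optional
# trailing "(Age N)". The lazy place group stops before that suffix.
BORN_RE = re.compile(
    r"Born:\s*(?P<date>.+?)\s+in\s+(?P<place>.+?)\s*(?:\(Age \d+\))?$")


def birth_country(line):
    """Return the last comma-separated field of the birthplace, or None."""
    match = BORN_RE.search(line)
    if match is None:
        return None
    return match.group("place").split(",")[-1].strip()


print(birth_country(BORN_LINE))  # D.R.
```

The site abbreviates some countries (here `D.R.` for the Dominican Republic), so getting full names would need a small lookup table of your own.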
#6
    Transforming Moderator
    Originally Posted by b49P23TIvg
    Interesting. Good I didn't recommend evaluating the javascript to get the information.

    Did you try this link?
    They don't like it either.
    6. Site Content.

    You may not frame, capture, harvest, or collect any part of the Site or Content without SRL's advance written consent. [...]
