#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Oct 2012
    Posts: 4
    Rep Power: 0

    Need help: can't install BeautifulSoup, can't web scrape


    I am brand new to coding and need some code for a project at work. I hope people here can help me. I have read and read, and tried and tried, but I have no idea why I can't get my code to work.

    So here is my situation: I need to get some web scraping done, but I can't even install BeautifulSoup on my Mac. I followed everyone else's instructions and typed python setup.py install, and this is what I got:
    >>> python setup.py install
    File "<stdin>", line 1
    python setup.py install
    ^
    SyntaxError: invalid syntax

    I don't really know how to get it installed.

    Then I have to scrape data from this site:
    drexel.bncollege.c 0m/webapp/wcs/stores/servlet/TBWizardView?catalogId=10001&storeId=31061&langId=-1
    (I had to change the "o" to "0" because I can't post URLs in a post yet.)


    I need every campus, every term, every department, every course, every section, and every book with its price. Then I have to put this data into an Excel file, with each field under its own column heading.
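For the "put it into Excel" part, one low-effort route is a CSV file, which Excel opens directly with each field in its own column. Below is a minimal sketch, assuming Python 3 and the standard-library csv module; the column names and the placeholder row are hypothetical, not data taken from the site. Writing a native .xlsx file would need a third-party library such as openpyxl instead.

```python
import csv
import io

# Hypothetical column headings, one per field listed above
FIELDNAMES = ["Campus", "Term", "Department", "Course", "Section", "Book", "Price"]


def write_rows(rows, fileobj):
    """Write a header row, then one CSV row per dictionary in `rows`."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder values for illustration only, not scraped data
sample = [
    {"Campus": "Campus A", "Term": "Term 1", "Department": "DEPT",
     "Course": "101", "Section": "001", "Book": "Example Title", "Price": "0.00"},
]

# For a real file, use open("books.csv", "w", newline="") instead of StringIO
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

Each dictionary the scraper produces becomes one spreadsheet row. DictWriter places values by key using the fieldnames order, so the order in which fields are scraped does not matter.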

    I made an attempt at scraping, but I have no idea how to put the results into Excel.
    Below is what I wrote. I just used the code from a tutorial, because I don't know where else I can

    # Python 2 code; in Python 3, urlopen lives in urllib.request
    from urllib import urlopen
    from BeautifulSoup import BeautifulSoup
    import re

    # (I changed the URL below because I can't post URLs here yet;
    # restore the real address, including "http://", before running.)
    URL = 'drexel.bncollege.c /webapp/wcs/stores/servlet/TBWizardView?catalogId=10001&storeId=31061&langId=-1'
    webpage = urlopen(URL).read()

    patFinderTitle = re.compile('<title>(.*)</title>')
    patFinderLink = re.compile('<link rel.*href="(.*)"/>')

    findPatTitle = re.findall(patFinderTitle, webpage)
    findPatLink = re.findall(patFinderLink, webpage)

    # Matches 1-19, as in the tutorial; stop early if there are fewer matches
    last = min(20, len(findPatTitle), len(findPatLink))
    for i in range(1, last):
        print findPatTitle[i]
        print findPatLink[i]

        articlePage = urlopen(findPatLink[i]).read()

        # Keep the 1000 characters that start at the schedule heading
        divBegin = articlePage.find('<div align="left">Schedule for Winter Quarter 12-13</div>')
        if divBegin == -1:
            continue  # heading not found on this page
        article = articlePage[divBegin:divBegin + 1000]

        soup = BeautifulSoup(article)

        # Print every <p> element inside that chunk
        for para in soup.findAll('p'):
            print para
            print "\n"
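The two regular expressions in the code above only match when a whole <title> or <link> tag sits on one line, and the greedy `.*` can swallow several tags in one match. As a hedged alternative, here is a minimal Python 3 sketch that swaps in the standard-library html.parser module to collect the same two things. It is not the tutorial's method, and TitleLinkParser is a hypothetical name.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect the text of every <title> and the href of every <link>."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
            self.titles.append("")
        elif tag == "link":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>
        if self._in_title:
            self.titles[-1] += data


# A <link> tag split across two lines, which the regex above would miss
html = """<html><head><title>Example page</title>
<link rel="stylesheet"
      href="style.css"/></head></html>"""
parser = TitleLinkParser()
parser.feed(html)
print(parser.titles)  # ['Example page']
print(parser.links)   # ['style.css']
```

Once it is installed, BeautifulSoup's findAll (already used above) does this kind of tag matching with less code. The sketch only shows why a real parser copes with layout that the regexes do not.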

    Thanks for your time and help.
#2
    Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date: Aug 2011
    Posts: 4,851
    Rep Power: 481
    I don't know anything about Beautiful Soup, but...

    You should run this program from the operating system shell, not from the python shell.


    $ python setup.py install

    or, if you happen to use a DOS computer,
    A:> python setup.py install
    [code]Code tags[/code] are essential for python code and Makefiles!
#3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date: Oct 2012
    Posts: 4
    Rep Power: 0
    Originally Posted by b49P23TIvg
    Don't know anything about beautiful soup, however....

    You should run this program from the operating system shell, not from the python shell.


    $ python setup.py install

    or, if you happen to use a DOS computer,
    A:> python setup.py install
    I ran it in Terminal on my Mac, though, and it still gave me that error. I have no idea what I'm supposed to do.
#4
    Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date: Aug 2011
    Posts: 4,851
    Rep Power: 481
    $

    That is your shell prompt.


    >>>

    This is a python prompt.


    Running the python setup command in a terminal is correct.

    Starting python first, and then giving the python setup command is wrong.
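The difference between the two prompts can be shown from Python itself. Whatever is typed at >>> is compiled as Python code, and python setup.py install is not valid Python, hence the SyntaxError in the first post. A small sketch (works in Python 2.7 and 3; is_valid_python is a hypothetical helper name) that runs both kinds of input through the built-in compile():

```python
def is_valid_python(source):
    """Return True if `source` compiles as Python code, False on SyntaxError."""
    try:
        compile(source, "<stdin>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_python("python setup.py install"))  # False: a shell command, not Python
print(is_valid_python("import sys"))               # True: ordinary Python
```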




    Correct procedure:
    1) Open a terminal.
    2) Use the cd command to change to the directory holding the Beautiful Soup setup.py file.
    3) Run python setup.py install
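Once the install has run, one way to confirm that it worked is to ask Python whether the module can be imported. This is a minimal sketch, assuming it is run with the same python command that ran setup.py (it works in Python 2.7 and 3). is_installed is a hypothetical helper name. Beautiful Soup 3, which the code in the first post imports, is the module BeautifulSoup, while Beautiful Soup 4 is the package bs4.

```python
import importlib


def is_installed(module_name):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Check both Beautiful Soup generations
for name in ("BeautifulSoup", "bs4"):
    print("%s installed: %s" % (name, is_installed(name)))
```

If this reports False right after a successful-looking install, one common cause is a second Python installation on the Mac, so it is worth running the check with exactly the python used for setup.py.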
