I am brand new to coding and need some code for a work project. I hope people here can help me: I have read and tried over and over, and I have no idea why I can't get my code to work.
Here is my situation. I need to do some web scraping, but I can't even install BeautifulSoup on my Mac. I followed everyone else's instructions and typed python setup.py install, and this is what I got:
>>> python setup.py install
File "<stdin>", line 1
python setup.py install
^
SyntaxError: invalid syntax
I don't really know how to get it installed.
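Note that python setup.py install is a shell command: it belongs in Terminal, not at the >>> prompt, which is why Python reports a SyntaxError there. Once an install has been attempted, a small check like the one below can be run from Python to see whether the module is importable. This is only a sketch, assuming Python 3 and assuming the current Beautiful Soup release, which is imported as bs4 (the older release used in the code further down is imported as BeautifulSoup). The helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported.

    is_installed is a hypothetical helper, not part of any library.
    Pass top-level names only (for example "bs4", not "bs4.element").
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "bs4" is the import name of the current Beautiful Soup release;
    # "BeautifulSoup" is the import name of the older release.
    for name in ("bs4", "BeautifulSoup"):
        print(name, "installed:", is_installed(name))
```

If both lines print False, the install did not reach the Python that is running the check.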
Then I have to scrape data from this site:
drexel.bncollege.c 0m/webapp/wcs/stores/servlet/TBWizardView?catalogId=10001&storeId=31061&langId=-1
(I had to change the o to a 0 because I can't post URLs here.)
I need every campus, every term, every department, every course, every section, and each book and its price. Then I have to put this data into an Excel file, with each item under its own column heading.
I made an attempt at the scraping but have no idea at all how to put the results into Excel.
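For the Excel part, one simple route is a CSV file, which Excel opens directly. The sketch below assumes Python 3 and assumes the scraped values have already been collected as a list of rows, one row per book. The column names are taken from the list above; the function name write_rows and the file name are hypothetical.

```python
import csv

# Column headings, in the order described in the post.
COLUMNS = ["Campus", "Term", "Department", "Course", "Section", "Book", "Price"]


def write_rows(path, rows):
    """Write a header line, then one line per book, to a CSV file.

    Each row is expected to be a sequence with one value per column,
    in the same order as COLUMNS.
    """
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


if __name__ == "__main__":
    # An empty list writes only the header; real rows would come from the scraper.
    write_rows("books.csv", [])
```

The csv module handles quoting, so a book title that contains a comma still ends up in a single cell.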
Below is what I wrote. I just used the code from a tutorial, because I don't know where else I can
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re
webpage = urlopen('(drexel.bncollege.c /webapp/wcs/stores/servlet/TBWizardView?catalogId=10001&storeId=31061&langId=-1').read()
(I changed the URL above because I can't post URLs here.)
patFinderTitle = re.compile( '<title>(.*)</title>')
patFinderLink = re.compile( '<link rel.*href="(.*)"/>')
findPatTitle = re.findall(patFinderTitle,webpage)
findPatLink = re.findall(patFinderLink ,webpage)
listIterator = []
listIterator[:] = range(1,20)
for i in listIterator:
    print findPatTitle[i]
    print findPatLink[i]
    # fetch the linked page and cut out the 1000 characters after the heading
    articlePage = urlopen(findPatLink[i]).read()
    divBegin = articlePage.find('<div align="left">Schedule for Winter Quarter 12-13</div>')
    article = articlePage[divBegin:(divBegin+1000)]
    # parse that slice and print every paragraph in it
    soup = BeautifulSoup(article)
    paraglist = soup.findAll('p')
    for i in paraglist:
        print i
        print "\n"
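The regular expressions above match greedily and break easily on real HTML, so a parser is usually the safer way to pull out the title and link tags. The sketch below assumes Python 3 and deliberately swaps in the standard-library html.parser module instead of BeautifulSoup, so that it runs with nothing installed. The names TitleAndLinkParser and extract are hypothetical, and the input is assumed to be an HTML string that has already been downloaded.

```python
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Collect the text of each <title> and the href of each <link> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.titles.append("")
        elif tag == "link":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the current title.
        if self.in_title:
            self.titles[-1] += data


def extract(html):
    """Return (titles, links) found in an HTML string."""
    parser = TitleAndLinkParser()
    parser.feed(html)
    parser.close()
    return parser.titles, parser.links
```

Calling extract(page_text) on a downloaded page returns two lists, which can then be looped over in place of findPatTitle and findPatLink.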
Thanks for your time and help.