April 19th, 2013, 12:55 PM
Web-Page Data Mining With Python 3.2?
So, I've been interested for a LONG, LONG time in copying data off of websites and storing it externally en masse. My code framework of choice, of course, is Python.
I don't know if there is a good library out there to help me do this, or even if the basic library could do it.
One particular example, out of about five databases I've been ogling for quite some time, is BibleGateway.
I wanted to write a program to pull information off their database, effectively creating a little Python Bible applet.
Dictionary.com, maybe Google... I've wanted to find some way to make a searchable hardcopy backup of some Facebook data (of my own, of course) for quite some time now. I just don't even know what point A would be.
Does anyone know how I might do this with Python?
April 19th, 2013, 12:58 PM
How do you propose to connect to their database? I assume you mean screen scraping the data out of the HTML, right?
April 19th, 2013, 01:29 PM
You will need a computer for your database.
April 19th, 2013, 03:44 PM
I haven't used it for anything non-trivial myself, but BeautifulSoup is a well-regarded library for scraping HTML.
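For reference, a minimal sketch of what scraping with it can look like. This assumes the `beautifulsoup4` package is installed, and it parses a made-up HTML snippet instead of a real page; the tag and class names are hypothetical, not taken from any of the sites mentioned above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<html><body>
  <h1 class="title">Example Passage</h1>
  <p class="verse"><span class="num">1</span> First line of text.</p>
  <p class="verse"><span class="num">2</span> Second line of text.</p>
</body></html>
"""

# "html.parser" is the parser that ships with Python, so nothing else is needed.
soup = BeautifulSoup(html, "html.parser")

title = soup.find("h1", class_="title").get_text(strip=True)

# Map each verse number to its text.
verses = {}
for p in soup.find_all("p", class_="verse"):
    num_tag = p.find("span", class_="num")
    number = int(num_tag.get_text(strip=True))
    num_tag.extract()  # drop the number tag so only the verse text is left
    verses[number] = p.get_text(strip=True)

print(title)
for number in sorted(verses):
    print(number, verses[number])
```

For a real page you would have to look at its HTML source first and adjust the tag and class names to match.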
April 19th, 2013, 05:33 PM
This was the first step of a script I wrote a long time ago to study marketplace listings on GaiaOnline.
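import urllib.request  # Python 3; in Python 2 this call was urllib.urlopen
# itemNo (the listing's ID) must already be defined
itemPage = "http://www.gaiaonline.com/marketplace/itemdetail/" + str(itemNo)
f = urllib.request.urlopen(itemPage)
s = f.read()  # bytes in Python 3; use s.decode() to get text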
After I opened the webpage and got its data, I just parsed it using some string manipulations. Once I got the data I wanted cleaned up, I saved it into text files that could be further studied by my matplotlib graph plotter scripts.
This would be a good place to start.
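A rough Python 3 sketch of that fetch, parse, save flow, using only the standard library. This is not the original script: the helper names (`fetch`, `between`, `format_row`, `append_row`), the sample HTML, and the markers are all hypothetical, and the page is assumed to be UTF-8. Nothing here touches the network unless you call `fetch` yourself.

```python
import contextlib
import urllib.request


def fetch(url):
    """Download a page and return it as text (assumes UTF-8 encoding)."""
    with contextlib.closing(urllib.request.urlopen(url)) as f:
        return f.read().decode("utf-8", errors="replace")


def between(text, start, end):
    """Return the text between the first `start` marker and the next
    `end` marker, or None if either marker is missing."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j].strip()


def format_row(name, price):
    """Build one tab-separated line for the output file."""
    return "{0}\t{1}\n".format(name, price)


def append_row(path, name, price):
    """Append one record to a plain text file for later analysis."""
    with open(path, "a") as out:
        out.write(format_row(name, price))


# Hypothetical fragment; a real page would come from fetch(some_url).
sample = ('<div class="item"><span class="name"> Blue Hat </span>'
          '<span class="price">1,250</span></div>')

name = between(sample, '<span class="name">', '</span>')
price = int(between(sample, '<span class="price">', '</span>').replace(",", ""))
row = format_row(name, price)
```

Plain string searching like this breaks as soon as the site changes its markup, so it works best for small one-off jobs; an HTML parser is more robust for anything bigger.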
Last edited by eliskan; April 19th, 2013 at 05:35 PM.