July 11th, 2012, 02:30 PM
Extracting a table from the Web
I have recently begun learning to program in Python (version 2.5.4) on Mac OS X. As part of this, I have been assigned the task of extracting a table from a web page, specified below, using Python, and converting it into a format that Microsoft Excel can read, with the data placed neatly into cells.
The link below has one table with statistics about the National Hockey League (NHL).
I have been reading about ways to complete the task, but I realize that people with much more experience using Python may be able to help me more than the books.
If anyone has code that does this and can be adapted to the website I need to work with, can offer guidance on writing such code, or can recommend reading that would help me write it, I would greatly appreciate it! Thanks in advance!
July 11th, 2012, 03:52 PM
Looks like there are several tables on that page. Five tables? (search the page source for "<table")
The most obvious one says that it shows rows 1-30 of 1230 results. Do you need all 1230?
You might try Python's csv module from the standard library to write a file that Excel can read.
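A rough sketch of the csv idea, assuming the table has already been turned into a list of rows. The team names, numbers, file name, and the write_rows helper are all made up for illustration. This is written for Python 3; on Python 2.5 the open() call would need different arguments (binary mode, no newline parameter).

```python
import csv

# Hypothetical placeholder rows; the real values would come from the web table.
rows = [
    ["Team", "GP", "W", "L"],
    ["Example Team A", "82", "50", "32"],
    ["Example Team B", "82", "41", "41"],
]


def write_rows(path, rows):
    """Write a list of rows to a CSV file that Excel can open."""
    # newline="" lets the csv module control line endings itself (Python 3).
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


write_rows("nhl_table.csv", rows)
```

Double-clicking the resulting nhl_table.csv should open it in Excel with one value per cell.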
The Python libraries are also packed with HTML functionality. There could be a reader that, as one of its features, identifies tables. I myself would do something stupid like write my own code to parse the page source and find the table rows, <tr>blah blah blah</tr>, where the stuff in between is table data, <td>information</td>. But hey, that may account for my being unemployed.
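Here is a minimal sketch of that hand-rolled approach using the standard library's HTML parser, assuming Python 3 (in Python 2 the module is named HTMLParser instead of html.parser). The TableExtractor class and the sample HTML are invented for illustration; the real page source would have to be downloaded first. Nested tables are not handled properly: a table inside a cell simply starts a new entry.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _close_cell(self):
        # Finish the current cell, if any, and add its text to the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        # Finish the current row, if any, and add it to the current table.
        self._close_cell()
        if self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


# Hypothetical stand-in for the downloaded page source.
sample = """
<table>
  <tr><th>Team</th><th>GP</th></tr>
  <tr><td>Example Team A</td><td>82</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(sample)
print(len(parser.tables))  # number of tables found
print(parser.tables[0])    # rows of the first table
```

Each table comes out as a plain list of rows, and each row as a list of strings, so picking the right table is just a matter of indexing parser.tables.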