Findall, occurrences, sorted
January 27th, 2013, 06:18 AM
Registered User
Hi, I'm having trouble getting re.findall to return all of my occurrences.
Right now, it's only returning the very last row of the table.
This is part of the HTML code of the table:
Code:
<div id="content"><table border='0' style='padding-left:20'><tr><th>Rank</th><th align='left'>Country</th><th align='left' colspan='2'>Exports (Billion $)</th></tr><tr><td align='right'>1</td><td><a href='../china/exports.html'>China</a></td><td align='right'>1,904</td><td><img src='/img/g.gif' height='10' width='350'></td></tr><tr><td align='right'>2</td><td><a href='../united_states/exports.html'>United States</a></td><td align='right'>1,497</td><td><img src='/img/g.gif' height='10' width='275'></td></tr><tr><td align='right'>3</td><td><a href='../germany/exports.html'>Germany</a></td><td align='right'>1,408</td><td><img src='/img/g.gif' height='10' width='259'></td></tr><tr><td align='right'>4</td><td><a href='../japan/exports.html'>Japan</a></td><td align='right'>788</td><td><img src='/img/g.gif' height='10' width='145'></td></tr><tr><td align='right'>5</td><td><a href='../france/exports.html'>France</a></td><td align='right'>587.1</td><td><img src='/img/g.gif' height='10' width='108'></td></tr>
I know the problem is somewhere in my re.findall() line of code, but I can't figure out what needs to be added to make it print out all the occurrences instead of just one.
This is part of my code:
Code:
import re

def extract_data(filename):
    country_export = []
    # Open and read file
    with open(filename) as f:
        text = f.read()
    # Try to capture (rank, country, exports) from each table row
    tuples = re.findall(r'<td .*>(.*)</td><td><a .*>(.*)</a></td><td .*>(.*)</td>', text)
    print(tuples)
So this code is only giving me
Code:
[('221', 'Tokelau', '')]
The other data don't show up. Also, I'm not sure why the zero is not showing up either: when I tried it in IDLE it showed up, but when I tried it on the command line (as seen in the output above) it doesn't.

January 27th, 2013, 02:18 PM
Contributing User
Instead of .* in your patterns,
[^>\n]*
might be more appropriate:
any character except > or newline.
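To see why the greedy version returns only the last row, here is a minimal sketch in Python 3 syntax. The sample string is two rows copied from the table in the question, kept on one line as in the original page; the variable names are my own. Each greedy `.*` runs as far right as it can, so the single match swallows every earlier row, and the last `<td .*>` even runs into the image cell, which is why the third field comes back empty.

```python
import re

# Two rows from the question's table, on a single line like the real page.
html = (
    "<tr><td align='right'>1</td><td><a href='../china/exports.html'>China</a></td>"
    "<td align='right'>1,904</td><td><img src='/img/g.gif' height='10' width='350'></td></tr>"
    "<tr><td align='right'>2</td><td><a href='../united_states/exports.html'>United States</a></td>"
    "<td align='right'>1,497</td><td><img src='/img/g.gif' height='10' width='275'></td></tr>"
)

# Original pattern: every ".*" is greedy and may cross tag boundaries.
greedy = r"<td .*>(.*)</td><td><a .*>(.*)</a></td><td .*>(.*)</td>"

# Same pattern with each ".*" replaced by "[^>\n]*", which cannot cross a ">".
bounded = (
    r"<td [^>\n]*>([^>\n]*)</td><td><a [^>\n]*>([^>\n]*)</a></td>"
    r"<td [^>\n]*>([^>\n]*)</td>"
)

greedy_rows = re.findall(greedy, html)
bounded_rows = re.findall(bounded, html)

print(greedy_rows)   # [('2', 'United States', '')]
print(bounded_rows)  # [('1', 'China', '1,904'), ('2', 'United States', '1,497')]
```

The non-greedy form `.*?` is another common way to stop a match early, but the negated class states the intent more directly here.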
I suppose either of us could read the manual about how newlines are treated. Productive for another project.
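On the newline question: as far as I know, `.` matches any character except a newline by default, and passing `re.DOTALL` lifts that restriction. A tiny check in Python 3 syntax, using a made-up two-line cell as the sample text (not data from the question):

```python
import re

# Hypothetical sample: one cell whose text spans two lines.
text = "<td>first\nsecond</td>"

# Default behaviour: "." stops at the newline, so nothing matches.
default_matches = re.findall(r"<td>(.*)</td>", text)

# With re.DOTALL, "." also matches the newline.
dotall_matches = re.findall(r"<td>(.*)</td>", text, re.DOTALL)

print(default_matches)  # []
print(dotall_matches)   # ['first\nsecond']
```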
How about you use the html parser? Recently people have complained that the Python documentation is unreadable and impossible to understand. This bit would be a counterexample.
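If you go the parser route, something along these lines might work. This is only a sketch using `html.parser.HTMLParser` from the Python 3 standard library (in Python 2 the module is called `HTMLParser`). The class name `TableRowParser` and the shortened sample string are mine, not part of the script in the question.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the current row, or None outside <tr>
        self._cell = None  # text pieces of the current cell, or None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


# Header row plus two data rows, shortened from the table in the question.
html = (
    "<table border='0'><tr><th>Rank</th><th align='left'>Country</th>"
    "<th align='left' colspan='2'>Exports (Billion $)</th></tr>"
    "<tr><td align='right'>1</td><td><a href='../china/exports.html'>China</a></td>"
    "<td align='right'>1,904</td><td><img src='/img/g.gif' height='10' width='350'></td></tr>"
    "<tr><td align='right'>2</td><td><a href='../united_states/exports.html'>United States</a></td>"
    "<td align='right'>1,497</td><td><img src='/img/g.gif' height='10' width='275'></td></tr>"
    "</table>"
)

parser = TableRowParser()
parser.feed(html)
parser.close()

# Keep rank, country and exports; the fourth cell only holds the bar image.
country_export = [tuple(row[:3]) for row in parser.rows]
print(country_export)  # [('1', 'China', '1,904'), ('2', 'United States', '1,497')]
```

It is longer than the regex, but it does not depend on how the attributes are written or on whether the rows share a line.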

January 27th, 2013, 03:30 PM
Registered User
Oh my gosh! Thank you so much! It's working correctly now!!
I changed it a slight bit but I would never have gotten there without your help!!
Thank you! 

January 27th, 2013, 03:59 PM
Contributing User
So I didn't convince you to use the html parser. Your loss.

January 27th, 2013, 04:09 PM
Contributing User
Yes, I would suggest using a module built specifically for HTML parsing. I know BeautifulSoup is another module that people seem to enjoy.

January 27th, 2013, 04:11 PM
Registered User
Quote (originally posted by b49P23TIvg): So I didn't convince you to use the html parser. Your loss.
Haha, you actually did, in fact. I will aim to implement it in the future.