Python Programming
 
#1
January 27th, 2013, 06:18 AM
taeBaby
Findall, occurrences, sorted

Hi, I'm having trouble getting re.findall to produce all my occurrences.

Right now, it's only producing the very last row of the table.

This is part of the HTML for the table:
Code:
<div id="content"><table border='0' style='padding-left:20'><tr><th>Rank</th><th align='left'>Country</th><th align='left' colspan='2'>Exports (Billion $)</th></tr><tr><td align='right'>1</td><td><a href='../china/exports.html'>China</a></td><td align='right'>1,904</td><td><img src='/img/g.gif' height='10' width='350'></td></tr><tr><td align='right'>2</td><td><a href='../united_states/exports.html'>United States</a></td><td align='right'>1,497</td><td><img src='/img/g.gif' height='10' width='275'></td></tr><tr><td align='right'>3</td><td><a href='../germany/exports.html'>Germany</a></td><td align='right'>1,408</td><td><img src='/img/g.gif' height='10' width='259'></td></tr><tr><td align='right'>4</td><td><a href='../japan/exports.html'>Japan</a></td><td align='right'>788</td><td><img src='/img/g.gif' height='10' width='145'></td></tr><tr><td align='right'>5</td><td><a href='../france/exports.html'>France</a></td><td align='right'>587.1</td><td><img src='/img/g.gif' height='10' width='108'></td></tr>

I know the problem is somewhere in my re.findall() line of code, but I can't figure out what needs to be added to make it print out all the occurrences instead of just one.

This is part of my code:
Code:
def extract_data(filename):
  country_export = []
  
  # Open and read file
  f = open(filename, 'rU')
  text = f.read()

  tuples = re.findall(r'<td .*>(.*)</td><td><a .*>(.*)</a></td><td .*>(.*)</td>', text)
  print tuples


So this code is only giving me
Code:
[('221', 'Tokelau', '')]

The other rows don't show up. I'm also not sure why the zero isn't showing up: when I tried it in IDLE it appeared, but when I ran it from the command line (as in the output above) it doesn't.

#2
January 27th, 2013, 02:18 PM
b49P23TIvg
Instead of .* in your patterns,
[^>\n]*
might be more appropriate: any character except > or newline.
I suppose either of us could read the manual about how newlines are treated. Productive for another project.
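
A minimal sketch of that substitution, using two rows copied from the HTML in the question. It is written in Python 3 syntax (the code in the question is Python 2), and the names `greedy` and `bounded` are only illustrative. It also goes one step further than the suggestion by using `[^<\n]*` for the cell contents, which is an assumption about what the final pattern might look like, not the pattern the original poster ended up with. The likely cause of the single result is that each greedy `.*` can run across tag boundaries, so one match swallows everything up to the last row.

```python
import re

# Two rows from the table in the question, on a single line as in the page source.
html = (
    "<tr><td align='right'>1</td><td><a href='../china/exports.html'>China</a></td>"
    "<td align='right'>1,904</td><td><img src='/img/g.gif' height='10' width='350'></td></tr>"
    "<tr><td align='right'>2</td><td><a href='../united_states/exports.html'>United States</a></td>"
    "<td align='right'>1,497</td><td><img src='/img/g.gif' height='10' width='275'></td></tr>"
)

# Pattern from the question: each greedy .* may cross tag boundaries,
# so a single match stretches from the first row to the last one.
greedy = r"<td .*>(.*)</td><td><a .*>(.*)</a></td><td .*>(.*)</td>"

# Bounded pattern: [^>\n]* cannot leave the tag it started in,
# and [^<\n]* cannot leave the cell it started in.
bounded = (
    r"<td [^>\n]*>([^<\n]*)</td>"
    r"<td><a [^>\n]*>([^<\n]*)</a></td>"
    r"<td [^>\n]*>([^<\n]*)</td>"
)

greedy_rows = re.findall(greedy, html)
bounded_rows = re.findall(bounded, html)

print(greedy_rows)   # only the last row, with an empty third field
print(bounded_rows)  # one tuple per table row
```

With the greedy pattern the third group comes back empty because the last `.*` runs on to the end of the `<img>` tag, which would also explain the missing value in the output shown in the question.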

How about using the HTML parser? Recently people have complained that the Python documentation is unreadable and impossible to understand. This bit would be a counterexample.
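
For illustration, here is a rough sketch of what the parser-based approach could look like for this table. It uses the standard library's `html.parser.HTMLParser` under Python 3 (in Python 2 the module is named `HTMLParser`). The class name `ExportTableParser` and the choice to skip empty cells and header rows are assumptions made for this example, not something taken from the thread.

```python
from html.parser import HTMLParser  # Python 3 name; Python 2 uses "HTMLParser"


class ExportTableParser(HTMLParser):
    """Collect the non-empty text of each <td> cell, one tuple per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a tuple of cell strings
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text pieces of the cell being read, None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> also arrives here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            text = "".join(self._cell).strip()
            if text:  # skip the cell that only holds the bar image
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip the header row, which contains only <th> cells
                self.rows.append(tuple(self._row))
            self._row = None


# Header row plus two data rows copied from the HTML in the question.
html = (
    "<table border='0' style='padding-left:20'>"
    "<tr><th>Rank</th><th align='left'>Country</th>"
    "<th align='left' colspan='2'>Exports (Billion $)</th></tr>"
    "<tr><td align='right'>1</td><td><a href='../china/exports.html'>China</a></td>"
    "<td align='right'>1,904</td><td><img src='/img/g.gif' height='10' width='350'></td></tr>"
    "<tr><td align='right'>2</td><td><a href='../united_states/exports.html'>United States</a></td>"
    "<td align='right'>1,497</td><td><img src='/img/g.gif' height='10' width='275'></td></tr>"
    "</table>"
)

parser = ExportTableParser()
parser.feed(html)
parser.close()
rows = parser.rows
print(rows)
```

The parser tracks tags itself, so the result does not depend on how the attributes are written or on whether the rows sit on one line.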
#3
January 27th, 2013, 03:30 PM
taeBaby
Oh my gosh! Thank you so much! It's working correctly now!!
I changed it a slight bit but I would never have gotten there without your help!!
Thank you!

#4
January 27th, 2013, 03:59 PM
b49P23TIvg
So I didn't convince you to use the html parser. Your loss.

#5
January 27th, 2013, 04:09 PM
eliskan
Yes, I would suggest using a module built specifically for HTML parsing. BeautifulSoup is another module that people seem to enjoy.

#6
January 27th, 2013, 04:11 PM
taeBaby
Quote:
Originally Posted by b49P23TIvg
So I didn't convince you to use the html parser. Your loss.

haha, you actually did in fact. I will aim to implement it in the future.
