Python Programming
 
  #1  
September 22nd, 2003, 08:03 PM
coscarart
Python and Regular expressions

I am trying to learn Python and regular expressions.
I'm trying to figure out a way to pull the following line out of a string and then pull the IP address out of that line. I have opened the file and read it into a string; I'm just having trouble matching the following line with Python's re module.

IP Address:</td><td><font face=verdana size=2>anyipaddress</td>

Any help would be appreciated.

coscarart

  #2  
September 23rd, 2003, 02:10 AM
netytan
I'm a little unclear here: which part of the string did you want? Also, does 'IP Address' appear at the beginning of the line, or was this just an example?

Anyway, I wrote a very small regexp and it matched with no problem. Of course there are some improvements you might want to make, e.g. replacing the (.+?) with something like ([0-9\.]+) so that only digits and dots get captured.

>>> import re
>>> s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'
>>> re.findall(r'[a-zA-Z]:<.*>(.+?)<', s)
['anyipaddress']

Note: since I didn't know which part of the string you wanted, I just grabbed 'anyipaddress'. You can change which part of the match findall returns by moving the () group or adding more of them.
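Here's a small sketch (modern Python 3, same sample string) of what moving or adding () groups does to findall's return value. The two extra patterns are just illustrations:

```python
import re

s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'

# One group: findall returns only the text that group captured.
one_group = re.findall(r'[a-zA-Z]:<.*>(.+?)<', s)

# Two groups: findall returns one tuple per match, one item per group.
two_groups = re.findall(r'([a-zA-Z]):<.*>(.+?)<', s)

# No groups: findall returns the whole matched text.
no_groups = re.findall(r'[a-zA-Z]:<.*>.+?<', s)

print(one_group)   # ['anyipaddress']
print(two_groups)  # [('s', 'anyipaddress')]
print(no_groups)   # ['s:</td><td><font face=verdana size=2>anyipaddress<']
```

With two or more groups, each match comes back as a tuple in group order.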

Have fun.
Mark
  #3  
September 23rd, 2003, 08:25 AM
cvchen
Mark,
Hey, I'm kind of confused about this part that you posted:

-----------------------------------------
re.findall(r'[a-zA-Z]:<.*>(.+?)<', s)
-----------------------------------------

The first parameter of the findall() function looks very cryptic, which boggles me a bit because I didn't think you could write Python like that. I've never seen that kind of notation, or whatever it's called.

Could you tell me what it is, or where I might be able to look that kind of notation up? When I was reading coscarart's question I was thinking up a way to do it, but what I was concocting in my head was a hell of a lot more complicated...

Thanks for helping a Python newbie learn!

  #4  
September 23rd, 2003, 09:06 AM
netytan
Hi,

In most of the docs on regex you'll see re.compile() a lot. The compiled pattern object is great in itself, but for such a small task I hardly see the point, especially when Python lets you pass the pattern string straight to the function, as I did (most of the functions in re allow this). If you're going to use the same regex over and over, though, it's probably a good idea to compile it first.
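A rough side-by-side of the two styles in modern Python 3. The two rows are sample strings in the same shape as the one above. The re module also keeps a small internal cache of recently compiled patterns, so for occasional calls the difference is usually minor:

```python
import re

rows = [
    'IP Address:</td><td><font face=verdana size=2>192.224.214.213</td>',
    'Subnet Mask:</td><td><font face=verdana size=2>192.160.152.123</td>',
]

# One-off use: hand the pattern string straight to a module-level function.
first = re.findall(r'[a-zA-Z]:<.*>(.+?)<', rows[0])
print(first)    # ['192.224.214.213']

# Repeated use: compile once, then call methods on the pattern object.
value_re = re.compile(r'[a-zA-Z]:<.*>(.+?)<')
values = [value_re.findall(row) for row in rows]
print(values)   # [['192.224.214.213'], ['192.160.152.123']]
```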

The re.findall() function itself is pretty simple: you pass it a pattern and a string, and it returns a list of every match. If the pattern contains () groups, you get back only the text inside them. It's definitely easier to use than match(). The regex I used is simple, so I'm guessing you understood that?
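To make the findall/match difference concrete, here's a quick sketch in modern Python 3 using the same string and pattern. re.search is included because it's usually what people want when match returns None:

```python
import re

s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'
pattern = r'[a-zA-Z]:<.*>(.+?)<'

# findall: a plain list holding what the group captured in every match.
found = re.findall(pattern, s)
print(found)                 # ['anyipaddress']

# match: only tries at position 0. 'I' is not followed by ':', so
# there is no match and it returns None.
at_start = re.match(pattern, s)
print(at_start)              # None

# search: scans the whole string and returns a match object (or None);
# you pull the group out of it yourself.
m = re.search(pattern, s)
if m:
    print(m.group(1))        # anyipaddress
```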

Anyway, I hope this answers your question. If not, feel free to ask more; I'm always happy to help if I can.

Have fun,
Mark.

  #5  
September 23rd, 2003, 09:19 AM
cvchen
Haha... I was referring more to the first parameter of the findall() function:

'[a-zA-Z]:<.*>(.+?)<'

That looks almost Perl-like to me... I have no clue what it does!

Thanks again. I'm so glad I found this board; it's fun learning tidbits of Python here and there beyond what I'm using it for, you know? Python is so great... I'm dismayed that this internship is going to end at some point and then I'll have to go back to school and use C or Java (which I used to love) in those programming courses =/

  #6  
September 23rd, 2003, 09:27 AM
cvchen
Well, I neglected to say... I know what the function does: you've extracted 'anyipaddress' from the string s, and the function returns it in a list. I assume the string I was confused about is a kind of template telling re.findall() what to extract.

I'd just like to know how that works, or where I can find more information on it.

In particular, does [a-zA-Z] mean any lowercase or uppercase letter between a and z?

If I wrote something like [a-gH-K], would it match any lowercase letter between a and g and any uppercase letter between H and K?

Also, what's with the '<.*>' and '(.+?)'? Oh, and '[0-9\.]'?

I'd really love to know where I can learn this, and how I can use this notation in different ways. Python never ceases to tickle my curiosity... so many cool features, dude!

  #7  
September 23rd, 2003, 09:48 AM
netytan
lol, oops, sorry. Python has Perl-style regexes, which is probably why it looks so Perl-like. Bear with me, people: regexes are not the easiest thing to explain!

[a-zA-Z] matches a single letter, regardless of case, followed by ':<'; then .* matches 0 or more characters (anything except '\n') up to the last '>' that still lets the rest of the pattern fit. The parentheses around '.+?' tell findall() to return what's inside them: 1 or more characters of any kind, up to the first '<' after that. *breathes*
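The same pattern can be written with the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments, so each piece can be annotated in place. This is a sketch in modern Python 3; the comments describe how this particular string gets matched:

```python
import re

s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'

pattern = re.compile(r"""
    [a-zA-Z]   # one letter, either case (here the 's' of 'Address')
    :<         # a literal colon followed by a literal '<'
    .*         # greedy: any characters except newline, as many as possible...
    >          # ...up to the last '>' that still lets the rest match
    (.+?)      # captured, lazy: one or more characters, as few as possible...
    <          # ...stopping at the first '<' that follows
""", re.VERBOSE)

print(pattern.findall(s))  # ['anyipaddress']
```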

OK, I hope that makes some sense to you. In any case, if you learn how to write regexes in Perl or PHP, you can carry most of them over to Python (and vice versa) without much trouble!

I know what you mean, Cv. Python is a great language; I haven't really touched much else since I picked it up.

But if you're going to use Java and you're missing Python, you could always try Jython (http://www.jython.org/), just one of the tools in the Python programmer's arsenal. And I don't think Java has anything on Python anyway!

Mark.

  #8  
September 23rd, 2003, 09:51 AM
Strike
Note to netytan: \w is close to [a-zA-Z] (though it also matches digits and '_'), and you should really use <.*?> so that the * isn't greedy.
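A quick check in modern Python 3 of how \w compares with [a-zA-Z] (the sample strings are arbitrary):

```python
import re

sample = 'a1_Z!'

# [a-zA-Z] matches ASCII letters only.
letters = re.findall(r'[a-zA-Z]', sample)
# \w also matches digits and the underscore.
word_chars = re.findall(r'\w', sample)
# For str patterns in Python 3, \w also covers non-ASCII letters;
# the re.ASCII flag narrows it to [a-zA-Z0-9_].
unicode_word = re.findall(r'\w', 'café')
ascii_word = re.findall(r'\w', 'café', re.ASCII)

print(letters)       # ['a', 'Z']
print(word_chars)    # ['a', '1', '_', 'Z']
print(unicode_word)  # ['c', 'a', 'f', 'é']
print(ascii_word)    # ['c', 'a', 'f']
```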

  #9  
September 23rd, 2003, 09:55 AM
cvchen
Haha, OK... so all of that is part of regular expressions (regex). Cool, I'll look that up and try to learn it. Thanks!

  #10  
September 23rd, 2003, 10:01 AM
netytan
I'd suggest you look on Google for a good regex tutorial, e.g. http://www.amk.ca/python/howto/regex/

You have the a-zA-Z concept down, and 0-9 works exactly the same way, so [0-9\.] will match any single digit or a '.'.
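A few one-liners in modern Python 3 to try the range idea out, including the [a-gH-K] case asked about earlier and a common slip, [A-z] instead of [A-Z]. The sample strings are made up for illustration:

```python
import re

# A range inside [] covers every character between its two endpoints.
print(re.findall(r'[a-gH-K]', 'abchijHIJKLz'))   # ['a', 'b', 'c', 'H', 'I', 'J', 'K']

# Pitfall: [A-z] spans ASCII 'A'..'z', which also includes [ \ ] ^ _ and `.
print(re.findall(r'[A-z]', 'a_b'))               # ['a', '_', 'b']
print(re.findall(r'[a-zA-Z]', 'a_b'))            # ['a', 'b']

# Inside [], '.' is already literal, so [0-9\.] and [0-9.] are the same.
# Each matches ONE character; add + to grab a whole run of digits and dots
# (the stray '2' below comes from size=2).
html = '<font face=verdana size=2>192.224.214.213</td>'
print(re.findall(r'[0-9.]+', html))              # ['2', '192.224.214.213']
```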

. = any character except '\n' (unless you pass the re.DOTALL flag)
+ = 1 or more occurrences of the preceding item, e.g. '.+'
* = 0 or more; like +, it matches as many as it can (greedy)
? = right after + or *, stops them being greedy, kinda like a girlfriend; on its own it means 0 or 1 of the preceding item
\ = escapes a special character so it's matched literally (like escaping " or ' inside a string)
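Each symbol in the list can be checked in a couple of lines. This is a sketch in modern Python 3 with made-up sample strings:

```python
import re

html = '<b>bold</b>'

# * is greedy: .* runs to the LAST '>' it can reach.
greedy = re.findall(r'<.*>', html)
print(greedy)                                # ['<b>bold</b>']

# *? is lazy: .*? stops at the FIRST '>' that lets the match succeed.
lazy = re.findall(r'<.*?>', html)
print(lazy)                                  # ['<b>', '</b>']

# + needs at least one character; * accepts none at all.
print(re.findall(r'<.+>', '<>'))             # []
print(re.findall(r'<.*>', '<>'))             # ['<>']

# A lone ? means "0 or 1 of the previous item".
print(re.findall(r'colou?r', 'color colour'))  # ['color', 'colour']

# . skips '\n' unless you pass re.DOTALL.
print(re.findall(r'a.b', 'a\nb'))              # []
print(re.findall(r'a.b', 'a\nb', re.DOTALL))   # ['a\nb']

# \ makes a special character literal: \. matches only a real dot.
print(re.findall(r'1\.2', '1.2 1x2'))          # ['1.2']
```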

Take care,
Mark.

Last edited by netytan : September 23rd, 2003 at 10:04 AM.

  #11  
September 23rd, 2003, 10:08 AM
netytan
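Note to Strike: it needed to be greedy; if it weren't, the regex wouldn't work. A lazy .*? stops at the first '>' (the one ending '</td>'), so the group ends up capturing '<td>' instead of the address. Thanks for the \w, though; that totally slipped my mind.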
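Here's the difference on the original sample string, in modern Python 3. The last line shows one lazy alternative (anchoring on the size=2> text right before the value), which only works as long as the HTML keeps that exact attribute:

```python
import re

s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'

# Greedy .* backs off from the end until the last workable '>',
# so the group lands on the text right before the final '</td>'.
greedy = re.findall(r'[a-zA-Z]:<.*>(.+?)<', s)
print(greedy)     # ['anyipaddress']

# Lazy .*? stops at the first '>' (the end of '</td>'), so the group
# then captures the next chunk up to a '<', which happens to be a tag.
lazy = re.findall(r'[a-zA-Z]:<.*?>(.+?)<', s)
print(lazy)       # ['<td>']

# Lazy but anchored on the text just before the value.
anchored = re.findall(r'size=2>(.+?)<', s)
print(anchored)   # ['anyipaddress']
```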

  #12  
September 23rd, 2003, 12:24 PM
coscarart
I'm sorry if I wasn't clear and wasted your time, but thank you for trying to help! What I'm trying to parse is the HTML of a Linksys router page, so the HTML is actually really, really long. Here is a chunk:

somestuff....IP Address:</td><td><font face=verdana size=2>192.224.214.213</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font color=white face=verdana size=2>Subnet Mask:</td><td><font face=verdana size=2>192.160.152.123</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font color=white face=verdana size=2>Default Gateway:</td><td><font face=verdana size=2>12.246.16.1</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font color=white face=verdana size=2>DNS:</td><td><font face=verdana size=2>220.213.227.654<br>204.127.202.4<br>0.0.0.0</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font></th></tr></table></center></body></html> more stuff.....

This chunk is just part of the larger page. What I want to extract is the IP address after the words 'IP Address:' (192.224.214.213 here).
Any help would be appreciated. Also, thanks for the link to the Python regex HOWTO!
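If you'd rather not strip the HTML first, here's a sketch in modern Python 3 that searches the raw markup directly. value_after_label is a hypothetical helper, not part of any library; it assumes the value is a dotted quad that follows the label with only tags and whitespace in between, and it doesn't check that each number is 0-255:

```python
import re


def value_after_label(html, label):
    """Hypothetical helper: return the first dotted quad after label, or None."""
    # re.escape makes any special characters in the label match literally;
    # (?:\s*<[^>]*>)* skips over the tags that sit between label and value.
    pattern = re.escape(label) + r'(?:\s*<[^>]*>)*\s*(\d{1,3}(?:\.\d{1,3}){3})'
    m = re.search(pattern, html)
    return m.group(1) if m else None


chunk = ('somestuff....IP Address:</td><td><font face=verdana size=2>'
         '192.224.214.213</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;'
         '<font color=white face=verdana size=2>Subnet Mask:</td><td>'
         '<font face=verdana size=2>192.160.152.123</td></tr>')

print(value_after_label(chunk, 'IP Address:'))    # 192.224.214.213
print(value_after_label(chunk, 'Subnet Mask:'))   # 192.160.152.123
```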

  #13  
September 23rd, 2003, 12:29 PM
Strike
I suggest that you just strip all the HTML out first and then use regexes to find the data based on the surrounding text.

To remove all the HTML tags in a string s, you would do re.sub('<.*?>', '', s).

Example (I've broken the long string across several lines so the page isn't a mile wide):

Code:
>>> import re
>>> s = ('IP Address:</td><td><font face=verdana size=2>192.224.214.213</td></tr>'
...      '<tr><td bgcolor=6666cc>   <font color=white face=verdana size=2>'
...      'Subnet Mask:</td><td><font face=verdana size=2>192.160.152.123</td></tr>'
...      '<tr><td bgcolor=6666cc>   <font color=white face=verdana size=2>'
...      'Default Gateway:</td><td><font face=verdana size=2>12.246.16.1</td></tr>'
...      '<tr><td bgcolor=6666cc>   <font color=white face=verdana size=2>'
...      'DNS:</td><td><font face=verdana size=2>'
...      '220.213.227.654<br>204.127.202.4<br>0.0.0.0</td></tr>'
...      '<tr><td bgcolor=6666cc>   <font></th></tr></table></center></body></html>')
>>> re.sub('<.*?>', '', s)
'IP Address:192.224.214.213   Subnet Mask:192.160.152.123   Default Gateway:12.246.16.1   DNS:220.213.227.654204.127.202.40.0.0.0   '
>>>


Note that the DNS entries run together (and one of them, 220.213.227.654, isn't even a valid IP address), so you may want to replace the <br> tags with spaces as well.
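A sketch of that idea in modern Python 3, using just the DNS part of the sample: turn each <br> into a space before stripping the other tags, then use the standard ipaddress module to weed out entries like 220.213.227.654 that aren't valid IPv4 addresses. is_ipv4 is a small hypothetical helper:

```python
import ipaddress
import re

html = ('DNS:</td><td><font face=verdana size=2>'
        '220.213.227.654<br>204.127.202.4<br>0.0.0.0</td>')

# Turn each <br> (any case, with or without a slash) into a space first...
spaced = re.sub(r'<br\s*/?>', ' ', html, flags=re.IGNORECASE)
# ...then strip every remaining tag with the lazy <.*?> pattern.
text = re.sub(r'<.*?>', '', spaced)
print(text)        # DNS:220.213.227.654 204.127.202.4 0.0.0.0

# Everything after the first ':' is a space-separated list of addresses.
candidates = text.split(':', 1)[1].split()


def is_ipv4(value):
    """Hypothetical helper: True if value parses as IPv4 (each part 0-255)."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


valid = [c for c in candidates if is_ipv4(c)]
print(valid)       # ['204.127.202.4', '0.0.0.0']
```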

Last edited by Strike : September 23rd, 2003 at 12:31 PM.

  #14  
September 23rd, 2003, 01:50 PM
coscarart
Thanks! Getting rid of all the HTML made it easy to get what I wanted. My code is as follows:
Code:
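#!/usr/bin/python2.2
import os
import re

def ExternalIP():
	# Fetch the router status page with wget (the URL is left as a placeholder)
	os.system("wget -O/tmp/Status.htm --http-pass='nothing' --http-user='nothing' URL")
	status = open('/tmp/Status.htm').read()
	os.remove('/tmp/Status.htm')              # delete the temp file without shelling out to rm
	# Drop every HTML tag plus the '&nbsp' part of each '&nbsp;' entity
	ipline = re.sub('<.*?>|&nbsp', '', status)
	# The leftover ';' characters sit between fields, so turn them into newlines
	ipline = re.sub(';', '\n', ipline)
	# Collapse runs of newlines into one
	ipline = re.sub('\n+', '\n', ipline)
	# '.' never matches '\n', so each hit stops at the end of its own line
	ip = re.findall('IP Address:.*', ipline)
	# ip[1] is the second 'IP Address:' on the page (presumably the WAN/external one);
	# strip() removes the space left over from between 'IP' and 'Address'
	ipiwant = re.sub('[a-zA-Z:]', '', ip[1]).strip()
	return ipiwant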


I then use this to update my dynamic DNS service.

This is the first program I've ever written, so I know it probably sucks, but it works! Thanks for the help!

By the way, does anyone know if there's a Python module that can pull a password-protected file from a server? I looked at urllib and it couldn't do it, so I used wget.
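For what it's worth, the standard library can do HTTP authentication: Python 2's urllib2 and Python 3's urllib.request both have an HTTPBasicAuthHandler. Here's a sketch in modern Python 3, assuming the router's page uses HTTP Basic authentication (the kind wget's --http-user/--http-pass options typically supply). fetch_with_basic_auth is a hypothetical helper name:

```python
import urllib.request


def fetch_with_basic_auth(url, user, password, timeout=10):
    """Hypothetical helper: fetch url with HTTP Basic auth, return text.

    Sketch only: assumes the server uses Basic auth. The body is decoded
    as Latin-1 so that arbitrary bytes never raise a decoding error.
    """
    # Offer these credentials for whatever realm the server names for this URL.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, user, password)

    # When the server answers 401 with a Basic challenge, this handler
    # retries the request with an Authorization header.
    auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(auth_handler)

    with opener.open(url, timeout=timeout) as response:
        return response.read().decode('latin-1')
```

You would call it as fetch_with_basic_auth(url, 'nothing', 'nothing'), where url is the router's status page (left as a placeholder, as in the original script), and then run the same re.sub/re.findall steps on the returned text with no temp file needed. If the router uses Digest authentication instead, urllib.request.HTTPDigestAuthHandler accepts the same password manager.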

coscarart