Page 1 of 2 12 Last
  • Jump to page:
    #1
  1. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    4
    Rep Power
    0

    Python and Regular expressions


    I am trying to learn python and regular expressions.
    I am trying to figure out a way to pull the following line out of a string and then pull out the ip address. I have the file opened and put into a string, I am just having trouble matching the following line (in python with the re module).

    IP Address:</td><td><font face=verdana size=2>anyipaddress</td>

    Any help would be appreciated.

    coscarart
  2. #2
  3. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    I'm a little unclear here: which part of the string did you want? Oh, and does 'IP Address' appear at the beginning of the line, or was this just an example?

    Anyway, I made a very small regexp and matched it no problem. Of course there are some improvements you might want to make to this, e.g. replace the (.+?) with something like ([0-9\.]+) so that only digits and dots are captured.

    >>> import re
    >>> s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'
    >>> re.findall('[a-zA-Z]:<.*>(.+?)<', s)
    ['anyipaddress']

    Note: since i didn't know which part of the string you wanted, i've just gone and grabbed the text between the last pair of tags (the address itself). You can change which part of the match gets returned from findall by moving the () group or adding more () groups.
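    A rough sketch of the [0-9\.] idea, assuming the address is dotted decimal and comes right after the tags that follow 'IP Address:'. The sample address and the name ip_pattern are only there for illustration:

```python
import re

# Hypothetical sample line; a concrete address stands in for 'anyipaddress'.
s = 'IP Address:</td><td><font face=verdana size=2>192.224.214.213</td>'

# Skip any run of tags after the label, then capture only digits and dots.
ip_pattern = r'IP Address:(?:<[^>]*>)*([0-9.]+)'

found = re.findall(ip_pattern, s)
print(found)  # ['192.224.214.213']
```

    This only checks the shape of the text (digits and dots), not that each number is a valid octet.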

    Have fun.
    Mark
    programming language development: www.netytan.com Hula

  4. #3
  5. No Profile Picture
    Hi, I'm Calvin
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Location
    LosAngeles, SanDiego, Houston
    Posts
    50
    Rep Power
    11
    Mark,
    Hey, I'm kind of confused about this part that you posted:

    -----------------------------------------
    re.findall('[a-zA-Z]:<.*>(.+?)<', s)
    -----------------------------------------

    the first parameter for the findall() function looks very cryptic, which is sort of boggling to me because i didn't think you could code in python like that. i've never seen that kind of notation, or whatever that's called.

    could ya tell me what it is, or where I might be able to look that kind of notation up? i mean, when i was reading coscarart's question i was thinking up a way to do it, but the way i was concocting in my head was a hell of a lot more complicated...

    thanks for helping a python newbie learn!
  6. #4
  7. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Hi,

    In most of the docs on regex you'll see re.compile() a lot. A compiled pattern is a great object in itself, but for such a small task i hardly see the point, especially when Python lets you pass the pattern string straight to the function (most of the functions in re allow this). But if you're going to use the same regex over and over, it's probably a good idea to compile it first.

    The re.findall() function is pretty simple itself: you pass it a pattern and a string, and it returns every match of the part of the pattern within '()'. Definitely easier to use than match. The regex i used is simple, so i'm guessing you understood that?
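    To make the two styles concrete, here is a small sketch using the same sample string. The names one_off, tag_text, reused and pairs are just for illustration:

```python
import re

s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'

# One-off use: pass the pattern string straight to the module-level function.
one_off = re.findall(r'[a-zA-Z]:<.*>(.+?)<', s)

# Repeated use: compile once, then call methods on the pattern object.
tag_text = re.compile(r'[a-zA-Z]:<.*>(.+?)<')
reused = [tag_text.findall(line) for line in (s, s)]

# With two groups, findall returns one tuple per match instead of a string.
pairs = re.findall(r'([a-zA-Z ]+):<.*>(.+?)<', s)

print(one_off)  # ['anyipaddress']
print(reused)   # [['anyipaddress'], ['anyipaddress']]
print(pairs)    # [('IP Address', 'anyipaddress')]
```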

    Anyway, i hope that this answers your questions. If not, feel free to ask more; i'm always happy to help if i can.

    Have fun,
    Mark.

  8. #5
  9. No Profile Picture
    Hi, I'm Calvin
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Location
    LosAngeles, SanDiego, Houston
    Posts
    50
    Rep Power
    11
    haha... i was more referring to the first parameter of the findall() function

    '[a-zA-Z]:<.*>(.+?)<'

    that looks almost Perl-like to me or something... i have no clue what that does!

    thanks again. i'm so glad i found this board, it's fun just learning tidbits of python here and there outside of what i'm using it for, ya kno? python is so great... i'm dismayed at the fact that this internship is going to end at some point and then i'll have to go back to school and use C or Java (which I used to love) in those programming courses =/
  10. #6
  11. No Profile Picture
    Hi, I'm Calvin
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Location
    LosAngeles, SanDiego, Houston
    Posts
    50
    Rep Power
    11
    well, i neglected to say... i know what the function does... you've extracted 'anyipaddress' out of the string s and the function returns that string in a list. i'll assume that the string i was confused about is like a string template for telling the re.findall() function what to strip.

    i'd just like to know how that works, or where i can find more information on that.

    particularly, does [a-zA-Z] mean all characters lowercase and uppercase btwn a and z?

    if i were to do something like [a-gH-K] would it denote all lowercase letters btwn a&g, and all uppercase letters btwn H&K?

    also, what's with the '<.*>' and '(.+?)' ? oh, and '[0-9\.]' ?

    i'd really love to know where i can learn this from, and how i can use this type of notation in diff ways. python never ceases to tickle my curiosity... so many cool features dude!
  12. #7
  13. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    lol oops sorry. Python has Perl-style regex, which is prob' the reason it looks very Perl-like. Bear with me people, regex are not the easiest thing to explain!

    Match a letter of either case, followed by ':<' and 0 or more chars (not '\n') up to the last '>' that still lets the rest of the pattern match. The brackets around the '.+?' tell findall() to return 1 or more chars of any type up until the first '<'. *breathes*
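    The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows comments. This is only a sketch of the pattern from earlier in the thread; the name explained is made up:

```python
import re

s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'

explained = re.compile(r"""
    [a-zA-Z]   # one letter, either case (here the final 's' of 'Address')
    :<         # a literal ':<'
    .*         # greedy: as many chars as possible, but never a newline
    >          # the last '>' that still lets the rest match
    (.+?)      # captured: one or more chars, as few as possible
    <          # stop at the next '<'
""", re.VERBOSE)

print(explained.findall(s))  # ['anyipaddress']
```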

    Ok, hope that makes some sense to you. In any case, if you learn how to do regex in Perl or PHP you can carry them over to Python (and vice versa) without a problem!

    I know what you mean Cv, Python is a great lang, i haven't really touched much else since i picked it up.

    But if you're gonna use Java and you're missing Python you could always try Jython (http://www.jython.org/), just one of the tools in the Python programmer's arsenal.. and i don't think Java has anything on Python anyway!

    Mark.

  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    Note to netytan: \w is close to [a-zA-Z] (it also matches digits and the underscore), and you should really use <.*?> so that the * isn't greedy
  16. #9
  17. No Profile Picture
    Hi, I'm Calvin
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Location
    LosAngeles, SanDiego, Houston
    Posts
    50
    Rep Power
    11
    haha ok... so all that is part of regular expressions (or regex) or something... cool, i'll look that up and try to learn. thanks!
  18. #10
  19. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    i'd suggest you have a look on google for a good regex tutorial, e.g. http://www.amk.ca/python/howto/regex/

    you have the a-zA-Z concept down, and the 0-9 thing works in exactly the same way, so [0-9\.] will match any single digit or a '.'

    . = any char except a '\n' (unless told otherwise)
    + = 1 or more occurrences of the preceding item, e.g. '.'
    * = 0 or more; like + this will match as many as it can (greedy)
    ? = after + or *, stops them from being greedy, kinda like a girl friend
    \ = escapes a special char (like " or ' in strings)
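    A few quick experiments that show each of those in action. The test strings are made up, so treat this as a sketch to play with rather than anything definitive:

```python
import re

# Character classes: ranges can be mixed, and case matters.
mixed = re.findall(r'[a-gH-K]', 'abxyHIJLM')
print(mixed)    # ['a', 'b', 'H', 'I', 'J']

# Inside [] a dot is already literal, so the backslash in [0-9\.] is optional.
digits = re.findall(r'[0-9.]+', 'gw 12.246.16.1 up')
print(digits)   # ['12.246.16.1']

# \w covers letters, digits and the underscore, so it is wider than [a-zA-Z].
words = re.findall(r'\w+', 'ip_v4 ok')
print(words)    # ['ip_v4', 'ok']

# Greedy vs non-greedy on the same text.
html = '<b>bold</b>'
greedy = re.findall(r'<.*>', html)
lazy = re.findall(r'<.*?>', html)
print(greedy)   # ['<b>bold</b>']
print(lazy)     # ['<b>', '</b>']

# Escaping: '\.' is a real dot, a bare '.' is any char except a newline.
escaped = re.findall(r'a\.b', 'a.b axb')
bare = re.findall(r'a.b', 'a.b axb')
print(escaped)  # ['a.b']
print(bare)     # ['a.b', 'axb']
```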

    Take care,
    Mark.
    Last edited by netytan; September 23rd, 2003 at 10:04 AM.

  20. #11
  21. Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    Posts
    2,537
    Rep Power
    69
    Note to Strike: it needed to be greedy; if it wasn't, the regex wouldn't work. Thanks for the \w though, that totally slipped my mind.
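    To see why, here is a sketch that runs both versions on the sample string from the first reply. The third pattern (per_tag) is not from the thread; it is one possible alternative that steps over one tag at a time and so does not rely on greediness:

```python
import re

s = 'IP Address:</td><td><font face=verdana size=2>anyipaddress</td>'

# Greedy .* runs to the last '>' that still lets the group match.
greedy = re.findall(r'[a-zA-Z]:<.*>(.+?)<', s)
# Non-greedy .*? stops at the first '>', so the group grabs the next tag.
lazy = re.findall(r'[a-zA-Z]:<.*?>(.+?)<', s)
# Alternative: consume whole tags explicitly, then capture text up to '<'.
per_tag = re.findall(r'[a-zA-Z]:(?:<[^>]*>)+([^<]+)<', s)

print(greedy)   # ['anyipaddress']
print(lazy)     # ['<td>']
print(per_tag)  # ['anyipaddress']
```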

  22. #12
  23. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    4
    Rep Power
    0
    I am sorry if I was not clear and wasted your time, but thank you for trying to help! The thing I am trying to parse is the HTML of a Linksys router page, so the HTML is actually really, really long. Here is a chunk.

    somestuff....IP Address:</td><td><font face=verdana size=2>192.224.214.213</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font color=white face=verdana size=2>Subnet Mask:</td><td><font face=verdana size=2>192.160.152.123</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font color=white face=verdana size=2>Default Gateway:</td><td><font face=verdana size=2>12.246.16.1</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font color=white face=verdana size=2>DNS:</td><td><font face=verdana size=2>220.213.227.654<br>204.127.202.4<br>0.0.0.0</td></tr><tr><td bgcolor=6666cc>&nbsp; &nbsp;<font></th></tr></table></center></body></html> more stuff.....

    So this chunk is just part of the larger one. What I want to extract is the IP address that comes right after the words IP Address: (192.224.214.213 in this chunk).
    Any help would be appreciated. Also thanks for the link to the python regex howto!
  24. #13
  25. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    I suggest that you just strip all the HTML out first and then use regexes to find the data based on the surrounding text.

    To remove all the HTML tags in a string s, you would do re.sub('<.*?>', '', s).

    Example (note: the string is split across lines with implicit concatenation simply so that the page isn't a mile wide):

    Code:
    >>> import re
    >>> s = ('IP Address:</td><td><font face=verdana size=2>192.224.214.213</td></tr>'
    ...      '<tr><td bgcolor=6666cc>   <font color=white face=verdana size=2>'
    ...      'Subnet Mask:</td><td><font face=verdana size=2>192.160.152.123</td></tr>'
    ...      '<tr><td bgcolor=6666cc>   <font color=white face=verdana size=2>'
    ...      'Default Gateway:</td><td><font face=verdana size=2>12.246.16.1</td></tr>'
    ...      '<tr><td bgcolor=6666cc>   <font color=white face=verdana size=2>'
    ...      'DNS:</td><td><font face=verdana size=2>220.213.227.654<br>204.127.202.4'
    ...      '<br>0.0.0.0</td></tr><tr><td bgcolor=6666cc>   <font></th></tr></table>'
    ...      '</center></body></html>')
    >>> re.sub('<.*?>', '', s)
    'IP Address:192.224.214.213   Subnet Mask:192.160.152.123   Default Gateway:12.246.16.1   DNS:220.213.227.654204.127.202.40.0.0.0   '
    >>>
    Note that the DNS entries are jumbled (and one is an invalid IP address ..), so you may want to put in spaces for all <br> tags as well.
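    A minimal sketch of that <br> fix, using just the DNS cell from the chunk above. The variable names are made up, and the <br> pattern is written loosely (optional slash, any case) on the assumption that other pages may spell the tag differently:

```python
import re

dns_cell = '220.213.227.654<br>204.127.202.4<br>0.0.0.0</td>'

# Turn every <br> into a space first, so the entries stay separated.
spaced = re.sub(r'<br\s*/?>', ' ', dns_cell, flags=re.IGNORECASE)
# Then strip whatever tags are left.
text = re.sub(r'<.*?>', '', spaced)
entries = text.split()

print(text)     # 220.213.227.654 204.127.202.4 0.0.0.0
print(entries)  # ['220.213.227.654', '204.127.202.4', '0.0.0.0']
```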
    Last edited by Strike; September 23rd, 2003 at 12:31 PM.
  26. #14
  27. No Profile Picture
    Junior Member
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2003
    Posts
    4
    Rep Power
    0
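    Thanks! Getting rid of all of the HTML made it easy to get what I wanted! The code I have goes as follows:
    Code:
    #!/usr/bin/python2.2
    import sys
    import os
    import re
    def ExternalIP():
        # Download the router's status page to a temporary file with wget
        os.system("wget -O/tmp/Status.htm --http-pass='nothing' --http-user='nothing' http://192.168.0.1/Status.htm")
        status = open('/tmp/Status.htm').read()
        os.system('rm /tmp/Status.htm')
        # Strip the HTML tags and the '&nbsp' part of each entity
        ipline = re.sub('<.*?>|&nbsp', '', status)
        # The ';' left over from each '&nbsp;' becomes a line break
        ipline = re.sub(';', '\n', ipline)
        # Collapse runs of line breaks into one
        ipline = re.sub('\n+', '\n', ipline)
        # Collect every 'IP Address:...' line
        ip = re.findall('IP Address:.*', ipline)
        # Take the second match and drop the letters and ':' from it
        ipiwant = re.sub('[a-zA-Z:]', '', ip[1])
        return ipiwant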
    I then use this to update my dynamic dns service.

    This is the first program I have ever written so I know it probably sucks, but it works! Thanks for the help!

    By the way does anyone know if there is a python module that can pull a password protected file from a server? I looked at urllib and it couldn't do it so I used wget.

    coscarart
    Last edited by coscarart; September 23rd, 2003 at 01:52 PM.
  28. #15
  29. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2001
    Location
    Houston, TX
    Posts
    383
    Rep Power
    13
    urllib2 can do it, what problems were you having? It's just a matter of how you pass the password in. I'm not sure how you do it, honestly, but I imagine it's just a header that you set.
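    For what it's worth, here is a sketch of the "header that you set" idea. It assumes the router uses plain HTTP Basic authentication (which is what wget's --http-user/--http-pass suggest) and it is written against urllib.request, the Python 3 home of what used to be urllib2. The helper names basic_auth_header and build_status_request are made up; the URL and the 'nothing' credentials are the placeholders from the script above:

```python
import base64
import urllib.request

def basic_auth_header(user, password):
    """Return the value for an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def build_status_request(url, user, password):
    """Build (but do not send) a request that carries the credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(user, password))
    return request

request = build_status_request("http://192.168.0.1/Status.htm", "nothing", "nothing")

# To actually fetch the page (the router has to be reachable):
# status = urllib.request.urlopen(request).read()
```

    The library also ships a handler-based route (HTTPPasswordMgrWithDefaultRealm plus HTTPBasicAuthHandler passed to build_opener), which answers the server's challenge for you instead of sending the header up front.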