Python Programming
 
#1
February 9th, 2013, 03:17 PM
dariyoosh
RegExp: Question about the caret operator in multiline strings with flag re.MULTILINE

Hello everybody


OS: Windows Vista (32 bits)
Python version: 3.2.3
Text editor: Notepad++ with Windows End Of Line


I have a question about the caret operator in Python regular expressions. According to the online documentation (Library reference):
http://docs.python.org/release/3.2.3/library/re.html
Quote:
... (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline ...
Well, I wanted to see how this works, as it could be really helpful in some cases, for example to check the first character of a multiline string. I created the following script:

Code:
import re

def main():
    text = """line1
line2
line3"""

    prog = re.compile(r"^l", re.MULTILINE)
    if prog.match(text):
        print ("Text does match the regular expression")
    else:
        print ("Text does not match the regular expression")
    
    
main()


So in this example, let's say we want to check whether the string matches a pattern according to which the first character of each line (right after the newline character) is the letter 'l' (lowercase 'L'). This example obviously works, as 'line1', 'line2' and 'line3' all start with the letter 'l'. Here is the output of the script:

Code:
C:\> python -tt myscript.py
Text does match the regular expression
C:\>


Just to see the impact of re.MULTILINE, I changed the first letter of the second line, putting 'D' instead of 'l'. I expected the text to be rejected this time, yet I got the very same output. I checked the online documentation again for the MULTILINE flag in the re module:

http://docs.python.org/release/3.2.3/library/re.html#re.MULTILINE
Quote:
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.


So if the pattern matches the beginning of each line, why is the multiline string not rejected when the second line does not start with 'l'?

Could someone kindly make some clarification?

Thanks in advance,
Dariyoosh

#2
February 9th, 2013, 09:27 PM
b49P23TIvg
I haven't found anyone yet who understands why there is a re match functionality. Your program might work as you expect if, in place of the nasty match, you use

prog.search

#3
February 9th, 2013, 09:51 PM
requinix
Quote:
Originally Posted by dariyoosh
So if the pattern matches the beginning of each line, why is the multiline string not rejected when the second line does not start with 'l'?

Because it does match on the first line. All it requires is one match somewhere.

A better test would be changing the first letter of the first line. With MULTILINE it will still match (on the second line) and without it will not match.
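
The behaviour described here is that of search() rather than match(). A minimal sketch, assuming the same three-line text with the first line changed to 'sine1' (the variable names are illustrative and not taken from the thread):

```python
import re

# Assumed sample: the first line no longer starts with 'l'.
text = "sine1\nline2\nline3"

# With MULTILINE, '^' also matches right after each newline,
# so search() can find 'l' at the start of the second line.
with_flag = re.search(r"^l", text, re.MULTILINE)

# Without the flag, '^' matches only at the start of the string.
without_flag = re.search(r"^l", text)

print(with_flag is not None)   # True
print(without_flag is None)    # True
print(with_flag.start())       # 6, the index just after the first newline
```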

#4
February 10th, 2013, 06:35 AM
dariyoosh
Quote:
Originally Posted by b49P23TIvg
... Your program might work as you expect if in place of the nasty match use prog.search ...
The problem with search() is that, as I understand it, it looks for at least one occurrence of the pattern and does not require every line to match (I want to make sure that each line starts with the letter 'l'). So, for example, if the first two lines don't start with 'l' and only the third line does, search() will validate the string because at least one occurrence (in the third line) was found.
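
This concern can be checked directly. A small sketch, assuming a sample text in which only the third line starts with 'l' (the sample and the variable names are made up for illustration):

```python
import re

# Assumed sample: only the last line starts with 'l'.
text = "sine1\nDine2\nline3"
prog = re.compile(r"^l", re.MULTILINE)

# search() stops at the first place the pattern fits.
m = prog.search(text)
found = m is not None

# findall() returns one entry per line start that is followed by 'l'.
matching_lines = len(prog.findall(text))
total_lines = len(text.splitlines())

print(found)                        # True
print(m.start())                    # 12, the start of the third line
print(matching_lines, total_lines)  # 1 3
```

Comparing the two counts is one way to see that search() succeeded even though not every line fits the pattern.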

#5
February 10th, 2013, 06:36 AM
dariyoosh
Quote:
Originally Posted by requinix
... Because it does match on the first line. All it requires is one match somewhere.
According to the online documentation, I think this is actually the behaviour of the search() function, not the match() function.
http://docs.python.org/release/3.2.3/library/re.html#re.match
Quote:
...
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

If you want to locate a match anywhere in string, use search() instead.
...


So what I understand is that the MULTILINE flag is irrelevant when using the match() function within the context of my problem.
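
A minimal sketch of that observation, assuming two small sample strings (both hypothetical): when match() is called from the start of the string, the result is the same with and without the flag.

```python
import re

first_line_ok = "line1\nDine2\nline3"   # assumed sample
first_line_bad = "sine1\nline2\nline3"  # assumed sample

results = []
for text in (first_line_ok, first_line_bad):
    plain = re.match(r"^l", text)                # no flag
    multi = re.match(r"^l", text, re.MULTILINE)  # with MULTILINE
    # match() only tries the start of the string here, so both calls agree.
    results.append((plain is not None, multi is not None))

print(results)  # [(True, True), (False, False)]
```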

Quote:
Originally Posted by requinix
... A better test would be changing the first letter of the first line. With MULTILINE it will still match (on the second line) and without it will not match. ...
Not really, because I changed the code in the following way:
Code:
import re

def main():
    # Here I put 's' instead of 'l' at the start of the first line.
    text = """sine1
line2
line3"""

    prog = re.compile(r"^l", re.MULTILINE)
    if prog.match(text):
        print ("Text does match the regular expression")
    else:
        print ("Text does not match the regular expression")
    
    
main()


And we can see that even with the MULTILINE flag it doesn't match when we change the first letter of the first line.
Code:
C:\> python -tt myscript.py
Text does not match the regular expression
C:\> 

#6
February 10th, 2013, 06:37 AM
dariyoosh
So here, finally, is how I managed to solve the problem (using both findall() and match()):

Code:
import re
import sys

def main():
    text = """line1
line2
line3"""

    # Collect every non-empty line (runs of characters other than CR/LF).
    entireLineProg = re.compile(r"[^\r\n]+")
    lines = entireLineProg.findall(text)
    # match() is anchored at the start of each token, so '^' is optional here.
    firstOfLineProg = re.compile("^l")
    for token in lines:
        if not firstOfLineProg.match(token):
            print("Text does not match the regular expression")
            print("bad token = " + token)
            sys.exit(-1)

    print("The text was validated successfully according to the pattern")
    
main()


and this time it worked as I expected.
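
For comparison, here are two shorter ways to express the same check. This is only a sketch: the helper names are hypothetical, and it assumes a non-empty text that does not end with a newline. Unlike the script above, which skips empty lines, both variants treat an empty line as a failure.

```python
import re

def every_line_starts_with_l(text):
    # Test each line separately; match() is anchored at the line's start.
    return all(re.match("l", line) for line in text.splitlines())

# A line start (string start or just after a newline) NOT followed by 'l'.
BAD_LINE_START = re.compile(r"^(?!l)", re.MULTILINE)

def has_bad_line(text):
    # One search() is enough: any hit means some line fails the rule.
    return BAD_LINE_START.search(text) is not None

good = "line1\nline2\nline3"
bad = "line1\nDine2\nline3"

print(every_line_starts_with_l(good), has_bad_line(good))  # True False
print(every_line_starts_with_l(bad), has_bad_line(bad))    # False True
```

The second variant turns the question around: instead of asking whether every line matches, it searches for the first line that does not.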

Thank you both very much for your time and attention to my problem.


Regards,
Dariyoosh

#7
February 10th, 2013, 07:42 AM
b49P23TIvg
When using match why trouble yourself with the "start of line caret"?
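
To illustrate the point behind this question: match() is already anchored at the start of the string, so a leading caret adds nothing there. A minimal sketch with assumed sample strings:

```python
import re

text = "line1\nline2"  # assumed sample

with_caret = re.match(r"^l", text)
without_caret = re.match(r"l", text)

# Both calls match the same single character at the start of the string.
same_span = with_caret.span() == without_caret.span()
print(same_span)  # True

# Neither pattern matches when 'l' is not the very first character.
shifted = "xline1"
print(re.match(r"^l", shifted), re.match(r"l", shifted))  # None None
```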

#8
February 10th, 2013, 07:57 AM
dariyoosh
Quote:
Originally Posted by b49P23TIvg
When using match why trouble yourself with the "start of line caret"?
This was just one particular example. I'm currently reading the re module documentation to learn and better understand regular expressions (which, I should admit, can sometimes become tricky!), and the document explains the different operators, including the caret. I ran into this problem in one of several test scripts that I created while reading it.

So the purpose of the question was just to learn; there is not necessarily a need to use the caret every time we use match().

Quote:
Originally Posted by b49P23TIvg
I haven't found anyone yet who understands why there is a re match functionality.
Well, I think my question proved that match() can be useful in some cases

Thanks again,
