Hello everybody
OS: Windows Vista (32 bits)
Python version: 3.2.3
Text editor: Notepad++ with Windows End Of Line
I have a question about the caret operator in Python regular expressions. According to the online documentation (Library reference):
http://docs.python.org/release/3.2.3/library/re.html
Quote:
... (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline ...
I wanted to see how this works, as it could be helpful in some cases, for example to check the first character of each line in a multiline string. I created the following script:
Code:
import re

def main():
    text = """line1
line2
line3"""
    prog = re.compile(r"^l", re.MULTILINE)
    if prog.match(text):
        print("Text does match the regular expression")
    else:
        print("Text does not match the regular expression")

main()
In this example, suppose we want to check whether the string matches a pattern in which the first character of each line (right after the newline character) is the letter 'l' (lowercase 'L'). This example obviously works, since 'line1', 'line2' and 'line3' all start with the letter 'l'. Here is the output of the script:
Code:
C:\> python -tt myscript.py
Text does match the regular expression
C:\>
Just to see the impact of re.MULTILINE, I changed the first letter of the second line, putting 'D' in place of 'l', and I expected the text to be rejected this time. Yet I got the very same output. I checked the online documentation again for the MULTILINE flag in the re module:
http://docs.python.org/release/3.2.3/library/re.html#re.MULTILINE
Quote:
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
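To see concretely where '^' is able to match with and without the flag, here is a minimal sketch. The sample text (with the second line changed to 'Dine2') and the helper name caret_positions are assumptions made for illustration; they are not part of the script above. It lists the offsets at which r"^l" is found by re.finditer, and then shows what re.match reports for the same text.

```python
import re

# Assumed sample: the second line deliberately does not start with 'l'.
text = "line1\nDine2\nline3"

def caret_positions(pattern, string, flags=0):
    """Return the start offset of every match (hypothetical helper)."""
    return [m.start() for m in re.finditer(pattern, string, flags)]

default_hits = caret_positions(r"^l", text)
multiline_hits = caret_positions(r"^l", text, re.MULTILINE)

# match() only tries the pattern at offset 0, even with re.MULTILINE,
# so its result says nothing about the later lines.
match_at_start = re.match(r"^l", text, re.MULTILINE) is not None

print(default_hits)     # [0]
print(multiline_hits)   # [0, 12]
print(match_at_start)   # True
```

If this reading is right, the flag only widens the set of positions where '^' is allowed to match; it does not require every line to match, and a single successful position is enough for the pattern to succeed.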
So if the pattern matches at the beginning of each line, why is the multiline string not rejected when the second line does not start with 'l'?
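One possible way to express the stricter check described here, that every line begins with 'l', is to test the lines one at a time. This is only a sketch: the function name every_line_starts_with and the use of str.splitlines() are my own choices for illustration, not something the re documentation prescribes.

```python
import re

def every_line_starts_with(prefix_pattern, text):
    """Return True only if each line of text matches prefix_pattern at its start.

    Hypothetical helper: match() is anchored at the beginning of each line
    because every line is passed to it as a separate string.
    """
    prog = re.compile(prefix_pattern)
    return all(prog.match(line) is not None for line in text.splitlines())

print(every_line_starts_with(r"l", "line1\nline2\nline3"))  # True
print(every_line_starts_with(r"l", "line1\nDine2\nline3"))  # False
```

With this approach the MULTILINE flag is not needed at all, since no single string passed to match() contains more than one line.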
Could someone kindly clarify this?
Thanks in advance,
Dariyoosh