#1
    Registered User

    Negative Lookahead problems


    I am currently writing a program to remove certain sections from a very large log file. I have experimented with lookaheads, lookbehinds, and a bunch of other things, but none of them seem to work for this.
    Here is an example of what I need.


    1 'some text here' FieldID: 'some text here'
    2 'Thousands of lines of text here'
    3 'some text here' TxSuccess: True
    4 'Another couple thousand lines'
    5 'some text' FieldID: 'some text'
    6 'many lines'
    7 'some text' TxSuccess: False
    8 'many lines'
    9 'some text' FieldID: 'some text'
    10 'many lines'
    11 'some text' TxSuccess: True
    12 'many lines'
    13 'some text' FieldID: 'some text'
    14 'many lines'
    15 'some text' TxSuccess: False


    I need it to match everything from "FieldID" to "TxSuccess: False", starting at the FieldID directly before each failure. So in this example I need it to match lines 5 to 7 and lines 13 to 15 without matching any other lines.

    The problem with most of the regexes I've tried is that they start the match at the first "FieldID" they encounter, like this extremely obvious one:
    FieldID:\s.*(\r|\n|.*?)*?TxSuccess:\sFALSE
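    To see why this happens, here is a small sketch in Python, using the re module as a stand-in for PCRE. The pattern is a simplified lazy version written for illustration, not the exact regex above, and line_span is a hypothetical helper. The engine always reports the leftmost possible match. A lazy .*? only makes the end of the match as short as possible; it never moves the start past the first "FieldID".

```python
import re

# Sample text copied from the example above (the leading numbers are part of the text).
SAMPLE = """1 'some text here' FieldID: 'some text here'
2 'Thousands of lines of text here'
3 'some text here' TxSuccess: True
4 'Another couple thousand lines'
5 'some text' FieldID: 'some text'
6 'many lines'
7 'some text' TxSuccess: False
8 'many lines'
9 'some text' FieldID: 'some text'
10 'many lines'
11 'some text' TxSuccess: True
12 'many lines'
13 'some text' FieldID: 'some text'
14 'many lines'
15 'some text' TxSuccess: False"""

# Simplified lazy pattern (illustration only): (?s) lets '.' cross newlines,
# and '.*?' stops at the FIRST "TxSuccess: False" after the starting "FieldID".
LAZY = re.compile(r"(?s)FieldID.*?TxSuccess:\sFalse")


def line_span(text, match):
    """Hypothetical helper: return the (first, last) 1-based line numbers a match covers."""
    first = text.count("\n", 0, match.start()) + 1
    last = text.count("\n", 0, match.end()) + 1
    return first, last


spans = [line_span(SAMPLE, m) for m in LAZY.finditer(SAMPLE)]
print(spans)  # [(1, 7), (9, 15)]: each match starts one FieldID too early
```

    Both matches end at the right place but start too early. The fix is therefore to constrain what may appear inside the match, not to make the quantifier lazier.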

    I also tried lookaheads and lookbehinds, but none of them matched what I needed without also matching other lines.

    I'm fairly new to Regex so there might be a few concepts that I haven't even tried yet.

    Thanks in advance,
    Christopher Wilson
#2
    User 165270
    Try this:

    Code:
    (?s)FieldID(?:(?!FieldID).)*TxSuccess:\sFalse
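    The key part of this pattern is the repeated group (?:(?!FieldID).)*, sometimes called a tempered greedy token. Before consuming each character, it checks that another "FieldID" does not start at that position. This means a match can never contain a second FieldID, so it has to begin at the FieldID closest to the "TxSuccess: False". The sketch below runs the same pattern through Python's re module, used here as a stand-in for PCRE, on the sample from the first post. line_span is a hypothetical helper for reporting line numbers.

```python
import re

# Sample text copied from the first post (the leading numbers are part of the text).
SAMPLE = """1 'some text here' FieldID: 'some text here'
2 'Thousands of lines of text here'
3 'some text here' TxSuccess: True
4 'Another couple thousand lines'
5 'some text' FieldID: 'some text'
6 'many lines'
7 'some text' TxSuccess: False
8 'many lines'
9 'some text' FieldID: 'some text'
10 'many lines'
11 'some text' TxSuccess: True
12 'many lines'
13 'some text' FieldID: 'some text'
14 'many lines'
15 'some text' TxSuccess: False"""

# (?s)               -> '.' also matches newlines
# FieldID            -> start at a FieldID marker
# (?:(?!FieldID).)*  -> consume one character at a time, but only where
#                       another "FieldID" does not begin at that position
# TxSuccess:\sFalse  -> end at the failure marker
TEMPERED = re.compile(r"(?s)FieldID(?:(?!FieldID).)*TxSuccess:\sFalse")


def line_span(text, match):
    """Hypothetical helper: return the (first, last) 1-based line numbers a match covers."""
    first = text.count("\n", 0, match.start()) + 1
    last = text.count("\n", 0, match.end()) + 1
    return first, last


spans = [line_span(SAMPLE, m) for m in TEMPERED.finditer(SAMPLE)]
print(spans)  # [(5, 7), (13, 15)]
```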
#3
    Registered User
    Originally Posted by prometheuzz
    Try this:

    Code:
    (?s)FieldID(?:(?!FieldID).)*TxSuccess:\sFalse

    According to RegexBuddy, that doesn't match any text in the log. However, the (?s) is a useful little trick that I didn't know about, thanks.
#4
    User 165270
    Originally Posted by cwilson
    According to RegexBuddy, that doesn't match any text in the log. However, the (?s) is a useful little trick that I didn't know about, thanks.
    Then you're probably not using RegexBuddy correctly because the regex I posted is PCRE-all-the-way!

    Let me demonstrate by posting an example. If you execute this PHP script:

    PHP Code:
    <?php
    $text = "1 'some text here' FieldID: 'some text here'
    2 'Thousands of lines of text here'
    3 'some text here' TxSuccess: True
    4 'Another couple thousand lines'
    5 'some text' FieldID: 'some text'
    6 'many lines'
    7 'some text' TxSuccess: False
    8 'many lines'
    9 'some text' FieldID: 'some text'
    10 'many lines'
    11 'some text' TxSuccess: True
    12 'many lines'
    13 'some text' FieldID: 'some text'
    14 'many lines'
    15 'some text' TxSuccess: False";

    preg_match_all('/(?s)FieldID(?:(?!FieldID).)*TxSuccess:\sFalse/', $text, $matches);

    print_r($matches); 
    it will produce the following output:

    Code:
    Array
    (
        [0] => Array
            (
                [0] => FieldID: 'some text'
    6 'many lines'
    7 'some text' TxSuccess: False
                [1] => FieldID: 'some text'
    14 'many lines'
    15 'some text' TxSuccess: False
            )
    
    )
    which is exactly what you said you want to match.
#5
    Registered User
    OK, I tried the exact example I put in this thread and it worked perfectly. However, the real log is much more complex, so I will post a few lines of it here for you to try.


    2009/06/18 10:40:44:421 ThreadID = 1836 INFO PORTALIMAGING 60 BeginBeamDelivery() - Data : FieldID: 3-1
    2009/06/18 10:47:20:546 ThreadID = 1836 INFO PORTALIMAGING 60 EndBeamDelivery() - Data : TxSuccess: TRUE
    2009/06/18 10:40:44:421 ThreadID = 1836 INFO PORTALIMAGING 60 BeginBeamDelivery() - Data : FieldID: 3-1
    2009/06/18 10:47:20:546 ThreadID = 1836 INFO PORTALIMAGING 60 EndBeamDelivery() - Data : TxSuccess: FALSE
    2009/06/18 10:40:44:421 ThreadID = 1836 INFO PORTALIMAGING 60 BeginBeamDelivery() - Data : FieldID: 3-1
    2009/06/18 10:47:20:546 ThreadID = 1836 INFO PORTALIMAGING 60 EndBeamDelivery() - Data : TxSuccess: TRUE
    2009/06/18 10:40:44:421 ThreadID = 1836 INFO PORTALIMAGING 60 BeginBeamDelivery() - Data : FieldID: 3-1
    2009/06/18 10:47:20:546 ThreadID = 1836 INFO PORTALIMAGING 60 EndBeamDelivery() - Data : TxSuccess: FALSE

    This is a slightly more accurate example, even if it is missing many thousands of lines.
#6
    User 165270
    Originally Posted by cwilson
    OK, I tried the exact example I put in this thread and it worked perfectly.
    Yes, I knew that.


    Originally Posted by cwilson
    However, the real log is much more complex, so I will post a few lines of it here for you to try.
    ...
    This is a slightly more accurate example, even if it is missing many thousands of lines.
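    In the example from your original post you wrote "False", but now you wrote "FALSE". Regex matching is case-sensitive by default, so either write FALSE in the pattern or turn on case-insensitive matching with (?si) instead of (?s).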
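    A quick way to confirm that case is the only problem is shown in the Python sketch below, with the re module standing in for PCRE. It rebuilds the log lines posted above (Begin/End pairs whose results are TRUE, FALSE, TRUE, FALSE) and runs three variants: the original pattern, the pattern spelled the way the log does, and the original pattern with the i flag. The i flag also makes "FieldID" case-insensitive. That seems harmless for these lines, but it is an assumption about the rest of the log.

```python
import re

# Log lines in the format posted above: Begin/End pairs with TRUE, FALSE, TRUE, FALSE.
BEGIN = ("2009/06/18 10:40:44:421 ThreadID = 1836 INFO PORTALIMAGING 60 "
         "BeginBeamDelivery() - Data : FieldID: 3-1")
END = ("2009/06/18 10:47:20:546 ThreadID = 1836 INFO PORTALIMAGING 60 "
       "EndBeamDelivery() - Data : TxSuccess: {}")
LOG = "\n".join(
    line
    for result in ("TRUE", "FALSE", "TRUE", "FALSE")
    for line in (BEGIN, END.format(result))
)

# Original pattern: "False" never matches "FALSE" because matching is case-sensitive.
CASE_SENSITIVE = re.compile(r"(?s)FieldID(?:(?!FieldID).)*TxSuccess:\sFalse")

# Fix 1: spell the marker exactly as the log does.
EXACT_CASE = re.compile(r"(?s)FieldID(?:(?!FieldID).)*TxSuccess:\sFALSE")

# Fix 2: keep the pattern but add the i (case-insensitive) flag.
IGNORE_CASE = re.compile(r"(?si)FieldID(?:(?!FieldID).)*TxSuccess:\sFalse")

print(len(CASE_SENSITIVE.findall(LOG)))  # 0
print(len(EXACT_CASE.findall(LOG)))      # 2
print(len(IGNORE_CASE.findall(LOG)))     # 2
```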
#7
    Registered User
    Hahaha, if that's all it was, I feel like an idiot. Thank you so much.

    And by the way, how efficiently will this handle a 20+ megabyte file?
#8
    User 165270
    Originally Posted by cwilson
    Hahaha, if that's all it was, I feel like an idiot. Thank you so much.

    And by the way, how efficiently will this handle a 20+ megabyte file?
    It all depends on how much text there is between "FieldID" and "TxSuccess: FALSE". But there are (of course) more efficient ways to find the text you're interested in without loading 20 MB or more of text into memory.
    Last edited by prometheuzz; July 30th, 2009 at 04:51 PM.
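    One such approach is sketched below in Python, as an illustration only; it is not necessarily what was meant above. The idea is to read the log one line at a time, buffer only the lines since the most recent FieldID, and hand the buffer over when a TxSuccess: FALSE line arrives. failed_deliveries is a hypothetical name. The sketch assumes that each marker sits on its own line, that every FieldID line is followed by exactly one TxSuccess line, and that the failure marker is spelled exactly "TxSuccess: FALSE" as in the posted log lines.

```python
import io
from typing import Iterable, Iterator, List, Optional


def failed_deliveries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines from each FieldID line through the TxSuccess: FALSE line
    that follows it, keeping at most one block of lines in memory."""
    block: Optional[List[str]] = None  # lines since the latest FieldID, if any
    for line in lines:
        if "FieldID:" in line:
            block = [line]  # a new delivery starts; any unfinished block is dropped
        elif block is not None:
            block.append(line)
            if "TxSuccess: FALSE" in line:
                yield block  # failed delivery: hand the whole block to the caller
                block = None
            elif "TxSuccess:" in line:
                block = None  # any other result (e.g. TRUE): discard the block


# Log lines in the format posted earlier in the thread.
BEGIN = ("2009/06/18 10:40:44:421 ThreadID = 1836 INFO PORTALIMAGING 60 "
         "BeginBeamDelivery() - Data : FieldID: 3-1")
END = ("2009/06/18 10:47:20:546 ThreadID = 1836 INFO PORTALIMAGING 60 "
       "EndBeamDelivery() - Data : TxSuccess: {}")
LOG = "\n".join(
    line
    for result in ("TRUE", "FALSE", "TRUE", "FALSE")
    for line in (BEGIN, END.format(result))
)

# io.StringIO stands in for a real file here; a file object returned by open()
# also iterates line by line, so the whole file is never read at once.
failed = list(failed_deliveries(io.StringIO(LOG)))
print(len(failed))  # 2
for block in failed:
    print("".join(block))
```

    Memory use is bounded by the longest FieldID-to-TxSuccess stretch (roughly 100 lines in this log) instead of the file size. No backtracking regex runs over the whole file.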
#9
    Registered User
    There are usually only about 100 lines between FieldID and TxSuccess: FALSE, since TxSuccess: FALSE means that the accelerator beam failed to start correctly and must be started over.

    My program currently extracts data from the log in about 2 minutes. I am no longer at my work computer, so I have no way to test this right now, but either way I know how to proceed from here.

    If it's fairly efficient, an extra 30 seconds or less of run time shouldn't be a problem, since the program will run while I perform another task.

    However, if it is like most lookarounds I've tried, it could take significantly longer than that. In that case I will have to run the logs through PowerGREP to cut them down to a more manageable size before gathering data.

    Thank you very much for your help!


    Christopher Wilson
    Radiation Oncology - Physics
    Helen F. Graham Cancer Center
    4701 Ogletown-Stanton Road
    Newark, DE 19713
    302-623-4500
#10
    User 165270
    Originally Posted by cwilson
    ...

    Thank you very much for your help!
    You're welcome Christopher!
