#1
  1. Registered User

    Merging preg matches


    Hello,
    I'm trying to merge 10 preg matches into one, and I ended up with this:
    Code:
    /<[^>]*(script|object|iframe|applet|meta|style|form)*"?[^>]*>|\([^>]*"?[^)]*\)|"|'/
    I created a test to ensure that the new preg match works the same as the old ones. It passes all tests, except that it returns true on
    Code:
    <>
    when it must return false. Any ideas how to fix that?
    Thanks.
  2. #2
  3. User 165270
    Well, "<>" matches this part of your regex (divided over multiple lines for clarity):

    Code:
    <
    [^>]*
    (script|object|iframe|applet|meta|style|form)*
    "?
    [^>]*
    >
    As you can see, everything but the < and > is optional, so that's why. Are you sure the * after (script|object|iframe|applet|meta|style|form) is correct? That will match "appletappletappletappletappletapplet", for example (but also an empty string).
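    The point about optional parts can be checked with a small sketch. It uses Python's `re` module as a stand-in for PHP's `preg_match` (an assumption; for a pattern this simple the two engines should agree), and `is_flagged` is a hypothetical helper name, not something from this thread.

```python
import re

# Tag part of the merged pattern, copied from the first post.
TAG_PART = re.compile(
    r'<[^>]*(script|object|iframe|applet|meta|style|form)*"?[^>]*>'
)


def is_flagged(text: str) -> bool:
    """Return True if the pattern matches anywhere, like preg_match() == 1."""
    return TAG_PART.search(text) is not None


for sample in ["<>", "<b>", "<appletappletapplet>", "plain text"]:
    print(repr(sample), is_flagged(sample))

# The greedy [^>]* consumes the tag name first, so the keyword group
# repeats zero times and captures nothing, even for "<applet>".
print(TAG_PART.search("<applet>").group(1))  # None
```

    Only "plain text" comes back False here. Since the keyword group may repeat zero times, any "<...>" pair satisfies the pattern.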
  4. #3
  5. Registered User
    Yes, the * is correct. For example:
    Code:
    <script scriptapplet scriptscript>alert('XSS!')</script scriptscriptscript scriptscript script>
    works fine.
    Oh OK, so why does this preg match return false on <>:
    Code:
    /<[^>]*script*"?[^>]*>|<[^>]*object*"?[^>]*>|<[^>]*iframe*"?[^>]*>|<[^>]*applet*"?[^>]*>|<[^>]*meta*"?[^>]*>|<[^>]*style*"?[^>]*>|<[^>]*form*"?[^>]*>|\([^>]*"?[^)]*\)|"|\'/
  6. #4
  7. User 165270
    Originally Posted by ignas2526
    Oh OK, so why does this preg match return false on <>:
    ...
    Here's the first part of your regex with a little explanation:

    Code:
    <             // match a '<'
    [^>]*         // match zero or more characters other than '>'
    scrip         // match the string 'scrip'
    t*            // match zero or more 't'-s
    "?            // match an optional double-quote
    [^>]*         // match zero or more characters other than '>'
    >             // match a '>'
    In other words, that part of your regex will cause the following strings to match:
    Code:
    <scrip>
    <scripttttttttttttttttttttttt>
    <scrip"<>
    to name just three.
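    The difference between `script*` and a grouped `(?:script)*` can be seen side by side in a short sketch. Python's `re` is again assumed to behave like PHP's preg functions for these patterns. The names `LITERAL` and `GROUPED` are made up for illustration, and the grouped variant is only there for contrast; it is not a pattern from the thread.

```python
import re

# First alternative of the long pattern: "*" binds to the single "t".
LITERAL = re.compile(r'<[^>]*script*"?[^>]*>')
# Grouping the whole word makes the keyword itself optional instead.
GROUPED = re.compile(r'<[^>]*(?:script)*"?[^>]*>')

samples = ['<scrip>', '<scripttttttttttttttttttttttt>', '<scrip"<>', '<>']
for s in samples:
    # Columns: input, result with LITERAL, result with GROUPED.
    print(repr(s), bool(LITERAL.search(s)), bool(GROUPED.search(s)))
```

    `LITERAL` needs the letters "scrip" to be present, so it rejects `<>`. `GROUPED` accepts all four strings because the whole keyword can be skipped.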
  8. #5
  9. Registered User
    I found where the problem was!
    Code:
    <[^>]*(script|object|iframe|applet|meta|style|form)*[^>]*>
    must be:
    Code:
    <[^>]*(script|object|iframe|applet|meta|style|form)*[^>]>
    That regex wasn't mine. It was 414 characters long and took about 1.6 seconds in my test; now it is 99 characters long and takes 0.4 seconds in the same test.
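    A quick check of the revised tag part, sketched in Python on the assumption that its `re` module agrees with PHP's preg functions here. `REVISED` and `is_flagged` are hypothetical names used only for this example.

```python
import re

# Revised tag part: the final [^>] now requires at least one character
# between "<" and ">".
REVISED = re.compile(
    r'<[^>]*(script|object|iframe|applet|meta|style|form)*[^>]>'
)


def is_flagged(text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return REVISED.search(text) is not None


print(is_flagged("<>"))        # False
print(is_flagged("<script>"))  # True
print(is_flagged("<b>"))       # True
```

    The empty `<>` is no longer flagged. The keyword group can still repeat zero times, though, so a tag such as `<b>` matches as well.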
  10. #6
  11. User 165270
    Originally Posted by ignas2526
    I found where the problem was!
    Code:
    <[^>]*(script|object|iframe|applet|meta|style|form)*[^>]*>
    must be:
    Code:
    <[^>]*(script|object|iframe|applet|meta|style|form)*[^>]>
    That regex wasn't mine. It was 414 characters long and took about 1.6 seconds in my test; now it is 99 characters long and takes 0.4 seconds in the same test.
    Okay. Now take the following text:

    if a < b then there 'a' is bigger than 'b' ... some more text <script> abcdefg </script> ...

    Your regex will find two matches (the parts in square brackets):

    if a [< b then there 'a' is bigger than 'b' ... some more text <script>] abcdefg [</script>] ...
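    The two matches can be listed with a short sketch, again using Python's `re` as an assumed stand-in for PHP's `preg_match_all`. The variable names are made up for illustration.

```python
import re

# Revised tag part from the previous post.
REVISED = re.compile(
    r'<[^>]*(script|object|iframe|applet|meta|style|form)*[^>]>'
)

text = ("if a < b then there 'a' is bigger than 'b' ... "
        "some more text <script> abcdefg </script> ...")

# Collect every non-overlapping match, scanning left to right.
matches = [m.group(0) for m in REVISED.finditer(text)]
for found in matches:
    print(repr(found))
```

    The first match starts at the lone "<" in "a < b" and runs up to the ">" that closes `<script>`, because `[^>]*` accepts any character other than ">". The second match is `</script>`.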
  12. #7
  13. Registered User
    All regexes fail in some cases. However, in my case I don't care what comes before or after a match: the only thing I need is to detect whether there is any XSS. If it is detected, the script simply destroys the whole string, so it doesn't matter what comes before or after. What matters more is not flagging strings that don't contain XSS, like <>.
