Thread: Find inner HTML

    #1
  1. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Aug 2002
    Location
    Queensland, Australia
    Posts
    827
    Rep Power
    142

    Find inner HTML


    I've created a simple template system with language translation.

    Template files should contain only HTML tags or <? ?> blocks for PHP code. There should be no literal text in the templates, because all text is stored in PHP variables.

    I need a regular expression that I can use to search through all my templates and make sure they don't contain any text between tags.

    To be thorough, I would also like it to locate text used in attributes such as alt or title, but that's not so important.

    I can't seem to get my head around how to search for words that do not begin with < or end with >.
    Ooh, they have the Internet on computers now!
  2. #2
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    So your templates are all just tags (or should be at least). Then this could do the trick:

    Code:
    ^(\s*<[^>]*>)+\s*$
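    To try the pattern outside an editor, here is a minimal sketch in Python. The language choice, the helper name is_tags_only and the two sample templates are assumptions made for illustration; none of them come from the thread. It only checks whether a whole template is tags plus whitespace.

```python
import re

# Pattern from the post above: the whole string must consist of tags
# separated only by optional whitespace.
TAGS_ONLY = re.compile(r'^(\s*<[^>]*>)+\s*$')


def is_tags_only(template: str) -> bool:
    """Return True when the template holds nothing but tags and whitespace."""
    return TAGS_ONLY.match(template) is not None


# Hypothetical template contents, made up for this example.
clean = '<div>\n  <? echo $lang["title"]; ?>\n</div>\n'
dirty = '<div>\n  Hello world\n</div>\n'

print(is_tags_only(clean))
print(is_tags_only(dirty))
```

    One caveat with this approach: a PHP block that itself contains a > character (for example -> or =>) ends the "tag" early as far as the pattern is concerned, so such a template would be reported as containing text.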
  4. #3
  5. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Aug 2002
    Location
    Queensland, Australia
    Posts
    827
    Rep Power
    142
    Originally Posted by prometheuzz
    So your templates are all just tags (or should be at least). Then this could do the trick:

    Code:
    ^(\s*<[^>]*>)+\s*$
    That matches the tags. I need the opposite of that so I can use it to find files that contain text between tags.
  6. #4
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by DrWorm
    That matches the tags. I need the opposite of that so I can use it to find files that contain text between tags.
    Well, the negation can be handled in the programming language you're writing this in.
    But, this will match text outside of a tag:

    Code:
    [^<>]+(?=[^>]*(?:<|$))
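    A quick way to see what this pattern picks up is to run it over a small sample. The sketch below uses Python, and both sample strings are invented for the example (assumptions, not from the thread). Note that runs made up only of whitespace count as "text outside of a tag" too.

```python
import re

# Pattern from the post above: a run of non-bracket characters that is
# followed by '<' or the end of the input before any '>' appears.
OUTSIDE_TAGS = re.compile(r'[^<>]+(?=[^>]*(?:<|$))')

# Hypothetical samples, made up for this example.
with_text = '<p class="intro">Hello <b>world</b></p>'
only_tags = '<div>\n  <span></span>\n</div>'

text_matches = OUTSIDE_TAGS.findall(with_text)
space_matches = OUTSIDE_TAGS.findall(only_tags)

print(text_matches)   # ['Hello ', 'world']
print(space_matches)  # ['\n  ', '\n']
```

    The tag name and the class attribute are skipped because the next bracket after them is a >, which makes the look ahead fail.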
  8. #5
  9. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Aug 2002
    Location
    Queensland, Australia
    Posts
    827
    Rep Power
    142
    Originally Posted by prometheuzz
    Well, the negation can be handled in the programming language you're writing this in.
    I should have stressed I'm not using this in a programming language. I'm using it in a "Find" dialog in NetBeans.
    But, this will match text outside of a tag:

    Code:
    [^<>]+(?=[^>]*(?:<|$))
    It does, except it also matches spaces. It's really only [a-zA-Z] that needs to be found. I feel like I've already troubled you too much, but if you're compelled to figure out another regex I would truly appreciate it. In the meantime, this expression will give me something to examine and learn from.
  10. #6
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    937
    Originally Posted by DrWorm
    I should have stressed I'm not using this in a programming language. I'm using it in a "Find" dialog in NetBeans.

    It does, except it also matches spaces. It's really only [a-zA-Z] that needs to be found. I feel like I've already troubled you too much, but if you're compelled to figure out another regex I would truly appreciate it. In the meantime, this expression will give me something to examine and learn from.
    I think that after a short explanation of the regex, you will be able to change it (slightly) so that it suits your needs.

    Code:
    [^<>]+      // one or more characters of any type except '<' and '>'
    (?=         // start positive look ahead
      [^>]*     //   zero or more characters of any type except '>'
      (?:<|$)   //   either '<' or the end of the string
    )           // stop positive look ahead
    Feel free to post back if you have further questions.
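    One possible way to make that slight change, offered as a sketch and not as something posted in the thread: swap the first character class for [a-zA-Z] and keep the look ahead as it is. The Python wrapper, the name LETTERS_OUTSIDE_TAGS and the sample string are assumptions for illustration.

```python
import re

# Assumed adaptation of the explained pattern: only runs of letters are
# matched, and the unchanged look ahead still rejects anything that sits
# inside a tag (tag names, attribute names and attribute values).
LETTERS_OUTSIDE_TAGS = re.compile(r'[a-zA-Z]+(?=[^>]*(?:<|$))')

# Hypothetical template fragment, made up for this example.
sample = '<img alt="logo"><td>  </td><h1>Site title</h1>'

words = LETTERS_OUTSIDE_TAGS.findall(sample)
print(words)  # ['Site', 'title']
```

    Whitespace between tags no longer matches, and neither does the alt text, since it sits inside a tag. Character entities are a known gap in this sketch: the letters in something like &nbsp; would still be reported.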
  12. #7
  13. Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Aug 2002
    Location
    Queensland, Australia
    Posts
    827
    Rep Power
    142
    Thanks very much for the explanation. I'm not familiar with "look aheads", so that'll give me something to research.

    The regex is no longer urgent, so I'm happy to take the time to figure it out and learn from it.
