#1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2008
    Posts
    13
    Rep Power
    0

    Match Closing Tag


    Dear All,

    I am a beginner in regex. Using Perl, I have to solve this problem: close the div tags, where only opening division tags such as the div1, div2 and div3 elements are present in the XML file. I have to insert the closing division tags based on their hierarchy.

    The XML structure is

    div1 - Parent element
    div2 - Child
    div3 - subchild

    In the xml file there might be nested <div> tags i.e.,
    Input:
    <div1>...
    <div2>...
    <div2>...
    <div3>...
    <div1>...

    Output:
    <div1>...
    <div2>...</div2>
    <div2>...
    <div3>...</div3>
    </div2>
    </div1>
    <div1>...</div1>

    Your solutions and feedback would be like medicine for my headache...

    Thanks,
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2004
    Location
    Northern Ireland
    Posts
    59
    Rep Power
    11
    Originally Posted by rangeshram
    close the div tags, where only opening division tags such as the div1, div2 and div3 elements are present in the XML file.
    I think this problem is too complex for regular expressions (although I may be wrong).
    In this case, you are not really trying to match a set pattern.

    Matching tags on one line should be simple enough.
    <div2>...
    Code:
    <div\d+>.*?$
    This should match any numbered div tag and whatever follows it up to the end of the line (with multi-line matching enabled, so that $ matches at the end of each line).
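
A quick way to check that pattern, sketched in Python rather than Perl (an assumption made for illustration; the sample string is made up, and `re.MULTILINE` plays the role of Perl's /m modifier):

```python
import re

# Same pattern as in the post; re.MULTILINE lets $ match at the end of every line.
DIV_LINE = re.compile(r"<div\d+>.*?$", re.MULTILINE)

# Made-up sample: two div lines and one line without a tag.
sample = "<div1>Heading1-one\n<div2>Heading2-one\nplain text line"

matches = DIV_LINE.findall(sample)
print(matches)  # ['<div1>Heading1-one', '<div2>Heading2-one']
```

Each match stops at its own line end, so the pattern finds the opening tags but says nothing about where they should be closed.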

    Matching tags across multiple lines is a lot more trouble.
    <div1>
    <div2>...
    <div1> has to know that <div2> is inside it because of the indenting. However, that means that the match has to remember the indenting before the <div1> (in this case none), check it against the indenting of <div2>, and see if <div2> has more.
    I don't think that regexes can do that; they have no built-in memory.

    I think it would be better to read the XML file line by line, check the indenting before any tags, and use some sort of stack to hold tags that need to be closed on other lines.
    Not having used Perl, I cannot really be of more help.
    "True Power Lies Within The Blood Of Your Peoples Revenge... The Devils Fruit Can Lead Me There..." - Uchiha Sasuke
  4. #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2008
    Posts
    13
    Rep Power
    0

    Re: Match Closing Tag


    Thanks for your reply.

    I indented the div tags only to tell them apart; in the XML file itself there is no indentation. It is a plain XML file with no junk in it, and each division starts on a new line.

    Sample XML:

    <div1>Heading1-one
    <div2>Heading2-one
    <div2>Heading2-two
    <div3>Heading3-one
    <div2>Heading2-three
    <div1>Heading1-two

    Hope this is clear and sorry for the confusion...

    We need a solution that uses regex to automatically insert the closing tag for each opening tag.
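
For the sample above, one possible approach is the stack idea suggested earlier in the thread, with a regex used only to recognise the opening tags. The sketch below is in Python rather than Perl, and it rests on two assumptions taken from this post: every division starts on its own line, and the digit in <divN> gives the nesting level. The function name `close_divs` is made up for illustration.

```python
import re

# Hypothetical helper, not from the thread. Assumes each division starts on
# its own line and that the digit in <divN> is the nesting level.
OPEN_DIV = re.compile(r"<div(\d+)>")

def close_divs(text):
    out = []
    stack = []  # levels of the divs that are still open
    for line in text.splitlines():
        m = OPEN_DIV.match(line.lstrip())
        if m:
            level = int(m.group(1))
            # Close every open div at the same or a deeper level first.
            while stack and stack[-1] >= level:
                out.append("</div%d>" % stack.pop())
            stack.append(level)
        out.append(line)
    # Close whatever is still open at the end of the input.
    while stack:
        out.append("</div%d>" % stack.pop())
    return "\n".join(out)

sample = """<div1>Heading1-one
<div2>Heading2-one
<div2>Heading2-two
<div3>Heading3-one
<div2>Heading2-three
<div1>Heading1-two"""

print(close_divs(sample))
```

This puts each closing tag on a line of its own rather than at the end of the heading line, so the layout differs from the output shown in the first post, but the nesting is the same. The same loop should port to Perl with an array used as the stack.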

  6. #4
    User 165270
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2005
    Posts
    497
    Rep Power
    938
    Originally Posted by rangeshram
    ...
    In the xml file there might be nested <div> tags i.e.,
    ...
    Because of that requirement, regular expressions are not suited for this job. You'll need (to build) a "true" recursive descent parser.

    But perhaps you need to fix the source of your problem. I mean, at some point you're receiving this invalid XML: try to fix that instead. Trying to correct the invalid XML afterwards is like fixing something with sticky tape.
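
As a rough illustration of the recursive descent idea mentioned above, here is a minimal sketch in Python rather than Perl. It assumes, as stated earlier in the thread, that each division starts on its own line, and it additionally assumes that the digit in <divN> is the nesting level. The names `level_of`, `parse_div` and `parse_document` are hypothetical.

```python
import re

# Hypothetical sketch, not from the thread. Assumes the digit in <divN> is
# the nesting level and that each division starts on its own line.
OPEN_DIV = re.compile(r"<div(\d+)>")

def level_of(line):
    """Return N for a line starting with <divN>, or None for other lines."""
    m = OPEN_DIV.match(line.lstrip())
    return int(m.group(1)) if m else None

def parse_div(lines, pos, out):
    """Emit the div starting at lines[pos] plus its children; return the next position."""
    level = level_of(lines[pos])
    out.append(lines[pos])
    pos += 1
    while pos < len(lines):
        child = level_of(lines[pos])
        if child is None:
            out.append(lines[pos])  # plain content of the current div
            pos += 1
        elif child > level:
            pos = parse_div(lines, pos, out)  # nested div: recurse
        else:
            break  # same or shallower level: this div ends here
    out.append("</div%d>" % level)
    return pos

def parse_document(text):
    lines = text.splitlines()
    out = []
    pos = 0
    while pos < len(lines):
        if level_of(lines[pos]) is None:
            out.append(lines[pos])  # text outside any div is copied as-is
            pos += 1
        else:
            pos = parse_div(lines, pos, out)
    return "\n".join(out)

print(parse_document("<div1>a\n<div2>b\n<div1>c"))
```

Each call handles exactly one div and returns once it sees a tag at the same or a shallower level, so the call stack does the bookkeeping that an explicit stack would otherwise do.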
    Last edited by prometheuzz; February 23rd, 2009 at 04:35 AM.
