#1
    Registered User

    Regex to put heading tags


    I have 1000+ pages without heading tags. I want to add <h1></h1> tags around the text, as in the example below. How can I achieve this?

    Code:
    <div style="position: absolute; width: 749px; height: 0; z-index: 1; left: 4px; top: 241px" id="layer1">
    	<p>
    	<b><font face="Tahoma">Kolesterolünüzü Düşürecek
    	Bir Egzersiz Programı<br>
    What it should be:

    Code:
    <div style="position: absolute; width: 749px; height: 0; z-index: 1; left: 4px; top: 241px" id="layer1">
    	<h1>
    	<b><font face="Tahoma">Kolesterolünüzü Düşürecek
    	Bir Egzersiz Programı<br></h1>
#2
    Contributing User
    There is nothing complicated about adding heading tags; the difficulty is identifying precisely where they should go, i.e., describing exactly what pattern tells you where to put a heading start tag and where to put an end tag.

    I should add, however, that while it is certainly feasible to do this with regular expressions if you can define your requirements very precisely and they are very simple, regexes are usually not recommended for dealing with HTML tags. A dedicated HTML parsing library or module is usually more appropriate, but I can't say much more than that since I don't even know which programming language you are using.

    Or, to put it another way: if you can define a very simple search-and-replace rule, regexes are usable for your purpose, but if you have to start dealing (even only marginally) with the HTML tag structure, try to choose another route.
#3
    Registered User
    I'm sure about where it should go; that's why I included the div style section. A simple regex is enough for me.
#4
    Contributing User
    OK, you are sure, but I am not; actually, I have no idea. Please tell me exactly how you recognize where to put the heading start and end tags, and I will probably be able to give you a regex.
#5
    Registered User
    Just around the text.

    Code:
    <div style="position: absolute; width: 749px; height: 0; z-index: 1; left: 4px; top: 241px" id="layer1">
    	<p>
    	<b><font face="Tahoma">*<br>

    Code:
    <div style="position: absolute; width: 749px; height: 0; z-index: 1; left: 4px; top: 241px" id="layer1">
    	<p>
    	<b><font face="Tahoma"><h1>*</h1><br>
#6
    Contributing User
    Hi, I understand that in the example you want the heading tags around the "Kolesterolünüzü Düşürecek Bir Egzersiz Programı" subtitle ("An Exercise Program That Will Lower Your Cholesterol").

    What I am asking for is a rule that will allow the program to figure out where to put these heading tags, even when the subtitle is different. In other words, some invariant feature in the file that reliably indicates that an <h1> tag should be inserted there. You are not providing this kind of rule.
#7
    Registered User
    In the example above, the line ends with <br>. The heading tags go around the text just between <font face="Tahoma"> and that <br>.
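    Before rewriting 1000+ pages, it can help to check that this rule really is the "invariant feature" asked for above. A minimal sketch in Perl, assuming the pages are passed as command-line arguments and that a heading never contains other tags: it counts how often the pattern matches in each page and reports every page where it does not match exactly once. The `find_headings` helper and `$heading_re` are hypothetical names, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The rule from post #7: a heading is the text between
# <b><font face="Tahoma"> and the next <br>.
# [^<]+ assumes the heading text contains no other tags; it does match
# newlines, so a heading split over two lines is still found.
my $heading_re = qr{<b><font face="Tahoma">([^<]+)<br>};

# Return every heading text the rule finds in one page (list context).
sub find_headings {
    my ($html) = @_;
    return $html =~ /$heading_re/g;
}

# Report each file where the rule matches zero or several times,
# so those pages can be inspected by hand before anything is rewritten.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    my @found = find_headings($html);
    print "$file: ", scalar(@found), " match(es)\n" if @found != 1;
}
```

    Pages that print nothing have exactly one heading candidate; any page listed needs a closer look before a blanket search-and-replace.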
#8
    Contributing User
    Then, you could try this as a starting point:

    Code:
    # Wrap the text between <font face="Tahoma"> and <br> in <h1>...</h1>
    $line =~ s{<b><font face="Tahoma">([^>]+)<br>}{<b><font face="Tahoma"><h1>$1</h1><br>};
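    To apply this to all the pages rather than one `$line` at a time, here is a minimal sketch in Perl, assuming the files are passed on the command line. It reads each page whole, so a heading split across two lines (as in the first post) still matches; it emits the closing `</h1>` shown in the desired output of post #5; and it keeps a `.bak` copy of every file it changes. The `add_h1` name and the backup scheme are illustrative choices, and the pattern uses `[^<]+`, which assumes the heading text contains no other tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap the text between <b><font face="Tahoma"> and the next <br> in <h1>...</h1>.
# [^<]+ stops at the first tag, so text that is already wrapped
# (it starts with <h1>) is not matched again.
sub add_h1 {
    my ($html) = @_;
    $html =~ s{(<b><font face="Tahoma">)([^<]+)<br>}{$1<h1>$2</h1><br>}g;
    return $html;
}

# Process every file named on the command line, keeping a .bak copy.
# Files are read and written as raw bytes, so the page encoding is preserved.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    my $fixed = add_h1($html);
    next if $fixed eq $html;              # pattern not found: leave file alone

    rename $file, "$file.bak" or die "Cannot back up $file: $!";
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $fixed;
    close $out or die "Cannot close $file: $!";
}
```

    For example, saved under a hypothetical name such as `add_h1.pl`, it could be run as `perl add_h1.pl pages/*.html`. Running it twice is harmless: after the first run, the text following `<font face="Tahoma">` begins with `<h1>` and no longer matches.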
#9
    Registered User
    Worked fine. Thank you, Laurent_R.
