#1
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2000
    Location
    Perth West Australia
    Posts
    757
    Rep Power
    15
    The Brick wall hurts less!

    If I have a string, say:

    'blah blah <a href="this.html">this</a><a href="neother.htm">other</a> blah blah'

    i.e. two 'a href's on the same line, and assuming I know nothing about the content or size of the links, the expression

    eregi("href=\"(.*)</a>?", $buffer, $link)

    will return :
    'href="this.html">this</a><a href="neother.htm">other</a>'

    when I actually want:
    'href="this.html">this</a>'

    That is one headbanger (for me anyway).

    The other is this: the code below replaces 'href="' with "" (an empty string):

    $plain = eregi_replace("href=\"", "", $link[0]);

    How do I get it to replace 'href="' or 'href ="' or 'href= "' or even 'href = "'?

    Any help would be really appreciated. I have scrounged around all of my usual haunts and gathered all sorts of useful info on regex, but I still cannot get the hang of anything but the simplest stuff, and it's starting to get to me now.

    With thanks,

    Simon.

    ------------------
    Simon Wheeler
    FirePages -DHTML/PHP/MySQL
#2
    pretty please?

    ------------------
    Simon Wheeler
    FirePages -DHTML/PHP/MySQL
#3
    .Net Developer
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2000
    Location
    London
    Posts
    987
    Rep Power
    15

    <<
    eregi("href=\"(.*)</a>?", $buffer, $link)

    will return :
    'href="this.html">this</a><a href="neother.htm">other</a>'

    when I actually want:
    'href="this.html">this</a>'
    >>


    Hi Mr. Simon,

    I don't know whether this logic is acceptable to you or not.

    I am going to use the explode() function to solve both your problems.

    Here is the solution for your first question.

    I am starting from that return value. Use split() or explode() to separate the two links.

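    <?php

    // The over-long match returned by eregi() in the question.
    $string = "href=\"this.html\">this</a><a href=\"neother.htm\">other</a>";

    // Split on the closing tag; the first piece is the first link minus its </a>.
    $values = explode("</a>", $string);

    // Put the closing tag back on.
    echo $values[0] . "</a>";
    // prints: href="this.html">this</a>
    ?>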

    -----

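    Solution for the second problem:

    <?php

    $test = "href =\"this.html\">this</a>";

    // Everything before the first double quote, here 'href ='.
    $first_val = explode("\"", $test);

    // Remove that prefix together with the opening quote.
    $plain = eregi_replace($first_val[0] . "\"", "", $test);

    echo $plain;

    // prints: this.html">this</a>
    ?>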
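
    For comparison, both problems can also be handled inside the pattern itself. The sketch below is only an illustration and rests on one assumption: that PHP's Perl-compatible preg_* functions are available. The POSIX eregi() family used in this thread has no non-greedy quantifier, and it was removed altogether in PHP 7. The variable names follow the original question.

```php
<?php
// Sketch only: uses PCRE (preg_*) in place of the thread's eregi().
$buffer = 'blah blah <a href="this.html">this</a><a href="neother.htm">other</a> blah blah';

// Problem 1: (.*?) is non-greedy, so the match stops at the FIRST </a>.
// \s* on both sides of "=" accepts href=", href =", href= " and href = ".
preg_match('#href\s*=\s*"(.*?)</a>#i', $buffer, $link);
echo $link[0], "\n";   // href="this.html">this</a>

// Problem 2: strip the leading href=" whatever the spacing around "=".
$plain = preg_replace('#href\s*=\s*"#i', '', 'href = "this.html">this</a>');
echo $plain, "\n";     // this.html">this</a>
```

    Here $link[1] holds only the part after the opening quote (this.html">this), which may save the second replace step entirely.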

    I hope this helps you in some way.

    I can't think of any other ideas right now.


    ------------------
    SR -
    webshiju.com

    "The fear of the LORD is the beginning of knowledge..."



    [This message has been edited by Shiju Rajan (edited July 12, 2000).]
#4
    Thank you for your reply, Shiju Rajan.

    I have considered exploding the lines etc. However, for this particular project I am trying to find the quickest possible way to achieve the results, and while explode() or split() is faster than using regex, I still have to use regex anyway after the split. I was hoping to do as much work in one regular expression as possible.

    The script will be validating links on pages that may contain several hundred links, with no way of knowing where the links will appear in the HTML. I also need to check for relative and absolute links, ftp links, function links and in-page links, and then group them, print them out with stats, and validate them if and where possible!

    Scary! So basically I am trying to write code that is as compact and efficient as possible. Thinking about it while writing, perhaps I should do as much as possible with explode(), substr() etc. and leave the regex till last, or learn regex properly!

    Thanks again for your reply, if you have any more ideas please let me know.

    Regards,

    Simon Wheeler

    ------------------
    Simon Wheeler
    FirePages -DHTML/PHP/MySQL
#5
    freebsd
    Guest
    Devshed Newbie (0 - 499 posts)
    firepages, here is the Perl way. It should meet all your requirements. Hopefully you or someone else can convert it to PHP.
    #this.html
    <html>
    <body>
    blah blah <a href="this.html">this</a><a href="neother.htm">other</a> blah blah
    blah blah <a href="hello.html">hello</a><a href="another.htm">another</a> blah blah
    blah blah <a href="that.html">that</a><a href="other.htm">other1</a> blah blah
    </body>
    </html>
    #############################################
    #!/usr/local/bin/perl

    print "Content-type: text/html\n\n";
    open (THIS, "this.html");
    @lines = <THIS>;
    close (THIS);
    foreach $line (@lines) {
        @break = split(/<\/a>/,$line);
        foreach $result (@break) {
            if ($result =~ /href\s?=\s?(.*)>(.*)/gi) {
                $url = $1;
                $label = $2;
                $url =~ s/"//;
                $url =~ s/\s+$//;
                chop($url) if ($url =~ /"$/);
                $label =~ s/^\s+//;
                $label =~ s/\s+$//;
                print "<a href=\"$url\">$label</a><br>\n";
            }
        }
    }
    #############################################
    1) @break = split(/<\/a>/,$line);
    </a> is the delimiter. Each line is split at every </a>, so each piece holds at most one link.

    >>how do I get it to replace 'href="' or 'href ="' or 'href= "' or even 'href = "' ?
    2) if ($result =~ /href\s?=\s?(.*)>(.*)/gi) {
    This should fulfill that requirement: \s? allows an optional whitespace character on either side of the = sign.

    3) The initial $url would be "this.html" (quotes included).
    $url =~ s/"//;
    $url would then become this.html"

    4) chop($url) if ($url =~ /"$/);
    This removes the trailing " character.
    Note: the check is there to cope with <a href=this.html> without double quotes, which you haven't thought of.

    5) Remove leading and trailing whitespace from the label:
    $label =~ s/^\s+//;
    $label =~ s/\s+$//;

    If only ONE URL is found on a line, the script will simply print that one link ("the_first_url.html").

    The result of this script would be:
    <a href="this.html">this</a><br>
    <a href="neother.htm">other</a><br>
    <a href="hello.html">hello</a><br>
    <a href="another.htm">another</a><br>
    <a href="that.html">that</a><br>
    <a href="other.htm">other1</a><br>

    >>on pages that may contain several hundred links
    That is fine. But there is a known bug in this script: if

    href="neother.htm">other

    is broken across two lines, that URL will be left out. It will not ruin the output of your script, though.
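
    A possible PHP take on the same idea is sketched below. It is not a literal port of the Perl above: it assumes the preg_* functions are available and runs one pattern over the whole page instead of splitting line by line, so a link whose label wraps onto the next line is still picked up. The function name extract_links() and the sample $html are made up for illustration, and regex-based HTML scanning like this will still miss unusual markup such as single-quoted attributes.

```php
<?php
// Sketch only: a PCRE-based take on the Perl script, not a literal port.
// extract_links() is a hypothetical helper name.
function extract_links(string $html): array
{
    // <a ... href, optional whitespace around "=", optional double quote,
    // the URL, the rest of the tag, then the label up to </a>.
    // i = case-insensitive, s = "." also matches newlines.
    $pattern = '#<a\s[^>]*?href\s*=\s*"?([^"\s>]*)"?[^>]*>(.*?)</a>#is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // $m[1] is the URL, $m[2] the label (trimmed of surrounding whitespace).
        $links[] = ['url' => $m[1], 'label' => trim($m[2])];
    }
    return $links;
}

// Made-up sample: a quoted link, an unquoted one, and a label on the next line.
$html = "blah <a href=\"this.html\">this</a><a href = neother.htm>other</a> blah\n"
      . "blah <a href=\"that.html\">\nthat</a> blah\n";

foreach (extract_links($html) as $link) {
    echo "<a href=\"{$link['url']}\">{$link['label']}</a><br>\n";
}
```

    Returning an array of url/label pairs rather than printing directly leaves room for the grouping and validation steps mentioned earlier in the thread.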

    [This message has been edited by freebsd (edited July 12, 2000).]
#6
    I just remembered how hard my head hurt looking at Perl, lol.

    Thank you freebsd. I should be able to translate the regex in your code to PHP's Perl-compatible regex.

    I have managed to sort out my first problem with only one regex call (using explode(), Shiju!), as I initially forgot about the possibility of more than one link per line!

    This should get me there. Thanks again, all.

    ------------------
    Simon Wheeler
    FirePages -DHTML/PHP/MySQL
#7

    <<
    if ($result =~ /href\s?=\s?(.*)>(.*)/gi) {
    $url = $1;
    $label = $2;
    $url =~ s/"//;
    $url =~ s/\s+$//;
    chop($url) if ($url =~ /"$/);
    $label =~ s/^\s+//;
    $label =~ s/\s+$//;
    print "<a href=\"$url\">$label</a><br>\n";
    >>


    Nice coding, freebsd! I really like this.

    Mr. Simon, you got a wonderful solution.



    ------------------
    SR -
    webshiju.com

    "The fear of the LORD is the beginning of knowledge..."
