#1
    Registered User, Devshed Newbie (0 - 499 posts)
    Join Date: Jul 2009 | Location: Sydney | Posts: 3

    "Push" single document root URL rewrite also drop file ext.


    We have reached our limits here, so hopefully someone can explain this. We are bringing Windows .chm help-file sources, which use unhelpful URLs, together under a single online navigator. Here is an example of a .chm source URL:

    Code:
    <A HREF="mk:@MSITStore:ITC_IDR.chm::/IDR_LISP_B118.htm">
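    I developed this to sanitize them, and it works most of the time:
    PHP Code:
    // replace the mk:@MSITStore:...chm::/ prefix with the content manager's base URL
    $body = ereg_replace("mk:@MSITStore:[^<>[:space:]]+[[:alnum:]/]+.chm::/", $docroot, $body);
    // normalize upper-case image extensions
    $body = str_replace(".GIF", ".gif", $body);
    // drop the .htm/.hhk extensions
    $body = ereg_replace(".htm|.hhk", "", $body);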
    ...where $docroot is a string, built by another function, that holds the base URL of this "content manager". As a result, any of the hundreds of HTML source files in the directory can be called by name from a single base URL, xxx?help=NAME. My problem is twofold:

    I have another set of HTML source documents, parsed from the .hhc indexes, that use "sane" HTML links. These also need to be rewritten on display so they behave the same way as above. I would also like to drop all file extensions in that same line, because that is more efficient. Here is the conversion:

    (match a)|(match b)+(FILENAME)+(extension)=
    ($docroot)(FILENAME)

    where a = "mk:@MSITStore:[^<>[:space:]]+[[:alnum:]/]+.chm::/"
    but b = nothing (these elements look like <a href="filename.htm">)

    Since we can't search for nothing, I probably need to search all <a> tags instead and replace WHATEVER precedes the filename in each href with $docroot in the PHP buffer:

    (match href=?)+(FILENAME)+(extension)=
    ($docroot)(FILENAME)

    The resulting URLs are always xxx?help=FILENAME
    (minus the extension, where "xxx?help=" stands for whatever is actually in $docroot)

    Sorry, I don't know how to describe the problem more simply, but I know there is an answer. Can anyone help?
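    The approach described in this post can be sketched with preg_replace_callback. It matches every <a> tag, discards whatever precedes the filename in the href, and drops the extension. This is a minimal sketch, not code from the thread. The function name rewrite_help_links is hypothetical, the code assumes PHP 7+, and it assumes only links ending in .htm or .hhk are local help pages that should be rewritten.

```php
<?php
// Hypothetical helper (not from the thread): rewrite every <a href="...">
// whose target ends in .htm or .hhk so it points at $docroot + bare filename.
function rewrite_help_links(string $html, string $docroot): string
{
    return preg_replace_callback(
        // Group 1 = opening quote, group 2 = href value minus its extension.
        '#<a\s+href=([\'"])([^\'"]*?)\.(?:htm|hhk)\1\s*>#i',
        function (array $m) use ($docroot): string {
            // Greedy ".*" strips everything up to the LAST "/" or ":", which
            // removes "mk:@MSITStore:ITC_IDR.chm::/" as well as any directory.
            $name = preg_replace('#^.*[/:]#', '', $m[2]);
            return '<a href="' . $docroot . $name . '">';
        },
        $html
    );
}

$docroot = 'http://mysite?help=';
echo rewrite_help_links('<A HREF="mk:@MSITStore:ITC_IDR.chm::/B118.htm">', $docroot), "\n";
// prints: <a href="http://mysite?help=B118">
echo rewrite_help_links("<a href='LOCAL-B.htm'>", $docroot), "\n";
// prints: <a href="http://mysite?help=LOCAL-B">
```

    The callback keeps "find the filename" separate from "build the new URL". Because of that, extra rules, such as skipping external links, could be added in one place later.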
#2
    User 165270, Devshed Newbie (0 - 499 posts)
    Join Date: Oct 2005 | Posts: 497
    Originally Posted by scanf
    ...
    Can anyone help?
    Probably.
    First, why are you using the older ereg functions instead of the preferred preg ones?
    Second, forget about regex for the moment. Could you post a couple of example strings and for each string post the desired transformation? Also, describe what exactly is changed for each example.
#3
    Registered User
    Originally Posted by prometheuzz
    Probably.
    First, why are you using the older ereg functions instead of the preferred preg ones?
    Legacy code from a previous project. I can change it, no problem. I wasn't aware that preg_replace was preferred. Thanks.
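    For reference, the three ereg_* lines from the first post could be ported to preg_* roughly as shown below. ereg_replace was deprecated in PHP 5.3 and removed in PHP 7. This is a sketch under stated assumptions: the sample $body is made up. The only intended behavior change is escaping the dots, so that "." matches a literal period instead of any character.

```php
<?php
// preg_* port (a sketch) of the ereg_* lines from the first post.
// The sample $body and $docroot values are illustrative only.
$docroot = 'http://mysite?help=';
$body = '<A HREF="mk:@MSITStore:ITC_IDR.chm::/IDR_LISP_B118.htm"><IMG SRC="ICON.GIF">';

// PCRE needs delimiters (# here) and accepts the same POSIX classes.
// Assumes $docroot contains no "$" or "\", which preg_replace treats specially.
$body = preg_replace('#mk:@MSITStore:[^<>[:space:]]+[[:alnum:]/]+\.chm::/#', $docroot, $body);
$body = str_replace('.GIF', '.gif', $body);
// The original ".htm|.hhk" also matched ANY character before "htm"/"hhk".
$body = preg_replace('#\.(?:htm|hhk)#', '', $body);

echo $body, "\n";
// prints: <A HREF="http://mysite?help=IDR_LISP_B118"><IMG SRC="ICON.gif">
```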

    Originally Posted by prometheuzz
    Second, forget about regex for the moment. Could you post a couple of example strings and for each string post the desired transformation? Also, describe what exactly is changed for each example.
    Sure. First, examples from the .chm source:
    <A HREF="mk:@MSITStore:ITC_IDR.chm::/B118.htm">
    to
    <A HREF="http://mysite?help=B118">
    ::
    <A HREF="mk:@MSITStore:ITC_IDR.chm::/B156.htm">
    to
    <A HREF="http://mysite?help=B156">
    :: Second example set, from the "index" file type:
    <a href='1062.htm'>
    to
    <A HREF="http://mysite?help=1062">
    ::
    <a href='LOCAL-B.htm'>
    to
    <A HREF="http://mysite?help=LOCAL-B">


    Here "http://mysite?help=" is a generic placeholder for the base URL actually stored (correctly) in the $docroot variable, as explained above. Thanks a lot!
#4
    User 165270
    Try something like this:
    PHP Code:
    $text = 'abc
    <A HREF="mk:@MSITStore:ITC_IDR.chm::/B118.htm">
    def
    <A HREF="mk:@MSITStore:ITC_IDR.chm::/B156.htm">
    ghi
    <a href=\'1062.htm\'>
    jkl
    <a href=\'LOCAL-B.htm\'>';

    $docroot = 'http://mysite?help=';

    echo $text . "\n-------------------------------\n";
    echo preg_replace(
      // optional mk:@MSITStore:...chm::/ prefix, capture the filename, drop .htm
      '#<a\s+href=[\'"](?:mk:@MSITStore:[^/]*/)?(.*?)\.htm[\'"]>#i',
      // "$1" is not a valid PHP variable name, so it reaches preg_replace intact
      "<a href=\"$docroot$1\">",
      $text
    );

    /* output:

    abc
    <A HREF="mk:@MSITStore:ITC_IDR.chm::/B118.htm">
    def
    <A HREF="mk:@MSITStore:ITC_IDR.chm::/B156.htm">
    ghi
    <a href='1062.htm'>
    jkl
    <a href='LOCAL-B.htm'>
    -------------------------------
    abc
    <a href="http://mysite?help=B118">
    def
    <a href="http://mysite?help=B156">
    ghi
    <a href="http://mysite?help=1062">
    jkl
    <a href="http://mysite?help=LOCAL-B">

    */ 
    Last edited by prometheuzz; July 4th, 2009 at 06:43 AM.
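    The pattern above requires ">" right after the closing quote, so a link such as <a href="x.htm" target="main"> would be left alone. Below is a possible looser variant. It assumes the .hhc-derived pages may carry extra attributes or .hhk targets, neither of which the thread confirms. It captures the rest of the tag and carries it over. It also builds the replacement from single-quoted pieces, so that $2 and $3 are plainly meant for preg_replace rather than for PHP interpolation.

```php
<?php
// Sketch only: a looser version of the pattern posted above.
$docroot = 'http://mysite?help=';
$text = '<a href = "LOCAL-B.htm" target="main">LOCAL-B</a>';

// Group 1 = quote char, group 2 = bare filename, group 3 = attributes after href.
$pattern = '#<a\s+href\s*=\s*([\'"])(?:mk:@MSITStore:[^/]*/)?([^\'"]*?)\.(?:htm|hhk)\1([^>]*)>#i';
// Single quotes: PHP leaves $2 and $3 alone and preg_replace fills them in.
$replacement = '<a href="' . $docroot . '$2"$3>';

echo preg_replace($pattern, $replacement, $text), "\n";
// prints: <a href="http://mysite?help=LOCAL-B" target="main">LOCAL-B</a>
```

    Attributes placed before href, as in <a class="x" href="...">, are still not matched. Handling those would need a looser pattern or an HTML parser such as DOMDocument.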
#5
    Registered User

    Works well


    I don't know how to thank you.
#6
    User 165270
    Originally Posted by scanf
    I don't know how to thank you.
    Saying that you don't know how to thank me is more than enough gratitude!
    You're most welcome.
