#1
  1. A Change of Season
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2004
    Location
    Next Door
    Posts
    2,670
    Rep Power
    171

    Regex question and friendly URL question.


    Hi;

    2 questions:

    1 - How can I access the number in these? For example 29098.

    There is 1 dash to separate the ID from the suburb. There may be dashes in the suburb name, so I need the code to work with suburbs with and without a dash in their name.
    PHP Code:
    $string = "http://test.site.com/newyork-rentals/Queens-Park-29098";
    And
    PHP Code:
    $string = "http://test.site.com/newyork-rentals/manhathan-28771";
    2 - Would this be a proper way of creating URLs, considering I have to split the argument and search the DB based on the ID?

    http://test.site.com/newyork-rentals/Queens-Park-42010


    Thank you
    Last edited by English Breakfast Tea; September 2nd, 2013 at 10:43 PM.
  2. #2
  3. --
    Devshed Expert (3500 - 3999 posts)

    Join Date
    Jul 2012
    Posts
    3,959
    Rep Power
    1014
    Hi,

    you simply look for a sequence of digits at the end of the string, optionally followed by a trailing slash:

    PHP Code:
    '#\d+/?$#' 
    Yes, that's a typical way of creating user-friendly URLs.
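    A minimal sketch of how that pattern might be used. The function name extract_trailing_id is made up for illustration, and a capture group is added so the optional trailing slash does not end up in the result:

```php
<?php
// Hypothetical helper (not from the thread): returns the trailing ID, or null if there is none.
function extract_trailing_id(string $url): ?string
{
    // (\d+) captures the digits, /? allows one optional trailing slash, $ anchors at the end.
    if (preg_match('#(\d+)/?$#', $url, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(extract_trailing_id('http://test.site.com/newyork-rentals/Queens-Park-29098'));
```

    The same call should also accept the bare form http://test.site.com/newyork-rentals/29098, since the pattern makes no assumption about what precedes the digits.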
    The 6 worst sins of security • How to (properly) access a MySQL database with PHP

    Why can't I use certain words like "drop" as part of my Security Question answers?
    There are certain words used by hackers to try to gain access to systems and manipulate data; therefore, the following words are restricted: "select," "delete," "update," "insert," "drop" and "null".
  4. #3
  5. Wiser? Not exactly.
    Devshed God 1st Plane (5500 - 5999 posts)

    Join Date
    May 2001
    Location
    Bonita Springs, FL
    Posts
    5,952
    Rep Power
    4033
    Originally Posted by English Breakfast Tea
    1 - How can I access the number in these? For example 29098.
    Given the rules you stated, there is no need for any regex for this problem. Just find the last - and take everything after it to get the ID.

    Code:
    $id = substr(strrchr($url, '-'), 1);
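    A slightly more defensive sketch of the same idea, assuming you want null instead of garbage when there is no dash or when the text after the last dash is not numeric. The function name id_after_last_dash is hypothetical:

```php
<?php
// Hypothetical wrapper around the substr/strrchr one-liner.
function id_after_last_dash(string $url): ?string
{
    // strrchr() returns the part of the string starting at the last '-', or false if there is none.
    $tail = strrchr($url, '-');
    if ($tail === false) {
        return null;
    }
    // Drop the dash itself; the cast covers older PHP versions where substr() may return false.
    $id = (string) substr($tail, 1);
    // Accept the result only if it consists purely of digits.
    return ctype_digit($id) ? $id : null;
}

var_dump(id_after_last_dash('http://test.site.com/newyork-rentals/Queens-Park-29098'));
```

    Note that this approach depends on the dash directly before the ID: for http://test.site.com/newyork-rentals/29098 the last dash is the one in "newyork-rentals", so the sketch returns null.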

    Comments on this post

    • Jacques1 disagrees
    Recycle your old CD's, don't just trash them



    If I helped you out, show some love with some reputation, or tip with Bitcoins to 1N645HfYf63UbcvxajLKiSKpYHAq2Zxud
  6. #4
  7. A Change of Season
    Originally Posted by kicken
    Code:
    $id = substr(strrchr($url, '-'), 1);
    Gold
  8. #5
  9. --
    Relying on the dash is rather stupid in my opinion, since the whole description (including the dash) is just additional information. It's not necessary. So why would you rely on its existence instead of simply fetching the ID? Power users may very well wanna use the raw ID (for whatever reason).
  10. #6
  11. A Change of Season
    Originally Posted by Jacques1
    Relying on the dash is rather stupid in my opinion, since the whole description (including the dash) is just additional information. It's not necessary. So why would you rely on its existence instead of simply fetching the ID? Power users may very well wanna use the raw ID (for whatever reason).
    How would you recommend building friendly URLs here, uncle Jaque?

    Edit:

    I got the idea from this site. They are huge.
    Last edited by English Breakfast Tea; September 2nd, 2013 at 11:45 PM.
  12. #7
  13. --
    As I already said in my first reply, the URLs themselves are fine. This is a typical way of creating user-friendly URLs.

    What's not fine is extracting the ID based on the dash as suggested by kicken. Since the whole description is just additional information, it should not be required. It should be possible to use the raw ID without the description:

    Code:
    http://test.site.com/newyork-rentals/29098
    Don't forget you're dealing with different users from all over the world. There's the average Joe who only clicks around in his browser and uses whatever you give him. But there are also advanced users who use the application the way they want. This may include accessing the ID directly without the description (for web scraping or whatever).

    You should cover both user types. If I see URLs like this, I know how they work, and I expect the description part to be removable. If it turns out that I need to leave at least a dash (otherwise the application will blow up), then I consider this to be a bug.

    Use the regex above to extract the digits at the end of the string. This way you make no assumptions about whether or not there's something before that.
  14. #8
  15. A Change of Season
    Originally Posted by Jacques1
    It should be possible to use the raw ID without the description:

    Code:
    http://test.site.com/newyork-rentals/29098

    But baby Jaque you changed my url! I want those suburb names in there for my SEO people.


    PHP Code:
    $string = "http://test.site.com/newyork-rentals/Queens-Park-29098";
  16. #9
  17. Wiser? Not exactly.
    Originally Posted by English Breakfast Tea
    But baby Jaque you changed my url! I want those suburb names in there for my SEO people.
    Jacques1's point is that removing the suburb name (and dash) should not cause the URL to not work. That way, if someone wanted to shorten the URL for use in, say, an IM or email, they could just remove the excess text and leave the ID.

    You yourself would still use the suburb name when generating the URLs on your site to improve the SEO factor.

    I happen to disagree that requiring the dash is a problem/bug, so long as the application fails gracefully if the ID is not found. To each their own though. Removing the dash requirement is not hard to do, so if you want to allow the flexibility, go for it.
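    A rough sketch of what "failing gracefully" could look like. Everything here is hypothetical: the $listings array stands in for the database, and the resolve_listing function and its status/title result shape are invented for illustration:

```php
<?php
// Hypothetical stand-in for a database table: listing IDs mapped to titles.
$listings = ['29098' => 'Queens Park', '28771' => 'Manhattan'];

// Hypothetical resolver: never throws, always returns a status the caller can act on.
function resolve_listing(string $segment, array $listings): array
{
    // No trailing digits, or an ID that is not in the data: report "not found".
    if (preg_match('#(\d+)/?$#', $segment, $m) !== 1 || !isset($listings[$m[1]])) {
        return ['status' => 404, 'title' => null];
    }
    return ['status' => 200, 'title' => $listings[$m[1]]];
}

var_dump(resolve_listing('Queens-Park-29098', $listings));
var_dump(resolve_listing('Queens-Park', $listings));
```

    Either extraction style can sit behind a check like this; the point is only that a missing or unknown ID ends in a clean "not found" rather than an error.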
  18. #10
  19. --
    On a side note: I find it rather ironic when web developers, who've probably spent a large part of their life playing around with tech and using it in creative ways, seem to view their own users as mindless drones doing nothing but clicking big buttons in their Internet Explorer. Is it so hard to imagine that some of your users may be just as much into tech and experimenting as you are?

    Sure, you can frustrate them and tell them they have to use your application exactly as you intended. You can block any URL that doesn't conform to your officially certified SEO scheme -- just like the home page designers in the 90s didn't want their users to do a right click. But don't you find that stupid and arrogant? Shouldn't your application be fun to use?

    There's actually a great article about designing URLs, and one chapter is about "hackable URLs". You should read it.

    But it's up to you. If being in control means everything to you, go ahead. Force people to use your predefined SEO URLs.

    Comments on this post

    • PaulGer agrees
  20. #11
  21. A Change of Season
    Originally Posted by kicken
    Jacques1's point is that removing the suburb name (and dash) should not cause the URL to not work.
    Why not
    PHP Code:
    if (!is_numeric($string))
        {
            // strip every character that is not a digit
            $string = preg_replace("/[^0-9]/", "", $string);
        }
    $sql = "SELECT title FROM table_x WHERE id = :id";
    $args = array('id' => $string);
  22. #12
  23. --
    This is much better, but it will fall apart if the URL happens to contain a digit outside of the ID:

    Code:
    http://test.site.com/planet-earth/city17-29098
    Your code would mangle that into

    Code:
    http://test.site.com/planet-earth/1729098
    Which is clearly not what the user wanted.
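    The mangling is easy to reproduce on the last path segment alone. A small sketch using the same digit-stripping replacement (the variable names are just for illustration):

```php
<?php
// Last path segment of the example URL.
$segment = 'city17-29098';

// Removing every non-digit glues the "17" from the name onto the real ID.
$digits_only = preg_replace('/[^0-9]/', '', $segment);

var_dump($digits_only); // string(7) "1729098"
```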

    Why do you dislike the regex above so much? I couldn't think of a simpler and more robust way of extracting the ID. You can make it a bit stricter by forcing the digits to be either the only content or come after a dash:

    PHP Code:
    '#(?<=^|-)\d+$#' 
  24. #13
  25. A Change of Season
    Originally Posted by Jacques1
    Why do you dislike the regex above so much?
    PHP Code:
    '#(?<=^|-)\d+$#' 
    Because I don't understand it. I get the concept and the purpose, but I don't understand it character by character.

    To be honest, learning regex well takes a lot of time, and I am in the middle of 5 other things. I thought I'd use code that I understand and can explain.
  26. #14
  27. --
    Originally Posted by English Breakfast Tea
    Because I don't understand it.
    Well, then why don't you ask instead of dancing around the issue for 10 posts? This is not rocket science.

    The most straightforward approach would be to simply fetch the sequence of digits at the end of the string:

    PHP Code:
    '\\d+$' 
    "At least one digit followed by the end of the string."

    However, this will give unwanted results for invalid URLs like

    Code:
    http://test.site.com/planet-earth/city17
    This would extract the "17" at the end, which doesn't make sense.

    So we want either a raw ID (only digits) or an ID after a dash:

    PHP Code:
    '(?<=^|-)\\d+$' 
    "The beginning of the string or a dash followed by at least one digit followed by the end of the string."
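    To see the difference between the two patterns, here is a small comparison sketch. It assumes '#' as the delimiter and a made-up helper named match_or_null, and it works on the last path segment only:

```php
<?php
// Hypothetical helper: returns the matched text, or null when the pattern does not match.
function match_or_null(string $pattern, string $segment): ?string
{
    return preg_match($pattern, $segment, $m) === 1 ? $m[0] : null;
}

$loose  = '#\d+$#';          // any digits at the end of the string
$strict = '#(?<=^|-)\d+$#';  // digits at the end, preceded by the start of the string or a dash

foreach (['city17', '29098', 'Queens-Park-29098'] as $segment) {
    var_dump([$segment, match_or_null($loose, $segment), match_or_null($strict, $segment)]);
}
```

    For 'city17' the loose pattern picks up '17' while the strict one finds nothing; for the other two inputs both return '29098'.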

    For additional robustness, you may want to allow an optional trailing slash:

    PHP Code:
    '#(?<=^|-)\\d+(?=/?$)#' 
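    For a character-by-character reading, the same pattern can be written in extended mode (the x modifier), which ignores whitespace and allows comments inside the pattern. This sketch switches the delimiter to '~' because '#' starts a comment in extended mode; the function name match_id is made up:

```php
<?php
// Same pattern as above, spread out with one comment per piece.
$id_pattern = '~
    (?<= ^ | - )   # lookbehind: right before the digits is the start of the string or a dash
    \d+            # the ID itself: one or more digits
    (?= /? $ )     # lookahead: after the digits comes an optional slash, then the end of the string
~x';

// Hypothetical helper: returns the ID, or null if the segment does not end in a valid ID.
function match_id(string $pattern, string $segment): ?string
{
    return preg_match($pattern, $segment, $m) === 1 ? $m[0] : null;
}

var_dump(match_id($id_pattern, 'Queens-Park-29098/'));
```

    Lookbehind and lookahead only check their surroundings without consuming them, which is why the match contains the digits but neither the dash nor the trailing slash.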
    Testing it:

    PHP Code:
    <?php

    $id_pattern = '#(?<=^|-)\\d+(?=/?$)#';

    // the ID is the whole match, since the lookarounds consume nothing
    preg_match($id_pattern, 'Queens-Park-29098', $id_matches);
    $id = $id_matches[0];

    var_dump($id);
    Gives you '29098' as expected.
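    One caveat: the test runs on the last path segment, not on the full URL. In a complete URL the raw ID is preceded by a slash rather than the start of the string, so the strict pattern would not match it there. A sketch of one way to get from a full URL to the segment, assuming a hypothetical function name listing_id_from_url:

```php
<?php
// Hypothetical helper: takes a full URL and returns the listing ID, or null.
function listing_id_from_url(string $url): ?string
{
    // Pull out the path component, e.g. "/newyork-rentals/Queens-Park-29098".
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // basename() yields the last path segment and ignores a trailing slash.
    $segment = basename($path);
    // Strict pattern: digits at the end, preceded by the start of the segment or a dash.
    return preg_match('#(?<=^|-)\d+$#', $segment, $m) === 1 ? $m[0] : null;
}

var_dump(listing_id_from_url('http://test.site.com/newyork-rentals/29098'));
```

    If the rewrite rules already hand the script only the last segment, this extra step is not needed.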

    When you want pretty URLs, there's really no way around regexes. Didn't you have to set up the .htaccess already?
    Last edited by Jacques1; September 4th, 2013 at 05:44 AM.
