#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Dec 2012
    Location
    Ithaca
    Posts
    68
    Rep Power
    2

    A URL validator...


    I am trying to figure out a way to check for a valid URL. I found the article linked below, which gives the following regular expression:
    http://phpcentral.com/208-url-validation-in-php.html
    PHP Code:
    // Delimiters ("~") added: preg_match() requires them, and "/" and "#" already occur inside the pattern.
    $urlregex = "~^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$~";
    if (preg_match($urlregex, $url)) {
        echo "URL is valid";
    } else {
        echo "URL is invalid";
    }

    The problem is that it only accepts URLs that begin with http://, https:// or ftp://. There are two other cases in which a URL may be valid.

    The first is a URL that begins with www. but has no http://, which is common. The second is a relative path that refers to the current domain's directory. A URL like "inc/class_url.php" should be valid too, although the validator will not recognize it.

    So I was wondering, is there a way to write a PHP regular expression that not only validates urls, but also allows for variations without http:// and for relative paths? Please help.

    Edit: by the way, the validator also rejects this URL, which is weird:
    http://upload.wikimedia.org/wikipedia/commons/c/ca/Button-Lightblue.svg
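One possible explanation, not confirmed anywhere in the thread: the character classes in the pattern contain only lowercase `a-z`, so without the `i` modifier the uppercase "B" and "L" in `Button-Lightblue.svg` cannot match. The sketch below assumes the pattern is wrapped in `~` delimiters (the post shows none) and runs it once without and once with the modifier.

```php
<?php
// Pattern body copied from the post; the "~" delimiters are an assumption.
$body = "^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$";

$url = "http://upload.wikimedia.org/wikipedia/commons/c/ca/Button-Lightblue.svg";

// Case-sensitive: "B" and "L" fall outside [a-z], so the path segment is rejected.
$caseSensitive = preg_match("~" . $body . "~", $url);

// Case-insensitive: the "i" modifier lets [a-z] match uppercase letters as well.
$caseInsensitive = preg_match("~" . $body . "~i", $url);

var_dump($caseSensitive, $caseInsensitive);
```

If this is the cause, appending `i` after the closing delimiter would be enough to accept the URL.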
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Loyal (3000 - 3499 posts)

    Join Date
    Jul 2003
    Posts
    3,230
    Rep Power
    593
    There probably is, but you might get a better response by moving this to the regexp forum (click on the red triangle in the upper right).
    There are 10 kinds of people in the world. Those that understand binary and those that don't.
  4. #3
    Umm, I guess only a mod can move a thread?

    Anyway, I am using this now so that http:// is no longer required:
    PHP Code:
    $regex = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
    It still does not allow relative paths though, and I was wondering how to accomplish that. Perhaps I have to add an if..else statement to treat relative paths as valid?
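A minimal sketch of the if..else idea, under stated assumptions: the posted pattern is anchored with `\A` and `\z` so that the whole string must match (as posted it only searches for a URL somewhere in the string), and `RELATIVE_PATH` and `classify_url()` are hypothetical names with a deliberately simple rule for what counts as a relative path.

```php
<?php
// Pattern from the post, anchored with \A and \z (an assumption) so the
// whole string must be a URL rather than merely contain one.
const ABSOLUTE_URL = '/\A(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]\z/i';

// Hypothetical rule for a relative path: slash-separated segments made of
// letters, digits, "_", "-" and ".", with no scheme and no leading slash.
const RELATIVE_PATH = '/\A[a-z0-9_.-]+(?:\/[a-z0-9_.-]+)*\z/i';

function classify_url(string $url): string
{
    if (preg_match(ABSOLUTE_URL, $url) === 1) {
        return 'absolute';
    } elseif (preg_match(RELATIVE_PATH, $url) === 1) {
        return 'relative';
    }
    return 'invalid';
}

foreach (['http://example.com/a', 'www.example.com', 'inc/class_url.php', 'ht==tp://x'] as $candidate) {
    echo $candidate, ' => ', classify_url($candidate), PHP_EOL;
}
```

The absolute check runs first because a string such as `www.example.com` also satisfies the relative-path rule. A bare `example.com` would be classified as relative, which may or may not be what is wanted.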
  6. #4
  7. Come play with me!
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    13,742
    Rep Power
    9397
  8. #5
    Originally Posted by requinix
    Moved.

    See also Regex Resources and this bit of overkill.
    Oh thanks, that one looks complex. XD I was wondering though, does it identify certain combinations of words/phrases as invalid? I mean, if someone enters something like 'ht==tp://', will it be marked invalid?

    I also figured out a neat way to validate relative paths. Relative paths are usually used to include library and image/template files, so I simply use file_exists() to check them. The supplied URL is checked against the absolute-URL pattern first, so the filesystem is only touched when that check fails.
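The approach described above could look roughly like this. It is a sketch, not the poster's actual code: `is_valid_url_or_path()` and its `$baseDir` parameter are hypothetical, and the pattern is the one from earlier in the thread with `\A` and `\z` anchors added as an assumption.

```php
<?php
// Pattern from earlier in the thread, anchored (an assumption) for validation.
const URL_PATTERN = '/\A(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]\z/i';

/**
 * Hypothetical helper: $baseDir is the directory that relative paths
 * are resolved against.
 */
function is_valid_url_or_path(string $url, string $baseDir): bool
{
    // Cheap check first: no filesystem access when the regex already matches.
    if (preg_match(URL_PATTERN, $url) === 1) {
        return true;
    }
    // Fall back to the filesystem for relative paths.
    return file_exists($baseDir . DIRECTORY_SEPARATOR . $url);
}
```

Two caveats: file_exists() also returns true for directories, and a user-supplied path containing `../` can point outside `$baseDir`, so input from untrusted sources would need an extra check.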
  10. #6
    Originally Posted by Hall of Famer
    Oh thanks, that one looks complex. XD I was wondering though, does it identify certain combinations of words/phrases as invalid? I mean, if someone enters something like 'ht==tp://', will it be marked invalid?
    Yep. Doesn't support IRIs but otherwise it is 100% accurate because it validates according to the RFCs (check the source).
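To illustrate why an RFC-based check rejects 'ht==tp://': RFC 3986 (section 3.1) defines a scheme as a letter followed by letters, digits, "+", "-" or ".", and "=" is not in that set. The sketch below checks only the scheme and is not the linked regex; `has_valid_scheme()` is a hypothetical helper.

```php
<?php
// RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
const SCHEME_PATTERN = '/\A[a-z][a-z0-9+.-]*\z/i';

function has_valid_scheme(string $url): bool
{
    // Everything before the first ":" is the candidate scheme.
    $pos = strpos($url, ':');
    if ($pos === false) {
        return false; // no scheme at all
    }
    return preg_match(SCHEME_PATTERN, substr($url, 0, $pos)) === 1;
}

var_dump(has_valid_scheme('http://example.com'));   // scheme "http"
var_dump(has_valid_scheme('ht==tp://example.com')); // scheme "ht==tp"
```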
  12. #7
    Originally Posted by requinix
    Yep. Doesn't support IRIs but otherwise it is 100% accurate because it validates according to the RFCs (check the source).
    Oh great, thanks. Guess I will be using it in my url class then. XD
  14. #8
  15. JavaScript is not spelt java
    Devshed Novice (500 - 999 posts)

    Join Date
    Feb 2011
    Location
    Landan, England
    Posts
    743
    Rep Power
    165
    Originally Posted by requinix
    Nice regex; bet you had some weird dreams after creating that (although I hope that you didn't create it from scratch).
  16. #9
    Indeed, that regex looks powerful, but it must have been a painful process to design it all by yourself.
  18. #10
    It was very relaxing, actually. All the rules are in the RFCs so it was just a matter of piecing them all together. I mean I didn't even bother to optimize it.
