Hello, clever people who can do Regex!
I like to think of myself as an intelligent person, and a fairly good PHP scripter, but Regex makes my brain hurt! Can someone please help me write this rule before I start to reassess my career choices and retrain as a manual laborer...
I know there are already many Regex rules for isolating the various parts of a URL, for example:
Code:
/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i
- or -
/^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:([^\/]*))?((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?$/
What I need is something slightly different: I want to extract anything that fits the criteria for a domain name from a text string. So, as far as I can break it down, I need to identify:
Quote:
(not a hyphen or alpha-numeric character)
- followed by -
(2 or more alpha-numeric or hyphen characters)
- followed by -
(a dot ".")
- followed by -
(2-7 alpha characters)
- optionally followed by -
(a dot "." and 2-3 more alpha characters)
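For reference, the criteria above could be written out roughly like this. This is only a sketch: the variable name `$domainPattern` and the short test string are made up for illustration, and using lookarounds for the "not a hyphen or alpha-numeric character" boundaries is an assumption, chosen so that the boundary character is not captured and the start or end of the string also counts.

```php
<?php
// Hypothetical pattern; a literal, line-by-line reading of the criteria above.
// The x flag allows whitespace and comments inside the pattern, i ignores case.
$domainPattern = '/
    (?<![a-z0-9-])        # not preceded by a hyphen or alphanumeric character
    [a-z0-9-]{2,}         # 2 or more alphanumeric or hyphen characters
    \.                    # a dot
    [a-z]{2,7}            # 2-7 alpha characters
    (?:\.[a-z]{2,3})?     # optionally, a dot and 2-3 more alpha characters
    (?![a-z0-9-])         # not followed by a hyphen or alphanumeric character
/ix';

// Made-up test string, not taken from the example further down.
$text = 'mail admin@example.org or visit test-site.co.uk today';

preg_match_all($domainPattern, $text, $matches);
print_r($matches[0]); // example.org and test-site.co.uk
```

Note that this only checks the shape of the text. It knows nothing about which endings are real top-level domains, so it is a candidate finder rather than a validator.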
It should be able to identify and extract any valid domains, which could be located inside text, quotes, tags, URLs, anything. So the rule must require that the preceding and following characters be either the start or end of the line, or any character that is not alpha-numeric or a hyphen. For example:
Quote:
Contact me at: <a href="mailto:contact@me.co.uk?subject=xxx">blah</a> my favourite musium is "cymru.museum", ebay in australia is at:ebay.com.au! Some people use hacks like del.icio.us... and some domains are very ugly, like [y687-hy6-7yg54676-9076j--f798k-767658765.info]. Is this even possible!?
It should find:
me.co.uk
cymru.museum
ebay.com.au
icio.us
y687-hy6-7yg54676-9076j--f798k-767658765.info
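To make the target concrete, here is a rough check of a literal reading of the criteria against the example text. It is a sketch under assumptions: the function name `find_domain_candidates` is hypothetical, and the boundaries are expressed as lookarounds. Read this way, four of the five wanted results come back, but the optional third part swallows ".us", so "del.icio.us" is returned instead of "icio.us". Telling those apart seems to need more than the character rules alone (a list of known endings, for instance).

```php
<?php
// Hypothetical helper; the name and the lookaround boundaries are assumptions.
function find_domain_candidates(string $text): array
{
    // label (2+ alnum or hyphen) . 2-7 letters, optionally . 2-3 letters,
    // not preceded or followed by an alphanumeric character or hyphen
    $pattern = '/(?<![a-z0-9-])[a-z0-9-]{2,}\.[a-z]{2,7}(?:\.[a-z]{2,3})?(?![a-z0-9-])/i';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$sample = <<<'TXT'
Contact me at: <a href="mailto:contact@me.co.uk?subject=xxx">blah</a> my favourite musium is "cymru.museum", ebay in australia is at:ebay.com.au! Some people use hacks like del.icio.us... and some domains are very ugly, like [y687-hy6-7yg54676-9076j--f798k-767658765.info]. Is this even possible!?
TXT;

$expected = [
    'me.co.uk',
    'cymru.museum',
    'ebay.com.au',
    'icio.us',
    'y687-hy6-7yg54676-9076j--f798k-767658765.info',
];

$found = find_domain_candidates($sample);
print_r($found);

// Wanted results that the pattern did not return as-is.
$missing = array_values(array_diff($expected, $found));
print_r($missing); // only icio.us, because del.icio.us was matched whole
```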
Can anybody help me please!?
Many thanks in advance,
Neil