#1
Search engine friendly URLs
Hi,
I am trying to come up with a way to make my URLs search engine friendly, yet also very flexible in terms of URL variables. When I say search engine I mean Google; that's all I'm really interested in. The first variable is always 'page=...', so I have come up with the following scheme.

Original URL: http://www.server.com/index.php?page=12&var1=hello&var2=goodbye&var3=43
Friendly URL: http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/

Question (1) - Is the '?' after index.php going to cause a problem for Google?
Question (2) - Is the fact that there is no '.html' or '.php' at the end going to cause a problem for Google?

From my searching on Google it seems that neither of these is a problem, but I'm not sure.

Now, the mod_rewrite part. The rules are:
1) IF the URL contains the string 'page/'...
2) Find the LAST occurrence of 'anything' 'slash' 'anything' 'slash' 'anything-or-nothing'
3) Replace this with 'anything' '=' 'anything' '&', keeping the trailing 'anything-or-nothing'

This would recursively break down the URL like so:
http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/
$1 = http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3
$2 = 43
$3 = [nothing]
result: http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3=43&

Following on from this:
http://www.server.com/index.php?page/12/var1/hello/var2=goodbye&var3=43&
http://www.server.com/index.php?page/12/var1=hello&var2=goodbye&var3=43&
http://www.server.com/index.php?page=12&var1=hello&var2=goodbye&var3=43&

At this point the condition (IF the URL contains the string 'page/') no longer holds and we have the new URL.

Question (3) - How (using mod_rewrite in a .htaccess file) do I write the rule for "Find the LAST occurrence of 'anything' 'slash' 'anything' 'slash' 'anything-or-nothing'"?

Any help/feedback greatly appreciated,
Cheers,
Pea
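The loop described in this post can be traced outside Apache. What follows is only a rough sketch in Python, not a mod_rewrite rule: the function name friendly_to_query is made up for illustration, and the pattern is the one posted in reply #4 of this thread. It assumes the 'page/' check is the stop condition, exactly as in rule 1.

```python
import re

# Pattern from reply #4: anything, slash, non-slashes, slash, optional rest.
PAIR = re.compile(r'(.+)/([^/]+)/(.+)?$')


def friendly_to_query(url):
    """Repeatedly turn the LAST 'name/value/' pair into 'name=value&'.

    Hypothetical helper for illustration; it mirrors rules 1-3 of the post.
    """
    while 'page/' in url:              # rule 1: only while 'page/' is present
        m = PAIR.match(url)            # rule 2: greedy (.+) finds the last pair
        if m is None:
            break                      # nothing left to convert
        head, value = m.group(1), m.group(2)
        tail = m.group(3) or ''        # group 3 is None when nothing follows
        url = f"{head}={value}&{tail}" # rule 3: name=value& plus the old tail
    return url


if __name__ == "__main__":
    print(friendly_to_query(
        "http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/"
    ))
```

The 'page/' guard matters: without it, one more pass would match the slashes in 'http://www.server.com/' and mangle the host name.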
#2
1. I have read across the forums that a single ?foo=bar is okay, but multiple parameters (?foo=bar&baz=foo) are not. Ideally you would want to remove it completely.
2. No.
3. This is a little more complicated, because your logic no longer works after the first replacement: the URL now ends with an &. I don't think this is the best approach for a variable number of /foo/bar/baz/ variables. Check out the ForceType method instead, which will allow you to get all the variables as a string and parse them out.
#3
Thanks for the reply.
However, my logic is still OK after the first iteration because, if you read it again, it doesn't rely on there being a slash at the end:
Find the LAST occurrence of 'anything' 'slash' 'anything' 'slash' 'anything-or-nothing'
I will search for the ForceType method. Thanks for the tip.
#4
The regular expression that matches your literal description is the following. (Strictly, it matches: anything, slash, anything except a slash, slash, optional anything, anchored at the end, i.e. the last occurrence.)
Code:
(.+)/([^/]+)/(.+)?$
__________________
# Jeremy Explain your problem instead of asking how to do what you decided was the solution.
#5
Quote:
Thanks for that. I followed your advice on ForceType, and it works remarkably well. A simple .htaccess entry, some simple PHP, and my content engine is working wonders.
Re: the RewriteRule above. I don't know if that would work, because the 'optional anything at the end' could potentially contain another slash, so wouldn't it need to be: anything, slash, anything that's not a slash, slash, zero or more characters that are not a slash?
#6
Only if it was two slashes, b/c one would still match. It's then followed by anything, but that anything after the last slash is optional.
#7
For anybody looking for a simple solution to ugly URLs who comes across this post in the future:
You can use this solution for ANY number and type of URL variables.

EXAMPLE:
http://www.server.com/index/somepage.php/var1/value1/var2/value2/var3

REWRITTEN TO:
http://www.server.com/somepage.php?var1=value1&var2=value2&var3=dummy

.HTACCESS
Add this to your .htaccess file:
Code:
<Files index>
ForceType application/x-httpd-php
</Files>
What this does is tell Apache to treat the file 'index' (with no extension) as a PHP page, and force it to process that file. Apache (and PHP) then treat the URL as: http://www.server.com/index

INDEX
Create a file called 'index' (with no extension) and place it in the webroot. Edit this file, and enter:
PHP Code:

NOTES
Now just make sure that you rework any URLs on your site to be in this format, and all will work fine.

Last edited by jharnois : May 5th, 2005 at 08:42 PM. Reason: Changed CODE tags to PHP tags.
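The PHP for the 'index' file is not shown in this post. As a rough stand-in, here is a Python sketch of the path-to-query mapping that the EXAMPLE and REWRITTEN TO lines describe. It is not the poster's script: the function name path_to_query and the '/index/' prefix argument are made up for illustration, and padding an unpaired trailing name with 'dummy' is an assumption read off the var3=dummy in the example.

```python
def path_to_query(request_path, prefix="/index/"):
    """Map '/index/page.php/name/value/...' to '/page.php?name=value&...'.

    Hypothetical helper; mirrors the EXAMPLE -> REWRITTEN TO lines only.
    """
    if not request_path.startswith(prefix):
        raise ValueError("path does not start with " + prefix)

    # Drop the prefix, then split the remainder, ignoring empty segments
    # (so a trailing slash is harmless).
    parts = [p for p in request_path[len(prefix):].split("/") if p]
    if not parts:
        raise ValueError("no target page in path")

    page, rest = parts[0], parts[1:]

    # Assumption: a name without a value gets the placeholder 'dummy',
    # as var3 does in the example.
    if len(rest) % 2 == 1:
        rest.append("dummy")

    pairs = zip(rest[0::2], rest[1::2])
    query = "&".join(f"{name}={value}" for name, value in pairs)
    return f"/{page}?{query}" if query else f"/{page}"


if __name__ == "__main__":
    print(path_to_query("/index/somepage.php/var1/value1/var2/value2/var3"))
```

A real handler would also need to URL-decode the segments and check the target page against a whitelist before including or redirecting to it; neither is attempted here.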
#8
jharnois:
Yes, but what stops this rule:
(.+)/([^/]+)/(.+)?$
and this URL:
http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/
from producing the following match:
$1 = http:/
$2 = www.server.com
$3 = index.php?page/12/var1/hello/var2/goodbye/var3/43/
??
$1 = anything, slash
$2 = anything that's not a slash, slash
$3 = anything, end
when it should be finding the last match, not the first?
#9
<discussion about="php" status="offtopic">
Quote:
PHP Code:
PHP Code:
Quote:
$1 = http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3
$2 = 43
$3 =
Regular expressions are greedy by default.
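The greedy behaviour is easy to check outside Apache. The sketch below uses Python's re module rather than mod_rewrite, on the assumption that greedy quantifiers backtrack the same way in both. The helper name split_pair is made up for illustration. The second pattern, with a lazy (.+?), is not from the thread; it is included only to show where the match from post #8 would come from.

```python
import re

URL = "http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/"

# Pattern from post #4: the first group is greedy.
GREEDY = re.compile(r'(.+)/([^/]+)/(.+)?$')

# Same pattern with a lazy first group, for comparison only.
LAZY = re.compile(r'(.+?)/([^/]+)/(.+)?$')


def split_pair(pattern, text):
    """Return ($1, $2, $3) for the match, with '' for an unset $3."""
    m = pattern.match(text)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3) or ""


if __name__ == "__main__":
    # Greedy: (.+) grabs as much as it can, so the LAST pair is split off.
    print(split_pair(GREEDY, URL))
    # Lazy: (.+?) grabs as little as it can, so the FIRST pair is split off.
    print(split_pair(LAZY, URL))
```

With the greedy pattern, $1 runs up to var3 and $2 is 43. Only the lazy variant yields $1 = http:/ and $2 = www.server.com.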
#10
OK, I believe you.
And thanks for the tip on trim, I didn't know about the second argument.

On a related note: you can see that the URL contained in the page is formatted nicely for Google. However, does Google use the link it finds in the page, or does it try to resolve it and then use the resulting link (i.e. after the redirect)?
For example:
<a href="http://www.server.com/index/mypage.php/name/bob/"> My Page </a>
Would Google index this as:
http://www.server.com/index/mypage.php/name/bob/
Or would it follow the link, take the result (after the redirect) and attempt to index that:
http://www.server.com/mypage.php?name=bob
?
#11
Google is no different from any other browser in how its page requests are handled by your server. Your server doesn't care who's getting the information (at least not in this context). Google just looks at it differently.

So, when you use mod_rewrite to internally rewrite one URL to another, the server serves up the content of the rewritten URL in response to the request for the original URL. In other words, internal rewrites via mod_rewrite will get indexed as requested (e.g. .../name/bob/).

There is also an external redirect option with mod_rewrite (the [R] flag). If you were to use this, the indexed URL would be the rewritten target (e.g. ?name=bob).

If you were to use ForceType with PHP and include()s (instead of header()), Google would index the requested URL (e.g. .../name/bob/). Using header() with Location, however, I'm not too sure about. I think the originally requested URL will be indexed (e.g. .../name/bob/).
#12
Thanks jharnois, it's your last line I am interested in.
I think that's how it happens too, but I want to be sure.
You see, I have two options (I have tried both):
1) Instead of using a header for the redirect, include the page that is being called. Now, this works all well and good, except that you have to set the baseHref on each page (and on each page it calls after that, such as popups); otherwise you get missing images, missing CSS, missing JS, etc.
2) Use a redirect, which means I don't have to edit my sites at all (I have multiple sites I am going to integrate this with); I just edit the .htaccess and the PHP redirect script. BUT does Google like this? Otherwise it is no use at all.