Search Engine Optimization
 
#1
May 5th, 2005, 05:41 PM
Peter.vullings
Contributing User
Join Date: Aug 2001
Location: Palmerston North, New Zealand
Posts: 49
Search engine friendly URLs

Hi,

I am trying to come up with a way to make my URLs search engine friendly, yet also very flexible in terms of URL variables. By search engine I mean Google; that's all I'm really interested in. The first variable is always 'page=...', so I have come up with the following scheme.

Original URL:
http://www.server.com/index.php?page=12&var1=hello&var2=goodbye&var3=43

Friendly URL:
http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/

Question (1) - Is the '?' after index.php going to cause a problem for google?
Question (2) - Is the fact that there is no '.html' or '.php' at the end going to cause a problem for Google?

From searching Google, it seems that neither of these is a problem, but I'm not sure.

Now for the mod_rewrite part. The rules are:

1) IF the URL contains the string 'page/' ...
2) Find the LAST occurrence of 'anything' 'slash' 'anything' 'slash' 'anything-or-nothing'
3) Replace this with 'anything' '=' 'anything' '&'

This would recursively break down the URL like so:

http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/

$1 = http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3
$2 = 43
$3 = [nothing]

result:
http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3=43&

Following on from this:
http://www.server.com/index.php?page/12/var1/hello/var2=goodbye&var3=43&
http://www.server.com/index.php?page/12/var1=hello&var2=goodbye&var3=43&
http://www.server.com/index.php?page=12&var1=hello&var2=goodbye&var3=43&

At this point the condition 'IF the URL contains the string page/' no longer holds, and we have the new URL.
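The repeated replacement described above can be checked outside Apache with a short PHP sketch. It assumes the pattern `(.+)/([^/]+)/(.+)?$` for "the last occurrence" (the greedy `(.+)` pushes the match to the end of the string), and `friendly_to_query()` is a hypothetical name. Note that in an .htaccess file a RewriteRule only sees the URL path, not the part after `?`, so this exact string could not be handed to a RewriteRule as-is; the sketch only tests the loop logic.

```php
<?php
// Hypothetical helper: apply the "last occurrence" replacement over and over
// until 'page/' no longer appears (plain-string simulation, not mod_rewrite).
function friendly_to_query(string $url): string
{
    // Greedy (.+) pushes the match to the LAST "name/value/rest" at the end.
    $pattern = '#(.+)/([^/]+)/(.+)?$#';
    while (strpos($url, 'page/') !== false) {
        // An unmatched group 3 is replaced by an empty string.
        $next = preg_replace($pattern, '${1}=${2}&${3}', $url);
        if ($next === null || $next === $url) {
            break; // nothing left to replace: stop rather than loop forever
        }
        $url = $next;
    }
    return $url;
}

echo friendly_to_query('http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/'), "\n";
```

Run on the example URL, this goes through the same intermediate steps as the walkthrough and ends with the same result, trailing & included.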

Question (3) - Using mod_rewrite in an .htaccess file, how do I write the rule to find the LAST occurrence of 'anything' 'slash' 'anything' 'slash' 'anything-or-nothing'?

Any help/feedback greatly appreciated,
Cheers,
Pea

#2
May 5th, 2005, 06:02 PM
jharnois
mod_dev_shed
Join Date: Sep 2002
Location: Atlanta, GA
Posts: 14,299
1. I have read across the forums that one ?foo=bar is okay, but multiple parameters (?foo=bar&baz=foo) are not. Ideally you would remove the query string completely.
2. No.
3. This is a little more complicated, because your logic no longer works after the first replacement: the URL now ends with a &.

I don't think this is the best approach for a variable number of /foo/bar/baz/vars. Check out the ForceType method instead, which will allow you to get all the variables as a string and parse them out.

#3
May 5th, 2005, 06:41 PM
Peter.vullings
Thanks for the reply.

However, my logic is still OK after the first iteration because, if you read it again, it doesn't rely on there being a slash at the end:

Find the LAST occurrence of 'anything' 'slash' 'anything' 'slash' 'anything-or-nothing'

I will have a search on the ForceType method. Thanks for the tip.

#4
May 5th, 2005, 07:05 PM
jharnois
The regular expression to match your literal description is the following (strictly, it's: anything, slash, anything except a slash, slash, optional anything, at the end, i.e. the last occurrence):
Code:
(.+)/([^/]+)/(.+)?$
__________________
# Jeremy

Explain your problem instead of asking how to do what you decided was the solution.

#5
May 5th, 2005, 07:54 PM
Peter.vullings
Quote:
Originally Posted by jharnois
The regular expression to match your literal description is the following (strictly, it's: anything, slash, anything except a slash, slash, optional anything, at the end, i.e. the last occurrence):
Code:
(.+)/([^/]+)/(.+)?$


Thanks for that. I followed your advice on ForceType, and it works remarkably well. A simple .htaccess entry, some simple PHP, and my content engine is working wonders.

Re: the regular expression above.
I don't know if that would work, because the 'optional anything at the end' could potentially contain another slash, so wouldn't it need to be:

anything, slash, anything that's not a slash, slash, zero or more characters that aren't a slash?

#6
May 5th, 2005, 08:05 PM
jharnois
Only if there were two slashes in a row, because a single slash would still be matched. The last slash is followed by anything, but that anything is optional.
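A quick PHP check of that claim, using the same pattern with preg_match(). PHP's preg_* functions use PCRE, which Apache 2.x also uses for mod_rewrite, though details may differ between versions. `third_group()` is a hypothetical helper that returns whatever `(.+)?` captured.

```php
<?php
// Hypothetical helper: return what (.+)? captured, or null if nothing matched.
function third_group(string $subject): ?string
{
    if (preg_match('#(.+)/([^/]+)/(.+)?$#', $subject, $m) !== 1) {
        return null;
    }
    // PHP leaves out a trailing group that took no part in the match.
    return $m[3] ?? '';
}

foreach (['page/12/var1/hello/', 'page/12/var1/hello/x', 'page/12/var1/hello//'] as $s) {
    var_dump(third_group($s));
}
```

With single slashes the third group is either empty or slash-free ("x"); only the doubled slash at the end puts a "/" into it.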

#7
May 5th, 2005, 08:07 PM
Peter.vullings
For anybody looking for a simple solution to ugly URLs who comes across this post in the future:

You can use this solution for ANY amount and type of URL variables.

EXAMPLE:
http://www.server.com/index/somepage.php/var1/value1/var2/value2/var3

REDIRECTED TO:
http://www.server.com/somepage.php?var1=value1&var2=value2&var3=dummy

.HTACCESS
Add this to your .htaccess file:
Code:
<Files index> 
ForceType application/x-httpd-php 
</Files>

This tells Apache to treat the file 'index' (which has no extension) as a PHP script and run it. A request such as the example URL above is therefore handled by:
http://www.server.com/index
with the rest of the path (/somepage.php/var1/...) passed along to that script.

INDEX
Create a file called 'index' (with no extension) and place it in the webroot. Edit this file, and enter:
PHP Code:
<?php
// Path part of the request (query string dropped), e.g.
// "/index/somepage.php/var1/value1/var2/value2/var3"
$rewrite = parse_url($_SERVER["REQUEST_URI"], PHP_URL_PATH);
// Remove leading and trailing slashes
if (substr($rewrite,0,1)=="/"){ $rewrite = substr($rewrite,1); }
if (substr($rewrite,strlen($rewrite)-1,1)=="/"){ $rewrite = substr($rewrite,0,strlen($rewrite)-1); }
// Split into an array
$rewrite = explode("/",$rewrite);
// Drop the first value, which is the name of this script ('index') itself
array_shift($rewrite);
// If there are an uneven number of keys and values, add a 'dummy' value to the end
if ((count($rewrite)%2)==0){ $rewrite[]="dummy"; }
// The page to call is the first array value
$location = $rewrite[0]."?";
// The remaining array values are key-value pairs
$count = floor(count($rewrite)/2);
for ($i=0; $i<$count; $i++){
    $location .= $rewrite[$i*2+1]."=".$rewrite[$i*2+2]."&";
}
// Strip the last character (which is either a & or a ?)
$location = substr($location,0,strlen($location)-1);
// Redirect the page, then stop running this script
header("Location: /$location");
exit;
?>


NOTES
Now just make sure you rework every URL on your site into this format, and everything will work.
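Reworking the links by hand is error-prone, so a small PHP helper for generating them may be useful. This is a sketch: `friendly_url()` is a hypothetical name, and it assumes the /index/page/key/value layout from the example above.

```php
<?php
// Hypothetical helper for generating links in the /index/page/key/value format.
// rawurlencode() keeps characters like '&' or spaces from breaking the path.
function friendly_url(string $page, array $params = []): string
{
    $url = '/index/' . rawurlencode($page);
    foreach ($params as $key => $value) {
        $url .= '/' . rawurlencode((string) $key) . '/' . rawurlencode((string) $value);
    }
    return $url . '/';
}

echo friendly_url('mypage.php', ['name' => 'bob']), "\n";
```

An encoded slash (%2F) is a different matter: by default Apache refuses paths containing one unless AllowEncodedSlashes is turned on, so values containing '/' are best avoided in this scheme.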

Last edited by jharnois : May 5th, 2005 at 08:42 PM. Reason: Changed CODE tags to PHP tags.

#8
May 5th, 2005, 08:13 PM
Peter.vullings
jharnois:

Yes, but what stops this rule:
(.+)/([^/]+)/(.+)?$

And this URL:
http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/

From producing the following match:
$1 = http:/
$2 = www.server.com
$3 = index.php?page/12/var1/hello/var2/goodbye/var3/43/

??

$1 = anything
slash
$2 = anything that's not a slash
slash
$3 = anything
end

When it should be finding the last match, not the first?

#9
May 5th, 2005, 08:54 PM
jharnois
<discussion about="php" status="offtopic">
Quote:
PHP Code:
if (substr($rewrite,strlen($rewrite)-1,1)=="/")
There is no need for strlen(); just use -1 and omit the optional third argument:
PHP Code:
if (substr($rewrite,-1) == '/')
However, we can replace both lines, and all six function calls, with one call to trim():
PHP Code:
$rewrite = trim($rewrite,'/');
</discussion>
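One behavioral difference worth knowing before swapping the two substr() checks for trim(): the substr() version removes at most one slash from each end, while trim() with '/' as its character list removes all of them. A small PHP comparison (the helper name is hypothetical):

```php
<?php
// The two substr() checks from the index script, wrapped in a hypothetical helper:
// each removes at most ONE slash from its end.
function strip_one_slash_each_end(string $s): string
{
    if (substr($s, 0, 1) == '/') { $s = substr($s, 1); }
    if (substr($s, -1) == '/') { $s = substr($s, 0, -1); }
    return $s;
}

// trim() with a character list removes EVERY leading and trailing '/'.
foreach (['/somepage.php/var1/value1/', '//somepage.php/var1//'] as $uri) {
    echo strip_one_slash_each_end($uri), ' | ', trim($uri, '/'), "\n";
}
```

For this URL parser the trim() behavior is arguably the friendlier one, since a doubled slash at either end would otherwise leave an empty segment in the exploded array.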
Quote:
Yes, but what stops this rule:
(.+)/([^/]+)/(.+)?$

...
The dollar sign ($), which means the match is at the end. So it should end up like this (using your example URL):

$1 = http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3
$2 = 43
$3 =

Regular expressions are greedy by default.
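The effect of greediness can be seen directly with PHP's preg_match(). `captures()` is a hypothetical helper, and the lazy variant `(.+?)` is included only for contrast; it is not proposed for the rule.

```php
<?php
// Return [$1, $2, $3] for a match, or null when the pattern does not match.
function captures(string $pattern, string $subject): ?array
{
    if (preg_match($pattern, $subject, $m) !== 1) {
        return null;
    }
    // PHP omits a trailing group that took no part in the match, so default it to ''.
    return [$m[1], $m[2], $m[3] ?? ''];
}

$url = 'http://www.server.com/index.php?page/12/var1/hello/var2/goodbye/var3/43/';

// Greedy (.+) takes as much as it can and backs off from the END of the string.
print_r(captures('#(.+)/([^/]+)/(.+)?$#', $url));

// Lazy (.+?) takes as little as it can, so the match starts at the first usable slash.
print_r(captures('#(.+?)/([^/]+)/(.+)?$#', $url));
```

The greedy version gives the split shown in the reply ($2 = 43, $3 empty); the lazy version gives exactly the "http:/", "www.server.com", ... split that Peter was worried about.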

#10
May 6th, 2005, 05:22 PM
Peter.vullings
OK, I believe you.
And thanks for the tip on trim(); I didn't know about the second argument.

On a related note:
You can see that the URL contained in the page is formatted nicely for Google. However, does Google use the link it finds in the page, or does it try to resolve it and then use the resulting link (i.e. after the redirect)?

For example:
<a href="http://www.server.com/index/mypage.php/name/bob/"> My Page </a>

Would Google index this as:
http://www.server.com/index/mypage.php/name/bob/

Or would it follow the link, take the result (after the redirect) and index that:
http://www.server.com/mypage.php?name=bob

?

#11
May 6th, 2005, 06:06 PM
jharnois
Google is no different from any other browser in terms of how your server handles its page requests. Your server doesn't care who is fetching the information (at least not in this context); Google just looks at the result differently.

So, when you use mod_rewrite to internally rewrite one URL to another, the server serves the content of the rewritten URL in response to the request for the original URL. In other words, internally rewritten URLs get indexed as requested (e.g. .../name/bob/). mod_rewrite also has an external redirect option (e.g. the [R] flag). If you used that, the indexed URL would be the rewritten target (e.g. ?name=bob).

If you used ForceType with PHP and include()s (instead of header()), Google would index the requested URL (e.g. .../name/bob/).

However, I'm not too sure about using header() with Location. I think the originally requested URL will be indexed (e.g. .../name/bob/).

#12
May 6th, 2005, 06:12 PM
Peter.vullings
Thanks jharnois, it's your last line I'm interested in. I think that's how it happens too, but I want to be sure.

You see, I have two options (I have tried both):
1) Instead of using a header() redirect, include the page that is being called. This works well, except that you have to set the base href on each page (and on each page it calls after that, such as popups); otherwise you get missing images, CSS, JS, etc.
2) Use the redirect, which means I don't have to edit my sites at all (I have multiple sites I am going to integrate this with), just the .htaccess and the PHP redirect script. BUT does Google like this? Otherwise it is no use at all.
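For option 1, here is a minimal sketch of an include()-based 'index' script. It assumes the ForceType setup from earlier in the thread and that Apache supplies the extra path in $_SERVER['PATH_INFO'] (it normally does for a PHP handler, but that is worth checking on your server); `parse_friendly_path()` is a hypothetical name.

```php
<?php
// Hypothetical alternative 'index' script: include the page instead of redirecting.

// Split "/somepage.php/var1/value1/var2" into the page and its key/value pairs;
// a key without a value gets 'dummy', as in the redirect version.
function parse_friendly_path(string $path): array
{
    $parts = explode('/', trim($path, '/'));
    $page = array_shift($parts);
    $params = [];
    for ($i = 0; $i < count($parts); $i += 2) {
        $params[$parts[$i]] = $parts[$i + 1] ?? 'dummy';
    }
    return [$page, $params];
}

// Only runs under the web server; assumes Apache fills in PATH_INFO.
if (PHP_SAPI !== 'cli' && isset($_SERVER['PATH_INFO'])) {
    [$page, $params] = parse_friendly_path($_SERVER['PATH_INFO']);
    // Accept only plain .php file names in this directory (no "../").
    if (preg_match('/^[A-Za-z0-9_-]+\.php$/', $page) && is_file(__DIR__ . '/' . $page)) {
        $_GET = $params + $_GET;   // the page reads its variables from $_GET as before
        include __DIR__ . '/' . $page;
    } else {
        http_response_code(404);
    }
}
```

Because the browser still sees /index/mypage.php/name/bob/, relative references to images, CSS and JS resolve against that path, which is why a base href (or root-relative asset paths) is needed with this approach. For option 2, note that PHP sends a 302 (temporary) redirect for a Location header unless a 3xx status has already been set; header("Location: ...", true, 301) sends a permanent one instead.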
