Search Engine Optimization
 
#1 websolutions04 (December 30th, 2005, 01:47 AM)
Utility of Robots.txt

Hi friends,
What is the use of robots.txt in the context of SEO?

Thanks

#2 kaskudoo (December 30th, 2005, 01:44 PM)
I typed this question into Google and got the following piece of text from the results:
Quote:
Originally Posted by Donna from pixel2life.com
Why do I need this robots.txt file anyway?


A good reason to use a robots.txt file is that many search engines, including Google, publicly recommend it. Why does it matter that Google teaches people about robots.txt? Because nowadays search engines are no longer a playground for scientists and geeks, but large corporate enterprises. Google is one of the most secretive search engines out there. Very little is known publicly about how it operates, how it indexes, how it searches, how it creates its rankings, etc. In fact, if you search carefully through specialized forums, or wherever else these issues are discussed, you'll find that nobody really agrees on whether Google puts more emphasis on this or that element when creating its rankings. And when people can't agree on something as precise as a ranking algorithm, it means two things: that Google constantly changes its methods, and that it does not make them very clear or very public. There's only one thing that I believe to be crystal clear: if they recommend that you use a robots.txt ("Make use of the robots.txt file on your web server" - Google Technical Guidelines), then do it. It might not help your ranking, but it will definitely not hurt you.

There are other reasons to use the robots.txt file. If you use your error logs to tweak your site and keep it free of errors, you will notice that most errors refer to someone or something not finding the robots.txt file. All you have to do is create an empty plain-text file (use Notepad on Windows, or the simplest text editor on Linux or a Mac), name it robots.txt and upload it to the root of your server (the directory where your home page is).
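A quick way to check the claim that an empty file blocks nothing is Python's standard-library `urllib.robotparser`, which reads robots.txt rules the way a polite crawler would. This is a minimal sketch: the robot name `ExampleBot` and the paths are made up, and the file contents are passed in as a string rather than fetched from a server.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() expects the file's lines; an empty file gives an empty list.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A blank robots.txt has no records, so nothing is disallowed.
for path in ["/", "/index.html", "/cgi-bin/search.pl"]:
    print(path, allowed("", "ExampleBot", path))  # each line ends in True
```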

On a different note, nowadays, all search engines look for the robots.txt file as soon as their robots arrive on your site. There are unconfirmed rumors that some robots might even 'get annoyed' and leave, if they don't find it. Not sure how true that is, but hey, why not be on the safe side?

Again, even if you don't intend to block anything or just don't want to bother with this stuff at all, having a blank robots.txt is still a good idea, as it can actually act as an invitation into your site.

Don't I want my site indexed? Why stop robots?


Some robots are well designed, professionally operated, cause no harm and provide valuable service to mankind (don't we all like to "google"). Some robots are written by amateurs (remember, a robot is just a program). Poorly written robots can cause network overload, security problems, etc. The bottom line here is that robots are devised and operated by humans and are prone to the human error factor. Consequently, robots are not inherently bad, nor inherently brilliant, and need careful attention. This is another case where the robots.txt file comes in handy - robot control.

Now, I'm sure your main goal in life, as a webmaster or site owner, is to get on the first page of Google. So why in the world would you want to block robots?

Here are some scenarios:

1. Unfinished site

You are still building your site, or portions of it, and don't want unfinished pages to appear in search engines. It is said that some search engines even penalize sites with pages that have been "under construction" for a long time.

2. Security

Always block your cgi-bin directory from robots. In most cases, cgi-bin contains applications, configuration files for those applications (which might actually contain sensitive information), etc. Even if you don't currently use any CGI scripts or programs, block it anyway; better safe than sorry.
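Scenarios 1 and 2 both come down to one `Disallow` line per directory. The sketch below assumes a hypothetical `/drafts/` directory holding the unfinished pages alongside `/cgi-bin/`, and uses Python's `urllib.robotparser` to show how a compliant robot would read the file. Keep in mind that `can_fetch()` only reports what a well-behaved robot *should* do; it does not stop anyone from requesting those URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /drafts/ stands in for an unfinished section
# (scenario 1) and /cgi-bin/ is kept away from polite crawlers (scenario 2).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/index.html", "/cgi-bin/formmail.pl", "/drafts/new-page.html"]:
    # True only for /index.html; the other two fall under a Disallow prefix.
    print(path, parser.can_fetch("ExampleBot", path))
```

`Disallow` values are matched as path prefixes, so everything below `/cgi-bin/` is covered by the single line.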

3. Privacy

You might have some directories on your website where you keep stuff that you don't want the entire Galaxy to see, such as pictures of a friend who forgot to put clothes on, etc.

4. Doorway pages

Apart from illicit attempts to boost rankings by blasting doorway pages all over the internet, doorway pages do have a legitimate use: a set of similar pages, each optimized for a specific search engine. In this case, you must make sure that each robot can reach only the page meant for its own engine. This is extremely important in order to avoid being penalized for spamming a search engine with a series of very similar pages.
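Keeping each robot away from the other engines' pages relies on per-robot records: a crawler follows the record whose `User-agent` line matches its name and falls back to the `*` record otherwise. A minimal sketch, assuming hypothetical `/for-google/` and `/for-msn/` directories and using `Googlebot` and `msnbot` as the crawler names, checked with Python's `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: one doorway directory per engine. Each robot's
# record disallows the directory meant for the *other* engine.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /for-msn/

User-agent: msnbot
Disallow: /for-google/

User-agent: *
Disallow: /for-google/
Disallow: /for-msn/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ["Googlebot", "msnbot", "OtherBot"]:
    for path in ["/for-google/widgets.html", "/for-msn/widgets.html"]:
        print(agent, path, parser.can_fetch(agent, path))
```

`OtherBot` matches neither named record, so the `*` record keeps it out of both directories.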

5. Bad bot, bad bot, what'cha gonna do...

You might want to exclude robots whose known purpose is to collect email addresses, or other robots whose activity does not agree with your beliefs on the world.

6. Your site gets overwhelmed

In rare situations, a robot goes through your site too fast, eating your bandwidth or slowing down your server. This is called "rapid-fire", and you'll notice it if you read your access log file. A medium-performance server should not slow down. You may, however, have problems if you have a low-performance site, such as one running on your personal PC or Mac, if you run poor server software, or if you have heavy scripts or huge documents. In these cases you'll see dropped connections, heavy slowdowns and, in extreme cases, even a complete system crash. If this ever happens to you, read your logs, try to get the robot's IP address or name, check it against a list of active robots, and try to identify and block it.
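Once you have the robot's name from your logs, robots.txt gives two levers: shut it out completely with `Disallow: /`, or ask crawlers to slow down with `Crawl-delay`. Note that `Crawl-delay` is a non-standard extension that some crawlers honour and others (Google's among them) ignore, and neither line has any effect on a robot that doesn't read robots.txt in the first place. The sketch below uses a hypothetical robot name, `RapidFireBot`, and checks the rules with Python's `urllib.robotparser` (`crawl_delay()` needs Python 3.6 or later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: refuse one misbehaving robot outright and ask
# everyone else to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: RapidFireBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("RapidFireBot", "/index.html"))  # False: whole site disallowed
print(parser.can_fetch("ExampleBot", "/index.html"))    # True
print(parser.crawl_delay("ExampleBot"))                 # 10 (seconds)
print(parser.crawl_delay("RapidFireBot"))               # None: its record sets no delay
```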

#3 rehash (January 19th, 2006, 01:40 PM)
The idea is: you don't need to care about robots.txt unless you want to stop spiders from fetching some files from your site.

#4 sonjay (January 19th, 2006, 07:41 PM)
The above-quoted discussion of robots.txt needs some clarification and/or additions:

Quote:
There are unconfirmed rumors that some robots might even 'get annoyed' and leave, if they don't find it.


This is the first time I've ever heard of this "unconfirmed rumor." Certainly I've never heard it in relation to Google, Yahoo or any of the major or semi-major players.

Quote:
Always block your cgi-bin directory from robots. In most cases, cgi-bin contains applications, configuration files for those applications (which might actually contain sensitive information), etc.


Direct access to your cgi-bin shouldn't be permitted regardless. Robots.txt won't keep out anyone who wants to go there.

Quote:
You might have some directories on your website where you keep stuff that you don't want the entire Galaxy to see, such as pictures of a friend who forgot to put clothes on, etc.


Then those directories should be password-protected or otherwise blocked. Listing them in robots.txt merely tells the world where to find them.

Quote:
You might want to exclude robots whose known purpose is to collect email addresses


Again, block 'em with .htaccess or in your httpd.conf. Those bad bots don't obey robots.txt, so it's ineffective for this purpose.
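For a bot that ignores robots.txt, the refusal has to happen on the server, which is what an `.htaccess` or `httpd.conf` rule does. The same idea is sketched below as Python WSGI middleware rather than Apache configuration, so treat it as an illustration of the technique, not a drop-in replacement. The blocked names are examples only, and since the User-Agent header is supplied by the client, a bot can evade this check by lying about its name.

```python
from typing import Callable, Iterable

# Lower-case substrings of User-Agent headers to refuse. These names are
# illustrative; a real list would come from your own access logs.
BLOCKED_AGENTS = ("emailcollector", "emailsiphon")


def block_bad_bots(app: Callable) -> Callable:
    """Wrap a WSGI app so that requests from listed User-Agents get a 403."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(bad in agent for bad in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def site(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_bad_bots(site)


def status_for(agent: str) -> str:
    """Call the wrapped app with a given User-Agent and return the status line."""
    captured = []
    app({"HTTP_USER_AGENT": agent}, lambda status, headers: captured.append(status))
    return captured[0]


print(status_for("EmailSiphon/1.0"))  # 403 Forbidden
print(status_for("Mozilla/5.0"))      # 200 OK
```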

#5 TAK (January 20th, 2006, 11:57 PM)
Quote:
Originally Posted by sonjay
This is the first time I've ever heard of this "unconfirmed rumor." Certainly I've never heard it in relation to Google, Yahoo or any of the major or semi-major players.


I would still recommend at least creating a robots.txt file. I find it quite annoying to see "robots.txt not found" over and over again in log files. Give the bots the file, and they will be happy.
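If you want to see how often this happens before adding the file, a few lines of Python can tally the `/robots.txt` 404s in an access log. This is a sketch that assumes Apache's Common Log Format; the sample lines are invented (using documentation IP addresses), so point it at your own log file to get real numbers.

```python
import re
from collections import Counter

# Captures client address, method, path and status from a Common Log Format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) ')


def robots_404s(lines):
    """Count 404 responses for /robots.txt, keyed by client address."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(3) == "/robots.txt" and m.group(4) == "404":
            counts[m.group(1)] += 1
    return counts


# Made-up sample lines, not real traffic.
SAMPLE = [
    '192.0.2.10 - - [19/Jan/2006:19:41:00 -0500] "GET /robots.txt HTTP/1.1" 404 208',
    '192.0.2.10 - - [19/Jan/2006:19:41:02 -0500] "GET /index.html HTTP/1.1" 200 5120',
    '198.51.100.7 - - [20/Jan/2006:23:57:10 -0500] "GET /robots.txt HTTP/1.0" 404 208',
    '192.0.2.10 - - [21/Jan/2006:04:05:00 -0500] "GET /robots.txt HTTP/1.1" 404 208',
]

print(robots_404s(SAMPLE))
```

With a robots.txt in place, those requests turn into 200 responses and the repeated "not found" entries disappear.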

#6 sonjay (January 21st, 2006, 04:05 AM)
Oh, I agree completely, and I create a robots.txt for all the sites I manage. I was just saying that I've never heard of a robot "getting annoyed" at not finding one, and going away because of their annoyance.

Quote:
Originally Posted by TAK
I would still recommend at least creating a robots.txt file. I find it quite annoying to see "robots.txt not found" over and over again in log files. Give the bots the file, and they will be happy.

#7 andymoo (January 29th, 2006, 06:24 PM)
It's funny that I should see this here tonight, as I was reading http://www.searchengineworld.com/ro...ts_tutorial.htm yesterday, and some of the things mentioned here are covered there too.

Give them a robots.txt. One reason is that your log file gets cluttered if you don't. Another is that a missing file makes the server work out how to present a 404, so you're giving your hardware one more thing to do, and then the search engine gets an error page it can't use, which can take up much more bandwidth than a robots.txt would.

A final reason: friendly spiders take notice of it because it makes their job easier. Spiders are our friends; they bring us traffic, so let's treat them as friends and give them what they want.
Andy Moore

© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 3 hosted by Hostway