Search Engine Optimization
 
#1
April 24th, 2004, 11:44 AM
dustin999
Why doesn't Google spider through all of my links?

Why doesn't Google spider through all of my links? I have a site that sells various products, and each page has a common look and theme with each page being 95% similar to the previous (for the most part). Within this site are nested links between each page, comparable to what you might see on Amazon (i.e. the link that says "customers who purchased this product also liked these other products").

I probably have over 1000 unique pages, with each page linking to roughly 10-20 other unique pages. The site has been active for a couple of months now, and Google has spidered my page. However, it only spidered 40 of the pages and then stopped. It has been back several times but has not completed the spidering process, as I can see from my web log and search results on Google.

Does anyone have any ideas as to why Google might only spider through a portion of my site and not the whole thing? I think I've followed most of the guidelines listed here like having no more than 50 links per page (in my case it's no more than 20).

Thanks,
Dustin

#2
April 25th, 2004, 02:00 AM
dejaone
You need to get some links from other sites and get better PageRank values for your pages.

#3
April 25th, 2004, 09:40 AM
dustin999
Thanks dejaone, so what you're saying is, if you don't have as many links to your pages and the pagerank is low, then Google won't actually spider your entire site but will only selectively spider a few pages?

#4
April 25th, 2004, 10:13 AM
dejaone
dustin,

that's correct. Someone estimated that there are 7 trillion documents on the Web. Google has indexed about 4.3 billion HTML pages. All search engines have to decide which pages to include in their index database. Even if your pages are indexed, it won't make much difference if they aren't ranked well on the search engine result pages.

An excerpt from an article I wrote recently, "How Search Engines Work":
To crawl billions of pages effectively, a crawler needs to make two major challenging decisions:

1) What Page to Crawl - Each search engine uses different criteria to determine what pages to crawl. Google will not include a page if it's not linked by indexed page(s).
2) Frequency of Updating - Google updates pages with higher PageRank values more frequently and updates home page of a site on a daily basis.

dejaone

Last edited by dejaone : April 25th, 2004 at 10:15 AM.

#5
April 26th, 2004, 03:49 AM
webguy
it could be other things as well, such as your urls. Are they dynamic, and spider unfriendly...?

You said each page is 95% similar to the previous, if this is true then google won't index all your pages based on the fact that they are too similar.
Also, with that many pages it takes time.
I only have about 17 pages and it took Google 2 months to get most of my pages indexed. In my case my pages are only updated weekly, so Google didn't see the need to spider my website as often.

Give it some more time. See if Google starts indexing more pages. If so, then you have your answer: time.

#6
April 27th, 2004, 10:18 AM
softcell
It depends on three factors

Three factors affect how often Googlebot crawls a site:
1. frequency of changes in the site.
2. External links to the site.
3. Static vs dynamic links. Google and all other search engines like static links.

#7
May 5th, 2004, 09:59 AM
amstel_za
Softcell,

Can I query 2 of your points?

1. frequency of changes in the site.

How does the bot determine this???

3. Static vs dynamic links. Google and all other search engines like static links.

What do you mean by dynamic links?? I have links created at runtime through php (reads a directory for link listing)..but it generates HTML...but does that mean it won't get through the other linked pages because they're not actual static, hardcoded HTML links?

Ben

#8
May 5th, 2004, 10:28 AM
softcell
google bot revisits

Whenever the bot revisits, it queries the last-modified property of the files. Even if only that property keeps changing, that is enough.
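The post does not say how the bot reads that property. One standard mechanism is the HTTP conditional GET, where the client sends an If-Modified-Since header and the server answers 304 if nothing changed. The sketch below assumes that mechanism; the function name and the sample date are made up for illustration, and it is not a description of how Googlebot is actually implemented.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def status_for_request(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, otherwise 200."""
    if if_modified_since is None:
        return 200  # first visit: send the full page
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: just send the page
    if client_copy.tzinfo is None:
        # Treat a date without a zone as UTC so the comparison is well defined.
        client_copy = client_copy.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= client_copy:
        return 304
    return 200


# Hypothetical page that last changed on 1 May 2004 at noon UTC.
page_changed = datetime(2004, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)

print(status_for_request(page_changed, None))
print(status_for_request(page_changed, header))
```

A crawler that gets 304 responses on every visit has little reason to come back quickly, which fits the point about change frequency.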
For dynamic links: Googlebot can follow dynamic links too. The simpler the link (abc.asp or abc.php), the easier it is to follow; it becomes more complex with the number of parameters in the query string, like abc.asp?x=sas&y=sfd and so on. If you are using PHP files without parameters, they are as easy to follow as HTML pages.
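To see how many query-string parameters a link carries, a small check like the following may help. It is only a sketch using Python's standard urllib.parse module; the function name is made up for illustration, and the URLs are the examples from this post.

```python
from urllib.parse import parse_qsl, urlsplit


def count_query_parameters(url: str) -> int:
    """Count the name=value pairs in a URL's query string."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "?x=" still counts as one parameter.
    return len(parse_qsl(query, keep_blank_values=True))


for link in ("abc.php", "abc.asp?x=sas&y=sfd"):
    print(link, count_query_parameters(link))
```

Running this over every link on a page gives a quick list of the URLs with the longest query strings.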

#9
May 5th, 2004, 10:39 AM
amstel_za
Softcell,

Cheers, my php pages contain php info and don't pass parameters from link to link so that's okay...

Cheers,
Ben

#10
February 28th, 2008, 03:21 AM
sladeetal
Why didn't google index my links page?

The last time google did their update they didn't index my links page. My links page only has about 37 links and is not called linkshtml. They indexed 4 out of my 5 pages. The last time they did the update, they didn't index 2 of my other pages.
Right now I'm hanging at a PR3.

I have a question - should i move all my links to the indexed page or leave it and see?

Dave luomapinyin.com
my links page is the resource page.
Take a look and let me know what you think.

#11
March 8th, 2008, 09:28 AM
jazajay
dustin999
As a lot of the correct answers to your query have been overlooked, we will go through them.
Quote:
common look and theme with each page being 95% similar to the previous

You are getting penalized for duplicate content, plain and simple. Your pages need to be at least 35-40% different. Google doesn't want to fill its index with pages it already has. As yours are similar, it is not indexing them. It has to show its site visitors relevant results quickly.

If it has a million identical pages, that slows its algorithm down in determining which is the most relevant, who the true owner is, and who has stolen the content.

To be honest, I think you are being a little hard on yourself: the "customers who purchased this" links will be different, and the product descriptions, images and alt text will all be different, so say 85% at a rough guess.

But you will still need to increase the amount of differences per page as this is an issue.
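A rough way to measure how similar two of your own pages are is to compare overlapping word sequences (shingles). This is only a sketch: how Google actually detects duplicates is not stated anywhere in this thread, and the function names and sample page text are invented for illustration.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: 1.0 identical, 0.0 disjoint."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up page text: a shared template followed by a product-specific part.
page_a = "Acme Store home products cart Red widget with steel handle"
page_b = "Acme Store home products cart Blue gadget with rubber grip"

print(round(similarity(page_a, page_b), 2))
```

Pages that share a large template and differ only in a short product blurb will score close to 1.0, which is the situation described in the first post.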

Quote:
Within this site are nested links between each page, comparable to what you might see on Amazon

That's not an issue. Amazon, eBay and God knows how many other sites do it and don't have problems.
Quote:
However, it only spidered 40 of the pages and then stopped

This is an issue. I have a friend who has 2000+ pages; he has 1 site map and got all his pages spidered within a week.

What techniques do you use on your site?
1. JavaScript links?
2. Parameters in the URL? If so how many?
3. Ajax generated content?
4. Flex? Flash site?
5. Do you have a robots.txt file?
6. Do you have an XML site map?
7. Black hat? Hidden links? Cloaking? Keyword spamming? Page jacking? Huh, that's an old one.
8. Quality content or scraped?
9. You are being penalized for duplicate content.
10. When you validate your code how many errors do you get?
11. Do you have canonicalization issues?
12. Are the necessary redirects/domains established?
13. Back links? How many? Spread throughout the site?
14. Anchor text on any backlinks the same?
15. All reciprocal linking?
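For item 6 in the checklist above, an XML site map in the sitemaps.org format can be generated from a list of URLs. The following is a minimal sketch; the example.com URLs and the function name are placeholders, and only the required loc element is written.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemap XML document listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs for illustration only.
pages = [
    "http://www.example.com/",
    "http://www.example.com/product?id=1&color=red",
]
print(build_sitemap(pages))
```

The output would be saved as a file such as sitemap.xml on the site and submitted to the search engine.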
Quote:
I've followed most of the guidelines listed here like having no more than 50 links per page

Ha, is that the latest myth going around?
Links won't matter, amount of words won't matter; what matters is Search Engine Accessible pages. Love it.

Don't get me wrong, as it helps, but that's because internal pages get more equity from that page.

A page with equity of 100 000 and 10 links from it, be they internal or external, is better than a page with the same equity and 100 links from it.

So for example -
100 000/10 = 10 000 equity per linked page.

A page with 100 links on it and 100 000 equity would give 1000 equity to each page, therefore not as good.
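The arithmetic in this example is an even split of a page's equity across its outgoing links. The sketch below reproduces only that simplified model from the post; it is not Google's actual formula, and the function name is made up.

```python
def equity_per_link(page_equity: float, outgoing_links: int) -> float:
    """Split a page's equity evenly across its outgoing links."""
    if outgoing_links <= 0:
        raise ValueError("a page needs at least one outgoing link to pass equity")
    return page_equity / outgoing_links


# The two cases from the post: same equity, 10 links versus 100 links.
print(equity_per_link(100_000, 10))
print(equity_per_link(100_000, 100))
```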

Love that I really do.
Right then where to start.

dejaone
Quote:
You need to get some links from other sites and get better PageRank values for your pages.

Good advice in general, but not a reason why the pages are not being indexed. A page with a PR of 0 can still get indexed, and so can all the pages linked from it.
Quote:
that's correct. Someone estimated that there are 7 trillion documents on the Web.

No, that's not correct. The estimate of pages out there is, though.
Google will follow links that are accessible. It doesn't know what the PR of a page is until it retrieves it and the algorithm determines it from pre-set variables designated by the Guys and Gals at G, Yahoo, Live, Ask etc.

The reason a lot of pages are not indexed is one of the following:
1. They are not accessible.
1a. JavaScript-dependent links
Quote:
<a href="javascript:....">

1b. They are Flash or Flex sites.
1c. The URLs on pages that link to those pages have a lot of parameters in the href - 6+.
1d. They are only accessible from forms.
1e. They are login and CMS pages, therefore blocked from the spiders.
1f. They are secured via SSL.
2. They are blocked by robots.txt, meta noindex, or a noindex header if it is a PDF.
3. They are duplicate pages that already exist in the index. As in this case.
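For reason 2 in the list above, you can test whether your own robots.txt blocks a given URL with Python's standard urllib.robotparser module. This is a sketch: the rules and example.com URLs are invented for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(rules, "Googlebot", "http://www.example.com/products/widget.html"))
print(is_crawlable(rules, "Googlebot", "http://www.example.com/admin/login.php"))
```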

They account for millions of pages and I'm sure I have missed a few more reasons as well.
Quote:
Google will not include a page if it's not linked by indexed page(s).

Only because it has not been indexed. To me that sounds like you mean Google knows about the page and won't crawl it until an indexed page links to it. If it's not linked to from a page in its index, it can't find the page, plain and simple.
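The point that a crawler can only find pages it can reach by following links can be shown with a breadth-first walk over a link graph. This is a toy sketch, not how any real search engine schedules its crawl; the site structure and function name are invented.

```python
from collections import deque


def reachable_pages(links: dict, start: str) -> set:
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Invented site: "/orphan" links out, but nothing links to it.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget"],
    "/products/widget": ["/"],
    "/orphan": ["/"],
}

print(sorted(reachable_pages(site, "/")))
```

Here "/orphan" never shows up in the result, because no page that the walk visits links to it.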
Quote:
Google updates pages with higher PageRank values more frequently

Reason?
Because pages with high PR/equity are linked to from lots of sites, or have links from sites that have lots of links to them, so they get crawled frequently as G follows links to those pages more often from other sources.
Quote:
updates home page of a site on a daily basis.

What? Love it.
No, show me in the webmaster guidelines where it says that is the case. The home page gets indexed more often, granted, but that's because it's a hub. All your internal linking should link to it, and 70% of the time home pages get more external links pointing to them, therefore they are found more often and indexed.

It's not because G personally indexes them.
Why would it personally index them if it can't index all those other pages in the world? Love it.

WebGuy
Quote:
it could be other things as well, such as your urls. Are they dynamic, and spider unfriendly...?

You said each page is 95% similar to the previous, if this is true then google won't index all your pages based on the fact that they are too similar.
Also, with that many pages it takes time.

exactly.

Softcell
Quote:
1. frequency of changes in the site.
2. External links to the site.

I'll give you those 2.
Frequency of change helps indexing if the content is unique. Take news sites that update every hour, for example.

External links are followed, so if there are a lot, those pages get visited more often. So yeah, spot on.
Quote:
Static vs dynamic links. Google and all other search engines like static links.

No, sorry. G likes pages with content.
Dynamic pages tend to have problems if they contain 5-6+ parameters.

G can index 4 without a problem, maybe 5, but after that it gives up. I could go into why.

Do you think G only likes pages that have to be added by hand? What about product pages? Say you have 5 pages and you stop selling 1 item: do you have to manually move up the other 4 just for rankings? If that's the case, why do my competitors do better than me on some keywords, or do well at all, as they are all database driven?

amstel_za
Quote:
Cheers, my php pages contain php info and don't pass parameters from link to link so that's okay...

As long as your URLs don't contain more than 5 parameters it wouldn't be an issue.

sladeetal
Quote:
The last time google did their update they didn't index my links page

And you're complaining? You are not bleeding equity; this is a good thing. You don't want it indexed.
Quote:
Right now I'm hanging at a PR3.

Means very little. It indicates how well you are doing, nothing more. You can rank higher than sites with a higher PR than you.

The link you provided -

luomapinyin dot com

Shows 48ish pages indexed and most are PDFs
Quote:
should i move all my links to the indexed page or leave it and see?

Internal or external links?

Hope that clears a lot up.
Jaza

#12
March 8th, 2008, 02:23 PM
seo_marketing
You need to concentrate on creating an on-page guide map of your site on each page.

Take for example your page Web Promotion Software: it should have the navigation links given below:

HOME > ARTICLES > DOWNLOAD > Web Promotion Software

This could help to minimize this issue.
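A trail like the one above can be rendered as plain HTML links, so that every page links back up to its parent sections. The sketch below assumes a list of (label, URL) pairs; the URLs and the function name are made up for illustration.

```python
from html import escape


def breadcrumb_html(trail) -> str:
    """Render (label, url) pairs as linked breadcrumbs; the last item is plain text."""
    if not trail:
        return ""
    parts = [
        '<a href="%s">%s</a>' % (escape(url, quote=True), escape(label))
        for label, url in trail[:-1]
    ]
    # The current page is shown as text, not as a link to itself.
    parts.append(escape(trail[-1][0]))
    return " &gt; ".join(parts)


# Hypothetical URLs for the trail quoted in the post.
trail = [
    ("HOME", "/"),
    ("ARTICLES", "/articles/"),
    ("DOWNLOAD", "/download/"),
    ("Web Promotion Software", "/download/web-promotion-software.html"),
]
print(breadcrumb_html(trail))
```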

Along with that, you should make sure that you are not duplicating content between your product pages. This could turn Google off completely.

#13
May 10th, 2008, 02:21 AM
bloggersmosaic
thanks guys for all these info

#14
Yesterday, 12:23 PM
danushman
You should sign up for Google Webmaster Tools. It will give you some clarity as to what Googlebot actually sees.

One thing to consider, you may have to tell it to spider certain pages... Google (lol) for the term "robots.txt"

There are also meta tags you can use to increase spider activity.
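The post does not name the meta tags it has in mind. The one most relevant to spiders is the robots meta tag, which tells a crawler whether to index a page and follow its links rather than how often to visit. The sketch below reads that tag from a page with Python's standard html.parser module; the class name, function name and sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html_text: str) -> set:
    """Return the set of robots directives declared in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Made-up page for illustration.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_directives(sample)))
```

A page that returns "noindex" here is asking search engines to leave it out of the index, which is worth ruling out before looking for other causes.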



Quote:
Originally Posted by dustin999
Why doesn't Google spider through all of my links? I have a site that sells various products, and each page has a common look and theme with each page being 95% similar to the previous (for the most part). Within this site are nested links between each page, comparable to what you might see on Amazon (i.e. the link that says "customers who purchased this product also liked these other products").

I probably have over 1000 unique pages, with each page linking to roughly 10-20 other unique pages. The site has been active for a couple of months now, and Google has spidered my page. However, it only spidered 40 of the pages and then stopped. It has been back several times but has not completed the spidering process, as I can see from my web log and search results on Google.

Does anyone have any ideas as to why Google might only spider through a portion of my site and not the whole thing? I think I've followed most of the guidelines listed here like having no more than 50 links per page (in my case it's no more than 20).

Thanks,
Dustin
