#1
Why doesn't Google spider through all of my links?
I have a site that sells various products. The pages share a common look and theme, and each page is about 95% similar to the previous one (for the most part). The pages link to one another, comparable to what you might see on Amazon (i.e. the link that says "customers who purchased this product also liked these other products").

I probably have over 1000 unique pages, each linking to roughly 10-20 other unique pages. The site has been active for a couple of months now, and Google has spidered it. However, it only spidered 40 of the pages and then stopped. It has been back several times but has not completed the spidering process, as I can see from my web logs and from search results on Google.

Does anyone have any ideas as to why Google might spider only a portion of my site and not the whole thing? I think I've followed most of the guidelines listed here, such as having no more than 50 links per page (in my case it's no more than 20).

Thanks,
Dustin
#2
You need to get some links from other sites and a better PageRank value for your pages.
#3
Thanks, dejaone. So what you're saying is that if you don't have many links to your pages and the PageRank is low, Google won't actually spider your entire site but will only selectively spider a few pages?
#4
dustin,

That's correct. Someone estimated that there are 7 trillion documents on the Web. Google has indexed about 4.3 billion HTML pages. All search engines have to decide which pages to include in their index database. And even if your pages are indexed, it won't make much difference if they aren't ranked well on search engine result pages.

An excerpt from an article I wrote recently, How Search Engines Work:

To crawl billions of pages effectively, a crawler needs to make two major decisions:

1) What Page to Crawl - Each search engine uses different criteria to determine what pages to crawl. Google will not include a page if it's not linked from an indexed page.
2) Frequency of Updating - Google updates pages with higher PageRank values more frequently and updates the home page of a site on a daily basis.

dejaone

Last edited by dejaone : April 25th, 2004 at 10:15 AM.
#5
It could be other things as well, such as your URLs. Are they dynamic and spider-unfriendly?

You said each page is 95% similar to the previous one. If that is true, Google won't index all your pages because they are too similar. Also, with that many pages it takes time. I only have about 17 pages, and it took Google 2 months to get most of them indexed. In my case the pages are only updated weekly, so Google didn't see the need to spider my website any more often. Give it some more time and see whether Google starts indexing more pages. If it does, then you have your answer: time.
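To put a rough number on "95% similar", here is a minimal sketch that compares the word sequences of two pages with Python's standard `difflib`. The page text is made up for illustration, and this is only one possible similarity measure; how Google actually detects near-duplicates is not described anywhere in this thread.

```python
from difflib import SequenceMatcher


def similarity(page_a: str, page_b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two pages' word sequences."""
    words_a = page_a.split()
    words_b = page_b.split()
    # autojunk=False stops difflib from discarding very common words on long pages.
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


# Hypothetical product pages: a shared template plus a short product-specific part.
template = "Welcome to our shop . Free shipping on all orders . Contact us any time ."
page_1 = template + " Red ceramic mug , 300 ml ."
page_2 = template + " Blue steel bottle , 500 ml ."

print(f"similarity: {similarity(page_1, page_2):.0%}")
```

Running this over the visible text of a few real product pages would show how much of each page is template and how much is unique.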
#6
It depends on three factors
Three factors affect how often Googlebot crawls a site:

1. Frequency of changes in the site.
2. External links to the site.
3. Static vs dynamic links. Google and all other search engines like static links.
#7
Softcell,

Can I query 2 of your points?

1. Frequency of changes in the site. How does the bot determine this?

3. Static vs dynamic links. Google and all other search engines like static links. What do you mean by dynamic links? I have links created at runtime through PHP (it reads a directory to build the link listing), but it generates HTML. Does that mean the bot won't get through to the other linked pages because they're not actual static, hardcoded HTML links?

Ben
#8
Googlebot revisits
Whenever the bot revisits, it checks the last-modified property of the files. If that property keeps changing, that is enough.
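As an illustration of the last-modified check, here is a minimal sketch that compares two HTTP `Last-Modified` header values. The function name and the dates are hypothetical, and Googlebot's real revisit logic is not public; this only shows how such a comparison could work.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def changed_since(last_modified: str, previous_last_modified: Optional[str]) -> bool:
    """Return True if the page's Last-Modified value is newer than the one seen before."""
    if previous_last_modified is None:
        return True  # never seen before, so treat it as changed
    current = parsedate_to_datetime(last_modified)
    previous = parsedate_to_datetime(previous_last_modified)
    return current > previous


# Hypothetical header values from two visits to the same page.
print(changed_since("Sun, 25 Apr 2004 10:15:00 GMT", "Sat, 24 Apr 2004 10:15:00 GMT"))
```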
For dynamic links: Googlebot can follow dynamic links as well. The simpler the link (abc.asp or abc.php), the easier it is to follow. It gets harder as the number of parameters in the query string grows, as in abc.asp?x=sas&y=sfd and so on. If you are using PHP files without parameters, they are as easy to follow as HTML pages.
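To see how many parameters your own links carry, here is a minimal sketch using Python's standard `urllib.parse`. The URLs are hypothetical examples, and the count says nothing by itself about what any crawler will or won't follow.

```python
from urllib.parse import parse_qsl, urlsplit


def count_query_params(url: str) -> int:
    """Count the key=value pairs in a URL's query string."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "?x=" still counts as one parameter.
    return len(parse_qsl(query, keep_blank_values=True))


# Hypothetical links, from simplest to most parameter-heavy.
for link in (
    "http://example.com/abc.php",
    "http://example.com/abc.asp?x=1&y=2",
    "http://example.com/abc.asp?a=1&b=2&c=3&d=4&e=5&f=6",
):
    print(count_query_params(link), link)
```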
#9
Softcell,

Cheers. My PHP pages contain PHP info and don't pass parameters from link to link, so that's okay.

Cheers,
Ben
#10
Why didn't Google index my links page?
The last time Google did their update, they didn't index my links page. My links page only has about 37 links and is not called links.html. They indexed 4 out of my 5 pages. The last time they did the update, they didn't index 2 of my other pages.

Right now I'm hanging at PR3. I have a question: should I move all my links to an indexed page, or leave it and see?

Dave
luomapinyin.com

My links page is the resource page. Take a look and let me know what you think.
#11
dustin999
As a lot of the correct answers to your query have been overlooked, we will go through them.

Quote:

You are getting penalized for duplicate content, plain and simple. Your pages need to be at least 35-40% different. Google doesn't want to fill its index with pages it already has. As yours are similar, it is not indexing them. It has to show its site visitors relevant results quickly. If it has a million identical pages, that slows its algorithm down in determining which is the most relevant, who is the true owner, and who has stolen the content. To be honest, I think you are being a little hard on yourself: the "customers who purchased this" links will be different, and the product descriptions, images and alt text will all be different, so say 85% at a rough guess. But you will still need to increase the amount of difference per page, as this is an issue.

Quote:

That's not an issue. Amazon, eBay and God knows how many other sites do it and don't have problems.

Quote:

This is an issue. I have a friend who has 2000+ pages. He has 1 site map and got all his pages spidered within a week. What techniques do you use on your site?

1. JavaScript links?
2. Parameters in the URL? If so, how many?
3. Ajax-generated content?
4. Flex? Flash site?
5. Do you have a robots.txt file?
6. Do you have an XML site map?
7. Black hat? Hidden links? Cloaking? Keyword spamming? Page jacking? Huh, that's an old one.
8. Quality content or scraped?
9. You are being penalized for duplicate content.
10. When you validate your code, how many errors do you get?
11. Do you have canonicalization issues?
12. Are the necessary redirects/domains established?
13. Back links? How many? Spread throughout the site?
14. Is the anchor text on all backlinks the same?
15. All reciprocal linking?

Quote:
Ha, is that the latest myth going around? Links won't matter, the amount of words won't matter; what matters is search-engine-accessible pages. Love it. Don't get me wrong, it helps, but that's because internal pages get more equity from that page. A page with equity of 100 000 and 10 links from it, be they internal or external, is better than a page with 100 links from it. For example, 100 000 / 10 = 10 000 equity per linked page. A page with 100 links on it and 100 000 equity would give 1000 equity to each page, and is therefore not as good. Love that, I really do.

Right then, where to start.

dejaone

Quote:

Good advice in general, but not a reason why the pages are not being indexed. A page with a PR of 0 can still get indexed, along with all the pages linked from it.

Quote:
No, that's not correct. The estimate of pages out there is, though. Google will follow links that are accessible. It doesn't know what the PR of a page is until it retrieves it and the algorithm determines it from preset variables designated by the guys and gals at G, Yahoo, Live, Ask, etc. A lot of pages are not indexed for one of the following reasons:

1. They are not accessible.
1a. JavaScript-dependent links

Quote:

1b. They are Flash or Flex sites.
1c. The URLs on the pages that link to them have a lot of parameters in the href (6+).
1d. They are only accessible from forms.
1e. They are login and CMS pages, and therefore blocked from the spiders.
1f. They are secured via SSL.
2. They are blocked by robots.txt, meta noindex, or a noindex header in the case of a PDF.
3. They are duplicates of pages that already exist in the index, as in this case.

These account for millions of pages, and I'm sure I have missed a few more reasons as well.

Quote:
Only because it has not been indexed. To me that sounds like you mean Google knows about the page and won't crawl it until an indexed page links to it. If it's not linked to from a page in its index, it can't find the page, plain and simple.

Quote:

Reason? Because pages with high PR/equity are linked to from lots of sites, or have links from sites that have lots of links to them, so they get crawled frequently as G follows links to those pages more often from other sources.

Quote:

What? Love it. No, show me in the webmaster guidelines where it says that is the case. The home page gets indexed more often, granted, but that's because it's a hub. All your internal linking should link to it, and 70% of the time a home page gets more external links pointing to it, so it is found and indexed more often. It's not because G personally indexes it. Why would it personally index home pages if it can't index all those other pages in the world? Love it.

WebGuy

Quote:
Exactly.

Softcell

Quote:

I'll give you those 2. Frequency of change helps indexing if the content is unique. Take news sites that update every hour, for example. External links are followed, so if there are a lot of them, they get followed to those pages more often. So yeah, spot on.

Quote:

No, sorry. G likes pages with content. Dynamic pages tend to have problems if they contain 5-6+ parameters. G can index 4 without a problem, maybe 5, but after that it gives up. I could go into why. Do you think G likes pages that have to be added manually by hand? What about product pages? You have 5 pages. You stop selling 1 item, so you manually have to move up the other 4 just for rankings? If that's the case, why do my competitors do better than me on some keywords, or do well at all, as they are all database driven?

amstel_za

Quote:

As long as your URLs don't contain more than 5 parameters, it wouldn't be an issue.

sladeetal

Quote:
And you're complaining? You are not bleeding equity; this is a good thing. You don't want it indexed.

Quote:

Means very little. It indicates how well you are doing, nothing more. You can rank higher than sites with a higher PR than yours. The link you provided, luomapinyin dot com, shows 48-ish pages indexed, and most are PDFs.

Quote:

Internal or external links?

Hope that clears a lot up.

Jaza
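The equity arithmetic earlier in this post can be written out as a small sketch. The even split across links is the simplified model used in the post's example, not Google's actual formula, and the function name is made up for illustration.

```python
def equity_per_link(page_equity: float, outbound_links: int) -> float:
    """Split a page's equity evenly across its outbound links (simplified model)."""
    if outbound_links <= 0:
        raise ValueError("a page needs at least one outbound link to pass equity")
    return page_equity / outbound_links


# The two cases from the post: same equity, 10 links vs 100 links.
print(equity_per_link(100_000, 10))   # 10000.0
print(equity_per_link(100_000, 100))  # 1000.0
```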
#12
You need to concentrate on creating an on-page guidemap of your site on each page.
Take, for example, your page Web Promotion Software. It should have navigation links like the following:

HOME > ARTICLES > DOWNLOAD > Web Promotion Software

This could help to minimize the issue. Along with that, you should make sure that you are not duplicating content between your product pages. That could turn Google off completely.
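A navigation trail like the one above could be generated from a list of (label, URL) pairs. This is a minimal sketch; the paths are hypothetical, since the real site structure isn't shown in this thread.

```python
from html import escape


def breadcrumb_html(trail):
    """Build a 'HOME > ... > Current Page' trail from (label, url) pairs.

    Every entry except the last becomes a plain HTML link that a spider can
    follow; the last entry is the current page and is left as text.
    """
    links = [
        f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
        for label, url in trail[:-1]
    ]
    current_label = escape(trail[-1][0])
    return " &gt; ".join(links + [current_label])


# Hypothetical paths for the example page.
print(breadcrumb_html([
    ("HOME", "/"),
    ("ARTICLES", "/articles/"),
    ("DOWNLOAD", "/download/"),
    ("Web Promotion Software", "/download/web-promotion-software.html"),
]))
```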
#13
Thanks, guys, for all this info.
#14
You should sign up for Google Webmaster Tools. It will give you some clarity as to what Googlebot actually sees.

One thing to consider: you may have to tell it to spider certain pages. Google (lol) the term "robots.txt". There are also meta tags you can use to influence spider activity.
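To check what a robots.txt file lets a spider fetch, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules and paths are made-up examples, not taken from any site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block one directory for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(path: str, agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules allow this user agent to fetch the path."""
    return parser.can_fetch(agent, path)


for path in ("/products/1.html", "/admin/login.php"):
    print(path, can_crawl(path))
```

To test a live site instead, `RobotFileParser` can also load the file with `set_url()` and `read()`.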