Beginner Programming
 
  #1  
June 11th, 2001, 02:36 PM
Karate_Chick
Internet Spiders

Does anyone know how an Internet spider works? I understand the general concept: the spider goes out and "crawls" internet sites, looking at meta tags, HTML code, and things of that nature. I know that it follows links from one page to another, to whatever depth the spider software is told to search. My question is: how does it know where to search? How does it actually find the pages?

  #2  
June 14th, 2001, 06:57 AM
ame12
seed it

You seed the spider with a list of URLs to visit. It then adds to the list itself as it traverses pages, queuing the links it finds on each one (usually only fetching content types of HTML or plain text).
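The seeding idea can be sketched in a few lines of Python. This is only an illustration, not how any particular spider is built: the names `crawl`, `LinkExtractor`, `fetch`, and `max_depth` are made up for the example, and `fetch` is a caller-supplied function so the sketch can run against an in-memory "web" instead of the real network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_depth=2):
    """Breadth-first crawl starting from the seed URLs.

    fetch(url) is a hypothetical caller-supplied function that returns
    (content_type, body), or None when the page cannot be retrieved.
    Returns the URLs that were accepted, in visiting order.
    """
    frontier = deque((url, 0) for url in seeds)  # URLs still to visit
    seen = set(seeds)                            # URLs already queued
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        result = fetch(url)
        if result is None:
            continue
        content_type, body = result
        # Only HTML and plain text are kept; images and the like are skipped.
        if not content_type.startswith(("text/html", "text/plain")):
            continue
        visited.append(url)
        # Links are only followed from HTML pages above the depth limit.
        if depth >= max_depth or not content_type.startswith("text/html"):
            continue
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append((absolute, depth + 1))
    return visited


# A tiny in-memory "web" (made-up pages) standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.com/": ("text/html", '<a href="/a.html">A</a> <a href="logo.png">logo</a>'),
    "http://example.com/a.html": ("text/html", '<a href="/">home</a> <a href="b.html#top">B</a>'),
    "http://example.com/b.html": ("text/plain", "no links here"),
    "http://example.com/logo.png": ("image/png", ""),
}

print(crawl(["http://example.com/"], FAKE_WEB.get))
```

With the single seed above, the spider discovers a.html and b.html on its own and skips the image. A real spider would replace `FAKE_WEB.get` with something that makes HTTP requests, and would normally also honor robots.txt and rate limits.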

The tricky parts are revisiting pages and updating the indexes that result from collating keywords. For the highest performance, large-scale spiders are often written in C/C++. There are lots of open-source and commercial products out there that do spidering.
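To make the revisit problem concrete, here is a rough Python sketch of a keyword index that can be updated when a page is crawled a second time. Everything in it (the `KeywordIndex` class, its method names, the use of a SHA-256 digest to detect unchanged pages) is an assumption chosen for illustration, not a description of how any real product does it.

```python
import hashlib
import re
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index: keyword -> set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> URLs
        self.page_words = {}              # url -> keywords found on last visit
        self.page_hash = {}               # url -> digest of body on last visit

    def update(self, url, body):
        """Index a page, or re-index it on a revisit.

        Returns False when the body is unchanged since the last visit,
        True otherwise.
        """
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if self.page_hash.get(url) == digest:
            return False  # nothing changed, so the index is left alone
        new_words = set(re.findall(r"[a-z0-9]+", body.lower()))
        old_words = self.page_words.get(url, set())
        # Remove postings for keywords that disappeared from the page.
        for word in old_words - new_words:
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]
        # Add postings for keywords that are new on the page.
        for word in new_words - old_words:
            self.postings[word].add(url)
        self.page_words[url] = new_words
        self.page_hash[url] = digest
        return True

    def search(self, word):
        """Return the sorted list of URLs whose last-seen body had this word."""
        return sorted(self.postings.get(word.lower(), set()))


index = KeywordIndex()
index.update("http://example.com/", "Karate classes")
index.update("http://example.com/", "Judo classes")  # revisit: page changed
print(index.search("karate"), index.search("judo"))
```

The point of the sketch is the bookkeeping: on a revisit the stale keyword ("karate") has to be removed as well as the new one added, which is why the index remembers what it saw on each page last time.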


===========================================
http://badblue.com/helpphp.htm
Free small footprint web server for Windows
PHP, P2P file-sharing, transcoding and more
===========================================

  #3  
June 14th, 2001, 07:02 AM
Karate_Chick

Do you seed it with a range of IP addresses? I don't actually have to code this. I'm just trying to understand the whole process better.

  #4  
June 16th, 2001, 08:41 AM
ame12
found one

Found a PHP-based spider... no idea how good it is...

http://phpdig.toiletoine.net/


© 2003-2008 by Developer Shed. All rights reserved.