#1
Does anyone know how an Internet spider works? I understand the general concept: the spider goes out and "crawls" Internet sites, looking at meta tags, HTML code, and things of that nature. I know that it follows links from one page to another, to whatever depth the spider software is told to search. My question is: how does it know where to search? How does it actually find the pages?
#2
seed it
You seed the spider with a list of URLs to visit. It then adds to the list itself as it traverses pages, pulling new links out of each page it fetches (usually only fetching content types of HTML or text).
The tricky parts are revisiting pages and updating the indexes that result from collating keywords. To do it right, spiders are typically written in C/C++ for highest performance. There are lots of open-source and commercial products out there that do spidering.
===========================================
http://badblue.com/helpphp.htm
Free small footprint web server for Windows
PHP, P2P file-sharing, transcoding and more
===========================================
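To make the seeding idea concrete, here is a minimal sketch in Python (not C/C++, just for readability) of how a spider might grow its own URL list from a few seeds. Everything named here is an assumption for illustration: `crawl`, `LinkParser`, the `fetch` callable, and the `FAKE_WEB` dictionary that stands in for the real network. It leaves out revisiting, keyword indexing, and politeness rules entirely.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_depth=2):
    """Breadth-first crawl starting from the seed URLs.

    fetch(url) is a hypothetical callable that returns a
    (content_type, body) tuple, or None if the page is unavailable.
    Returns the text/HTML URLs in the order they were processed.
    """
    frontier = deque((url, 0) for url in seeds)  # URLs still to visit
    seen = set(seeds)                            # URLs already queued
    processed = []
    while frontier:
        url, depth = frontier.popleft()
        result = fetch(url)
        if result is None:
            continue
        content_type, body = result
        # Skip anything that is not HTML or plain text (images, etc.).
        if not content_type.startswith(("text/html", "text/plain")):
            continue
        processed.append(url)
        if depth >= max_depth:
            continue  # deep enough: do not follow this page's links
        parser = LinkParser()
        parser.feed(body)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                frontier.append((absolute, depth + 1))
    return processed


# A tiny in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": ("text/html", '<a href="/a">A</a> <a href="/logo.png">logo</a>'),
    "http://example.test/a": ("text/html", '<a href="/">home</a> <a href="b#top">B</a>'),
    "http://example.test/b": ("text/html", '<a href="/c">C</a>'),
    "http://example.test/c": ("text/html", "deep page"),
    "http://example.test/logo.png": ("image/png", ""),
}

pages = crawl(["http://example.test/"], FAKE_WEB.get, max_depth=2)
```

With a single seed and `max_depth=2`, the sketch processes the seed, then `/a`, then `/b`. It skips the PNG because of its content type and never reaches `/c` because that page is three links deep. A real spider would swap `FAKE_WEB.get` for an HTTP fetcher and add the revisit and indexing logic mentioned above.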
#3
Do you seed it with a range of IP addresses? I don't actually have to code this. I'm just trying to understand the whole process better.
#4
found one
Found a PHP-based spider. No idea how good it is:
http://phpdig.toiletoine.net/
| Viewing: Dev Shed Forums > Other > Beginner Programming > Internet Spiders |