#1
Hi all,
I have just been tasked here at work with creating a search engine for our 2000+ page intranet. We are running IIS 4.0 on a WinNT 4.0 platform. Does anyone out there know of any web sites, books, etc. that deal with the theory behind creating an effective search engine? First and foremost I need to make it work here, but I would also like to make it something worthwhile that can be ported elsewhere to other sites on other OSes, servers, etc.

#2
There are good search engine scripts already, so why reinvent the wheel? Look at ht://dig (this is the name of the program, not a URL).
Some theory: make indices of all words on all pages, put them into a binary tree, and build a fault-tolerant search on it...
__________________
-- Manuel Hirsch - Linux, FreeBSD, programming, administration articles, tutorials and more.

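For anyone who wants to see the "index all words, then search fault-tolerantly" idea from post #2 in concrete form, here is a minimal sketch in Python. It is only an illustration under some assumptions: the sample pages are made up, a sorted word list searched with `bisect` stands in for the binary tree, and "fault-tolerant" is interpreted as falling back to near spellings via `difflib.get_close_matches`. The names `build_index` and `lookup` are hypothetical, not part of ht://dig or any other tool mentioned here.

```python
import bisect
import difflib
import re
from collections import defaultdict

# Hypothetical sample pages; a real crawler would supply these.
PAGES = {
    "/hr/vacation.html": "Vacation policy and holiday schedule",
    "/it/printers.html": "Printer setup and network schedule",
    "/it/vpn.html": "VPN setup for remote network access",
}


def build_index(pages):
    """Map each lowercased word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return dict(index)


def lookup(index, vocabulary, term, cutoff=0.8):
    """Try an exact binary search first, then fall back to close spellings."""
    term = term.lower()
    pos = bisect.bisect_left(vocabulary, term)
    if pos < len(vocabulary) and vocabulary[pos] == term:
        return sorted(index[term])
    hits = set()
    for near in difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff):
        hits |= index[near]
    return sorted(hits)


INDEX = build_index(PAGES)
VOCABULARY = sorted(INDEX)  # sorted word list, standing in for the binary tree

print(lookup(INDEX, VOCABULARY, "network"))
print(lookup(INDEX, VOCABULARY, "shedule"))  # misspelled "schedule"
```

The fuzzy fallback scans the whole vocabulary, which is fine for a small intranet but would probably need a smarter structure for a large one.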
#3
Yeah, htdig is pretty nice: easy, fast, and robust.
I don't know about it on your platform, though. There's also swish-e.

#4
Before I posted, ht://dig was something I figured I would look into after I found it in a search of the forums. But I also found a paper on the original development of the Google search engine and some of the methodology behind it. I think that building an effective, scalable search engine and database would be a (massively) challenging test for myself and anyone else who wants to help on the side (off company time, that is).
I note that ht://dig is provided under the GNU license... good. Does anyone out there know if anyone has already ported it to Win32?

#5
Search engines aren't all that difficult... it's just narrowing down the keywords using eregi(), then pulling the results out of MySQL. But then again... you said you were a Perl person.
__________________
You know you're a web programmer when you see a '$' and think of PHP rather than money.

#6
You're using IIS, so why not just use Indexing Service? You'll get the search engine running in a day and that'll be that.

#7
Quote:
You have GOT to be kidding here. Search engines are EXTREMELY difficult to do at any level above extremely basic. How about stemming words, searching for phrases, required/excluded terms and phrases, relevancy ranking, and searching different types of files in different locations?

Perl has FAR better tools for creating search engines than PHP anyway. Do a quick search of CPAN and you'll see a bunch of different modules that give you a load of functionality. The one I use is DBIx::FullTextSearch, which is an inverted indexer that uses MySQL for the backend. It's VERY fast, the tables are extremely well optimized, and it has a number of nice features like indexing files on the file system, web pages using LWP, and plain scalars of course. It's kind of bare-bones, but if you want/need to create your own, it's an EXCELLENT module. BTW, it supports all the stuff I mentioned above, and should work perfectly on Win32, given that it's pure Perl.

That being said, I think you should look at htdig a little closer. It can extract text from different types of files automatically, and I'd be surprised if someone hasn't ported it to Win32. If they haven't, set up a cheapy Linux box, figure out Samba, and you should be good to go.

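To give a feel for why the features listed in post #7 add up quickly, here is a rough Python sketch of just three of them: required terms (`+word`), excluded terms (`-word`), and quoted phrases, with a naive count-based ranking. Everything in it is an assumption made for illustration: the query syntax, the sample documents, and the function names `parse_query` and `search` are hypothetical and are not the DBIx::FullTextSearch or htdig API. Stemming and multiple file types are left out entirely, and the phrase check is a crude substring match.

```python
import re

# Hypothetical documents for illustration only.
DOCS = {
    "perl.txt": "Perl search engine modules on CPAN make a search engine easy",
    "php.txt": "A PHP search engine that uses Perl style regex",
    "notes.txt": "Perl notes about indexing files",
}

# A quoted phrase, or any run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a query into required, excluded, and optional terms plus phrases."""
    parsed = {"required": [], "excluded": [], "optional": [], "phrases": []}
    for phrase, word in TOKEN.findall(query.lower()):
        if phrase:
            parsed["phrases"].append(phrase)
        elif word.startswith("+") and len(word) > 1:
            parsed["required"].append(word[1:])
        elif word.startswith("-") and len(word) > 1:
            parsed["excluded"].append(word[1:])
        else:
            parsed["optional"].append(word)
    return parsed


def search(docs, query):
    """Return matching document names, best score first (ties by name)."""
    q = parse_query(query)
    results = []
    for name, text in docs.items():
        lowered = text.lower()
        words = re.findall(r"[a-z0-9]+", lowered)
        if any(term not in words for term in q["required"]):
            continue
        if any(term in words for term in q["excluded"]):
            continue
        if any(phrase not in lowered for phrase in q["phrases"]):
            continue
        # Naive relevancy: count occurrences of every wanted term and phrase.
        score = sum(words.count(t) for t in q["required"] + q["optional"])
        score += sum(lowered.count(p) for p in q["phrases"])
        if score > 0:
            results.append((score, name))
    results.sort(key=lambda item: (-item[0], item[1]))
    return [name for _score, name in results]


print(search(DOCS, '"search engine" +perl -php'))
print(search(DOCS, "perl indexing"))
```

This version rescans every document on every query, so it only shows the matching logic; a real engine would run the same checks against an inverted index.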
#8
Quote:
Actually, IIS and Indexing Service are what prompted this request to begin with. We're so fed up with the problems we've had with the IIS server that I was basically told it was now my job to come up with a replacement for Indexing Service.

I'm going to check out that CPAN module you mentioned, Hero. I found this paper about the original Google engine, and it gave me a pretty good idea of the sort of things to take into consideration: Anatomy of a Search Engine

I'm going to jump into the deep end here and start production on our own engine using Perl. I'm sure that I'll be back in the Perl forum asking plenty of questions once I get started. Thanks to everyone on this thread for pitching in ideas!

#9
Quote:
It all depends on how your search engine searches. I've seen a lot of crappy search engines that count the number of times a word appears on a page and then display the pages in that order, and a lot of the time that's not what I want at all. So the way I did mine was to load a bunch of keywords for a subject into an eregi() call, with a variable at the end. The variable records whether any one of those keywords matched. Then I tell the browser which links to display.

Maybe this would be a harder approach for someone with 1000+ pages, because I didn't have that many. Anyway, for my somewhat smaller site it gave me more control over what showed up, and better results. I haven't quite finished it (I took a break from it for a while for personal reasons), but I know from testing that it works. And the great thing is that eregi() already drops words like 'of', because it scans the string looking only for the keywords.

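The curated-keyword approach from post #9 might look roughly like the following. This is a sketch in Python rather than the poster's PHP, with `re.IGNORECASE` playing the role of the case-insensitive eregi() match. The subjects, keywords, links, and the names `compile_subjects` and `links_for` are all invented for the example; the original code was never posted, so this is a guess at the shape of it.

```python
import re

# Hypothetical subjects: each has hand-picked keywords and hand-picked links.
SUBJECTS = {
    "printing": (["printer", "toner", "print queue"], ["/it/printers.html"]),
    "time off": (
        ["vacation", "holiday", "sick leave"],
        ["/hr/vacation.html", "/hr/holidays.html"],
    ),
}


def compile_subjects(subjects):
    """Build one case-insensitive alternation pattern per subject."""
    compiled = []
    for name, (keywords, links) in subjects.items():
        alternation = "|".join(re.escape(keyword) for keyword in keywords)
        pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
        compiled.append((name, pattern, links))
    return compiled


def links_for(query, compiled):
    """Return the links of every subject whose keywords appear in the query."""
    found = []
    for _name, pattern, links in compiled:
        if pattern.search(query):
            for link in links:
                if link not in found:
                    found.append(link)
    return found


COMPILED = compile_subjects(SUBJECTS)

print(links_for("Where is the TONER of the printer?", COMPILED))
print(links_for("holiday and printer", COMPILED))
```

Words like 'of' fall out for free because only the listed keywords can match. The trade-off is the one noted in the post: someone has to maintain the keyword and link lists by hand, which gets harder as the site grows.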
#10
OMG, whew *watches as the information flies over his head*
I didn't get any of that, but don't try to explain. I'll analyze it and figure it out... I hope.
Viewing: Dev Shed Forums > Other > Dev Shed Lounge > Search Engine Design