Dev Shed Forums > Web Site Management > Scripts

#1 cliffyman (September 16th, 2003, 02:54 PM)
Recommendation: PHP search engine

Can anyone recommend a good PHP search engine? Features that would be nice are:

- built-in HTTP spider
- ranking of results
- page exclusion list

Thanks very much everyone!
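On the built-in HTTP spider: at its core a spider fetches a page, pulls out the links, and queues the ones that stay on the same site. Below is a minimal PHP sketch of just the link-extraction step, assuming the DOM extension is enabled. `extract_site_links()` is a hypothetical helper, not code from any of the scripts mentioned in this thread, and it does not resolve relative links against the page URL.

```php
<?php
// Hypothetical helper: return the links in $html that stay on $host.
// Relative links are kept as-is; resolving them against the page URL is left out.
function extract_site_links($html, $host)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid; silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#' || preg_match('/^(mailto|javascript):/i', $href)) {
            continue;                   // skip empty, fragment-only and non-HTTP links
        }
        $linkHost = parse_url($href, PHP_URL_HOST);
        if ($linkHost === null || (is_string($linkHost) && strcasecmp($linkHost, $host) === 0)) {
            $links[] = $href;           // relative link, or absolute link on the same host
        }
    }
    return array_values(array_unique($links));   // drop duplicates, reindex from 0
}
```

A full spider would resolve each kept link against the URL of the page it came from, fetch it (for example with `file_get_contents()` on a URL when `allow_url_fopen` is on), and keep a list of visited URLs so it never fetches the same page twice.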

#2 drgroove (September 16th, 2003, 03:03 PM)
Moved to Scripts forum from PHP.


#3 cliffyman (September 16th, 2003, 03:31 PM)
Wow, I never came across hotscripts.com before...

http://www.hotscripts.com/PHP/Scrip...ines/index.html

Thanks for that useful signature, drgroove. I still wouldn't mind any comments anyone might have. Thanks again!

#4 drgroove (September 16th, 2003, 03:58 PM)

Wow - well, cool, glad you found hotscripts then! Let us know which search engine you settle on...

#5 cliffyman (September 16th, 2003, 11:40 PM)
Right now the front-runner looks to be:

http://www.digvid.info/isearch/home.php

Short feature list:
- Spider engine written in PHP; there are no binaries to run on the server.
- Runs in PHP safe mode.
- Performs simple page match scoring and ranking.
- Can spider subdomains or multiple domains.
- Allows multiple spider entry points.
- Versatile page inclusion/exclusion, including robots.txt parsing with Google extensions.
- Parts of pages (e.g. menus) can be excluded from indexing.
- Can generate a site map automatically from the search index.
- Auto Spider feature ensures that your search index is kept up to date automatically, even if you cannot run "cron" jobs on your web server.

The commercial fee is 35 bucks, which is well worth it for what it does. I'll let you guys know when I get it up and running later this week...


-Cliff
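For reference, the robots.txt parsing mentioned in that feature list works roughly like this: the crawler reads the group of rules addressed to `User-agent: *` and skips any path that starts with one of that group's `Disallow` values. A simplified PHP sketch of that idea follows. It is not iSearch's code, `robots_disallows()` and `is_disallowed()` are hypothetical names, and it only honors the `*` group, ignoring `Allow` lines and the Google wildcard extensions.

```php
<?php
// Hypothetical helper: collect the Disallow prefixes that apply to all robots ("User-agent: *").
function robots_disallows($robotsTxt)
{
    $rules = array();
    $applies = false;        // does the current group include "User-agent: *"?
    $lastWasAgent = false;   // consecutive User-agent lines belong to the same group
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));   // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            // a User-agent line after a rule line starts a new group
            $applies = ($lastWasAgent && $applies) || $value === '*';
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;
        if ($field === 'disallow' && $applies && $value !== '') {
            $rules[] = $value;   // an empty Disallow means "allow everything"
        }
    }
    return $rules;
}

// A path is excluded if it starts with any of the Disallow prefixes.
function is_disallowed($path, array $rules)
{
    foreach ($rules as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return true;
        }
    }
    return false;
}
```

The spider would fetch `/robots.txt` once per site, call `robots_disallows()` on it, and check every candidate path with `is_disallowed()` before fetching it.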

#6 trigger (October 3rd, 2003, 03:01 PM)
How 'bout this?

I'm just having a slight problem with indexing all pages. I think it's a doctype problem, because it echoes back the whole site but doesn't put all the pages into the database table. Decent code nonetheless.

I'm planning on ranking results by where the term matches: the title first; if not, the description; if not, the keywords; and so on. Cool spider though, I just need a little final tweak. Can someone help me out, please?

PHP Code:
<?php
// Connection.php (Dreamweaver-style) defines $database_Connection and $Connection.
// The mysql_* functions match this 2003 setup; they were removed in PHP 7 (use mysqli there).
require_once('Connections/Connection.php');
mysql_select_db($database_Connection, $Connection);

// readdir() returns bare file names, so keep the folder to build full paths
$dir = 'C:/mywebfolder/';
$dh  = opendir($dir) or die("Cannot open $dir");
while (false !== ($filename = readdir($dh))) {
    // index .htm/.html pages and the sun...php pages; skip everything else
    if (!preg_match('/\.html?$/i', $filename) && !preg_match('/sun(.*)php/', $filename)) {
        continue;
    }
    $path = $dir . $filename;

    // meta description and keywords (empty if the tag is missing)
    $tags        = get_meta_tags($path);
    $description = isset($tags['description']) ? $tags['description'] : '';
    $keywords    = isset($tags['keywords']) ? $tags['keywords'] : '';
    echo "$filename : $description : $keywords \n";

    $buffer = file_get_contents($path);

    // get the title
    $title = preg_match('#<title>(.*)</title>#isU', $buffer, $match) ? trim($match[1]) : '';
    echo "$title \n";

    // get the body content; skip the rest of the <body ...> tag, and fall back
    // to the whole file for pages (e.g. include files) that have no <body>
    if (preg_match('#<body[^>]*>(.*)</body>#isU', $buffer, $bodymatch)) {
        $bodytemp = $bodymatch[1];
    } else {
        $bodytemp = $buffer;
    }

    // strip scripts/styles and tags, decode entities, collapse white space
    $text    = preg_replace('#<(script|style)[^>]*>.*?</\1>#si', ' ', $bodytemp);
    $text    = html_entity_decode(strip_tags($text), ENT_QUOTES);
    $content = trim(preg_replace('/\s+/', ' ', $text));
    echo "$content \n\n";

    // Escape every value: an apostrophe in a title or in the page text would
    // otherwise break the SQL, and that page would silently not be stored.
    $url = mysql_real_escape_string($filename, $Connection);
    $t   = mysql_real_escape_string($title, $Connection);
    $d   = mysql_real_escape_string($description, $Connection);
    $k   = mysql_real_escape_string($keywords, $Connection);
    $c   = mysql_real_escape_string($content, $Connection);

    // update the row if the page is already indexed, otherwise insert it
    $result = mysql_query("SELECT page_id FROM fullsearch WHERE page_url = '$url'", $Connection)
        or die(mysql_error());
    if (mysql_num_rows($result) > 0) {
        $sqlquery = "UPDATE fullsearch SET page_title = '$t', page_description = '$d',
                     page_keywords = '$k', page_content = '$c'
                     WHERE page_url = '$url'";
    } else {
        // page_id is left out on the assumption that it is an AUTO_INCREMENT column
        $sqlquery = "INSERT INTO fullsearch (page_url, page_title, page_description, page_keywords, page_content)
                     VALUES ('$url', '$t', '$d', '$k', '$c')";
    }
    mysql_query($sqlquery, $Connection) or die(mysql_error());
}
closedir($dh);
?>
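The ranking idea in the post (a match in the title beats one in the description, which beats one in the keywords, and so on) can be applied in PHP to the rows fetched from `fullsearch`. A minimal sketch: each page scores the weight of the highest-priority field that contains the term, and the list is sorted by that score. The weights and the `field_score()`/`rank_pages()` names are hypothetical, and a case-insensitive substring test (`stripos()`) stands in for proper word matching.

```php
<?php
// Hypothetical weights: the first field (in this order) that contains the term decides the score.
function field_score(array $row, $term)
{
    $fields = array('page_title' => 4, 'page_description' => 3,
                    'page_keywords' => 2, 'page_content' => 1);
    foreach ($fields as $field => $weight) {
        if (isset($row[$field]) && stripos($row[$field], $term) !== false) {
            return $weight;
        }
    }
    return 0;   // term not found in any field
}

// Keep matching rows, best score first; equal scores keep their original order.
function rank_pages(array $rows, $term)
{
    $scored = array();
    foreach ($rows as $i => $row) {
        $score = field_score($row, $term);
        if ($score > 0) {
            $scored[] = array($score, $i, $row);
        }
    }
    usort($scored, function ($a, $b) {
        // higher score first; the row index breaks ties so the order is stable
        return $a[0] !== $b[0] ? $b[0] - $a[0] : $a[1] - $b[1];
    });
    $ranked = array();
    foreach ($scored as $entry) {
        $ranked[] = $entry[2];
    }
    return $ranked;
}
```

With many pages it would be cheaper to let MySQL narrow the candidates first (for example with `WHERE page_title LIKE '%term%' OR page_content LIKE '%term%'`) and rank only those rows in PHP.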


© 2003-2008 by Developer Shed. All rights reserved.