#1
  1. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2001
    Location
    Washington DC, USA
    Posts
    156
    Rep Power
    14

    Recommendation: PHP search engine


    Can anyone recommend a good PHP search engine? Features that would be nice are:

    - built-in HTTP spider
    - ranking of results
    - page exclusion list

    Thanks very much everyone!
  2. #2
  3. Moderator Emeritus
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2002
    Location
    Austin, TX
    Posts
    7,188
    Rep Power
    2265
    Moved to Scripts forum from PHP.
    DrGroove, Devshed Moderator | New to Devshed? Read the User Guide | Connect with me on LinkedIn
  4. #3
  5. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2001
    Location
    Washington DC, USA
    Posts
    156
    Rep Power
    14
    Wow, I never came across hotscripts.com before...

    http://www.hotscripts.com/PHP/Script...nes/index.html

    Thanks for that useful signature, drgroove. I still wouldn't mind any comments anyone might have - thanks again!
  6. #4
  7. Moderator Emeritus
    Devshed Supreme Being (6500+ posts)

    Join Date
    Feb 2002
    Location
    Austin, TX
    Posts
    7,188
    Rep Power
    2265
    Originally posted by cliffyman
    Wow, I never came across hotscripts.com before...

    http://www.hotscripts.com/PHP/Script...nes/index.html

    Thanks for that useful signature, drgroove. I still wouldn't mind any comments anyone might have - thanks again!
    Wow - well, cool, glad you found hotscripts then! Let us know which search engine you settle on...
  8. #5
  9. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2001
    Location
    Washington DC, USA
    Posts
    156
    Rep Power
    14
    Right now the front runner looks to be:

    http://www.digvid.info/isearch/home.php

    Short feature list:
    Spider engine written in PHP - there are no binaries to run on the server.
    Runs in PHP safe mode.
    Performs simple page match scoring and ranking.
    Can spider subdomains or multiple domains.
    Allows multiple spider entry points.
    Versatile page inclusion/exclusion, including robots.txt parsing with Google extensions.
    Parts of pages (e.g. menus) can be excluded from indexing.
    Can generate a site map automatically from the search index.
    Auto Spider feature ensures that your search index is kept up-to-date automatically, even if you cannot run "cron" jobs on your web server.

    The commercial fee is 35 bucks, which is well worth it for what it does. I'll let you guys know when I get it up and running later this week...


    -Cliff
  10. #6
  11. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2003
    Location
    Minnesota
    Posts
    19
    Rep Power
    0

    how bout this?


    I'm just having a slight problem with indexing all pages. I think it's a doctype problem, because it echoes back the whole site but doesn't put all pages into the database table. Decent code nonetheless. I'm planning on rating results based on whether the term matches the title, then, if not, the description, then the keywords, etc. Cool spider though, it just needs a little final tweak. Can someone help me out, please?
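
The ranking idea described in this post (title first, then description, then keywords) could be sketched roughly as follows. This is only an illustration, not the poster's code: `scorePage`, `rankPages`, the weights 3/2/1 and the sample rows are all made up here, and the array keys simply reuse the `fullsearch` column names from the poster's INSERT statement.

```php
<?php
// Hypothetical helper: tiered score for one indexed page.
// The weights 3/2/1 are arbitrary; only their order matters.
function scorePage(array $page, string $term): int
{
    if (stripos($page['page_title'], $term) !== false) {
        return 3; // best: term appears in the title
    }
    if (stripos($page['page_description'], $term) !== false) {
        return 2; // next: term appears in the meta description
    }
    if (stripos($page['page_keywords'], $term) !== false) {
        return 1; // last: term appears in the meta keywords
    }
    return 0; // no match in any of the three fields
}

// Hypothetical helper: drop non-matching pages, best score first.
function rankPages(array $pages, string $term): array
{
    $scored = [];
    foreach ($pages as $page) {
        $score = scorePage($page, $term);
        if ($score > 0) {
            $scored[] = ['score' => $score, 'page' => $page];
        }
    }
    usort($scored, function (array $a, array $b): int {
        return $b['score'] <=> $a['score'];
    });
    return array_column($scored, 'page');
}

// Made-up rows shaped like the fullsearch table.
$pages = [
    ['page_url' => 'c.htm', 'page_title' => 'Contact', 'page_description' => 'How to reach us', 'page_keywords' => 'email, search'],
    ['page_url' => 'd.htm', 'page_title' => 'Downloads', 'page_description' => 'Files', 'page_keywords' => 'zip'],
    ['page_url' => 'a.htm', 'page_title' => 'Site search', 'page_description' => 'Find pages', 'page_keywords' => 'find'],
    ['page_url' => 'b.htm', 'page_title' => 'Help', 'page_description' => 'Using the search box', 'page_keywords' => 'help'],
];

$rankedUrls = array_column(rankPages($pages, 'search'), 'page_url');
echo implode(', ', $rankedUrls), "\n";
```

Pages with equal scores come back in no guaranteed order before PHP 8.0 (where sorting became stable), so a second sort key such as the URL may be worth adding.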

    PHP Code:
    require_once('Connections/Connection.php');
    mysql_select_db($database_Connection, $Connection);
    //first opening of directory to extract filenames, keywords, and descriptions
    $dh = opendir("C:\mywebfolder");
    while (false !== ($filename = readdir($dh))) {
        if ((preg_match('/htm/', $filename)) || (preg_match('/sun(.*)php/', $filename))) {
            $handle = fopen($filename, "rb");
            $tags = get_meta_tags($filename);
            $description = $tags['description'];
            $keywords = $tags['keywords'];
            echo "$filename : $description : $keywords \n";
            $buffer = fread($handle, filesize($filename));
            //get the title
            preg_match('#<title>(.*)</title>#isU', $buffer, $match);
            $title = $match[1];
            echo "$title \n";
            //get the content
            preg_match('#<body(.*)</body>#isU', $buffer, $bodymatch);
            $bodytemp = $bodymatch[1];
            //start of function to strip punctuation and tags out
            $search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out javascript
                             "'<[\/\!]*?[^<>]*?>'si",           // Strip out html tags
                             "'([\r\n])[\s]+'",                 // Strip out white space
                             "'&(quot|#34);'i",                 // Replace html entities
                             "'&(amp|#38);'i",
                             "'&(lt|#60);'i",
                             "'&(gt|#62);'i",
                             "'&(nbsp|#160);'i",
                             "'&(iexcl|#161);'i",
                             "'&(cent|#162);'i",
                             "'&(pound|#163);'i",
                             "'&(copy|#169);'i",
                             "'&#(\d+);'e");                    // evaluate as php

            $replace = array ("",
                              "",
                              "\\1",
                              "\"",
                              "&",
                              "<",
                              ">",
                              " ",
                              chr(161),
                              chr(162),
                              chr(163),
                              chr(169),
                              "chr(\\1)");

            $body1 = preg_replace($search, $replace, $bodytemp);
            $body3 = str_replace('/', '', $body1);
            //end of function
            $body = strip_tags($body3);
            $content = stripslashes($body);
            echo "$content \n\n";
            //$result = mysql_query ("SELECT page_id FROM fullsearch WHERE page_url = $filename");
            //$row = mysql_fetch_assoc($result);
            //    if (mysql_num_rows($row) == "1"){
            //        $sqlquery = "UPDATE fullsearch set page_title = '$title', page_description = '$description', page_keywords = '$keywords', page_content = '$content'
            //        WHERE page_url = '$filename' ";
            //        $results = mysql_query($sqlquery);
            //}
            //else{
            $sqlquery = "INSERT INTO fullsearch (page_id, page_url, page_title, page_description, page_keywords, page_content)
                         VALUES ('','$filename', '$title', '$description','$keywords', '$content')";
            $results = mysql_query($sqlquery);
            //}
        }
        else{
        }
    }
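
One possible reason some pages never reach the table (a guess, not something the post confirms) is that the INSERT in the script above drops `$title`, `$description`, `$keywords` and `$content` straight into the SQL string, so any page whose text contains an apostrophe would produce a broken query even though the echo output looks fine. A minimal sketch of the usual remedy, a prepared statement, follows. Assumptions: PDO with the SQLite driver is available, an in-memory SQLite database stands in for the MySQL server so the example runs on its own, and the sample rows are invented.

```php
<?php
// Assumption: SQLite in memory replaces MySQL here; with MySQL only the
// DSN (and credentials) passed to the PDO constructor would differ.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Same column names as the fullsearch table used in the post.
$pdo->exec(
    'CREATE TABLE fullsearch (
        page_id INTEGER PRIMARY KEY AUTOINCREMENT,
        page_url TEXT, page_title TEXT, page_description TEXT,
        page_keywords TEXT, page_content TEXT)'
);

// Placeholders keep the values out of the SQL text entirely.
$insert = $pdo->prepare(
    'INSERT INTO fullsearch
        (page_url, page_title, page_description, page_keywords, page_content)
     VALUES (?, ?, ?, ?, ?)'
);

// Invented sample pages; the second contains apostrophes, which would end
// the quoted string early in a query built by string interpolation.
$pages = [
    ['index.htm', 'Home', 'Welcome page', 'home, welcome', 'Plain text only.'],
    ['about.htm', "Bob's page", 'About us', 'about', "It's got 'quotes' in it."],
];

foreach ($pages as $page) {
    $insert->execute($page);
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM fullsearch')->fetchColumn();
echo "Rows stored: $count\n";
```

With error mode set to exceptions, a failing insert stops the script with a message instead of being skipped silently, which also makes it easier to see which file caused the problem.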

