#1
  1. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533

    Simple search engine for a help manual


    I have a help manual for a site. It is based a single table called "topics" with contains maybe 50 records (which are pages) and has the following columns:
    • id (int)
    • topics_id (int) (references id)
    • name (varchar)
    • content (text)

    So for a given topic, I would display on the page the following:
    • Name of the topic
    • A bread crumb of the topic's parents (i.e. root->primaryTopic->subTopic)
    • The actual help content typically with links to other help pages
    • A list of related subtopics

    I now wish to add a search window so the users may find relevant pages, and have implemented the following:
    Code:
    SELECT * FROM topics WHERE MATCH(content, name) AGAINST(? IN BOOLEAN MODE)
    While it works, it definitely can be improved primary for the following:
    • No ranking for relevance. Maybe the number or matches can be used, or maybe I should add a "keyword" column?
    • Doesn't handle related words. For instance, searching for "dog" doesn't match "dogs".

    Now, I am not trying to invent Google, but need to do better. Any suggestions, strategies, advice, etc, etc?
  2. #2
  3. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,846
    Rep Power
    6351
    IN BOOLEAN MODE removes the sorting. You need to add it back in, per the manual:
    Code:
    SELECT *, MATCH(content, name) AGAINST(? IN BOOLEAN MODE) AS `score` FROM topics WHERE MATCH(content, name) AGAINST(? IN BOOLEAN MODE) ORDER BY score
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  4. #3
  5. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Thanks Dan,

    I don't know why I was using IN BOOLEAN MODE in the first place.

    Any other ideas?

    What about the keyword column? Will work the same way, but I can specifically list words I want.

    Also, do you think taking account internal links from one page to another will be too complicated? I can create a CRON job which will crawl my little site daily and create some sort of index.

    Lastly, how about dealing with singular and plural? For instance, "Kitten" and "Kittens" are totally different. Note that "Dog" doesn't even registrar as the default minimum characters is 4.
  6. #4
  7. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,846
    Rep Power
    6351
    Any other ideas about...what?

    Singulars and plurals are harder, especially irregular plurals. You can fix the normal plurals by putting an asterisk on the end of every word, but "cactus -> cacti" can't be fixed without implementing a stemming algorithm
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Originally Posted by ManiacDan
    Any other ideas about...what?
    Making a reasonably good and reasonably simple search engine.

    Good idea about fixing normal plurals with an asterisk. Before doing so, I probably should crop any "s"'s off the search word, no? Unless the word is only 4 characters which probably shouldn't be cropped. I feel that the additional complexity for accounting for cacti and moose out way the benefits.

    Any other ideas like your asterisk idea?

    Thank you
  10. #6
  11. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,846
    Rep Power
    6351
    There are plenty of words which legitimately end in S without being plurals. You used one in your post: Unless. Others just on this page: Miscellaneous, cactus, removes, contains, matches, etc.

    The reasonably good search engine I made did the following:
    1) Added an asterisk after every word
    2) Added a plus before every word
    3) Allowed for groupings using quotes and parens (mysql now does this automatically)
    4) For each separate group of terms in the query, also return a score for that specific term, then suggest alternate searches by removing words (like Ebay does) which don't match anything,
    5) use a separate system to suggest alternate spellings which do return results, especially since there were a lot of proper nouns in my database.

    My very first devshed threads were about these problems.

    Comments on this post

    • NotionCommotion agrees
    • c-cat agrees
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Perfect. Eactly the kind of ideas I was looking for. If anyone else has other ideas, please pipe in.

    Originally Posted by ManiacDan
    My very first devshed threads were about these problems.
    I took a look. Ahh, the early years of the world renown ManiacDan
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Sep 2013
    Posts
    11
    Rep Power
    0
    I may be way off base here, but what about throwing in the old simple method of something like this...

    Code:
    $type_array = $_SESSION['type'];
    $type_implode = implode("%' OR events.EVTYPE LIKE '%",$type_array);
    $type = "(events.EVTYPE LIKE '%".$type_implode."%')";
    
    $cnty = "events.EVCNTY LIKE '%".$_SESSION['cnty']."%'";
    
    $city_array = $_SESSION['city'];
    $city_implode = implode("%' OR events.EVCITY LIKE '%",$city_array);
    $city = "(events.EVCITY LIKE '%".$city_implode."%')";
    
    $sql1 = "SELECT 
    events.D,events.E,events.V,events.start,events.end,events.evdesc,
    e_profile.ent,e_profile.eloc,
    v_profile.ven,v_profile.vloc
    FROM events,e_profile,v_profile
    WHERE events.E = e_profile.E
    AND events.V = v_profile.V
    AND $cnty
    AND $city
    AND $type
    ORDER BY start
    LIMIT 0,1";
    ...for your variations on key words.
    I realize you'll end up with results pertaining to things like dogma and catsup, but it depends on how close to perfect you want to get. It's slightly less complicated.
    I really like ManiacDan's #4!
  16. #9
  17. Sarcky
    Devshed Supreme Being (6500+ posts)

    Join Date
    Oct 2006
    Location
    Pennsylvania, USA
    Posts
    10,846
    Rep Power
    6351
    Because LIKE is orders of magnitude slower than MATCH AGAINST. It's seriously, painfully slow.
    HEY! YOU! Read the New User Guide and Forum Rules

    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin

    "The greatest tragedy of this changing society is that people who never knew what it was like before will simply assume that this is the way things are supposed to be." -2600 Magazine, Fall 2002

    Think we're being rude? Maybe you asked a bad question or you're a Help Vampire. Trying to argue intelligently? Please read this.

IMN logo majestic logo threadwatch logo seochat tools logo