November 5th, 2013, 03:36 PM
Simple search engine for a help manual
I have the table which represents a help manual for an application:
- id (int)
- parent_id (int) (references id)
- topic (varchar)
- keywords (text?)
- content (text)
The user will provide one or more search words, and I would like to return the matches in the order based on the following criteria.
- Best match on keywords
- On tie, best match on topic
- On tie, best match on content
It should not be case sensitive. Wildcard should be used to allow a search word of “auto” to match “automotive”, however, if fairly easy, I would like the full word to have a better rank than just the start of a word. Best match for keywords and topics is based on the number of user provided search words that are also included in the keyword and topic columns, respectively, and if fairly easily, I would like phrases to have more weight. Best match for content will be more based on the number of times the search word is included in the content.
Is it possible to do such a query? If so, how? If not, how would you recommend doing in multiple queries with application logic?
MATCH/AGAIST should surely be used for checking search words against content. Should it also be used for searching against keywords and title, or should I use LIKE?
Also, should I just add keywords space separated in the keywords column, or put them in a separate table and JOIN them?
November 6th, 2013, 01:39 AM
Usually when you need full-text search over your database, you need not implement it from scratch. It is because you will find it necessary to implement significant number of features, complex queries, indexing etc.
Instead there exist some search engines which you can use. Some of them are free, for example Sphinx - it would take some time to understand how to use it - but if you need search over your manuals - it is most preferable approach.
November 6th, 2013, 09:38 PM
For probably silly reasons, I still would like to do the query directly. Any thoughts how to do so?
November 7th, 2013, 03:33 AM
Well. You obviously can search for a single exact substring with "LIKE". However, if you have any reasonable number of pages, it would not be effective.
So you need to create basic search engine yourself. You'll need to start with indexing your data - collecting all words in a single list and appending to each the IDs of documents where these words exist.
Then if you are given a query of two words for example, you get the list of documents for both of them and calculate intersection or union of these document sets depending on aggregation function.
You can also use inexact matching with words in a list or search for word fragments in this index - it will work anyway.
The core idea is that you are not searching through your documents, but instead use prepared index. Of course index should be updated each time when you add a new document.
Shortly speaking that is how all search engines work - either web-scale ones or small in-app variants.
Comments on this post