#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2009
    Posts
    21
    Rep Power
    0

    MYSQL with MATCH AGAINST() to return set of relevant substrings, not the entire row


    Help please, I desperately need some help with this.

    I am doing a full text search in MySql, someting like this:

    SELECT postcontent,
    MATCH(posttitle, postcontent) AGAINST ('set of keywords' IN BOOLEAN MODE) as relevance
    FROM table
    WHERE MATCH(posttitle, postcontent) AGAINST ('set of keywords' IN BOOLEAN MODE)
    ORDER BY relevance DESC;

    I need to get not the entire postcontent result but a set of substrings that match the keywords. It should either be a stored function or do it inline the query.

    For example the result I'm looking for a function to do something like this :
    lets say the keywords are: ' used "jaguar xjr" '.

    Lets say the existing query(above) returns the following content: "blah blah on sale for great price used cars blah blah....special offer including a jaguar xjr which blah blah".

    The function should cut this string into substrings like:
    'great price used cars' 'offer including a jaguar xjr'


    Any help would be really appreciated!Thanks.Steve
  2. #2
  3. data transfer technician
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2006
    Location
    Halifax, NS
    Posts
    564
    Rep Power
    134
    This is a problem I'd approach in the application layer rather than the database layer. Essentially, for each keyword or keyphrase, you'd construct a regex that selects X characters on either side of the match to the nearest word border.

    The reason I suggest doing it this way is that if your match hits on more than one keyword for a particular record, you'd either end up with multiple rows (one for each hit), or you'd have group the rows. Either way I think it might make it difficult to have useful relevance scores.
    planning takes time... not planning takes more time.
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2009
    Posts
    21
    Rep Power
    0
    Originally Posted by dnagirl
    This is a problem I'd approach in the application layer rather than the database layer. Essentially, for each keyword or keyphrase, you'd construct a regex that selects X characters on either side of the match to the nearest word border.

    The reason I suggest doing it this way is that if your match hits on more than one keyword for a particular record, you'd either end up with multiple rows (one for each hit), or you'd have group the rows. Either way I think it might make it difficult to have useful relevance scores.
    Thanks DnaGirl, I cannot figure how to construct such a regex,please can you give an example?Thanks
  6. #4
  7. data transfer technician
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2006
    Location
    Halifax, NS
    Posts
    564
    Rep Power
    134
    Code:
    /\b.{0,15}\b$keyword\b.{0,15}\b/
    Have a look here for some regex tutorials
    planning takes time... not planning takes more time.

IMN logo majestic logo threadwatch logo seochat tools logo