    #1
Guest

    Pretty simple but ...


I was toying with what you have for one of my sites (http://www.puregaming.net), since the entire site is generated from several MySQL tables. The problem is that it's several MySQL tables ;) Any ideas for a broader, database-wide search? For example, if I have a "review" table for game reviews and a table for game "previews", the queries would be different because each table has different column names, and so on.

Any input would be awesome :)

Eric
#2
Guest

    Re: Pretty simple but ...


I have that too. For instance, at KnowPost I have one table for questions and another for answers, but I index them both. I put the words from both into the same index table and add another column. So instead of:

word | id#

it's:

word | id# | q (if it came from a question)
word | id# | a (if it came from an answer)
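A minimal sketch of that combined index in PHP/MySQL, assuming hypothetical names (a word_index table with a source column); the actual KnowPost schema isn't given in the thread:

<?php
// A minimal sketch, assuming hypothetical names: a combined index table
// word_index(word, id, source), where source is 'q' for questions and
// 'a' for answers.
mysql_connect("localhost", "user", "password");
mysql_select_db("knowpost");

mysql_query("CREATE TABLE word_index (
    word   VARCHAR(50) NOT NULL,
    id     INT NOT NULL,
    source CHAR(1) NOT NULL,
    INDEX (word)
)");

// One query searches questions and answers at once; the source column
// tells us which table the full record lives in.
$result = mysql_query("SELECT id, source FROM word_index WHERE word = 'mysql'");
while ($row = mysql_fetch_row($result)) {
    echo "id $row[0] came from " . ($row[1] == "q" ? "questions" : "answers") . "<br>\n";
}
?>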

#3
Guest

    Re: Pretty simple but ...


I recently developed a database project consisting of articles that the client posts through an online administration tool. The article titles, links, etc. are stored in a database, but rather than try to store arbitrary amounts of text in a database (inefficiently, I suspect), I store a unique article ID number that points to a raw .txt file in a protected directory on the site. This pointer is simply a 12-digit unique ID.

When an article gets displayed (i.e. articles.html?id=12345), the relevant information for article 12345 is pulled up and the file named "12345" (no need for suffixes) in the protected directory gets dumped onto the screen.

I still haven't delved completely into ht://dig, but if your articles/stories are displayed using a simple GET method through a display script (a la the articles.html?id=xx construct mentioned above), you could write a script that generates links to all of the articles in the database, or to articles within a subsection of the database. You could then point ht://dig at that (hidden, if need be) generated page and it would follow the links.

If your articles are pulled up in a different fashion, however, this may complicate things and require further parsing.

Just a few thoughts...

-- Chris
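A minimal sketch of such a link-generator page, assuming a hypothetical articles table with id and title columns; point ht://dig at this page and it will follow the links:

<?php
// A sketch of a link-generator page for ht://dig to crawl, assuming a
// hypothetical "articles" table with id and title columns.
mysql_connect("localhost", "user", "password");
mysql_select_db("site");

$result = mysql_query("SELECT id, title FROM articles");
while ($row = mysql_fetch_row($result)) {
    // ht://dig follows each link and indexes the rendered article page.
    echo "<a href=\"articles.html?id=$row[0]\">$row[1]</a><br>\n";
}
?>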
#4
Guest

    Performance problem - and solution


When using the described approach with plenty of data, the number of records in the table quickly becomes huge: one record for every (non-noise) word in every article...

To index 28000 articles with approximately 64 relevant words in each, the word table would contain 1.8 million records, and even MySQL would take some time to search this table (despite indexing).

Many records contain the same word, and to get rid of these duplicates, another table is added. In short, my solution contains:

- one word list table with every unique word and a unique id for each word

- one word-id-to-article-ids table, where the article IDs are 32-bit values stored as a binary array in a blob (not suitable for PHP, but works very well with a small search application written in C and run from PHP3 using system)

The result is a very quick search function, and the expected time complexity as the number of articles grows is O(log n) instead of O(n) (linear).

On a measly 140 MHz UltraSPARC the search completes in 0.05 seconds with a word list containing 66551 unique words from 29648 telegrams. I discard overly common words, but even so there would have been 1,898,959 records with the solution described in the article.

My search program also performs wildcard searches, multiple search words, and exclusions. With a separate word list and article id table, article titles can be searched.

I guess systems like AltaVista use even more sophisticated algorithms, but the critical point is to switch from O(n) to O(log n), and I've succeeded in doing that with this still rather simple search function.

    Regards,
    Erik L.
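A rough sketch of that two-table layout, assuming hypothetical table names and using PHP's unpack() to decode the 32-bit IDs instead of the poster's separate C program (which isn't shown in the thread):

<?php
// A sketch of the two-table layout, with hypothetical names:
//   words(word VARCHAR(50), word_id INT)           -- one row per unique word
//   word_articles(word_id INT, article_ids BLOB)   -- packed 32-bit article IDs
// Unlike the poster's C helper, this unpacks the IDs on the PHP side.
mysql_connect("localhost", "user", "password");
mysql_select_db("archive");

$result = mysql_query("SELECT word_id FROM words WHERE word = 'mysql'");
if ($row = mysql_fetch_row($result)) {
    $result = mysql_query("SELECT article_ids FROM word_articles WHERE word_id = $row[0]");
    $blob = mysql_fetch_row($result);
    // Each article ID is a 32-bit big-endian integer ('N' format).
    $ids = unpack("N*", $blob[0]);
    foreach ($ids as $id) {
        echo "article $id matches<br>\n";
    }
}
?>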
#5
Guest

    Re: Pretty simple but ...


That is basically what I did, and ht://dig works well, but it's still something I run from crontab nightly (hence the search index is only updated nightly) rather than being truly dynamic, generated from the content as it changes. I have writers who add things via my backend at different points in the day. It would be great to set it up so that, with one SELECT, I could get all matches from across the tables, rather than doing a SELECT and matching with % for each table; and to always have previews listed first, then reviews, then hardware, etc., with some kind of ordering of the matches.

    Eric
#6
Guest

    Re: Performance problem - and solution


    I agree that your approach is the answer for large databases with huge numbers of keywords repeated over and over in different records. Can you give us a database and code example for your solution? Thanks
#7
Guest

    Re: Performance problem - and solution


Hi,

About a year ago, I ran across a similar problem indexing a keyword search across about 2 GB of text data. I decided it would be a good idea to build an index of the unique, important words in the body of text, and I ended up with a very similar solution. At the time I was using mSQL (miniSQL) with its built-in server-side scripting engine, Lite.

The table I ended up with for indexing looked more like this, with two fields:

word - varchar(50)
memo

The word was essentially the index (I didn't see how assigning a UID to each unique, important word would make a difference), and the memo field contained a comma-delimited list of the linked records in which that word appeared. When I did a query, I supported an ANY or ALL radio button on the HTML form. This caused my script on the server to handle the search results in two different ways:

1> ANY - just query for the memo fields attached to each word and combine the lists, stripping out duplicate entries (though I suppose I could have counted duplicates and used that as a weighting for relevance).

2> ALL - loop through the lists and return only links to records whose ID appears in all of the memo fields.

It seemed to work pretty well, was fairly fast, and it wasn't difficult to add something new to the index: parse the article word by word, query the index table to see if the current word is in it; if it is, append the ID of the article (or record) to the memo field - else create a new record and start the memo field with that one value.

- PW
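A sketch of the ANY/ALL handling in PHP, assuming a hypothetical word_memo table with word and memo columns; the original was written in mSQL's Lite scripting language, which isn't shown here:

<?php
// A sketch of the ANY/ALL handling, assuming a hypothetical word_memo table
// (word, memo) where memo is the comma-delimited list of record IDs.
// $words is the array of search terms, $mode is "any" or "all".
function search_ids($words, $mode) {
    $lists = array();
    foreach ($words as $word) {
        $word = addslashes($word);
        $result = mysql_query("SELECT memo FROM word_memo WHERE word = '$word'");
        $row = mysql_fetch_row($result);
        $lists[] = $row ? explode(",", $row[0]) : array();
    }
    if ($mode == "any") {
        // Union of all lists with duplicates stripped.
        $ids = array();
        foreach ($lists as $list) {
            $ids = array_merge($ids, $list);
        }
        return array_unique($ids);
    }
    // ALL: keep only the IDs present in every word's list.
    $ids = array_shift($lists);
    foreach ($lists as $list) {
        $ids = array_intersect($ids, $list);
    }
    return $ids;
}
?>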
#8
Guest

    Re: Performance problem - and solution


Sam, I've implemented the technique I described as two C programs for a customer. I've been thinking of making it a bit more general and then "productifying" it, probably GPLed or something, but I haven't got around to that yet...
#9
Guest

    Re: Performance problem - and solution


Hi Porter, your scheme sounds almost identical. The reason I use separate tables for word+uid and uid+blob, instead of a single table for word+blob, might be due to a misconception: I thought it would take longer to locate matching words in a large table that contains blob fields. With intelligent storage of blobs (like a pointer into a separate blob-data file), a search that doesn't touch the blob would be quick anyway.

However, MySQL doesn't store blobs in a separate file, so maybe some time is wasted seeking past plenty of blob data while the disk has to spin further.

Of course, just looking up a word is not really a search, since the index is used (and each index is stored in a separate file, thereby avoiding any blob handling), but I also allow words with wildcards using LIKE searches on the word list, and there are cases where the index can't be used. A small word list table is good in those cases, but otherwise a single-table solution like yours would be simpler and give the same performance.

    Regards,
    Erik L.
#10
Guest

    error in ereg_replace ?cannot add ?



$querywords = ereg_replace("?","",$querywords);
$querywords = ereg_replace("(","",$querywords);

The output is as below:

    Warning: REG_BADRPT in /var/apache/www/htdocs/khng/makesearch.php on line 43

    Warning: REG_EPAREN in /var/apache/www/htdocs/khng/makesearch.php on line 44

I am trying to run the code as it is given, but the errors above appeared.
Is this caused by a server setting, or something else?
#11
Guest

    Array of qids...


I think I'm going to implement something like you've set forth in this article for my discussion board product.

As you know, discussion boards are notoriously text-heavy, for good reason! Instead of a unique row for each word/qid pair, I think I may have one unique row for each unique word, and pair each unique word with a comma- or pipe-delimited list of matching qids.

I can search for the word(s) and have the engine return the lists, which I'll explode into arrays. Then I can compare the arrays and give higher scores to qids that match more than one word of a phrase.

Would this work? Obviously handling a search engine this way is no good for common phrases, and it doesn't weight for higher frequency of a word within the same post either.

But it should be quick and not grow too fast... I hope. :)

I don't know much about MySQL indexing - would my schema not work with indexing? I'm thinking it would require two blob or text fields - can they be indexed?
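A sketch of that scoring idea, assuming the per-word qid lists have already been fetched and exploded into arrays (variable and function names are hypothetical):

<?php
// A sketch of the scoring idea: $lists holds one array of qids per matched
// search word (e.g. from explode(",", ...)); a qid that appears in more of
// the lists gets a higher score.
function score_qids($lists) {
    $scores = array();
    foreach ($lists as $qids) {
        // array_unique so a qid scores at most once per word.
        foreach (array_unique($qids) as $qid) {
            if (isset($scores[$qid])) {
                $scores[$qid]++;
            } else {
                $scores[$qid] = 1;
            }
        }
    }
    arsort($scores);   // highest-scoring qids first
    return $scores;
}
?>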
#12
Guest

    Re: error in ereg_replace ?cannot add ?


Hi,

I don't think so; I have the same problem.
Does anyone know the answer?

Regards

Bartek 'Milar' Milewski
#13
Guest

    Re: error in ereg_replace ?cannot add ?


Try using something like the following:

$querywords = ereg_replace("[\?\(\)\.\^\$]","",$querywords);

This replaces the lines in the example that check for the ?, (, ), ., ^, and $.

The problem is that many characters are special characters in regular expression syntax (namely the ?, parentheses ( ), period ., caret ^, dollar $, and probably a few others). These characters must be "escaped" by preceding them with backslashes so they are interpreted literally and not as special characters.

Wrapping the whole thing in brackets (the [ and ]) is a standard way to write a regular expression so that any of the listed characters will match. An example:

$querywords = ereg_replace("[wxyz]", "A", $querywords);

will replace any of the characters w, x, y, or z with a capital A in the string $querywords.
#14
Guest

    Re: Performance problem - and solution


Hello!

I've just tried both approaches with PHP3 and MySQL.
First I made a word table:
(word varchar(50), wordid int)
and a reference table:
(wordid, reference)

But once the reference table grew past 1 million entries, the search slowed down too much, so I combined the two tables into one:
(word varchar(50), referencelist longtext)

This sped up the search nearly 4 times.

So sorting and decoding the reference list with PHP3 gives much better performance than doing it with an SQL query against a separate MySQL table.

    ciao
    stefan
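A sketch of that single-table lookup, assuming a hypothetical word_refs table and a comma-delimited reference list (details the post doesn't specify); one indexed lookup per word, with the decoding and sorting done on the PHP side:

<?php
// A sketch of the single-table lookup: one indexed query instead of a join
// against a million-row reference table. Table and column names are hypothetical.
$word = addslashes($searchword);
$result = mysql_query("SELECT referencelist FROM word_refs WHERE word = '$word'");
if ($row = mysql_fetch_row($result)) {
    $references = explode(",", $row[0]);
    sort($references);
    foreach ($references as $ref) {
        echo "match: $ref<br>\n";
    }
}
?>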
#15
Guest

    Counting search




How can I write a MySQL query that counts occurrences of a word in a string?

For example, if I have the string "guru guru guru guru" and I want to count how many times "guru" appears in it, how can I do this?

I already tried the function COUNT(*), but it only gives either 0 or 1.

I would love any help with this!