#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Review posts when created: Bayesian filter?


    Hey peeps,

    I've been working on a site that allows users to post data, and I was trying to find a good way to set up a post review system that will flag inappropriate posts. I've had a look around and was thinking of trying a Bayesian filter. Is that more for spam, or would it also serve as a good filter for bad posts? My understanding of these filters is that they get smarter over time as you flag more words. Can anyone recommend a good implementation, or even a better idea?

    What I hope will happen is this: a user submits a post and the filter runs. If the post doesn't pass, it gets flagged for manual review. If it fails the manual review, its words are added to the database etc. for future posts.
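
    That process could be sketched roughly like this. It is a minimal, hypothetical illustration of learning a word list from rejected posts (the class and method names are made up, and the tokenizer is deliberately naive), not a recommended design:

```python
import re
from dataclasses import dataclass, field


def tokenize(text: str) -> set[str]:
    """Lower-case word tokens; deliberately naive."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


@dataclass
class ReviewPipeline:
    """Hypothetical flow: filter -> manual review -> learn from rejections."""
    bad_words: set[str] = field(default_factory=set)
    review_queue: list[str] = field(default_factory=list)
    published: list[str] = field(default_factory=list)

    def submit(self, post: str) -> str:
        # The filter: any known bad word sends the post to manual review.
        if tokenize(post) & self.bad_words:
            self.review_queue.append(post)
            return "flagged"
        self.published.append(post)
        return "published"

    def review(self, post: str, approved: bool) -> None:
        # A moderator decides; a rejected post adds all of its words to the list.
        self.review_queue.remove(post)
        if approved:
            self.published.append(post)
        else:
            self.bad_words |= tokenize(post)
```

    For example, starting from `ReviewPipeline(bad_words={"viagra"})`, rejecting "buy cheap viagra now" also adds harmless words like "buy" and "now", so the next innocent post containing them ("I will buy a bike now") is flagged as well. A probabilistic filter tries to avoid that kind of false positive by also counting how often each word appears in good posts.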

    Is this normally how this process works with reviewing systems?

    Thanks in advance.
#2
    Lost in code
    Devshed Supreme Being (6500+ posts)

    I've had a look around and was thinking of trying to use a bayesian filter? Is this for more like spam or would it serve as a good filter for bad posts.
    A Bayesian filter doesn't necessarily look for spam; it just looks for "similar" messages. So if your bad posts are similar to one another and not too similar to good posts, a Bayesian filter may work well for you; otherwise it probably won't.
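
    One common way to make "similar" precise is the naive Bayes model that many Bayesian spam filters are built on (other variants exist). Assuming, as that model does, that words occur independently given the class, a post with words w_1, ..., w_n is scored for each class c (bad or good), with each word probability estimated from training counts using add-one (Laplace) smoothing over the vocabulary V:

```latex
P(c \mid w_1, \dots, w_n) \propto P(c) \prod_{i=1}^{n} P(w_i \mid c),
\qquad
P(w \mid c) = \frac{\operatorname{count}(w, c) + 1}{\sum_{v \in V} \operatorname{count}(v, c) + |V|}
```

    The post is flagged when the "bad" score is higher. If bad and good posts use much the same words, the two scores end up close and the filter has little to go on.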

    However, that depends on your definition of good and bad. For example, if your application consisted of users asking math problems, like 5+3, and other users submitting answers in the form of numbers, like 8, then a Bayesian filter would be utterly useless for identifying incorrect answers. You could, however, use it to filter out answers that were not numbers.
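
    To make the math example concrete: whether an answer is a number is a property of its form, which a text filter can pick up on, while whether it is correct depends on the question. A minimal sketch with hypothetical function names:

```python
import re


def looks_like_number(answer: str) -> bool:
    # Form check: the kind of pattern a content filter could learn.
    return re.fullmatch(r"\s*-?\d+(\.\d+)?\s*", answer) is not None


def is_correct(a: int, b: int, answer: str) -> bool:
    # Correctness check: needs the question (a + b), not just the answer text.
    return looks_like_number(answer) and float(answer) == a + b
```

    Both "8" and "9" look like numbers, but only `is_correct(5, 3, "8")` is true; nothing in the text of "9" marks it as wrong.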

    My understanding of these filters is they get smarter over time as you flag more words.
    A Bayesian filter does "learn" over time, but you flag entire messages, not individual words.
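
    To illustrate, here is a minimal multinomial naive Bayes sketch (one of several Bayesian filter variants; the class and method names are made up for this example). Note that `train()` takes a whole message and a label, and the per-word statistics follow from the counts; nobody flags individual words:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; deliberately simple."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("good", "bad")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, message: str, label: str) -> None:
        # A reviewer labels the whole message; every token in it is counted.
        self.word_counts[label].update(tokenize(message))
        self.message_counts[label] += 1

    def log_score(self, message: str, label: str) -> float:
        tokens = tokenize(message)
        # Vocabulary: every word seen in training plus the words in this message.
        vocab = set(tokens)
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        # Smoothed log prior, so an untrained class does not produce log(0).
        score = math.log(
            (self.message_counts[label] + 1) / (total_messages + len(self.LABELS))
        )
        for token in tokens:
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def is_bad(self, message: str) -> bool:
        return self.log_score(message, "bad") > self.log_score(message, "good")
```

    In the flow from the first post, the moderator's approve/reject decision would supply the label passed to `train()`.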

    Is this normally how this process works with reviewing systems?
    No, reviewing systems are usually manual, or at least partially manual, because the error rate of most automated systems is too high.
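
    One way to keep review partially manual is to let the filter act on its own only when its score is confident and send everything in between to a moderator. A minimal sketch; the threshold values are arbitrary assumptions that would need tuning, and `bad_probability` stands in for whatever score your filter produces:

```python
from enum import Enum


class Decision(Enum):
    PUBLISH = "publish"
    MANUAL_REVIEW = "manual_review"
    REJECT = "reject"


# Hypothetical thresholds; tune them against posts your moderators have labelled.
PUBLISH_BELOW = 0.2
REJECT_ABOVE = 0.95


def route(bad_probability: float) -> Decision:
    """Auto-handle only confident scores; everything in between goes to a human."""
    if not 0.0 <= bad_probability <= 1.0:
        raise ValueError("bad_probability must be in [0, 1]")
    if bad_probability < PUBLISH_BELOW:
        return Decision.PUBLISH
    if bad_probability > REJECT_ABOVE:
        return Decision.REJECT
    return Decision.MANUAL_REVIEW
```

    Widening the middle band sends more posts to moderators but leaves fewer decisions to the filter's error rate.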
#3
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Originally Posted by E-Oreo
    No, reviewing systems are usually manual, or at least partially manual, because the error rate of most automated systems is too high.
    Thanks, E-Oreo. I was thinking of setting up the filter so that it puts content up for manual review if it doesn't pass. Thanks for the input. Would you just flag posts using a bad-word list?
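
    A plain bad-word list is easy to build; the main pitfall is substring matching, which flags innocent words that merely contain a listed one (the so-called Scunthorpe problem). A minimal sketch that matches whole words only, case-insensitively; the word list is a placeholder:

```python
import re

# Placeholder list; a real one would be maintained by your moderators.
BAD_WORDS = {"darn", "heck"}

# Whole-word, case-insensitive pattern built from the list.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in sorted(BAD_WORDS)) + r")\b",
    re.IGNORECASE,
)


def find_bad_words(post: str) -> list[str]:
    """Listed words found in the post, lower-cased, in order of appearance."""
    return [match.group(0).lower() for match in _PATTERN.finditer(post)]


def needs_review(post: str) -> bool:
    return bool(find_bad_words(post))
```

    Word lists are easy to evade with misspellings or spacing, so they tend to work better as one signal feeding the manual review queue than as the only check.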
#4
    --
    Devshed Expert (3500 - 3999 posts)

    Hi,

    Bayesian filters aren't exactly state of the art, even though they're still heavily used. You should look into more advanced classifiers like CRM114, which uses a Markovian approach and yields very good results (> 99% accuracy) after a short training period. That, together with a blacklist of "bad words", should be very reliable.
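
    The core idea behind a Markovian approach like this is to score short multi-word phrases rather than isolated words, so word order carries information. The sketch below only generates such phrase features and is loosely in the spirit of CRM114's phrase features; it is not CRM114's actual feature scheme or weighting, and the window size and `<skip>` marker are assumptions. The resulting features could be counted by a classifier in place of single words:

```python
import itertools
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_features(text: str, window: int = 3) -> list[str]:
    """Skip-phrases ending at each token: every subset of the preceding
    (window - 1) tokens, with '<skip>' marking left-out positions."""
    tokens = tokenize(text)
    features = []
    for i, current in enumerate(tokens):
        history = tokens[max(0, i - window + 1):i]
        for mask in itertools.product((False, True), repeat=len(history)):
            parts = [tok if keep else "<skip>" for tok, keep in zip(history, mask)]
            # Drop leading skips so a lone word is always just the word itself.
            while parts and parts[0] == "<skip>":
                parts.pop(0)
            features.append(" ".join(parts + [current]))
    return features
```

    The number of features per word grows as 2^(window - 1), so larger windows get expensive quickly.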