#1
  1. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2010
    Posts
    162
    Rep Power
    63

    Mod_log_sql - Handling all that data?!


    Evening all!

    I recently started using mod_log_sql, and I'm already worried by the rate at which my database is being populated. At the current rate, it adds about 1,000,000 rows every 11 days. Just to let you know, this is mostly bot traffic, haha. I know MySQL has no problem handling millions of rows, even billions in some cases; parsing them is a different story.

    My question is, how does everyone else handle the large amount of data put out by this?

    A better question may be: is there a good way to tell it to ignore bots, or even anything with a referrer match of *bot*? I already have it ignoring image files, JavaScript files, and CSS files. I'm only interested in human interaction, such as a page view counter for specific pages, or search query analysis (for populating a search bar auto-suggest, and populating a 'most used queries' area).

    Thoughts? Experience?

    Thanks!
  2. #2
  3. Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,316
    Rep Power
    7171
    There isn't a foolproof method of identifying a specific request as originating from a bot. Some bots, mostly those of the major search engines, send user agent strings that identify them as bots, but since there isn't a standard you would probably need a separate filter rule for each one. I'm not sure whether it is possible to filter logging based on user agent.
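To illustrate the kind of per-bot filter rule described above, here is a minimal Python sketch that flags a user agent string when it contains a token commonly used by self-identifying crawlers. The token list and the name `looks_like_bot` are assumptions made for illustration; they are not part of mod_log_sql, and a bot that sends a browser-like user agent will pass straight through.

```python
import re

# Hypothetical, deliberately short token list; real crawlers use many more
# names, and nothing forces a bot to identify itself at all.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of the crawler tokens."""
    return BOT_PATTERN.search(user_agent) is not None
```

A check like this only catches bots that announce themselves, which is why it cannot be treated as foolproof.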

    I'm only interested in human interaction, such as a page view counter for specific pages, or search query analysis (for populating a search bar auto-suggest, and populating a 'most used queries' area)
    There are generally better ways of doing this than using MySQL for your Apache logs. You would probably want to implement something at your application layer, or use a software package that analyses the Apache log files after they've been written and then strips out and stores only the important data in a database.
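As a rough sketch of the second approach (process the log files after they've been written and keep only what matters), the Python below reads lines in Apache's "combined" log format, skips requests whose user agent looks like a crawler, and stores per-page view counts. Everything named here is an assumption for illustration: the `page_views` table, the function names, the crawler token list, and the use of SQLite in place of MySQL so the example stays self-contained.

```python
import re
import sqlite3

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)
# Hypothetical, incomplete list of crawler tokens.
BOT_RE = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def count_human_views(lines):
    """Count successful GET requests per path, skipping likely crawlers."""
    counts = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a combined-format line
        if m["method"] != "GET" or m["status"] != "200":
            continue
        if BOT_RE.search(m["agent"]):
            continue
        path = m["path"].split("?", 1)[0]  # drop the query string
        counts[path] = counts.get(path, 0) + 1
    return counts


def store_counts(conn, counts):
    """Add the counts to a small summary table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views "
        "(path TEXT PRIMARY KEY, views INTEGER NOT NULL)"
    )
    for path, views in counts.items():
        conn.execute(
            "INSERT OR IGNORE INTO page_views (path, views) VALUES (?, 0)",
            (path,),
        )
        conn.execute(
            "UPDATE page_views SET views = views + ? WHERE path = ?",
            (views, path),
        )
    conn.commit()
```

The summary table holds one row per page rather than one row per request, so it grows with the number of distinct pages instead of with traffic.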
