July 30th, 2012, 12:55 AM
Mod_log_sql - Handling all that data?!
I recently started using mod_log_sql, and I'm already worried by the rate at which my database is being populated. At the current rate it will add about 1,000,000 rows every 11 days (roughly 91,000 rows a day), and most of that is bot traffic. I know MySQL has no problem handling millions of rows, even billions in some cases; parsing all of it is a different story.
My question is: how does everyone else handle the large amount of data this module puts out?
August 1st, 2012, 12:00 AM
There isn't a foolproof method of identifying a specific request as originating from a bot. Some bots, mostly those of the major search engines, send user-agent strings that identify them as bots, but there isn't a standard, so you would probably need a separate filter rule for each one. I'm not sure whether it is possible to filter logging based on user agent.
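To show what per-bot rules could look like, here is a minimal Python sketch of matching on user-agent strings. The marker list and the function name is_known_bot are my own assumptions for illustration; the list is nowhere near complete, and it only catches bots that announce themselves.

```python
# Hypothetical, incomplete list of substrings that some well-known crawlers
# include in their user-agent strings. Each bot needs its own entry because
# there is no standard format.
BOT_MARKERS = ("googlebot", "bingbot", "slurp", "baiduspider", "yandexbot")


def is_known_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of the known markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


if __name__ == "__main__":
    print(is_known_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(is_known_bot("Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0"))
```

A bot that sends a browser-like user agent passes straight through a check like this, which is why it can only reduce the noise, not remove it.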
There are generally better ways of doing this than using MySQL for your Apache logs. You would probably want to implement something at your application layer, or use a software package that analyses the Apache log files after they've been written, extracts the important data, and stores only that in a database.
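As a rough illustration of the second approach, here is a Python sketch that reads log lines after the fact, skips self-identified bots, and stores one summary row per day, path and status instead of one row per request. It assumes the logs are in Apache's "combined" format; the table name daily_hits, the function names and the bot marker list are hypothetical, and SQLite is used only so the example runs on its own.

```python
import re
import sqlite3
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, incomplete list of crawler markers.
BOT_MARKERS = ("googlebot", "bingbot", "slurp")


def summarise(lines):
    """Count non-bot hits per (day, path, status)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        if any(b in m["agent"].lower() for b in BOT_MARKERS):
            continue  # skip self-identified bots
        day = m["time"].split(":", 1)[0]  # e.g. "30/Jul/2012"
        parts = m["request"].split()
        path = parts[1] if len(parts) > 1 else "-"
        counts[(day, path, int(m["status"]))] += 1
    return counts


def store(counts, conn):
    """Write the aggregated counts to a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_hits "
        "(day TEXT, path TEXT, status INTEGER, hits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_hits VALUES (?, ?, ?, ?)",
        [(d, p, s, n) for (d, p, s), n in counts.items()],
    )
    conn.commit()
```

In practice you would pass an open log file to summarise() and a real database connection to store(); how much detail to keep per row depends on what you actually want to report on.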