Discuss archiving statistics in the PHP Development forum on Dev Shed. archiving statistics PHP Development forum discussing coding practices, tips on PHP, and other PHP-related topics. PHP is an open source scripting language that has taken the web development industry by storm.
The ASP Free website provides in-depth information on the latest developer tools available from Microsoft. Our cadre of writers, highly experienced industry experts, reveals the best ways to use established technologies as well as new and emerging technologies. Our coverage of Microsoft's development and administration technologies is among the most respected in the IT industry today.
ASP Free and Iron Speed Designer are giving away $5,500+ in FREE licenses. Iron Speed's RAD CASE toolset can save up to 80% of your coding time. One free license per week, one perpetual license per month! Download and Activate to enter!
Intel® Graphics Performance Analyzers is a powerful tool suite for analyzing and optimizing your games, media, and graphics-intensive applications. Used by some of the best developers on the planet, Intel GPA lets you maximize your app’s performance.
Posts: 13
Time spent in forums: 2 h 12 m 5 sec
Reputation Power: 0
archiving statistics
hia guys,
here's my problem: the cms i'm working on has an included statistics system, which gathers A LOT of data every day.
There's one huge table cc_visitors, which logs each unique visitor and gathers data such as duration of stay, ip address, maybe some js-params such as flash enabled, resolution or whatever (not important right now). Then there's the pageview's table which has a row for each page a visitor viewed which contains basically the url, the title of the page and the visitor id.
Of course this is too much data, so I need to somehow archive it, but I don't know how/when to do it.
I am required to always provide a VERY detailed report for the current day, including all visitors and I have to be able to trace their pages viewed. Then I need general data for each day, like e.g. the number of visitors etc.. Then I need reports for weeks/months/years.. How am I supposed to do this? I can't use cron, which means, that if a user visits the page I need to check if a new day has come and then archive the data of the previous day.
Now I have several problems: what do I do if the page is NOT visited every day, then I need to archive many days in one go.
Damn, generally it's like damn-clear to me what I need to do, I just don't know how/where to start ... please help me :P
ok, edit: this is in the php-section because i use php and most ppl read this board. i guess it's more a design-question, tho
Posts: 1,572
Time spent in forums: 1 Week 6 Days 15 h 37 m 59 sec
Reputation Power: 436
Curk,
make a numerical list of your desired information, for example:
1. Number of total visits a day to site
2. Number of IE users to site
3. Number of American Visitors to our balloon page.
And then gives us a bit more information about your database topography; what tables do you have and how are they structured? Which fields are being used to store information?
This seems like a large, but simple project. The only complicating thing to this - seemingly - is the volume if information you are handling.
When seeking help:
Submit clear questions and code. Use syntax highlighting.
Be patient, and respectful. Don't abuse your access to
professional guidance - nobody here is obligated to answer
your questions, be thankful that they do.
Posts: 273
Time spent in forums: 5 Days 11 h 34 m 3 sec
Reputation Power: 10
Well, on the main page part of the adding to the database can check the time and all and then archive using a system command (some mysql options). Or, you can do the harder, lower level thing which would be to .tar.bz2 the table's files which is a real pain, otherwise you can export logs everytime.
Posts: 27
Time spent in forums: 18 h 33 m 42 sec
Reputation Power: 0
Data Warehouse
Start thinking about the 'granularity' of the information you need to report in the historical reports. Do you simply need summary information (total visitors per page per day) or do you really need the detailed historical information (what did a particular user view on a Friday two months ago)?
For your regular (weekly, monthly, etc.) reports, you can create a set of tables to hold summary information. Then use a query to update the summary tables once a day.
Keep the detailed information for as long as necessary/practical. A big thing now is SarbOx law, which requires publically-held companies to maintain sufficent archives of information. You'll need to check how long that detailed information will need to be kept.
I've seen one system that uses history tables. Current information is moved to history tables on a monthly basis (the structure of the history tables mirror the current tables). The history tables are then used for reporting purposes, while the current tables are kept to a manageable size (speeding performance).
The history tables can then be archived before ancient information is purged from them.
Posts: 13
Time spent in forums: 2 h 12 m 5 sec
Reputation Power: 0
well, thanks guys!
here are a few problems i've come up with:
is it better to have tables for annual, monthly, weekly and daily data or schould i create a table for days only and then dynamically select the information for the reports? or is that to slow for e.g. the overall report, which then needs to add up the information for ALL days ever ...
because of the law mentioned and for security reasons i need to backup the detailled data somewhere and if the detailed report for day xyz is requested the data is supposed to be extracted if still available. how can i do that?
is it necessary to be able to display the detailed report in different timezones, or can i simply say "all times are gmt"? might be much easier i think
here's another question: i'm searching for a VERY lightweight and simple graph-plotting lib for php which is GPL'ed. any ideas?
Posts: 27
Time spent in forums: 18 h 33 m 42 sec
Reputation Power: 0
Storing Summary Information: Deciding whether to keep summaries for weekly, monthly and annual periods is up to you. It really depends on how often you'll be calculating those totals.
1) If your app is going to display weekly and monthly totals on some kind of dashboard that several managers or whoever are going to view constantly throughout the day, then yes, you'd want to store those values in your database.
2) If your app is going to produce reports once a week/month, that will be printed and distributed, then no. You'll be summarizing the historical data only once in a while, which will hardly be a performance hit.
Archiving Data: The data should be moved from high accessiblity to low accessibility formats as time goes by.
Current tables - Data is being added/modified/accessed on a near-constant basis.
History tables: Data is no longer being changed, but will be accessed for reports. Several years of data may be kept in this form.
Archives: Data is rarely accessed at this point. It is compressed, and no longer accessible. Will need to be restored to history tables before being used. Can be stored on disk or tape.
I'm not familiar with the archiving process in MySql. I would probably:
1) move the data to export from the current/history tables to a set of 'archive tables',
2) export the archive tables to a format that can easily be imported at a later date,
3) compress and store the data, then
4) drop the archive tables.
How you're going to display the time is really between you and your customers. Displaying all the times in some particular timezone (GMT, whatever) is not unusual. Whatever you choose, you should let the viewer know what timezone the times are coming from.
Hmm, haven't done any charting in PHP. I'd be curious to know too.