PHP Development
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
User Name:
Password:
Remember me
Go Back   Dev Shed ForumsProgramming LanguagesPHP Development
The ASP Free website provides in-depth information on the latest developer tools available from Microsoft. Our cadre of writers, highly experienced industry experts, reveals the best ways to use established technologies as well as new and emerging technologies. Our coverage of Microsoft's development and administration technologies is among the most respected in the IT industry today.

ASP Free and Iron Speed Designer are giving away $5,500+ in FREE licenses. Iron Speed's RAD CASE toolset can save up to 80% of your coding time. One free license per week, one perpetual license per month!
Download and Activate to enter!

Intel® Graphics Performance Analyzers is a powerful tool suite for analyzing and optimizing your games, media, and graphics-intensive applications. Used by some of the best developers on the planet, Intel GPA lets you maximize your app’s performance.


Tutorials
| Forums

Download to Enter
| Contest Rules

DOWNLOAD INTEL® GPA FOR FREE

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Rate Thread Display Modes
 
Unread Dev Shed Forums Sponsor:
  #1  
Old September 3rd, 2005, 01:47 PM
curk curk is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2005
Location: localhost
Posts: 13 curk User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 2 h 12 m 5 sec
Reputation Power: 0
Send a message via ICQ to curk Send a message via MSN to curk
archiving statistics

hia guys,

here's my problem: the cms i'm working on has an included statistics system, which gathers A LOT of data every day.

There's one huge table cc_visitors, which logs each unique visitor and gathers data such as duration of stay, ip address, maybe some js-params such as flash enabled, resolution or whatever (not important right now). Then there's the pageview's table which has a row for each page a visitor viewed which contains basically the url, the title of the page and the visitor id.

Of course this is too much data, so I need to somehow archive it, but I don't know how/when to do it.

I am required to always provide a VERY detailed report for the current day, including all visitors and I have to be able to trace their pages viewed. Then I need general data for each day, like e.g. the number of visitors etc.. Then I need reports for weeks/months/years.. How am I supposed to do this? I can't use cron, which means, that if a user visits the page I need to check if a new day has come and then archive the data of the previous day.
Now I have several problems: what do I do if the page is NOT visited every day, then I need to archive many days in one go.

Damn, generally it's like damn-clear to me what I need to do, I just don't know how/where to start ... please help me :P

ok, edit: this is in the php-section because i use php and most ppl read this board. i guess it's more a design-question, tho

Reply With Quote
  #2  
Old September 3rd, 2005, 01:59 PM
jsampsonPC's Avatar
jsampsonPC jsampsonPC is offline
stick a scissor in you eye
Dev Shed Intermediate (1500 - 1999 posts)
 
Join Date: Jun 2004
Location: Pensacola, Florida
Posts: 1,572 jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level)jsampsonPC User rank is Lieutenant Colonel (40000 - 50000 Reputation Level) 
Time spent in forums: 1 Week 6 Days 15 h 37 m 59 sec
Reputation Power: 436
Curk,

make a numerical list of your desired information, for example:

1. Number of total visits a day to site
2. Number of IE users to site
3. Number of American Visitors to our balloon page.

And then gives us a bit more information about your database topography; what tables do you have and how are they structured? Which fields are being used to store information?

This seems like a large, but simple project. The only complicating thing to this - seemingly - is the volume if information you are handling.

Regards,
Jonathan Sampson
__________________
FREE Web Development Videos | PHP Reference | MySQL Reference | jQuery Documentation

Markup Validation | CSS Validation | Web Developers, get FireBug & IE Developer Toolbar

When seeking help:
Submit clear questions and code. Use syntax highlighting.
Be patient, and respectful. Don't abuse your access to
professional guidance - nobody here is obligated to answer
your questions, be thankful that they do.

Are you a good person?

Reply With Quote
  #3  
Old September 3rd, 2005, 02:01 PM
E D D ! 3 E D D ! 3 is offline
Contributing User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Jun 2005
Location: /home/eddie/.secret
Posts: 273 E D D ! 3 User rank is Corporal (100 - 500 Reputation Level)E D D ! 3 User rank is Corporal (100 - 500 Reputation Level)E D D ! 3 User rank is Corporal (100 - 500 Reputation Level)E D D ! 3 User rank is Corporal (100 - 500 Reputation Level) 
Time spent in forums: 5 Days 11 h 34 m 3 sec
Reputation Power: 10
Send a message via AIM to E D D ! 3
Well, on the main page part of the adding to the database can check the time and all and then archive using a system command (some mysql options). Or, you can do the harder, lower level thing which would be to .tar.bz2 the table's files which is a real pain, otherwise you can export logs everytime.

Reply With Quote
  #4  
Old September 3rd, 2005, 04:05 PM
SnakePicard SnakePicard is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2005
Location: Salinas, CA
Posts: 27 SnakePicard User rank is Corporal (100 - 500 Reputation Level)SnakePicard User rank is Corporal (100 - 500 Reputation Level)SnakePicard User rank is Corporal (100 - 500 Reputation Level)SnakePicard User rank is Corporal (100 - 500 Reputation Level) 
Time spent in forums: 18 h 33 m 42 sec
Reputation Power: 0
Data Warehouse

Start thinking about the 'granularity' of the information you need to report in the historical reports. Do you simply need summary information (total visitors per page per day) or do you really need the detailed historical information (what did a particular user view on a Friday two months ago)?

For your regular (weekly, monthly, etc.) reports, you can create a set of tables to hold summary information. Then use a query to update the summary tables once a day.

Keep the detailed information for as long as necessary/practical. A big thing now is SarbOx law, which requires publically-held companies to maintain sufficent archives of information. You'll need to check how long that detailed information will need to be kept.

I've seen one system that uses history tables. Current information is moved to history tables on a monthly basis (the structure of the history tables mirror the current tables). The history tables are then used for reporting purposes, while the current tables are kept to a manageable size (speeding performance).

The history tables can then be archived before ancient information is purged from them.

Hope I've given you some options.

Reply With Quote
  #5  
Old September 4th, 2005, 06:41 AM
curk curk is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2005
Location: localhost
Posts: 13 curk User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: 2 h 12 m 5 sec
Reputation Power: 0
Send a message via ICQ to curk Send a message via MSN to curk
well, thanks guys!

here are a few problems i've come up with:
  • is it better to have tables for annual, monthly, weekly and daily data or schould i create a table for days only and then dynamically select the information for the reports? or is that to slow for e.g. the overall report, which then needs to add up the information for ALL days ever ...
  • because of the law mentioned and for security reasons i need to backup the detailled data somewhere and if the detailed report for day xyz is requested the data is supposed to be extracted if still available. how can i do that?
  • is it necessary to be able to display the detailed report in different timezones, or can i simply say "all times are gmt"? might be much easier i think

here's another question: i'm searching for a VERY lightweight and simple graph-plotting lib for php which is GPL'ed. any ideas?

regards, jonas

Reply With Quote
  #6  
Old September 5th, 2005, 01:37 PM
SnakePicard SnakePicard is offline
Registered User
Dev Shed Newbie (0 - 499 posts)
 
Join Date: Aug 2005
Location: Salinas, CA
Posts: 27 SnakePicard User rank is Corporal (100 - 500 Reputation Level)SnakePicard User rank is Corporal (100 - 500 Reputation Level)SnakePicard User rank is Corporal (100 - 500 Reputation Level)SnakePicard User rank is Corporal (100 - 500 Reputation Level) 
Time spent in forums: 18 h 33 m 42 sec
Reputation Power: 0
Storing Summary Information: Deciding whether to keep summaries for weekly, monthly and annual periods is up to you. It really depends on how often you'll be calculating those totals.
1) If your app is going to display weekly and monthly totals on some kind of dashboard that several managers or whoever are going to view constantly throughout the day, then yes, you'd want to store those values in your database.
2) If your app is going to produce reports once a week/month, that will be printed and distributed, then no. You'll be summarizing the historical data only once in a while, which will hardly be a performance hit.

Archiving Data: The data should be moved from high accessiblity to low accessibility formats as time goes by.
  1. Current tables - Data is being added/modified/accessed on a near-constant basis.
  2. History tables: Data is no longer being changed, but will be accessed for reports. Several years of data may be kept in this form.
  3. Archives: Data is rarely accessed at this point. It is compressed, and no longer accessible. Will need to be restored to history tables before being used. Can be stored on disk or tape.

I'm not familiar with the archiving process in MySql. I would probably:
1) move the data to export from the current/history tables to a set of 'archive tables',
2) export the archive tables to a format that can easily be imported at a later date,
3) compress and store the data, then
4) drop the archive tables.

How you're going to display the time is really between you and your customers. Displaying all the times in some particular timezone (GMT, whatever) is not unusual. Whatever you choose, you should let the viewer know what timezone the times are coming from.

Hmm, haven't done any charting in PHP. I'd be curious to know too.

Good luck!

Reply With Quote
Reply

Viewing: Dev Shed ForumsProgramming LanguagesPHP Development > archiving statistics


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 


Powered by: vBulletin Version 3.0.5
Copyright ©2000 - 2012, Jelsoft Enterprises Ltd.

© 2003-2012 by Developer Shed. All rights reserved. DS Cluster 1 - Follow our Sitemap