#16
  1. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Thanks Jacques1,

    Let me make sure I understand your recommendations.

    1. Get the query. You indicate that this will be the most difficult part, and you give three solutions to do so.
    2. Parses the query and obtain the tables, columns, values, PK. You provided a link for one possible parser.
    3. Turn the output from the parser into a log entry stating the changes.


    Okay, I think I get Steps 1 and 2. Please elaborate on Step 3.
  2. #17
  3. No Profile Picture
    I haz teh codez!
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2003
    Posts
    2,547
    Rep Power
    2337
    This is where a framework comes in handy. To implement this behavior I pretty much just has to drop this bad boy in -- with a few tweaks -- and bada-bing.
    I ♥ ManiacDan & requinix

    This is a sig, and not necessarily a comment on the OP:
    Please don't be a help vampire!
  4. #18
  5. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Never used Cake before. Does it bring along much baggage?
  6. #19
  7. No Profile Picture
    I haz teh codez!
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2003
    Posts
    2,547
    Rep Power
    2337
    Lots. It's very heavyweight. But stuff like this makes it worth it.
    I ♥ ManiacDan & requinix

    This is a sig, and not necessarily a comment on the OP:
    Please don't be a help vampire!
  8. #20
  9. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    I am more inclined to create my own. Do you know if the strategy behind this audit trail plugin is documented?
  10. #21
  11. No Profile Picture
    I haz teh codez!
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Dec 2003
    Posts
    2,547
    Rep Power
    2337
    Well it's a CakePHP behavior, which inserts itself into the the associated model's callbacks (beforeSave/afterSave, beforeDelete/afterDelete).

    It wouldn't be simple to duplicate.
    I ♥ ManiacDan & requinix

    This is a sig, and not necessarily a comment on the OP:
    Please don't be a help vampire!
  12. #22
  13. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    No, I do not want to recreate Cake.

    I want to recreate just the basic audit functionality.
  14. #23
  15. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    Audit systems are actually not that simple to design or build. There isn't one accepted approach for it.

    Personally, I'm rather sceptical about the database approach. Even if you get the user ID through the session variable hack, you still have to track the changes. Looping through OLD and NEW isn't easy. If it's even possible, you'll need yet another hack.
    I don't understand how using a session variable for its intended purpose is a hack. However, building a dynamic audit system at the database level is almost impossible. You will end up having to separately write, at the very least, the triggers for each table and then update them later if the table structure changes. So that's the major disadvantage of an audit system at the database level. The major advantage is that its not tightly coupled to your application and you don't have to mess around with query parsing at all because the database has already done it (correctly, 100% of the time).

    ----

    There are really two problems here: how to capture the data and how to store it.

    For capturing data, you either do it at the application level or at the database level.

    If you're using an ORM layer it should be fairly straight forward to do this at the application level. If you're not, then it won't be. Parsing the queries yourself sounds like a major pain to me, as well as a major performance hit (although, if you're building an audit system, you will see a major performance hit regardless of which approach you take since you are at least going to double your database writes).

    If my application had an ORM layer and my database was only used by a single application, then I would probably not use triggers. The amount of effort required to maintain the database code is too much greater than the amount of effort required to build it at the application level.

    If I were building one for a database used by multiple applications or if my application didn't have any sort of ORM layer, then I definitely would use triggers. At that point, the effort required to maintain two separate audit systems becomes greater than the effort required to maintain one at the database level.

    For storing the data, you are essentially just versioning the records. There are at least three common and accepted ways of doing this.

    The first is to store a separate row for each version, and use a secondary table to indicate the current version:
    Code:
    book [pkey on id]
    --------------------
    | id | revision_id |
    --------------------
    |  1 |           3 |
    |  2 |           6 |
    --------------------
    
    book_data [pkey on revision_id]
    -------------------------------------------------------
    | id | revision_id | book_name | user_id | created_at |
    -------------------------------------------------------
    |  1 |           1 |     name0 |       5 |        xxx |
    |  1 |           2 |     name1 |       1 |        xxx |
    |  1 |           3 |   name1-c |       2 |        xxx |
    |  2 |           6 |     name2 |       1 |        xxx |
    -------------------------------------------------------
    
    books [a view]
    SELECT book_data.* FROM book INNER JOIN book_data ON book_data.revision_id = book.revision_id
    Use of the view here is actually optional if your select queries are written to be aware of the versioning system.

    Advantages of this approach:
    * You can change the structure of the table without having to update any other dependent tables.
    * It's really easy to see who changed what when.
    * You could easily revert to or view a previous revision.

    Disadvantages of this approach:
    * It uses a lot of disk space.
    * Writes are complicated and require transactions (but most applications that would use versioning are read-heavy)
    * Reads are a bit more expensive because of the JOIN (and most applications that use versioning are read-heavy)

    The second is to have two tables; one for storing current records and one, a "history" table, for storing old versions:
    Code:
    books [pkey on id]
    ------------------
    | id | book_name |
    ------------------
    |  1 |   name1-c |
    |  2 |     name2 |
    ------------------
    
    books_history [pkey on revision_id]
    -------------------------------------------------------
    | id | revision_id | book_name | user_id | created_at |
    -------------------------------------------------------
    |  1 |           1 |     name0 |       5 |        xxx |
    |  1 |           2 |     name1 |       1 |        xxx |
    -------------------------------------------------------
    Advantages of this approach:
    * Read performance is better because you're not using the view and have fewer records in the main table
    * It's fairly easy to see who changed what when.
    * You could easily revert to a previous revision.
    * You can use sparse rows to reduce disk space usage (ie: store null in the columns that didn't change from the previous revision); however this makes reversion harder.

    Disadvantages of this approach:
    * Any structural updates to books have to be duplicated to books_history
    * It uses a lot of disk space if you don't use sparse rows
    * Writes are complicated and require transactions (but most applications that would use versioning are read-heavy)

    The third approach is to use what is essentially an EAV structure (which database people tend hate):
    Code:
    books [pkey on id]
    ------------------
    | id | book_name |
    ------------------
    |  1 |   name1-c |
    |  2 |     name2 |
    ------------------
    
    auditlog
    -------------------------------------------------------------------------------------------
    | id | book_id | user_id | timestamp | table |    column | action | old_value | new_value |
    -------------------------------------------------------------------------------------------
    |  1 |       1 |       5 |       xxx | books | book_name | update |     name0 |     name1 |
    |  1 |       1 |       1 |       xxx | books | book_name | update |     name1 |   name1-c |
    -------------------------------------------------------------------------------------------
    Advantages of this approach:
    * Read performance is good
    * It uses minimal disk space
    * You can change the structure of the table without having to update any other dependent tables.

    Disadvantages:
    * Seeing who changed what when is complicated
    * Reverting to a previous revision is complicated (you basically have to select the entire history of the object)
    * Writes are complicated and require transactions (but most applications that would use versioning are read-heavy)

    Comments on this post

    • NotionCommotion agrees : Thank you! Exactly what I needed
    Last edited by E-Oreo; April 6th, 2013 at 11:43 AM.
    PHP FAQ

    Originally Posted by Spad
    Ah USB, the only rectangular connector where you have to make 3 attempts before you get it the right way around
  16. #24
  17. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Thanks E-Oreo, great post!

    Maybe a 4th option? I am trying to implement it, so if you think I am whacked, please let me know soon!

    Code:
    CREATE  TABLE IF NOT EXISTS `audits` (
      `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
      `users_id` INT UNSIGNED NOT NULL ,
      `ip` VARCHAR(45) NOT NULL ,
      `tableName` VARCHAR(45) NOT NULL ,
      `tablePK` VARCHAR(45) NOT NULL ,
      `dateChanged` DATETIME NOT NULL ,
      PRIMARY KEY (`id`) ,
      INDEX `fk_audits_users1_idx` (`users_id` ASC) ,
      CONSTRAINT `fk_audits_users1`
        FOREIGN KEY (`users_id` )
        REFERENCES `users` (`id` )
        ON DELETE NO ACTION
        ON UPDATE NO ACTION)
    ENGINE = InnoDB;
    
    
    -- -----------------------------------------------------
    -- Table `auditFields`
    -- -----------------------------------------------------
    CREATE  TABLE IF NOT EXISTS `auditFields` (
      `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
      `audits_id` INT UNSIGNED NOT NULL ,
      `fieldName` VARCHAR(45) NOT NULL ,
      `oldValue` VARCHAR(45) NULL ,
      `newValue` VARCHAR(45) NULL ,
      PRIMARY KEY (`id`, `fieldName`) ,
      INDEX `fk_auditFields_audits1_idx` (`audits_id` ASC) ,
      CONSTRAINT `fk_auditFields_audits1`
        FOREIGN KEY (`audits_id` )
        REFERENCES `audits` (`id` )
        ON DELETE NO ACTION
        ON UPDATE NO ACTION)
    ENGINE = InnoDB;
    
    
    DELIMITER $$
    
    CREATE TRIGGER tg_contacts_upd AFTER UPDATE ON contacts
    FOR EACH ROW 
    BEGIN
        IF NEW.date_modified <> OLD.date_modified THEN
    		SET @AID = 0;
    		SET @users_id = 1; //Get from session
    		SET @ip = '11.111.111.111'; //Get from sessio
    		IF NEW.firstname <> OLD.firstname THEN
    			IF @AID = 0 THEN
    				INSERT INTO audits(tableName,tablePK,users_id,ip,dateChanged) VALUES ('contacts', NEW.id,@users_id,@ip,NOW());
    				SELECT LAST_INSERT_ID() INTO @AID;
    			END IF;
    			INSERT INTO auditFields(audits_id,fieldName,oldValue,newValue) VALUES (@AID,'firstname',OLD.firstname,NEW.firstname);
    		END IF;
    		IF NEW.firstname <> OLD.firstname THEN
    			IF @AID = 0 THEN
    				INSERT INTO audits(tableName,tablePK,users_id,ip,dateChanged) VALUES ('contacts', NEW.id,@users_id,@ip,NOW());
    				SELECT LAST_INSERT_ID() INTO @AID;
    			END IF;
    			INSERT INTO auditFields(audits_id,fieldName,oldValue,newValue) VALUES (@AID,'lastname',OLD.lastname,NEW.lastname);
    		END IF;
                    //Do the same for any other fields which you want to audit
    	END IF;
    END$$
  18. #25
  19. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    Essentially that's the third approach but with a slightly different table structure; it will have the same properties.

    The structure of your trigger is a little odd. It seems like it would make more sense to create your audit record at the beginning rather than checking to see whether you should create it for every field.
    PHP FAQ

    Originally Posted by Spad
    Ah USB, the only rectangular connector where you have to make 3 attempts before you get it the right way around
  20. #26
  21. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    The structure of your trigger is a little odd. It seems like it would make more sense to create your audit record at the beginning rather than checking to see whether you should create it for every field.
    Originally, I thought that storing a record where the data didn't change would initiate a trigger, however, upon testing, I found it didn't. Will still be helpful if I only want to create an audit for some of the given table's fields. Maybe a better way would be an IF statement which tests OR between all desired fields, and if TRUE, creates the audit record, and then tests each field for differences to determine if a auditFields record should be inserted.

    Lastly, what do you think about normalizing tableName and fieldName and taking them out of the audit table and auditFields, respectively? This will make changing the table and field names in the future much easier. It also requires either a JOIN or coding with surrogates in my triggers which is not ideal.

    PS. Yes, sorry, I know this has gotten too MySQLish and less PHPish, but I am just about done. I originally wanted to know whether PHP or SQL should be used and if PHP how which is appropriate in this forum, but got a little side tracked.....
    Last edited by NotionCommotion; April 6th, 2013 at 05:57 PM.
  22. #27
  23. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    This will make changing the table and field names in the future much easier.
    Do you frequently change the names of tables and fields after your code is in production? Maybe a better question is, have you ever changed the name of an existing table or field after the code has gone into production?
    PHP FAQ

    Originally Posted by Spad
    Ah USB, the only rectangular connector where you have to make 3 attempts before you get it the right way around
  24. #28
  25. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Originally Posted by E-Oreo
    Do you frequently change the names of tables and fields after your code is in production? Maybe a better question is, have you ever changed the name of an existing table or field after the code has gone into production?
    No, I absolutely never do and never have.

    But then again, I have never dealt with code in production, this is soon to be my first time, and I am rather terrified

    I am a little conflicted. Your earlier post indicated that an ORM layer made it easier to maintain the audit system in the application if (when) the structure changes. But your last post indicated that it is very rare for the DB schema to change once in production. Sounds like you recommend if not using an ORM layer and the database schema is well thought out (which I hope mine is), NOT to normalize tableName and fieldName. Right?
    Last edited by NotionCommotion; April 7th, 2013 at 08:24 AM.
  26. #29
  27. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,317
    Rep Power
    7170
    I am a little conflicted. Your earlier post indicated that an ORM layer made it easier to maintain the audit system in the application if (when) the structure changes. But your last post indicated that it is very rare for the DB schema to change once in production.
    It's not at all rare to make additions to the DB schema, but it is rare to change the names of existing tables and fields. The name of a table/field only affects semantics, once a DB is in production changing the name of a table or field is a high-risk low-reward move.

    I do recommend not normalizing the table and field names.

    The ORM layer isn't really related to that; an ORM layer just makes it a lot easier to implement the audit system in PHP rather than using triggers. From the perspective of a PHP developer, database triggers have a heavy "tech debt" for an application - they are a pain to maintain and require extra documentation and experience since they're not frequently used.
    PHP FAQ

    Originally Posted by Spad
    Ah USB, the only rectangular connector where you have to make 3 attempts before you get it the right way around
  28. #30
  29. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Sep 2006
    Posts
    1,917
    Rep Power
    533
    Thanks again E-Oreo. I really appreciate your help.

    This dialog has confirmed my belief that much knowledge is required before going into production. I am thinking of creating a new post asking something like what are the steps to do before launching a production website. My difficulty is the question is so broad, and I don't even know all the topics. For instance, below is just a partial list of items that are nagging me.

    I am not asking for answers to any of these topics, but would happily accept any thoughts for how to structure this future topic.



    Checklist of topics before launching a production website
    • Database considerations
    • SEO
    • accessibility
    • legal issues
    • Intellectual property (both my stuff, and users that upload data)
    • copyright issues
    • tax issues
    • What operating system to use
    • version control
    • backup strategies
    • Defending against hacks
    • Server configurations, clustering, etc
    • Code documentation (specifically how .htaccess, triggers, etc are documented)
    • Knowing that it is high-risk low-reward to change table names
    • How to find partners
    • How to sub out specific scopes of work
    • etc, etc, etc

IMN logo majestic logo threadwatch logo seochat tools logo