#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    4
    Rep Power
    0

    Multi threading and xml


    Hi

    I have an XML file that has to be written by multiple threads running in parallel. How can we preserve the structural integrity of the file, given that many threads writing to it at once can corrupt the XML structure? One way is to make the write method synchronized, but that is a very coarse-grained approach: only one thread may write at a time, so the other threads are ready with their data but cannot write until the lock is released. Is there a better way to do this?
  2. #2
  3. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,238
    Rep Power
    9400
    a) Don't use a file
    b) Keep a master thread which is the only one that writes to (and possibly reads from) the file
    c) Lock on the file but collect a few things to write at once, thus reducing how often the file needs to be used (assuming the work the threads do takes longer than the time needed to write the changes)
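
    A minimal sketch of option (c), assuming status updates are plain strings. The class name `BatchingStatusBuffer`, the batch size, and the `sink` callback (standing in for whatever code actually writes to the file) are all hypothetical, not something from this thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects status lines and hands them to a sink in batches. */
class BatchingStatusBuffer {
    private final int batchSize;
    private final Consumer<List<String>> sink; // assumed: code that writes a batch to the file
    private final List<String> pending = new ArrayList<>();

    BatchingStatusBuffer(int batchSize, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Called by worker threads; the lock is held only while buffering or flushing. */
    synchronized void record(String statusLine) {
        pending.add(statusLine);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is buffered, e.g. at shutdown. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // pass a copy so the sink owns its list
        pending.clear();
    }
}
```

    The trade-off is that anything still buffered is lost if the process dies before a flush, so a larger batch means fewer file writes but a less precise restart point.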

    Comments on this post

    • pa7751 agrees
  4. #3
  5. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    4
    Rep Power
    0
    OK, I guess a little more explanation of the scenario can help. There is a series of tasks that are nothing but Linux commands that need to be executed. Each of these tasks is a tag in the XML, e.g.
    Code:
    <task name="copy" command="scp src dest" user="root" host="machineIP" resumeable="true" />
    These tasks are grouped as activities. The idea is to execute multiple such tasks (commands) in parallel and, in case of failure, resume from the last executed task rather than from the beginning. For example, if the input commands.xml has 100 tags and a failure happens at the 20th task, and that task is resumable, then when the program is started again I start execution from the 20th step and not the 1st.

    So every task is recorded in resume.xml with its status (begin or complete). For the first 19 tasks, resume.xml will have status="complete"; the status of task #20 will be "begin", so I will begin execution from there. Hence I need to record the status at every task, and I would say that the time taken to write to resume.xml is more than the time taken to execute the task itself.

    Also, before starting execution I have to find the first "begin". In short, I have to parse this file first, reach the point of restore, and then start recording tasks with their statuses again as I progress. Hence there are many edits to the same file from threads running in parallel.
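
    A rough sketch of the "find the point of restore" step using the JDK's DOM parser. The layout of resume.xml is an assumption here (a flat list of `<task>` elements with a `status` attribute), and `ResumePoint` / `firstUnfinished` are made-up names:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper for locating the restart point in resume.xml. */
class ResumePoint {
    /**
     * Returns the zero-based index of the first task whose status is not
     * "complete", or the number of tasks if every task is complete.
     * Assumes the XML content is passed in as a string.
     */
    static int firstUnfinished(String resumeXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(resumeXml.getBytes(StandardCharsets.UTF_8)));
            NodeList tasks = doc.getElementsByTagName("task"); // document order
            for (int i = 0; i < tasks.getLength(); i++) {
                Element task = (Element) tasks.item(i);
                if (!"complete".equals(task.getAttribute("status"))) {
                    return i;
                }
            }
            return tasks.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("resume.xml could not be parsed", e);
        }
    }
}
```

    This runs once, before any worker threads are started, so it needs no locking. With parallel tasks more than one entry may be in "begin" after a crash, so the first unfinished index is only the earliest place to restart from.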
  6. #4
  7. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,238
    Rep Power
    9400
    Then it's a typical... what's it called... worker-thread pattern? Thread pool?

    Go with option b. Two variants: you have a number of child threads which communicate with the master daemon to get a task/activity to run and report the status when completed, or the daemon starts a child for each task and there's essentially just one thread of yours running at a time. Which one you choose depends on the nature of the "activities", like whether their tasks can be run independently or are related to each other.
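
    One way the first variant could look in Java, as a sketch rather than a finished design: the master submits tasks to a pool and is the only thread that records results, using `ExecutorCompletionService` from `java.util.concurrent`. The class name `MasterWithPool` is made up, the task body is a placeholder for running the real command, and the returned list stands in for the status file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical master: hands tasks to a pool and alone records their completion. */
class MasterWithPool {
    /** Runs every task on the pool and returns the task names in completion order. */
    static List<String> runAll(List<String> taskNames, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        ExecutorCompletionService<String> done = new ExecutorCompletionService<>(pool);
        try {
            for (String name : taskNames) {
                done.submit(() -> {
                    // Placeholder: the real command (e.g. the scp call) would run here.
                    return name;
                });
            }
            List<String> finished = new ArrayList<>();
            for (int i = 0; i < taskNames.size(); i++) {
                // take() blocks until some task finishes. Only the master thread
                // executes this loop, so recording the status needs no lock.
                finished.add(done.take().get());
            }
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

    Where the sketch appends to `finished`, the master would update the status file instead.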

    Comments on this post

    • pa7751 agrees
  8. #5
  9. No Profile Picture
    Lost in code
    Devshed Supreme Being (6500+ posts)

    Join Date
    Dec 2004
    Posts
    8,316
    Rep Power
    7170
    An XML file is really not very appropriate for this sort of thing, but in addition to the recommendation requinix already made: if you change your statuses so they are all the same length, e.g.
    Code:
    begin
    cmplt
    pausd
    etc.
    then you can perform an in-place write rather than having to rewrite the entire file every time a change is made.
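
    A small sketch of such an in-place write with `java.io.RandomAccessFile`, assuming the file is ASCII, every status code is exactly 5 bytes, and the byte offset of each task's status value is already known (for instance, remembered when the file was first written). `InPlaceStatus` and `overwrite` are made-up names:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

/** Hypothetical helper: overwrites a fixed-width status value inside an existing file. */
class InPlaceStatus {
    static final int WIDTH = 5; // assumed: every status code is exactly 5 ASCII bytes

    /** Replaces the WIDTH bytes at byteOffset with the new status, leaving the rest untouched. */
    static void overwrite(Path file, long byteOffset, String status) {
        byte[] bytes = status.getBytes(StandardCharsets.US_ASCII);
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("status must be exactly " + WIDTH + " bytes");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(byteOffset); // jump straight to the status value
            raf.write(bytes);     // same length, so nothing after it shifts
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    This only removes the cost of rewriting the whole file. The calls themselves would still need to go through a single writer or a lock.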
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    4
    Rep Power
    0
    Originally Posted by requinix
    Which one you choose depends on the nature of the "activities", like whether their tasks can be run independently or are related to each other.
    The tasks that run in parallel will be independent of each other and cannot even share any data among themselves. This will be given to me. Also, I will not have many threads running in parallel at a time, so I can assume an unlimited thread pool. Consider a worst-case scenario such as 100 parallel tasks: I am still unclear what your suggestion is for writing the statuses of these tasks in parallel to the output XML file. Could you please help me understand?
  12. #7
  13. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,238
    Rep Power
    9400
    Make one master thread do the writing and have the child threads communicate with it. All they really need to do is say that the task has completed, right?
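
    A minimal sketch of that arrangement, assuming the workers only need to hand over a status string. It uses a `BlockingQueue` as the channel between child threads and the writer. `StatusWriter`, the `STOP` sentinel, and the in-memory `written` list (standing in for resume.xml) are illustrative only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical single writer: the only thread that ever touches the output. */
class StatusWriter implements Runnable {
    private static final String STOP = "__STOP__"; // made-up sentinel that ends the loop
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> written = new ArrayList<>(); // stands in for resume.xml

    /** Called by worker threads; returns immediately, never waits on file I/O. */
    void report(String status) {
        queue.add(status);
    }

    /** Asks the writer to finish once everything queued before this call is written. */
    void stop() {
        queue.add(STOP);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String status = queue.take(); // blocks until a worker reports
                if (STOP.equals(status)) {
                    return;
                }
                written.add(status); // the real version would update the file here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Read this only after the writer thread has been joined. */
    List<String> written() {
        return written;
    }
}
```

    Workers never block on the file: they drop a message in the queue and move on to the next task. Since the file is touched by only one thread, it needs no synchronization of its own.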
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Oct 2012
    Posts
    4
    Rep Power
    0
    Ya so basically there are 3 steps:
    1. Parse existing xml to resume from a state that failed previously
    2. Spawn threads
    3. Write new status to file for each task


    So I guess the best that can be done with parallel threads is to do steps 1 and 3 in a synchronized block and step 2 in parallel.
  16. #9
  17. No Profile Picture
    Permanently Banned
    Devshed Newbie (0 - 499 posts)

    Join Date
    Apr 2012
    Posts
    6
    Rep Power
    0
    Groovy XmlSlurper is a nice tool to parse XML documents, mostly because of the elegant GPath dot-notation.
  18. #10
  19. Transforming Moderator
    Devshed Supreme Being (6500+ posts)

    Join Date
    Mar 2007
    Location
    Washington, USA
    Posts
    14,238
    Rep Power
    9400
    Originally Posted by Morningwalker
    Groovy XmlSlurper is a nice tool to parse XML documents, mostly because of the elegant GPath dot-notation.
    OP's problem is with the multithreadedness, not with parsing XML.
