October 22nd, 2012, 04:18 AM
Multi threading and xml
I have an XML file that has to be written by multiple threads running in parallel. How can we preserve the structural integrity of the file, given that many threads writing to it at once can corrupt the XML structure? One way is to make the write method synchronized, but that is a very coarse-grained approach in which only one thread may write at a time, so the other threads have their data ready but cannot write until the lock is released. Is there a better way to do this?
October 22nd, 2012, 12:13 PM
a) Don't use a file
b) Keep a master thread which is the only one that writes to (and possibly reads from) the file
c) Lock on the file but collect a few things to write at once, thus reducing how often the file needs to be used (assuming the work the threads do takes longer than the time needed to write the changes)
October 22nd, 2012, 01:36 PM
OK, I guess a little more explanation of the scenario will help. There is a series of tasks, which are nothing but Linux commands that need to be executed. Each of these tasks is a tag in the XML, e.g.,
<task name="copy" command="scp src dest" user="root" host="machineIP" resumeable="true" />
These tasks are grouped as activities. The idea is to execute multiple such tasks (commands) in parallel, and in case of failure to resume from the last executed task rather than from the beginning. For example, if the input commands.xml has 100 tags and a failure happens at the 20th task, and that task is resumable, then when the program is started again I start execution from the 20th step and not the 1st. So every task is recorded in resume.xml with its status (begin or complete). The first 19 tasks will have status="complete" in resume.xml and task #20 will have status="begin", so I will begin execution from there.
Hence at every task I need to record the status of that task, and I would say that the time taken to write to resume.xml is more than the time taken to execute the task itself. Before starting execution, I also have to find the first "begin". In short, I have to parse this file first, reach the point of restore, then start recording tasks with their statuses again as I progress. Hence there are many edits to the same file being made by threads running in parallel.
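The "find the point of restore" step described above might look roughly like the following. This is only a sketch under assumptions: the layout of resume.xml (a root element containing task elements with name and status attributes) is guessed from the description, and the class and method names are made up for illustration.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ResumePoint {
    // Returns the index of the first <task> whose status is not "complete",
    // or the number of tasks if every task has completed.
    // Assumed layout: <resume><task name="..." status="..."/>...</resume>
    static int firstIncomplete(InputStream resumeXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(resumeXml);
        NodeList tasks = doc.getElementsByTagName("task");
        for (int i = 0; i < tasks.getLength(); i++) {
            Element task = (Element) tasks.item(i);
            // trim() tolerates statuses that were padded to a fixed width
            if (!"complete".equals(task.getAttribute("status").trim())) {
                return i;
            }
        }
        return tasks.getLength();
    }
}
```

With tasks running in parallel, more than one task may be left in "begin" after a failure, so a variant that collects every task that is not complete may fit better than a single index.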
October 22nd, 2012, 01:59 PM
Then it's a typical... what's it called... worker-thread pattern? Thread pool?
Go with option b. Two variants: you have a number of child threads which communicate with the master daemon to get a task/activity to run and report the status when completed, or the daemon starts a child for each task and there's essentially just one thread of yours running at a time. Which one you choose depends on the nature of the "activities", like whether their tasks can be run independently or are related to each other.
October 22nd, 2012, 05:55 PM
An XML file is really not very appropriate for this sort of thing, but in addition to the recommendation requinix already made, if you change your statuses so they are all the same length (ie:
Then you can perform an in-place write rather than having to rewrite the entire file every time a change is made.
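A rough sketch of what such an in-place write could look like in Java, using RandomAccessFile. The assumptions here are mine, not from the thread: every status is padded with spaces to 8 characters (the length of "complete"), the file content is plain ASCII so one character is one byte, an initial "pending" status is used as a placeholder, and the class name and file layout are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FixedWidthStatusFile {
    static final int WIDTH = 8; // length of "complete"; shorter statuses are space-padded

    private final Path file;
    // Byte offset of each task's status value inside the file.
    private final Map<String, Long> statusOffsets = new LinkedHashMap<>();

    // Writes the whole file once and remembers where each status value starts.
    FixedWidthStatusFile(Path file, List<String> taskNames) throws IOException {
        this.file = file;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(0);
            write(raf, "<resume>\n");
            for (String name : taskNames) { // names are assumed to be XML-safe ASCII
                write(raf, "  <task name=\"" + name + "\" status=\"");
                statusOffsets.put(name, raf.getFilePointer());
                write(raf, pad("pending") + "\"/>\n"); // "pending" is a hypothetical initial status
            }
            write(raf, "</resume>\n");
        }
    }

    // Overwrites only the WIDTH bytes of one status; the rest of the file is untouched.
    synchronized void setStatus(String taskName, String status) throws IOException {
        Long offset = statusOffsets.get(taskName);
        if (offset == null) {
            throw new IllegalArgumentException("unknown task: " + taskName);
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            write(raf, pad(status));
        }
    }

    private static String pad(String status) {
        if (status.length() > WIDTH) {
            throw new IllegalArgumentException("status too long: " + status);
        }
        return String.format("%-" + WIDTH + "s", status);
    }

    private static void write(RandomAccessFile raf, String s) throws IOException {
        raf.write(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Because the padding ends up inside the attribute value (for example status="begin   "), whatever reads the file back has to trim the status. The file length never changes after the initial write, which is what keeps the XML structure intact.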
October 23rd, 2012, 12:31 AM
The tasks that run in parallel will be independent of each other and cannot even share any data among themselves. This will be given to me. Also, I will not have many threads running in parallel at a time, so I can assume an unlimited thread pool. Consider a worst-case scenario like 100 parallel tasks: I am still unclear as to how you suggest writing the statuses of these tasks in parallel to the output XML file. Could you please help me understand?
Originally Posted by requinix
October 23rd, 2012, 02:12 AM
Make one master thread do the writing and have the child threads communicate with it. All they really need to do is say that the task has completed, right?
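In Java, one way to sketch this is with a BlockingQueue: the workers put status messages on the queue and a single writer thread is the only one that touches the file. This is just an illustration under assumptions: the class and message type are made up, the output is an append-only list of status events (the last entry per task wins on resume) rather than an in-place update, and task names are assumed to be XML-safe.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SingleWriterSketch {
    // Hypothetical message type: one status change for one task.
    static final class StatusUpdate {
        final String taskName;
        final String status;
        StatusUpdate(String taskName, String status) {
            this.taskName = taskName;
            this.status = status;
        }
    }

    // Sentinel object that tells the writer thread to finish.
    private static final StatusUpdate STOP = new StatusUpdate("", "");

    static void runAll(List<String> taskNames, Path resumeFile) throws Exception {
        BlockingQueue<StatusUpdate> queue = new LinkedBlockingQueue<>();

        // The master thread: the only thread that ever writes to the file.
        Thread writer = new Thread(() -> {
            try (Writer out = Files.newBufferedWriter(resumeFile, StandardCharsets.UTF_8)) {
                out.write("<resume>\n");
                for (StatusUpdate u = queue.take(); u != STOP; u = queue.take()) {
                    out.write("  <task name=\"" + u.taskName
                            + "\" status=\"" + u.status + "\"/>\n");
                    out.flush(); // make each status visible on disk promptly
                }
                out.write("</resume>\n");
            } catch (IOException | InterruptedException e) {
                throw new IllegalStateException(e);
            }
        });
        writer.start();

        // Workers never touch the file; they only enqueue messages.
        ExecutorService workers = Executors.newCachedThreadPool();
        for (String name : taskNames) {
            workers.submit(() -> {
                queue.add(new StatusUpdate(name, "begin"));
                // ... run the actual command for this task here ...
                queue.add(new StatusUpdate(name, "complete"));
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);

        queue.add(STOP); // all workers are done, so nothing follows the sentinel
        writer.join();
    }
}
```

The workers never block on file I/O, only on a quick queue insert, so 100 parallel tasks just means up to 200 small messages waiting for the writer. Note that if the process dies mid-run the closing tag will be missing, so the resume parser would have to tolerate a truncated file.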
October 23rd, 2012, 04:27 AM
Ya so basically there are 3 steps:
- Parse existing xml to resume from a state that failed previously
- Spawn threads
- Write new status to file for each task
So I guess the best that can be done with parallel threads is to do steps 1 and 3 in a synchronized block and step 2 in parallel.
November 2nd, 2012, 05:35 AM
Groovy XmlSlurper is a nice tool to parse XML documents, mostly because of the elegant GPath dot-notation.
November 2nd, 2012, 12:45 PM
OP's problem is with the multithreadedness, not with parsing XML.
Originally Posted by Morningwalker