#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    21
    Rep Power
    0

    Working with several processes


    I'm a bit lost here. I would like to run several scripts in parallel on a cluster, with each script sending back a float.
    I've been searching the internet and found several options:
    - multithreading (I don't really know what it does)
    - the os module, whose popen and spawn functions can run other scripts as new processes

    Which one do you recommend? Do you have links to websites that explain a little about how threading works? (I'm having some trouble because python.org and similar sites don't offer many example scripts.)

    Thanks
    erwin
  2. #2
  3. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    21
    Rep Power
    0
    Hi again,
    I'm desperately looking for an efficient way of dealing with my problem, and I found the passage below on a website. It seems to imply that multithreading doesn't run several processes in parallel.
    How would you run another script in parallel without the parent process stopping?

    This doesn't mean that you can't make good use of Python on multi-CPU
    machines! You just have to be creative with dividing the work up
    between multiple *processes* rather than multiple *threads*.
    thanks
  4. #3
  5. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    You're correct: using the thread or threading module just runs some code from your script in a different thread of execution. It does not run another script.

    http://forums.devshed.com/t122818/s....ight=threading

    The above example happens to use the threading module to run multiple pinger class objects. These objects also happen to call an external program.

    I guess your problem is - regardless of the language - how do you launch multiple programs on multiple machines? What is the typical way of controlling this in the cluster you are using?

    You could adapt the script to start your processes - I guess they use local storage in which case they could write local results files that could be read by the controlling process.
    grim
  6. #4
  7. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    21
    Rep Power
    0
    Not exactly. The cluster I'm working on dispatches processes to other CPUs by itself. When you connect to the cluster you connect to a virtual machine, and you don't even know which machine you are really working on.

    My main problem is calling another script and running, say, 5 or 6 copies of the same script at the same time on the same computer, BUT the OS needs to see these as new processes in order to send them to other CPUs.

    I have tried some of the threading examples, for instance this one:
    Code:
    import threading, time
    
    def thread_task(name, n):
        time.sleep(1)
        for i in range(n): print name, i
    
    for i in range(5):
        T = threading.Thread(target=thread_task, args=(str(i), i))
        T.run()
    And the threads run one after another; they are not working at the same time in parallel...

    Also, I don't know how to make my main process continue doing other things, such as running another script.

    If you have an idea, I'd be more than grateful!
    thanks
    erwin
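For the question above (several copies of a script running at once as separate OS processes, with the parent carrying on), one possible sketch in current Python 3 uses the standard subprocess module, which is newer than the os.popen and os.spawn calls mentioned at the start of the thread. The inline WORKER_SOURCE script and the function names launch_workers and collect_results are made up for illustration; a real worker would be your own script printing its float.

```python
import subprocess
import sys

# Hypothetical stand-in for the real worker script: it prints one float.
WORKER_SOURCE = "import sys; print(float(sys.argv[1]) * 0.5)"


def launch_workers(count):
    """Start `count` child processes without waiting for any of them."""
    children = []
    for i in range(count):
        # Popen returns immediately, so the parent keeps running.
        child = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, str(i)],
            stdout=subprocess.PIPE,
            text=True,
        )
        children.append(child)
    return children


def collect_results(children):
    """Wait for every child and parse the float each one printed."""
    results = []
    for child in children:
        output, _ = child.communicate()  # blocks until this child exits
        results.append(float(output.strip()))
    return results


if __name__ == "__main__":
    procs = launch_workers(5)
    # The parent is free to do other work here while the children run.
    print(collect_results(procs))
```

Because each child is a genuine process, the operating system (or the cluster's scheduler) decides where it runs; whether it lands on another CPU depends on that scheduler, not on this script.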
  8. #5
  9. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Did you look at the link I posted? It uses threading to run multiple processes. While the threads may be time-sliced, the processes they manage (using os.system, for example) would not be.
  10. #6
  11. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    21
    Rep Power
    0
    Yes, thanks a lot. It took some time because I'm not used to this, but I managed...
    The thing is (as I understood it, though maybe I'm wrong) that the threading module doesn't execute threads in parallel whereas the thread module does (but it isn't as handy to use since it's low-level threading...)
  12. #7
  13. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    threading is just a higher-level wrapper around thread.

    Threads are managed by Python itself and effectively do round-robin processing. Somewhere in the docs it mentions the number of bytecodes executed before moving on to the next thread (100, I think).

    But why does it matter - you have launched other programs in each thread which will occupy their own processes and run independently.
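The pattern described here, one thread per external program, with each thread simply waiting on its own process, might look roughly like this in current Python 3. It is only a sketch: WORKER_SOURCE, run_worker and run_all are hypothetical names, and subprocess.run stands in for the os.system call mentioned earlier so that the printed float can be captured.

```python
import subprocess
import sys
import threading

# Hypothetical worker command: a tiny inline script that prints a float.
WORKER_SOURCE = "import sys; print(float(sys.argv[1]) + 0.25)"


def run_worker(index, results):
    """Run one external process and store the float it prints."""
    # subprocess.run blocks only this thread; the child itself is a
    # separate operating-system process.
    done = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE, str(index)],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    results[index] = float(done.stdout.strip())


def run_all(count):
    """Launch `count` workers, one managing thread each, and gather results."""
    results = [None] * count
    threads = [
        threading.Thread(target=run_worker, args=(i, results))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has reported back
    return results
```

Each thread writes to its own slot in results, so no extra locking is needed in this sketch.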
  14. #8
  15. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    21
    Rep Power
    0
    Yes, you're right, but I was quite intrigued by the example at the beginning, since I don't understand why the threads are executed one after the other and not at the same time. Is it because there is so little bytecode in each thread that one thread finishes before the Python interpreter gives processor time to the next thread?
  16. #9
  17. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    I found the details in Section 8.1 of the Python C API reference manual:

    The Python interpreter is not fully thread safe. In order to support multi-threaded Python programs, there's a global lock that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice.

    Therefore, the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions. In order to support multi-threaded Python programs, the interpreter regularly releases and reacquires the lock -- by default, every 100 bytecode instructions (this can be changed with sys.setcheckinterval()). The lock is also released and reacquired around potentially blocking I/O operations like reading or writing a file, so that other threads can run while the thread that requests the I/O is waiting for the I/O operation to complete.
    The for loop in your example is small, and the print statement would be a good example of file I/O, I think. Here is the disassembly of the thread_task function:
    Code:
      3           0 LOAD_GLOBAL              0 (time)
                  3 LOAD_ATTR                1 (sleep)
                  6 LOAD_CONST               1 (1)
                  9 CALL_FUNCTION            1
                 12 POP_TOP             
    
      4          13 SETUP_LOOP              29 (to 45)
                 16 LOAD_GLOBAL              2 (range)
                 19 LOAD_FAST                1 (n)
                 22 CALL_FUNCTION            1
                 25 GET_ITER            
            >>   26 FOR_ITER                15 (to 44)
                 29 STORE_FAST               2 (i)
                 32 LOAD_FAST                0 (name)
                 35 PRINT_ITEM          
                 36 LOAD_FAST                2 (i)
                 39 PRINT_ITEM          
                 40 PRINT_NEWLINE       
                 41 JUMP_ABSOLUTE           26
            >>   44 POP_BLOCK           
            >>   45 LOAD_CONST               0 (None)
                 48 RETURN_VALUE
    If you want to interleave the threads more closely, you could sprinkle the function with time.sleep calls. I've seen suggestions that playing with sys.setcheckinterval is not a good idea.

    Grim;
    Last edited by Grim Archon; June 10th, 2004 at 10:12 AM.
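The point in the quoted manual text, that the lock is released around blocking operations so other threads can run, can be checked with a small timing experiment. This is a sketch in current Python 3 with made-up names (sleeper, timed_run); time.sleep is used as the blocking call. If the sleeps really overlap, five threads sleeping 0.2 s each should take roughly 0.2 s in total rather than 1 s.

```python
import threading
import time


def sleeper(seconds):
    # time.sleep blocks; the interpreter lock is released while waiting,
    # so the other threads get to run.
    time.sleep(seconds)


def timed_run(thread_count, seconds):
    """Return the wall-clock time for `thread_count` threads that each sleep."""
    threads = [
        threading.Thread(target=sleeper, args=(seconds,))
        for _ in range(thread_count)
    ]
    started = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - started


if __name__ == "__main__":
    print("5 threads, 0.2 s each: %.2f s total" % timed_run(5, 0.2))
```

The same overlap would not be expected for threads doing pure-Python computation, since those have to take turns holding the lock.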
  18. #10
  19. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Actually, there are several "bugs" in the example that prevent it from working properly.

    You should call the start method and not the run method - it is the start method that actually launches a new thread.

    The first loop would define the number of threads, not the number of times the worker loops.

    Here is a working version that shows things more separately, with the sleep moved to allow each thread a byte of the cherry:
    Code:
    import threading, time
    
    def thread_task(name, n): 
        for i in range(n): 
            print name, i
            time.sleep(0.1)
    
    T = {}
    for i in range(5): 
        T[i] = threading.Thread(target = thread_task, args = (str(i), 10))
        
    for i in range(5): 
        T[i].start()
    Grim
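The difference between run and start described in this post can be observed directly by recording which thread executes the target function. A minimal sketch in current Python 3 (executing_thread_id is a made-up helper name):

```python
import threading


def executing_thread_id(use_start):
    """Return the id of the thread that actually ran the target function."""
    seen = []
    t = threading.Thread(target=lambda: seen.append(threading.get_ident()))
    if use_start:
        t.start()  # creates a new thread, which then calls run()
        t.join()
    else:
        t.run()    # ordinary method call, executed by the calling thread
    return seen[0]


if __name__ == "__main__":
    caller = threading.get_ident()
    print("run() stays in caller:  ", executing_thread_id(False) == caller)
    print("start() stays in caller:", executing_thread_id(True) == caller)
```

With run, the work happens in the caller, which is why the original example executed its "threads" strictly one after another.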
  20. #11
  21. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    This version prevents threads from breaking up strings on screen so it is easier to view. The change is to print one string rather than three (name, number and newline):
    Code:
    import threading, time
    
    def thread_task(name, n): 
        time.sleep(1)
        for i in range(n): 
            print "[%s %s] \n"%(name, i), 
            time.sleep(0.1)
    
    T = {}
    for i in range(5): 
        T[i] = threading.Thread(target = thread_task, args = ("thread "+str(i), 10))
        
    for i in range(5): 
        T[i].start()
  22. #12
  23. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2004
    Posts
    21
    Rep Power
    0
    I tested your example and it does work perfectly thanks!
    I still have one question now, is there some special reason for you to make two for loops or is it possible to put both in the same loop?
  24. #13
  25. Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    Posts
    783
    Rep Power
    13
    Originally Posted by winwin
    I tested your example and it does work perfectly thanks!
    I still have one question now, is there some special reason for you to make two for loops or is it possible to put both in the same loop?
    I put it in two for loops just to show clear separation between creating the object and starting/creating the thread.
    You can of course combine them in one for loop.

    grim
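Combining creation and start in one loop, as suggested, could look like the sketch below (current Python 3). The out list, the run_combined name and the final join loop are additions for illustration: join makes the main thread wait for all workers, which the versions above do not do explicitly.

```python
import threading
import time


def thread_task(name, n, out):
    for i in range(n):
        # In CPython, list.append can safely be called from several threads.
        out.append((name, i))
        time.sleep(0.01)


def run_combined(thread_count, n):
    """Create and start each thread in the same loop, then wait for all."""
    out = []
    threads = []
    for i in range(thread_count):
        t = threading.Thread(target=thread_task, args=(str(i), n, out))
        t.start()          # started as soon as it is created
        threads.append(t)
    for t in threads:
        t.join()           # wait for every worker to finish
    return out


if __name__ == "__main__":
    for name, i in run_combined(3, 4):
        print(name, i)
```

The order of entries in out depends on how the threads happen to interleave, so it may differ from run to run.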
