#1
  1. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jan 2002
    Posts
    32
    Rep Power
    13

    handling arbitrary number of pipes


    i'm trying to write a simple shell and i've hit a stumbling block in trying to allow an arbitrary number of pipes: something like
    Code:
    cat somefile | grep searchword | wc -l
    or even more (arbitrarily many, at least until the character limit for a line is reached)

    how can this be achieved? i know that for just 1 pipe, as in
    Code:
    ls | less
    i have to do something like this:
    Code:
    int p[2];
    
    /* create pipe */
    if ( pipe(p) == -1 )
    {
      perror("pipe");
      exit(1);
    }
    
    if ( fork() )
    // parent process
    {
      dup2(p[1], 1);
      close(p[0]);
      close(p[1]);
      execlp("ls", "ls", (char *)NULL);
    }
    else
    // child process
    {
      dup2(p[0], 0);
      close(p[0]);
      close(p[1]);
      execlp("less", "less", (char *)NULL);
    }
    but this too is incorrect... as you can see, i had to fork off another process, which creates a parent-child relationship between the ls and less processes when both should really be children of the shell... how do i do this for an arbitrary number of commands (or, failing that, just for the 2-command case above)?

    thanks for any advice at all! :)
  2. #2
  3. *bounce*
    Devshed Novice (500 - 999 posts)

    Join Date
    Jan 2002
    Location
    Delft, The Netherlands
    Posts
    514
    Rep Power
    42
    Well, the parent process should never exec() anything, since that would end the shell process!

    So each program that you want to run should be executed from a child process, which would give something like the following pseudo-code:

    ...acquire pipes...

    fork();
    if (is child) {
    set up file descriptors
    exec program one
    }

    fork();
    if (is child) {
    set up file descriptors
    exec program two
    }

    ...wait for children to finish...

    If you want to be flexible in the number of pipes you can handle, you'd just stuff both the pipe acquisition and the fork-and-exec code each into a loop.
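
    For what it's worth, here is a minimal C sketch of the pseudo-code above for the two-command case, with both programs forked from the shell. The name run_pair and its return convention (exit status of the right-hand command, or -1 on failure) are my own assumptions for illustration, nothing standard.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "left | right" with BOTH commands as children of the caller.
 * Returns the exit status of `right`, or -1 if something went wrong.
 * (run_pair is a hypothetical helper name.) */
int run_pair(char *const left[], char *const right[])
{
    int p[2];
    int status = 0;
    int result = -1;
    pid_t a, b;

    if (pipe(p) == -1)
        return -1;

    a = fork();
    if (a == 0) {
        /* first child: stdout goes into the pipe */
        dup2(p[1], STDOUT_FILENO);
        close(p[0]);
        close(p[1]);
        execvp(left[0], left);
        perror(left[0]);
        _exit(127);
    }

    b = fork();
    if (b == 0) {
        /* second child: stdin comes from the pipe */
        dup2(p[0], STDIN_FILENO);
        close(p[0]);
        close(p[1]);
        execvp(right[0], right);
        perror(right[0]);
        _exit(127);
    }

    /* the shell keeps no pipe ends open, otherwise the reader never sees EOF */
    close(p[0]);
    close(p[1]);

    if (a > 0)
        waitpid(a, &status, 0);
    if (b > 0 && waitpid(b, &status, 0) == b && WIFEXITED(status))
        result = WEXITSTATUS(status);
    return result;
}
```

    You'd call it with two NULL-terminated argv arrays, e.g. { "ls", NULL } and { "less", NULL }. Note that the parent only forks, closes and waits; it never calls exec itself.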

    Hope this helps.
    "A poor programmer is he who blames his tools."
    http://analyser.oli.tudelft.nl/
  4. #3
  5. Contributing User
    thanks! that did help :)

    how do i handle, say 2 pipes then? as in 'ls -l | grep myfile | wc -l'?

    i've no idea how to create more than 1 pipe or how to set up the file descriptors appropriately... i know i should use a loop but how do i create an arbitrary number of pipes in C?
  6. #4
  7. *bounce*
    Quote: "i know i should use a loop but how do i create an arbitrary number of pipes in C?"
    Well, assuming you've determined how many pipes you need (by parsing the user's command-line), you can call malloc() to allocate a large enough array, and then use a for-loop to populate it with pipe-file descriptors.

    After that, you have to set up the file descriptors properly for each program, but -after- each fork(). Here's an example program that basically executes the equivalent of "ls | cat | wc". I'll leave the handling of command-line options as an exercise ;)

    Code:
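    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    
    
    /* full paths, since execl() does not search PATH; adjust for your system */
    char *programs[] = { "/bin/ls", "/bin/cat", "/bin/wc" };
    
    
    #define NUM(a) (sizeof(a) / sizeof(a[0]))
    #define NUMPIPES (NUM(programs) - 1)
    
    
    #define READ_FD 0
    #define WRITE_FD 1
    
    
    /*
     * Fork a child that runs `path`. A read_fd or write_fd of 0 means
     * "do not redirect that stream".
     */
    void
    run_program (char *path, int read_fd, int write_fd)
    {
            pid_t pid;
    
            if ( (pid = fork()) < 0) {
                    perror("fork");
                    exit(EXIT_FAILURE);
            }
    
            if (pid) {
                    /* parent (the shell): drop its copies of the pipe ends */
                    if (read_fd) {
                            close(read_fd);
                    }
                    if (write_fd) {
                            close(write_fd);
                    }
                    return;
            }
    
            /* child: make stdin/stdout refer to the pipe ends */
            if (read_fd) {
                    dup2(read_fd, STDIN_FILENO);
            }
            if (write_fd) {
                    dup2(write_fd, STDOUT_FILENO);
            }
    
            execl(path, path, (char *)NULL);
    
            /* only reached if execl() failed; don't fall back into main's loop */
            perror(path);
            _exit(127);
    }
    
    
    int
    main (int argc, char **argv)
    {
            int (*pipeTab)[2];
            int i;
    
            if ( (pipeTab = malloc(sizeof(*pipeTab) * NUMPIPES)) == NULL) {
                    perror("malloc");
                    exit(EXIT_FAILURE);
            }
    
            for (i = 0; i < NUMPIPES; i++) {
                    if (pipe(pipeTab[i]) < 0) {
                            perror("pipe");
                            exit(EXIT_FAILURE);
                    }
                    /* close-on-exec: the dup2() copies survive exec, the
                       original pipe descriptors do not leak into children */
                    fcntl(pipeTab[i][READ_FD], F_SETFD, FD_CLOEXEC);
                    fcntl(pipeTab[i][WRITE_FD], F_SETFD, FD_CLOEXEC);
            }
            
            for (i = 0; i < NUM(programs); i++) {
                    if (i == 0) {
                            run_program(programs[i], 0, pipeTab[i][WRITE_FD]);
                    } else if (i < NUM(programs) - 1) {
                            run_program(programs[i], pipeTab[i-1][READ_FD], pipeTab[i][WRITE_FD]);
                    } else {
                            run_program(programs[i], pipeTab[i-1][READ_FD], 0);
                    }
            }
    
            /* wait for all children to finish */
            while (wait(NULL) > 0)
                    ;
    
            free(pipeTab);
            return 0;
    }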
    If you have any questions about the code, don't hesitate to ask.
  8. #5
  9. Contributing User
    you're a lifesaver man! your code works beautifully and i've managed somewhat to integrate it into my shell program... i do have several questions for you though regarding the parts i don't understand :D

    i've embedded my queries in the code itself (they start with //!):
    Code:
    void
    run_program (char *path, int read_fd, int write_fd)
    {
            int pid;
    
            if ( (pid = fork()) < 0) {
                    perror("fork");
                    exit(EXIT_FAILURE);
            }
    
            if (pid) {
                    /* parent (the shell) */
                    if (read_fd) {
                            close(read_fd);     //! you close the read_fd in the parent if it's not 0... why?
                    }
                    if (write_fd) {
                            close(write_fd);    //! same for write_fd
                    }
                    return;
            }
            if (read_fd) {
                    dup2(read_fd, STDIN_FILENO);    //! make stdin act like read_fd? why not the other way around?
            }
            if (write_fd) {
                    dup2(write_fd, STDOUT_FILENO);
            }
    
            execl(path, path, NULL);
    }
    
    int
    main (int argc, char **argv)
    {
            int (*pipeTab)[2];
            int i;
    
    // snip
            
            for (i = 0; i < NUM(programs); i++) {
                    if (i == 0) {
                            run_program(programs[i], 0, pipeTab[i][WRITE_FD]);  //! why 0 for read_fd?
                    } else if (i < NUM(programs) - 1) {
                            run_program(programs[i], pipeTab[i-1][READ_FD], pipeTab[i][WRITE_FD]);
                    } else {
                            run_program(programs[i], pipeTab[i-1][READ_FD], 0); //! why 0 for write_fd?
                    }
            }
    }
    basically, i'm confused about how the read_fd and write_fd parameters work...

    thanks for your help thus far! i'm in your debt :)
  10. #6
  11. *bounce*
    Ok, I should've thrown in a comment about the read_fd and write_fd parameters :)

    Say that you have N programs, strung together with (obviously) N-1 pipes (e.g., 3 programs need 2 pipes). Numbering both programs and pipes from 0, the general rule is that, except for the first and last program, program M reads from pipe M - 1 and writes to pipe M:

    Code:
    run_program(programs[i], pipeTab[i-1][READ_FD], pipeTab[i][WRITE_FD]);
    The first program in the sequence shouldn't read from a pipe though, but from stdin, right? So its stdin file descriptor should not be tied to the read-end of a pipe. I chose to use 0 to indicate that a stream (be it stdin or stdout) should not be redirected. And that's why the last program's write_fd is 0, since its output should just go to the screen instead of another pipe.

    NOTE: some purists will probably argue, and rightfully so, that 0 implies stdin, and that I thus shouldn't have used it to indicate a non-descriptor value. True, I probably should've used an illegal descriptor value like -1 :) Still, stdin is generally redirected to, not from, so I thought it a safe value.
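
    To make the -1 idea concrete, here's a small sketch of how the pipe ends for program M could be picked with an explicit "no redirection" value. NO_REDIRECT and pick_pipe_ends are names I made up for this example; they aren't part of the program above.

```c
#include <stddef.h>

#define NO_REDIRECT (-1)   /* hypothetical sentinel: "leave this stream alone" */

/* For program m out of nprogs (both counted from 0), choose which
 * descriptors should become its stdin and stdout. pipes[k][0] is the
 * read end of pipe k, pipes[k][1] its write end. */
void pick_pipe_ends(int (*pipes)[2], size_t nprogs, size_t m,
                    int *read_fd, int *write_fd)
{
    /* the first program keeps the shell's stdin */
    *read_fd = (m == 0) ? NO_REDIRECT : pipes[m - 1][0];
    /* the last program keeps the shell's stdout */
    *write_fd = (m + 1 == nprogs) ? NO_REDIRECT : pipes[m][1];
}
```

    With this, run_program() would test for read_fd != NO_REDIRECT instead of plain if (read_fd), and a real descriptor 0 would no longer be ambiguous.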

    Ok, about closing the file descriptors in the parent. As you might know, a file (or stream or whatever) isn't closed until all the descriptors that refer to it have been closed. So if we want the pipes to be discarded after all the piped-together programs (all the children in this case) have finished, we need to ensure that the parent doesn't have any related descriptors still open. This is another reason why I checked the value of both read_fd and write_fd; if I'd -always- called close(read_fd) and close(write_fd), I would've closed stdin! (see the note above).

    Code:
    dup2(read_fd, STDIN_FILENO);    //! make stdin act like read_fd? why not the other way around?
    Read the man page entry for dup2() again:

    int dup2(int oldfd, int newfd);

    dup2 makes newfd be the copy of oldfd, closing newfd first if necessary.
    We want file descriptor 0 to refer to the read-end of some pipe, instead of stdin. So with dup2() we close() it, and make it point to the same read-end of the pipe as read_fd does.
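
    If you want to see that behaviour in isolation, here's a small self-contained sketch: it puts a message into a pipe, makes descriptor 0 refer to the pipe's read end with dup2(), and then reads the message back through descriptor 0. The function name read_through_stdin is just something I picked for the demo, and it assumes the message is small enough to fit in the pipe buffer.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg into a pipe, redirect stdin to the pipe's read end, and read
 * the data back via STDIN_FILENO. Returns the number of bytes read, or -1.
 * (Hypothetical demo helper; msg must fit in the pipe buffer.) */
ssize_t read_through_stdin(const char *msg, char *buf, size_t cap)
{
    int p[2];
    int saved = dup(STDIN_FILENO);     /* remember the real stdin (-1 if none) */
    ssize_t written;
    ssize_t n = -1;

    if (pipe(p) == -1) {
        if (saved != -1)
            close(saved);
        return -1;
    }

    written = write(p[1], msg, strlen(msg));
    close(p[1]);                       /* reader sees EOF after the message */

    if (p[0] != STDIN_FILENO) {
        dup2(p[0], STDIN_FILENO);      /* fd 0 now refers to the read end */
        close(p[0]);                   /* the original descriptor is no longer needed */
    }

    if (written >= 0)
        n = read(STDIN_FILENO, buf, cap);

    /* put the original stdin back */
    if (saved != -1) {
        dup2(saved, STDIN_FILENO);
        close(saved);
    } else {
        close(STDIN_FILENO);
    }
    return n;
}
```

    Calling it with "hello" and a 16-byte buffer should hand back the 5 bytes that were written, even though the code only ever reads from descriptor 0.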

    Hopefully this clarifies things a bit.
  12. #7
  13. Contributing User
    thanks for the excellent explanation :) i've got it now, i think...

    now i've got yet another question: now i want to redirect the output of each command, including the intermediate commands to a log file. for example, if i do a ls -l|grep sh|wc, the log file should contain the output from ls -l, grep sh and wc... strangely enough it works only partially...

    for a ls -l|grep sh|wc, i manage to get the output of ls -l but not that of grep sh and that of wc gives '0 0 0' when it should be something like '7 70 461'...

    i've posted the relevant code below... can you spot where i've gone wrong? i'd thought that simply closing the stdout before exec()'ing would do the trick but apparently not...

    Code:
    // snip
    else if ( pid == 0 )
    // child process: exec() command
    {
        /* duplicate stdin as a copy of read_fd, so that input is redirected from read_fd */
        if (read_fd)
        {
            dup2(read_fd, STDIN_FILENO);
        }
    
        /* duplicate stdout as a copy of write_fd, so that output is redirected to write_fd */
        if (write_fd)
        {
            dup2(write_fd, STDOUT_FILENO);
        }
    
        /* setup file descriptors for output logging if enabled */
        if ( logging == TRUE )
        {
            /* close stdout */
            close(STDOUT_FILENO);
    
            /* open output log, which gets file descriptor of 1, effectively replacing stdout */
            if ( (outlog_fd = open(OUTPUTLOG, O_WRONLY | O_CREAT | O_APPEND, 0644)) == -1 )
                perror("open");
        }
    
        /* execute command, searching for the executable file in the environment PATH variable if necessary */
        if ( execvp(command_argv[0], command_argv) == -1 )
        {
            printf("%s: command not found\n", command_argv[0]);
            exit(1);
        }
    }
  14. #8
  15. *bounce*
    Code:
        if ( logging == TRUE )
        {
            /* close stdout */
            close(STDOUT_FILENO);
    
            /* open output log, which gets file descriptor of 1, effectively replacing stdout */
            if ( (outlog_fd = open(OUTPUTLOG, O_WRONLY | O_CREAT | O_APPEND, 0644)) == -1 )
                perror("open");
        }
    Ok, I'm assuming you want to capture the output as well as send it through the pipe, right?

    That means you'll have to write some code that duplicates a stream and redirects it elsewhere:

    Code:
    program         +------> to pipe
     output  >------+
                    +------> to logfile
    Also, if you want to log all output, do you mean to say you want it all in the same file? Because that will get ugly, since you can't predict how the different streams will be mixed; you could get a couple of lines from the first program, then some from the second, then more from the first, and then some from the third; you'd end up with garbage!

    So if you really -do- want to log things, you'd be better off creating one logfile per process.
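
    The stream-duplicating step from the diagram could look roughly like this in C. It's only a sketch under my own assumptions: copy_to_both and write_all are hypothetical names, and I'm assuming your shell would run this in an extra child sitting between two stages, with in_fd being the read end of one pipe, out_fd the write end of the next, and log_fd a descriptor you got from open().

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to BOTH out_fd and log_fd until EOF.
 * Returns 0 on EOF, -1 on a read or write error. */
int copy_to_both(int in_fd, int out_fd, int log_fd)
{
    char buf[4096];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                  /* EOF: the writer closed its end */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0 ||
            write_all(log_fd, buf, (size_t)n) < 0)
            return -1;
    }
}
```

    This is essentially what tee does for a single file, so it doesn't depend on any external executable being installed.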

    Anyway, if you're still interested in logging all the different outputs, you might consider tee. It basically reads from stdin and writes to stdout, just like cat, but you can also instruct it to copy the stream to one or more files:

    Code:
    ls -l | tee ls.log
    will write the expected output to stdout, but also print it to ls.log.

    You can use that for logging, simply by adding tee commands to your list of commands. I'm not sure what data structures you're using to store the parsed command-line, but the idea is this. Take a command-line:

    Code:
    ls -l | grep sh | wc
    Now, if logging is on, simply pretend that you really got this command line:

    Code:
     ls -l | tee ls.log | grep sh | tee grep.log | wc | tee wc.log
    (obviously you'd have to come up with better names for log files, since a command can be used more than once in a command line.)

    So basically, after parsing the command line, and after noticing that logging is enabled, just stuff the necessary tee commands into the required data structures and you're set.
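
    Since I don't know your data structures, here's a sketch that simply assumes each stage of the pipeline is stored as one command string. add_tee_stages and the stageN.log naming are made up for the example; numbering the log files by position avoids the clash you'd get when the same command appears twice.

```c
#include <stdio.h>
#include <stdlib.h>

#define TEE_CMD_MAX 40   /* enough for "tee stage<number>.log" */

/* Given n pipeline stages, build a new NULL-terminated list in which every
 * stage is followed by a "tee stage<i>.log" stage. The original strings are
 * reused, the tee strings are malloc'ed. Returns NULL on allocation failure.
 * (Hypothetical helper; one string per stage is an assumption.) */
char **add_tee_stages(char *const stages[], size_t n, size_t *out_n)
{
    char **out = malloc((2 * n + 1) * sizeof *out);
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        char *tee = malloc(TEE_CMD_MAX);
        if (tee == NULL) {
            /* free the tee strings created so far */
            while (i-- > 0)
                free(out[2 * i + 1]);
            free(out);
            return NULL;
        }
        snprintf(tee, TEE_CMD_MAX, "tee stage%zu.log", i);
        out[2 * i] = stages[i];
        out[2 * i + 1] = tee;
    }
    out[2 * n] = NULL;
    *out_n = 2 * n;
    return out;
}
```

    For "ls -l | grep sh | wc" this gives six stages, and the rest of the shell can run them exactly as if the user had typed the longer command line.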

    Alternatively, you could use different parsing functions depending on whether you want logging or not.

    Good luck!
  16. #9
  17. Contributing User
    unfortunately, i can't depend on the tee executable being there... i guess i'll have to figure out how to redirect the output to 2 streams at once..

    thanks for all your help :) appreciate it
