    #1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    8
    Rep Power
    0

    Extract directory structure from output of 7za l command


    Hi All,

    I am trying to parse the output of 7za l (archive listing) to get the subdirectories. The output for one zip file is shown below:


    7-Zip (A) [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
    p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)

    Listing archive: 093624942528_01_DDP.zip

    --
    Path = 093624942528_01_DDP.zip
    Type = zip
    Physical Size = 694397157

    Date Time Attr Size Compressed Name
    ------------------- ----- ------------ ------------ ------------------------
    2006-01-17 12:30:18 ..... 128 128 093624942528_01_DDP/DDPID
    2006-01-17 20:30:18 ..... 384 384 093624942528_01_DDP/DDPMS
    2006-01-17 20:30:18 ..... 2368 2368 093624942528_01_DDP/DDPPQ
    2006-01-17 20:34:20 ..... 666291024 666291024 093624942528_01_DDP/IC01.TRK
    2006-01-17 20:34:30 ..... 24023040 24023040 093624942528_01_DDP/IC02.TRK
    2006-01-17 20:34:32 ..... 1267841 1267841 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] ISRC.TIF
    2006-01-17 20:34:32 ..... 274944 274944 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] Master QA Report.doc
    2006-01-17 20:34:32 ..... 1267841 1267841 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] PQ 1.TIF
    2006-01-17 20:34:32 ..... 1267841 1267841 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] PQ 2.TIF
    ------------------- ----- ------------ ------------ ------------------------
    694395411 694395411 9 files, 0 folders



    I want to parse the "Name" column above to get the directory list without extracting the archive. Please help me do this in Perl.


    Thanks
    VKC
    Last edited by vkchaitanya; August 11th, 2012 at 04:09 AM. Reason: more information
    #2
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    Hi,

    there are numerous ways to do this.

    An example:

    Perl Code:
    my $skip = 1; # a "flip-flop" Boolean that will enable to skip the introductory and concluding lines
    while (<INPUT>) {
         $skip = not $skip if /\-{5}/; # setting $skip to false when you reach the first line containing "-----", and back to true when you reach the second line containing "-----"
         next if $skip or /\-{5}/;
         my @splitted_line = split; # splits $_ (input line) on spaces
         my $name = pop @splitted_line;
         print "$name \n";
    }


    Using your output as input to the code above, this prints:

    Code:
    093624942528_01_DDP/DDPID
    093624942528_01_DDP/DDPMS
    093624942528_01_DDP/DDPPQ
    093624942528_01_DDP/IC01.TRK
    093624942528_01_DDP/IC02.TRK
    ISRC.TIF
    Report.doc
    1.TIF
    2.TIF
    Well, looking back at your data and my output, I see that this is not quite right: I had not noticed that your file names sometimes contain spaces, in which case the code above picks up only the last word of the file name.

    The
    Perl Code:
    my $name = pop @splitted_line;

    line needs to be changed. You need to remove the first 5 fields of the @splitted_line array (date, time, attributes, size and compressed size) and then join the rest. You can do that with the splice and join functions. This is fairly easy, so I leave it to you to change that.

    Don't hesitate to ask if you don't succeed.
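
    A minimal sketch of a related approach, assuming every listing line has five whitespace-separated fields before the name, as in the sample above. Instead of splice and join, it passes a field limit to split, so the sixth field keeps any spaces that occur in the name. The sub name name_from_line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the Name column of one "7za l" listing line.
# Assumes five whitespace-separated fields (date, time, attr, size,
# compressed) come before the name, as in the sample listing.
sub name_from_line {
    my ($line) = @_;
    chomp $line;
    # A limit of 6 stops splitting after five fields, so the sixth
    # element is the rest of the line, spaces included.
    my @fields = split ' ', $line, 6;
    return $fields[5];
}

my @sample = (
    '2006-01-17 20:34:30 ..... 24023040 24023040 093624942528_01_DDP/IC02.TRK',
    '2006-01-17 20:34:32 ..... 1267841 1267841 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] PQ 1.TIF',
);
print name_from_line($_), "\n" for @sample;
```

    Unlike split followed by join, this does not collapse runs of spaces inside a file name into a single space.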
    #3
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    8
    Rep Power
    0
    Hi,

    Please help me get all the columns (the output looks like that of ls -l). I'm a newbie in Perl, please help.
    #4
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    Well, I have also been a newbie in Perl, just as everyone else on this forum. But if I can help you now, it is because I once decided to read tutorials or books on the subject and to start writing Perl code without asking someone else to do it for me.

    You should probably try to code by yourself.

    I give you the solution below to help you, but I am not really sure I'm doing you a favor.

    Perl Code:
    open INPUT, "<", "zip_input.txt" or die "could not open zip_input.txt $! \n" ;
    my $skip = 1; # a "flip-flop" Boolean that will enable to skip the introductory and concluding lines
    while (<INPUT>) {
         $skip = not $skip if /\-{5}/; # setting $skip to false when you reach the first line containing "-----", and back to true when you reach the second line containing "-----"
         next if $skip or /\-{5}/;
         my @splitted_line = split; # splits $_ (input line) on spaces
         splice @splitted_line, 0, 5;
         $name = join ' ', @splitted_line;
         print "$name \n";
    }


    This gives this output:

    Code:
    093624942528_01_DDP/DDPID
    093624942528_01_DDP/DDPMS
    093624942528_01_DDP/DDPPQ
    093624942528_01_DDP/IC01.TRK
    093624942528_01_DDP/IC02.TRK
    093624942528_01_DDP/One Tree Hill 2-49425 [ECD] ISRC.TIF
    093624942528_01_DDP/One Tree Hill 2-49425 [ECD] Master QA Report.doc
    093624942528_01_DDP/One Tree Hill 2-49425 [ECD] PQ 1.TIF
    093624942528_01_DDP/One Tree Hill 2-49425 [ECD] PQ 2.TIF
    I understand that is what you wanted, just the file names and path.

    Or did I misunderstand?

    If you want to print the full lines, this is far shorter and easier:

    Perl Code:
    open INPUT, "<", "zip_input.txt" or die "could not open zip_input.txt $! \n" ;
    my $skip = 1; # a "flip-flop" Boolean that will enable to skip the introductory and concluding lines
    while (<INPUT>) {
         $skip = not $skip if /\-{5}/; # setting $skip to false when you reach the first line containing "-----", and back to true when you reach the second line containing "-----"
         print unless $skip or /\-{5}/;
    }


    Which produces:

    Code:
    2006-01-17 12:30:18 ..... 128 128 093624942528_01_DDP/DDPID
    2006-01-17 20:30:18 ..... 384 384 093624942528_01_DDP/DDPMS
    2006-01-17 20:30:18 ..... 2368 2368 093624942528_01_DDP/DDPPQ
    2006-01-17 20:34:20 ..... 666291024 666291024 093624942528_01_DDP/IC01.TRK
    2006-01-17 20:34:30 ..... 24023040 24023040 093624942528_01_DDP/IC02.TRK
    2006-01-17 20:34:32 ..... 1267841 1267841 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] ISRC.TIF
    2006-01-17 20:34:32 ..... 274944 274944 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] Master QA Report.doc
    2006-01-17 20:34:32 ..... 1267841 1267841 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] PQ 1.TIF
    2006-01-17 20:34:32 ..... 1267841 1267841 093624942528_01_DDP/One Tree Hill 2-49425 [ECD] PQ 2.TIF

    Comments on this post

    • keath agrees : Just have to make a little effort.
    #5
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    8
    Rep Power
    0
    Thank you, that was good.

    Last question: how would I add to your reputation? It shows 0 in the "add to reputation" drop-down.
    Last edited by vkchaitanya; August 11th, 2012 at 06:46 AM. Reason: rating the reply
    #6
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    I have no idea how this reputation system works.

    Thanks for trying anyway.
    #7
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,940
    Rep Power
    1225
    Code:
    open INPUT, "<", "zip_input.txt" or die "could not open zip_input.txt $! \n" ;
    my $skip = 1; # a "flip-flop" Boolean that will enable to skip the introductory and concluding lines
    while (<INPUT>) {
         $skip = not $skip if /\-{5}/; # setting $skip to false when you reach the first line containing "-----", and back to true when you reach the second line containing "-----"
         next if $skip or /\-{5}/;
         my @splitted_line = split; # splits $_ (input line) on spaces
         splice @splitted_line, 0, 5;
         $name = join ' ', @splitted_line;
         print "$name \n";
    }
    If you don't mind a little code critique.

    Using the 3-arg form of open is good, but the use of a bareword filehandle is considered outdated/deprecated, with the exception of the few built-in filehandles (STDIN, STDOUT, STDERR).

    It is preferable to use a var for the filename to catch possible typos when used in multiple places.

    One of the current community best practices is to use a lexical var in loops rather than the default global $_ var. However, in this case, since the loop is so small, using $_ would be "ok".

    Instead of using a $skip var for a "flip-flop" Boolean, you could simply use Perl's built-in .. flip-flop/range operator.

    Jumping through the hoops of split/splice/join is unnecessary and very inefficient in this case especially when all that's needed is a simple print statement.

    So, this is how it would look when using the flip-flop operator.
    Code:
    my $file = 'zip_input.txt';
    open my $zip_fh, '<', $file or die "could not open '$file' <$!>";
    
    while (<$zip_fh>) {
        if ( /\d+ [.]{5} \d+/ .. /-+ -{5} -+/ ) {
            print unless /-+ -{5} -+/;
        }
    }
    close $zip_fh;
    Since the flip-flop operator is not even needed in this case, I'd reduce that while loop to this:
    Code:
    while (<$zip_fh>) {
        print if /\d+ [.]{5} \d+/;
    }
    #8
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    8
    Rep Power
    0
    Thanks. How do I extract the path up to the first subdirectory? I tried different patterns but could not get it to work, please help.

    I should be able to extract 075678342424_01_ODP/075678342424_AEC in order to get the code "AEC", i.e. the last 3 letters of that directory name. Please help.
    #9
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,940
    Rep Power
    1225
    You don't have any lines in your sample data like that, so I'm not sure what you want to extract.
    #10
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    8
    Rep Power
    0
    Please find the data below:

    Date Time Attr Size Compressed Name
    ------------------- ----- ------------ ------------ ------------------------
    2012-03-23 12:22:08 D.... 0 0 075678342424_01_ODP
    2011-10-17 12:04:14 D.... 0 0 075678342424_01_ODP/075678342424_AEC
    2011-10-05 13:18:24 ..... 6148 178 075678342424_01_ODP/075678342424_AEC/.DS_Store
    2012-03-23 17:48:54 D.... 0 0 __MACOSX
    2012-03-23 17:48:54 D.... 0 0 __MACOSX/075678342424_01_ODP
    2012-03-23 17:48:54 D.... 0 0 __MACOSX/075678342424_01_ODP/075678342424_AEC
    2011-10-05 13:18:24 ..... 82 40 __MACOSX/075678342424_01_ODP/075678342424_AEC/._.DS_Store
    2011-10-17 12:04:14 ..... 12445 3545 075678342424_01_ODP/075678342424_AEC/075678342424.eps
    2011-10-17 12:04:14 ..... 49326 3808 __MACOSX/075678342424_01_ODP/075678342424_AEC/._075678342424.eps
    2011-10-14 11:02:14 D.... 0 0 075678342424_01_ODP/075678342424_AEC/075678342424_booklet
    2011-10-10 17:10:10 ..... 6471105 6327034 075678342424_01_ODP/075678342424_AEC/075678342424_booklet/075678342424_booklet.pdf
    2012-03-23 17:48:54 D.... 0 0 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_booklet
    2011-10-10 17:10:10 ..... 82 47 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_booklet/._075678342424_booklet.pdf
    2011-10-28 16:13:16 D.... 0 0 075678342424_01_ODP/075678342424_AEC/075678342424_cover
    2011-10-28 16:13:16 ..... 6416264 6121768 075678342424_01_ODP/075678342424_AEC/075678342424_cover/075678342424.tif
    2012-03-23 17:48:56 D.... 0 0 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_cover
    2011-10-28 16:13:16 ..... 117556 98796 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_cover/._075678342424.tif
    2011-10-14 11:02:16 D.... 0 0 075678342424_01_ODP/075678342424_AEC/075678342424_disc
    2011-10-05 17:13:44 ..... 3773060 446198 075678342424_01_ODP/075678342424_AEC/075678342424_disc/075678342424_disc.bmp
    2012-03-23 17:48:56 D.... 0 0 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_disc
    2011-10-05 17:13:44 ..... 198603 85486 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_disc/._075678342424_disc.bmp
    2011-10-14 11:02:18 D.... 0 0 075678342424_01_ODP/075678342424_AEC/075678342424_tray
    2011-10-05 15:04:26 ..... 1430703 1353361 075678342424_01_ODP/075678342424_AEC/075678342424_tray/075678342424_tray.pdf
    2012-03-23 17:48:56 D.... 0 0 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_tray
    2011-10-05 15:04:26 ..... 82 47 __MACOSX/075678342424_01_ODP/075678342424_AEC/075678342424_tray/._075678342424_tray.pdf
    ------------------- ----- ------------ ------------ ------------------------
    18475456 14440308 12 files, 13 folders
    #11
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    Originally Posted by FishMonger
    If you don't mind a little code critique.
    You are certainly welcome to do it, although I do not necessarily write my own code the same way as I post a quick example on a forum, just as I don't code a Perl one-liner, a use-only-once 10-line script and a full-fledged real program the same way. That partly answers your remark on file handles in particular: here we have only one, used only once in the script, and to me that is good enough in such a case. But I perfectly understand your point, and I do use vars for file handles when I open multiple files and read from them or write to them in multiple places in a program of several hundred lines or more.

    Originally Posted by FishMonger
    One of the current community best practices is to use a lexical var in loops rather than the default global $_ var. However, in this case, since the loop is so small, using $_ would be "ok".
    Also agreed, but here, using $_ makes it possible to simplify the syntax of the lines that come immediately after (the two regexes and the split). Again, it is a quick solution, not a full-fledged program.

    Originally Posted by FishMonger
    Instead of using a $skip var for a "flip-flop" Boolean, you could simply use Perl's built-in .. flip-flop/range operator.
    Yes, it can be done, but I don't find your syntax to be clearer than mine.

    Originally Posted by FishMonger
    Jumping through the hoops of split/splice/join is unnecessary and very inefficient in this case especially when all that's needed is a simple print statement.
    I used split/splice/join in the original code because I understood that the original poster wanted only the file name. I did not use it in the code where I just wanted to print the whole line.

    Originally Posted by FishMonger
    Since the flip-flop operator is not even needed in this case, I'd reduce that while loop to this:
    Code:
    while (<$zip_fh>) {
        print if /\d+ [.]{5} \d+/;
    }
    I did not want to use a regular expression on "....." (the file attributes) because I did not think it was reliable: there may be cases where some of the dots are replaced by something else.

    And it appears, from the new data posted by the original poster, that this is indeed the case: directories have the string "D...." instead of ".....".

    To the original poster: either use the last code snippet I posted, which will work on your newly posted data, or change FishMonger's proposal to:

    Perl Code:
    while (<$zip_fh>) {
        print if /\d+ [D.]{5} \d+/;
    }
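
    Along the same lines, here is a sketch that keeps only the directory entries, which is closer to the original question about the directory structure. It assumes the Attr column is always five characters made of capital letters and dots, with a leading D marking a directory, as in the data posted above. The sub name dir_from_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the path if the listing line describes a
# directory (Attr column starting with "D"), otherwise returns nothing.
sub dir_from_line {
    my ($line) = @_;
    # date, time, attr starting with D, size, compressed size, then the name
    if ($line =~ /^\s*\S+\s+\S+\s+D[A-Z.]{4}\s+\d+\s+\d+\s+(.+?)\s*$/) {
        return $1;
    }
    return;
}

my @sample = (
    '2011-10-17 12:04:14 D.... 0 0 075678342424_01_ODP/075678342424_AEC',
    '2011-10-05 13:18:24 ..... 6148 178 075678342424_01_ODP/075678342424_AEC/.DS_Store',
);
for my $line (@sample) {
    my $dir = dir_from_line($line);
    print "$dir\n" if defined $dir;
}
```

    Header, ruler and summary lines do not match the pattern either, so no separate skip flag is needed in this variant.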
    Last edited by Laurent_R; August 11th, 2012 at 12:05 PM.
    #12
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    8
    Rep Power
    0
    Hi Laurent,

    This prints the entire structure. Now, from here, I want to get the code, like AEC (or any other code), in my second example. I want to extract that directory, starting from the root. How do I traverse the list or get the subdirectory that carries this code (AEC, AMZ, etc.)?


    I tried it this way:

    open INPUT, "<", "command.txt" or die "could not open command.txt $! \n" ;
    my $skip = 1; # a "flip-flop" Boolean that will enable to skip the introductory and concluding lines
    while (<INPUT>) {
    $skip = not $skip if /\-{5}/; # setting $skip to false when you reach the first line containing "-----", and back to

    #true when you reach the second line containing "-----"
    next if $skip or /\-{5}/;
    my @splitted_line = split; # splits $_ (input line) on spaces
    splice @splitted_line, 0, 5;
    #$name = join ' ', @splitted_line;
    #print "$name \n";
    foreach $i(@splitted_line) {
    if ($i= ~ /AEC/ ) {
    Print $i."\n";
    }
    }

    }
    Last edited by vkchaitanya; August 11th, 2012 at 09:02 PM. Reason: my attempt
    #13
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2012
    Posts
    8
    Rep Power
    0
    I also tried it this way, but I'm not getting anything from the print statement.


    open INPUT, "<", "command.txt" or die "could not open command.txt $! \n" ;
    my $skip = 1; # a "flip-flop" Boolean that will enable to skip the introductory and concluding lines
    while (<INPUT>) {
    my $count =0;
    my @dir_name;
    $skip = not $skip if /\-{5}/; # setting $skip to false when you reach the first line containing "-----", and back to

    #true when you reach the second line containing "-----"
    next if $skip or /\-{5}/;
    my @splitted_line = split; # splits $_ (input line) on spaces
    splice @splitted_line, 0, 5;
    $name = join ' ', @splitted_line;
    $dir_name[$count] = $name;
    $count = $count +1;
    }

    foreach $i(@dir_name) {
    print $i;
    }
    #14
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    I see two errors in your first attempt (Post # 12):

    Perl Code:
    if ($i= ~ /AEC/ ) {
    Print $i."\n";


    Remove the space between the = and the ~; whitespace rarely matters in Perl, but here it does: with the space, Perl parses the expression as an assignment of ~ /AEC/ to $i instead of a pattern match. Second, make the first letter of print lower case. Something like this:

    Perl Code:
    foreach $i (@splitted_line) {
         if ($i =~ /AEC/) {
              print $i."\n";
         }
    }
    # ...



    It seems to me that this will do what you want, i.e. print the file names and directories containing AEC (if I understood your requirement correctly).
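
    If the goal is the code itself rather than every matching name, one possible sketch is below. It assumes the layout of the second listing, where the first-level subdirectory is named <digits>_<CODE> (for example 075678342424_AEC). The sub name code_from_path is made up for illustration, and the __MACOSX copies are skipped only because their second path component does not have that form.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the three-letter code of the first-level
# subdirectory, e.g. "AEC" for "075678342424_01_ODP/075678342424_AEC/...".
# Assumes that subdirectory is named <digits>_<CODE>.
sub code_from_path {
    my ($path) = @_;
    return $1 if $path =~ m{^[^/]+/\d+_([A-Z]{3})(?:/|\z)};
    return;
}

my @names = (
    '075678342424_01_ODP',
    '075678342424_01_ODP/075678342424_AEC',
    '075678342424_01_ODP/075678342424_AEC/075678342424_cover/075678342424.tif',
    '__MACOSX/075678342424_01_ODP/075678342424_AEC',
);

# Print each code only once, however many entries sit below it.
my %seen;
for my $name (@names) {
    my $code = code_from_path($name);
    next unless defined $code;
    print "$code\n" unless $seen{$code}++;
}
```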
    Last edited by Laurent_R; August 12th, 2012 at 04:15 AM.
    #15
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    Hi,

    Your second attempt (Post # 13) seems to be trying to do something else: it looks like you are trying to print every file name and directory path in the list.

    One error is that $count is reset to 0 at each iteration of the while loop, so that you put each $name into the first element of the array, each time overwriting what you had put there previously. The second error is that the @dir_name array is declared within the while loop, so it is lexically scoped to the body of that loop. When you exit the loop and start your foreach loop, that array is no longer in scope; without use strict, the name @dir_name there silently refers to a different, empty package variable, which is why nothing is printed. So, you would have to declare the @dir_name array and the $count var before entering the while loop. I think this would work properly (but I haven't tried these changes; there may be something else wrong that I did not see).
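
    A sketch of those two corrections, assuming the same line layout as before. The listing is passed in as a list of lines here only to keep the example self-contained, and the sub name collect_names is hypothetical. With use strict enabled, the original version would have been rejected at compile time, because @dir_name is not declared where the foreach loop uses it.

```perl
use strict;
use warnings;

# Collects the Name column of every entry between the two "-----" rulers.
# The array is declared before the loop, so it is still in scope afterwards,
# and push removes the need for a separate counter.
sub collect_names {
    my @lines = @_;
    my @dir_name;
    my $skip = 1;
    for my $line (@lines) {
        $skip = !$skip if $line =~ /-{5}/;
        next if $skip or $line =~ /-{5}/;
        my @fields = split ' ', $line;
        splice @fields, 0, 5;    # drop date, time, attr, size, compressed
        push @dir_name, join ' ', @fields;
    }
    return @dir_name;
}

my @listing = (
    'Date Time Attr Size Compressed Name',
    '------------------- ----- ------------ ------------ ------------------------',
    '2012-03-23 12:22:08 D.... 0 0 075678342424_01_ODP',
    '2011-10-17 12:04:14 D.... 0 0 075678342424_01_ODP/075678342424_AEC',
    '------------------- ----- ------------ ------------ ------------------------',
    '18475456 14440308 12 files, 13 folders',
);
print "$_\n" for collect_names(@listing);
```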

    But, in fact, you don't really need to store your values in the @dir_name array. It is simpler to change this:

    Perl Code:
    #...
         splice @splitted_line, 0, 5;
         $name = join ' ', @splitted_line;
         $dir_name[$count] = $name;
         $count = $count +1;
    }
     
    foreach $i(@dir_name) {
         print $i;
    }


    To:

    Perl Code:
    #...
        splice @splitted_line, 0, 5;
        $name = join ' ', @splitted_line;
        print $name, "\n";
    }


    Actually, you don't even really need the $name var. So you could shorten it even more to:

    Perl Code:
    #...
        splice @splitted_line, 0, 5;
        print join ' ', @splitted_line, "\n";
    }
    Last edited by Laurent_R; August 12th, 2012 at 04:45 AM. Reason: Correcting a couple of typos