#1
  1. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    39
    Rep Power
    2

    Problems handling huge hex data


    Hello,
    I am operating on a huge .dat file (110165103 characters after conversion to ASCII).
    This is the code I have written so far.

    My problem :
    1. When I write the contents to $concatenatedstring and print its contents, I am able to print the entire string. However, the commented-out code results in "Out of memory!".
    a) Why is that? The while loop runs char by char, but isn't printing also done char by char?
    b) And if it's related to storage, then even at the time of printing the string, the variable $concatenatedstring holds the entire string, correct?

    2. My main requirement is to navigate to some particular byte count and get some data. I had initially thought of converting the entire hex file to ASCII using unpack and then parsing the string. But since that doesn't work, can you suggest something? Would you suggest parsing in hex itself, and if so, can you point me to a link about that?

    Please advise,
    Thanks.

    Code:
    #!/usr/local/bin/perl
    
    open (FILE,'myhexfile.dat') or die "could not open file";
    
    $count = 0;
    $charcount = 0;
    $linecount = 0;
    my $concatenatedstring;
    
    while($line = <FILE>)
    {
     $stringthisline = unpack "H*",$line;
     $concatenatedstring = $concatenatedstring.$stringthisline;
     $linecount++;
    }
    print "total lines read == [$linecount]\n";
    print "concatenatedstring == [$concatenatedstring]\n";
    print "finished printing the string";
    
    q{
    my @chars = split(//,$concatenatedstring);
    foreach $char(@chars)
    {
     $charcount++;
     print "total chars read so far == [$charcount]\n";
    }}
  2. #2
  3. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,262
    Rep Power
    1810
    1. When I write the contents to $concatenatedstring and print its contents, I am able to print the entire string. However, the commented-out code results in "Out of memory!".
    a) Why is that?
    $concatenatedstring contains the entire, very large file.

    Code:
    my @chars = split(//,$concatenatedstring);
    Now, @chars holds the entire file, along with the overhead of the array and the string components inside of it, AND $concatenatedstring still contains the entire, very large file. You have more than doubled the storage requirements of the program.

    A few other things: $. is a special Perl variable that holds the current line number of the last filehandle read, so there's no need to count that manually. Also, rather than splitting $concatenatedstring to count the number of characters in it, you could just ask for length(); better still, add up the size as you go in the first while loop.

    But in a data file, what is a line? Do lines even really exist, or is it just one big block of data? Just because some bytes happen to match the newline character doesn't mean they were intended as ASCII or UTF-8 newlines denoting the end of a line of text.

    The only relevant link I can find at the moment is this similar question from StackOverflow. You can tell unpack to skip a certain number of bytes.
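    A minimal sketch of those two suggestions ($. for the line count, adding up length() as you go instead of splitting), using a small throwaway file in place of the real .dat — the filename and contents are just illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build a tiny sample file so the sketch is self-contained
open my $out, '>', 'sample.dat' or die "could not write: $!";
print $out "ABC\nDE\n";
close $out;

open my $fh, '<', 'sample.dat' or die "could not open: $!";
my $hexlen = 0;
while (my $line = <$fh>) {
    # accumulate the converted length as we go;
    # no need to keep the whole converted string around
    $hexlen += length(unpack "H*", $line);
}
# $. holds the line number of the last line read from $fh
print "total lines read == [$.]\n";
print "total hex chars  == [$hexlen]\n";
close $fh;
```

    Note that $. is reset when the filehandle is explicitly closed, so read it before the close.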
    Last edited by keath; June 1st, 2013 at 09:14 AM.
  4. #3
  5. !~ /m$/
    Devshed Specialist (4000 - 4499 posts)

    Join Date
    May 2004
    Location
    Reno, NV
    Posts
    4,262
    Rep Power
    1810
    One possibility would be to try Perl's read function.

    Since the file is large and you only need some part of it, you could use read with an offset to put just a small part of the file into a scalar and work with that.

    There might be a good example here: http://www.cs.cf.ac.uk/Dave/PERL/node73.html
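    A sketch of that idea — seek() to a byte offset and read() just a slice into a scalar; the filename, offset, and sample bytes here are only illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# generate a small binary file standing in for the real data
open my $out, '>:raw', 'myhexfile.dat' or die "could not write: $!";
print $out pack("H*", "deadbeefcafebabe");   # 8 bytes
close $out;

open my $fh, '<:raw', 'myhexfile.dat' or die "could not open: $!";
seek $fh, 4, 0;                  # jump to byte offset 4 (whence 0 = from start)
read $fh, my $buf, 4             # pull just 4 bytes into a scalar
    or die "read failed: $!";
my $hex = unpack "H*", $buf;     # convert only that slice to hex text
print "bytes 4..7 as hex == [$hex]\n";
close $fh;
```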
    Last edited by keath; June 1st, 2013 at 09:41 AM.
  6. #4
  7. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,940
    Rep Power
    1225
    What does "huge" mean to you? How big is the file in megabytes?

    110165103 characters is about 100 MB, which is not huge in my mind.

    Your script is missing two very important use statements.
    Code:
    use strict;
    use warnings;
    Add those and fix the problems that they point out.
  8. #5
  9. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    39
    Rep Power
    2
    Originally Posted by keath
    Since the file is large, and you only need some part of it, you could use read with offset to put just a small part of the file into a scalar and work with that.
    Originally Posted by FishMonger
    What does "huge" mean to you? How big is the file in terms of Mega Bytes?
    110165103 is about 100MB which is not huge in my mind.
    Thanks for the replies.
    I am just digging into read, seek, tell etc and trying to do it using the offset method.

    But isn't there any way I can read the contents of my .dat file -> convert it to ASCII -> and then store it in a variable in my Perl program? My .dat file is 55 MB. I ask because I would still rather parse a text file than a .dat file.
  10. #6
  11. No Profile Picture
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Apr 2009
    Posts
    1,940
    Rep Power
    1225
    But isn't there any way I can read the contents of my .dat file -> convert it to ASCII -> and then store it in a variable in my Perl program?
    Sure, use the read function to read in 4k of data at a time, convert it, and append it to a var. Each iteration of the loop will clear out the old data in the var that holds the hex data rather than appending to it, as you're currently doing. You'll end up with one var that holds the entire converted data.
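    That loop might look something like this sketch (the filename and sample bytes are placeholders): the raw chunk buffer is overwritten on every pass of read(), while the converted text accumulates in a second variable:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# small sample file standing in for the real .dat
open my $out, '>:raw', 'myhexfile.dat' or die "could not write: $!";
print $out "\x01\x02" x 10;    # 20 bytes
close $out;

open my $fh, '<:raw', 'myhexfile.dat' or die "could not open: $!";
my $ascii = '';
while (read $fh, my $buf, 4096) {
    # $buf is replaced on each iteration; only the converted
    # text keeps growing in $ascii
    $ascii .= unpack "H*", $buf;
}
close $fh;
print "converted length == [", length($ascii), "]\n";
```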
  12. #7
  13. No Profile Picture
    Contributing User
    Devshed Novice (500 - 999 posts)

    Join Date
    Jun 2012
    Posts
    836
    Rep Power
    496
    In addition to all that has already been said, you should probably use the read and seek functions to read a binary file. The <> diamond operator is not really suited to binary data (although it might work in some cases). I did this recently on a binary file, and it worked perfectly.

    55 MB is certainly not huge, just relatively big.

    Your best approach may be to read your binary file in chunks of, say, 10 kB or 100 kB, unpack those chunks to ASCII, and write the result to another file. Then you can work with the temporary file.
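    A sketch of that chunked conversion, with assumed filenames, chunk size, and sample bytes: unpack each chunk and stream the hex text to a temporary file instead of holding everything in memory:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sample input standing in for the 55 MB .dat file
open my $out, '>:raw', 'input.dat' or die "could not write: $!";
print $out "\xAB\xCD" x 8;    # 16 bytes
close $out;

open my $in,  '<:raw', 'input.dat' or die "could not open: $!";
open my $tmp, '>',     'input.hex' or die "could not write: $!";
while (read $in, my $chunk, 10 * 1024) {   # 10 kB chunks
    print $tmp unpack "H*", $chunk;        # stream hex text out
}
close $in;
close $tmp;

# later, parse the temporary text file instead of the binary one
open my $txt, '<', 'input.hex' or die "could not open: $!";
my $hex = <$txt>;
close $txt;
print "hex file holds ", length($hex), " chars\n";
```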
  14. #8
  15. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    39
    Rep Power
    2
    Originally Posted by FishMonger
    Sure, use the read function to read-in 4k of data at a time and convert it and append it to a var. Each iteration of the loop will clear out the old data in the var that holds the hex data rather than appending to it as you're currently doing. You'll end up with 1 var that holds the entire converted data.
    FishMonger, I shall put in some effort and get back to you.
    But could you please explain the last line once again? You said, "Each iteration of the loop will clear out the old data in the var that holds the hex data", but also, "You'll end up with one var that holds the entire converted data".

    I couldn't understand this clearly. If the old data is cleared from the var, how do we end up with the var containing the entire converted data?
  16. #9
  17. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2013
    Posts
    39
    Rep Power
    2
    Originally Posted by keath
    you could use read with offset to put just a small part of the file into a scalar and work with that.
    Originally Posted by FishMonger
    Sure, use the read function
    Originally Posted by Laurent_R
    In addition to all what has already been said, you should probably use the read and seek functions to read a binary file.
    Thanks a lot keath, FishMonger, and Laurent_R for your replies. I got it done using read, seek, and unpack for converting and printing, as all of you suggested.


    Originally Posted by Laurent_R
    55 MB is certainly not huge, but just relatively big.
    Absolutely. The mistake I was making initially was reading in the entire hex file, converting it to ASCII, and then trying to parse the ASCII. That resulted in "Out of memory!" and made me feel the file was huge. But now that I'm using seek and read, I too feel the size is not that big. Thanks once again.
