#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Mar 2013
    Posts
    1
    Rep Power
    0

    Non-programmer needs help converting a python script


    I have a pair of python scripts that work with an obscure type of uncompressed archive, called *.arc files. Arc files look like this:
    Code:
    Address : 00 01 02 03
    0x0000  : 02 00 00 00 // this is the number of subfiles minus 1
    0x0004  : 10 00 00 00 // this is the pointer to the first file
    0x0008  : 18 00 00 00 // this is the pointer to the second file
    0x000C  : 20 00 00 00 // this is the pointer to the third file
    0x0010  : 14 18 85 36 // this is the first file
    0x0014  : 98 17 22 95
    0x0018  : 00 11 FF 02 // this is the second file
    0x001C  : 00 00 00 00
    0x0020  : FF FF FF FF // this is the third file
    0x0024  : FF FF FF FF
    Notable features of *.arc files:
    * They are little-endian.
    * They begin with a 4 byte header which is the number of sub-files in the archive, minus one. This means that a value of 00 00 00 00 equals 1 subfile. This makes sense because obviously there will be at least one sub-file in the archive, so it starts at 0 instead of at 1.
    * Following the 4 byte header is a series of pointers. These pointers are also 4 bytes long.
    * That's it.

    I'm not a programmer, and I'm too busy taking liberal arts requirements to teach myself programming right now (I will take some CS classes next fall though). I want someone to point out the parts of the python scripts that delineate the header and pointer structures, so I can change the python scripts to work with other types of uncompressed archives.

    Specifically, *.pep files. Pep files have the following features:
    * Little endian.
    * Doesn't have a header that tells you how many sub-files are present.
    * Always has 6 pointers. If the archive has fewer than 6 sub-files, then the unused pointers at the end of the table are zero.
    * Each pointer is two bytes long.
    * The fourth sub-file actually begins 4 bytes after the address listed in the pointer table, for some reason.

    As you can see, *.pep files are much simpler than *.arc files. I'll post the python scripts now.

    ****

    This script splits a *.arc file into its component sub-files. Consider the example arc file I posted earlier:
    Code:
    EXAMPLE.ARC
    Address : 00 01 02 03
    0x0000  : 02 00 00 00 // this is the header
    0x0004  : 10 00 00 00 // this is the pointer to the first file
    0x0008  : 18 00 00 00 // this is the pointer to the second file
    0x000C  : 20 00 00 00 // this is the pointer to the third file
    0x0010  : 14 18 85 36 // this is the first file
    0x0014  : 98 17 22 95
    0x0018  : 00 11 FF 02 // this is the second file
    0x001C  : 00 00 00 00
    0x0020  : FF FF FF FF // this is the third file
    0x0024  : FF FF FF FF
    This script will turn that arc file into the following files:

    Code:
    EXAMPLE.ARC.part1.bin
    0x0000  : 14 18 85 36 // this is the first file
    0x0004  : 98 17 22 95
    Code:
    EXAMPLE.ARC.part2.bin
    0x0000  : 00 11 FF 02 // this is the second file
    0x0004  : 00 00 00 00
    Code:
    EXAMPLE.ARC.part3.bin
    0x0000  : FF FF FF FF // this is the third file
    0x0004  : FF FF FF FF
    Here's the python script:
    Code:
    saga_split.py
    
    from struct import unpack
    from io import FileIO
    import sys,os,glob
    
    
    def split_file(fd):
        datafile = FileIO(fd,"r")
        num_pointers = unpack("<I",datafile.read(4))
        pointer_fmt = "<%dI"%num_pointers[0]
        pointers = unpack(pointer_fmt,datafile.read(num_pointers[0]*4))
        pointers+=(os.stat(datafile.name).st_size,)
        i = 0
        datafile.seek(pointers[0],0)
        while i < (num_pointers[0]):
       
            splitname = "%s_file%u.bin"%(datafile.name,i)
            with FileIO(splitname,"w") as splitfile:
                splitfile.write(datafile.read(pointers[i+1]-pointers[i]))
            i+=1
    
    for arg in sys.argv[1:]:
        for files in glob.glob(arg):
            split_file(files)
    ****

    The next script prints some info about a specified sub-file: the starting address of the sub-file within the arc file, the length of the sub-file, and the name of the arc file in which it is found.

    Code:
    saga_print.py
    
    from struct import unpack
    from io import FileIO
    import sys,os,glob
    
    
    def print_file(fd):
        datafile = FileIO(fd,"r")
        num_pointers = unpack("<I",datafile.read(4))
        pointer_fmt = "<%dI"%num_pointers[0]
        pointers = unpack(pointer_fmt,datafile.read(num_pointers[0]*4))
        pointers+=(os.stat(datafile.name).st_size,)
        print "%s\t0x%X\t%s"%(datafile.name,pointers[1],pointers[2]-pointers[1])
    if len(sys.argv) <2:
        print "no files given"
        sys.exit(0) 
    for arg in sys.argv[1:]:
        print "file\t\toffset\tlength"
        for files in glob.glob(arg):
           
            print_file(files)
    You can specify which sub-file you want this information on by changing the following line of the above script:
    Code:
        print "%s\t0x%X\t%s"%(datafile.name,pointers[1],pointers[2]-pointers[1])
    Change it as follows, replacing SUBFILE# with the number of the specific sub-file you want this info on (counting starts at 0, so pointers[0] belongs to the first sub-file):
    Code:
        print "%s\t0x%X\t%s"%(datafile.name,pointers[SUBFILE#],pointers[SUBFILE#+1]-pointers[SUBFILE#])
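
    The same substitution can be sketched with the sub-file number passed in as a parameter instead of edited into the line. This is only an illustration in Python 3 syntax: describe_subfile and example_pointers are made-up names, not part of the scripts above, and the pointer values are the ones from EXAMPLE.ARC with the total file size (0x28) tacked on the end.

```python
def describe_subfile(archive_name, pointers, index):
    """Format name, start offset (hex) and length of sub-file `index` (0-based)."""
    start = pointers[index]
    # The next entry in the table marks where this sub-file ends.
    length = pointers[index + 1] - start
    return "%s\t0x%X\t%s" % (archive_name, start, length)

# Pointer table of EXAMPLE.ARC plus its total size as the final entry.
example_pointers = (0x10, 0x18, 0x20, 0x28)

# Index 1 is the second sub-file, because counting starts at 0.
line = describe_subfile("EXAMPLE.ARC", example_pointers, 1)
print(line)
```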
  2. #2
  3. Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    May 2012
    Location
    39N 104.28W
    Posts
    158
    Rep Power
    3
    Originally Posted by xibalba
    Code:
    for arg in sys.argv[1:]:
        for files in glob.glob(arg):
            split_file(files)
    This just takes the arguments from the command line (file names or wildcard patterns such as *.arc, from the look of things) and calls the main work function, split_file, for each file that matches (that's the "glob" part).
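
    If the "glob" part is unclear, here is a small sketch of what glob.glob() does with a wildcard pattern. It uses a throwaway temporary folder and made-up file names (a.arc, b.arc, notes.txt), so nothing here comes from the original scripts, and it is written in Python 3 syntax.

```python
import glob
import os
import tempfile

# Create a scratch folder holding two .arc files and one unrelated file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.arc", "b.arc", "notes.txt"):
        open(os.path.join(folder, name), "wb").close()

    # glob.glob() expands the pattern into the list of paths that match it.
    pattern = os.path.join(folder, "*.arc")
    matches = sorted(os.path.basename(path) for path in glob.glob(pattern))

print(matches)
```

    So an argument like *.arc ends up calling split_file once per matching file, and a plain file name with no wildcard simply matches itself if the file exists.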

    Code:
    def split_file(fd):
        datafile = FileIO(fd,"r")
        num_pointers = unpack("<I",datafile.read(4))
        pointer_fmt = "<%dI"%num_pointers[0]
        pointers = unpack(pointer_fmt,datafile.read(num_pointers[0]*4))
        pointers+=(os.stat(datafile.name).st_size,)
        i = 0
        datafile.seek(pointers[0],0)
        while i < (num_pointers[0]):
       
            splitname = "%s_file%u.bin"%(datafile.name,i)
            with FileIO(splitname,"w") as splitfile:
                splitfile.write(datafile.read(pointers[i+1]-pointers[i]))
            i+=1
    So this is what you're asking about. It looks to me that you read the first 4 bytes (the header) of the presumptive .arc file,
    num_pointers = unpack("<I",datafile.read(4))
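    as a single little-endian unsigned 32-bit integer ("<" is little-endian, "I" is a 4-byte unsigned int); unpack() hands it back as a one-element tuple.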
    Then you use that integer (element [0] of the tuple) to build the format string,
    pointer_fmt = "<%dI"%num_pointers[0]
    Then you read the pointers themselves in that format (that many unsigned integers, 4 bytes each, hence the *4) into a tuple,
    pointers = unpack(pointer_fmt,datafile.read(num_pointers[0]*4))
    ...But if that's true, I don't understand this:
    pointers+=(os.stat(datafile.name).st_size,)
    anyway...
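
    For what it's worth, that last line most likely appends the total file size so that the final sub-file has an "end" address to subtract from. Here is a rough sketch of the header and pointer reading, run on the 40 bytes of EXAMPLE.ARC built in memory (Python 3 syntax). read_arc_table and count_is_minus_one are names I made up. One assumption to flag loudly: the sketch follows the first post's description and reads header + 1 pointers, whereas the script as posted reads exactly num_pointers[0] pointers, which for a header of 2 would be only two of them.

```python
import struct

# The 40-byte EXAMPLE.ARC from the first post, built in memory.
example = bytes.fromhex(
    "02000000" "10000000" "18000000" "20000000"  # header + three pointers
    "14188536" "98172295"                        # first sub-file
    "0011FF02" "00000000"                        # second sub-file
    "FFFFFFFF" "FFFFFFFF"                        # third sub-file
)

def read_arc_table(data, count_is_minus_one=True):
    """Return a list of (start, length) pairs, one per sub-file."""
    # "<I": one little-endian unsigned 4-byte integer at offset 0.
    (header,) = struct.unpack_from("<I", data, 0)
    # ASSUMPTION: the header stores the count minus one, as the first post says.
    count = header + 1 if count_is_minus_one else header
    # "<3I" etc.: `count` little-endian 4-byte pointers, right after the header.
    pointers = struct.unpack_from("<%dI" % count, data, 4)
    # The total size acts as the end address of the last sub-file.
    ends = pointers[1:] + (len(data),)
    return [(start, end - start) for start, end in zip(pointers, ends)]

table = read_arc_table(example)
print(table)
```

    The two spots that describe the layout are the format strings: change the letter to change the pointer width, and change the offset (the 4) to change where the pointer table starts.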
    Then you seek() once to the first archived file and run a rather clunky while loop over the pointers; since the sub-files sit back to back, each read() picks up where the previous one stopped. Inside the loop you create a file name,
    splitname = "%s_file%u.bin"%(datafile.name,i)
    into which you write the data between pointers,
    splitfile.write(datafile.read(pointers[i+1]-pointers[i]))
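
    To adapt this to the *.pep layout described in the first post, something along these lines might work. Treat it as a sketch under stated assumptions, not a tested converter: it is Python 3 syntax, read_pep_table and the constants are names I made up, the sample bytes are invented purely for the demonstration, and I am assuming that zero pointers only appear at the end of the table, that each sub-file runs up to the next pointer (or the end of the file), and that the 4 bytes in front of the fourth sub-file are simply skipped.

```python
import struct

PEP_POINTER_COUNT = 6  # per the first post: always six pointers, no header
PEP_FOURTH_SKIP = 4    # per the first post: 4th sub-file starts 4 bytes late

def read_pep_table(data):
    """Return a list of (start, length) pairs for the sub-files in use."""
    # "<6H": six little-endian unsigned 2-byte pointers at the very start.
    pointers = struct.unpack_from("<%dH" % PEP_POINTER_COUNT, data, 0)
    # ASSUMPTION: a zero pointer means "no sub-file in this slot".
    used = [p for p in pointers if p != 0]
    # ASSUMPTION: each sub-file ends where the next begins, or at end of file.
    ends = used[1:] + [len(data)]
    table = []
    for index, (start, end) in enumerate(zip(used, ends)):
        if index == 3:
            # The fourth sub-file begins 4 bytes after its listed address.
            start += PEP_FOURTH_SKIP
        table.append((start, end - start))
    return table

# Invented sample: 12-byte pointer table, four sub-files, two unused slots.
sample = (
    struct.pack("<6H", 12, 16, 20, 24, 0, 0)
    + b"AAAA" + b"BBBB" + b"CCCC"
    + b"\x00\x00\x00\x00" + b"DDDD"  # 4 skipped bytes, then the 4th sub-file
)
pep_table = read_pep_table(sample)
print(pep_table)
```

    Compared with the arc version, the only changes are the format string ("H" instead of "I", a fixed count of 6, offset 0 because there is no header) and the special case for the fourth entry.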

    Does that help?
