#1
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2003

    Using zlib and cPickle


    I'm trying to serialise some objects using cPickle and zlib. Although I can save these files, I'm having difficulty loading them. Either the format in which they've been written to the file is unreadable, or the load code doesn't correctly reverse the write format, or maybe both. Here is the snippet of code I'm using:

    Code:
    try:
        trainedunigramtagger = open("trainedunigramtagger.p","r")
        trainedbrilltagger = open("trainedbrilltagger.p","r")
        trainedunigramtaggertext = ""
        trainedbrilltaggertext = ""
        trainedunigramtaggertext = str(trainedunigramtagger.read())
        trainedbrilltaggertext = str(trainedbrilltagger.read())
        postagger = cPickle.loads(zlib.decompress(trainedunigramtaggertext))
        brillrules = cPickle.loads(zlib.decompress(trainedbrilltaggertext))
        trainedunigramtagger.close()
        trainedbrilltagger.close()
    except IOError:
        postagger.train(train_tokens)
        trainedunigramtagger = open("trainedunigramtagger.p","w")
        trainedunigramtagger.write(str(zlib.compress(cPickle.dumps(postagger, 0, 5)))
        trainedunigramtagger.close()
        brillrules = brilltrainer.train(train_tokens, max_rules=10, min_score=2)
        trainedbrilltagger = open("trainedbrilltagger.p","w")
        trainedbrilltagger.write(str(zlib.compress(cPickle.dumps(brillrules, 0), 5)))
        trainedbrilltagger.close()

    The traceback error I receive is:

    postagger = cPickle.loads(zlib.decompress(trainedunigramtaggertext))
    zlib.error: Error -5 while decompressing data
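
Error -5 is zlib's `Z_BUF_ERROR`; coming from `zlib.decompress()` it means the compressed stream ended before it was complete, i.e. the data is truncated or incomplete. A minimal Python 3 sketch that reproduces it (Python 3 has no `cPickle`; its `pickle` module uses the C accelerator automatically). The sample object and the `decompress_error()` helper are hypothetical, for illustration only:

```python
import pickle
import zlib

# Hypothetical stand-in for the trained tagger object.
obj = {"tags": ["NN", "VB", "JJ"] * 100}
blob = zlib.compress(pickle.dumps(obj, 0), 5)


def decompress_error(data):
    """Hypothetical helper: zlib's error message for data, or None if it decompresses."""
    try:
        zlib.decompress(data)
    except zlib.error as exc:
        return str(exc)
    return None


# The complete stream round-trips cleanly through zlib and pickle.
assert pickle.loads(zlib.decompress(blob)) == obj

# Keep only the first half of the stream, as a short read or partial write would.
truncated = blob[: len(blob) // 2]
print(decompress_error(truncated))
```

Note that the failure happens inside `zlib.decompress()`, so `loads()` never sees the data.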


    I could do with some urgent advice from anyone who can spot the mistake(s) in the code above.

    Thanks in advance,

    Mark
#2
    Hello World :)
    Devshed Frequenter (2500 - 2999 posts)

    Join Date
    Mar 2003
    Location
    Hull, UK
    I think we can rule out your loading procedure, since it looks fine to me; however, I'm not so sure about some of the lines inside the except IOError block.

    Anyway, what seems to be happening is that zlib is choking while trying to decompress the string from the file, before it even gets to the cPickle.loads() method. This means your data is likely being corrupted or saved in the wrong way, which could be caused by a bad call to dumps() or compress():

    Code:
    trainedunigramtagger.write(str(zlib.compress(cPickle.dumps(postagger, 0, 5)))
    That's because dumps only accepts two arguments, although I would have expected this to raise an error about the number of arguments passed to the function :rolleyes:
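
Under Python 3's `pickle` (the successor to `cPickle`), the argument-count error expected here really does occur: `dumps()` accepts at most two positional arguments, `obj` and `protocol`. A minimal sketch, with a made-up list standing in for `postagger`, showing the original call being rejected and the `5` moved to where it was presumably meant to go, as zlib's compression level:

```python
import pickle
import zlib

obj = ["NN", "VB", "JJ"]  # hypothetical stand-in for postagger

# The original call: the 5 is passed to dumps(), not to compress().
try:
    zlib.compress(pickle.dumps(obj, 0, 5))
    rejected = False
except TypeError:
    rejected = True

# Intended call: protocol 0 for dumps(), compression level 5 for compress().
blob = zlib.compress(pickle.dumps(obj, 0), 5)
restored = pickle.loads(zlib.decompress(blob))

print(rejected, restored)
```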

    It also seems that your program could be trying to use an object (postagger) that failed to load from the pickled file because of the IOError; however, without seeing the rest of your code I can't tell for sure.
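
One way to avoid using a half-loaded `postagger` is to treat the compressed pickle as a cache: try to load it, and on any load failure rebuild the object and re-save it. This is a minimal Python 3 sketch, not the poster's code; `train_tagger()` and `load_or_train()` are hypothetical names, and catching `zlib.error` and `pickle.UnpicklingError` alongside `OSError` (which `IOError` aliases in Python 3) is an assumption about which failures should trigger a rebuild:

```python
import os
import pickle
import tempfile
import zlib


def train_tagger():
    """Hypothetical stand-in for postagger.train(train_tokens)."""
    return {"the": "DT", "cat": "NN"}


def load_or_train(path):
    """Hypothetical helper: load a cached compressed pickle, else retrain and save."""
    try:
        with open(path, "rb") as f:
            return pickle.loads(zlib.decompress(f.read()))
    except (OSError, zlib.error, pickle.UnpicklingError):
        # Missing or corrupt cache: build a fresh object instead of
        # carrying on with one that never finished loading.
        tagger = train_tagger()
        with open(path, "wb") as f:
            f.write(zlib.compress(pickle.dumps(tagger, 0), 5))
        return tagger


path = os.path.join(tempfile.mkdtemp(), "trainedunigramtagger.p")
first = load_or_train(path)   # no file yet: trains and saves
second = load_or_train(path)  # file now exists: loads from the cache
print(first == second)
```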

    You might want to look back over your code, since there seems to be a lot of unnecessary code complicating matters, e.g. calling str() on a string that has just been read from a file.

    Hope this helps

    Mark.
    programming language development: www.netytan.com Hula

#3
    Contributing User
    Devshed Intermediate (1500 - 1999 posts)

    Join Date
    Feb 2004
    Location
    London, England
    Are you running this on Windows? You need to open the files with the binary flag set; otherwise the OS will add/remove carriage returns when you read from or write to the file. This is fine for a text file, but it will corrupt a binary file. Use:

    Code:
    open(..., "rb")  # read binary mode

    and

    open(..., "wb")  # write binary mode

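A small Python 3 sketch of the binary-mode point: a file opened with "wb" and read back with "rb" returns exactly the bytes written, including any carriage-return and newline bytes inside the compressed data. The scratch file name is hypothetical:

```python
import os
import tempfile
import zlib

# Every byte value 0-255, so the data certainly contains b"\r" and b"\n".
data = bytes(range(256))
blob = zlib.compress(data, 5)

# Hypothetical scratch location for the demo.
path = os.path.join(tempfile.mkdtemp(), "demo.p")

with open(path, "wb") as f:  # write binary: bytes stored exactly as given
    f.write(blob)

with open(path, "rb") as f:  # read binary: no newline translation on read
    read_back = f.read()

print(read_back == blob, zlib.decompress(read_back) == data)
```

In Python 3 the distinction is also enforced: writing bytes to a file opened with "w" raises TypeError, so a mode mismatch shows up immediately rather than as a zlib error later.
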
    Dave - The Developers' Coach
#4
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Feb 2003
    Hi guys,

    thanks for all your help. Unfortunately, I have to press on, so I can't spend much more time on this interesting problem. I was pleasantly surprised by how cPickle's file size increases logarithmically as the amount of stored data grows. As such, I have now abandoned zlib!!!

    Just to answer the question about which OS I'm using: I'm writing and running the Python program remotely (from a Windows machine), but through a remote Unix shell!!!

    Thanks for the advice. If I get time towards the end of this project, I'll try out what you suggested and see what progress I can make.

    Mark
#5
    Mini me.
    Devshed Novice (500 - 999 posts)

    Join Date
    Nov 2003
    Location
    Cambridge, UK
    A bit late but my 2 cents....

    A simple typo produces legal but ambiguous code:
    Code:
    cPickle.dumps(postagger, 0, 5)
    The signature is dumps(object[, protocol[, bin]]), where protocol = 0 means ASCII (the default) and a non-zero bin indicates binary.

    The bin flag is deprecated.

    I'm not sure how cPickle would cope with that.
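
This can be checked quickly in Python 3, where `pickle.dumps()` keeps the `protocol` argument but no longer accepts `bin`. Protocol 0 produces the ASCII format, a higher protocol produces binary output, and `loads()` detects the protocol from the data itself, so the loading side never needs to be told which one was used. A minimal sketch with made-up data:

```python
import pickle

obj = {"word": "cat", "tag": "NN", "count": 3}  # illustrative data only

ascii_pickle = pickle.dumps(obj, 0)  # protocol 0: the text-based format
binary_pickle = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)  # binary format

# loads() reads the protocol from the data, so no protocol argument is needed.
same_from_ascii = pickle.loads(ascii_pickle) == obj
same_from_binary = pickle.loads(binary_pickle) == obj

print(ascii_pickle.decode("ascii"))  # raises UnicodeDecodeError if not pure ASCII
print(same_from_ascii, same_from_binary)
```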

    The following would pickle in ASCII format and compress with zlib at compression level 5 (I guess that's what was originally wanted):
    Code:
    trainedunigramtagger.write(zlib.compress(cPickle.dumps(postagger, 0), 5))
    (I dropped the redundant str().)
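
Combining the corrected call with binary-mode file access gives a save/load pair along these lines. It is a Python 3 sketch (`pickle` standing in for `cPickle`); `save_compressed()` and `load_compressed()` are hypothetical helper names, not part of either library:

```python
import os
import pickle
import tempfile
import zlib


def save_compressed(obj, path, level=5):
    """Hypothetical helper: pickle with protocol 0, then zlib-compress at `level`."""
    with open(path, "wb") as f:  # binary mode keeps the compressed bytes intact
        f.write(zlib.compress(pickle.dumps(obj, 0), level))


def load_compressed(path):
    """Hypothetical helper: reverse save_compressed (read, decompress, unpickle)."""
    with open(path, "rb") as f:
        return pickle.loads(zlib.decompress(f.read()))


rules = [("NN", "VB", 2), ("JJ", "NN", 3)]  # illustrative Brill-style rules
path = os.path.join(tempfile.mkdtemp(), "trainedbrilltagger.p")
save_compressed(rules, path)
print(load_compressed(path) == rules)
```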

    grim
