#1
  1. No Profile Picture
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Aug 2013
    Posts
    1
    Rep Power
    0

    Parsing a peculiar float format


    I need to parse an odd float format in output files from a program written in Fortran (called CNS). The numbers are generally easy to parse, e.g. 0, 2186.54, -5681.27, etc., but numbers with (large?) exponents seem to be mangled, e.g. 1.661811-152, with no e/E to mark the start of the exponent. Generally these numbers are unimportant, as they are just rounding errors of values that should be zero, but they cause the Python code that tries to read them with a float() cast to barf.

    I cooked up the following code which seems to do the job of reformatting these numbers to make them easy for float() to comprehend, but wondered whether I was missing anything smarter?

    Code:
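    import re
    
    # A mantissa, optionally followed by a signed exponent that lacks the usual "E".
    CNS_FLOAT = re.compile(r'(?P<sign>-?)(?P<value>\d+(\.\d+)?)(?P<exponent>[+-]\d+)?$')
    
    def CNSfloatToFloat(CNSfloat):
      """Return CNSfloat as a string that float() accepts."""
      m = CNS_FLOAT.match(CNSfloat.strip())
      if m is None:
        # Not a recognised CNS number (e.g. it already has an "E"); pass it through.
        return CNSfloat
      
      f = m.group('sign') + m.group('value')
      if m.group('exponent'):
        f += "E" + m.group('exponent')
      
      return f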
  2. #2
  3. No Profile Picture
    Contributing User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2010
    Posts
    153
    Rep Power
    5
    Is the exponent consistently delimited from the mantissa by a + or - sign?

    What you've got works fine, though if you want to avoid using a regex you might be able to get away with just grabbing the index of the first + or - after the leading character and, if there is one, inserting an 'E' in front of it, e.g.:

    Code:
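    def CNSfloatToFloat(CNSfloat):
        # Already has an exponent marker: nothing to insert.
        if 'e' in CNSfloat.lower():
            return CNSfloat
        # Start at index 1 so a leading sign on the mantissa is skipped;
        # str.find returns -1 when the character is absent.
        exp_neg = CNSfloat.find('-', 1)
        exp_pos = CNSfloat.find('+', 1)
        exp = (exp_neg > 0 and exp_neg) or (exp_pos > 0 and exp_pos) or None
        if exp:
            return "{}E{}".format(CNSfloat[:exp], CNSfloat[exp:])
        else:
            return CNSfloat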
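
    Both functions so far hand back a string for float() to chew on. If you'd rather get the number directly, here is a rough sketch that wraps the same idea. The name cns_to_float is made up for this example, and the 'D' handling is an assumption: Fortran can write double-precision exponents with a D, but I don't know whether CNS ever does. As far as I know, Fortran's E format drops the letter when the exponent needs three digits, which would explain why only the very large or very small values look mangled.

```python
import re

# A mantissa immediately followed by a signed exponent with no 'E' marker,
# e.g. "1.661811-152" or "-2.5+101".
_BARE_EXPONENT = re.compile(r'^([+-]?(?:\d+\.?\d*|\.\d+))([+-]\d+)$')

def cns_to_float(text):
    """Convert a CNS/Fortran-style number string to a Python float."""
    text = text.strip()
    match = _BARE_EXPONENT.match(text)
    if match:
        # Re-insert the missing exponent marker.
        text = match.group(1) + 'E' + match.group(2)
    # Assumption: a 'D' exponent marker may appear; float() only knows 'E'.
    return float(text.replace('D', 'E').replace('d', 'e'))
```

    Strings that are already well formed (0, 2186.54, 1.5E-10) don't match the pattern and go straight to float(), so the function is safe to call on every field. Anything that still isn't a number raises ValueError, which is probably what you want rather than a silently wrong value.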
  4. #3
  5. Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,995
    Rep Power
    481
    For some reason this problem of run-on numbers is common. FORTRAN output files were created with fixed-width fields. Given the format specifier, FORTRAN transfers the data correctly. But come on, guys, you couldn't stuff a space character in between separate numbers?
    Originally Posted by computer museum or something
    1956: The era of magnetic disk storage dawned with IBM's shipment of a 305 RAMAC to Zellerbach Paper in San Francisco. The IBM 350 disk file served as the storage component for the Random Access Method of Accounting and Control. It consisted of 50 magnetically coated metal platters with 5 million bytes of data. The platters, stacked one on top of the other, rotated with a common drive shaft.
    I didn't look at your program. To solve the problem, parse each line using its fixed field widths or its known format specification.
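
    A minimal sketch of the fixed-width idea, in case it helps. The field width of 12 and the sample record are invented for illustration; the real widths have to come from the FORMAT statement (or the documentation) of whatever wrote the file. The helper name split_fixed_width is also made up.

```python
def split_fixed_width(line, width):
    """Cut a line into consecutive fields of `width` characters each."""
    line = line.rstrip('\n')
    return [line[i:i + width].strip() for i in range(0, len(line), width)]

# Hypothetical record: three fields, each 12 characters wide, with no
# separator between the second and third numbers.
record = (
    "     2186.54"
    "    -5681.27"
    "1.661811-152"
)

fields = split_fixed_width(record, 12)
print(fields)
```

    Slicing by position keeps run-on values apart even when no space separates them, which searching for a + or - sign cannot do reliably. Each field still needs its missing E restored before float() will accept it.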
    [code]Code tags[/code] are essential for python code and Makefiles!
