1. Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Nov 2012

    Quick Regex - need help

    I want to split a string in the following manner. Here are two example strings:
    "Hello this is a string.-2.34 This is an example1 string."
    "Hello this is another string.3.53 This is an example2 string."
    Please note that the character just before the number is the Unicode character U+F8FF (it may not display here), and the strings are of type unicode.

    I want each string broken into 3 parts:
    "Hello this is a string.","-2.34"," This is an example1 string."
    "Hello this is another string.","3.53","This is an example2 string."

    I have written a regex to split the string, but with it I cannot get the numeric part that I want (-2.34 in the first string).
    My code:
    import re
    import os
    from django.utils.encoding import smart_str, smart_unicode

    text = open(r"C:\data.txt").read()
    text = text.decode('utf-8')

    pat = re.compile(u"\uf8ff-*\d+\.*\d+")
    newpart = pat.split(text)
    firstpart = newpart[::1]

    print ("first part of the string ----")
    for f in firstpart:
        f = smart_str(f)
        print ("-----")
        print f

    I am still a learner and I'm sure there is a better way of doing this. Please help!
  2. #2
  3. Contributing User

    Join Date
    Aug 2011
    django, os, re not required.

    >>> a = u'Hello this is a string.\uf8ff-2.34 This is an example1 string.'
    >>> fracture = a.split(u'\uf8ff')
    >>> print(fracture)
    [u'Hello this is a string.', u'-2.34 This is an example1 string.']
    >>> print(fracture[1])
    -2.34 This is an example1 string.
    >>> b = fracture[1].split()
    >>> print(b)
    [u'-2.34', u'This', u'is', u'an', u'example1', u'string.']
    >>> print(b[0])
    -2.34
    >>> print(float(b[0]))
    -2.34
    >>> print(999+float(b[0]))
    996.66
    >>> float(a.split(u'\uf8ff')[1].split()[0])
    -2.34
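If you do want to stay with a regex, here is a minimal sketch of one way to get all three parts in a single call. The reason the number goes missing in the original attempt is that `split` discards whatever the pattern matched unless it sits inside a capturing group. The helper name `split_three` and the constant name are made up for illustration, and the pattern assumes the number always comes directly after U+F8FF. Note also that the original `\d+\.*\d+` needs at least two digits, so the sketch uses an optional fractional part instead.

```python
# -*- coding: utf-8 -*-
import re

# U+F8FF is matched but not captured; the number that follows it is captured,
# so split() keeps the number in its result list.
# Assumption: an optional minus sign, digits, and an optional fractional part.
NUMBER_AFTER_MARKER = re.compile(u"\uf8ff(-?\\d+(?:\\.\\d+)?)")


def split_three(text):
    """Return [before, number, after] around the first U+F8FF-prefixed number."""
    # maxsplit=1 stops after the first match, so later numbers stay in 'after'.
    return NUMBER_AFTER_MARKER.split(text, maxsplit=1)


a = u"Hello this is a string.\uf8ff-2.34 This is an example1 string."
b = u"Hello this is another string.\uf8ff3.53 This is an example2 string."

parts_a = split_three(a)
parts_b = split_three(b)

print(parts_a)
print(parts_b)
print(float(parts_a[1]))
```

The third part keeps its leading space (for example `u' This is an example1 string.'`); call `.strip()` on it if that is not wanted. If a string contains no U+F8FF followed by a number, the result is a one-element list holding the whole string, so check the length before indexing.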
    [code]Code tags[/code] are essential for python code and Makefiles!
