#1

    Quick Regex - need help


    Hello,
    I want to split a string in the following manner. Here are example strings:
    -----
    "Hello this is a string.-2.34 This is an example1 string."
    "Hello this is another string.3.53 This is an example2 string."
    -----
    Please note that the character separating the first sentence from the number is U+F8FF (a Unicode private-use character), and the string is of type unicode.

    I want to break each string into 3 parts:
    "Hello this is a string.","-2.34"," This is an example1 string."
    "Hello this is another string.","3.53","This is an example2 string."

    I have written a regex to split the string, but the split drops the numeric part that I want (-2.34 in the first string).
    ---------
    My code:
    ------------------------
    import re
    import os
    from django.utils.encoding import smart_str, smart_unicode

    text = open(r"C:\data.txt").read()
    text = text.decode('utf-8')
    print(smart_str(text))

    pat = re.compile(u"\uf8ff-*\d+\.*\d+")
    newpart = pat.split(text)
    firstpart = newpart[::1]

    print ("first part of the string ----")
    for f in firstpart:
        f = smart_str(f)
        print ("-----")
        print f


    ---------
    I am still a learner and I'm sure there is a better way of doing this. Please help!
#2
    You don't need django, os, or re for this. Plain string methods are enough.


    Code:
    >>> a = u'Hello this is a string.\uf8ff-2.34 This is an example1 string.'
    >>> fracture = a.split(u'\uf8ff')
    >>> print(fracture)
    [u'Hello this is a string.', u'-2.34 This is an example1 string.']
    >>> print(fracture[1])
    -2.34 This is an example1 string.
    >>> b = fracture[1].split()
    >>> print(b)
    [u'-2.34', u'This', u'is', u'an', u'example1', u'string.']
    >>> print(b[0])
    -2.34
    >>> print(float(b[0]))
    -2.34
    >>> print(999+float(b[0]))
    996.66
    >>> float(a.split(u'\uf8ff')[1].split()[0])
    -2.34
    >>>
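
If you do want to stay with a regex, here is a minimal sketch of one way to get all three parts in one call. It relies on the documented behaviour that `re.split` keeps the text matched by a capturing group in its result list. The names `PATTERN` and `split_three` are made up for this example, and the number pattern assumes values shaped like `-2.34` or `3.53` (optional minus sign, digits, optional decimal part).

```python
import re

# U+F8FF marks where the number starts. The parentheses form a capturing
# group, so re.split() returns the number instead of throwing it away.
PATTERN = re.compile(u"\uf8ff" + r"(-?\d+(?:\.\d+)?)")


def split_three(text):
    """Return [text before the marker, number as a string, remaining text]."""
    # maxsplit=1 splits only at the first marker found in the string.
    return PATTERN.split(text, maxsplit=1)


a = u"Hello this is a string.\uf8ff-2.34 This is an example1 string."
b = u"Hello this is another string.\uf8ff3.53 This is an example2 string."

parts_a = split_three(a)
parts_b = split_three(b)

print(parts_a)
print(parts_b)
print(float(parts_a[1]))
```

The third part keeps its leading space; call `.strip()` on it if that is not wanted. The U+F8FF marker itself sits outside the group, so it does not appear in any of the parts.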
    [code]Code tags[/code] are essential for python code and Makefiles!
