Thread: Regex help

    #1
    Registered User
    Devshed Newbie (0 - 499 posts)

    Join Date
    Jul 2012
    Posts
    12
    Rep Power
    0

    Regex help


    Hi,
    I have a dump from the web that I need to read in order to extract the key/value pairs it contains, but I have not been able to find the fastest way to do this.

    The input data is shown under "Data Text" below.

    Currently I split on the parameter name and then loop over the pieces to concatenate the values. Could somebody help with a regular expression, or a faster, more CPU-efficient way of getting the values? For example, at the end of parsing, the parameter Build should have the concatenated value shown below.

    Build = "R_Fzzz_v1, R_Fxxx_v1, R_Fyyy_v1"

    Data Text:

    <input name="Build" type="hidden" value="R_Fzzz_v1">
    <input name="Build" type="hidden" value="R_Fxxx_v1">
    <input name="Build" type="hidden" value="R_Fyyy_v1">
    <input name="SDChangeNote" type="hidden" value="">
    <input name="$SDTestResponsiblePersons" type="hidden" value="">
    <input name="$SDTLStates" type="hidden" value="Passed">
    <input name="$SDTLBuilds" type="hidden" value="">
    <input name="$SDTLCases" type="hidden" value="">
    <input name="Versions" type="hidden" value="SS1 SS.1.5">
    <input name="Versions" type="hidden" value="SS2 SS4.26">
    <input name="Versions" type="hidden" value="SS1 SS_4.28">
    <input name="Versions" type="hidden" value="SS1 SS4.28">
    <input name="Group" type="hidden" value="team1">
    <input name="Group" type="hidden" value="team2">
    <input name="SDType" type="hidden" value="Release">
    #2
    Contributing User
    Devshed Demi-God (4500 - 4999 posts)

    Join Date
    Aug 2011
    Posts
    4,837
    Rep Power
    480
    If this program is not fast enough, I can provide significantly faster code using flex.
    Lambert Electronics, USA. NY.
    b49p23tivg at stny.rr.com

    Code:
    data = '''
        <input name="Build" type="hidden" value="R_Fzzz_v1">
        <input name="Build" type="hidden" value="R_Fxxx_v1">
        <input name="Build" type="hidden" value="R_Fyyy_v1">
        <input name="SDChangeNote" type="hidden" value="">
        <input name="$SDTestResponsiblePersons" type="hidden" value="">
        <input name="$SDTLStates" type="hidden" value="Passed">
        <input name="$SDTLBuilds" type="hidden" value="">
        <input name="$SDTLCases" type="hidden" value="">
        <input name="Versions" type="hidden" value="SS1 SS.1.5">
        <input name="Versions" type="hidden" value="SS2 SS4.26">
        <input name="Versions" type="hidden" value="SS1 SS_4.28">
        <input name="Versions" type="hidden" value="SS1 SS4.28">
        <input name="Group" type="hidden" value="team1">
        <input name="Group" type="hidden" value="team2">
        <input name="SDType" type="hidden" value="Release">
    '''
    
    import collections, re, pprint
    
    result = collections.defaultdict(list)
    
    findall = re.compile('"[^"]*"').findall   # is pattern sufficiently general?
    
    for line in data.split('\n'):
        line = line.strip()
        if line.startswith('<input name=') and (' value="' in line):
            strings = findall(line)
            key = strings[0][1:-1]
            value = strings[-1][1:-1]
            result[key].append(value)
    
    print('**** displaying the dictionary determined from your data****')
    pprint.pprint(result)  # It seems that this dictionary is what you should actually want as output.
    
    print('\n'*3+'**** displaying the environment you request****')
    your_environment = {key:', '.join(value) for (key,value,) in result.items()}
    pprint.pprint(your_environment)
    
    print('\n'*3+'****use your parameter? I assumed you mean "variable" ****')
    exec('print("the value of variable Versions is "+Versions)',your_environment) # run statements in your_environment

    Output for you lazy heads who won't bother to run it:
    Code:
    $ python p.py
    **** displaying the dictionary determined from your data****
    defaultdict(<type 'list'>, {'$SDTLBuilds': [''], 'SDType': ['Release'], 'Group': ['team1', 'team2'], '$SDTestResponsiblePersons': [''], 'Versions': ['SS1 SS.1.5', 'SS2 SS4.26', 'SS1 SS_4.28', 'SS1 SS4.28'], 'SDChangeNote': [''], 'Build': ['R_Fzzz_v1', 'R_Fxxx_v1', 'R_Fyyy_v1'], '$SDTLCases': [''], '$SDTLStates': ['Passed']})
    
    
    
    **** displaying the environment you request****
    {'$SDTLBuilds': '',
     '$SDTLCases': '',
     '$SDTLStates': 'Passed',
     '$SDTestResponsiblePersons': '',
     'Build': 'R_Fzzz_v1, R_Fxxx_v1, R_Fyyy_v1',
     'Group': 'team1, team2',
     'SDChangeNote': '',
     'SDType': 'Release',
     'Versions': 'SS1 SS.1.5, SS2 SS4.26, SS1 SS_4.28, SS1 SS4.28'}
    
    
    
    ****use your parameter? I assumed you mean "variable" ****
    the value of variable Versions is SS1 SS.1.5, SS2 SS4.26, SS1 SS_4.28, SS1 SS4.28
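
    On the "is pattern sufficiently general?" question in the code above: a possible alternative is a single regular expression that captures the name and value attributes directly and scans the whole text in one pass, with no line splitting. This is only a sketch under some assumptions: name= appears before value= inside each tag (as in the posted dump), attribute values are double-quoted, and no attribute value contains ">". INPUT_RE and parse_inputs are names made up for this example, and it has not been timed against the version above.

```python
import re
from collections import defaultdict

# Hypothetical helper names; the pattern assumes name= precedes value=
# and that both attributes use double quotes, as in the posted dump.
INPUT_RE = re.compile(
    r'<input\b[^>]*?\bname="(?P<name>[^"]*)"[^>]*?\bvalue="(?P<value>[^"]*)"',
    re.IGNORECASE,
)

def parse_inputs(text):
    """Return {name: 'v1, v2, ...'} with values joined in document order."""
    grouped = defaultdict(list)
    for match in INPUT_RE.finditer(text):
        grouped[match.group("name")].append(match.group("value"))
    return {name: ", ".join(values) for name, values in grouped.items()}

# A shortened copy of the data from the question.
sample = '''
<input name="Build" type="hidden" value="R_Fzzz_v1">
<input name="Build" type="hidden" value="R_Fxxx_v1">
<input name="Group" type="hidden" value="team1">
<input name="SDChangeNote" type="hidden" value="">
'''

parsed = parse_inputs(sample)
print(parsed["Build"])
```

    Tags that lack either attribute are simply skipped. If the real dump has attributes in a different order or uses single quotes, the pattern would need adjusting, and at that point a real HTML parser is probably the safer choice.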
    Last edited by b49P23TIvg; September 17th, 2012 at 11:07 AM. Reason: Added the point of the message
    [code]Code tags[/code] are essential for python code and Makefiles!
