Regex help
September 17th, 2012, 06:00 AM
Hi,
I have a dump from the web that I need to read and extract the key-value pairs from, but I have not been able to find the fastest way to do this.
Currently I split on the parameter name and then loop over the pieces to concatenate the values. Could somebody help with a regular expression, or a more speed/CPU-efficient way of getting the values? The input data is listed under "Data Text".
For example, at the end of parsing, the parameter Build should have the concatenated value shown below.
Build = "R_Fzzz_v1, R_Fxxx_v1, R_Fyyy_v1"
Data Text:
<input name="Build" type="hidden" value="R_Fzzz_v1">
<input name="Build" type="hidden" value="R_Fxxx_v1">
<input name="Build" type="hidden" value="R_Fyyy_v1">
<input name="SDChangeNote" type="hidden" value="">
<input name="$SDTestResponsiblePersons" type="hidden" value="">
<input name="$SDTLStates" type="hidden" value="Passed">
<input name="$SDTLBuilds" type="hidden" value="">
<input name="$SDTLCases" type="hidden" value="">
<input name="Versions" type="hidden" value="SS1 SS.1.5">
<input name="Versions" type="hidden" value="SS2 SS4.26">
<input name="Versions" type="hidden" value="SS1 SS_4.28">
<input name="Versions" type="hidden" value="SS1 SS4.28">
<input name="Group" type="hidden" value="team1">
<input name="Group" type="hidden" value="team2">
<input name="SDType" type="hidden" value="Release">

September 17th, 2012, 11:05 AM
If this program is not fast enough, I can provide significantly faster code using flex.
Lambert Electronics, USA. NY.
b49p23tivg at stny.rr.com
Code:
data = '''
<input name="Build" type="hidden" value="R_Fzzz_v1">
<input name="Build" type="hidden" value="R_Fxxx_v1">
<input name="Build" type="hidden" value="R_Fyyy_v1">
<input name="SDChangeNote" type="hidden" value="">
<input name="$SDTestResponsiblePersons" type="hidden" value="">
<input name="$SDTLStates" type="hidden" value="Passed">
<input name="$SDTLBuilds" type="hidden" value="">
<input name="$SDTLCases" type="hidden" value="">
<input name="Versions" type="hidden" value="SS1 SS.1.5">
<input name="Versions" type="hidden" value="SS2 SS4.26">
<input name="Versions" type="hidden" value="SS1 SS_4.28">
<input name="Versions" type="hidden" value="SS1 SS4.28">
<input name="Group" type="hidden" value="team1">
<input name="Group" type="hidden" value="team2">
<input name="SDType" type="hidden" value="Release">
'''
import collections, re, pprint
result = collections.defaultdict(list)
findall = re.compile('"[^"]*"').findall # is pattern sufficiently general?
for line in data.split('\n'):
    line = line.strip()
    if line.startswith('<input name=') and (' value="' in line):
        strings = findall(line)
        key = strings[0][1:-1]       # first quoted string is the name
        value = strings[-1][1:-1]    # last quoted string is the value
        result[key].append(value)
print('**** displaying the dictionary determined from your data****')
pprint.pprint(result) # It seems that this dictionary is what you should actually want as output.
print('\n'*3+'**** displaying the environment you request****')
your_environment = {key:', '.join(value) for (key,value,) in result.items()}
pprint.pprint(your_environment)
print('\n'*3+'****use your parameter? I assumed you mean "variable" ****')
exec('print("the value of variable Versions is "+Versions)',your_environment) # run statements in your_environment
Output for you lazy heads who won't bother to run it:
Code:
$ python p.py
**** displaying the dictionary determined from your data****
defaultdict(<type 'list'>, {'$SDTLBuilds': [''], 'SDType': ['Release'], 'Group': ['team1', 'team2'], '$SDTestResponsiblePersons': [''], 'Versions': ['SS1 SS.1.5', 'SS2 SS4.26', 'SS1 SS_4.28', 'SS1 SS4.28'], 'SDChangeNote': [''], 'Build': ['R_Fzzz_v1', 'R_Fxxx_v1', 'R_Fyyy_v1'], '$SDTLCases': [''], '$SDTLStates': ['Passed']})
**** displaying the environment you request****
{'$SDTLBuilds': '',
'$SDTLCases': '',
'$SDTLStates': 'Passed',
'$SDTestResponsiblePersons': '',
'Build': 'R_Fzzz_v1, R_Fxxx_v1, R_Fyyy_v1',
'Group': 'team1, team2',
'SDChangeNote': '',
'SDType': 'Release',
'Versions': 'SS1 SS.1.5, SS2 SS4.26, SS1 SS_4.28, SS1 SS4.28'}
****use your parameter? I assumed you mean "variable" ****
the value of variable Versions is SS1 SS.1.5, SS2 SS4.26, SS1 SS_4.28, SS1 SS4.28
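Since the question asked for a regular expression, here is a minimal sketch of a single-pass alternative that scans the whole text at once instead of line by line. It assumes every tag is written as in the sample data, with double-quoted attributes and name appearing before value. The names INPUT_RE and collect_pairs are hypothetical and not part of the code above, and no speed claim is made without measuring on the real dump.

```python
import collections
import re

# Assumption: attributes are double-quoted and "name" precedes "value",
# exactly as in the sample data. INPUT_RE is an illustrative name.
INPUT_RE = re.compile(r'<input\s+name="([^"]*)"[^>]*?\svalue="([^"]*)"')

def collect_pairs(text):
    """Group every value by its name, keeping the order of appearance."""
    grouped = collections.defaultdict(list)
    for name, value in INPUT_RE.findall(text):
        grouped[name].append(value)
    # Join the collected values the way the question requests.
    return {name: ', '.join(values) for name, values in grouped.items()}

sample = '''
<input name="Build" type="hidden" value="R_Fzzz_v1">
<input name="Build" type="hidden" value="R_Fxxx_v1">
<input name="SDChangeNote" type="hidden" value="">
<input name="Group" type="hidden" value="team1">
'''
pairs = collect_pairs(sample)
print(pairs['Build'])
```

If the real dump ever reorders the attributes, this pattern silently skips those tags, so it is worth checking the match count against the number of input tags.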
__________________
[code] Code tags[/code] are essential for python code!
Last edited by b49P23TIvg : September 17th, 2012 at 11:07 AM.
Reason: Added the point of the message
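The comment in the code above asks whether the quoted-string pattern is sufficiently general. If the markup may reorder attributes or use single quotes, a sketch based on the standard library's html.parser module avoids those assumptions, probably at some cost in speed. InputCollector is a hypothetical helper name, and the import is written for Python 3 (in Python 2 the module is called HTMLParser).

```python
import collections
from html.parser import HTMLParser  # Python 3 module name


class InputCollector(HTMLParser):
    """Collect the name/value attributes of <input> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.values = collections.defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        if 'name' in attributes and 'value' in attributes:
            # An attribute written without any value is reported as None.
            self.values[attributes['name']].append(attributes['value'] or '')


collector = InputCollector()
# Attribute order and quoting differ between the two tags on purpose.
collector.feed('<input value="team1" type="hidden" name="Group">'
               "<input name='Group' type='hidden' value='team2'>")
joined = {key: ', '.join(vals) for key, vals in collector.values.items()}
print(joined)
```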