Traversing XML file using AWK/SED
Purpose: I need to extract a set of strings of the form "8281 1,2" (taken from the example below) from the XML file below: the type number, followed by its enumeration set as a comma-separated list.

XML file:

<value name="internalType">instance</value>
<value name="ct_class">com.wm.lang.schema.datatype.WmString</value>
<value name="name">Type8281</value>
<array name="parent-ancestors" type="record" depth="1">
  <record javaclass="com.wm.util.Values">
    <value name="xmlns">http://www.w3.org/1999/XMLSchema</value>
    <value name="ncName">urSimpleType</value>
  </record>
  <record javaclass="com.wm.util.Values">
    <value name="xmlns">http://www.w3.org/1999/XMLSchema</value>
    <value name="ncName">string</value>
  </record>
</array>
<record name="baseType" javaclass="com.wm.util.Values">
  <value name="contentType">1</value>
  <value name="internalType">instance</value>
  <value name="ct_class">com.wm.lang.schema.datatype.WmString</value>
  <value name="name">string</value>
  <value name="whiteSpace">none</value>
</record>
<array name="enumeration" type="value" depth="1">
  <value>1</value>
  <value>2</value>
</array>
<value name="whiteSpace">none</value>
<value name="classname">com.wm.lang.schema.datatype.WmString</value>

My pseudocode for extracting the data:

1. Search for the last appearance of the pattern <value name="name">TypeABCD</value>, where ABCD is any number.
2. Cut up to </value> and store the type name ABCD in the first field of a record in the result file.
3. Get the next line.
4. a) If the pattern <value name="name">TypeABCD</value> comes up again, replace the stored ABCD with the new one.
   b) If the pattern <array name="enumeration" comes up, store every line from the next one until the </array> line.
5. Store all the <value>EFGH</value> entries, where EFGH is a code value of any length.
6. Cut <value> and </value>, remove the newlines between the EFGH codes, and join the codes with commas, with no spaces in between.
7. Repeat from step 1 until the end of the given file.
8. The end result would be a file that contains the following:

8281 1,2
6167 1,2,3,4,5,6,7,8,9,10
5125 AAA,AAB,AAC,AAD,CAL,INF,INV,XXX,YYY
4043 AG,BG,BR,CN,DE,DI,JB,MF,OE,RS,RT,ST,WH,WS
8275 1,2,3,4,5,6,7,8
7233 34,35,36,37,38,39,40,41,42,43,44,45,60,61,62,63,66

Questions:

1/ Can awk do this?
2/ How do I concatenate <value>1</value> <value>2</value> <value>3</value> into 1,2,3? This includes getting rid of the line breaks (newlines, not Control-M characters).

Thanks in advance.
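
The pseudocode above can be sketched in awk roughly as follows. This is only a minimal sketch under some assumptions that the post does not confirm: every XML element sits on its own line, the enumeration codes appear as bare <value>...</value> lines, and the file has no Control-M characters. The function name extract_enums and the file names in the usage note are made up for illustration.

```shell
# Hypothetical helper: print one "ABCD code1,code2,..." line per
# enumeration block found in the given file(s).
# Assumes one XML element per line.
extract_enums() {
    awk '
    # Steps 1, 2 and 4a: remember the most recent TypeABCD name, keeping only ABCD.
    /<value name="name">Type[0-9]+<\/value>/ {
        type = $0
        sub(/.*<value name="name">Type/, "", type)
        sub(/<\/value>.*/, "", type)
    }
    # Step 4b: the enumeration array opens, so start a fresh list.
    /<array name="enumeration"/ { collecting = 1; list = ""; next }
    # Steps 5 and 6: strip the tags and append the code, comma-separated.
    collecting && /<value>/ {
        code = $0
        sub(/.*<value>/, "", code)
        sub(/<\/value>.*/, "", code)
        list = (list == "" ? code : list "," code)
        next
    }
    # The closing tag ends the block: print the record and stop collecting.
    collecting && /<\/array>/ {
        print type, list
        collecting = 0
    }
    ' "$@"
}
```

It could be called as extract_enums input.xml > result.txt. Because awk reads line by line and the codes are appended to a single string, the line breaks between the <value> lines never reach the output, which covers question 2 as well. If the real file puts several elements on one line, this line-based approach would need a different record separator or a proper XML parser.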