UNIX Help
 
October 19th, 2004, 11:18 PM
jersun68
Traversing XML file using AWK/SED

Traversing xml file

Purpose:
From the XML file below, I need to extract one line per type in the format "8281 1,2" (taking the example below): the numeric part of the type name, a space, and then that type's enumeration values as a comma-separated list.


XML file:
<value name="internalType">instance</value>
<value name="ct_class">com.wm.lang.schema.datatype.WmString</value>
<value name="name">Type8281</value>
<array name="parent-ancestors" type="record" depth="1">
<record javaclass="com.wm.util.Values">
<value name="xmlns">http://www.w3.org/1999/XMLSchema</value>
<value name="ncName">urSimpleType</value>
</record>
<record javaclass="com.wm.util.Values">
<value name="xmlns">http://www.w3.org/1999/XMLSchema</value>
<value name="ncName">string</value>
</record>
</array>
<record name="baseType" javaclass="com.wm.util.Values">
<value name="contentType">1</value>
<value name="internalType">instance</value>
<value name="ct_class">com.wm.lang.schema.datatype.WmString</value>
<value name="name">string</value>
<value name="whiteSpace">none</value>
</record>
<array name="enumeration" type="value" depth="1">
<value>1</value>
<value>2</value>
</array>
<value name="whiteSpace">none</value>
<value name="classname">com.wm.lang.schema.datatype.WmString</value>

My pseudocode for extracting the data:
1. Search for the pattern <value name="name">TypeABCD</value>, where ABCD is any number; the last appearance before an enumeration is the one that counts.
2. Cut everything from </value> onward and store the type number ABCD as the first field of a record in the result file.
3. Get the next line.
4 a) If the pattern <value name="name">TypeABCD</value> comes up again, replace the old ABCD with the new one.
4 b) If the pattern <array name="enumeration" comes up, store the lines from the next line onward, up to the </array> line.
5. From those lines, keep every <value>EFGH</value>, where EFGH is a code value of any length.
6. Cut <value> and </value>, remove the newlines between the EFGH codes, and put a comma between each pair of codes, with no spaces.
7. Repeat from step 1 until the end of the given file.
8. The end result would be a file that contains the following:

8281 1,2
6167 1,2,3,4,5,6,7,8,9,10
5125 AAA,AAB,AAC,AAD,CAL,INF,INV,XXX,YYY
4043 AG,BG,BR,CN,DE,DI,JB,MF,OE,RS,RT,ST,WH,WS
8275 1,2,3,4,5,6,7,8
7233 34,35,36,37,38,39,40,41,42,43,44,45,60,61,62,63,66
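
The steps above could be sketched in awk roughly as follows. This is only a sketch under a few assumptions: every tag sits on its own line, as in the sample; sample.xml and result.txt are placeholder file names; and the second type in the test input (Type6167 with values 1,2,3) is made up for illustration, not taken from the real file.

```shell
#!/bin/sh
# Test input: a trimmed copy of the XML from the post, plus a second,
# made-up type (Type6167) to show that several records are handled.
cat > sample.xml <<'EOF'
<value name="name">Type8281</value>
<record name="baseType" javaclass="com.wm.util.Values">
<value name="name">string</value>
</record>
<array name="enumeration" type="value" depth="1">
<value>1</value>
<value>2</value>
</array>
<value name="name">Type6167</value>
<array name="enumeration" type="value" depth="1">
<value>1</value>
<value>2</value>
<value>3</value>
</array>
EOF

awk '
# Steps 1, 2, 4a: remember the digits of the most recent TypeABCD name.
/<value name="name">Type[0-9]+<\/value>/ {
    type = $0
    sub(/.*<value name="name">Type/, "", type)
    sub(/<\/value>.*/, "", type)
    next
}
# Step 4b: an enumeration array starts, so begin collecting values.
/<array name="enumeration"/ { in_enum = 1; list = ""; next }
# End of the enumeration: print one result line and stop collecting.
in_enum && /<\/array>/ {
    print type, list
    in_enum = 0
    next
}
# Steps 5, 6: strip the tags and append the value, comma-separated.
in_enum && /<value>/ {
    v = $0
    sub(/.*<value>/, "", v)
    sub(/<\/value>.*/, "", v)
    list = (list == "" ? v : list "," v)
}
' sample.xml > result.txt

cat result.txt
```

Names such as <value name="name">string</value> do not match the Type[0-9]+ pattern, so they never overwrite the stored type number, and </array> lines outside an enumeration (for example the parent-ancestors array) are ignored because in_enum is 0 at that point.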

Questions:
1/ Can awk do this?
2/ How do I concatenate

<value>1</value>
<value>2</value>
<value>3</value>

into 1,2,3? This includes getting rid of the line breaks between the values (plain newlines, not Control-M characters).
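
For the concatenation on its own, here are two possible sketches, assuming one <value> element per line and nothing else in the input. The output file names joined_awk.txt and joined_sed.txt are placeholders.

```shell
#!/bin/sh
# Variant 1, awk only: strip the tags, then join the lines with commas.
printf '%s\n' '<value>1</value>' '<value>2</value>' '<value>3</value>' |
awk '
{
    gsub(/<\/?value>/, "")   # drop the opening and closing tags
    out = out sep $0         # sep is empty for the first value
    sep = ","
}
END { print out }
' > joined_awk.txt

# Variant 2, sed plus paste: sed removes every tag, paste -s joins all
# input lines into one line, using a comma as the delimiter.
printf '%s\n' '<value>1</value>' '<value>2</value>' '<value>3</value>' |
sed 's/<[^>]*>//g' | paste -sd, - > joined_sed.txt

cat joined_awk.txt joined_sed.txt
```

In both variants the newlines disappear as a side effect of joining: awk builds the whole list in one variable and prints it once at the end, and paste -s replaces each newline between values with the delimiter.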


Thanks in advance.
