Riaan Lehmkuhl's Blog

Subversion, Programming, Tips & Tricks and whatever else springs to mind.

Convert XML elements and attributes to lowerCaSe using awk

13 August 2010 21:50 by Riaan Lehmkuhl

Install xmlstarlet: download it from http://xmlstar.sourceforge.net/ or, on Ubuntu, run sudo apt-get install xmlstarlet.

Let's look at a simple XML file (menu.xml):

<BREAKFAST_MENU>
<FOOD NAME="Belgian Waffles" PRICE="$5.95">
<DESCRIPTION>two of our famous Belgian Waffles with plenty of real maple syrup</DESCRIPTION>
<CALORIES>650</CALORIES>
</FOOD>
<FOOD NAME="Strawberry Belgian Waffles" PRICE="$7.95">
<DESCRIPTION>light Belgian waffles covered with strawberries and whipped cream</DESCRIPTION>
<CALORIES>900</CALORIES>
</FOOD>
</BREAKFAST_MENU>

Convert the XML to PYX:

xmlstarlet pyx menu.xml > menu.pyx

The new file will now look like this:

(BREAKFAST_MENU
-\n\t
(FOOD
ANAME Belgian Waffles
APRICE $5.95
-\n\t\t
(DESCRIPTION
-two of our famous Belgian Waffles with plenty of real maple syrup
)DESCRIPTION
-\n\t\t
(CALORIES
-650
)CALORIES
-\n\t
)FOOD
-\n\t
(FOOD
ANAME Strawberry Belgian Waffles
APRICE $7.95
-\n\t\t
(DESCRIPTION
-light Belgian waffles covered with strawberries and whipped cream
)DESCRIPTION
-\n\t\t
(CALORIES
-900
)CALORIES
-\n\t
)FOOD
-\n
)BREAKFAST_MENU
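
Each PYX line starts with a one-character marker: "(" opens an element, ")" closes one, "A" is an attribute (name, a space, then the value) and "-" is character data. As a rough sketch, assuming only a POSIX awk, the following lists the distinct element and attribute names in a PYX stream, which are the strings the next step lowercases. The function name pyx_names is made up for illustration, and the sample input is a hand-typed excerpt of menu.pyx.

```shell
# pyx_names (hypothetical helper): print each distinct element or
# attribute name found in a PYX stream on stdin, in first-seen order.
pyx_names() {
  awk '
    /^\(/ { name = substr($0, 2) }                 # start tag: name follows "("
    /^A/  { sp = index($0, " ")                    # attribute: name ends at first space
            name = (sp ? substr($0, 2, sp - 2) : substr($0, 2)) }
    /^[(A]/ { if (!(name in seen)) { seen[name] = 1; print name } }
  '
}

# Hand-typed excerpt of menu.pyx used as sample input.
printf '%s\n' '(BREAKFAST_MENU' '(FOOD' 'ANAME Belgian Waffles' \
  'APRICE $5.95' ')FOOD' ')BREAKFAST_MENU' | pyx_names
# prints:
# BREAKFAST_MENU
# FOOD
# NAME
# PRICE
```

On the real file this would be run as pyx_names < menu.pyx.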

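Now we need to lowercase the names. Lines beginning with "(" or ")" are lowercased in full. On lines beginning with "A", only the attribute name is lowercased, that is, the text between the leading "A" marker and the first space; the marker itself and the attribute value after the space stay untouched.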

awk '/^[()]/ { print tolower($0); next } /^A/ { sp = index($0, " "); if (!sp) sp = length($0) + 1; print "A" tolower(substr($0, 2, sp - 2)) substr($0, sp); next } { print }' menu.pyx > menu.tmp
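
To sanity-check the filter without touching menu.pyx, a few PYX lines can be piped through it. This is only a sketch, assuming a POSIX awk: the same logic is wrapped in a shell function whose name, pyx_lower, is made up for illustration. The guard for an attribute line without any space is an assumption about how an empty value might appear, not something confirmed from xmlstarlet's output.

```shell
# pyx_lower (hypothetical helper): lowercase element and attribute names
# in a PYX stream on stdin; attribute values and text pass through as-is.
pyx_lower() {
  awk '
    /^[()]/ { print tolower($0); next }            # start and end tags
    /^A/ {
      sp = index($0, " ")                          # name ends at the first space
      if (sp == 0) sp = length($0) + 1             # assumed case: no value part
      print "A" tolower(substr($0, 2, sp - 2)) substr($0, sp)
      next
    }
    { print }                                      # text and anything else
  '
}

printf '%s\n' '(FOOD' 'ANAME Belgian Waffles' '-Some TEXT' ')FOOD' | pyx_lower
# prints:
# (food
# Aname Belgian Waffles
# -Some TEXT
# )food
```

Note that the text line keeps its capitals: only names change, never content.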

We now have the converted PYX file, which can be converted back to XML:

xmlstarlet p2x menu.tmp > menu.xml

That's it. menu.xml, which the command above overwrites, now holds the converted XML:

<breakfast_menu>
<food name="Belgian Waffles" price="$5.95">
<description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
<calories>650</calories>
</food>
<food name="Strawberry Belgian Waffles" price="$7.95">
<description>light Belgian waffles covered with strawberries and whipped cream</description>
<calories>900</calories>
</food>
</breakfast_menu>


Remove the thousand separator from CSV files using AWK

29 December 2006 21:26 by Riaan Lehmkuhl
This turns a quoted field like "11,368.35" into 11368.35: the thousands separator is removed, along with the surrounding quotes.

awk -F, -v quo='"' '/"[0-9,.]*"/ { startq=0; str=""; for (i=1; i<=NF; i++) { if (gsub(quo, "", $i) % 2) startq=!startq; str=str $i; if (!startq && i<NF) str=str "," } print str; next } { print }' OLDFILE > NEWFILE
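
A quick way to try this is to feed it a few sample lines. The sketch below assumes a POSIX awk and uses made-up sample data; the logic is wrapped in a hypothetical function, strip_thousands, and it counts the quotes in each comma-separated piece so that a quoted field with no embedded comma (such as "42") is handled too. Be aware that on any line containing a quoted number it strips the quotes, and any commas inside them, from every quoted field, text fields included.

```shell
# strip_thousands (hypothetical helper): on lines containing a quoted
# number, drop the quotes and the commas inside quoted fields.
strip_thousands() {
  awk -F, -v quo='"' '
    /"[0-9,.]*"/ {
      inq = 0; str = ""
      for (i = 1; i <= NF; i++) {
        n = gsub(quo, "", $i)                      # remove quotes, count them
        if (n % 2) inq = !inq                      # odd count: enter or leave a quoted field
        str = str $i
        if (!inq && i < NF) str = str ","          # keep only commas outside quotes
      }
      print str; next
    }
    { print }                                      # lines without a quoted number
  '
}

printf '%s\n' 'Widget,"11,368.35",3' 'Gadget,"1,250,000.00","42"' \
  'plain,line,here' | strip_thousands
# prints:
# Widget,11368.35,3
# Gadget,1250000.00,42
# plain,line,here
```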



