Human-Readable XML

Extensible Markup Language (XML) defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined by a specification from the W3C (World Wide Web Consortium). However, according to all known laws of computing, a machine does not care how an XML document is formatted; humans, on the other hand, do!
More often than not, you will come across XML documents that are well-formed (readable by a machine) but unreadable to a human. Take the following example:
<breakfast_menu><food><name>Belgian Waffles</name><price>$5.95</price><description>two of our famous Belgian Waffles with plenty of real maple syrup</description><calories>650</calories></food><food><name>Strawberry Belgian Waffles</name><price>$7.95</price><description>light Belgian waffles covered with strawberries and whipped cream</description><calories>900</calories></food><food><name>Berry-Berry Belgian Waffles</name><price>$8.95</price><description>light Belgian waffles covered with an assortment of fresh berries and whipped cream</description><calories>900</calories></food><food><name>French Toast</name><price>$4.50</price><description>thick slices made from our homemade sourdough bread</description><calories>600</calories></food><food><name>Homestyle Breakfast</name><price>$6.95</price><description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description><calories>950</calories></food></breakfast_menu>
And the same data formatted:
<breakfast_menu>
	<food>
		<name>Belgian Waffles</name>
		<price>$5.95</price>
		<description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
		<calories>650</calories>
	</food>
	<food>
		<name>Strawberry Belgian Waffles</name>
		<price>$7.95</price>
		<description>light Belgian waffles covered with strawberries and whipped cream</description>
		<calories>900</calories>
	</food>
	<food>
		<name>Berry-Berry Belgian Waffles</name>
		<price>$8.95</price>
		<description>light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
		<calories>900</calories>
	</food>
	<food>
		<name>French Toast</name>
		<price>$4.50</price>
		<description>thick slices made from our homemade sourdough bread</description>
		<calories>600</calories>
	</food>
	<food>
		<name>Homestyle Breakfast</name>
		<price>$6.95</price>
		<description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
		<calories>950</calories>
	</food>
</breakfast_menu>
The two are worlds apart as far as human readability goes! But both are equally valid. In fact, the first example contains fewer characters, so the argument for removing formatting characters, such as tabs and carriage returns, is a reasonable one. For example, imagine you are transmitting XML data over an expensive communication medium. By expensive, I mean costly in both dollars and delivery speed. If each character costs $1 and you send over a million messages a year, then every non-essential character you strip from each message saves more than a million dollars a year, and a formatted document carries dozens of them. This means software vendors may intentionally use unformatted XML data, decreasing readability for the occasional human who wants to read it. This raises the question: can we easily format unformatted XML?
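To put rough numbers on the argument above, here is a minimal Python sketch using only the standard library. It takes one food entry from the menu in both its unformatted and formatted (tab-indented) forms, checks that both are well-formed and carry the same data, and counts the characters that formatting adds. The $1-per-character cost and the one million messages a year are the hypothetical figures from the paragraph above, not real pricing.

```python
import xml.etree.ElementTree as ET

# One <food> entry from the menu above, unformatted and formatted (tab-indented).
UNFORMATTED = (
    "<food><name>French Toast</name><price>$4.50</price>"
    "<description>thick slices made from our homemade sourdough bread</description>"
    "<calories>600</calories></food>"
)
FORMATTED = (
    "<food>\n"
    "\t<name>French Toast</name>\n"
    "\t<price>$4.50</price>\n"
    "\t<description>thick slices made from our homemade sourdough bread</description>\n"
    "\t<calories>600</calories>\n"
    "</food>"
)


def same_data(a: str, b: str) -> bool:
    """Parse both documents (ParseError is raised if either is not well-formed)
    and compare each element's tag and stripped text. Tails, i.e. the
    whitespace after each closing tag, are ignored."""
    def flatten(root):
        return [(e.tag, (e.text or "").strip()) for e in root.iter()]
    return flatten(ET.fromstring(a)) == flatten(ET.fromstring(b))


# Every extra character in FORMATTED is a newline or a tab.
extra_chars = len(FORMATTED) - len(UNFORMATTED)

# Hypothetical figures from the text: $1 per character, one million messages a year.
COST_PER_CHAR = 1
MESSAGES_PER_YEAR = 1_000_000
yearly_saving = extra_chars * COST_PER_CHAR * MESSAGES_PER_YEAR

print("same data:", same_data(UNFORMATTED, FORMATTED))
print("characters added by formatting:", extra_chars)
print(f"yearly saving from stripping them: ${yearly_saving:,}")
```

Running it reports that both forms hold the same data and that formatting adds 9 characters (5 newlines and 4 tabs) to this single entry, which works out to $9,000,000 a year at the assumed rates.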


Introducing the xmllint program. The xmllint program parses one or more XML files specified on the command line and prints various types of output, depending on the options selected. It is useful for detecting errors both in XML code and in the XML parser itself. It can also format XML using the --format option, as follows:
xmllint --format foo.xml
This example reads the contents of the file ‘foo.xml’, formats them, and writes the result to the terminal (standard output). Brilliant! For more information about the xmllint program, take a look at its manual page (man xmllint).
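If xmllint is not installed, Python's standard library can do a similar job. The sketch below is a swapped-in technique, not how xmllint works internally: it parses the document with xml.etree.ElementTree and re-indents it with ElementTree.indent (available in Python 3.9 and later), using tabs to match the formatted menu above. The script name format_xml.py and the function format_xml are hypothetical. Unlike xmllint --format, this sketch drops comments and does not emit an XML declaration.

```python
import sys
import xml.etree.ElementTree as ET


def format_xml(xml_data, indent="\t"):
    """Return xml_data (str or bytes) re-indented with one element per line.

    Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    ElementTree's default parser discards comments and processing
    instructions, so this is a rough stand-in for xmllint --format.
    """
    root = ET.fromstring(xml_data)
    # Overwrites whitespace-only text and tails, so already-formatted
    # input is re-indented consistently rather than doubled up.
    ET.indent(root, space=indent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python format_xml.py foo.xml   (hypothetical script name)
    # Read bytes so the parser honours any encoding declared in the file.
    with open(sys.argv[1], "rb") as f:
        print(format_xml(f.read()))
```

Run as python format_xml.py foo.xml. For the unformatted breakfast menu above, it should print the same tab-indented layout as the formatted version.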