Friday, February 03, 2006

XML

I mentioned XML in a recent message sent to the Second Site News mailing list, and that generated some questions from subscribers, including "What is it?" and "Why would I use it?" It is beyond the scope of a blog entry to explain XML in great detail, but I can provide you with a brief description and some reasons why XML might become more important to you in the future.

XML stands for "eXtensible Markup Language". XML is a text standard: a set of rules and other criteria that define how data is stored in text files so that applications can process it. XML is very useful for exchanging data between applications because the standard is fairly simple and very precise, which reduces the cost of implementing the import and export components of a program.

XML is eXtensible because users can define the data elements they need for their application. Data can be exchanged between applications when multiple developers agree on a set of data elements.
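To make the idea of agreed-upon data elements concrete, here is a minimal Python sketch of two programs exchanging a record through XML. The element names (`contact`, `name`, `city`, `state`) and the functions `export_contact` and `import_contact` are hypothetical. The point is only that the importer works because both sides use the same names.

```python
import xml.etree.ElementTree as ET

# Element names that the (hypothetical) exporting and importing
# programs have agreed to use.
FIELDS = ("name", "city", "state")


def export_contact(record: dict) -> str:
    """Program A: write a record as XML using the agreed element names."""
    root = ET.Element("contact")
    for field in FIELDS:
        child = ET.SubElement(root, field)
        child.text = record[field]
    return ET.tostring(root, encoding="unicode")


def import_contact(xml_text: str) -> dict:
    """Program B: read the record back, knowing only the agreed names."""
    root = ET.fromstring(xml_text)
    return {field: root.findtext(field) for field in FIELDS}


if __name__ == "__main__":
    original = {"name": "John Doe", "city": "Anytown", "state": "MA"}
    xml_text = export_contact(original)
    print(xml_text)
    print(import_contact(xml_text) == original)
```

If the importer looked for a `town` element instead of `city`, `findtext` would simply return `None` for that field; the exchange only works when both developers agree on the names.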

Here's an example of some text data, and the same data in XML. I've chosen some data elements that represent the typical fields in a standard U.S. address.

Text:
John Doe, 15 Main Street, Anytown, MA 99999

XML:
<name>John Doe</name>
<address>
    <street-address>15 Main Street</street-address>
    <city>Anytown</city>
    <state>MA</state>
    <zipcode>99999</zipcode>
</address>
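Because every part of the address is labeled, a program can pick out each field by name. The Python sketch below reads the XML version of the address with the standard library's `xml.etree.ElementTree`. One assumption: a complete XML document must have a single top-level element, so the sketch wraps the example in a hypothetical `<person>` element. The `read_address` function is likewise only an illustration.

```python
import xml.etree.ElementTree as ET

# The address example, wrapped in a hypothetical <person> element
# because a well-formed XML document needs exactly one root element.
ADDRESS_XML = """
<person>
  <name>John Doe</name>
  <address>
    <street-address>15 Main Street</street-address>
    <city>Anytown</city>
    <state>MA</state>
    <zipcode>99999</zipcode>
  </address>
</person>
"""


def read_address(xml_text: str) -> dict:
    """Return the name and address fields as a dictionary."""
    person = ET.fromstring(xml_text)
    address = person.find("address")
    fields = {"name": person.findtext("name")}
    # Each child of <address> becomes one entry, keyed by its tag name.
    for child in address:
        fields[child.tag] = child.text
    return fields


if __name__ == "__main__":
    for tag, value in read_address(ADDRESS_XML).items():
        print(f"{tag}: {value}")
```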

If you know HTML, you may notice that it looks similar to XML. In fact, there is a version of HTML that follows the rules of XML; it's called XHTML. HTML and XML look alike because both are based on an older markup standard, SGML (Standard Generalized Markup Language).
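One practical consequence of XHTML being XML is that an ordinary XML parser can read it, while HTML that relies on HTML's looser rules may be rejected. The Python sketch below shows this with a line break: HTML allows a bare `<br>`, but XHTML requires the element to be closed, as in `<br/>`. The `is_well_formed` helper is hypothetical, written only for this illustration.

```python
import xml.etree.ElementTree as ET

HTML_SNIPPET = "<p>Line one<br>Line two</p>"    # HTML style: <br> left open
XHTML_SNIPPET = "<p>Line one<br/>Line two</p>"  # XHTML style: <br/> closed


def is_well_formed(text: str) -> bool:
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("HTML:", is_well_formed(HTML_SNIPPET))    # False: <br> is never closed
    print("XHTML:", is_well_formed(XHTML_SNIPPET))  # True
```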

There are many uses for text documents, and XML is now the format of choice for many of those uses. For example, many applications write user preferences to XML files.

XML is a popular standard, so there are many tools available for both developers and end-users. One of them is XSLT, a language for transforming XML documents into other formats. Second Site uses XSLT to convert TMG data to web pages: it converts the TMG data to XML, then passes the XML to an XSLT stylesheet, which converts the data to HTML. By leveraging XML and XSLT, I saved time and effort implementing Second Site, and users benefit, too. The transformation facility in Second Site is more powerful and flexible than it would have been if I had developed it all myself, and users can learn about XML and XSLT on the web and in books. Anything they learn about XML or XSLT from customizing Second Site will transfer to similar projects with other tools that use XML and XSLT.
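The two-stage pipeline described above (data to XML, then XML to HTML) can be sketched in a few lines. This is not Second Site's code, and it substitutes a small Python function for the XSLT stylesheet, because Python's standard library does not include an XSLT processor. The element names and both functions are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def people_to_xml(people: list[tuple[str, str]]) -> ET.Element:
    """Stage 1: export (name, city) records as an XML tree."""
    root = ET.Element("people")
    for name, city in people:
        person = ET.SubElement(root, "person")
        ET.SubElement(person, "name").text = name
        ET.SubElement(person, "city").text = city
    return root


def xml_to_html(root: ET.Element) -> str:
    """Stage 2: turn the XML into an HTML list.

    In Second Site an XSLT stylesheet does this step; this Python
    function only stands in for it.
    """
    items = [
        f"<li>{escape(p.findtext('name'))} ({escape(p.findtext('city'))})</li>"
        for p in root.iter("person")
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    tree = people_to_xml([("John Doe", "Anytown")])
    print(xml_to_html(tree))
```

Because stage 2 reads only the XML, it can be swapped out (in Second Site's case, by editing or replacing a stylesheet) without touching stage 1.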

You don't have to know anything about XML in order to benefit from it. Software developers understand the value of using XML to exchange data, among other things, and so they are building XML capabilities into their applications. As time goes on, you'll see more and more applications using XML.

XML is not ideal for all text applications. One downside is that it is verbose. There are 43 characters in the text-only version of the address data above, and nearly 150 in the XML version, more than three times as many, even without counting the indentation and line breaks. (They aren't necessary in XML; I added them to make the structure clearer.) The overhead varies dramatically with how much structure the data requires.
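A quick way to measure XML's overhead for the address example is to count characters directly. This Python sketch strips the optional indentation and line breaks from the XML version and compares its length with the plain-text version. The `compact` helper is hypothetical.

```python
TEXT_VERSION = "John Doe, 15 Main Street, Anytown, MA 99999"

XML_VERSION = """
<name>John Doe</name>
<address>
    <street-address>15 Main Street</street-address>
    <city>Anytown</city>
    <state>MA</state>
    <zipcode>99999</zipcode>
</address>
"""


def compact(xml_text: str) -> str:
    """Drop the line breaks and the indentation at the start of each line."""
    return "".join(line.strip() for line in xml_text.splitlines())


if __name__ == "__main__":
    text_len = len(TEXT_VERSION)
    xml_len = len(compact(XML_VERSION))
    print(text_len, xml_len, round(xml_len / text_len, 1))  # prints: 43 148 3.4
```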

There has been a lot of talk in genealogical circles about XML replacing GEDCOM. As of February 2006, there is little or no support for XML in any of the widely used applications. That's unfortunate, as GEDCOM needs to be replaced.

I am adding an XML Export feature to TMG Utility so that users can export a subset of their TMG project data to XML. I am including some stylesheets so that the data can be converted to web pages. The XSLT stylesheets I provide will also serve as examples for people who want to write their own; more on that later.

In the initial release, the XML Export facility will support these data items:

  • DNA Log
  • Exhibit Log
  • Master Source List

If you have ever wanted to create a custom report for one of the data items above, but were unable to get the results you wanted from TMG's built-in reports, then you might want to consider TMG Utility's XML Export feature. XSLT stylesheets are not for the technically naive, but writing one is easier than writing a database application to produce a report. If you can't write an XSLT stylesheet that produces the report you want, maybe a cousin, friend, or power user can.
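To give a feel for what a custom report involves, here is a minimal sketch written in Python rather than XSLT. The export format it reads is invented for illustration: TMG Utility's actual element names are not described here, so `<sources>`, `<source>`, `<title>`, and `<author>` are hypothetical, as are the sample entries and the `source_report` function. The report sorts the sources by title and produces one line per source.

```python
import xml.etree.ElementTree as ET

# Hypothetical export of a Master Source List; the real TMG Utility
# format may use entirely different element names.
EXPORT = """
<sources>
  <source><title>Vital Records of Anytown</title><author>Town Clerk</author></source>
  <source><title>Anytown Census, 1900</title><author>U.S. Census Bureau</author></source>
</sources>
"""


def source_report(xml_text: str) -> list[str]:
    """Return one 'Title (Author)' line per source, sorted by title."""
    root = ET.fromstring(xml_text)
    rows = [
        (s.findtext("title", default=""), s.findtext("author", default=""))
        for s in root.iter("source")
    ]
    return [f"{title} ({author})" for title, author in sorted(rows)]


if __name__ == "__main__":
    print("\n".join(source_report(EXPORT)))
```

An XSLT stylesheet would express the same steps (select each source, sort by title, write a line) declaratively, which is the kind of report the XML Export feature is meant to make possible.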
