XML Information Page



This page is intended to give you some assistance with using XML in your project and give you information to help you decide if that is the direction you want to go.

What is XML?

XML (eXtensible Markup Language) is an extensible text format for encoding information. Being extensible matters because it allows XML to be used with many different types of data. When you first see XML it will probably look somewhat like HTML to you, because the structure is similar. However, XML has no predefined elements and is stricter about structure. XML documents can be validated against rule sets that specify which elements and attributes can appear and how they may be nested relative to one another.

An XML document looks something like this:

<?xml version="1.0"?>
<!DOCTYPE Student SYSTEM "StudentFormat.dtd">

<Student name="Mark Lewis" institution="A Place of Higher Learning">

<Schedule>

<Class name="Effective Programming" department="CSCI" number="4311" />
<Class name="Program Analysis" department="CSCI" number="4323" />
<Class name="Compilers" department="CSCI" number="4324" />
<Class name="Simulation Theory" department="CSCI" number="3321" />
<Class name="Planetary Dynamics" department="PHYS" number="4310" />

</Schedule>
<Address>

Department of Computer Science
One Trinity Place
San Antonio, TX 78212-7200

</Address>

</Student>

The top two lines are header information for this file. Below that we find the data. Every element must be closed. The "Class" elements might not appear to be, but they use a shorthand notation where the "/" is placed at the end of the tag. This can be done for any element that doesn't contain anything. Each element can also have one or more attributes associated with it. The value of an attribute in XML must be enclosed in quotation marks. The DOCTYPE in the header specifies that the root node is Student and that the format of this file is specified by the Document Type Definition (DTD) "StudentFormat.dtd". The DTD for that might look like this:

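<!ELEMENT Student (Schedule,Address)>
<!ATTLIST Student name CDATA #REQUIRED institution CDATA #REQUIRED>
<!ELEMENT Schedule (Class*)>
<!ELEMENT Class EMPTY>
<!ATTLIST Class name CDATA #REQUIRED department CDATA #REQUIRED number CDATA #REQUIRED>
<!ELEMENT Address (#PCDATA)>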
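To make the idea of validation concrete, here is a minimal Java sketch that asks a validating parser whether a document obeys a DTD. It uses the standard JAXP classes (which Xerces implements). The class name DtdCheck, the method isValid, and the cut-down DTD embedded in the DOCTYPE are assumptions made for illustration, so the example needs no separate .dtd file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdCheck {
    // Hypothetical header: a trimmed DTD placed inline in the DOCTYPE
    // (an "internal subset") instead of in StudentFormat.dtd.
    public static final String HEADER =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE Student ["
        + "<!ELEMENT Student (Schedule,Address)>"
        + "<!ELEMENT Schedule (Class*)>"
        + "<!ELEMENT Class EMPTY>"
        + "<!ATTLIST Class name CDATA #REQUIRED>"
        + "<!ELEMENT Address (#PCDATA)>"
        + "]>";

    // Returns true if the document parses and satisfies its DTD.
    public static boolean isValid(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Validity problems are reported to the handler; rethrow them so
        // that parse() fails instead of silently continuing.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = HEADER + "<Student><Schedule><Class name=\"Compilers\"/></Schedule>"
            + "<Address>One Trinity Place</Address></Student>";
        String bad = HEADER + "<Student><Schedule/></Student>"; // Address is missing
        System.out.println(isValid(good));
        System.out.println(isValid(bad));
    }
}
```

The second document is well-formed XML, but it breaks the rule that a Student must contain a Schedule followed by an Address, which is exactly the kind of mistake a DTD lets the parser catch for you.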

There are lots of books that you could buy that go into great detail about XML and many of the technologies associated with it. You don't need them for this class at all, even if you use XML. If you want to be able to understand an XML document you might consider getting "XML Pocket Reference" by Eckstein, published by O'Reilly. It's small and inexpensive, but even that won't be required. The reason it isn't required is that you are going to be using Java and C++ to both read in and write out XML documents, and there are certainly web references you can use to help you along.

For the purposes of this class what matters is that XML is a text format that lets you represent data in a tree structure. The power lies in having parsers that already know the format, so you don't have to reinvent the wheel every time you want a new type of file. Being extensible means that it can handle virtually any type of data you want to throw at it. Granted, it might not be the most efficient way of handling that data, but it will be able to handle it.
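As a rough sketch of what "letting the parser do the work" looks like in Java, the following reads a trimmed copy of the Student document into a DOM tree and pulls the course codes out of it. It uses the standard JAXP and org.w3c.dom classes. The class name ScheduleReader, its method names, and the shortened sample string are assumptions for illustration (the DOCTYPE is omitted so that no DTD file is needed).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ScheduleReader {
    // Hypothetical sample input, shortened from the Student document above.
    public static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<Student name=\"Mark Lewis\" institution=\"A Place of Higher Learning\">"
        + "<Schedule>"
        + "<Class name=\"Compilers\" department=\"CSCI\" number=\"4324\" />"
        + "<Class name=\"Planetary Dynamics\" department=\"PHYS\" number=\"4310\" />"
        + "</Schedule>"
        + "<Address>One Trinity Place</Address>"
        + "</Student>";

    // Parse XML text into a DOM tree; the parser handles all the syntax.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Collect "department number" for every Class element in the tree.
    public static List<String> courseCodes(Document doc) {
        List<String> codes = new ArrayList<>();
        NodeList classes = doc.getElementsByTagName("Class");
        for (int i = 0; i < classes.getLength(); i++) {
            Element c = (Element) classes.item(i);
            codes.add(c.getAttribute("department") + " " + c.getAttribute("number"));
        }
        return codes;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(doc.getDocumentElement().getAttribute("name"));
        for (String code : courseCodes(doc)) {
            System.out.println(code);
        }
    }
}
```

Notice that nothing here deals with angle brackets, quoting, or whitespace. If the file format grows new elements later, only the tree-walking code needs to change.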

Why use XML for the project?

In many ways, the simplest way to jump into this project at the beginning will be to write separate Java programs and C++ programs and have the Java programs write out text files that the C++ programs read in. Early on, the messages that you are passing between the two will be quite simple and this won't seem like such a difficult task. However, as time passes and the messages get bigger and more complex, you are likely to find that you are spending a lot of your time redoing text writing and parsing routines to handle the new types of data. Using XML means that you will work at a higher level in your code. Instead of working at the file-writing level, you will be working at the level of an XML "Document" and the tree that is associated with it. Early on this will present a steeper learning curve, but before the end of the semester you should find it making your life easier.
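To give a feel for working at the "Document" level, here is a minimal Java sketch that builds a small Student tree in memory using the standard org.w3c.dom interfaces. The class name MessageBuilder, the method buildStudent, and the choice to pass classes as department/number pairs are assumptions for illustration, not part of any project requirement.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MessageBuilder {
    // Build a Student document; each entry of classes is {department, number}.
    public static Document buildStudent(String name, String[][] classes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element student = doc.createElement("Student");
        student.setAttribute("name", name);
        doc.appendChild(student); // Student becomes the root element

        Element schedule = doc.createElement("Schedule");
        student.appendChild(schedule);

        for (String[] c : classes) {
            Element cls = doc.createElement("Class");
            cls.setAttribute("department", c[0]);
            cls.setAttribute("number", c[1]);
            schedule.appendChild(cls);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildStudent("Mark Lewis",
            new String[][] { {"CSCI", "4311"}, {"PHYS", "4310"} });
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getElementsByTagName("Class").getLength());
    }
}
```

Adding a new kind of data to a message is then a matter of creating another element or attribute, with no change to any hand-written formatting or parsing code.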

Both the general text files and the XML files have the advantage of being very portable because text moves nicely between systems. They have the disadvantage of being somewhat slower because all the communication goes through the disk, which has significant drawbacks for speed. In addition to the above advantages of XML over plain text, there is also a conceptual advantage to using a standard format for representing all of your data. Using standards typically makes your code more appealing to others, and in this case could make it more flexible. If you wanted to hook your code into another application later down the line, that other application wouldn't have to know the details of some text format that you dreamed up, it could

How to use XML in the project:

I'm not going to go into much detail on how to write XML code here because, to some extent, that will depend on the exact parser that you decide to use. I strongly recommend that you use the Xerces parser (link below). It is what I have been using and it is the only one I can guarantee will be in place when I try to run your code. If you go to their web site, they have pages on how to program both Xerces Java and Xerces C++. You can read in XML documents in many ways. I wouldn't bother reading up on SAX because what you will want to do for this project will require DOM. The slightly harder thing to find is how to write a document out to a file. In Java, you can use the new Load/Save parts of the W3C DOM Level 3 API or the extra classes that Xerces provides in org.apache.xml.serialize. On the C++ side, Xerces provides a DOMWriter class that you can use for the same purpose. You can find these in the APIs that they provide.
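As one possible way to write a document out on the Java side, here is a minimal sketch using the DOM Level 3 Load/Save interfaces in org.w3c.dom.ls. The class name DocumentSaver and the method toXml are assumptions for illustration. It serializes to a string so the result is easy to inspect, and it assumes the DOM implementation in use supports the "LS" feature (Xerces does).

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class DocumentSaver {
    // Serialize a DOM tree to XML text with the Load/Save API.
    public static String toXml(Document doc) {
        // Ask the DOM implementation for its Load/Save support.
        DOMImplementationLS ls =
            (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer serializer = ls.createLSSerializer();

        LSOutput out = ls.createLSOutput();
        StringWriter writer = new StringWriter();
        out.setCharacterStream(writer);
        out.setEncoding("UTF-8"); // encoding named in the XML declaration

        serializer.write(doc, out);
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element student = doc.createElement("Student");
        student.setAttribute("name", "Mark Lewis");
        doc.appendChild(student);
        System.out.println(toXml(doc));
    }
}
```

To write to a file instead, the same LSOutput can be given a byte stream, for example with out.setByteStream(new java.io.FileOutputStream("student.xml")), where the file name is only a placeholder.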

The last trick with using XML is making sure that your compilers can find and use the Xerces libraries. Doing this is slightly different in C++ and Java, but neither is particularly hard to do (especially if someone points the way).

XML References:

Xerces site (This is the main site you will probably be using as it has the APIs for the parser you'll be using.)

xml.com (by O'Reilly)

Java and XML at OnJava