Assignment #8


It's time now to have you do something that you might actually find interesting. It will use XML as much as possible as well. As described in the last assignment, there will be 5 paths that you might take. These paths overlap in fairly significant ways in certain situations. For example, there are only 3 main options for this assignment because two of the options each work for two of the major project lines.

No matter which of the options you pick, I want you to also put your XML functions inside of a structure called XML. Because you are putting them in this structure, you can also simplify the names a bit. So the functions should be called read, write, prune, and alter. Calling them will use a syntax like XML.read "file.xml".

Graphs (for Data Structures and Text Game) - This option has you reading in an XML file that specifies a graph data structure. Graphs are the most general form of linked data structure. They consist of a set of vertices and a set of edges. Each edge links one vertex to another. You can draw graphs as dots connected with lines. Graphs can be directed or not. In a directed graph, often called a digraph, the edges go from one edge to another, but not back and are drawn with arrows. So we say they have a direction. For this assignment you will write code that reads in a graph that is stored in an XML file and do some basic operations with it.

First let's start by talking about the format of the graph in the XML file. The graph will be stored in an XML format like this file. If your parser does not handle the DOCTYPE element that is fine. If you want to learrn more about XML though the DTD for the graph file is here. The entire graph is inside a graph element which contains multiple vertex elements. Each vertex element has an attribute that gives the name of the element. The vertex can also contain zero or more elements of type linkto. The linkto element has two attributes: name and vertex. Name is the name for that link in that vertex and vertex is the name of the vertex it links to. In a proper graph file, all the vertex names are unique and all the names of the linktos in a given vertex are also unique.

For those who are doing the data structures option, the format described here won't be modified much, but later you will do a lot more processing on it. For people doing the text game, you will extend the definition to fit the type of information you want in the "rooms" of your game. The linkto names will likely be things like "north" and "south" for directions the player can move. In addition, you might add another element called description to each vertex that contains text for the description of that room, or simply have it so that any text occuring in the vertex before the links is the description. Depending on how far you decide to take your game, you might also have elements for items the player could pick up or other things the player has to interact with. Hopefully you can see how all of these things could be easily specified in XML.

Of course, reading in the XML file isn't exactly a tough problem since that was the last assignment. In fact, you could choose to keep your graph in the XML domNode structure while processing it though that won't be optimal for all things. You might find it is better to have some other representation and write functions that convert from domNodes to your other format and back.

The work you have to do for this assignment is fairly simple. You will write a function that reads in a graph from a file and returns whatever format you want for it. You will also write a function that writes whatever format you are using back out to file. In addition, you will write some very basic graph algorithms to check some things about the graph. First you will write a function that checks if the graph has any cycles in it and returns true or false. Second you will write a function to tell if you can get from one vertex to another. These four functions should look like the following.

read fileName - returns a "graph" (possibly just a domNode or list of them)

write fileName graph - returns unit

hasCycles graph - returns boolean

reachableFrom graph startVertexName endVertexName - returns boolean

These functions and anything else you need for them should be put in a structure called Graph. Use a signature to hide any helper functions that you might write to support these functions.

For extra credit, write a method shortestPath startVertexName endVertexName that goes beyond what reachableFrom does and tells you the minimum number of steps you have to take to get from the start to the end.

Octree (for Ray Tracing and Physics) - In this option you will build a basic spatial data structure called an octree. Octrees are used to storing things that are located in three dimensional space. There are a number of ways to build octrees. I will have you build what is called a region octree. This type of tree is a bit different from what you are used to because data is found only in the leaves and the higher level structure is only used to break the space up into different regions.

The idea of a region octree is that you start with a particular region of space, a cube, and repeatedly divide it into eight subregions of equal volume by cutting in the middle in x, y, and z. Basically, this takes a cube and cuts it into eight equal octants. In a standard implementation, each node might store its center location, (x,y,z), and the size across an edge (only one size if needed if the tree is a regular cube). Each node either have eight children, or, if it is a leaf, it can have the information for the things that it stores. Typically that thing is a point or some other geometry in 3-D space. For this assignment assignment, you will be working with just points and only allowing one point per leaf node.

So the main function that you have to write is an add function. Add in this form of octree works much like it does in a binary search tree. You walk down the tree until you get to a leaf. In this case though, that leaf has to be split so it isn't a leaf and it has 8 leaf children that have no point in them. You then add both the new point and the point that had been in the current node to the proper children. It is possible they will both go in the same child, but that simply invokes the same code to split again. Obviously this causes a problem if you put in two points at the exact same location, but we won't be doing that.

You also need a function that creates an empty octree root. This function will take a three tuple of reals for the center point and another real for the size. With these two functions you can build a region octree of points. Here is what these two functions should look like.

create (x,y,z) size - returns a single leaf node at the given position with the given size with no contents

add tree (x,y,z) - returns a new tree with the given point added

These methods and everything else you write for this should go in a structure called Octree. You should use a signature to hide any methods I don't sepcifically ask you to write.

XML comes in with this option because you will be reading the point data from an XML file. A sample file is given here. The format of this document type has an octree element as the root element with attributes that give the center and size of the root node of the tree. Inside the octree element will be many point elements, each of which has values for x, y, and z. The DTD for this is given here.

There are three other functions you will provide in addition to create and add. The first is a simple read method that takes a file name and reads in the XML octree file and returns an octree with all the points from the file added into it. The second and third are methods that use the power of the octree. The reason for using octrees is that they allow you to search for objects in space in O(log n) time by not going down paths that are far away from what you are searching for. Given this nature, your other two functions are, nearestPoint and pointsInSphere. The nearestPoint function takes a point location and returns to you the x, y, z of the point in the tree that is nearest to what was passed in. The pointsInShere function takes a point and a radius, and returns a list of all the points that are inside of a sphere centered at a given position with a given radius.

These three methods have the following format and return types. They should also be in the Octree structure.

read fileName - returns an octree of the proper size with all the points added in

nearestPoint tree (x,y,z) - returns a 3-tuple of the point nearest to what is passed in

pointsInSphere tree (x,y,z) rad - returns a list of 3-tuples with all the points that are inside the given sphere

For extra credit, write a function closestPair that takes a tree and returns a tuple with the two closest points in that tree.

Validating XML - This is for the advanced XML option. To do this, you will be adding to the XML parser that you wrote for assignment #7 so that it is a validating parser. You only have to validate with DTDs though, not Xschema. The idea here is that the XML file tells you the Data Type Definition (DTD) file that it uses and you will also read in and parse that file. You will then tell if the XML file follows the format of the DTD and if it doesn't, you should print a message giving some information about where the first problem is.

To do this you must obviously add to your parser so that it can recognize the DOCTYPE tag at the top of the XML file that specifies where the DTD can be found. This tag has the format <!DOCTYPE Name SYSTEM "file.dtd">. There are other formats for this, but you don't have to support them. You will need to read in the DTD file and compare what it says to what the XML file contains.

Note that the DTD tells what elements can appear inside of another element and the number of those elements. Your code will have to check that these rules are followed in the actual document. You can test your code on the XML files and DTDs that are provided for the other two options. You can manually alter the files so that they don't match the DTDs and see if your code catches the problems.

For extra credit, make it so you also support the DOCTYPE format where the DTD is given in the XML file. That looks like this <!DOCTYPE Name [ ... DTD stuff ...]>.