Assignment #4

Assignment #4 Description

This assignment is going to have you write your first more advanced data structure, a balanced tree. Most of the work for this will be done on the C++ side of things. As long as your design for the web page drawer for the last assignment was fairly strong, this should not require significant changed on the Java side.

C++: On the C++ side you are supposed to write an AVL tree implementation that can work with any type (or red-black if you prefer). The difficulty in this really depends on the quality of your coding. The algorithm itself is not all that difficult, but the debugging time if you are sloppy can be incredibly long. This is one case where coming up with good test cases beforehand will be extremely important and could reduce your difficulties significantly later on.

The primary use of this tree in your project is to give you a fast method of finding sites that allows you to do successor and predecessor operations. The finding of a single site will be O(log n) which is slower than what you should have gotten for a good hash: O(1). However, the hash had no efficient method for finding the next site or the previous one in alphabetical order. Alphabetical ordering of URLs might not seem to have significant value, but it can be used to quickly find all of the files under a particular directory. It can also be used to find incomplete URLs so if you don't know if something ends with .html or .htm it won't matter. The hash has to be told exactly the right URL to work.

Extra credit: Building the AVL tree every time you run the application would be quite expensive. For that reason, the extra credit part of this assignment that you are supposed to do in C++ is to save your AVL tree to disk so that you can either traverse it on disk or quickly load it into memory for traversal. There are many ways to do this, but basically you will be using a static linking and will need the nodes of your tree to be nicely enumerated. Storing them in a pool in memory will give you the ability to quickly and efficiently move the structure between memory and disk. If your program does this make sure that I can't miss it when I go look at it. Having an echo in the makefile would do a decent job of telling me.

Java: On the Java side the main extension that I want is that I want you to be able to show me complete "sub-sites" in your graph drawer. This means that it should be able to take something smaller than the complete site and use the root of that as the new "main" page. In addition, it should be able to display more than just two "clicks" away from the main page at any one time so that for small but deep subsites the user can view the entire "site". The obvious connection to the C++ code is that user should be able to type in a URL (or partial URL) and get a "sub-site" of all the pages that are "below" that in the directory structure. You should then be able to view that part of the site in your graph.

Extra credit: Add a different type of interactivity to your display. I want the user to be able to "select" nodes in the graph. As extra credit for the last assignment they were given the ability to click on sites and get information on those. When a site has been clicked I'd like to to be highlighted in the drawing so the user can tell which one they have selected. In addition to that, give them a menu option on the frame so that they can view the HTML for the file they have selected. You'll have to reload that page since you aren't saving off the full text. The JEditorPane (or possibly its subtype JTextPane) can be used as a fast way for you to display something that is in HTML format. It won't have all the bells and whistles of a full browser, but it should give the user the ability to see the page.