Assignment #5 Description


Now that you have the ability to read from the web and hopefully keep track of how many substrings appeared on different sites, we can actually go through the task of calculating what you might want to charge different companies for the substrings they have on their web sites. Before you do that however, you will want to write some code that should boost efficiency so that you won't be waiting as long for it to complete that tasks.

Objective 1 - For this I want you to write two different trees. The first is a sorted binary tree (SSBinTree). The loading on this one is going to be a bit interesting because we haven't learned how to dynamically balance sorted binary trees yet. If you read the data in in order, you get what is basically a singly linked list which you have already had. I'll be sending out code that uses a random number generating function to load in the file in random order. It won't work with your code automatically. Instead, you will have to edit it to fit the interfaces you have on your objects. If you want to you can made this a template class that takes a comparison functor. In that case it will also be able to help you keep track of the number of counts if you want to use it for that.

Objective 2 - Now write a 26 way tree (SS26Tree). This tree will be designed so that each level of the tree has 26 children and which child a substring goes into is determined by the letters in it. The first letter determines the first branch, the second letter determines the second branch, etc. For a SubStr4 this tree will have 4 levels and you can find any substring with only 4 searches. In this sense it is O(1). Note that there are some differences between this tree and the one from objective 1. In the binary tree data members are stored in every node. In this tree they are stored only in the leaves.


Submission executable - For your executable you are going to go back to assignment #3 and do some speed testing with three different container types. I want you to pick a page that is reasonably large and run versions of assignment #3 on it. If you want, you can alter it so that it takes a URL directly like assignment #4 did, but don't do a whole spider, just get the one page and do counts of the substrings in it. You will do this with SSLinkList, SSBinTree, and SS26Trees as the containers for all of the substrings. To get information on how long it takes to run these you can use the "time" command in Linux. Just type time followed by a space and what you would normally execute. It will report back how long it took to execute that command.

Extension - What you did above times a lot more than just the searching for substrings. You can fix this by writing your own timing code. Use the man pages to look up the clock command and use it so that you only time the part of your code where you are counting substrings.


Once again I would like the written design handed in to me. The code need only be mailed on the date it is due, but this time I would also like you to return both the original design and the revised design. It would be nice if you can make in the revised design where you made alterations. This can be done by putting in any character string that stands out. Making sure it isn't common also allows you to do a search for it before the next assignment and remove it easily. I would recommend something like "!ALTERED!" so that it is also clear to me what it means. Of course, this has to be in a comment if it is parts of your header file. Also make sure you e-mail me the ChargerOutput.txt file with your submission.

This time also give me a written documents telling me how long it took to execute your program using the different containers. You might try a few sites. If you do the extension give me information on both the whole programs and just the searching parts.