CSCI 3366 (Introduction to Parallel and Distributed Processing), Spring 2002:
Homework 3

Assigned:
February 19, 2002.

Due:
February 28, 2002, at 5pm.

Credit:
40 points.


Contents

Overview

The textbook discusses an algorithm called ``bucket sort'' for using multiple processes to sort a list of numbers. For this assignment, your mission will be to implement this algorithm to sort an array of integers and compare its performance with that of a sequential (one-process) sort. Because reading numbers in and printing them out is tedious, we will generate the numbers to sort using a random number generator, and rather than printing them out we will compare the results to the results of sorting using a sequential sort. We will time only the part of the program that performs the bucket sort, allowing reasonable comparisons.

Details

The starter program

For this assignment, I have written an MPI main program that accepts as a command-line argument a number $ N$ representing the size of the array to sort. The program generates an array of $ N$ integers, sorts it, and compares the results to a correct sort. It also times the sorting step and prints the results. The actual sorting is done in functions masterProcess() (running in process 0) and slaveProcess() (running in all other processes). As currently written, the slaveProcess() function does nothing, and all the work of sorting is done in masterProcess() -- i.e., the program is currently a sequential sort.

You can obtain code for the starter program by downloading the following files:

To compile the program, use the command
make bsort
Run the executable (named bsort) as you would run any other MPI program with a command-line argument; e.g., to run it with one process and sorting 1000 numbers, use this command:
mpirun -np 1 bsort 1000
The provided code also includes some examples of adding ``debug print'' statements that can be included/excluded at compile time (look up #ifdef in a C/C++ textbook to learn more about this). By default, these statements are included; to include them, compile as follows:
make DEBUG= bsort
You may find this feature useful for adding your own debug print statements; then you can easily turn them on (for debugging) and off (for collecting timing information) simply by varying the command you use to compile.

What you are to do, part 1: Coding

You are to replace the contents of masterProcess() and slaveProcess() with code to implement the bucket sort algorithm described in the textbook. You should not make changes in the main program; the main program as currently written tests the correctness of the sorting operation and so will serve as a demonstration that your code to perform the sort works properly. So, you will change only file masterslave.cpp, not bsort.cpp. Comments in file masterslave.cpp describe the parameters to the two functions you are to modify. Of course, you should feel free to add additional functions and/or #include statements. Be sure your program works for any number of processes (not just, for example, if the number of processes is a power of 2).

What you are to do, part 2: Running your program

Once you have your code producing correct results (as reported by the main program), run it for varying values of $ N$ and for varying numbers of processes, and compare execution times with execution times for the original program. Ideally execution time for your code using one process will be not much worse than execution time of the original program for the same value of $ N$, and will decrease as you increase the number of processes. Also, your code should perform better relative to the original code for larger values of $ N$. Experiment to see if this appears to be the case. Record at least half a dozen observations (timing results for sorting $ N$ numbers with $ P$ processes, compared to using the original program to sort $ N$ numbers).

Helpful hints

The bucket sort algorithm

See chapter 4 of the textbook for a discussion of the algorithm. Two versions are presented, one in which each process gets a copy of all $ N$ numbers and one in which each process gets only $ N/P$ of them. You can choose to implement either version; my experiments suggest that the second one is more work to implement but performs somewhat better.

One thing to watch out for in doing such comparisons: You should be running both programs (your parallel one and the original one) on similar machines. That is, if you are logged into one of the Janus machines, but you are running the parallel program using the Dwarf machines, execution times may not be comparable.

Some potentially useful STL functions

Some of you may be familiar with the C++ Standard Template Library and its functions. The provided code makes use of two of them that you also may find useful:

A good Web source of detailed information about this library is SGI's online Standard Template Library (STL) Programmer's Guide.

Cautionary comments and miscellaneous tips

What to turn in

Submit your revised version of source file masterslave.cpp, plus a text file containing your timing observations (as described in the ``Details'' section above), by e-mail as described in the Guidelines for Programming Assignments, using a subject header of ``cs3366 hw 3''. You do not need to submit file bsort.cpp (you didn't change it anyway, right?). Please submit the timing observations as a plain text file.



Berna Massingill
2002-02-19