CSCI 3366 (Introduction to Parallel and Distributed Processing), Spring 2002:
Homework 4

Assigned:
April 9, 2002.

Due:
April 18, 2002, at 5pm. Not accepted later than April 19 at noon.

Credit:
20 points.


Overview

The variance of a set of $ N$ numbers $ a_0, a_1, \ldots, a_{N-1}$ is defined to be the sum

$\displaystyle (a_0 - \mathit{avg})^2 +
(a_1 - \mathit{avg})^2 + \cdots +
(a_{N-1} - \mathit{avg})^2
$

where $ \mathit{avg}$ is the average of $ a_0, a_1, \ldots, a_{N-1}$ (see footnote 1).

For this assignment, you are to write a multi-threaded program to compute the variance of a set of numbers. The textbook discusses using multiple threads to speed up the calculation of the sum of a set of numbers; it is not difficult to extrapolate from this discussion to an approach for using multiple threads to speed up calculating the variance of a set of numbers, if you break down the calculation into two steps:

  1. Compute the average of the input numbers, using multiple threads to speed up the required calculation of the sum of the numbers.
  2. Compute the variance as defined above, using multiple threads to speed up the calculation.
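The two steps above can be sketched in C++ with POSIX threads roughly as follows. This is only one possible arrangement, not the starter program or a required design: the names `Task`, `sum_worker`, `sqdiff_worker`, `run_phase`, and `parallel_variance` are all made up for illustration, and the sketch assumes the input numbers are already in a `std::vector<double>`. Each thread works on its own contiguous range of indices and writes its own partial result, so no locking is needed; the partial results are combined after the threads are joined.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <vector>

// One thread's share of the work: the half-open index range [begin, end).
// (Hypothetical helper type, not part of the starter program.)
struct Task {
    const double* data;
    std::size_t begin;
    std::size_t end;
    double avg;      // read only by the second-step worker
    double partial;  // partial result, written once by the worker
};

// Step 1 worker: sum the elements in this thread's range.
void* sum_worker(void* arg) {
    Task* t = static_cast<Task*>(arg);
    double s = 0.0;
    for (std::size_t i = t->begin; i < t->end; ++i)
        s += t->data[i];
    t->partial = s;
    return nullptr;
}

// Step 2 worker: sum the squared differences from the average.
void* sqdiff_worker(void* arg) {
    Task* t = static_cast<Task*>(arg);
    double s = 0.0;
    for (std::size_t i = t->begin; i < t->end; ++i) {
        double d = t->data[i] - t->avg;
        s += d * d;
    }
    t->partial = s;
    return nullptr;
}

// Run `worker` on P threads over `a` and return the sum of the partial results.
double run_phase(void* (*worker)(void*), const std::vector<double>& a,
                 std::size_t P, double avg) {
    const std::size_t n = a.size();
    std::vector<Task> tasks(P);
    std::vector<pthread_t> threads(P);
    for (std::size_t k = 0; k < P; ++k) {
        // Thread k gets [n*k/P, n*(k+1)/P); range sizes differ by at most one.
        tasks[k] = Task{a.data(), n * k / P, n * (k + 1) / P, avg, 0.0};
        int rc = pthread_create(&threads[k], nullptr, worker, &tasks[k]);
        assert(rc == 0);
        (void)rc;  // avoid an unused-variable warning when asserts are disabled
    }
    double total = 0.0;
    for (std::size_t k = 0; k < P; ++k) {
        pthread_join(threads[k], nullptr);  // partial is safe to read after join
        total += tasks[k].partial;
    }
    return total;
}

// Variance as defined in this assignment: the sum of squared differences,
// not divided by N.
double parallel_variance(const std::vector<double>& a, std::size_t P) {
    assert(P > 0 && !a.empty());
    double avg = run_phase(sum_worker, a, P, 0.0) / a.size();
    return run_phase(sqdiff_worker, a, P, avg);
}
```

For the inputs 1, 2, 3, 4 the average is 2.5 and the result is 2.25 + 0.25 + 0.25 + 2.25 = 5, whatever the number of threads.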

Details

Program input and output

Your program should take two command-line arguments:

  1. The number of threads to use (call this $ P$). This argument is required.

  2. The number of input numbers to generate (call this $ N$). This argument is optional.

It should process these arguments as follows.

In either case, once it has read or generated its input numbers, the program should compute their variance and print the following output:

Choice of programming language/library

You can write your program either (i) in C++ with the POSIX threads library functions or (ii) in Java using Java's built-in support for multi-threading. Whichever language you use, be sure your program compiles and executes correctly on the department's Linux machines.

A starter program

So that you do not have to write the tedious and non-parallel parts of this program, I am providing a sequential program that performs the required calculations.

Running the program

Once you have confirmed that your program is operating correctly (for small numbers of inputs), try running it for a large number of generated inputs and varying values of $ P$ (number of threads). Record at least half a dozen observations (different combinations of $ N$ and $ P$) to see how running time varies with these two variables. Also record which machine you performed these experiments on. You may find it interesting to see whether multi-threading can help even if you have more threads than processors. FYI, machines known to have multiple processors include SnowWhite.CS.Trinity.Edu (4 processors) and the Dwarf$ n$.CS.Trinity.Edu machines (2 processors each).
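If you need to take your own timing measurements in C++, one option is sketched below. It is an assumption, not necessarily how the starter program measures time, and `elapsed_seconds` is a made-up name. The sketch uses `std::chrono::steady_clock` so that it measures elapsed wall-clock time; a CPU-time measurement such as `clock()` typically adds up the time used by all threads of the process and so would hide any speedup from multi-threading.

```cpp
#include <cassert>
#include <chrono>

// Call work() once and return the elapsed wall-clock time in seconds.
// (Hypothetical helper; F is any callable taking no arguments.)
template <typename F>
double elapsed_seconds(F work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

You would time only the calculation itself (not input generation or output), for example by wrapping the variance computation in a lambda and passing it to `elapsed_seconds`.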

Helpful hints, etc.

Sample programs to look at

You may find it useful to look at some of the sample programs, in particular the various programs to compute the sum of $ N$ numbers.

Compiling programs that use POSIX threads

Note that when you compile (and link) a program that uses the POSIX threads library, you need the flag -pthread. The sample programs page has an example of a Makefile that takes care of this.

Cautionary comments

$ P$ (the number of threads) might not evenly divide $ N$. Your code should be prepared to cope with this. At the very least, it should print an error message and stop.
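One way to cope with a $ P$ that does not evenly divide $ N$ is to give the first $ N \bmod P$ threads one extra element each. The helper below is a minimal sketch of that idea; `Range` and `block_range` are hypothetical names, and other ways of dividing the work are equally acceptable.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range of indices [begin, end) assigned to one thread.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Range of indices for thread k (0 <= k < P) out of N elements.
// The first N % P threads each take one extra element, so every
// element is assigned to exactly one thread.
Range block_range(std::size_t N, std::size_t P, std::size_t k) {
    assert(P > 0 && k < P);
    std::size_t base = N / P;   // minimum number of elements per thread
    std::size_t extra = N % P;  // number of threads that get one more
    std::size_t begin = k * base + (k < extra ? k : extra);
    std::size_t size = base + (k < extra ? 1 : 0);
    return Range{begin, begin + size};
}
```

With $ N = 10$ and $ P = 3$, for example, the three threads get indices 0 through 3, 4 through 6, and 7 through 9.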

What to turn in

Submit your completed program (variance.cpp or Variance.java), plus the text file containing your timing measurements, as described in the Guidelines for Programming Assignments, using a subject header of "cs3366 hw 4".



Footnotes

1.
A student has pointed out that a more standard definition of variance is this sum divided by $ N$. For this assignment, use the definition given in the Overview (the sum without the division by $ N$) even though it is not the standard one, since this was pointed out after some students had submitted solutions.


Berna Massingill
2002-04-19