CSCI 3366 (Parallel and Distributed Processing), Fall 2017:
Homework 2

Credit:
55 points.

Reading

Be sure you have read, or at least skimmed, readings from the relevant updated appendices.

Honor Code Statement

Please include with each part of the assignment the Honor Code pledge or just the word ``pledged'', plus one or more of the following about collaboration and help (as many as apply).1 (Text in italics is explanatory or something for you to fill in.) For written assignments, it should go right after your name and the assignment number; for programming assignments, it should go in comments at the start of your program(s).

Overview

Your mission for this assignment is to improve the programs you wrote for Homework 1, to write versions in Java and OpenCL, and to measure their performance and accuracy more systematically.

Details

Thread-safe RNG

(5 points)

Your first step will be to write a thread-safe random number generator, i.e., one that can be called from multiple threads concurrently without ill effects. To keep this part manageable, I suggest that you just use the technique mentioned in class, LCG (Linear Congruential Generator). The Wikipedia article has a pretty good discussion, but briefly:

This algorithm generates a pseudorandom sequence $x_0, x_1, x_2, \ldots$ from a seed $S$, constants $a$, $c$, and $M$, and a simple recurrence relation:

$x_0 = S$
$x_i = (a x_{i-1} + c) \bmod M$, for $i > 0$

The Wikipedia article gives values used by many library implementations of this algorithm; to me the most attractive choice is the one cited for two POSIX functions, namely $a = 25214903917$, $c = 11$, and $M = 2^{48}$. (This seems attractive because -- if I understand the discussion correctly -- it will generate long sequences without duplicates (which we want), and values will be within the range of a 64-bit signed data type, which is available as int64_t in standard C and long in Java.) Also, the mod part of the calculation is easily done by using bitwise and with $2^{48}-1$. (Note that you will need to #include stdint.h to use int64_t.)

(If for some reason you want to try a different algorithm, check with me first -- there may well be better choices, but there are probably worse choices too.)

You will need two implementations of whatever algorithm you choose, one in C and one in Java. Exactly how you package the algorithm is somewhat up to you, but you want functions analogous to srand() and rand(), and there needs to be some way to deal with the ``state'' of the sequence being generated (the current or next $ x_i$ ) in a way that makes it possible for each thread to have its own state (rather than there being one hidden global state, as with srand() and rand()).

For C, what I think makes sense is to represent the saved state as an int64_t and define two functions that take a pointer to a state as a parameter:

You'll also want to define a constant, with something such as the following:
const int64_t RANDMAX = (1LL << 48) - 1;
(Notice that this is $ M-1$ .)
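A minimal sketch of this C packaging might look like the following. (The function names my_srand and my_rand are placeholders -- the exact interface is up to you.) One subtlety: the product $a x_{i-1}$ can exceed what a signed 64-bit integer holds, and signed overflow is undefined behavior in C, so this sketch does the arithmetic in uint64_t, whose wraparound is well defined mod $2^{64}$; masking with $2^{48}-1$ afterward gives the correct result mod $M = 2^{48}$:

```c
#include <stdint.h>

const int64_t RANDMAX = (1LL << 48) - 1;       /* M - 1 */

static const uint64_t LCG_A = 25214903917ULL;  /* a */
static const uint64_t LCG_C = 11ULL;           /* c */

/* Seed the generator: set the caller's state to x_0 = S. */
void my_srand(int64_t *state, int64_t seed) {
    *state = seed & RANDMAX;
}

/* Advance the caller's state and return the next value in [0, RANDMAX]. */
int64_t my_rand(int64_t *state) {
    /* Unsigned arithmetic wraps mod 2^64; masking with 2^48 - 1
       then gives the result mod M = 2^48. */
    uint64_t x = (LCG_A * (uint64_t)*state + LCG_C) & (uint64_t)RANDMAX;
    *state = (int64_t)x;
    return *state;
}
```

Because each caller supplies its own state, there is no hidden global, and concurrent calls from different threads (each with its own state variable) are safe.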

For Java, you'll probably want to define a class analogous to java.util.Random, but much simpler, with just a RANDMAX constant, a constructor, and a next method.

Revised sequential programs

The next step is to replace the current code for generating random numbers in two starter programs, one in C and one in Java, with your RNG code:

Code

(5 points)

Replace the current code for generating random numbers in the two starter sequential programs with calls to your RNG. (If you didn't already test your RNG code, you might temporarily put in some debug-print statements to be sure it's generating reasonable output.) The two programs (C and Java) should now produce the same output (except for execution time).
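As a point of reference, the overall shape of the computation might look like the following sketch in C, assuming (as in Homework 1) that $\pi$ is estimated by sampling random points in the unit square and counting how many land inside the quarter circle. The LCG is inlined here so the sketch is self-contained; in your program you would call your own RNG functions instead:

```c
#include <stdint.h>

/* LCG constants from the text; inlined for self-containment. */
static const uint64_t A = 25214903917ULL, C = 11ULL;
static const int64_t RMAX = (1LL << 48) - 1;   /* M - 1 */

static int64_t next_rand(int64_t *state) {
    *state = (int64_t)((A * (uint64_t)*state + C) & (uint64_t)RMAX);
    return *state;
}

/* Estimate pi by counting random points (x, y) in [0,1] x [0,1]
   that fall inside the quarter circle x^2 + y^2 <= 1. */
double estimate_pi(int64_t seed, long nsamples) {
    int64_t state = seed & RMAX;
    long inside = 0;
    for (long i = 0; i < nsamples; ++i) {
        double x = (double)next_rand(&state) / (double)RMAX;
        double y = (double)next_rand(&state) / (double)RMAX;
        if (x * x + y * y <= 1.0)
            ++inside;
    }
    return 4.0 * (double)inside / (double)nsamples;
}
```

Since the same seed produces the same sequence in both languages, the C and Java versions should count exactly the same points inside, which is why their outputs should match.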

Results (accuracy)

(5 points)

(You only need to do this for one of your sequential programs, since they should give the same results.) Experiment until you find a seed that seems to give reasonable results, and then measure the relationship between accuracy (the difference between the computed value of $\pi$ and the constant as defined in the math library) and number of samples. Generate output for at least six different values of ``number of samples'' (I recommend starting with a medium-size number and then repeatedly doubling it, rather than increasing by a fixed amount). Plot the results, by hand or with whatever program you like. (I use gnuplot; there is a short introduction/example below.) You can repeat this for more than one seed and plot all sets of results if you like.

Parallel programs

Code

(30 points)

Your mission for this step is to produce parallel programs for our four programming environments: C with OpenMP, C with MPI, Java, and C with OpenCL.

So to recap, command-line arguments should be as follows:

As we noted in class, having all UEs (processes or threads) generate points using the same RNG and seed is not useful. You have two options for dealing with this:

Hints for using leapfrogging:
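One common leapfrogging scheme (this is my reading of the technique, not necessarily the exact hints referred to above) has UE $t$ of $n$ consume samples $t, t+n, t+2n, \ldots$. This works efficiently for an LCG because composing the recurrence with itself $n$ times is itself an LCG: $x_{i+n} = (A_n x_i + C_n) \bmod M$ with $A_n = a^n \bmod M$ and $C_n = c(a^{n-1} + \cdots + a + 1) \bmod M$. A sketch of computing those constants and taking leapfrog steps:

```c
#include <stdint.h>

static const uint64_t A = 25214903917ULL, C = 11ULL;
static const uint64_t MASK = (1ULL << 48) - 1;   /* M - 1 */

/* Compute (A_n, C_n) such that advancing the LCG n steps at once is
   x' = (A_n * x + C_n) mod M.  Composing x -> a*x + c with an existing
   affine map x -> an*x + cn gives x -> (a*an)*x + (a*cn + c). */
void leapfrog_constants(long n, uint64_t *a_n, uint64_t *c_n) {
    uint64_t an = 1, cn = 0;   /* identity map: x -> 1*x + 0 */
    for (long i = 0; i < n; ++i) {
        cn = (A * cn + C) & MASK;
        an = (A * an) & MASK;
    }
    *a_n = an;
    *c_n = cn;
}

/* One leapfrog step: advance the state n positions in the sequence. */
int64_t leapfrog_next(int64_t *state, uint64_t a_n, uint64_t c_n) {
    *state = (int64_t)((a_n * (uint64_t)*state + c_n) & MASK);
    return *state;
}
```

To use this, UE $t$ would seed normally, take $t$ single steps to reach its first sample, and then call the leapfrog step repeatedly; every UE uses the same $(A_n, C_n)$, which need only be computed once.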

Results (accuracy)

(5 points)

(UPDATED for OpenCL) (You only need to do this for one of your parallel programs, since they should give the same results for the same number of units of execution, where ``units of execution'' is threads for OpenMP and Java, processes for MPI, and work items for OpenCL.) Experiment until you find a seed and number of samples that seem to give good results, and then measure the relationship between accuracy (difference between the computed value of $ \pi$ and the constant as defined in the math library) and number of UEs. Generate output for at least six different values of ``number of UEs'' (I recommend powers of two, starting with one). (Since for OpenCL the number of work items has to be a multiple of the minimum work-group size, it might be interesting to make a second plot showing that minimum value and then several multiples of it.) Plot the results, again by hand or with whatever program you like.

Results (performance)

(5 points)

For the values of seed and number of samples you used above, measure execution times for both sequential programs and all four parallel programs. For the parallel programs, measure execution times using different numbers of UEs (start with one and double until you notice that execution time is no longer decreasing). I strongly encourage you to do this on the machines that seem to me most suitable in terms of being able to ``scale up'' to interesting numbers of UEs: for OpenMP and Java, that would be Dione; for MPI, the Pandora cluster; and for OpenCL, Deimos or one of the Atlas machines. You should do each measurement more than once; if you get wildly different results, it probably means you are competing with other work on the machine and should try again another time or on another machine.

Plot the results, again by hand or with whatever program you like:

Hints and tips

What to turn in and how

Turn in the following:

Submit your program source code by sending mail to bmassing@cs.trinity.edu. Send program source as attachments. You can turn in your plots and input data as hardcopy or by e-mail; I have a slight preference for e-mail and a definite preference for something easily readable on one of our Linux machines -- so, PDF or PNG or the like (in the past some students have sent me Excel spreadsheets; I'd rather you didn't). Please use a subject line that mentions the course number and the assignment (e.g., ``csci 3366 homework 2'').

A very little about gnuplot

I talked about the plotting tool gnuplot in class one day (9/25). Here are files for a simple example along the lines of what you need to do for this assignment (plot parallel times as a function of UEs, also showing sequential time):

With all these files in a directory, the command gnuplot < par.plotin will generate a file par-times.png with the plot.
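A hypothetical par.plotin along these lines might look like the following (the data file names and column layout here are illustrative assumptions, not the actual example files):

```gnuplot
set terminal png
set output "par-times.png"
set xlabel "number of UEs"
set ylabel "execution time (seconds)"
set logscale x 2
# column 1 = number of UEs, column 2 = time in seconds
plot "par-times.dat" using 1:2 with linespoints title "parallel", \
     "seq-time.dat" using 1:2 with points title "sequential"
```

Running gnuplot < par.plotin then writes the plot to par-times.png, as described above.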



Footnotes

1. Credit where credit is due: I based the wording of this list on a posting to a SIGCSE mailing list. (SIGCSE is the ACM's Special Interest Group on CS Education.)


Berna Massingill
2017-10-19