Assignment 2

Solving complex computational problems using GPUs – 236607 Winter 2013/2014
In this assignment, you will design an efficient implementation of Conway’s Game of Life cellular
automaton (http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life).
Game of life
The game of life is not a game in the conventional sense, as there are no human players involved. The
only input to this “game” is an initial configuration, which evolves in a series of steps.
The game of life is based on a 2D grid of cells, each of which can be in one of two states: ALIVE or
DEAD. In each time step, the state of each cell is updated according to its state and the state of its 8
neighbors in the previous time step. The new state is determined by applying the following set of rules:
1. An ALIVE cell with 2 or 3 ALIVE neighbors remains ALIVE (survival)
2. A DEAD cell with exactly 3 ALIVE neighbors becomes ALIVE (reproduction)
3. In all other cases, the cell is DEAD in the next time step (under-population or overcrowding)
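The three rules above can be written as a single function of the current state and the live-neighbor count. The following is a minimal host-side C++ sketch; the names Cell and nextState are illustrative and are not taken from the provided code.

```cpp
#include <cassert>

// Cell states as named in the assignment text.
enum class Cell : unsigned char { DEAD = 0, ALIVE = 1 };

// Returns the state of a cell in the next time step, given its current
// state and the number of ALIVE cells among its 8 neighbors.
// nextState is a hypothetical helper name, not part of the provided code.
inline Cell nextState(Cell current, int liveNeighbors) {
    assert(liveNeighbors >= 0 && liveNeighbors <= 8);
    if (current == Cell::ALIVE) {
        // Rule 1 (survival); any other count falls under rule 3.
        return (liveNeighbors == 2 || liveNeighbors == 3) ? Cell::ALIVE : Cell::DEAD;
    }
    // Rule 2 (reproduction); any other count falls under rule 3.
    return (liveNeighbors == 3) ? Cell::ALIVE : Cell::DEAD;
}
```

For example, nextState(Cell::DEAD, 3) yields Cell::ALIVE, while nextState(Cell::ALIVE, 4) yields Cell::DEAD.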
The Game of Life fascinated many due to its self-evolving and self-organizing nature. You are
encouraged to read more about its origins and the patterns that it creates that survive through multiple
generations. By convention, the cell with coordinates (0,0) is located in the center of our window to the
“Game of Life universe”.
Implementation
The Game of Life was chosen for this assignment as it can be implemented using various techniques that
we have learned in class, and requires creative thinking.
The “universe” will be limited to a square window of size 1000x1000, such that any cell beyond this
window is considered to be always DEAD.
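One possible way to combine the centered (0,0) convention with the bounded window is sketched below in host-side C++. The Window type and the helper names are hypothetical, and the choice of mapping (0,0) to array position (n/2, n/2) is an assumption; the provided file-reading functions may use a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense window: n * n bytes in row-major order, 1 = ALIVE, 0 = DEAD.
struct Window {
    int n;                             // side length, e.g. 1000
    std::vector<unsigned char> cells;  // n * n entries, all DEAD initially
    explicit Window(int side)
        : n(side), cells(static_cast<std::size_t>(side) * side, 0) {
        assert(side > 0);
    }
};

// Assumption: universe coordinate (0,0) maps to array position (n/2, n/2).
// Returns true and sets index when (x, y) lies inside the window.
inline bool toIndex(const Window& w, int x, int y, std::size_t& index) {
    const int col = x + w.n / 2;
    const int row = y + w.n / 2;
    if (col < 0 || col >= w.n || row < 0 || row >= w.n) return false;
    index = static_cast<std::size_t>(row) * w.n + col;
    return true;
}

// Cells beyond the window are always reported as DEAD.
inline bool isAlive(const Window& w, int x, int y) {
    std::size_t i = 0;
    return toIndex(w, x, y, i) && w.cells[i] != 0;
}

// Requests to set a cell outside the window are ignored, so it stays DEAD.
inline void setAlive(Window& w, int x, int y) {
    std::size_t i = 0;
    if (toIndex(w, x, y, i)) w.cells[i] = 1;
}
```

With a side length of 1000 and this mapping, valid coordinates run from -500 to 499 on each axis.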
In this assignment, you need to write an application that receives the following parameters:
1. A file name for a file that describes the initial grid in Life 1.05 format
2. The window size
3. The number of iterations to evolve (0 iterations means original grid)
The application will compute the state of the grid after the given number of iterations. The intermediate
states do not need to appear in the output.
You will need to use at least 4 of the techniques in the following table:
register tiling
loop unrolling
concurrent CPU/GPU execution
using shared memory
sparse representation
scatter/gather
bitwise operations
atomic operations
inline PTX optimizations
intra-warp communication
using texture memory
multiple items per thread
You may not use other techniques instead of these without getting a confirmation from the TA.
Tips:
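- Avoid grid out-of-bounds checks by adding margin cells (that are always DEAD) or by using texture memory with an addressing mode that returns 0 for out-of-bounds reads (in CUDA this is the border mode; clamp mode repeats the edge value instead)
- Use double-buffering (write to an output buffer, then swap pointers so that the next iteration reads from it)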
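The margin-cell and double-buffering tips can be sketched together on the host as follows. This is plain C++ for illustration, not the provided host implementation; PaddedGrid, step, and evolve are hypothetical names. On the GPU, the same idea would swap two device pointers between kernel launches.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical padded grid: (n + 2) x (n + 2) bytes, row-major, 1 = ALIVE.
// The outer ring is the margin; it is never written, so it stays DEAD.
struct PaddedGrid {
    int n;
    std::vector<unsigned char> cells;
    explicit PaddedGrid(int side)
        : n(side), cells(static_cast<std::size_t>(side + 2) * (side + 2), 0) {
        assert(side > 0);
    }
    // (row, col) in [0, n) address interior cells only.
    unsigned char& at(int row, int col) {
        return cells[static_cast<std::size_t>(row + 1) * (n + 2) + (col + 1)];
    }
};

// One time step: reads src, writes dst. Thanks to the margin, all 8
// neighbor reads are in range and no bounds checks are needed.
inline void step(const PaddedGrid& src, PaddedGrid& dst) {
    assert(src.n == dst.n);
    const int stride = src.n + 2;
    for (int row = 1; row <= src.n; ++row) {
        for (int col = 1; col <= src.n; ++col) {
            const std::size_t i = static_cast<std::size_t>(row) * stride + col;
            const unsigned char* p = &src.cells[i];
            const int live = p[-stride - 1] + p[-stride] + p[-stride + 1]
                           + p[-1]                       + p[1]
                           + p[stride - 1]  + p[stride]  + p[stride + 1];
            const bool alive = (live == 3) || (*p != 0 && live == 2);
            dst.cells[i] = alive ? 1 : 0;
        }
    }
}

// Double-buffering: after each step the two buffers trade roles, so the
// freshly written output becomes the input of the next iteration.
inline void evolve(PaddedGrid& grid, int iterations) {
    PaddedGrid scratch(grid.n);
    for (int i = 0; i < iterations; ++i) {
        step(grid, scratch);
        std::swap(grid.cells, scratch.cells);  // swaps storage, copies nothing
    }
}
```

Calling evolve with 0 iterations leaves the grid unchanged, which matches the convention that 0 iterations means the original grid.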
Your program will be called using the following syntax:
hw2 <input-file> <output-file> <num-iterations>
Functions for reading files in the Life 1.05 format and for writing the output files are provided, as well as
a simple host implementation. You can modify these functions. You need to implement all the other parts
of the application.
Testing
Your program will be compiled and run on csl-gpgpu3 (Linux). Calling make in the project folder should
produce an executable with the name hw2 in the same folder. The program will be tested on several
.lif files. Selected sample files are provided. The number of iterations will be chosen for an expected
execution time of approximately 1 second, and the execution will be aborted after 30 seconds.
The output will be compared against a reference. Please make sure that the output matches the reference
in the provided samples. Printing to the screen takes time, so try to avoid it in the final version.
External resources
You may get implementation ideas from external sources. You may discuss the implementation with
your classmates. You may not copy any portions of your classmates’ code. You may not use libraries
that do not come with a standard C++ and CUDA toolkit installation. You may copy small snippets of
code that are unrelated to this assignment from online and external sources.
Report
You need to submit a report of your application design and performance. The report will describe the
design of your application. For each of the four techniques you used, explain how it can improve the
performance of the application, and compare the execution time using this technique and using a simpler
alternative. If the technique does not improve performance, explain why and find the conditions in which
it will be better. Use the file spaceship.lif to evaluate the performance gain of the methods, and use
it to measure the execution time as a function of the number of iterations (show graph). The time should
include the execution of the entire application. You may do the experiments on any compatible GPU
platform; specify the system specs in the report, and do all the experiments on the same platform.
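For the timing measurements, one option is a small wall-clock helper like the C++ sketch below. The name elapsedMs is hypothetical. It only captures whatever the callable wraps, so to approximate the time of the entire application the callable would have to cover file reading, GPU work, and output writing; process start-up is not captured, and an external timer may still be preferable.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs work() once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    assert(ms >= 0.0);  // steady_clock is monotonic
    return ms;
}
```

Repeating the measurement for several iteration counts on the same platform gives the data points for the requested graph.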
Submission
Submission is in pairs. You need to submit a ZIP file named using the format <ID1>_<ID2>.zip with the
following contents:
1. Report file named using the format hw2_<ID1><ID2>.pdf
2. Source files and a makefile. Calling make should compile the code and produce the executable hw2.
3. A readme.txt file with your names, IDs, and emails
Grading
20%  output correctness in automatic test
30%  design complexity and quality (techniques)
30%  general optimizations (coalescing, block size, etc.)
20%  report
Bonus:
+10% for the fastest application
+5% for the 2nd fastest application
+5% for using sparse representation
Comments
- Frequently asked questions will be answered in the FAQ section of the course site.
- Please report (and fix) bugs in the provided code.
- Please include “236607 HW2” in the subject of emails regarding this assignment.
Good luck!