Solving complex computational problems using GPUs – 236607
Winter 2013/2014

Assignment 2

In this assignment, you will design an efficient implementation of Conway’s Game of Life cellular automaton (http://en.wikipedia.org/wiki/Conway%27s_Game_of_Life).

Game of life

The game of life is not a game in the conventional sense, as there are no human players involved. The only input to this “game” is an initial configuration, which evolves in a series of steps.

The game of life is based on a 2D grid of cells, each of which can be in one of two states: ALIVE or DEAD. In each time step, the new state of each cell is determined by its own state and the states of its 8 neighbors in the previous time step, according to the following rules:
1. An ALIVE cell with 2 or 3 ALIVE neighbors remains ALIVE (survival).
2. A DEAD cell with exactly 3 ALIVE neighbors becomes ALIVE (reproduction).
3. In all other cases, the cell is DEAD (under-population or overcrowding).

The Game of Life has fascinated many due to its self-evolving and self-organizing nature. You are encouraged to read more about its origins and about the patterns it creates that survive through multiple generations.

By convention, the cell with coordinates (0,0) is located in the center of our window to the “Game of Life universe”.

Implementation

The Game of Life was chosen for this assignment because it can be implemented using various techniques that we have learned in class, and because it requires creative thinking. The “universe” will be limited to a square window of size 1000x1000; any cell beyond this window is considered to be always DEAD.

In this assignment, you need to write an application that receives the following parameters:
1. The name of a file that describes the initial grid in Life 1.05 format
2. The window size
3. The number of iterations to evolve (0 iterations means the original grid)

The application will compute the state of the grid after the given number of iterations.
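The rules above, applied to a window whose outside is always DEAD, can be sketched as a small sequential C++ reference. This is a minimal sketch under several assumptions, not the provided host implementation and not a GPU design: cells are stored one per byte in a hypothetical `Cell` type, the n x n window is padded with a one-cell margin of permanently DEAD cells (so the neighbor loop needs no bounds checks), and two buffers are swapped after every generation. The helper names `idx`, `make_grid`, `step`, and `evolve` are illustrative only, and mapping the centered (0,0) coordinate convention onto array indices is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One cell per byte: 1 = ALIVE, 0 = DEAD (hypothetical storage choice).
using Cell = std::uint8_t;

// Index of cell (r, c), with -1 <= r, c <= n, in a buffer that has a
// one-cell DEAD margin on every side; the row stride is n + 2.
inline std::size_t idx(int n, int r, int c) {
    return static_cast<std::size_t>(r + 1) * static_cast<std::size_t>(n + 2) +
           static_cast<std::size_t>(c + 1);
}

// Allocates an all-DEAD (n + 2) x (n + 2) buffer, margin included.
std::vector<Cell> make_grid(int n) {
    assert(n > 0);
    return std::vector<Cell>(static_cast<std::size_t>(n + 2) * (n + 2), 0);
}

// Computes one generation: reads `in`, writes only the interior of `out`.
// The margin of `out` is never written, so it stays DEAD.
void step(int n, const std::vector<Cell>& in, std::vector<Cell>& out) {
    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < n; ++c) {
            int live = 0;  // number of ALIVE cells among the 8 neighbors
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    if (dr != 0 || dc != 0) {
                        live += in[idx(n, r + dr, c + dc)];
                    }
                }
            }
            const bool alive = in[idx(n, r, c)] != 0;
            // Rules 1 and 2: exactly 3 live neighbors always yields ALIVE, and
            // 2 keeps an ALIVE cell ALIVE. Rule 3: every other case is DEAD.
            const bool next = (live == 3) || (alive && live == 2);
            out[idx(n, r, c)] = static_cast<Cell>(next);
        }
    }
}

// Evolves `grid` for `iterations` generations with double buffering.
// Precondition: `grid` came from make_grid(n), so its margin is DEAD.
// With iterations == 0 the input is returned unchanged.
std::vector<Cell> evolve(int n, std::vector<Cell> grid, int iterations) {
    assert(iterations >= 0);
    assert(grid.size() == static_cast<std::size_t>(n + 2) * (n + 2));
    std::vector<Cell> spare = make_grid(n);  // output buffer, DEAD margin
    for (int i = 0; i < iterations; ++i) {
        step(n, grid, spare);    // write the next generation into `spare`
        std::swap(grid, spare);  // exchange the two buffers for the next step
    }
    return grid;
}
```

A sequential loop like this can serve as a correctness baseline when checking a GPU kernel's output on small grids. Swapping two `std::vector` objects only exchanges their internal pointers, so the buffer swap costs the same regardless of grid size.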
The intermediate states do not need to appear in the output.

You will need to use at least 4 of the techniques in the following list:
- register tiling
- loop unrolling
- concurrent CPU/GPU execution
- using shared memory
- sparse representation
- scatter/gather
- bitwise operations
- atomic operations
- inline PTX optimizations
- intra-warp communication
- using texture memory
- multiple items per thread

You may not use other techniques instead of these without confirmation from the TA.

Tips:
- Avoid grid out-of-bounds checks by adding margin cells (that are always DEAD) or by using texture memory with an addressing mode that returns 0 when out of bounds (in CUDA this is border mode; clamp mode returns the nearest edge value instead).
- Use double-buffering (write to an output buffer, then swap the input and output pointers so the next iteration reads from it).

Your program will be called using the following syntax:
hw2 <input-file> <output-file> <num-iterations>

Functions for reading files in the Life 1.05 format and for writing the output files are provided, as well as a simple host implementation. You may modify these functions. You need to implement all the other parts of the application.

Testing

Your program will be compiled and run on csl-gpgpu3 (Linux). Calling make in the project folder should produce an executable named hw2 in the same folder.

The program will be tested on several .lif files; selected sample files are provided. The number of iterations will be chosen so that the expected execution time is approximately 1 second, and the execution will be aborted after 30 seconds. The output will be compared against a reference; please make sure that your output matches the reference for the provided samples. Printing to the screen takes time, so try to avoid it in the final version.

External resources

You may get implementation ideas from external sources. You may discuss the implementation with your classmates. You may not copy any portions of your classmates’ code.
You may not use libraries that do not come with a standard C++ and CUDA toolkit installation. You may copy small snippets of code that are unrelated to this assignment from online and external sources.

Report

You need to submit a report describing the design and performance of your application. For each of the four techniques you used, explain how it can improve the performance of the application, and compare the execution time with this technique against the execution time with a simpler alternative. If the technique does not improve performance, explain why, and find the conditions under which it would be beneficial.

Use the file spaceship.lif to evaluate the performance gain of each method, and use it to measure the execution time as a function of the number of iterations (include a graph). The measured time should include the execution of the entire application. You may run the experiments on any compatible GPU platform; specify the system specs in the report, and run all experiments on the same platform.

Submission

Submission is in pairs. You need to submit a ZIP file named using the format <ID1>_<ID2>.zip with the following contents:
1. A report file named using the format hw2_<ID1><ID2>.pdf
2. Source files and a makefile. Calling make should compile the code and produce the executable hw2.
3. A readme.txt file with your names, IDs, and emails

Grading
- 20% output correctness in the automatic test
- 30% design complexity and quality (techniques)
- 30% general optimizations (coalescing, block size, etc.)
- 20% report

Bonus:
- +10% for the fastest application
- +5% for the 2nd fastest application
- +5% for using sparse representation

Comments
- Frequently asked questions will be answered in the FAQ section of the course site.
- Please report (and fix) bugs in the provided code.
- Please include “236607 HW2” in the subject of emails regarding this assignment.

Good luck!