Laboratory Nine – cache and performance improvement – 23 August 2002 – y k choi

Objective

This laboratory has two purposes: to study the caching effect in more detail, and to use profiling to measure and improve a program's performance. The exercise is taken from exercise 4, CTE.

Note that for the Pentium III:

Cache structure:
  32 KB split cache: 16 KB (16384 bytes) data, 16 KB instructions
  4-way set associative
  512 lines
  32 bytes per line (per cache line)

Procedure 1 – Caching

Activity 1 – 20 minutes

Run the program three times and record the third result, as it is more accurate. This program uses a square block to maximize performance.

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <minmax.h>

// this program uses a SQUARE block size to maximise
// the use of cache memory
int main(void)
{
    float a[250][250], b[250][250], c[250][250];
    int i, j, k, kk, jj;
    float r;
    int BLKSZE = 8; // the total bytes will be 8 x 4 bytes (floating) = 32 bytes

    for (i = 0; i < 250; i++) {
        for (kk = 0; kk < 250; kk += BLKSZE)
            for (k = kk; k < min(kk + BLKSZE, 250); k++) {
                r = a[i][k];
                for (jj = 0; jj < 250; jj += BLKSZE)
                    for (j = jj; j < min(jj + BLKSZE, 250); j++)
                        c[i][j] += r * b[k][j];
            }
    }
    return 0;
}

Write down the following figures:

      Func          Func+Child        Hit
      Time    %        Time     %    Count  Function
---------------------------------------------------------

Now change the block size to 16 bytes. (BLKSZE = 4; compare with the 32-byte cache line.) Write down the following figures:

      Func          Func+Child        Hit
      Time    %        Time     %    Count  Function
---------------------------------------------------------

Now change the block size to 64 bytes. (BLKSZE = 16; compare with the 32-byte cache line.) Write down the following figures:

      Func          Func+Child        Hit
      Time    %        Time     %    Count  Function
---------------------------------------------------------

Now change the block size to 96 bytes.
(BLKSZE = 24; compare with the 32-byte cache line.) Write down the following figures:

      Func          Func+Child        Hit
      Time    %        Time     %    Count  Function
---------------------------------------------------------

BLKSZE   Expected cache line (bytes)   Real time
   4                16
   8                32
  16                64
  24                96

Looking at the above table, what is your conclusion? (Which block size performs better?)

One mark______

Activity 2 – consider the following program with two loops forming a square tile [15 minutes]

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <minmax.h>

// this program uses a SQUARE block size to maximise
// the use of cache memory
int main(void)
{
    float a[250][250], b[250][250], c[250][250];
    int i, j, k, kk, jj;
    float r;
    int BLKSZE = 4;

    for (kk = 0; kk < 250; kk += BLKSZE)
        for (jj = 0; jj < 250; jj += BLKSZE)
            for (i = 0; i < 250; i++) {
                for (k = kk; k < min(kk + BLKSZE, 250); k++) {
                    r = a[i][k];
                    for (j = jj; j < min(jj + BLKSZE, 250); j++)
                        c[i][j] += r * b[k][j];
                }
            }
    return 0;
}

Write down the following figures:

      Func          Func+Child        Hit
      Time    %        Time     %    Count  Function
---------------------------------------------------------

At this point, you should choose BLKSZE so that a BLKSZE x BLKSZE submatrix of b and a row of length BLKSZE of c can fit in the cache. Now complete the following table.

BLKSZE       4     8     12     16     32
TIME (ms)

One mark______

Procedure 2 – one hour and twenty minutes

Log in to the system and follow the instructions. This is from CTE exercise 4.

Objective: You should make a new version of substitute.exe and demonstrate, using profiling output, that it runs faster. You should be able to obtain at least a factor of 2 speedup (old run time divided by new run time). You do not have to use Microsoft Foundation Class objects, but given that these are well written and probably correct, you should only replace code that is doing unnecessary work, as reflected in the profiler measurements.

Procedure
1) Download the code again, as you did last week, and measure the time it takes.
2) There are many ways you can modify the code to run faster. You will be given one mark if you find one method, and a second mark if you can find one more.
You must measure the original program without modification first, and then the version with your modification.

One mark______, find one method and demonstrate it to me
One mark______, find another method and demonstrate it to me

/* substitute -- substitute strings in a list of files

   This program operates on a set of files listed on the command line.
   The first file specifies a list of string substitutions to be
   performed on the remaining files. The list of string substitutions
   has the form:

       "string 1" "replacement 1"
       "string 2" "replacement 2"
       ...

   If a string contains a double quote character or a backslash
   character, escape the character with backslash:
       "\"" denotes the string with one double quote character.
       "\\" contains one backslash.

   Each file is searched for instances of "string 1". Any occurrences
   are replaced with "replacement 1". In a similar manner, all
   "string 2"s are replaced with "replacement 2"s, and so on.

   The results are written to the input file. Be sure to keep a backup
   of files if you do not want to lose the originals when you run this
   program.
*/

#include "afx.h"
#include "iostream.h"

// parse a quoted string from buffer, starting the search at index start;
// return the index just past the closing quote, or -1 on failure
int parse1(CString *buffer, int start, CString *str)
{
    // look for initial quote:
    int i = buffer->Find("\"", start);
    if (i != -1) {
        // copy to result string
        str->Empty();
        int j = 0; // index into str
        i++;       // skip over the opening double-quote
        // scan and copy up to the closing double-quote:
        while ((*buffer)[i] != 0) {
            if ((*buffer)[i] == '\\') {
                // escape: copy the character after the backslash literally
                i++;
                if ((*buffer)[i] == 0) {
                    break; // backslash at end of line: no closing quote
                }
                str->Insert(j++, CString((*buffer)[i]));
            } else if ((*buffer)[i] == '\"') {
                return i + 1;
            } else {
                // ordinary character: copy it exactly once
                str->Insert(j++, CString((*buffer)[i]));
            }
            i++;
        }
    }
    return -1;
}

// parse two quoted strings from buffer; return false on failure
//
bool parse(CString *buffer, CString *pattern, CString *replacement)
{
    int start = parse1(buffer, 0, pattern);
    if (start < 0) {
        return false;
    }
    start = parse1(buffer, start, replacement);
    return (start >= 0);
}

void substitute(CString *data, CString *pattern, CString *replacement)
// modify the code here for the first mark
{
    int loc;
    // find every occurrence of pattern:
    for (loc = data->Find(*pattern, 0); loc >= 0;
         loc = data->Find(*pattern, 0)) {
        // delete the pattern string from loc:
        data->Delete(loc, pattern->GetLength());
        // (hint: somewhere here)
        // insert each character of the replacement string:
        for (int i = 0; i < replacement->GetLength(); i++) {
            data->Insert(loc + i, (*replacement)[i]);
        }
    }
}

void do_substitutions(CString *data, CString *subs_filename)
{
    TRY {
        CStdioFile file(*subs_filename, CFile::modeRead);
        while (true) {
            CString buffer; // holds line from file
            CString pattern;
            CString replacement;
            file.ReadString(buffer);
            // handle end of file
            if (buffer.GetLength() == 0) break;
            if (parse(&buffer, &pattern, &replacement)) {
                substitute(data, &pattern, &replacement);
            } else {
                cout << "Bad pattern/replacement line: " << buffer << endl;
                return;
            }
        }
    }
    CATCH(CFileException, e) {
        cout << "File could not be opened or read " <<
            e->m_cause << endl;
    }
    END_CATCH
}

void process_file(CString *filename, CString *subs_filename)
{
    // read in filename to a CString
    TRY {
        CFile file(*filename, CFile::modeRead);
        int size = (int)file.GetLength();
        // read the data, allocate more than we need
        char *data = new char[size + 16];
        file.Read(data, size);
        // files are not zero-terminated but string should be:
        data[size] = 0;
        // now we can make a CString from the data:
        CString content(data);
        delete[] data; // data is no longer needed; delete[] matches new[]
        do_substitutions(&content, subs_filename);
        // write the data; modeCreate truncates the existing file so that
        // no old bytes are left behind when the new content is shorter
        file.Close();
        file.Open(*filename, CFile::modeCreate | CFile::modeWrite);
        file.Write(content, content.GetLength());
        file.Close();
    }
    CATCH(CFileException, e) {
        cout << "File could not be opened or read " << e->m_cause <<
            " " << *filename << endl;
    }
    END_CATCH
}

int main(int argc, char *argv[])
{
    if (argc < 3) {
        cout << "Not enough input arguments" << endl;
        cout << "Usage: substitute subs-file src1 src2 ..." << endl;
    } else {
        CString subs_filename(argv[1]);
        for (int i = 2; i < argc; i++) {
            CString filename(argv[i]);
            process_file(&filename, &subs_filename);
        }
    }
    return 0;
}