CO5023 – Computer Systems
Assignment Component 1: Performance Evaluation
The Evaluation of Computer Performance
0718268@chester.ac.uk

Contents
1  History of Performance Evaluation
2  Performance Measurement
3  Measuring Performance
4  Choosing Programs to Evaluate Performance
5  The Role of Requirements and Specifications in Performance Testing
6  Some Problems with Benchmarking
7  Summarizing Performance Tests
8  Rating Methodology
9  Bibliography
10 Appendix
11 Index

1 History of performance evaluation

Performance evaluation is the measurement of how well a system performs, expressed in a physical, comparable unit. Performance was first measured by how quickly a computer could carry out a calculation; however, this method produced results that were too similar to distinguish machines (Hennessy and Patterson, 1994). Millions of Instructions Per Second (MIPS) was later introduced, which measured the number of machine instructions executed in one second (1.1/1.5) (www.DefineThat.com, 2008). FLOPS improved on this by counting operations on fractional/floating-point decimal numbers, which take longer to execute than integer operations. One issue with FLOPS is that the processor may be running under a heavy or light workload, giving inaccurate readings (1.2). The Whetstone benchmark used floating-point arithmetic, measuring performance in kWIPS (thousands of Whetstone instructions per second) and MWIPS (millions) (1.3). Dhrystone, created in 1984, is another benchmark; it processes string and integer operations and does not use floating point (1.4). Dhrystone was eventually surpassed by the CPU89 benchmark suite (www.Wikipedia.com, 2008).

2 Performance measurement

Computer users are interested in reducing the time taken between the start and end of an event, whereas a system manager is interested in the amount of work handled at the same time. Performance measurement therefore includes both response time (latency) and throughput (2.1).
Response time — how quickly a system can complete ONE job.
Throughput — how MANY tasks a system can handle.

The performance formula for a system running a program X is:

Performance X = 1 / Execution Time X (Hennessy and Patterson, 1998).

3 Measuring performance

We measure performance to determine how fast a computer is operating, to compare it against another machine, or to determine whether it is running efficiently. Measuring performance involves duration (the clock cycle) and frequency (the clock rate) (3.1) (Clack, 2007). The relationship between clock rate and clock cycle can be expressed as a formula (3.2). The instructions executed per clock cycle can also be used as a performance measurement (3.3). Other ways of measuring performance are listed in (3.4).

4 Choosing programs to evaluate performance

Real applications
o Word-processing and graphical software.
o Offer options a user can select, with specified inputs/outputs while running the program.
o Used by all users.
o Provide an accurate real-life workload (Hennessy and Patterson, 1998).

Modified applications
o Test specific areas of systems.
o A prime example is a graphics benchmark test (4.1).
o Used by all users (Hennessy and Patterson, 1998).

Kernels
o Stress-test workload on computer performance, measured in MFLOPS.
o Examples: Livermore Loops and LINPACK (4.2).
o Test specific areas of systems.
o Not used by typical users (Hennessy and Patterson, 1998).

Toy benchmarks
o Short programs, such as counting the prime numbers between 1 and 90,000.
o Test how fast a computer operates (Hennessy and Patterson, 2003).

Synthetic benchmarks
o Whetstone and Dhrystone (1.3/1.4).
o These do not compute anything a user would actually need (Hennessy and Patterson, 2003).

There are further types of benchmark in use, such as parallel and input/output benchmarks, and component benchmarks, which automatically detect hardware parameters such as register count, cache size and memory latency (4.3) (Hennessy and Patterson, 1998).
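The formulae in sections 2 and 3 can be sketched as a short Python example. This is a hypothetical illustration, not part of the report: the instruction counts and CPI values below are invented, and only the 200 MHz cycle-time figure matches the example in appendix 3.2.

```python
# Hypothetical sketch of the formulae from sections 2-3.
# The workload figures below are invented for illustration.

def cycle_time(clock_rate_hz):
    # Duration is the reciprocal of frequency: T = 1 / clock rate (3.2).
    return 1.0 / clock_rate_hz

def cpu_time(instructions, cpi, clock_rate_hz):
    # CPU Time = I * CPI * T (3.3).
    return instructions * cpi * cycle_time(clock_rate_hz)

def performance(execution_time_s):
    # Performance X = 1 / Execution Time X (section 2).
    return 1.0 / execution_time_s

# A 200 MHz clock gives a 5 ns cycle time, as in appendix 3.2.
t = cycle_time(200e6)                       # 5e-09 seconds

# Invented program: 10 million instructions averaging 2 cycles each.
time_a = cpu_time(10_000_000, 2.0, 200e6)   # 0.1 s on machine A
time_b = cpu_time(10_000_000, 2.0, 400e6)   # 0.05 s on machine B

# Machine B is "n times faster" when Performance B / Performance A = n.
speedup = performance(time_b) / performance(time_a)   # 2.0
```

Doubling the clock rate halves the execution time here only because the instruction count and CPI are held constant; in practice these vary between machines, which is exactly why single-number metrics such as MIPS can mislead (see appendix 1.5).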
5 The Role of Requirements and Specifications in Performance Testing

All the objectives you want your system to meet should be outlined in a performance document, which should not subsequently be altered (5.1). When computer vendors give specifications to consumers, it is important that the consumers' requirements have been outlined beforehand (5.2). The document must be carefully and fairly verified, and it is expected that realistic workloads will have been used (www.visibleprogress.com, 2008).

6 Some Problems with Benchmarking

Companies concentrate on using benchmarks to obtain good speed results, which does not take into account security, availability, reliability, execution integrity, serviceability or scalability. Benchmarks do not measure total cost of ownership as a price/performance metric. A portable device that draws more electrical power will have a shorter battery life, which is not detected by benchmarking. Most benchmarks simulate real-life workloads by running many applications at once, to mimic a business environment for example; however, they do not measure the I/O capacity and the large, fast memory design that such servers require (www.wikipedia.com, 2008). Vendors ignore requirements for development, test and disaster-recovery computing capacity and concentrate on production costs to make their final price lower. Not all benchmarks can deal with every network topology, and some grids are user-friendly while others are not. Benchmarks concentrate on the mean score rather than the low standard deviations that users specify. Systems are documented by vendors at 80% usage and are not documented at 100%.
Workloads may affect server architectures dramatically at 100% (www.wikipedia.com, 2008).

7 Summarizing Performance Tests

There are several ways of reducing performance to a single number: using many programs to test performance, and assuming a computer is faster if it executes the same set of programs in less time; or quoting peak performance figures, which give only the best-case performance (Hennessy and Patterson, 2003; Smith, 1988). In (7.1/7.2) we can see that totals have been added together to obtain overall results. Working out performance is shown in (7.3/7.4), and good testing practice in (7.5).

8 Rating Methodology

SYSmark 2000 (8.2) tests real-life programs (8.1), producing reliable results for users. Its content-creation tests measure graphical performance, and its office-productivity tests measure business-related software, using as many system resources as possible, to give users the best basis for choice (www.dewassoc.com, 2008). It rates each program and each category, and assigns an overall rating. The programs are run both on the system being tested and on a calibration platform (8.3), which is used as the basis for comparison.

9 Bibliography

This is a list of all the books, PowerPoint lectures and websites used for this assessment.

Fleming, P. J., Wallace, J. J. (1986). How not to lie with statistics: the correct way to summarize benchmark results. (Volume 29, No. 3), New York, NY: The Association for Computing Machinery Inc.

Hennessy, J. L., Patterson, D. A. (1994). Computer Organization and Design: The Hardware/Software Interface. (1st Edition), San Francisco, CA: Morgan Kaufmann Publishers Inc.

Hennessy, J. L., Patterson, D. A. (1998). Computer Organization and Design: The Hardware/Software Interface. (2nd Edition), San Francisco, CA: Morgan Kaufmann Publishers Inc.

Hennessy, J. L., Patterson, D. A., Goldberg, D. (2003). Computer Architecture: A Quantitative Approach.
(4th Edition), San Francisco, CA: Morgan Kaufmann Publishers Inc.

Smith, J. E. (1988). Choosing Programs to Evaluate Performance. (Volume 31, No. 10, pp 1202-1206), New York, NY: The Association for Computing Machinery Inc.

Clack, C. (2007). B261 – System Architecture: Measuring Performance [Slides 3-16]. Retrieved from UCL Department of Computer Science web site: http://www.cs.ucl.ac.uk/teaching/B261/PowerPoint/lecture2.ppt

Wikipedia. (2008). MIPS. Retrieved from Wikipedia website on 02/12/08: http://www.webopedia.com/TERM/M/MIPS.html

Wikipedia. (2008). Floating point operations. Retrieved from Wikipedia website on 01/12/08: http://en.wikipedia.org/wiki/FLOPS

Wikipedia. (2008). Whetstone. Retrieved from Wikipedia website on 01/12/08: http://en.wikipedia.org/wiki/Whetstone_(benchmark)

Wikipedia. (2008). Dhrystone. Retrieved from Wikipedia website on 02/12/08: http://en.wikipedia.org/wiki/Dhrystone

Wikipedia. (2008). Benchmark Challenges. Retrieved from Wikipedia website on 03/12/08: http://en.wikipedia.org/wiki/Benchmark_(computing)#Challenges

Wikipedia. (2008). LINPACK. Retrieved from Wikipedia website on 01/12/08: http://en.wikipedia.org/wiki/LINPACK

Active Hardware. (2008). Benchmarks. Retrieved from Active Hardware website on 05/12/08: http://www.active-hardware.com/english/benchmarks/benchmarks.htm

Answers. (2008). Livermore Loops. Retrieved from Answers website on 06/12/08: http://www.answers.com/topic/livermore-loops

DEW Associates Corporation. (2008). What is SYSmark 2000?. Retrieved from DEW Associates Corporation website on 01/12/08: http://www.dewassoc.com/performance/benchmark/what_is_sysmark_2000.htm

DEW Associates Corporation. (2008). SYSmark 2000. Retrieved from DEW Associates Corporation website on 01/12/08: http://www.dewassoc.com/performance/benchmark/benchmark.htm

Define That. (2008). Define MIPS.
Retrieved from Define That website on 06/12/08: http://www.definethat.com/define/6390.htm

Future Tech. (2008). MIPS/FLOPS. Retrieved from Future Tech website on 06/12/08: http://www.futuretech.blinkenlights.nl/perf.html

Tech Terms. (2008). MIPS. Retrieved from Tech Terms website on 02/12/08: http://www.techterms.com/definition/mips

Visible Progress. (2008). Software performance testing. Retrieved from Visible Progress website on 01/12/08: http://www.visibleprogress.com/software_performance_testing.htm

Xavier University Dept. of Computer Science. (2008). Choosing programs to evaluate performance. Retrieved from Xavier University website on 06/12/08: http://cerebro.xu.edu/csci210/01f/n1113/slide2.html

10 Appendix

1.1 Stands for "Million Instructions Per Second." It is a method of measuring the raw speed of a computer's processor. Since the MIPS measurement doesn't take into account other factors such as the computer's I/O speed or processor architecture, it isn't always a fair way to measure the performance of a computer. For example, a computer rated at 100 MIPS may be able to compute certain functions faster than another computer rated at 120 MIPS. The MIPS measurement has been used by computer manufacturers like IBM to measure the "cost of computing."
TechTerms. (2008). MIPS. Retrieved from the TechTerms website on 02/12/08: http://www.techterms.com/definition/mips

1.2 In computing, FLOPS (or flops or flop/s) is an acronym meaning FLoating point Operations Per Second. FLOPS is a measure of a computer's performance, especially in fields of scientific calculation that make heavy use of floating point calculations, similar to instructions per second.
Wikipedia. (2008). Floating point operations. Retrieved from Wikipedia website on 01/12/08: http://en.wikipedia.org/wiki/FLOPS

1.3 The Whetstone benchmark is a synthetic benchmark for evaluating the performance of computers.
It was first written in Algol 60 in 1972 at the National Physical Laboratory in the United Kingdom and derived from statistics on program behaviour gathered on the KDF9 computer, using a modified version of its Whetstone Algol 60 compiler. The program's behaviour replicated that of a typical KDF9 scientific program and was designed to defeat compiler optimizations that would have adversely affected the accuracy of this model. The Whetstone compiler was built at the Atomic Power Division of the English Electric Company in Whetstone, Leicestershire, England, hence the name. The Fortran version, which became the first general-purpose benchmark to set industry standards of computer system performance, was developed by Harold Curnow of the HM Treasury Technical Support Unit (TSU, later part of the Central Computer and Telecommunications Agency, or CCTA). Further development was carried out by Roy Longbottom, also of TSU/CCTA, who became the official design authority.
The Whetstone benchmark primarily measures floating-point arithmetic performance. A similar benchmark for integer and string operations is the Dhrystone.
Wikipedia. (2008). Whetstone benchmark. Retrieved from Wikipedia website on 01/12/08: http://en.wikipedia.org/wiki/Whetstone_(benchmark)

1.4 Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker, intended to be representative of system (integer) programming. Dhrystone grew to become representative of general processor (CPU) performance until it was superseded by the CPU89 benchmark suite from the Standard Performance Evaluation Corporation, today known as the "SPECint" suite. With Dhrystone, Weicker gathered meta-data from a broad range of software, including programs written in FORTRAN, PL/1, SAL, ALGOL 68, and Pascal.
He then characterized these programs in terms of various common constructs: procedure calls, pointer indirections, assignments, etc. From this he wrote the Dhrystone benchmark to correspond to a representative mix. Dhrystone was published in Ada, with the C version for Unix developed by Rick Richardson ("version 1.1") greatly contributing to its popularity.
Wikipedia. (2008). Dhrystone benchmark. Retrieved from Wikipedia website on 02/12/08: http://en.wikipedia.org/wiki/Dhrystone

1.5 Problems with MIPS include inaccurate readings (because instruction lengths vary), the lack of an official MIPS testing standard, and the inability to compare across different instruction sets. A computer with a higher MIPS rating may therefore not run applications any faster than one with a lower MIPS rating.
Wikipedia. (2008). Floating point operations. Retrieved from Wikipedia website on 02/12/08: http://en.wikipedia.org/wiki/FLOPS

2.1 Response time (latency)
— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
Throughput
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?
Clack, C. (2007). B261 – System Architecture: Measuring Performance [Slides 3-16]. Retrieved from UCL Department of Computer Science web site: http://www.cs.ucl.ac.uk/teaching/B261/PowerPoint/lecture2.ppt

3.1 Duration examples
1/1,000 s = 1 millisecond = 1 ms = 10^-3 s
1/1,000,000 s = 1 microsecond = 10^-6 s
1/1,000,000,000 s = 1 nanosecond = 10^-9 s
Frequency examples
1 Hertz = 1 cycle per second
1 MHz = 1,000,000 cycles per second
100 MHz = 100,000,000 cycles per second
Clack, C.
(2007). B261 – System Architecture: Measuring Performance [Slides 3-16]. Retrieved from UCL Department of Computer Science web site: http://www.cs.ucl.ac.uk/teaching/B261/PowerPoint/lecture2.ppt

3.2 Clock cycle: the time between two consecutive (machine) clock ticks. Instead of reporting execution time in seconds, we often use cycles. Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec).
Example: a machine with a 200 MHz clock runs at 200 × 10^6 Hz = 2 × 10^8 clock cycles per second, so its cycle time is 1 / (2 × 10^8) s = 5 nanoseconds (1 nanosecond = 10^-9 seconds).
Hennessy, J. L., Patterson, D. A. (1998). Computer Organization and Design: The Hardware/Software Interface. (2nd Edition), San Francisco, CA: Morgan Kaufmann Publishers Inc.

3.3 CPU Time = I * CPI * T
I = number of instructions in the program
CPI = average cycles per instruction
T = clock cycle time
This shows how CPU execution time can be calculated from the number of instructions in the program, the average number of cycles each instruction takes, and the clock cycle time.
Clack, C. (2007). B261 – System Architecture: Measuring Performance [Slides 3-16]. Retrieved from UCL Department of Computer Science web site: http://www.cs.ucl.ac.uk/teaching/B261/PowerPoint/lecture2.ppt

3.4 Number of cycles to execute the program
Number of instructions in the program
Number of cycles per second
Average number of cycles per instruction
Average number of instructions per second
Clack, C. (2007). B261 – System Architecture: Measuring Performance [Slides 3-16]. Retrieved from UCL Department of Computer Science web site: http://www.cs.ucl.ac.uk/teaching/B261/PowerPoint/lecture2.ppt

4.1 Graphic and video benchmark tests
SpecViewPerf - 55.8Mb - OpenGL performance benchmark
ChameleonMark - 49.5Mb - Nvidia GeForce 3 GPU tests
3DMark 2001 SE - 39.8Mb - Nvidia GeForce 3 GPU tests
CRT Alignment - 66Kb - Bench your monitor performance
Direct 3D Bench - 206Kb - Multiple 3D tests for graphic cards
Tunnel Test - 147Kb - Direct 3D tests
Incoming Demo - 26.2Mb - 3D benchmark
Final Reality - 7.9Mb - Complex 3D benchmark under D3D
3D Mark 2000 V1.1 - 19.4Mb - Complex 3D benchmark under D3D
Active Hardware. (2008). Benchmarks. Retrieved from Active Hardware website on 05/12/08: http://www.active-hardware.com/english/benchmarks/benchmarks.htm

4.2 Livermore Loops (named after the laboratory in Livermore, California): a specific computer program package representative of the intensive calculation typical of nuclear physics in the 1980s, used as a benchmark to measure the power of computers, usually in MFLOPS, by standardized conversions.
Answers. (2008). Livermore Loops. Retrieved from Answers website on 06/12/08: http://www.answers.com/topic/livermore-loops

The LINPACK benchmarks are a measure of a system's floating point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense N by N system of linear equations Ax = b, which is a common task in engineering. The solution is obtained by Gaussian elimination with partial pivoting, taking 2/3·N^3 + 2·N^2 floating point operations. The result is reported in millions of floating point operations per second (MFLOP/s, sometimes simply called FLOPS).
Wikipedia. (2008). LINPACK. Retrieved from Wikipedia website on 01/12/08: http://en.wikipedia.org/wiki/LINPACK

4.3 I/O benchmarks.
Parallel benchmarks: used on machines with multiple processors, or on systems consisting of multiple machines.
Component benchmarks (micro-benchmarks): programs designed to measure the performance of a computer's basic components, with automatic detection of hardware parameters such as number of registers, cache size and memory latency.
Wikipedia. (2008). Benchmark challenges. Retrieved from Wikipedia website on 03/12/08: http://en.wikipedia.org/wiki/Benchmark_(computing)#Challenges

5.1 Set Performance Testing Objectives
It is useful during performance testing to start by setting clear objectives.
More often than not, your performance tests will seek to achieve one or more of these objectives.
Visible Progress. (2008). Software performance testing. Retrieved from Visible Progress website on 01/12/08: http://www.visibleprogress.com/software_performance_testing.htm

5.2 Determine Customer Requirements Early
It is extremely important that you fully understand your customer's intentions and requirements regarding software performance as early as possible, i.e. the operating environment (both hardware and software) in which your product will be deployed and the manner in which it will be used. To begin, you must identify these requirements.
Visible Progress. (2008). Software performance testing. Retrieved from Visible Progress website on 01/12/08: http://www.visibleprogress.com/software_performance_testing.htm

7.1 A comparison of three different computers, using two different benchmarks. Performance is then worked out by adding up their scores.
Smith, J. E. (1988). Choosing Programs to Evaluate Performance. (Volume 31, No. 10, pp 1202-1206), New York, NY: The Association for Computing Machinery Inc.

7.2 Obtaining different performance ratings using different calculations; this is how three different computers can be compared.
Smith, J. E. (1988). Choosing Programs to Evaluate Performance. (Volume 31, No. 10, pp 1202-1206), New York, NY: The Association for Computing Machinery Inc.

7.3 These are described further in figure 7.4, but they are the basic formulae behind working out the different means.
Smith, J. E. (1988). Choosing Programs to Evaluate Performance. (Volume 31, No. 10, pp 1202-1206), New York, NY: The Association for Computing Machinery Inc.

7.4 Arithmetic, geometric and harmonic means of MFLOP results can give us different performance figures.
The arithmetic mean should be used as an expression of time; the geometric mean can be expressed as either a rate or a time; and the harmonic mean should be used as an expression of rate, or for comparing two rates, since it corresponds accurately to the performance that real-life programs will experience and so gives an accurate calculation of performance. If performance is to be normalised, then the total time or the harmonic mean should be calculated before normalising is done (Smith, 1988).

7.5 Good testing keeps the whole system constant and changes only one variable, such as the processor, so that only one item is being changed at a time. An issue with this is that processors only fit certain motherboards, so sometimes the motherboard and processor must both be changed, giving unfair results. If we were testing two systems for frames per second, for example, we should use identical equipment but different graphics cards (Fleming and Wallace, 1986).

8.1 OFFICE PRODUCTIVITY
Corel®
o CorelDRAW® 9
o Corel Paradox®
Microsoft®
o Word 2000
o Excel 2000
o PowerPoint®
Dragon Systems®
o NaturallySpeaking® Preferred v.4.0
Netscape®
o Communicator® 4.61
Caere OmniPage®
CONTENT CREATION
Adobe®
o Photoshop® v.5.5
o Premiere® v.5.1
Elastic Reality® v.3.1
MetaCreations®
o Bryce® 4
Microsoft Windows® Media Encoder v.4.0

There are many newer versions of SYSmark, such as SYSmark 2001; these update the programs that are tested and update the calibration platform.
DEW Associates Corporation. (2008). What is SYSmark 2000?. Retrieved from DEW Associates Corporation website on 01/12/08: http://www.dewassoc.com/performance/benchmark/what_is_sysmark_2000.htm

8.2 Overview
SYSmark 2000 is the latest release in the SYSmark family of benchmarking products cooperatively designed and developed by the members of the Business Applications Performance Corporation (BAPCo). In 1992, BAPCo developed the concept of application-based benchmarking using popular business software.
Ever since the inception of SYSmark 92, other benchmarks have been developed using the concept pioneered by BAPCo. In 1997, BAPCo, with the help of several Fortune 500 corporations, identified new application areas and benchmarking methodologies that address new evaluation and characterization requirements of corporations. Today, BAPCo breaks new ground again by introducing SYSmark 2000, a new suite of application-based benchmarks that addresses the new computing challenges faced by corporations. Particular emphasis has been given to Internet-related operations in today's applications. SYSmark 2000 is an unprecedented suite of twelve application-based benchmarks that accurately evaluate and characterize the performance of personal computers within the new computing paradigms. Comprehensive and scientifically designed workloads in new areas such as Internet content creation, speech recognition, and video editing complement new workloads in the traditional office productivity category to provide an all-encompassing suite of benchmarks. Information technologists can now use SYSmark 2000 to accurately forecast present and future desktop computing needs as well as properly evaluate the various and sometimes dizzying solutions provided by different vendors. The advantage of using SYSmark 2000 lies in the strength of its workloads, which reflect actual usage of real applications within the new computing paradigms. Moreover, BAPCo's benchmarks are cooperatively designed and developed by top performance engineers representing a wide cross-section of industry-leading publications, testing labs, PC manufacturers, software developers and semiconductor manufacturers. SYSmark 2000 allows comparisons between Intel® Architecture systems based on the performance of real applications running on Windows 2000, Windows NT* 4.0, Windows* 98, and Windows* 95.
This document describes the structure and performance characteristics of SYSmark 2000.
DEW Associates Corporation. (2008). SYSmark 2000 overview. Retrieved from DEW Associates Corporation website on 01/12/08: http://www.dewassoc.com/performance/benchmark/benchmark.htm

8.3 After SYSmark 2000 is run on a system to be evaluated, it assigns the system a performance rating for each application, a rating for each category, and an overall rating. The application ratings are based on a comparison of workload run times between the system being tested and a fixed calibration platform. A rating of 100 indicates the test system has performance equal to that of the calibration platform, 200 indicates twice the performance of the calibration platform, etc. Each category rating is simply a geometric mean of the workload ratings in the category. The overall rating is a weighted geometric mean of the category ratings.
The SYSmark 2000 calibration platform has the following configuration:
Motherboard: based on the Intel® 440BX motherboard
CPU: Intel® Pentium® III processor
Core frequency: 450 MHz
Memory: 128MB DIMM
Video/resolution: Diamond Viper V770 Ultra, 32 MB, 1024x768, 16 bpp
Disk: IBM* DJNA 371800
Operating system: Windows* 98 Second Edition
A system that scores a SYSmark 2000 rating of 200 is therefore twice as fast as the calibration platform.
DEW Associates Corporation. (2008). SYSmark 2000 overview. Retrieved from DEW Associates Corporation website on 01/12/08: http://www.dewassoc.com/performance/benchmark/benchmark.htm

11 Index

1.1 What is MIPS?
1.2 What is FLOPS?
1.3 Whetstone
1.4 Dhrystone
1.5 Problems with MIPS
2.1 Response time/latency
3.1 Duration/frequency
3.2 Clock cycle/clock rate
3.3 Instructions in measurement
3.4 Ways of measuring performance
4.1 Graphics benchmarks
4.2 Livermore Loops/LINPACK
4.3 Extra benchmarks
5.1 Set performance test objectives
5.2 Determine customer requirements early
7.1 Performance of three computers on two benchmarks
7.2 Performance of benchmarks in MFLOPS
7.3 Three types of means
7.4 About the three types of means
7.5 Good testing
8.1 SYSmark 2000 programs
8.2 SYSmark 2000 overview
8.3 SYSmark 2000 calibration platform