Statistical Usage Models in Mobile Processor Thermal Design and Testing

by

Thomas C. Evans
B.S. Mechanical Engineering, Case Western Reserve University, 1994

Submitted to the Department of Mechanical Engineering and the Sloan School of Management in Partial Fulfillment of the Requirements for the Degrees of Master of Science in Mechanical Engineering and Master of Business Administration

In Conjunction with the Leaders for Manufacturing Program at the Massachusetts Institute of Technology

June 2003

©2003 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Mechanical Engineering, Sloan School of Management, May 2003
Certified by: Daniel E. Whitney, Senior Research Scientist, Thesis Supervisor
Certified by: Roy E. Welsch, Professor of Statistics and Management Science, Thesis Supervisor
Accepted by: Ain Sonin, Chairman, Graduate Committee, Department of Mechanical Engineering
Accepted by: Margaret Andrews, Executive Director of Masters Program, Sloan School of Management

Statistical Usage Models in Mobile Processor Thermal Design and Testing

by Thomas C. Evans

Submitted to the Department of Mechanical Engineering and the Sloan School of Management on May 9, 2003 in partial fulfillment of the requirements for the Degrees of Master of Science in Mechanical Engineering and Master of Business Administration

Abstract

The performance, quality and reliability of microprocessors are highly dependent upon their operating temperature. Intel microprocessors are equipped with a thermal monitor feature that reduces power to the chip when the maximum temperature is reached to prevent overheating. This is commonly referred to as "throttling". Worst-case thermal design methodologies provide highly robust thermal solutions that keep processors running below their maximum temperature. OEMs do not always follow Intel's recommendations in this regard, particularly when they use desktop processors in mobile form factors. The processors in these systems run hotter and are more likely to throttle.

A methodology that uses the principles of statistical tolerancing is developed to quantify the performance impact of throttling on thermally under-designed mobile systems. Customer usage models are developed from market survey data and used with Monte Carlo simulation techniques to calculate the distributions of processor temperature and performance in use. Simulation results from both worst-case operating conditions and a statistical usage model are analyzed and compared. The statistical usage model is then used to compare the theoretical operation and performance of a true mobile system and a desktop processor transportable system. Finally, an analysis is performed to evaluate the effect of reducing the throttle set point on a mobile processor.

Results show that the predicted occurrence of throttling drops by two orders of magnitude when comparing the statistical usage model to worst-case, that approximately 10 percent of the desktop transportable systems would throttle severely, and that total quality events could be reduced by 37 percent by lowering the throttle set point. The implications for new product positioning and increasing integrality of mobile computer systems are discussed.

Thesis Supervisor: Daniel E. Whitney
Title: Senior Research Scientist

Thesis Supervisor: Roy E. Welsch
Title: Professor of Statistics and Management Science
Acknowledgements

I would like to acknowledge the Leaders for Manufacturing Program for its support of this work. I would like to acknowledge Vivek Phanse, my supervisor at Intel, for his support and guidance throughout my internship. I would like to acknowledge Professors Whitney and Welsch for their constructive ideas. I dedicate this thesis to my wife, Nancy; without her constant support and encouragement none of my success would have been possible.

Table of Contents

Abstract
Acknowledgements
Table of Contents
Chapter 1 - Overview
  1.1 Background
  1.2 Project Goals
  1.3 Approach
Chapter 2 - Background and Project Setting
  2.1 Statistical Tolerancing
  2.2 Intel Mobile Platform Group
  2.3 Thermal Enabling
    2.3.1 Thermal Design Power
    2.3.2 TDP Ratio and Application Ratio (AR)
Chapter 3 - Product Overview
  3.1 Thermal Monitor
  3.2 Thermal Impact
    3.2.1 Temperature Effect on Processor Frequency
    3.2.2 Temperature Effect on Processor Reliability
  3.3 Power Consumption
    3.3.1 Dynamic Power
    3.3.2 Leakage Current Power
  3.4 Heat Dissipation
Chapter 4 - Simulation Process
  4.1 Monte Carlo Simulator
  4.2 Simulator Modifications
    4.2.1 Application Ratio and Ambient Temperature Distribution
    4.2.2 Performance Impact Modeling
Chapter 5 - Usage Models
  5.1 Mobile Choice Survey
  5.2 Defining a "User"
    5.2.1 Software Characterization - Application Ratio
    5.2.2 User Characterization
  5.3 Usage Model Matrix
    5.3.1 Worst-Case Peak 5-Second Average Usage Model
    5.3.2 Expected Peak 5-Second Average Usage Model
    5.3.3 Expected Average Usage Model
    5.3.4 Worst-Case Average Usage Model
  5.4 Application Ratio Distribution Parameters
  5.5 Ambient Temperature Distributions
Chapter 6 - Simulation Results
  6.1 Worst-Case Operating Conditions vs. Statistical Usage Model
  6.2 Mobile System vs. Desktop Arbitrage System
  6.3 Recommended Change to Throttle Set-Point
Chapter 7 - Recommendations and Future Use
  7.1 Validation
  7.2 Market Segments and Product Positioning
  7.3 Industry Dynamics and Strategic Partnerships
Appendix A: Selected Questions and Sample Answers from End-User Mobile Choice Survey
Bibliography

Chapter 1 - Overview

1.1 Background

Intel Corporation is the leading supplier of microprocessors to the computing industry. Intel supplies microprocessors to many market segments, including both the desktop market segment and the mobile market segment. The appearance of a new mobile computing market segment, the desktop replacement or "transportable", has resulted in higher-power desktop processors being installed in laptop form factor chassis. The high-power processors are pushing the limits of the cooling systems that can be used in these form factors, and processors are running hotter.

The performance, quality and reliability of microprocessors are highly dependent upon their operating temperature. Intel microprocessors are equipped with a thermal monitor feature that reduces power to the chip when the maximum temperature is reached to prevent overheating. This is commonly referred to as "throttling".

Intel uses statistical methods and tolerances for many aspects of estimating product quality and to determine product-testing specifications as part of its standard business practice.
However, the usage model input, one of the major factors, is typically entered as a "worst-case realistic" value. The thermal design process is also based on worst-case tolerances. Intel sets its thermal specifications and recommendations such that processor "throttling" is an unlikely event, happening only under extreme circumstances. However, some original equipment manufacturers (OEMs) are disregarding Intel's recommendations and are producing systems with less capable thermal solutions. These systems are considered thermally under-designed, and they throttle more often because the processors are not adequately cooled and run hotter. With increases in processing power, thermal under-design is expected to increase in both quantity and magnitude. The problem Intel faces is the challenge of realistically predicting or simulating the amount of throttling that occurs in the under-designed systems.

1.2 Project Goals

The project goals are to answer the following questions:

* What is the expected occurrence of throttling with thermally under-designed systems?
  o How many units will throttle/suffer performance impact?
  o How significant is the performance impact?
  o How often is the performance impacted?
* How can Intel incorporate the usage model of mobile computers into product specs and recommendations and/or process parameters in order to maximize value?

1.3 Approach

Chapter 2 begins with a brief description of statistical tolerancing and a comparison to worst-case tolerancing. It then looks at the project setting, the mobile computer market segment, and the Thermal Enabling function at Intel. Chapter 3 goes into some depth about the microprocessor product characteristics that are germane to this thesis. Chapter 4 discusses the simulation process that is the basis for making the throttling predictions. Chapter 5 develops the statistical usage model that is used as an input to the simulator to provide realistic throttling estimations. Chapter 6 reviews the results of the three simulation comparisons: worst-case operating conditions vs. statistical usage model, mobile system vs. desktop arbitrage system, and 100 °C vs. 95 °C throttle set point. Chapter 7 concludes with recommendations and future uses for the methodology developed.

Chapter 2 - Background and Project Setting

2.1 Statistical Tolerancing

Statistical tolerancing is a methodology that allows an increase in the manufacturability of a product while achieving the required performance and quality by understanding the statistical nature of the factor deviations. Statistical tolerancing works because the chance that all factors will experience maximum deviations in the same direction, what is called worst-case stack-up, is small. In fact, with more factors involved the chance of worst-case stack-up typically becomes smaller. This is in direct contrast to worst-case tolerancing, which always assumes worst-case stack-up.

As a simple example, imagine a five-inch stack of blocks made of five one-inch blocks as shown in Figure 2.1.

Figure 2.1: Example of stack-up tolerances (five 1.000" +/- ? blocks forming a 5.000" +/- 0.050" stack).

If the tolerance for the stack is 0.050 inches, what should the tolerance on each individual block be? Worst-case tolerancing says 0.010 inches; that way the sum of the tolerances can never be more than 0.050. If the size of the blocks is normally distributed with a mean of one and a standard deviation of 0.005 (i.e. N(1, 0.005)), then five percent of the blocks would be outside of the 0.010 individual tolerance and would need to be scrapped. However, if all of the blocks are N(1, 0.005) and the deviations are independent, then the stack can be assembled with all of the blocks (none of the blocks are scrapped), and only one in 100,000 of the assembled stacks will fall outside of the 0.050 stack tolerance. This is because the chance that all five blocks will have large deviations in the same direction is rare. This is the principle that makes statistical tolerancing work.
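The stack-up arithmetic can be checked with a short Monte Carlo sketch. This is purely illustrative and is not part of the thesis tooling; it simply draws simulated blocks from N(1.000", 0.005") and counts out-of-tolerance assemblies for the two scenarios described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000  # simulated assemblies

# Five blocks drawn independently from N(1.000", 0.005")
blocks = rng.normal(loc=1.000, scale=0.005, size=(n, 5))
stack = blocks.sum(axis=1)
frac_bad_stat = np.mean(np.abs(stack - 5.000) > 0.050)

# Worst-case assumption for two of the five blocks: treat them as fixed at
# their 0.010" limit, so the remaining three blocks must hit 3.000" +/- 0.030"
substack = blocks[:, :3].sum(axis=1)
frac_bad_wc = np.mean(np.abs(substack - 3.000) > 0.030)

print(f"out-of-tolerance rate, all five blocks statistical: {frac_bad_stat:.2e}")
print(f"out-of-tolerance rate, two blocks worst-cased:      {frac_bad_wc:.2e}")
```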
One caveat of statistical tolerancing is that there are no longer any absolutes. One cannot say that the stack will never be greater than X. There is always some probability, however infinitesimal, that it could happen. Of course, it is still possible to inspect the assembly and scrap any assemblies that do not meet the final specifications.

Intel uses this kind of statistical approach to estimate product quality and to determine product-testing specifications as part of its standard business practice. However, the usage model input, one of the major factors, which contains two input variables, is typically entered as a "worst-case realistic" value. Let us return to the block example and see what happens if we assume that two of the blocks are always at the worst-case tolerance of 0.010, even though they are still actually N(1, 0.005). We now need to inspect the assembly to 3 inches +/- 0.030 after stacking the three blocks we know to be normally distributed. Now 30 in 100,000 three-block assemblies would measure out of tolerance and need to be scrapped. The scrap rate increased by a factor of 30 because worst-case assumptions were made for two blocks out of five. This is analogous to the approach Intel is taking with its quality simulations, and why it is important to correctly identify the distributions for the last two parameters.

2.2 Intel Mobile Platform Group

Intel Mobile Platform Group (MPG) is one of Intel's six strategic business units. MPG is responsible for developing and marketing microprocessors, chipsets and related hardware and software for laptop and other mobile computing systems. Products for this market segment have features not found in desktop products. These features, most of which revolve around reducing power consumption, make them particularly valuable for mobile applications. For this reason, mobile products command a premium in the market.

Over the last several years the mobile market segment has become much more complicated as the traditional notebook computer form factor has given way to a variety of form factors, as shown in Figure 2.2.

Figure 2.2: The variety of mobile form factors currently available (transportable, notebook computer, thin & light, mini-note). Source: Intel Mobile Platforms Group

In addition to the proliferation of form factors, mobile system powers have been increasing rapidly, as shown in Figure 2.3.

Figure 2.3: Average total mobile system powers, 1994-2001. Source: Intel Mobile Thermal Enabling

The active/passive line in Figure 2.3 refers to the cooling threshold described in Chapter 3. The combination of smaller form factors and higher system powers continues to make heat removal more challenging. In addition, OEMs have been installing higher-power desktop processors in laptop form factor chassis, creating a new mobile computing market segment, the desktop replacement or "transportable".
This practice is referred to as "desktop arbitrage", as desktop processors are replacing premium priced mobile processors in what is traditionally thought of as a mobile form factor. Obviously, "desktop arbitrage" is of great concern to MPG as they are directly losing sales of its mobile products. It also negatively impacts the revenue of Intel as a whole as sales of less expensive desktop processors replace those of the premium priced mobile processors. In addition to the negative financial implications for Intel, "desktop arbitrage" also presents significant technical hurdles. The higher power processors exceed the capabilities of the cooling systems used in these mobile form factors. 2.3 Thermal Enabling The role of the Mobile Thermal Enabling team is to provide guidance and make it as easy as possible for customers (OEMs) to design Intel products into its systems. They test solutions from suppliers and create example thermal solutions. They also publish design collateral on the best design practices. Most interesting for this thesis is the first part of the thermal enabling process; defining the Thermal Design Power (TDP) for a processor. 15 2.3.1 Thermal Design Power TDP is the value that Intel recommends the computer OEMs use in designing their cooling solutions. TDP is not the theoretical maximum power for a given chip design, but is an empirical value (adjusted for anticipated leakage) derived from testing the processor with most intensive commercially available software application. A wide variety of software applications are screened to find the one that is most processor intensive. During TDP testing the power consumed by the processor is measured and recorded at the millisecond timescale. However power spikes at that frequency are of no thermal interest. Due to the thermal inertia of the processor and the cooling solution the temperature of the processor responds on a much slower time scale. Therefore the maximum power of interest for a given application is not the maximum power spike. Laboratory experiments, computer modeling and experience have shown that a rolling 5second average power trace is a good indicator of the temperature that a mobile processor will reach. The rolling 5-second average power is post-processed from the data. Each data point is given the average value of the previous five seconds worth of data. This simulated the thermal inertia of the system. Since the TDP value is intended for thermal calculations it makes sense to use a measurement that corresponds with temperature. Because the maximum temperature that the processor reaches is the value of interest, the peak 5-second average is used as the rating. Figure 2.4 shows a power trace for a software application test. 16 3Dmark2000 1.6 1.4 1.2 0 0.8 0.6 0.4 In (C0 r- (a C DC 40 0)0NN C 0 v N .0 D C -N C U N Time Figure 2.4: Software application power trace showing power spikes, 5-second rolling average power, and mean power. Source: Intel Mobile Power Lab After the peak 5-second average value is calculated the TDP rating still needs to be corrected for leakage power. Leakage power varies from chip to chip (see Section 3.3.2). The leakage power is measured for the TDP test chip when it is at idle. To make the correction the test leakage value is subtracted from the measured peak 5-second average power and the maximum leakage power for that processor specification is added back. This ensures that the TDP corresponds to the highest leakage part for a given specification. 
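The rolling-average and leakage correction steps can be illustrated with a short Python sketch. It is not the Intel lab tooling; the sample rate, the leakage values, and the synthetic power trace are hypothetical and serve only to show the order of operations described above.

```python
import numpy as np

def tdp_from_trace(power_w, sample_hz, idle_leakage_w, max_spec_leakage_w):
    """Peak rolling 5-second average power, corrected to the highest-leakage part.

    power_w: measured CPU power trace in watts, one sample per 1/sample_hz seconds.
    The correction subtracts the test chip's idle leakage and adds back the
    maximum leakage allowed by the processor specification.
    """
    window = int(5 * sample_hz)                           # samples in 5 seconds
    kernel = np.ones(window) / window
    rolling = np.convolve(power_w, kernel, mode="valid")  # 5-second rolling average
    peak_5s = rolling.max()
    return peak_5s - idle_leakage_w + max_spec_leakage_w

# Hypothetical example: 60 seconds of a 1 kHz power trace for one application
rng = np.random.default_rng(1)
trace = 12.0 + 8.0 * rng.random(60_000)                   # illustrative numbers only
print(tdp_from_trace(trace, sample_hz=1_000, idle_leakage_w=1.5, max_spec_leakage_w=3.0))
```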
2.3.2 TDP Ratio and Application Ratio (AR)

TDP is expressed in watts. However, it can also be useful to think of it in terms of a ratio of a maximum power. The maximum power of a processor is determined by running a "power virus", a piece of specially developed in-house software code that has no functionality other than to apply maximum stress to the CPU. The TDP ratio is then defined as shown in Equation 2.1.

TDP ratio = (TDP - Leakage power) / (Maximum power - Leakage power)   (Eq. 2.1)

TDP ratios typically fall between 0.7 and 0.85. The TDP ratio can be particularly useful in modeling simulations and in comparing processors. Software applications can also be defined in terms of an application ratio (AR). In mobile computing the process is the same as defining the TDP ratio. Equation 2.2 gives the definition of AR.

AR = (Peak 5-second average power - Leakage power) / (Maximum power - Leakage power)   (Eq. 2.2)

This chapter looks at the fundamentals of statistical tolerancing, provides an overview of Intel Mobile Platform Group and gives some insight into the functions of the Mobile Thermal Enabling team. The next chapter focuses on the features and characteristics of Intel microprocessors that are important for this thesis.

Chapter 3 - Product Overview

The central feature of the Intel microprocessor that is relevant to this work is the thermal monitor. The thermal monitor attempts to control the processor temperature by initiating the thermal control circuit, which reduces the processor frequency, and hence its power dissipation. This is commonly referred to as "throttling". This chapter provides a brief discussion of the thermal monitor, why it is necessary, how heat is generated in a semiconductor, and how that heat is removed from a mobile computer system.

3.1 Thermal Monitor

Intel® Pentium® 4 processors are equipped with a thermal monitor feature. This consists of an on-die temperature sensor and a built-in control circuit that can stop the processor clock input to the central processing unit (CPU) core, placing the microprocessor in a lower power state [1]. When the microprocessor reaches the maximum operating temperature (the throttle set point), the thermal monitor toggles the stop-clock feature. Modulating the clock reduces the power dissipated in the CPU linearly with the percentage of time the clock is stopped, preventing the processor from overheating. While the clock is stopped the CPU is also prevented from executing instructions, which reduces its operating performance. This is commonly referred to as "throttling".

Based on the characteristics described above, this project set out to answer three questions:

1. How many mobile CPUs will throttle and suffer performance impact?
2. How significant is the performance impact?
3. How often does the performance impact occur?

3.2 Thermal Impact

Why is a thermal monitor necessary? The thermal monitor is designed to provide non-traumatic temporary performance decreases that are virtually unnoticeable to the end user in place of more severe thermal failures. Without a thermal monitor a microprocessor can literally overheat and destroy itself [2].

Temperature affects the operation of a microprocessor in several ways. A microprocessor's maximum frequency is a function of its temperature; changes in temperature can have an instantaneous effect on how fast the processor can perform operations. Temperature also has a cumulative impact: the long-term reliability of a microprocessor is also a function of its temperature.
Elevated temperatures increase the rate of all chip-related failure mechanisms [3]. These two temperature effects are discussed below. The purpose of the thermal monitor is to substitute short-term performance losses in place of more severe thermal failures.

3.2.1 Temperature Effect on Processor Frequency

The maximum frequency at which a microprocessor can operate before it loses functionality is referred to as Fmax [4]. Fmax is a function of both input voltage and temperature: Fmax increases with increasing voltage, and decreases with increasing temperature. Fmax can be measured on an individual part either in the laboratory or on the production line with the proper equipment. Due to manufacturing variation there is a distribution of Fmax on any given product. An example of this distribution is shown in Figure 3.2.

Figure 3.2: Distribution of processor maximum frequency (Fmax) due to manufacturing variation. Source: Intel Corporation

Before a device is shipped, its frequency is tested at a prescribed temperature and voltage at the factory. Based on the results of this test the device is "binned", and its speed is preset. If a microprocessor's temperature increases (or voltage decreases) to the point that Fmax is less than the pre-programmed frequency, the processor will "lock up" and stop operating. This is known as an Fmax failure. The speed is preset on each device such that as long as the temperature specifications are met, an Fmax failure would be an extremely rare event. Throttling is a mechanism intended to prevent a catastrophic Fmax failure by trying to limit the temperature.

3.2.2 Temperature Effect on Processor Reliability

In addition to the instantaneous Fmax failures, elevated temperatures also have a cumulative negative effect on the long-term reliability of a semiconductor microprocessor. All chip-related failure mechanisms are aggravated by high temperature. One example is dopant diffusion, which is the diffusion of contaminants within the semiconductor structure; increases in temperature increase the diffusion rate, which in turn reduces the mean time to fail. Another example is gate oxide failure, where temperature increases the rate of the oxidation reaction [3]. In addition, a large number of assembly-related and operation-induced failure mechanisms are also aggravated by high temperature. One example is electromigration, an operation-induced failure where higher temperatures make it easier for metal atoms to be moved by the impact of current flow [3]. These effects also make it desirable to limit the silicon temperature.

3.3 Power Consumption

Heat is generated in a semiconductor circuit, like any electrical circuit, when a voltage is applied across a resistance and a current flows. The heat generated is equal to the power consumed. To a first approximation, the power consumed by a digital logic circuit is entirely due to charging and discharging circuit capacitor nodes [5]. The power consumed by any one node is small, but multiplied by millions of transistors per chip switching billions of times per second, it adds up rapidly [1]. This results in Equation 3.1 for power:

Power = Ctotal × V² × f   (Eq. 3.1)

where:
Ctotal is the average capacitance charged per cycle
V is the supply voltage
f is the operating frequency

Equation 3.1 only accounts for the dynamic power that results from changes in transistor states.
There is also a leakage current power component that in theory should be zero, but which has become more significant as processor geometries have shrunk and incorporated more gates [1]. Both dynamic power and leakage current power are detailed below.

3.3.1 Dynamic Power

Dynamic power is typically the major source of power dissipation in microprocessors [6]. The dynamic power consists of the dynamic switching power and the short-circuit current power. The dynamic switching power is the power consumed in switching a transistor gate, and the short-circuit current occurs during a transition when both the input and output gates are partially open. In a well-designed circuit the short-circuit power dissipation can be limited to 5-10% of the total dynamic power [6]. For this thesis the most important aspects of dynamic power are that it is the dominant source of power dissipation in microprocessors and that it is determined by the design of the microprocessor (the number of transistors and their capacitance), the operations it is performing (how many transistors are switching), the supply voltage, and the operating frequency. It is also useful to think in terms of the dynamic current, which is the dynamic power divided by the supply voltage.

3.3.2 Leakage Current Power

The leakage, or standby, current (Isb) is, as the name implies, the current that leaks across transistors when they are turned off. Therefore leakage current is independent of the operations performed. Leakage current is temperature sensitive; Isb can increase dramatically at higher temperatures [6]. As a result of manufacturing variability, Isb can vary significantly. The distribution of Isb for a particular product will also change over time as the manufacturing process matures. Also, there are process parameters that can be used to tune other desirable properties of the microprocessors that also affect leakage current, and these will add to the shifts in the Isb distribution. For each product there is a maximum specified Isb, and all parts are screened to below that level. The leakage current power is calculated with Ohm's law:

Pleakage = V × Isb   (Eq. 3.2)
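A minimal sketch of how Equations 3.1 and 3.2 combine into a total power estimate is shown below. The numerical values and the "activity" scaling factor (which mirrors the application ratio idea of Chapter 2) are illustrative assumptions, not Intel characterization data.

```python
def cpu_power(c_total_f, v_supply, freq_hz, i_sb_a, activity=1.0):
    """Total CPU power from Eq. 3.1 (dynamic) and Eq. 3.2 (leakage).

    c_total_f : average switched capacitance per cycle, in farads
    activity  : fraction of the maximum switching activity (1.0 = power-virus level);
                this scaling is an illustrative stand-in for the application ratio
    """
    p_dynamic = activity * c_total_f * v_supply**2 * freq_hz   # Eq. 3.1
    p_leakage = v_supply * i_sb_a                              # Eq. 3.2
    return p_dynamic + p_leakage

# Hypothetical 1.3 V, 2.2 GHz part with 10 nF effective capacitance and 2 A leakage
print(f"{cpu_power(c_total_f=10e-9, v_supply=1.3, freq_hz=2.2e9, i_sb_a=2.0):.1f} W")
```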
3.4 Heat Dissipation

All of the heat generated within a system needs to be dissipated, otherwise the temperature will continue to rise. A normally sized mobile computer can accommodate approximately 15 W of passive cooling from radiation and natural convection [7]. Systems that exceed this total require an active cooling solution. In particular, the CPU, which has been one of the higher-power devices in the system, requires a dedicated active thermal solution. A remote heat exchanger (RHE), as depicted in Figure 3.3, is the most common technology used to remove heat from the CPU in mobile computer systems.

Figure 3.3: Remote heat exchanger for mobile computer systems (CPU junction, attachment block, heat pipe, heat exchanger, air inlet and exhaust) [8].

The heat generated at the transistors in the processor is conducted through the silicon into the attachment block. There is typically a thermal interface material between the CPU and the attachment block to improve conduction across this boundary. The attachment block also serves as a heat spreader to equalize temperatures across the silicon die. The heat then flows through the heat pipe. Heat pipes are extremely efficient; they operate by vapor/liquid phase change and mass transfer [7]. The heat is then conducted into the fins of the heat exchanger, where it is removed via forced convection from the fan.

The heat flow path can be modeled as a series of thermal resistors, as shown in Figure 3.4.

Figure 3.4: Thermal resistance model of the remote heat exchanger (θ j-heat pipe, θ heat pipe-HX, and θ HX-ambient in series, making up θ j-a).

The total thermal resistance is empirically defined in a steady-state condition as shown in Equation 3.3.

θ j-a = (Tj - Ta) / Power   (Eq. 3.3)

where:
θ j-a is the thermal resistance in °C/W
Tj is the transistor junction temperature of the CPU in °C
Ta is the ambient temperature in °C
Power is the power dissipated by the CPU in watts

Lower thermal resistance allows greater cooling capacity for a given temperature difference. To reduce the total thermal resistance it is necessary to reduce one or more of the resistances in series. The most common method to reduce θ j-heat pipe is to use a lower resistance thermal interface material. The heat pipe thermal resistance can be reduced with a larger cross-section heat pipe. The easiest way to reduce the total resistance is to reduce θ HX-ambient by using a heat exchanger with greater surface area, a larger more powerful fan to flow more air, or both. However, those options are in direct contradiction to the desire to make the systems smaller, thinner, and lighter.

This chapter reviews the thermal monitor feature of Intel microprocessors and the thermal impact and power consumption characteristics of semiconductor microprocessors. The next chapter looks at how those characteristics and features are used in a simulator for product quality.

Chapter 4 - Simulation Process

The simulation tool used as the basis of this work was developed by Intel Corporation. The simulation runs within Excel (Microsoft Corp.) and uses JMP (SAS Institute Inc.) for data analysis and graphing. The tool is used to understand the impact of different test conditions for microprocessors and how they affect factory yield and quality defects in the field. Quality defects are defined broadly as occurring when a device does not perform as expected. Setting the test conditions too conservatively results in unnecessarily scrapping microprocessors at the factory that would have performed flawlessly in the field. Setting the test conditions too aggressively increases the yield at the factory, but will result in quality defects in the field. This chapter provides a high-level overview of how the simulator works, and the modifications that were made to it.

4.1 Monte Carlo Simulator

In Monte Carlo simulations, a model is evaluated repeatedly using parameter values that are randomly drawn from statistical distributions. The results of the repeated simulations are then evaluated in terms of their statistical distributions. Monte Carlo methods are particularly useful when dealing with complex systems and a variety of failure rate models [9].

The tool used in this project simulates microprocessor product creation, testing, and end use in a virtual environment. The simulator uses the Monte Carlo technique to randomly sample the products, testers and end-use conditions from a statistical description of the characteristics of each. Millions of parts can be sampled, tested and evaluated under use conditions in this virtual environment, leading to estimates of failure rates in defects per million (DPM). By treating parts as samples of a distribution, worst-case assumptions can be avoided and more accurate assessments of expected field failure rates are obtained.

First, the product is created by randomly choosing parameter values for Fmax, dynamic current, and Isb.
Distributions for those parameters for a product are developed from production data and engineering characterization data. The randomly chosen parameters are fed into a set of equations that define the operating characteristics of the microprocessor. After the program creates a simulated product, it is virtually tested by choosing a set of parameters that define the test environment. The automated test equipment (ATE) is described by its operating tolerances and the test condition settings. Statistics and distributions of the product characteristics that pass the testing can be calculated.

If the device under test passes, it is then simulated in the end-use environment. The end-use environment is characterized by tolerances for the electrical and thermal specifications of the system platform or chassis, the ambient air temperature, and the degree to which the microprocessor is utilized or stressed. Statistics and distributions of the product characteristics in the field can be calculated. Figure 4.1 is a diagram showing an overview of the simulation.

Figure 4.1: Simulation overview (product characteristics, ATE settings and tolerances, end-use characteristics, and customer impact). Source: Intel Corporation

This work focuses on the thermal aspects of the end-use characteristics and the customer impact. From this perspective the most important value that the simulator calculates is the temperature of the microprocessor (Tj) in use, as defined in Equation 4.1.

Tj = Tair + Tsys + θ × Power   (Eq. 4.1)

where:
Tair is the ambient temperature in which the system is operating
Tsys is the temperature rise within the system due to other electrical components
θ is the thermal resistance of the OEM thermal solution in °C/W
Power = V × (Isb + AR × Idyn)
Isb is the leakage or standby current
AR is the software application ratio
Idyn is the dynamic current

The above equations have been simplified for presentation here. Temperature, current and voltage are all coupled in the simulation model.
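A minimal Monte Carlo sketch of Equation 4.1 is shown below. It is not the Intel simulator (which runs in Excel and JMP and couples temperature, current and voltage); all of the input distributions and the throttle set point here are placeholder assumptions chosen only to show how sampled parts, platforms and usage conditions combine into a Tj distribution and a throttling count.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000                                  # simulated parts in the field

# Placeholder input distributions; the real simulator draws Fmax, dynamic
# current and Isb from production and characterization data.
t_air = rng.normal(25.0, 3.0, N)             # ambient temperature (deg C)
t_sys = rng.normal(8.0, 1.0, N)              # temperature rise inside the chassis (deg C)
theta = rng.normal(1.0, 0.05, N)             # OEM thermal resistance (deg C/W)
i_sb  = rng.normal(2.0, 0.3, N).clip(0.5)    # leakage current (A)
i_dyn = rng.normal(25.0, 2.0, N)             # dynamic current at full activity (A)
ar    = rng.uniform(0.2, 0.9, N)             # application ratio (usage model input)
v     = 1.3                                  # supply voltage (V)

power = v * (i_sb + ar * i_dyn)              # simplified power model
tj = t_air + t_sys + theta * power           # Eq. 4.1

T_THROTTLE = 100.0                           # throttle set point (deg C), assumed
print(f"mean Tj = {tj.mean():.1f} C, fraction above set point = {np.mean(tj > T_THROTTLE):.2%}")
```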
30 The simulator already had the capability to sample from a statistical bi-modal ambient temperature distribution. A time-based model of this distribution for mobile computers is developed in Chapter 5. 4.2.2 Performance Impact Modeling Another goal of this project is to estimate the performance impact of throttling. When the simulator calculates Tj for a device that is greater than the throttle set point temperature, it then back calculates the AR required to maintain the maximum temperature for the device. This reduction in AR is translated into a performance loss based on performance data collected in the lab. The problem was that all of the performance loss data was focused on the TDP application ratio. With the application ratio distribution model, information about performance loss from a much greater range of applications is required. Lab testing of a broad range of software benchmarks under throttling conditions showed consistent performance decreases for similar AR reductions. A model is fit to this performance data. The model, shown in Figure 4.2, is exponential and is included as a post processor in JMP. 31 (0 -j E (0 AR Reduction Figure 4.2: Performance loss model. The most important attributes of Figure 4.2 are that for small levels of AR reduction (throttling) the performance loss is negligible, and that there is a rapid transition from small performance loss to very significant performance loss. The implications are three regimes: limited throttling produces almost no performance impact; throttling can produce moderate performance decrease and the system will still function; performance loss increases rapidly to the point where system functionality may be compromised. This chapter looks at the framework of a product quality simulator that uses Monte Carlo methods, and how to incorporate a time dimension into the simulation by using statistical usage models. The next chapter looks at the development of the user models in detail. 32 Chapter 5 - Usage Models In order to predict the quality of a product some determination of its intended use has to be established. These are termed usage models and typically contain information regarding how the product is used, how often and for how long, and under what conditions. Software benchmark companies have conducted a significant amount of research and data collection on how people use software [10,11]. This thesis will build on that work and focus on which software applications are being used, processing power those applications utilize, and the environment and ambient temperature in which the computer is being used. 5.1 Mobile Choice Survey The basis for this usage model was data from a survey of mobile computer end users that Intel Corporate Marketing Research had conducted in the spring of 2002. Over 900 users were surveyed from multiple countries across the globe. Embedded in the survey were five questions that related to the manner in which people used their mobile computers. These questions included: What environments? Hours per week on? Hours per week from batteries? Hours per week actively using? 33 What type of activities(software applications)? Appendix A contains the actual survey questions as well as sample tabulated answers. Another useful feature of the survey data is that it included sample-weighting parameters. These are used to more accurately aggregate results among various sub-groups. For example, consider 200 users surveyed in two different countries. 
If one country has a total user population of 10,000,000 users and the other country only has 100,000 users, simply combining the survey results would not provide an accurate picture of the entire 10,100,000-user population. Differences in the ratios between the number of users surveyed in a subpopulation and the total number of users in a sub-population are accounted for by the sample weighting parameters. 5.2 Defining a "User" The definition of a user for this thesis is focused on defining an application ratio or application ratio distribution that can be used as an input to the simulation model. Having the raw data on the types of applications that are employed by mobile computer users was the first step. In order to be utilized in the existing modeling framework, the data needed to be distilled down to a single statistical distribution, while maintaining an accurate representation of the of the variety of user profiles. Several methods of defining the usage distribution were considered, both from the standpoint of how to characterize each software type and how to characterize each user. 34 5.2.1 Software Characterization - Application Ratio As discussed in Chapter 2, the dynamic power that a CPU consumes running a particular software application varies significantly over time as it performs different operations. It is useful to think of software's dynamic power consumption in terms of an application ratio (AR). Software applications, many of which are industry standard benchmarks that corresponded to the 14 software types listed in the survey, were tested for their power consumption. Two measures of power consumption were considered: average power and the peak 5-second rolling average power. The power measurements for each software type were converted to application ratios. The results are shown in Table 5.1. Table 5.1: Software application ratios. Source: Intel Mobile Power Lab Software Type Word processing Spreadsheets Presentation Creation Email Internet browsing Games Audio/MP3 VideoNiewing DVDs Video/Downloading from Internet Video/inputting to Devices Graphics/CAD Web content creation Programming tools Simulations/math models Average App Ratio 0.29 0.29 0.29 0.18 0.22 0.71 0.32 0.27 0.47 0.86 0.59 0.78 0.77 0.69 35 Veak 5-Sec Ave App Ratio 0.78 0.78 0.78 0.18 0.22 0.77 0.37 0.39 0.83 0.88 0.74 0.79 0.77 0.70 5.2.2 User Characterization With information about the types of software applications that each respondent uses the next challenge is to define a way to characterize each user. Two approaches are taken. The first approach followed in the tradition of the worst-case methodology; each user is defined by the most strenuous software type that he/she reported in the survey. A single application ratio can then be associated with each user based on the application ratio definition. The second approach involved calculating the expected value of the application ratio for each user. This is the average value of the application ratios reported used, weighted by the percent time reported for each software type. The expected application ratio calculated for each surveyed user is given by Equation 5.1: E(AR)i= Al*Ti,1 + A2*Ti,2 + ... A14*Ti,14 where Al is Application Ratio for software type I where Ti, 1 is % time user i reports for software type 1 Both methods produce different results depending on the definition of software application ratio. 36 (Eq. 
5.1) 5.3 Usage Model Matrix A two by two matrix can be formed with the two definitions of software application ratio and user characterization, as shown in Figure 5.1. Software Application Ratio Peak 5-Sec Average 0 N Worst Most conservative Case Will it ever happen? Average (- 0) Least Conservative -C Expected o Will it happen most of the time? Figure 5.1: Usage Model Matrix The nomenclature to describe each quadrant is the user characterization (Worst-Case, Expected) followed by the software application ratio (Peak 5-Second Average, Average). Each quadrant of the matrix encompasses a different degree of conservatism, and represents a different time scale or frequency of events when used in the simulator. Each of the quadrants is explored in detail, starting in the upper left and going counterclockwise. 5.3.1 Worst-Case Peak 5-Second Average Usage Model The application ratio distribution that results from the worst-case 5-second average usage model is displayed in Figure 5.2. 37 Worst Case 5-second 0.8 0.7 C 0.6 <0.5 0.4 0.3 9L 0.2 m 0 .1...... 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Application Ratio Figure 5.2: Worst-case 5-second average usage model application ratio distribution. This application ratio distribution is very similar to the single value worst-case assumption. This is because almost all users will at some point use software types that have the potential to consume a high fraction of the processor power. This analysis validates the choice of TDP application ratio for a quality analysis at worst-case conditions. However, this type of distribution offers no insight to how often a recurring event, such as throttling, would occur. This type of input to the quality simulator will only answer one time-based question - "Will it ever happen?" This type of analysis is appropriate for catastrophic unrecoverable events, and may be appropriate for catastrophic recoverable events (e.g. Fmax). This input is also useful to calculate the absolute maximum temperature a CPU will see in use. However , it provides absolutely no information on 38 the frequency of events; would the event happen once a year? once day? once and hour, once a minute? No conclusions can be drawn. Does it make sense to use this as the standard for events where the consequences are transitory and functionality is maintained? For example, a simulated person who used their mobile computer for email, Internet browsing, and word processing would be assigned an application ratio of 0.78 under this model. If that 0.78 application ratio, in combination with the CPU, system, and ambient factors, created a throttling event, then the result of the simulation would record the user as suffering from a throttling system. However, if that user was word processing only 10 percent of the time, and word processing software reaches it peak power infrequently and for short durations, throttling may be imperceptible, if it occurs at all. In such a case, the user would never experience a quality event. For these reasons the worst-case 5-second average usage model is not useful for calculating throttling or other transient events, but it does provide an upper bound on the number of units that could be affected. 5.3.2 Expected Peak 5-Second Average Usage Model The application ratio distribution that results from the expected 5-second average usage model is displayed in Figure 5.3. 
39 Expected 5-Second Average 0.1 .!n 0.08 0.06 0.04 2 0.02 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Application Ratio Figure 5.3: Expected 5-second average usage model application ratio distribution. This is an interesting distribution that uses the peak 5-second average application ratio definition, accounting for the highest powers (and hence temperatures) that each software type can generate. The peak 5-second average application ratio definition also removes any influence of processor idle time in the calculation of the application. At the same time by using the expected user characterization, "credit" is given for time spent in low power applications. Running the simulation with this usage model input should produce a typical distribution of maximum temperatures. Note that this is different than the absolute maximum temperatures, which are calculated using the worst-case 5-second average usage model. The expected 5-second average usage model also provides some insight into the frequency of events. Basically, simulating with this type of usage model answers the 40 question, "Is it expected to ever happen?" This is actually quite different than "Will it ever happen?" A processor could spend minutes throttling over its 1000's of hours of operating lifetime. Did the processor ever throttle? Yes. Would the processor be expected to throttle? No. Let's return to the example person who used his/her mobile computer for email, Internet browsing, and word processing. If this person spent 10 percent of his/her time word processing and split the remaining time equally between email and Internet browsing their expected 5-second average application ratio (from EQ 5.1 and Table 5.1) would be: E(AR) = 0.78*0.10 + 0.18*0.45 + 0.22*0.45 = 0.258 With a 0.258 application ratio it is highly unlikely that the simulator would calculate a throttling event. The interpretation of this result is that the user would not be expected to be affected by throttling. This does not mean the user would never experience throttling. For a different user with a higher expected 5-second average application ratio the simulator may predict throttling. The interpretation of a throttling result is that the CPU would be throttling "sometimes", not all of the time, but more than infrequently. 5.3.3 Expected Average Usage Model The application ratio distribution that results from the expected average usage model is displayed in Figure 5.4. 41 Expected Average 0.2 0.18 0.16 0.14 0.12 - 0.1 0.08 o 0.06 0 0.04 0.02 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Application Ratio Figure 5.4 Expected average usage model application ratio distribution. This distribution includes the effect of processor idle time when running applications, and includes the influence of time spent in low power applications. Inputting this usage model produces the expected average temperature distribution, but no information about peak transient temperatures. Using this distribution will only predict events that happen most of the time. This may be useful for predicting the number of laptops that would never work properly due to excessive rate of throttling. If a simulated CPU is predicted to throttle under this model, the interpretation is that on average it would be throttling to some degree for its active life. Because this model will predict the long-term average temperature distribution, it may be useful in long-term reliability modeling. 
42 5.3.4 Worst-Case Average Usage Model The application ratio distribution that results from the worst-case average usage model is displayed in Figure 5.5. Worst Case Average 0.4 0.35 .I 0.3 0.25 0.2 0.15 0 L 0.1 0.05 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Application Ratio Figure 5.5: Expected average usage model application ratio distribution. This model only accounts for the most strenuous application that each respondent reported using in the survey. It then defines each of those applications by their average application ratio, including processor idle and low-power time. Using this model will predict the number of users that experience events on average when running their worstcase applications. However, it does not provide any information on how often these events will happen because it does not include any information on how often the worstcase applications are used. 43 5.4 Application Ratio Distribution Parameters To utilize an application ratio distribution model in the simulation program it is necessary to define it terms of statistical distribution parameters. The first step was to choose a usage model from the matrix. The expected 5-second average application ratio usage model is chosen because it provides the best balance between the maximum power that each software type can consume and the percentage of time spent in both high and low power applications. It is a good compromise for throttling simulations between the highly conservative worst-case 5-second average model, which will count throttling events that happen so infrequently that a user may never notice, and the expected average model, which only counts events if they are occurring more than half of the time. Because the application ratio (AR) is bounded between zero and one, the data is transformed into the Z variable as shown in Equation. 5.2. Zi = log [E(AR)i/(1 -E(AR)i)] The Z variable is then approximated by a normal, and the mean (/A) (Eq. 5.2) and standard deviation (&) are calculated. The normal parameters are also adjusted by different weighting factors for each user (w) from the survey data. The equations for the mean (fl) and standard deviation (&) are given in Equation 5.3 and 5.4 respectively. 44 $=1 xZ2(w x Z,) ~ w^ x~ 3=~ 5.3) X(Eq. Wi Wi 'X Zi z )2 PEq. (I1 5.4) Multiple weighting factors and combinations are examined. The sample population weighting factors (as described in 5.1), the hours of actual use reported by each user, and the combination of the two are each used to calculate a set of normal distribution parameters. The Z normal parameters are entered directly into the simulation program, which transforms the Z values back into AR as shown in Equation 5.5. AR=1Oz /1Oz +1) (Eq 5.5) The four resulting application ratio distributions (raw, weighted by population factor, by hours, and by population factor*hours) are shown in Figure 5.6. 45 Expected 5-Second Average rpp Ratio j '-- raw fit -weighted 0 0.2 0.4 0.6 0.8 by Hours - weighted by Population Factors - weighted by Population Factors*HOURS 1 Application Ratio Figure 5.6: Expected 5-second average application ratio data and normal functions. Weighting the data by the hours of active use that each user reported makes the distribution more accurately reflect probability that a CPU would be operating at one condition compared with another. Take the example of two users; one operates his/her computer for 10 hours per week at an AR of 0.35; the second operates his/her computer for 40 hours at an AR of 0.70. 
If we were to randomly sample the operating condition of a CPU in this example, 0.70 would be four times more likely. Weighting the raw data by the hours of active use corrects for this affect. It is interesting to note that weighting the data by hours reduces the standard deviation. This suggests that people who spend more time actively using their mobile computer also tend to have an average expected 5-second average application ratio that is closer to the mean of the distribution. 46 As discussed in Section 5.1, weighting the data by the sample population factors accounts for difference in the ratio of sample size to total population for different subgroups. It is interesting to note that weighing the data by these factors does not significantly shift the mean or standard deviation. This suggests that mobile computer users have similar software type usage patterns across the geographies and sub-groups surveyed. The result of the above analysis is that both the hours of active use and the sample population factors are used in developing the normal parameters to be used in the simulation model. 5.5 Ambient Temperature Distributions A model of the mobile computer ambient temperature distribution is also developed from the Mobile Choice Survey data. The base for this mobile ambient temperature model is desktop model that was developed by Corporate Marketing Research. The desktop model focuses on indoor ambient conditions, and concludes conservatively that 40 percent of desktop computers are used in an air-conditioned environment. The ambient temperature estimations for non air-conditioned environments are based on daily maximum summer temperatures. Analysis of the mobile survey data shows that 87 percent of the active use time fell into one of the four indoor categories: Office, Home Office, Home, Hotel. All of these 47 environments conservatively fit into the desktop indoor pattern described above. The remaining 13 percent of the active use time (Airport, Car, Dorm, Classroom, Outdoors, Other) was placed entirely in the non air-conditioned category. The resulting bi-modal ambient temperature distribution for a mobile computer is shown in Figure 5.7. -0.08 -0.06 -0.04 0.02 18 20 22 24 26 28 30 32 34 36 Ambient Temperature (deg C) Figure 5.7: Mobile computer ambient temperature distribution model. This chapter details the development of mobile computer usage models from end-user survey data and software application ratios. A two by two matrix of usage models is developed with different levels of conservatism in each quadrant. A model for the mobile ambient operating temperature is also developed. In the next chapter the usage and ambient temperature models are used in product quality simulations to predict throttling events. 48 Chapter 6 - Simulation Results A Monte Carlo simulation program is used to evaluate the expected operating characteristics of microprocessors in mobile computer systems. Simulation results from both worst-case operating conditions and a statistical usage model are analyzed and compared. The statistical usage model is then used to compare the theoretical operation and performance of a true mobile system and a desktop arbitrage transportable system. Finally, an analysis is performed to evaluate the effect of reducing the throttle set point on a mobile CPU, and how this could reduce the number of quality events in the field. One hundred thousand simulations are used for all cases to tabulate results. 6.1 Worst-Case Operating Conditions vs. 
6.1 Worst-Case Operating Conditions vs. Statistical Usage Model

The first case investigated is a simulation of a 2.2 GHz mobile CPU in a theoretical traditional thin & light mobile system whose thermal solution capability is designed to Intel's recommendations for that CPU. The results obtained with the TDP application ratio user model are compared with those obtained with the expected 5-second average application ratio user model and the mobile ambient temperature distribution.

As shown in Figures 6.1a and 6.1b, the average predicted CPU temperature (Tj) drops by over 20 degrees C when evaluated with the expected 5-second average application ratio. The range of Tj also increases dramatically, by more than a factor of four, and the distribution is better described by a normal distribution, as predicted by the Central Limit Theorem. This is the expected result of convolving an additional normal distribution into the calculation in place of a single worst-case value.

Figure 6.1a & b: Predicted CPU temperature distribution (Tj) in use for a 2.2 GHz mobile CPU in a thin & light system using (a) worst-case operating conditions and (b) the statistical usage model.

The impact of this significant shift in predicted CPU temperature on throttling, power reduction, and performance is summarized in Table 6.1.

Table 6.1: Predicted operating characteristics of 2.2 GHz mobile CPU thin & light systems.

                                                  Worst-case operating    Statistical
                                                  conditions              usage model
  CPUs throttling (90% upper confidence limit)        0.91%                  0.01%
  Maximum power reduction required                      9.1%                   9.7%
  Maximum performance reduction                         5.1%                  16.4%

The 90 percent upper confidence limit on the percentage of CPUs that will experience throttling in the field drops by almost two orders of magnitude. It is interesting to note that, to maintain the CPU temperature limit, the maximum power reduction required is slightly greater when the expected 5-second average application ratio distribution is used in the simulation. This is because, with the application ratio distribution, it is possible to simulate an application ratio higher than the TDP application ratio. The higher application ratio results in higher power consumption, which then requires a greater power reduction to maintain the temperature limit under certain conditions. Also of interest is that this slightly greater maximum power reduction results in more than three times the performance loss. This is due to the highly non-linear relationship between power reduction and performance loss.
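Table 6.1 quotes 90 percent upper confidence limits on the fraction of CPUs that throttle. The thesis does not state which interval method its simulator uses; the sketch below shows one standard choice, a one-sided Clopper-Pearson (exact binomial) upper bound computed from a Monte Carlo event count, with illustrative counts only.

    from scipy.stats import beta

    def binomial_upper_limit(k, n, confidence=0.90):
        """One-sided Clopper-Pearson upper bound on a binomial proportion, given k events in n trials."""
        if k >= n:
            return 1.0
        return beta.ppf(confidence, k + 1, n - k)

    if __name__ == "__main__":
        # Illustrative counts only: a handful of throttling units observed among 100,000 simulated systems.
        for k in (0, 1, 10, 100):
            print(f"{k:4d} events / 100,000 -> 90% upper limit = {binomial_upper_limit(k, 100_000):.5%}")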
It is possible to examine in greater detail the impact on the CPUs that are predicted to throttle. Figure 6.2 shows the fraction of power that must be reduced to maintain the CPU temperature limit under the TDP application ratio for the subset of CPUs that throttle in simulation. It is important to remember that this throttling subset represents only 0.91% of the total population.

Figure 6.2a & b: Predicted power reductions for throttling 2.2 GHz mobile CPUs in thin & light systems using worst-case operating conditions: (a) histogram, (b) cumulative distribution function.

As can be seen from Figure 6.2b, half of the CPUs that are simulated to throttle under the TDP usage model require less than a two percent power reduction to maintain the temperature limit. Figures 6.3a and 6.3b show the CPU performance reductions that result from the power reductions of Figures 6.2a and 6.2b.

Figure 6.3a & b: Predicted performance reductions for throttling 2.2 GHz mobile CPUs in thin & light systems using worst-case operating conditions: (a) histogram, (b) cumulative distribution function.

Due to the inexact nature of the performance correlation there is a slight offset at zero. What is important to note is that the majority of the CPUs predicted to throttle under the TDP application ratio assumption experience less than a one percent performance impact. In fact, more than 90 percent of the throttling CPUs would suffer less than a two percent performance impact. This validates Intel's current mobile thermal design process. Intel starts with the assumption of the TDP application ratio user model and a 35 degree C ambient temperature. Its success criteria are that throttling will occur only in rare circumstances and that the performance impact should be practically undetectable. The simulation confirms that, under those assumptions, Intel meets its goals.

The picture looks very different with the expected 5-second application ratio distribution. Only one unit out of the 100,000 simulated with the application ratio distribution indicated throttling, and only in the extremely rare instance of a high ambient temperature (Ta > 33 C), an extremely high application ratio (AR > 0.85), and a high-power CPU.

6.2 Mobile System vs. Desktop Arbitrage System

Simulations were run to compare the performance and occurrences of throttling between two theoretical systems: a thin & light system with a 2.2 GHz mobile processor, as in Section 6.1, and a transportable system with a 3.06 GHz desktop processor. The desktop processor can consume up to 146 percent more power than the mobile processor, and that power must be dissipated. The larger physical size of the transportable system allows for a thermal solution with a greater cooling capacity. For these simulations the thermal solution capability of the transportable system is 72 percent greater than that of the thin & light system, which is typical. However, the transportable system with the desktop processor is still 31 percent thermally under-designed relative to Intel's current design recommendations. OEMs may design and build thermally under-designed systems because they realize Intel is using a worst-case design methodology, and they may believe they can relax the thermal design conditions (i.e., the ambient temperature and processor power) without suffering any consequences.

The first result of the simulation that needs to be mentioned is that with the TDP application ratio user model and a 35 degree C ambient temperature, Intel's current design assumptions, the simulation predicts that all of the desktop processor transportable systems will experience severe throttling. They would all require a 10 to 50 percent power reduction to maintain the CPU temperature limit. If such extreme conditions were realistic, it is safe to say that many of the systems would fail to operate at all.
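As a rough consistency check on the figures quoted above, and assuming that the thin & light thermal solution is sized exactly to its own CPU's requirement and that the required cooling capability scales linearly with dissipated power at a fixed temperature difference, the quoted numbers fit together:

    \frac{\text{available capability}}{\text{required capability}} = \frac{1.72}{1 + 1.46} = \frac{1.72}{2.46} \approx 0.70, \qquad \text{shortfall} \approx 1 - 0.70 \approx 30\%,

which is close to the 31 percent under-design stated above.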
Running the simulation with the expected 5-second average application ratio and the ambient temperature distribution usage model produces results that are more indicative of real-world use. As shown in Figures 6.4a and 6.4b, the average predicted CPU temperature (Tj) is approximately 15 degrees C higher for the "transportable" desktop CPU system than for the "thin & light" mobile CPU system.

Figure 6.4a & b: Predicted CPU temperature distribution (Tj) in use for (a) a 2.2 GHz mobile CPU in a thin & light system and (b) a 3.06 GHz desktop CPU in a transportable system.

The effect of the thermal monitor can clearly be seen in Figure 6.4b. The thermal monitor limits the right-hand side of the temperature distribution to approximately 100 degrees C. The impacts of these predicted CPU temperature distributions are summarized in Table 6.2.

Table 6.2: Operating characteristics predicted with the statistical usage model for theoretical mobile and transportable computer systems.

                                                  2.2 GHz mobile CPU      3.06 GHz desktop CPU
                                                  thin & light system     transportable system
  CPUs throttling (90% upper confidence limit)        0.01%                  19.6%
  Maximum power reduction required                      9.7%                  49.1%
  Maximum performance reduction                        16.4%                  Some systems would
                                                                              not operate properly

The 90 percent upper confidence limit shows that nearly 20 percent of the desktop processor transportable systems would be expected to experience some degree of throttling. This also means that 80 percent of the systems, if distributed evenly among standard mobile computer users, would not be expected to experience any significant throttling. However, transportable system users may have a different profile than standard mobile computer users. The distributed usage model still predicts that some of the desktop processor transportable systems are thermally constrained to the point that not only does processor performance suffer, but the systems may fail to operate entirely.

We can investigate in more detail the impact on the 19.6 percent of the desktop CPUs that are predicted to throttle. Figures 6.5a & b show the fraction of power reduction required to maintain the temperature limit for the subset of CPUs that throttle.

Figure 6.5a & b: Power reduction predictions under the statistical usage model for throttling 3.06 GHz desktop CPUs in transportable systems: (a) histogram, (b) cumulative distribution function.

From Figure 6.5b, 50 percent of the throttling desktop CPUs require a 10 percent power reduction or less. These systems would continue to function, although at varying fractions of full performance when compared with an un-throttled equivalent system. Analysis of the performance of the entire throttling sub-population is shown in Figures 6.6a & b.
Figure 6.6a & b: Performance reduction predictions under the statistical usage model for throttling 3.06 GHz desktop CPUs in transportable systems: (a) histogram, (b) cumulative distribution function.

Fifty percent of the throttling desktop units, which represent approximately 10 percent of the total population, suffer less than a five percent performance reduction. Beyond that, however, the performance of the remaining sub-population falls off dramatically. This is due to the highly non-linear nature of performance loss. The performance loss correlation is capped at 40 percent, which results in a non-uniform distribution for highly thermally stressed systems, as seen in Figures 6.6a & b. The reality is that at high levels of thermal stress, throttling, and power reduction (well above 10 percent), the performance of the processor is no longer the critical issue; rather, the function of the system becomes the critical issue. There are documented cases [12] of desktop processors used in mobile systems that resulted not only in reduced performance but in lack of function.

With this data we can now make a direct comparison between the performance of a 2.2 GHz mobile CPU thin & light system and a 3.06 GHz desktop CPU transportable system. Of the mobile systems, 99.99 percent would operate at their full rated performance; only under very rare and extreme use conditions would they experience any significant throttling. In comparison, 80 percent of the desktop transportable systems would perform at their full performance level. However, the full performance of the desktop transportable system is greater than that of the mobile system due to the desktop CPU's higher speed rating.

6.3 Recommended Change to Throttle Set-Point

Another interesting analysis that can be performed with this methodology is the impact of the throttle temperature on the number of quality events that will be experienced in the field. Returning to the theoretical 2.2 GHz mobile CPU thin & light systems, multiple simulations were run in which the only variable changed was the temperature at which the CPU begins to throttle. The system parameters were held constant to reflect no change in the thermal solution. By definition, the simulated systems with the lower throttle set point are thermally under-designed because there is less of a temperature difference to drive the heat out of the system (see Eq. 3.3).

There are two types of events of concern in this analysis: frequency maximum (Fmax) events and throttling events. Fmax events occur when, due to elevated temperature, the processor cannot run at its prescribed frequency; Fmax events increase with increasing CPU temperature. Throttling events occur when a CPU reaches the throttle set point temperature and a power reduction is required to avoid exceeding that temperature. For a given CPU temperature distribution, reducing the throttle set point increases the number of throttling events.
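To make this trade-off concrete, the sketch below reuses the illustrative usage and thermal model from the Section 6.1 sketch and sweeps the throttle set point, counting both event types. The per-unit Fmax-limiting temperature is modeled here as a random part-to-part margin; all parameter values are invented for illustration and are not the thesis's.

    import numpy as np

    rng = np.random.default_rng(2)
    N = 100_000

    # Illustrative model only (same form as the earlier sketch); none of these values come from the thesis.
    TDP_W, PSI_JA = 35.0, 2.4
    z = rng.normal(-0.35, 0.30, N)
    ar = 10.0 ** z / (10.0 ** z + 1.0)                     # application ratio from the logit-normal model
    ta = np.where(rng.random(N) < 0.87,
                  rng.normal(23.0, 1.5, N), rng.normal(31.0, 2.0, N))
    tj_unthrottled = ta + ar * TDP_W * PSI_JA              # temperature if no throttling occurred

    # Assumed part-to-part variation in the temperature above which a unit cannot hold its rated frequency.
    t_fmax = rng.normal(98.0, 3.0, N)

    for set_point in range(88, 102, 2):
        throttle_events = np.sum(tj_unthrottled > set_point)    # units that needed a power reduction
        tj_capped = np.minimum(tj_unthrottled, set_point)       # thermal monitor caps the temperature
        fmax_events = np.sum(tj_capped > t_fmax)                # units too hot to hold rated frequency
        print(f"set point {set_point} C: throttle={throttle_events}, fmax={fmax_events}, "
              f"total={throttle_events + fmax_events}")

Lowering the set point in this sketch trades fewer Fmax events for more throttling events, which is the qualitative behavior the simulation results below exhibit.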
Figure 6.7 shows the simulation results for how the two event types vary as a function of the throttle set point when evaluated with the standard worst-case TDP application ratio and a 35 °C ambient temperature.

Figure 6.7: Predicted quality events (Fmax, throttle, and total) as a function of throttle set point for a 2.2 GHz mobile CPU thin & light system under worst-case operating conditions.

As can be seen in Figure 6.7, in this worst-case scenario throttling events dominate the Fmax events. To minimize the total number of quality events the throttle set point should be set at 100 °C (set points above 100 °C were not considered).

Using a statistical usage model to provide a more realistic view of the number of throttling events produces a very different picture. Figure 6.8 presents the simulation results when the expected 5-second average application ratio distribution and the ambient temperature distribution are used to calculate the CPU temperatures and hence the occurrences of throttling.

Figure 6.8: Predicted quality events (Fmax, throttle, and total) as a function of throttle set point for a 2.2 GHz mobile CPU thin & light system under the statistical usage model.

As shown in Figure 6.8, the number of predicted throttling events no longer dominates the Fmax events when evaluated with the distributed usage model. With the throttle set point at 100 °C, the expected number of throttling events is a factor of five lower than predicted with the worst-case use model. And, as the throttle set point is reduced from 100 °C, the number of Fmax events declines more rapidly than the number of throttling events increases. Therefore, to minimize the total number of quality events the throttle set point should be reduced to 95 °C. This would reduce the total number of quality events by 37 percent and the number of catastrophic events by 57 percent.

This is counter-intuitive to the traditional thermal design mindset. Recall that no changes are made to the definition of the thermal solution for this theoretical system. The thermal solution is appropriately sized for a CPU with a maximum temperature of 100 °C. As the CPU temperature limit is reduced, thermal design principles state that a larger thermal solution is required; otherwise the system is considered thermally under-designed. We know that thermally under-designed systems generate a greater number of quality events for a given throttle set point. However, by adjusting the throttle set point a trade-off can be made between Fmax and throttling quality events. As shown in Figure 6.8, the number of events can go down even when the system is thermally under-designed by virtue of a lower throttle set point. Of course, lowering the throttle set point and specifying a larger thermal solution could achieve even lower event rates, but that would place additional burden on system designers and OEMs, and it runs counter to the trend of making systems smaller and lighter.

Perhaps the most important aspect of these results is that the thermal design process need no longer be one-dimensional. The size and capacity of the thermal solution is not the only variable Intel can manipulate in its thermal design and specification process. By using the throttle set point, the thermal solution size, a realistic representation and understanding of throttling and Fmax events, and an understanding of OEM thermal solution capabilities, Intel can better optimize the system.

It is also important to understand the impact on the CPUs of the proposed change to a 95 °C throttle set point. Figures 6.9 and 6.10 show detailed information on the simulated CPU temperature distributions for a 2.2 GHz mobile CPU thin & light system with a throttle set point of 100 °C and 95 °C, respectively.
Figure 6.9: Predicted CPU temperature distribution (Tj) in use for a 2.2 GHz mobile CPU with the throttle set point at 100 °C in a thin & light system (mean 70.83 °C, standard deviation 9.08 °C, maximum 98.7 °C).

Figure 6.10: Predicted CPU temperature distribution (Tj) in use for a 2.2 GHz mobile CPU with the throttle set point at 95 °C in a thin & light system (mean 70.83 °C, standard deviation 9.07 °C, maximum 97.2 °C).

As can be seen in a comparison of Figures 6.9 and 6.10, reducing the throttle set point from 100 °C to 95 °C has a negligible impact on the overall temperature distribution. This is expected, as the throttle set point only affects the far right tail of the distribution. Despite the minor effect on the overall temperature distribution, there is a noteworthy effect on the number of units that throttle and on the performance impact of that throttling. A comparison of the summary data is presented in Table 6.3.

Table 6.3: Operating characteristics predicted with the statistical usage model for 2.2 GHz mobile CPU thin & light systems.

                                                  Throttle set point      Throttle set point
                                                  100 °C                  95 °C
  CPUs throttling (90% upper confidence limit)        0.01%                  0.15%
  Maximum power reduction required                      9.7%                  19.9%
  Maximum performance reduction                        16.4%                  43.4%

The 90 percent upper confidence limit on the number of units throttling has increased by a factor of 15, but it is still less than 1 in 650. The maximum power reduction, and the associated performance loss, needed to keep all of the CPU temperatures within specification have also more than doubled. Figures 6.11a & b show the distribution of the power reduction required to keep the throttling CPUs at the 95 °C temperature limit.

Figure 6.11a & b: Predicted power reductions for throttling 2.2 GHz mobile CPUs with the throttle set point at 95 °C in thin & light systems using the statistical usage model: (a) histogram, (b) cumulative distribution function.

Fifty percent of the CPUs that throttle need less than a five percent power reduction, and more than 90 percent of the CPUs that throttle need less than a 10 percent power reduction to stay within the maximum temperature specification. Again, the one outlier at a 20 percent power reduction is the result of the extremely rare instance of a high ambient temperature (Ta > 33 C), an extremely high application ratio (AR > 0.85), and a high-power CPU. Figures 6.12a & b show the results of translating the power reductions into performance reduction metrics. Over 90 percent of the throttling CPUs would suffer less than a five percent performance loss.
The remaining few percent of the throttling population may suffer significant performance loss, or even experience some functional problems; however, these would represent at most 1 in 100,000 of the total population.

Figure 6.12a & b: Predicted performance reductions for throttling 2.2 GHz mobile CPUs with the throttle set point at 95 °C in thin & light systems using the statistical usage model: (a) histogram, (b) cumulative distribution function.

This chapter reviews the results of the three simulation comparisons: worst-case operating conditions vs. the statistical usage model, a mobile system vs. a desktop arbitrage system, and a 100 °C vs. a 95 °C throttle set point. The next chapter concludes with recommendations and future uses for the methodology developed.

Chapter 7 - Recommendations and Future Use

The results of this work point towards three topics for future consideration. The first focuses on validating the process. The second involves new ways of segmenting the market and opportunities for new products. The third focuses on strategic relationships within the industry.

7.1 Validation

One of the barriers to adopting this usage-model-based methodology is that some people, engineers in particular, do not believe that users can accurately report on a survey the percentage of time that they use each software type. Admittedly the data is not precise, as most respondents parsed their time into five or ten percent increments, and the accuracy of any one user is questionable. However, in the opinion of the author the data accurately represents the mobile user population as a whole. We could not determine any reason that there would be a systematic bias in the data.

To overcome this barrier, a validation of the survey results is desirable. One method is to place software on the laptops of a representative group of mobile users that records the applications being used. The software can periodically sample and record the application data from the Windows Task Manager shown in Figure 7.1.

Figure 7.1: Windows Task Manager showing applications being used.

The aggregate data sampled from the Windows Task Manager would then be compared to the aggregate data from the end-user survey; the distribution of software use should be similar. The group of users having their Windows Task Manager sampled could also be given a use survey to fill out, so that a direct comparison of reported use pattern to measured use pattern could be performed on a per-user basis. In many ways this validation approach is the simplest because it is independent of the computer system hardware. It focuses only on the software applications.
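As an illustration of what such a logging agent might look like, the sketch below periodically records the names of running processes. It uses the cross-platform psutil package rather than reading the Windows Task Manager directly, and the sampling interval, run length, and output file name are arbitrary choices. Mapping the recorded process names onto the survey's software categories would be an additional post-processing step.

    import csv
    import time
    import psutil

    SAMPLE_INTERVAL_S = 60           # arbitrary sampling period
    LOG_FILE = "app_usage_log.csv"   # hypothetical output file

    def sample_running_applications():
        """Return the sorted names of all processes currently running."""
        names = set()
        for proc in psutil.process_iter(attrs=["name"]):
            try:
                names.add(proc.info["name"])
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue
        return sorted(names)

    if __name__ == "__main__":
        with open(LOG_FILE, "a", newline="") as f:
            writer = csv.writer(f)
            for _ in range(5):       # short demonstration run; a real agent would loop indefinitely
                writer.writerow([time.time()] + sample_running_applications())
                f.flush()
                time.sleep(SAMPLE_INTERVAL_S)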
It would be tempting to sample the CPU Usage figure from the Windows Task Manager and use it as a proxy for the software application ratio. This would skip the step of measuring and assigning an application ratio to each software type. Unfortunately, the CPU Usage reported in the Windows Task Manager does not correlate well with the power consumed by the CPU [8].

The most important output of this methodology and simulation is the CPU temperature distribution profile. This is the variable that drives throttling and performance impact. Ideally, one would like to validate the temperature distribution output from the simulation against real-world data. This is possible: software can be placed on a computer that samples and records the CPU temperature. However, to make a valid comparison one would need an appropriately sized, representative group of mobile users, all of whom had the same computer system and processor. The logistics of coordinating such a study are daunting.

7.2 Market Segments and Product Positioning

Section 6.2 shows that a 3.06 GHz desktop processor in a transportable form factor may not perform to its full potential for all users. In fact, because these systems are thermally under-designed and cannot properly cool the microprocessor, as many as 10 percent of them may fail to function properly for their end users. However, there is significant demand for higher-power processors in the transportable segment, as evidenced by the increase in desktop arbitrage. Using the simulation and methodology presented here as a path-finding tool, Intel could determine how powerful a processor can operate adequately in a transportable form factor. Intel could then expand its mobile product offerings to include processors up to that power level that also have other mobile-specific features enabled.

Section 6.3 shows that reducing the throttle set point to 95 °C on a traditional mobile processor can reduce the number of total quality events that the general population experiences by 37 percent and the number of catastrophic events by 57 percent. However, this would leave a small group, approximately 0.1 percent of the population, of high-power users who would be stressing their processors to the point that they would suffer significant performance loss. One solution is to allow those users to identify themselves as high-power users and offer them a processor and/or system that meets their needs at a premium price. From a technical perspective there are several ways to satisfy their demands. Intel could select processors during the screening at the end of the production line that can operate reliably at higher temperatures, set the throttle set point on those processors at 100 °C, and sell them into the premium segment. Or Intel could select the processors that have the lowest leakage current, and therefore run at the lowest power and lowest temperatures, and sell them into the premium segment. A third option is for Intel to enter into closer partnerships with one or more of the OEMs and have them offer a premium system with a more robust thermal solution for the power-user segment. The importance of these partnerships is discussed in the next section.

7.3 Industry Dynamics and Strategic Partnerships

Much of Intel's success in the computer industry is attributed to the modular product and supply chain architecture of the personal computer [13]. The relationship between system thermal performance and CPU performance analyzed in this thesis points to a product architecture that is becoming more integral. In particular, as processing power increases and mobile systems become smaller, overall system design becomes more important and needs to be better integrated, as predicted by Whitney [14]. This could be problematic for Intel in the future. Intel has built the "Intel Inside®" brand on the premise that the processor is the most important feature of a PC, more so than even the OEM brand. However, if overall system design begins to be the constraint on performance, consumers may place less emphasis on the processor brand and more on the OEM or system designer. This would be a significant shift in the power of the industry players and a major loss for Intel.
Intel should use the resource of its brand strength to leverage itself into other areas of the mobile PC architecture, particularly areas where integration is becoming more important. A recent example is the launch of Intel® Centrino™ mobile technology [15]. Intel® Centrino™ mobile technology is a package consisting of a CPU, chipset, and wireless network connection. We suggest that Intel also form closer partnerships with the OEMs and system designers. As CPU performance becomes more dependent on thermal system performance, it is in Intel's best interest to do all that it can to help its customers design successful systems. Closer relationships with the OEMs would allow Intel more input to the design and verification of thermal solutions, and perhaps the next incarnation of "Intel Inside®" will include not only a CPU, chipset, and network connection, but a promise of thermal system and processor performance as well. This may be crucial for Intel's survival if the double helix industry/product structure, as described by Fine in Clockspeed, continues to head towards more integration [13].

Appendix A: Selected Questions and Sample Answers from End-User Mobile Choice Survey © 2002 Intel Corporation

8. Thinking of a typical week, please allocate your notebook usage across each of the following environments, so that the total equals 100%. Please enter '0' for any option not applicable to you.
   1. Office (not home office)  ____ %
   2. Office setting in home  ____ %
   3. Home (other than desk/office setting)  ____ %
   4. Airport  ____ %
   5. Car, train, bus, or airplane  ____ %
   6. Hotel  ____ %
   7. Dorm room  ____ %
   8. Classroom or library  ____ %
   9. Outdoors  ____ %
   10. Other location (please specify)  ____ %

12. Approximately how many hours each week is your notebook computer turned on? Please provide your best estimate and record whole hours only.

13. When it is turned on, approximately how many hours each week do you run your notebook computer from its batteries, rather than plugging it into a wall outlet? Please record whole hours only.

14. Approximately how many hours each week are you actively using your notebook computer? Please provide your best estimate and record whole hours only.

21. Of the time that you, personally, spend using your laptop in an average week, what percent of the time do you spend on each of the following activities, for either business or personal use? Please allocate your time spent on each activity so that the total adds up to 100%. Please enter '0' for any activity that does not apply to you.
   1. Word processing (e.g. Microsoft Word)  ____ %
   2. Spreadsheets (e.g. Excel)  ____ %
   3. Presentation creation (e.g. PowerPoint)  ____ %
   4. Email  ____ %
   5. Internet browsing (excluding email)  ____ %
   6. Games  ____ %
   7. Audio/MP3  ____ %
   8. Video - viewing DVDs  ____ %
   9. Video - downloading video from the Internet  ____ %
   10. Video - inputting video files to other devices, e.g. camcorders or digital cameras  ____ %
   11. Graphics/CAD  ____ %
   12. Web content creation  ____ %
   13. Programming tools  ____ %
   14. Simulations/math models  ____ %

Sample answers (per-respondent tables of notebook usage time by environment, hours on per week, hours on battery per week, hours of active use per week, and percent of time in each software application category) are not reproduced here.
Bibliography

[1] Building the Power-Efficient PC: A Developer's Guide to ACPI Power Management. Hillsboro, OR: Intel Press, 2001.
[2] Pabst, Thomas and Volkel, Frank. "Hot Spot: How Modern Processors Cope with Heat Emergencies." Tom's Hardware Guide, September 17, 2001. Accessed at http://www6.tomshardware.com/cpu/20010917/index.html
[3] Rajsuman, Rochit. Digital Hardware Testing: Transistor-Level Fault Modeling and Testing. Boston: Artech House, 1992.
[4] Reliability Challenges in a Large Microprocessor Below Quarter-Micron Technology. Cambridge, MA: MIT Microsystems Technology Laboratories, 1998.
[5] Design of High-Performance Microprocessor Circuits. New York: IEEE Press, 2001.
[6] Burd, Thomas D. Energy Efficient Microprocessor Design. Boston: Kluwer Academic Publishers, 2002.
[7] Phanse, Vivek. "Enabling Technology for Mobile Computing." MS thesis, MIT, 2000.
[8] Intel Corporation, Jim Hermerding.
[9] Pukite, Jan. Modeling for Reliability Analysis: Markov Modeling for Reliability, Maintainability, Safety, and Supportability Analyses of Complex Computer Systems. New York: IEEE Press, 1998.
[10] An Overview of MobileMark 2002. Business Applications Performance Corporation. Accessed at http://www.bapco.com/MobileMark2002WhitePaper.pdf
[11] An Overview of SYSmark 2002. Business Applications Performance Corporation. Accessed at http://www.bapco.com/SYSmark2002Methodology.pdf
[12] Los Angeles Superior Court, Case # BC278163, class action v. Toshiba America Incorporated, July 23, 2002.
[13] Fine, Charles. Clockspeed. Reading, MA: Perseus Books, 1998.
[14] Whitney, Daniel. "Why Mechanical Design Cannot Be Like VLSI Design." April 4, 1996. Accessed at http://web.mit.edu/ctpid/www/Whitney/papers.html
[15] http://www.intel.com/products/mobiletechnology/index.htm?iid=sr+centrino&