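Computer Architecture: Classical Theory (1)
School of Computer Science, Shanghai University
Xu Weimin (徐炜民)
4/13/2015

Contents
Historical foundations; von Neumann; Definitions; Computer classes; Computer families; Simulation and emulation; Amdahl's Law; Hierarchical structure; Modern definition of organization

Boolean Algebra
In 1847 and 1854, the English mathematician George Boole published two important works, The Mathematical Analysis of Logic and An Investigation of the Laws of Thought, which founded the algebra of logic. This algebraic system is binary, and it is the mathematical and logical foundation of the modern electronic digital computer.

Shannon and Computer Switching Circuits
In 1938, Claude Shannon, the American scientist who founded information theory, published "A Symbolic Analysis of Relay and Switching Circuits". It was the first work to show how Boolean algebra can be applied to logic circuits, and it laid the theoretical foundation for the switching circuits of modern electronic digital computers.

Atanasoff's Three Principles for Computers
In 1939, Atanasoff proposed three principles for computers: compute in binary; use electronics to implement control and arithmetic; and use an architecture that separates the computing function from the storage function. Also in 1939, Atanasoff designed and began building a prototype electronic digital computer, the "ABC", but it was never completed.
Atanasoff's design for an electronic computer inspired Mauchly of the ENIAC development team and directly influenced the birth of ENIAC. In 1973 a U.S. court ruled the ENIAC patent invalid, recognizing Atanasoff's priority as the first person to propose an electronic computer design.

Turing Machine
The mathematical model of the modern general-purpose digital computer
In 1936, the 24-year-old English mathematician Alan Turing published his famous paper "On Computable Numbers, with an Application to the Entscheidungsproblem", in which he proposed an "ideal computer", later called the "Turing machine". Turing proved mathematically that a "universal Turing machine" is possible in principle. This gave the concept of computability a rigorous mathematical definition; the Turing machine became the mathematical model of the modern general-purpose digital computer and showed that such a computer could be built.
In another famous paper, "Computing Machinery and Intelligence" (1950), which asks whether machines can think, Turing explored artificial intelligence and devised the famous "Turing test". Turing died young in 1954, aged only 41.

Wiener's Five Design Principles for Modern Computers
In 1940, the American scientist Norbert Wiener set out five design principles for modern computers: digital rather than analog; built from electronic components, with as few mechanical parts as possible; binary rather than decimal; computation tables stored internally; data stored internally.
Wiener completed Cybernetics in 1948. This not only made him the founder of cybernetics but also had a profound influence on the later development of computers and on research in artificial intelligence.

Father of the Electronic Computer
The title "father of the electronic computer" went to the mathematician John von Neumann rather than to the two people who actually developed ENIAC (Eckert and Mauchly), because von Neumann proposed the architecture of the modern computer.

A Short Biography of von Neumann-1
Von Neumann was one of the greatest scientists of the 20th century. He was born in 1903 in Budapest, the capital of Hungary. At 6 he could divide eight-digit numbers in his head, at 8 he had learned calculus, and at 12 he understood function theory. Through diligent study he published his first mathematics paper at 17, soon mastered seven languages, and made breakthroughs in the newest branches of mathematics, such as set theory and functional analysis. At 22 he graduated in chemistry from the Swiss Federal Institute of Technology (ETH) in Zurich, and a year later he received a doctorate in mathematics from the University of Budapest. He then turned to physics, building mathematical models for quantum mechanics, which also gave him a prominent place in theoretical physics.

A Short Biography of von Neumann-2
In 1929, Oswald Veblen, a leading American mathematician, invited the 26-year-old lecturer at the University of Berlin to teach in the United States, and von Neumann settled there. In 1933, he and Einstein were among the first permanent professors appointed to the Institute for Advanced Study in Princeton. Historians of mathematics maintain that von Neumann was one of the greatest mathematicians of the 20th century: he did pioneering work in ergodic theory, the theory of topological groups, and other areas, and a class of operator algebras is even named "von Neumann algebras". Physicists point out that his Mathematical Foundations of Quantum Mechanics, written in the 1930s, has proved extremely valuable to the development of atomic physics, while economists stress that his system of economic growth models, and especially Theory of Games and Economic Behavior, published in the 1940s, made him a landmark figure in economics and decision science. On 8 February 1957, von Neumann died of bone cancer at Walter Reed Hospital, aged only 53. His great contributions to computer science will never lose their brilliance.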
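Goldstine Asks for Advice
In the summer of 1944, Herman Goldstine was waiting at the Aberdeen railway station for a train to Philadelphia when he happened to meet the world-famous mathematician Professor John von Neumann. Goldstine seized the chance to consult the great mathematician, and von Neumann, kind and approachable, answered his questions patiently. As he listened, von Neumann keenly sensed something unusual behind these mathematical questions. He began questioning Goldstine in turn, so closely that the young man felt he was "going through a second doctoral defense". In the end, Goldstine told him openly about the electronic computer project at the Moore School.

Ideas Grow out of Development
Von Neumann was deeply concerned about the computing problems of the Aberdeen Proving Ground, and he wanted to see the development of ENIAC at the Moore School. From then on he became the de facto consultant to the Moore School team and exchanged ideas with its members frequently. The young engineers put forward all kinds of ideas; von Neumann drew on his broad knowledge to deepen the discussions and gradually shaped the system design concept of the electronic computer.

Discovering the Problem
Even before ENIAC went into operation, von Neumann saw its fatal flaw: the program was separated from the computation. Program instructions were held in external circuits of the machine, so to solve a given problem, hundreds of wires first had to be connected by hand; dozens of people worked for several days before the machine could compute for a few minutes. Von Neumann decided to draft a new design report that would transform the electronic computer completely. He named the new machine the "Electronic Discrete Variable Automatic Computer", abbreviated EDVAC.

The Famous "101-Page Report"
In June 1945, drawing on his work with Goldstine, Burks, and others, von Neumann issued a 101-page report, known in computer history as the famous "101-page report". It is still regarded as a milestone document in the development of modern computer science. The report clearly specified the five main components of a computer and replaced decimal arithmetic with binary. The revolutionary significance of the EDVAC design lay in the "stored program", which lets the computer execute instructions automatically, one after another. Machines with this "stored-program" architecture later came to be called "von Neumann machines".

Characteristics of the von Neumann Architecture
• A single processing unit performs computation, storage, and communication functions;
• Fixed-length storage cells organized linearly (addresses);
• Storage cells are directly addressable (addresses);
• A low-level machine language is used, whose instructions perform simple operations specified by basic opcodes;
• Computation is under centralized, sequential control (stored program);
• The concepts of "address" and "stored program" were proposed for the first time.

Definition of Computer Architecture
Amdahl proposed that computer architecture is the set of attributes of a computer as seen by the programmer, that is, its conceptual structure and functional behavior. This is in effect the external characteristics of a computer system.
From the viewpoint of the hierarchical structure of a computer system, programmers working at different levels clearly see different attributes of the computer. "Architecture" therefore refers to the definition of the interfaces between the levels of a computer system, together with the allocation of functions above and below each interface.
Example: level M2 in Figure 1-8, the machine-language-level computer. Above its interface lie all software functions; below it lie all hardware and firmware functions.

Computer Organization-1
Computer organization is the logical implementation of a computer architecture. It includes the composition and logic design of the data paths and control signals within the machine level, and it focuses on the timing and control mechanisms within the machine level, the functions of the various components, and how they are interconnected.

Computer Organization-2
Computer organization also covers: the data path width; special-purpose units provided according to speed, cost, and usage, for example whether to include a multiplier, divider, floating-point coprocessor, or I/O processor; component sharing and parallel execution; the choice of organizational techniques such as the control unit structure (hardwired logic, PLA, or microprogrammed), single processor or multiprocessor, instruction prefetching, and prediction techniques; reliability techniques; and the choice of chip integration density and speed.

Computer Implementation
Computer implementation is the physical realization of a computer organization. It includes the physical structure of components such as the processor and main memory; the integration density and speed of chips; the partitioning and interconnection of chips, modules, plug-in boards, and backplanes; the design of special-purpose chips; micro-assembly technology; bus drivers; and power supply, ventilation and cooling, and system assembly technology. It focuses on chip technology and assembly technology.

Relationship Among the Three
Computer architecture, organization, and implementation are three different concepts. Architecture is the software/hardware interface of a computer system; organization is the logical implementation of the architecture; implementation is the physical realization of the organization. Each has its own content, but they are closely related.
For example, deciding the functions of the instruction set belongs to architecture; carrying out the instructions, that is, concrete operations such as instruction fetch, operand fetch, execution, and result write-back together with their timing, belongs to organization; and the specific circuits, devices, and assembly techniques that realize these instructions belong to implementation.

Computer Classes and Design Philosophies
The development of computer classes has followed three different design philosophies:
(1) Obtain the best possible performance at a reasonable price within a given class, gradually moving toward higher-end machines; this is called best price-performance design.
(2) Keep adequate, usable performance while striving for the lowest price; this is called lowest-price design, and it often results in a new, lower class of computers splitting off below.
(3) Take the highest performance as the main goal, regardless of added cost; this is called highest-performance design, and it produces the highest class of computers of its time.

Computer Families
First design a system architecture (the machine attributes), then design the system software for that architecture, study the various ways of implementing it with the available devices and hardware technology, and offer machines of different speeds and configurations to meet different speed and price requirements. (A computer family must guarantee that the machine attributes seen by the user are identical.)
Examples: IBM 360, IBM AS/400

IBM 360 (1964)
The models in the family range from small to large and from less to more powerful: six models (20, 30, 40, 50, 65, and 75), later extended with models 25, 85, 91, and 195. They are compatible with one another.

Advantages of Computer Families
1. Program compatibility is achieved on the basis of shared system software;
2. A unified data structure and instruction set make it easy to build multicomputer systems and networks;
3. Standard bus protocols make connectors and expansion cards compatible, which facilitates OEM (Original Equipment Manufacturer) production;
4. They broaden the application areas of computers and let users choose the most suitable machine among the many models in the same family;
5. They make machines easier to use and maintain and simplify staff training;
6. They make it easier to upgrade to new generations of computers;
7. They raise productivity, increase output, lower costs, and promote the development of computers.

Simulation and Emulation-1
A computer family supports program portability because its members share the same system architecture. If programs must be portable among machines with different system architectures, one architecture has to be implemented on top of another, that is, the attributes of another machine must be realized.
Emulation interprets with microprograms, and its interpreter resides in the microprogram memory; simulation interprets with machine-language programs, and its interpreter resides in main memory.

Simulation and Emulation-2
Simulation
B: virtual machine
A: host machine
Each machine instruction of B is interpreted and executed by a machine-language program segment on A: this is simulation.

Simulation and Emulation-3
Emulation
B: target machine
A: host machine
Each machine instruction of B is interpreted and executed by a microprogram segment on A: this is emulation.

Figure: level structure of virtual machine B on host machine A. B: M5 high-level language (applications above M4), M4 assembly language, M3 OS, M2 machine language. A: M3 OS, M2 machine language, M1 microprogram. In simulation, B's M2 is interpreted at A's M2; in emulation, it is interpreted at A's M1.

Introduction-1
Computer technology has made incredible progress in the roughly 60 years since the first general-purpose electronic computer was created. Today, less than a thousand dollars will purchase a personal computer that has more performance, more main memory, and more disk storage than a computer bought in 1980 for 1 million dollars. This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design.

Introduction-2
During the first 25 years of electronic computers, both forces made a major contribution; but beginning in about 1970, computer designers became largely dependent upon integrated circuit technology. During the 1970s, performance continued to improve at about 25% to 30% per year for the mainframes and minicomputers that dominated the industry.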
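Returning to simulation as defined in Simulation and Emulation-2: each machine instruction of virtual machine B is interpreted by a short program segment running on host machine A. The minimal sketch below imitates that loop in Python, which here stands in for A's machine language. B's four-instruction accumulator ISA (LOAD, ADD, STORE, HALT) and the function name `simulate` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical instruction set of virtual machine B (not a real ISA):
#   ("LOAD", addr)  acc <- mem[addr]
#   ("ADD", addr)   acc <- acc + mem[addr]
#   ("STORE", addr) mem[addr] <- acc
#   ("HALT", 0)     stop and return memory
Instruction = Tuple[str, int]


def simulate(program: List[Instruction], memory: List[int]) -> List[int]:
    """Interpret B's instructions one at a time with host-side routines."""
    acc = 0  # B's accumulator, kept in a host variable
    pc = 0   # B's program counter
    while True:
        opcode, addr = program[pc]  # fetch the next B instruction
        pc += 1
        # Decode and execute: each B instruction maps to a few host operations.
        if opcode == "LOAD":
            acc = memory[addr]
        elif opcode == "ADD":
            acc += memory[addr]
        elif opcode == "STORE":
            memory[addr] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


if __name__ == "__main__":
    # B program: mem[2] = mem[0] + mem[1]
    prog = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)]
    print(simulate(prog, [3, 4, 0]))  # [3, 4, 7]
```

In emulation, the same fetch/decode/execute loop would live in A's microprogram instead of in a machine-language program in main memory.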
Introduction-3
The late 1970s saw the emergence of the microprocessor. The ability of the microprocessor to ride the improvements in integrated circuit technology more closely than the less integrated mainframes and minicomputers led to a higher rate of improvement: roughly 35% growth per year in performance. This growth rate, combined with the cost advantages of a mass-produced microprocessor, led to an increasing fraction of the computer business being based on microprocessors.

Introduction-3
In addition, two significant changes in the computer marketplace made it easier than ever before to be commercially successful with a new architecture. First, the virtual elimination of assembly language programming reduced the need for object-code compatibility. Second, the creation of standardized, vendor-independent operating systems, such as UNIX and its clone, Linux, lowered the cost and risk of bringing out a new architecture.

Introduction-4
These changes made it possible to successfully develop a new set of architectures, called RISC (Reduced Instruction Set Computer) architectures, in the early 1980s. The RISC-based machines focused the attention of designers on two critical performance techniques: the exploitation of instruction-level parallelism (initially through pipelining and later through multiple instruction issue) and the use of caches (initially in simple forms and later using more sophisticated organizations and optimizations).

Introduction-5
The combination of architectural and organizational enhancements has led to 20 years of sustained growth in performance at an annual rate of over 50%.

Introduction-6
Figure 1.1 shows the effect of this difference in performance growth rates.
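To see why a gap between "roughly 35%" and "over 50%" per year matters, it helps to compound both rates. The sketch below assumes clean compound growth at exactly those rates for the 20 years mentioned in Introduction-5; it is an illustration of compounding, not a reconstruction of the data behind Figure 1.1, and `cumulative_gain` is a hypothetical helper name.

```python
def cumulative_gain(annual_rate: float, years: int) -> float:
    """Total performance multiplier after compounding annual_rate for `years` years."""
    if annual_rate <= -1.0:
        raise ValueError("annual_rate must be greater than -100%")
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    years = 20                              # span mentioned in Introduction-5
    fast = cumulative_gain(0.50, years)     # "over 50%" per year, taken as exactly 50%
    slow = cumulative_gain(0.35, years)     # "roughly 35%" per year
    print(f"50%/yr for {years} years: x{fast:.0f}")
    print(f"35%/yr for {years} years: x{slow:.0f}")
    print(f"gap between the two curves: x{fast / slow:.1f}")
```

Because the growth is multiplicative, a 15-point difference in the annual rate turns into roughly an eightfold difference after 20 years under these assumptions.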
Introduction-7
The effect of this dramatic growth rate has been twofold. First, it has significantly enhanced the capability available to computer users. For many applications, the highest-performance microprocessors of today outperform the supercomputers of less than 10 years ago. Second, this dramatic rate of improvement has led to the dominance of microprocessor-based computers across the entire range of computer design.

Introduction-8
Workstations and PCs have emerged as major products in the computer industry. Minicomputers, which were traditionally made from off-the-shelf logic or from gate arrays, have been replaced by servers made using microprocessors. Mainframes have been almost completely replaced with multiprocessors (and, more recently, multicore systems) consisting of small numbers of off-the-shelf microprocessors.

Introduction-9
Even high-end supercomputers are being built with collections of microprocessors. Freedom from compatibility with old designs and the use of microprocessor technology led to a renaissance in computer design, which emphasized both architectural innovation and efficient use of technology improvements.

Introduction-10
This renaissance is responsible for the higher performance growth shown in Figure 1.1, a rate that is unprecedented in the computer industry. This rate of growth has compounded so that by 2001, the difference between the highest-performance microprocessors and what would have been obtained by relying solely on technology, including improved circuit design, was about a factor of 15.

Introduction-11
In the last few years, the tremendous improvement in integrated circuit capability has allowed older, less-streamlined architectures, such as the x86 (or IA-32) architecture, to adopt many of the innovations first pioneered in the RISC designs.
(In other words, new technology is used to rework an outdated architecture.)

Introduction-12
As we will see, modern x86 processors basically consist of a front end that fetches and decodes x86 instructions and maps them into simple ALU, memory access, or branch operations that can be executed on a RISC-style pipelined processor.

Introduction-13
Beginning in the late 1990s, as transistor counts soared, the overhead (in transistors) of interpreting the more complex x86 architecture became negligible as a percentage of the total transistor count of a modern microprocessor.

Introduction-14
Architectural ideas and accompanying compiler improvements have made this incredible growth rate possible. At the center of this dramatic revolution has been the development of a quantitative approach to computer design and analysis that uses empirical observations of programs, experimentation, and simulation as its tools.

Introduction-15
Sustaining the recent improvements in cost and performance will require continuing innovations in computer design. We believe such innovations will be founded on this quantitative approach to computer design.

Introduction: The Changing Face of Computing
In the 1960s, the dominant form of computing was on large mainframes: machines costing millions of dollars and stored in computer rooms with multiple operators overseeing their support. Typical applications included business data processing and large-scale scientific computing.

The 1970s saw the birth of the minicomputer, a smaller-sized machine initially focused on applications in scientific laboratories, but rapidly branching out as the technology of time-sharing (multiple users sharing a computer interactively through independent terminals) became widespread.
The 1980s saw the rise of the desktop computer based on microprocessors, in the form of both personal computers and workstations. The individually owned desktop computer replaced time-sharing and led to the rise of servers: computers that provided larger-scale services such as reliable, long-term file storage and access, larger memory, and more computing power.

The 1990s saw the emergence of the Internet and the World Wide Web, the first successful handheld computing devices (personal digital assistants, or PDAs), and the emergence of high-performance digital consumer electronics, from video games to set-top boxes.

Not since the creation of the personal computer more than 20 years ago have we seen such dramatic changes in the way computers appear and in how they are used. These changes in computer use have led to three different computing markets (desktop computing, servers, and embedded computers), each characterized by different applications, requirements, and computing technologies.

Introduction: Changing Face for Desktop Computing
The first, and still the largest market in dollar terms, is desktop computing. Desktop computing spans from low-end systems that sell for under $1000 to high-end, heavily configured workstations that may sell for over $10,000. Throughout this range in price and capability, the desktop market tends to be driven to optimize price-performance.

This combination of performance (measured primarily in terms of compute performance and graphics performance) and price of a system is what matters most to customers in this market, and hence to computer designers.
As a result, desktop systems often are where the newest, highest-performance microprocessors appear, as well as where recently cost-reduced microprocessors and systems appear first.

Desktop computing also tends to be reasonably well characterized in terms of applications and benchmarking, though the increasing use of Web-centric, interactive applications poses new challenges in performance evaluation. The PC portion of the desktop space seems recently to have become focused on clock rate as the direct measure of performance, and this focus can lead to poor decisions by consumers as well as by designers who respond to this predilection.

Introduction: Changing Face for Servers
As the shift to desktop computing occurred, the role of servers in providing larger-scale and more reliable file and computing services grew. The emergence of the World Wide Web accelerated this trend because of the tremendous growth in demand for Web servers and the growth in sophistication of Web-based services. Such servers have become the backbone of large-scale enterprise computing, replacing the traditional mainframe.

For servers, different characteristics are important. First, availability is critical. The term "availability" means that the system can reliably and effectively provide a service. It is to be distinguished from "reliability", which says that the system never fails. Parts of large-scale systems unavoidably fail; the challenge in a server is to maintain system availability in the face of component failures, usually through the use of redundancy.

Why is availability crucial? Consider the servers running Yahoo!, taking orders for Cisco, or running auctions on eBay. Obviously such systems must be operating seven days a week, 24 hours a day.
Failure of such a server system is far more catastrophic than failure of a single desktop. Although it is hard to estimate the cost of downtime, Figure 1.2 shows one analysis, assuming that downtime is distributed uniformly and does not occur solely during idle times.

As we can see, the estimated costs of an unavailable system are high, and the estimated costs in Figure 1.2 are purely lost revenue; they do not account for the cost of unhappy customers!

A second key feature of server systems is an emphasis on scalability. Server systems often grow over their lifetime in response to a growing demand for the services they support or an increase in functional requirements. Thus, the ability to scale up the computing capacity, the memory, the storage, and the I/O bandwidth of a server is crucial.

Lastly, servers are designed for efficient throughput. That is, the overall performance of the server, in terms of transactions per minute or Web pages served per second, is what is crucial. Responsiveness to an individual request remains important, but overall efficiency and cost-effectiveness, as determined by how many requests can be handled in a unit of time, are the key metrics for most servers.

Introduction: Changing Face for Embedded Computers
Embedded computers (computers lodged in other devices where the presence of the computers is not immediately obvious) are the fastest-growing portion of the computer market. These devices range from everyday machines (most microwaves, most washing machines, most printers, most networking switches, and all cars contain simple embedded microprocessors) to handheld digital devices (such as palmtops, cell phones, and smart cards) to video games and digital set-top boxes.
Although in some applications (such as palmtops) the computers are programmable, in many embedded applications the only programming occurs in connection with the initial loading of the application code or a later software upgrade of that application. Thus, the application can usually be carefully tuned for the processor and system.

This process sometimes includes limited use of assembly language in key loops, although time-to-market pressures and good software engineering practice usually restrict such assembly language coding to a small fraction of the application. This use of assembly language, together with the presence of standardized operating systems and a large code base, has meant that instruction set compatibility has become an important concern in the embedded market. Simply put, as in other computing applications, software costs are often a large part of the total cost of an embedded system.

Embedded computers have the widest range of processing power and cost: from low-end 8-bit and 16-bit processors that may cost less than a dollar, to full 32-bit microprocessors capable of executing 50 million instructions per second that cost under 10 dollars, to high-end embedded processors that cost hundreds of dollars and can execute a billion instructions per second for the newest video game or for a high-end network switch. Although the range of computing power in the embedded computing market is very large, price is a key factor in the design of computers for this space. Performance requirements do exist, of course, but the primary goal is often meeting the performance need at a minimum price, rather than achieving higher performance at a higher price.
Often, the performance requirement in an embedded application is a real-time requirement. A real-time performance requirement is one where a segment of the application has an absolute maximum execution time that is allowed. For example, in a digital set-top box the time to process each video frame is limited, since the processor must accept and process the next frame shortly. In some applications, a more sophisticated requirement exists: the average time for a particular task is constrained, as well as the number of instances when some maximum time is exceeded. Such approaches (sometimes called soft real-time) arise when it is possible to occasionally miss the time constraint on an event, as long as not too many are missed.

Real-time performance tends to be highly application dependent. It is usually measured using kernels either from the application or from a standardized benchmark. With the growth in the use of embedded microprocessors, a wide range of benchmark requirements exist, from the ability to run small, limited code segments to the ability to perform well on applications involving tens to hundreds of thousands of lines of code.

Two other key characteristics exist in many embedded applications: the need to minimize memory and the need to minimize power. Although the emphasis on low power is frequently driven by the use of batteries, the need to use less expensive packaging (plastic versus ceramic) and the absence of a fan for cooling also limit total power consumption.

In many embedded applications, the memory can be a substantial portion of the system cost, and it is important to optimize memory size in such cases.
Sometimes the application is expected to fit totally in the memory on the processor chip; other times the application needs to fit totally in a small off-chip memory. In any event, the importance of memory size translates to an emphasis on code size, since data size is dictated by the application. Some architectures have special instruction set capabilities to reduce code size. Larger memories also mean more power, and optimizing power is often critical in embedded applications.

Another important trend in embedded systems is the use of processor cores together with application-specific circuitry. Often an application's functional and performance requirements are met by combining a custom hardware solution with software running on a standardized embedded processor core, which is designed to interface to such special-purpose hardware.

In practice, embedded problems are usually solved by one of three approaches:
1. The designer uses a combined hardware/software solution that includes some custom hardware and an embedded processor core that is integrated with the custom hardware, often on the same chip.
2. The designer uses custom software running on an off-the-shelf embedded processor.
3. The designer uses a digital signal processor (DSP) and custom software for the processor. Digital signal processors are processors specially tailored for signal-processing applications.

The Task of the Computer Designer-1
The task the computer designer faces is a complex one: determine what attributes are important for a new machine, then design a machine to maximize performance while staying within cost and power constraints. This task has many aspects, including instruction set design, functional organization, logic design, and implementation.
The Task of the Computer Designer-2
The implementation may encompass integrated circuit design, packaging, power, and cooling. Optimizing the design requires familiarity with a very wide range of technologies, from compilers and operating systems to logic design and packaging.

The Task of the Computer Designer-3
In the past, the term computer architecture often referred only to instruction set design. Other aspects of computer design were called implementation, often insinuating that implementation is uninteresting or less challenging. We believe this view is not only incorrect, but is even responsible for mistakes in the design of new instruction sets.

The Task of the Computer Designer-4
The architect's or designer's job is much more than instruction set design, and the technical hurdles in the other aspects of the project are certainly as challenging as those encountered in instruction set design. This challenge is particularly acute at present, when the differences among instruction sets are small and when there are three rather distinct application areas.

The Task of the Computer Designer-5
The implementation of a machine has two components: organization and hardware. The term organization includes the high-level aspects of a computer's design, such as the memory system, the bus structure, and the design of the internal CPU (where arithmetic, logic, branching, and data transfer are implemented).

The Task of the Computer Designer-6
For example, two embedded processors with identical instruction set architectures but very different organizations are the NEC VR 5432 and the NEC VR 4122. Both processors implement the MIPS64 instruction set, but they have very different pipeline and cache organizations. In addition, the 4122 implements the floating-point instructions in software rather than hardware!
The Task of the Computer Designer-7
Hardware refers to the specifics of a machine, including the detailed logic design and the packaging technology of the machine. Often a line of machines contains machines with identical instruction set architectures and nearly identical organizations that differ in the detailed hardware implementation.

The Task of the Computer Designer-8
For example, the Pentium II and Celeron are nearly identical, but offer different clock rates and different memory systems, making the Celeron more effective for low-end computers. Here, the term architecture is intended to cover all three aspects of computer design: instruction set architecture, organization, and hardware.

The Task of the Computer Designer-9
Computer architects must design a computer to meet functional requirements as well as price, power, and performance goals. Often, they also have to determine what the functional requirements are, which can be a major task. The requirements may be specific features inspired by the market.

The Task of the Computer Designer-10
Application software often drives the choice of certain functional requirements by determining how the machine will be used. If a large body of software exists for a certain instruction set architecture, the architect may decide that a new machine should implement an existing instruction set.

The Task of the Computer Designer-11
The presence of a large market for a particular class of applications might encourage the designers to incorporate requirements that would make the machine competitive in that market. Figure 1.4 summarizes some requirements that need to be considered in designing a new machine. Many of these requirements and features will be examined in depth in later chapters.

The Task of the Computer Designer-12
Once a set of functional requirements has been established, the architect must try to optimize the design.
Which design choices are optimal depends, of course, on the choice of metrics. The changes in the computer applications space over the last decade have dramatically changed the metrics. Although desktop computers remain focused on optimizing cost-performance as measured by a single user, servers focus on availability, scalability, and throughput cost-performance, and embedded computers are driven by price and often by power issues.

The Task of the Computer Designer-13
These differences and the diversity and size of these different markets lead to fundamentally different design efforts. For the desktop market, much of the effort goes into designing a leading-edge microprocessor and into the graphics and I/O system that integrate with the microprocessor.

The Task of the Computer Designer-14
In the server area, the focus is on integrating state-of-the-art microprocessors, often in a multiprocessor architecture, and designing scalable and highly available I/O systems to accompany the processors.

The Task of the Computer Designer-15
In the embedded processor market, the challenge lies in adopting the high-end microprocessor techniques to deliver most of the performance at a lower fraction of the price, while paying attention to demanding limits on power and sometimes a need for high-performance graphics or video processing.

The Task of the Computer Designer-16
In addition to performance and cost, designers must be aware of important trends in both the implementation technology and the use of computers. Such trends not only affect future cost, but also determine the longevity of an architecture.

Technology Trends-1
If an instruction set architecture is to be successful, it must be designed to survive rapid changes in computer technology. After all, a successful new instruction set architecture may last decades: the core of the IBM mainframe has been in use for more than 35 years.
An architect must plan for technology changes that can increase the lifetime of a successful computer.

Technology Trends-2
To plan for the evolution of a machine, the designer must be especially aware of rapidly occurring changes in implementation technology. Four implementation technologies, which change at a dramatic pace, are critical to modern implementations:

Technology Trends-3
1. Integrated circuit logic technology: Transistor density increases by about 35% per year, quadrupling in somewhat over four years. Increases in die size are less predictable and slower, ranging from 10% to 20% per year. The combined effect is a growth rate in transistor count on a chip of about 55% per year. Device speed scales more slowly.

Technology Trends-4
2. Semiconductor DRAM (dynamic random-access memory): Density increases by between 40% and 60% per year, quadrupling in three to four years. Cycle time has improved very slowly, decreasing by about one-third in 10 years. Bandwidth per chip increases about twice as fast as latency decreases. In addition, changes to the DRAM interface have also improved the bandwidth.

Technology Trends-5
3. Magnetic disk technology: Recently, disk density has been improving by more than 100% per year, quadrupling in two years. Prior to 1990, density increased by about 30% per year, doubling in three years. It appears that disk technology will continue the faster density growth rate for some time to come. Access time has improved by one-third in 10 years.

Technology Trends-6
4. Network technology: Network performance depends both on the performance of switches and on the performance of the transmission system. Both latency and bandwidth can be improved, though recently bandwidth has been the primary focus.
For many years, networking technology appeared to improve slowly: for example, it took about 10 years for Ethernet technology to move from 10 Mb/s to 100 Mb/s. The increased importance of networking has led to a faster rate of progress, with 1 Gb/s Ethernet becoming available about five years after 100 Mb/s. The Internet infrastructure in the United States has seen even faster growth (roughly doubling in bandwidth every year), both through the use of optical media and through the deployment of much more switching hardware.

Technology Trends-7
These rapidly changing technologies affect the design of a microprocessor that may, with speed and technology enhancements, have a lifetime of five or more years. Even within the span of a single product cycle for a computing system (two years of design and two to three years of production), key technologies, such as DRAM, change sufficiently that the designer must plan for these changes. Indeed, designers often design for the next technology, knowing that when a product begins shipping in volume that next technology may be the most cost-effective or may have performance advantages. Traditionally, cost has decreased at about the rate at which density increases.

Technology Trends-8
Although technology improves fairly continuously, the impact of these improvements is sometimes seen in discrete leaps, as a threshold that allows a new capability is reached.

Technology Trends-9
For example, when MOS technology reached the point where it could put between 25,000 and 50,000 transistors on a single chip in the early 1980s, it became possible to build a 32-bit microprocessor on a single chip. By the late 1980s, first-level caches could go on chip. By eliminating chip crossings within the processor and between the processor and the cache, a dramatic increase in cost-performance and performance/power was possible. This design was simply infeasible until the technology reached a certain point.
Such technology thresholds are not rare and have a significant impact on a wide variety of design decisions.

Cost, Price, and Their Trends-1
Although there are computer designs where costs tend to be less important (specifically supercomputers), cost-sensitive designs are of growing significance: more than half the PCs sold in 1999 were priced at less than $1000, and the average price of a 32-bit microprocessor for an embedded application is in the tens of dollars. Indeed, in the past 15 years, the use of technology improvements to achieve lower cost, as well as increased performance, has been a major theme in the computer industry.

Cost, Price, and Their Trends-2
Textbooks often ignore the cost half of cost-performance because costs change, thereby dating books, and because the issues are subtle and differ across industry segments. Yet an understanding of cost and its factors is essential for designers to be able to make intelligent decisions about whether or not a new feature should be included in designs where cost is an issue.

Cost, Price, and Their Trends-3
We focus on cost and price, specifically on the relationship between them: price is what you sell a finished good for, and cost is the amount spent to produce it, including overhead. We also discuss the major trends and factors that affect cost and how it changes over time. The exercises and examples use specific cost data that will change over time, though the basic determinants of cost are less time sensitive.
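The trend figures in Technology Trends-3 to -6 are compound annual rates, so they can be checked with a few lines of arithmetic. A minimal Python sketch, assuming each rate stays constant over the period (the helper names are illustrative, and the 15% die-size rate is an assumed midpoint of the quoted 10% to 20% range):

```python
import math


def years_to_multiply(annual_rate, factor):
    """Years for a quantity growing at a constant annual_rate (0.35 = 35%)
    to grow by `factor` (e.g. 4 for 'quadrupling')."""
    return math.log(factor) / math.log(1.0 + annual_rate)


def implied_annual_rate(factor, years):
    """Constant annual rate r such that (1 + r) ** years == factor."""
    return factor ** (1.0 / years) - 1.0


def combined_rate(*annual_rates):
    """Combine independent annual growth rates multiplicatively,
    e.g. density growth and die-size growth -> transistor-count growth."""
    growth = 1.0
    for rate in annual_rates:
        growth *= 1.0 + rate
    return growth - 1.0


# Forward direction: quoted rates -> quadrupling/doubling times.
for label, rate, factor in [
    ("Logic density, 35%/yr", 0.35, 4),
    ("DRAM density, 40%/yr", 0.40, 4),
    ("DRAM density, 60%/yr", 0.60, 4),
    ("Disk density, 100%/yr", 1.00, 4),
    ("Disk density before 1990, 30%/yr", 0.30, 2),
]:
    print(f"{label}: x{factor} in {years_to_multiply(rate, factor):.1f} years")

# Density (35%) combined with die size (assumed 15%, midpoint of 10%-20%).
print(f"Transistors per chip: {combined_rate(0.35, 0.15):.0%} per year")

# Backward direction: Ethernet milestones -> implied annual rates.
print(f"10 -> 100 Mb/s in 10 years: {implied_annual_rate(10, 10):.0%} per year")
print(f"100 Mb/s -> 1 Gb/s in 5 years: {implied_annual_rate(10, 5):.0%} per year")
```

Run forward, the sketch gives about 4.6 years for logic density to quadruple (the slide's "somewhat over four years"), roughly 2.9 to 4.1 years for DRAM, and about 2.6 years for the pre-1990 disk doubling. Run backward, the Ethernet milestones imply roughly 26% per year before 100 Mb/s and roughly 58% per year afterwards, which is the acceleration described in Technology Trends-6.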
An Example
Cost breakdown of a PC by subsystem (fraction of total cost):

System            Subsystem                                        Fraction of total
Cabinet           Sheet metal, plastic                             2%
                  Power supply, fans                               2%
                  Cables, nuts, bolts                              1%
                  Shipping box, manuals                            1%
                  Subtotal                                         6%
Processor board   Processor                                        22%
                  DRAM (128 MB)                                    5%
                  Video card                                       5%
                  Motherboard with basic I/O support, networking   5%
                  Subtotal                                         37%
I/O devices       Keyboard and mouse                               3%
                  Monitor                                          19%
                  Hard disk (20 GB)                                9%
                  DVD drive                                        6%
                  Subtotal                                         37%
Software          OS + Basic Office Suite                          20%

Amdahl’s Law
1. Make the common case fast
Improving the frequent event, rather than the rare event, will obviously help performance.

Amdahl’s Law
Amdahl’s Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
In other words, when some component of a system adopts a faster mode of execution, the resulting improvement in overall system performance depends on how often that mode is used, that is, on the fraction of total execution time it accounts for.

Amdahl’s Law
Frequency of use is determined by the target application and measured by statistical means. Improving the most frequently used component yields the greatest benefit. The main metric used to describe this formally is the speedup:

$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$

Speedup
In words: speedup is the performance with the enhancement divided by the performance without it; equivalently, it is the time to execute a task without the enhancement divided by the time to execute the same task with it.

Two factors-1
1. The fraction of the computation time in the original machine that can be converted to take advantage of the enhancement, i.e., the share of the task's total execution time spent in the part that can be improved. For example, if 20 seconds of the execution time of a program that takes 60 seconds in total can use an enhancement, the fraction is 20/60. This value, which we will call Fraction_enhanced, is always less than or equal to 1.

Two factors-2
2. The improvement gained by the enhanced execution mode; that is, how much faster the task would run if the enhanced mode were used for the entire program.
That is, the factor by which the performance of the improvable part increases once the enhancement is applied. This value is the time of the original mode over the time of the enhanced mode: for example, if the enhanced mode takes 2 seconds for some portion of the program that can completely use the mode, while the original mode took 5 seconds for the same portion, the improvement is 5/2. We will call this value, which is always greater than 1, Speedup_enhanced.

The execution time using the original machine with the enhanced mode will be the time spent using the unenhanced portion of the machine plus the time spent using the enhancement:

$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left( (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$$

The overall speedup is the ratio of the execution times:

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$

Example 1:
Suppose that we are considering an enhancement to the processor of a server system used for Web serving. The new CPU is 10 times faster on computation in the Web serving application than the original processor. Assuming that the original CPU is busy with computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup gained by incorporating the enhancement?
Answer
Fraction_enhanced = 0.4, Speedup_enhanced = 10

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$

Example 2:
A common transformation required in graphics engines is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for a total of 50% of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives.
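The overall-speedup formula depends only on the two factors, so it fits in one small function. A minimal Python sketch (the function names are illustrative, not from the source), applied to the numbers from Example 1 and to the two alternatives in Example 2:

```python
def new_execution_time(old_time, fraction_enhanced, speedup_enhanced):
    """Execution time after the enhancement: the unenhanced part runs as
    before, the enhanced part runs speedup_enhanced times faster."""
    return old_time * ((1.0 - fraction_enhanced)
                       + fraction_enhanced / speedup_enhanced)


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = old time / new time (Amdahl's Law)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / new_execution_time(1.0, fraction_enhanced, speedup_enhanced)


# Example 1: CPU busy 40% of the time, new CPU 10x faster on computation.
web = amdahl_speedup(0.4, 10)
# Example 2, alternative A: FPSQR is 20% of the time, sped up 10x.
fpsqr = amdahl_speedup(0.2, 10)
# Example 2, alternative B: all FP instructions (50% of time) sped up 1.6x.
all_fp = amdahl_speedup(0.5, 1.6)

print(f"Example 1 overall speedup: {web:.2f}")
print(f"Example 2, faster FPSQR:   {fpsqr:.2f}")
print(f"Example 2, faster all FP:  {all_fp:.2f}")
# Upper bound for Example 1: even an infinitely fast CPU cannot beat 1/0.6.
print(f"Example 1 limit:           {1 / (1 - 0.4):.2f}")
```

The last line shows the limit Amdahl's Law imposes: because 60% of the time is spent waiting for I/O, no CPU, however fast, can raise the overall speedup in Example 1 above about 1.67.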
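Answer:

$$\text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} = \frac{1}{0.82} \approx 1.22$$

$$\text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \frac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23$$

Improving the performance of all FP operations is slightly better because of their higher frequency.

3. The CPU Performance Equation

$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$

Three Factors
CPU performance depends on three factors:
• Clock cycle time: hardware technology and organization
• Clock cycles per instruction (CPI): organization and instruction set architecture
• Instruction count (IC): instruction set architecture and compiler technology

CPU time

$$\text{CPU time} = \text{Instruction count} \times \text{Clock cycle time} \times \text{Cycles per instruction}$$

Example
Suppose we have made the following measurements:
Frequency of FP operations (other than FPSQR) = 25%
Average CPI of FP operations = 4.0
Average CPI of other instructions = 1.33
Frequency of FPSQR = 2%
CPI of FPSQR = 20
Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives using the CPU performance equation.

Answer:
First compute the original CPI from the instruction mix:

$$\text{CPI}_{\text{original}} = (4.0 \times 25\%) + (1.33 \times 75\%) \approx 2.0$$

Reducing the CPI of FPSQR from 20 to 2 saves 18 cycles on 2% of the instructions:

$$\text{CPI}_{\text{new FPSQR}} = 2.0 - 2\% \times (20 - 2) = 1.64$$

Reducing the average CPI of all FP operations from 4.0 to 2.5:

$$\text{CPI}_{\text{new FP}} = 2.0 - 25\% \times (4.0 - 2.5) = 1.625$$

Since the instruction count and clock cycle time are unchanged, the speedup is the ratio of the CPIs: 2.0/1.64 ≈ 1.22 for the FPSQR alternative and 2.0/1.625 ≈ 1.23 for the FP alternative, the same conclusion as with Amdahl’s Law.

Principle of Locality
4. The principle of locality of program accesses
Statistics show that a program spends about 90% of its execution time in roughly 10% of its code; that is, most of the time it accesses only a small, local region of the program.
Locality of program accesses is the theoretical basis for building memory hierarchies and caches.

Methods of Implementing Control Flow
An information-processing procedure can be described in terms of control flow. There are three common ways to implement it:
1. All hardware: hardwired logic circuits, designed with combinational-logic methods, implement the control flow;
2. Hardware combined with software: part of the flow is implemented by microprograms and the rest by hardware logic;
3. All software: a program written in some language implements the control flow according to the flow's algorithm.

Hierarchical Structure of a Computer System
• A "computer language" is a set of characters, governed by certain rules, that describes control flow.
• Computer languages do not belong exclusively to software; they can belong to any level of a computer system, each describing the control flow of its own level.
• Under this broad understanding of language, a computer system can be viewed as composed of multiple levels of "virtual" computers, nested layer within layer from the inside out, forming an "onion"-style functional model (see Figure 1-6).
Example: user -- modeling -- application programs -- high-level language -- assembly language -- operating system -- machine language -- microprogram -- hardwired logic

The Concept of a Virtual Computer
Each layer of the onion model is a virtual computer that exists only for its "observer." Its functions are embodied in a language in the broad sense: the layer provides a means of interpreting that language, acts on the objects being processed or controlled, and obtains the necessary status information from those objects. An observer at a given level can understand and use the computer only through that level's language and need not be concerned with how anything inside works or is implemented. (That is, a virtual computer is a machine implemented by software.)
The composition of a virtual computer: see Figure 1-7.
The functional levels of a computer system defined from the virtual-computer viewpoint: see Figure 1-8.

Modern Computer Organization
A modern computer is an integrated system comprising the machine hardware, the instruction set, system software, application programs, and the user interface. Different solution methods may require different computing resources, depending on the nature of the problem being solved.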
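The CPU-performance-equation example earlier in this section (lowering the CPI of FPSQR to 2 versus lowering the average CPI of all FP operations to 2.5) can be reproduced in a few lines. A minimal Python sketch; the function names, the instruction count of 10^9 and the 1 ns clock cycle are hypothetical values chosen only to exercise the formula, since both cancel out when the two alternatives are compared:

```python
def average_cpi(mix):
    """Average CPI of an instruction mix given as (frequency, cpi) pairs,
    where the frequencies are fractions of the instruction count."""
    return sum(freq * cpi for freq, cpi in mix)


def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = instruction count x CPI x clock cycle time (seconds)."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program size and clock; they cancel in the speedup ratio.
IC = 1_000_000_000
CYCLE = 1e-9  # 1 ns, i.e. a 1 GHz clock

# Measured mix: 25% FP operations at CPI 4.0, 75% others at CPI 1.33.
cpi_original = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Alternative 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
cpi_fpsqr = cpi_original - 0.02 * (20 - 2)
# Alternative 2: average CPI of all FP operations drops from 4.0 to 2.5.
cpi_fp = cpi_original - 0.25 * (4.0 - 2.5)

t_original = cpu_time(IC, cpi_original, CYCLE)
speedup_fpsqr = t_original / cpu_time(IC, cpi_fpsqr, CYCLE)
speedup_fp = t_original / cpu_time(IC, cpi_fp, CYCLE)

print(f"CPI original: {cpi_original:.4f}")
print(f"Speedup, faster FPSQR: {speedup_fpsqr:.2f}")
print(f"Speedup, faster FP:    {speedup_fp:.2f}")
```

Without rounding the original CPI to 2.0 the sketch uses 1.9975, yet the two speedups still come out at about 1.22 and 1.23, matching the hand calculation.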