Multi-Core Processors: Architecture & Performance

UNIT-4 Contents
• Multi-Core Processors
• Super Scalar Processors
• Very Long Instruction Word (VLIW)
• Vector Processors

Multi-core Processors

A multicore processor is an integrated circuit that has two or more processor cores attached for
• enhanced performance, and
• reduced power consumption.
These processors also enable more efficient simultaneous processing of multiple tasks, such as parallel processing and multithreading.

A dual-core setup is similar to having multiple, separate processors installed on a computer. However, because the two processors are plugged into the same socket, the connection between them is faster.

The use of multicore processors is one approach to boost processor performance without exceeding the practical limitations of semiconductor design and fabrication. Using multiple cores also helps ensure safe operation in areas such as heat generation.

How do multicore processors work?

The heart of every processor is an execution engine, also known as a core. The core is designed to process instructions and data according to the software programs in the computer's memory. Over the years, designers found that every new processor design has limits. Numerous technologies were utilized to accelerate performance, including the following:
• Clock speed
• Hyper-threading
• More chips

Clock Speed: One approach is to make the processor's clock faster. The clock is the "drumbeat" used to synchronize the processing of instructions and data through the processing engine. Clock speeds have accelerated from several megahertz (MHz) to several gigahertz (GHz) nowadays. However, transistors use up power with each clock tick.
As a result, clock speeds have nearly reached their limits using current semiconductor fabrication and heat management techniques.

Hyper-Threading: Another approach involved the handling of multiple instruction threads. Intel calls this hyper-threading. With hyper-threading, processor cores are designed to handle two separate instruction threads at the same time. When properly enabled and supported by both the computer's firmware and operating system, hyper-threading enables one physical core to function as two logical cores. Still, the processor only possesses a single physical core. The logical abstraction of the physical processor added little real performance to the processor other than to help streamline the behavior of multiple simultaneous applications running on the computer.

More Chips: The next step is to add processor chips to the processor package, which is the physical device that plugs into the motherboard. A dual-core processor includes two separate processor cores. A quad-core processor includes four separate cores. Today's multicore processors can easily include 12, 24 or even more processor cores. The multicore approach is almost identical to the use of multiprocessor motherboards, which have two or four separate processor sockets.

Today's huge processor performance involves the use of processor products that combine fast clock speeds and multiple hyper-threaded cores.

However, multicore chips have several issues to consider. First, the addition of more processor cores doesn't automatically improve computer performance.
The OS and applications must direct software program instructions to recognize and use the multiple cores. This must be done in parallel, using various threads to different cores within the processor package. Some software applications may need to be refactored to support and use multicore processor platforms. Otherwise, only the default first processor core is used, and any additional cores are unused or idle.

Second, the performance benefit of additional cores is not a direct multiple. That is, adding a second core does not double the processor's performance, and a quad-core processor does not multiply the processor's performance by a factor of four. This happens because of the shared elements of the processor, such as access to
• internal memory or caches,
• external buses, and
• computer system memory.

The benefit of multiple cores can be substantial, but there are practical limits. Still, the acceleration is typically better than a traditional multiprocessor system because the coupling between cores in the same package is tighter, and there are shorter distances and fewer components between cores.

Consider the analogy of cars on a road. Each car can be considered as a processor, but each car must share the common roads and traffic limitations. More cars can transport more people and goods in a given time, but more cars also cause congestion.

Why are multicore processors used?

Multicore processors work on any modern computer hardware platform. Virtually all PCs and laptops today build in some multicore processor model. However, the true power and benefit of these processors depend on software applications designed to emphasize parallelism.
A parallel approach divides application work into numerous processing threads, and then distributes and manages those threads across two or more processor cores.

Major Use Cases for Multicore Processors

There are several major use cases for multicore processors, including the following five:
• Virtualization
• Databases
• Analytics & HPC
• Cloud
• Visualization

Virtualization: A virtualization platform, such as VMware, is designed to abstract the software environment from the underlying hardware. Virtualization is capable of abstracting physical processor cores into virtual processors or central processing units (vCPUs), which are then assigned to virtual machines (VMs). Each VM becomes a virtual server capable of running its own OS and application. It is possible to assign more than one vCPU to each VM, allowing each VM and its application to run parallel processing software if required.

Databases: A database is a complex software platform that frequently needs to run many simultaneous tasks, such as queries. As a result, databases are highly dependent on multicore processors to distribute and handle these many task threads. The use of multiple processors in databases is often coupled with extremely high memory capacity that can reach 1 terabyte or more on the physical server.

Analytics and HPC: Big data analytics, such as machine learning, and High Performance Computing (HPC) both require breaking large and complex tasks into smaller and more manageable pieces.
Each piece of the computational effort can then be solved by distributing it to a different processor. This approach enables each processor to work in parallel to solve the overarching problem far faster and more efficiently than with a single processor.

Cloud: Organizations building a cloud adopt multicore processors to support all the virtualization needed to accommodate the highly scalable and highly transactional demands of cloud software platforms such as OpenStack. A set of servers with multicore processors can allow the cloud to create and scale up more VM instances on demand.

Visualization: Graphics applications, such as games and data-rendering engines, have the same parallelism requirements as other HPC applications. Visual rendering is task-intensive, so visualization applications can make extensive use of multiple processors to distribute the calculations required. Many graphics applications rely on Graphics Processing Units (GPUs) rather than CPUs. GPUs are tailored to optimize graphics-related tasks. GPU packages often contain multiple GPU cores, similar in principle to multicore processors.

Pros and Cons of Multicore Processors

Multicore processor technology is mature and well-defined. However, the technology has its share of pros and cons, which should be considered when buying and deploying new servers.

Advantages of Multicore Processors

Some of the advantages of multicore processors are the following:
• Better application performance
• Better hardware performance

Better Application Performance: The principal benefit of multicore processors is more potential processing capability.
Each processor core is effectively a separate processor that OSes and applications can use. In a virtualized server, each VM can employ one or more virtualized processor cores, enabling many VMs to coexist and operate simultaneously on a physical server. Similarly, an application designed for high levels of parallelism may use any number of cores to provide high application performance that would be impossible with single-core systems.

Better Hardware Performance: By placing two or more processor cores on the same device, a multicore processor can use shared components, such as common internal buses and processor caches, more efficiently. It also benefits from superior performance compared with multiprocessor systems that have separate processor packages on the same motherboard.

Disadvantages of Multicore Processors

Some of the disadvantages of multicore processors are the following:
• Software dependent
• Performance boosts are limited
• Power, heat and clock restrictions

Software Dependent: Applications use processors, not the other way around. OSes and applications will always default to using the first processor core, dubbed core 0. Any additional cores in the processor package will remain unused or idle until software applications are enabled to use them. Such applications include database applications and big data processing tools like Hadoop. A business should consider what a server will be used for and the applications it plans to run before making a multicore system investment, to ensure that the system delivers its optimum computing potential.
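
The point that extra cores stay idle until software hands them work can be sketched as follows. This is a minimal illustration, assuming Python's standard multiprocessing module and a hypothetical CPU-bound workload (summing squares over a range). The names split_range and sum_squares are made up for this example and are not part of any product mentioned above.

```python
import os
from multiprocessing import Pool


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous (lo, hi) chunks."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def sum_squares(chunk):
    """Hypothetical CPU-bound task: sum of i*i over one chunk."""
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))


if __name__ == "__main__":
    workers = os.cpu_count() or 1   # logical cores reported by the OS
    chunks = split_range(0, 2_000_000, workers)

    # Serial: every chunk is processed one after another in one process.
    serial = sum(sum_squares(c) for c in chunks)

    # Parallel: one worker process per logical core, chunks spread across them.
    with Pool(processes=workers) as pool:
        parallel = sum(pool.map(sum_squares, chunks))

    print(workers, serial == parallel)
```

Both paths compute the same total. Only the second one gives the operating system separate processes that it can schedule on different cores, which is the kind of refactoring the text refers to.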
Performance boosts are limited: Multiple processors in a processor package must share common system buses and processor caches. The more processor cores share a package, the more sharing takes place across common processor interfaces and resources. This results in diminishing returns to performance as cores are added. For most situations, the performance benefit of having multiple cores far outweighs the performance lost to such sharing, but it is a factor to consider when testing application performance.

Power, heat and clock restrictions: A computer may not be able to drive a processor with many cores as hard as a processor with fewer cores or a single-core processor. A modern processor core may contain over 500 million transistors. Each transistor generates heat when it switches, and this heat increases as the clock speed increases. All of that heat must be safely dissipated from the core through the processor package. When more cores are running, this heat can multiply and quickly exceed the cooling capability of the processor package. Thus, some multicore processors may actually reduce clock speeds, for instance from 3.5 GHz to 3.0 GHz, to help manage heat. This reduces the performance of all processor cores in the package. High-end multicore processors require complex cooling systems and careful deployment and monitoring to ensure long-term system reliability.
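
The diminishing returns described under "Performance boosts are limited" can be illustrated with Amdahl's law, a standard model that these notes do not name. It is used here as an assumption: a fraction p of the work parallelizes perfectly across n cores, and the remaining 1 - p (standing in for time lost to shared buses, caches and memory) does not. The value p = 0.9 is hypothetical.

```python
def amdahl_speedup(p, n):
    """Speedup on n cores when a fraction p of the work runs in parallel."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, chosen only for illustration
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores -> {amdahl_speedup(p, n):.2f}x")
```

Under this assumption, two cores give about 1.82x rather than 2x, and four cores about 3.08x rather than 4x, which matches the qualitative claim that the benefit of extra cores is not a direct multiple.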
Architecture of Multicore Processors

The components of multicore processors are as follows:
• Cores
• Processor support
• Caches

Cores: Every multicore processor consists of two or more cores along with a series of caches. Cores are the central component of multicore processors. Cores contain all of the registers and circuitry, sometimes hundreds of millions of individual transistors, needed to perform the closely synchronized tasks of ingesting data and instructions, processing that content and outputting logical decisions or results.

Processor Support: Processor support circuitry includes an assortment of input/output control and management circuitry, such as
• clocks,
• cache consistency,
• power and thermal control, and
• external bus access.

Caches: Caches are relatively small areas of very fast memory. A cache retains often-used instructions or data, making that content readily available to the core without the need to access system memory. A processor checks the cache first. If the required content is present, the core takes that content from the cache, which improves performance. If the content is absent, the core accesses system memory for the required content. Level 1, or L1, cache is the smallest and fastest cache and is unique to every core. Level 2, or L2, cache is a larger storage space shared among the cores. Some multicore processor architectures may dedicate both L1 and L2 caches to each core.

Homogeneous vs. Heterogeneous Multicore Processors

The cores within a multicore processor may be homogeneous or heterogeneous.
Mainstream Intel and AMD multicore processors for x86 computer architectures are homogeneous and provide identical cores. However, dedicating a complex device to a simple job, or to a job where the greatest efficiency is needed, is often wasteful. There is a heterogeneous multicore processor market that uses processors with different cores for different purposes. Heterogeneous cores are generally found in embedded or Arm processors that might mix microprocessor and microcontroller cores in the same package.

Goals for Heterogeneous Multicore Processors

There are three general goals for heterogeneous multicore processors:
• Optimized performance
• Optimized power
• Optimized security

Optimized Performance: While homogeneous multicore processors are typically intended to provide universal processing capabilities, many processors are not intended for such generic system use cases. Instead, they are designed and sold for use in embedded, dedicated or task-specific systems that can benefit from the unique strengths of different processors. For example, a processor intended for a signal processing device might use an Arm processor that contains a Cortex-A general-purpose core together with a Cortex-M core for dedicated signal processing tasks.

Optimized Power: Providing simpler processor cores reduces the transistor count and eases power demands. This makes the processor package and the overall system cooler and more power-efficient.
Optimized Security: Jobs or processes can be divided among different types of cores, enabling designers to deliberately build high levels of isolation that tightly control access among the various processor cores. This greater control and isolation offer better stability and security for the overall system, though at the cost of general flexibility.

Examples of Multicore Processors

Most modern processors designed and sold for general-purpose x86 computing include multiple processor cores.

Examples of the latest Intel 12th-generation multicore processors include the following:
• Intel Core i9 12900 family provides 8 performance cores and 24 threads.
• Intel Core i7 12700 family provides 8 performance cores and 20 threads.
• Intel Core i5 12600 processors offer 6 performance cores and up to 16 threads.

Examples of the latest AMD Zen multicore processors include:
• AMD Zen 3 family (provides 4 to 16 cores).
• AMD Zen 2 family (provides up to 64 cores).
• AMD Zen+ family (provides 4 to 32 cores).

Superscalar Processor

The first commercial single-chip superscalar microprocessor, the MC88100, was developed by Motorola in 1988. Intel introduced its version, the i960CA, in 1989, followed by the AMD 29000-series in 1990.

Although implementations of superscalar CPUs are heading toward increasing complexity, the design of these processors normally refers to a set of methods that permit the CPU of a computer to attain a throughput of more than one instruction per cycle while executing a single sequential program.

What is a Superscalar Processor?
A superscalar processor is a type of microprocessor that implements a form of parallelism known as instruction-level parallelism within a single processor. It executes more than one instruction during a clock cycle by simultaneously dispatching several instructions to separate execution units on the processor.
• A scalar processor executes a single instruction for each clock cycle.
• A superscalar processor can execute more than one instruction during a clock cycle.

Features of Superscalar Processors

Features of superscalar processors include the following:
• Superscalar architecture is a parallel computing technique utilized in various processors.
• In a superscalar computer, the CPU manages several instruction pipelines to perform numerous instructions simultaneously during a clock cycle.
• Superscalar architectures include all pipelining features, although several instructions execute simultaneously in the same pipeline stage.
• Superscalar design methods normally comprise parallel register renaming, parallel instruction decoding, speculative execution and out-of-order execution.
• These methods are normally used together with complementary design methods such as caching, pipelining, branch prediction and multi-core in recent microprocessor designs.

Superscalar Processor Architecture

A superscalar processor is a CPU that executes more than one instruction per clock cycle. Because processing speeds are measured in clock cycles per second, a superscalar processor is faster than a scalar processor running at the same clock rate.

Superscalar processor architecture mainly includes parallel execution units, where these units can execute instructions simultaneously. This parallel architecture was first implemented within Reduced Instruction Set Computer (RISC) processors, which use simple and short instructions to execute calculations. Due to their superscalar abilities, RISC processors have normally performed better than Complex Instruction Set Computer (CISC) processors running at the same megahertz. However, most CISC processors now, such as the Intel Pentium, also incorporate some RISC architecture, which allows them to perform instructions in parallel.

[Figure: superscalar processor architecture with an integer execution unit and a floating-point execution unit]

The superscalar processor is equipped with several processing units for handling various instructions in parallel in every processing stage. With this architecture, a number of instructions start execution within the same clock cycle. These processors are capable of achieving an instruction execution throughput of more than one instruction per cycle.

In the architecture diagram, the processor has two execution units: one for integer operations and the other for floating-point operations. The Instruction Fetch Unit (IFU) is capable of reading several instructions at a time and stores them in the instruction queue. In every cycle, the dispatch unit fetches and decodes up to 2 instructions from the front of the queue. If there is a single integer instruction, a single floating-point instruction and no hazards, then both instructions are dispatched in the same clock cycle.

Scalar Pipelining

Pipelining is the procedure of breaking down tasks into sub-steps and executing them in different parts of the processor. The pipelining architecture of the scalar processor and of the superscalar processor is shown in the figures that follow.

[Figure: scalar processor pipeline timing diagram]
In the previous pipeline architecture, F is fetch, D is decode, E is execute and W is register write-back. I1, I2, I3 and I4 are instructions. The scalar processor pipeline architecture includes a single pipeline and four stages: fetch, decode, execute and result write-back.

In the single-pipeline scalar processor, the instructions proceed as follows:
• In the first clock period, I1 is fetched.
• In the second clock period, I1 is decoded and I2 is fetched.
• In the third clock period, I3 is fetched, I2 is decoded and I1 is executed.
• In the fourth clock period, I4 is fetched, I3 is decoded, I2 is executed and I1 writes its result back to the register file.
So, in seven clock periods, 4 instructions are executed in a single pipeline.

Superscalar Pipelining

The instructions in a superscalar processor are issued from a sequential instruction stream. The processor must allow multiple instructions to be issued in each clock cycle, and the CPU must check dynamically for data dependencies between instructions. In the following superscalar pipeline, two instructions can be fetched and dispatched at a time to complete a maximum of 2 instructions per cycle.

[Figure: 2-issue superscalar pipeline timing diagram]

The superscalar processor pipeline architecture includes two pipelines and four stages: fetch, decode, execute and result write-back. It is a 2-issue superscalar processor, which means that two instructions at a time are fetched, decoded, executed and written back. The two instructions I1 and I2 move together through fetch, decode, execute and write-back, one stage per clock period. One clock period behind them, the remaining two instructions I3 and I4 move together through the same four stages.
So, in five clock periods, the processor executes 4 instructions across its two pipelines.

A scalar processor issues a single instruction per clock cycle and performs a single instance of each pipeline stage per clock cycle, whereas the superscalar processor in the previous example issues two instructions per clock cycle and executes two instances of each stage in parallel. So, executing the same instructions takes more time on a scalar processor and less time on a superscalar processor.

Types of Superscalar Processors

Some examples of superscalar processors are as follows:
• Intel Core i7 processor
• Intel Pentium processor
• IBM PowerPC 601

Intel Core i7 Processor: The Intel Core i7 is a superscalar processor based on the Nehalem micro-architecture. A Core i7 design contains several processor cores, and every core is itself a superscalar processor. It is a fast Intel processor line used in consumer-end computers and devices. Like the Intel Core i5, this processor includes Intel Turbo Boost Technology. It is available with 2 to 6 cores, supporting up to 12 different threads at once.

Intel Pentium Processor: With the superscalar pipelined architecture of the Intel Pentium processor, the CPU can execute two or more instructions in each cycle. This processor is widely used in personal computers. Intel Pentium processor devices are normally built for
• online use,
• cloud computing, and
• collaboration.
So this processor works well for tablets and Chromebooks, providing strong local performance and efficient online interactions.

IBM PowerPC 601: The IBM PowerPC 601 is a superscalar processor from the PowerPC family of RISC microprocessors. This processor is capable of issuing as well as retiring three instructions per clock. Instructions may execute out of order for improved performance, but the PowerPC 601 makes the execution appear to be in order.

The PowerPC 601 processor provides
• 32-bit logical addresses,
• 16-bit and 32-bit integer data types, and
• 32-bit and 64-bit floating-point data types.
For 64-bit PowerPC implementations, the architecture provides 64-bit data types, addressing and the other features necessary to complete the 64-bit architecture.

Characteristics of Superscalar Processors

Superscalar processor characteristics include the following:
• A superscalar processor follows a super-pipelined model in which independent instructions proceed without waiting on one another.
• A superscalar processor fetches and decodes several instructions of the incoming instruction stream at a time.
• The architecture of a superscalar processor exploits the potential of instruction-level parallelism.
• A scalar processor issues a single instruction in every cycle, whereas a superscalar processor issues more than one. The number of instructions issued depends mainly on the instructions within the instruction stream.
• Instructions are frequently reordered to fit the architecture of the processor better.
• The superscalar method is usually associated with some identifying characteristics.
• Instructions are normally issued from a sequential instruction stream.
• The CPU checks dynamically for data dependencies between instructions at run time.
• The CPU executes multiple instructions in each clock cycle.

Advantages of Superscalar Processors

Advantages of the superscalar processor include the following:
• A superscalar processor implements instruction-level parallelism in a single processor.
• These processors can be built to execute any instruction set.
• A superscalar processor that includes out-of-order execution, branch prediction and speculative execution can find parallelism across several basic blocks and loop iterations.

Disadvantages of Superscalar Processors

Disadvantages of the superscalar processor include the following:
• Superscalar processors are not used much in small embedded systems because of their power usage.
• Scheduling problems can occur in this architecture.
• A superscalar processor increases the complexity of the hardware design.
• Instructions are fetched in their sequential program order, but this is not always the best execution order.

Applications of Superscalar Processors

Applications of a superscalar processor include the following:
• Superscalar execution is frequently used in laptops and desktops.
• The processor scans the program in execution to discover sets of instructions that can be executed together.
• A superscalar processor includes several copies of the data path hardware, which execute several instructions at once.
• The processor is mainly designed to achieve an execution rate of more than one instruction per clock cycle for a single sequential program.
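
The cycle counts quoted in the pipelining examples (seven clock periods for four instructions on the scalar pipeline, five on the 2-issue superscalar pipeline) follow from a simple count, sketched here. The sketch assumes an ideal four-stage pipeline with no hazards or stalls; the function names are illustrative only.

```python
import math

STAGES = ("F", "D", "E", "W")  # fetch, decode, execute, write-back


def pipeline_cycles(n_instr, width=1, stages=len(STAGES)):
    """Clock periods needed to finish n_instr instructions with no stalls."""
    groups = math.ceil(n_instr / width)  # issue groups entering the pipeline
    return stages + groups - 1


def timing_rows(n_instr, width=1):
    """One text row per instruction showing its stage in each clock period."""
    total = pipeline_cycles(n_instr, width)
    rows = []
    for i in range(n_instr):
        start = i // width               # 0-based clock period of the fetch
        cells = ["."] * total
        for s, name in enumerate(STAGES):
            cells[start + s] = name
        rows.append(f"I{i + 1} " + " ".join(cells))
    return rows


if __name__ == "__main__":
    for width in (1, 2):
        print(f"issue width {width}: {pipeline_cycles(4, width)} clock periods")
        print("\n".join(timing_rows(4, width)))
```

With width=1 the four instructions start in consecutive clock periods, so the last one finishes in period 7. With width=2 the instructions enter in pairs, so the second pair finishes in period 5.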
Introduction to VLIW Architecture

The limitations of the superscalar processor become prominent as the task of scheduling instructions grows complex. The issues of
• intrinsic parallelism in the instruction stream,
• complexity,
• cost, and
• the branch instruction issue
are addressed by a different instruction set architecture called the Very Long Instruction Word (VLIW) architecture, or VLIW machines.

VLIW uses instruction-level parallelism, i.e., the program itself controls the parallel execution of the instructions.

In other architectures, the performance of the processor is improved by using one of the following methods:
• pipelining (breaking the instruction into subparts),
• parallel processing (independently executing instructions in different parts of the processor),
• out-of-order execution (executing instructions in an order different from program order).
But each of these methods adds a great deal of complexity to the hardware. VLIW architecture deals with this by depending on the compiler. The compiled program decides the parallel flow of the instructions and resolves conflicts. This increases compiler complexity but decreases hardware complexity by a lot.

Features of VLIW Architecture

The processors in this architecture have multiple functional units and fetch a Very Long Instruction Word from the instruction cache. Multiple independent operations are grouped together in a single VLIW instruction, and they are initiated in the same clock cycle. Each operation is assigned to an independent functional unit. All the functional units share a common register file. Instruction words are typically 64 to 1024 bits long, depending on the number of execution units and the code length required to control each unit.
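
Grouping independent operations into one long instruction word can be sketched as a small compile-time packing routine. This is an illustration under stated assumptions, not how any particular VLIW compiler works: the operation format (name, functional-unit type, set of dependencies), the unit counts and the greedy policy are all hypothetical.

```python
def pack_bundles(ops, units):
    """Greedy static scheduling of operations into VLIW bundles.

    ops:   list of (name, unit_type, set_of_dependency_names), program order
    units: dict mapping unit_type -> number of functional units of that type
    Returns a list of bundles; each bundle lists the operations that are
    issued together in one clock cycle.
    """
    done = set()            # operations placed in earlier bundles
    pending = list(ops)
    bundles = []
    while pending:
        free = dict(units)  # every functional unit is free in a new bundle
        bundle, rest = [], []
        for name, unit, deps in pending:
            # An operation fits only if its inputs come from earlier bundles
            # and a functional unit of the right type is still free.
            if deps <= done and free.get(unit, 0) > 0:
                free[unit] -= 1
                bundle.append(name)
            else:
                rest.append((name, unit, deps))
        if not bundle:
            raise ValueError("unsatisfiable dependency or missing unit type")
        done.update(bundle)
        bundles.append(bundle)
        pending = rest
    return bundles


if __name__ == "__main__":
    ops = [
        ("a", "alu", set()),
        ("b", "alu", set()),
        ("c", "mul", {"a", "b"}),
        ("d", "mem", set()),
        ("e", "alu", {"c"}),
    ]
    print(pack_bundles(ops, {"alu": 2, "mul": 1, "mem": 1}))
```

For this hypothetical input the routine produces three bundles: a, b and d together, then c, then e. The second and third bundles leave functional units idle, because no independent operation is available to fill them.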
• Instruction scheduling and parallel dispatch of the word are done statically by the compiler.
• The compiler checks for dependencies before scheduling instructions for parallel execution.

Applications of VLIW Architecture
Some common applications of VLIW architecture include:
• Digital Signal Processing
• Multimedia Processing
• Scientific Computing
• Embedded Systems

Digital Signal Processing (DSP):
• VLIW processors are well suited to DSP applications because of their ability to perform multiple operations in parallel.
• DSP applications require high computational power and often involve multiple parallel data streams, which VLIW processors can handle efficiently.

Multimedia Processing:
• VLIW processors are also used for multimedia applications such as video and audio processing, where high throughput and parallelism are required.

Scientific Computing:
• VLIW processors can be used for scientific computing applications, where high-performance computing is required to solve complex numerical problems.

Embedded Systems:
• VLIW processors are used in many embedded systems, such as automotive control systems, medical devices, and industrial automation equipment.
• These systems require high-performance processors that can execute multiple instructions in parallel while consuming minimal power.

Advantages of VLIW Architecture
• Reduces hardware complexity.
• Reduces power consumption because of the reduced hardware complexity.
• Since the compiler takes care of data dependency checks, decoding, and instruction issue, the hardware becomes a lot simpler.
• Increases the potential clock rate.
• The compiler assigns operations to functional units according to their position in the instruction packet.

Disadvantages of VLIW Architecture
• Complex compilers are required, and they are hard to design.
• Program code size increases.
• Unscheduled events, for example a cache miss, can cause a stall that halts the entire processor.
• Unfilled operation slots in a VLIW instruction waste memory space and instruction bandwidth.

Vector Processor
• A vector processor is a central processing unit that can process a complete vector input with a single instruction.
• More specifically, it is a complete unit of hardware resources that processes a sequential set of similar data items in memory using a single instruction.
• The elements of the vector are ordered so that they occupy successive memory addresses, which is why the data is processed sequentially.
• It has a single control unit but multiple execution units that perform the same operation on different data elements of the vector.
• Unlike a scalar processor, which operates on only a single pair of operands at a time, a vector processor operates on multiple pairs of operands.
• Scalar code can be converted into vector code. This conversion process is known as vectorization.
• Vector processing allows a single instruction to operate on multiple data elements.
• Such instructions are called single instruction, multiple data (SIMD) instructions or vector instructions.
• Recent CPUs make use of vector processing because it offers advantages over scalar processing.

Architecture and Working
[Figure: typical diagram of vector processing by a vector computer]
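The vectorization idea described above, where scalar code that handles one pair of operands at a time is converted into vector code that handles many pairs per instruction, can be sketched in plain Python. This is only a model under stated assumptions: the function names, the vector length `vlen=4`, and the `issued` counter (which stands in for the number of instructions fetched) are hypothetical and do not correspond to any particular machine.

```python
def scalar_add(a, b):
    """Scalar style: one ADD instruction is issued per pair of operands."""
    result, issued = [], 0
    for x, y in zip(a, b):
        result.append(x + y)  # modelled as one scalar ADD
        issued += 1
    return result, issued


def vector_add(a, b, vlen=4):
    """Vector style: one vector ADD handles up to `vlen` pairs of operands."""
    result, issued = [], 0
    for start in range(0, len(a), vlen):
        chunk_a = a[start:start + vlen]
        chunk_b = b[start:start + vlen]
        # The whole chunk is modelled as a single vector ADD instruction.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issued += 1
    return result, issued


# Assumed example data: two vectors of eight elements each.
A = list(range(8))
B = [10] * 8

scalar_result, scalar_issued = scalar_add(A, B)
vector_result, vector_issued = vector_add(A, B)

print(scalar_result == vector_result)  # both styles compute the same sums
print(scalar_issued, vector_issued)    # instructions issued by each style
```

Both versions produce the same sums, but the vector version issues far fewer instructions for the same work, which is the source of the reduced instruction bandwidth attributed to vector processors.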
• The functional units of a vector computer are as follows:
    • Instruction Processing Unit (IPU)
    • Vector register
    • Scalar register
    • Scalar processor
    • Vector instruction controller
    • Vector access controller
    • Vector processor
• Because a vector computer has several functional pipelines, it can execute instructions over the operands.
• Both data and instructions reside in memory at their respective locations.
• The instruction processing unit (IPU) fetches the instruction from memory.
• Once the instruction is fetched, the IPU determines whether it is scalar or vector in nature.
• If it is scalar, the instruction is transferred to the scalar register, and scalar processing is then performed.
• If it is vector, the instruction is fed to the vector instruction controller.
• The vector instruction controller first decodes the vector instruction and then determines the address of the vector operand in memory.
• It then signals the vector access controller that the operand is required.
• The vector access controller fetches the desired operand from memory.
• Once the operand is fetched, it is provided to the instruction register so that it can be processed by the vector processor.
• When multiple vector instructions are present, the vector instruction controller passes them to the task system.
• If the task system finds that a vector task is very long, the processor divides the task into sub-vectors.
• These sub-vectors are fed to the vector processor, which uses several pipelines to execute the instruction over the operands fetched from memory at the same time.
• The various vector instructions are scheduled by the vector instruction controller.

Classification of Vector Processor
• The classification of vector processors relies on how vectors are formed and on the presence of vector instructions for processing.
• Based on these criteria, vector processor architecture is classified as follows:
    • Register to Register Architecture
    • Memory to Memory Architecture

Register to Register Architecture
• This architecture is widely used in vector computers.
• Operands and previous results are fetched from main memory indirectly, through registers.
• The several vector pipelines in the vector computer retrieve data from the registers and store the results back in the registers.
• The vector registers are programmable by user instructions: data is fetched from and stored in the register whose address appears in the instruction.
• The vector registers have a fixed length, like the registers in a normal processing unit.
• Some examples of supercomputers using the register to register architecture:
    • Cray-1
    • Fujitsu

Memory to Memory Architecture
• In memory to memory architecture, operands and results are fetched directly from memory instead of going through registers.
• Note that the address of the data to be accessed must be present in the vector instruction.
• This architecture enables fetching data of size 512 bits from memory to the pipeline.
• However, because of the high memory access time, the pipelines of the vector computer require a longer startup time, as more time is needed to initiate the vector instruction.
• Some examples of supercomputers that use the memory to memory architecture:
    • Cyber 205
    • CDC

Characteristics of Vector Processor
• Vector processors are designed to process multiple data elements in parallel, while scalar processors process one data element at a time.
• Vector processors can be more efficient, as they can complete a given task with fewer instructions than a scalar processor.
• Vector processors are more complex than scalar processors and require more memory and power to operate.
• Vector processors are used for more demanding tasks, such as scientific calculations and 3D game rendering, while scalar processors are used for simpler tasks, such as basic calculations and web browsing.
• Vector processors are more suitable for data-intensive applications, while scalar processors are better suited to applications that require fewer calculations.
• Vector processors can be more expensive than scalar processors, as they require more complex hardware and software.
• Register to register architecture is better than memory to memory architecture because it reduces vector access time.

Advantages of Vector Processor
• Better performance
• Highly parallel
• High memory bandwidth
• Reduced software overhead
• Improved accuracy

Better Performance:
• Vector processors can process multiple operations simultaneously, increasing the speed of calculations.
Highly Parallel:
• Vector processors are able to handle multiple operations in parallel, allowing for faster computations.

High Memory Bandwidth:
• Vector processors are able to access large amounts of data at once, increasing the speed of computations.

Reduced Software Overhead:
• Vector processors can reduce the amount of software code needed to complete tasks, saving time and resources.

Improved Accuracy:
• Vector hardware typically provides high-precision floating-point arithmetic, making it well suited to applications that require precision.

Further advantages:
• A vector processor uses vector instructions, by which the code density of the instructions can be improved.
• The sequential arrangement of data helps the hardware handle the data in a better way.
• It offers a reduction in instruction bandwidth.

Applications of Vector Processor
• Computer-Aided Design
• Image Processing
• Virtual Reality
• Scientific Computing
• Artificial Intelligence
• Data Analysis

Computer-Aided Design (CAD):
• CAD software allows for the creation of realistic 3D models, which can be used for product design, engineering, and architecture.
• The power of a vector processor makes it easier to manipulate complex models and make changes quickly.

Image Processing:
• Vector processors are used to manipulate and analyze images.
• This can include tasks such as edge detection, object recognition, and facial recognition.

Virtual Reality:
• Vector processors are used to render realistic 3D graphics in virtual reality applications.
• This gives users a more immersive experience when interacting with virtual environments.
Scientific Computing:
• Vector processors are used to perform complex calculations in scientific computing applications.
• This can include tasks such as calculating weather patterns or running complex simulations.

Artificial Intelligence:
• Vector processors are used to help train and test neural networks for artificial intelligence applications.
• This can include tasks such as facial recognition, object recognition, and natural language processing.

Data Analysis:
• Vector processors are used to analyze large amounts of data quickly.
• This can include tasks such as analyzing customer data or financial data.
