
Computer Architecture UNIT 4 Complete

UNIT-4
Contents

Multi-Core Processors

Super Scalar Processors

Very Long Instruction Word (VLIW)

Vector Processors
Multi-core Processors


A multicore processor is an integrated circuit that has two or
more processor cores attached for
•
Enhanced performance and
•
Reduced power consumption.
These processors also enable more efficient simultaneous
processing of multiple tasks,
•
such as parallel processing and multithreading.
Multi-core Processors Contd..

A dual core setup is similar to having multiple, separate
processors installed on a computer.

However, because the two processors are plugged into the same
socket,
•
the connection between them is faster.
Multi-core Processors Contd..

The use of multicore processors is one approach to boost
processor performance
•
without exceeding the practical limitations of semiconductor
design and fabrication.

Using multicore processors also ensures safe operation in areas such as heat
generation.
How do multicore processors work?

The heart of every processor is an execution engine, also known
as a core.

The core is designed to process instructions and data according
to
•

the software programs in the computer's memory.
Over the years, designers found that every new processor design
has limits.
How do multicore processors work?
Contd..

Numerous technologies were utilized to accelerate
performance, such as the following:
•
Clock Speed
•
Hyper-threading
•
More Chips
How do multicore processors work?
(Clock Speed)
Clock Speed:

One approach is to make the processor's clock faster.

The clock is the "drumbeat" used to
•

synchronize the processing of instructions and data
through the processing engine.
Clock speeds have accelerated from several megahertz (MHz)
to several gigahertz (GHz) nowadays.
How do multicore processors work?
(Clock Speed) Contd..

However, transistors use up power with each clock tick.

As a result, clock speeds have nearly reached their limits
•
using current semiconductor fabrication and heat
management techniques.
How do multicore processors work?
(Hyper Threading)
Hyper Threading:

Another approach involved the handling of multiple instruction
threads.

Intel calls this hyper-threading.

With hyper-threading, processor cores are designed to
•
handle two separate instruction threads at the same time.
How do multicore processors work?
(Hyper Threading) Contd..

When properly enabled and supported by both the computer's
firmware and operating system,
•

hyper-threading techniques enable one physical core to
function as two logical cores.
Still, the processor only possesses a single physical core.
How do multicore processors work?
(Hyper Threading) Contd..

The logical abstraction of the physical processor added little real
performance to the processor
•
other than to help streamline the behavior of multiple
simultaneous applications running on the computer.
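As a small illustration of the logical-core abstraction described above, the C++ sketch below queries how many logical processors the operating system exposes; on a hyper-threaded CPU this number is typically twice the count of physical cores.

```cpp
// Minimal sketch: query the number of logical processors visible to software.
// On a hyper-threaded CPU this is typically 2x the number of physical cores.
#include <iostream>
#include <thread>

int main() {
    unsigned int logical = std::thread::hardware_concurrency();
    if (logical == 0) {
        // The standard allows 0 when the value cannot be determined.
        std::cout << "Logical processor count could not be determined\n";
    } else {
        std::cout << "Logical processors: " << logical << '\n';
    }
    return 0;
}
```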
How do multicore processors work?
(More Chips)
More Chips:

The next step is to add processor chips to the processor package,
•
which is the physical device that plugs into the
motherboard.

A dual-core processor includes two separate processor cores.

A quad-core processor includes four separate cores.
How do multicore processors work?
Contd..

Today's multicore processors can easily include 12, 24 or even
more processor cores.

The multicore approach is almost identical to the use of
multiprocessor motherboards,
•
which have two or four separate processor sockets.
How do multicore processors work?
Contd..

Today's high processor performance involves the use of
processor products
•
that combine fast clock speeds and multiple hyperthreaded cores.
How do multicore processors work?
Contd..

However, multicore chips have several issues to consider.

First, the addition of more processor cores doesn't automatically
improve computer performance.

The OS and applications must direct software program
instructions to
•
recognize and use the multiple cores.
How do multicore processors work?
Contd..

This must be done in parallel,
•
using various threads to different cores within the
processor package.

Some software applications may need to be refactored to
•
support and use multicore processor platforms.

Otherwise, only the default first processor core is used, and any
additional cores are unused or idle.
How do multicore processors work?
Contd..

Second, the performance benefit of additional cores is not a
direct multiple.

That is, adding a second core does not double the processor's
performance,
•
nor does a quad-core processor multiply the processor's
performance by a factor of four.
How do multicore processors work?
Contd..

This happens because of the shared elements of the processor,
such as access to
•
Internal memory or caches,
•
External buses,
•
and Computer system memory.
How do multicore processors work?
Contd..

The benefit of multiple cores can be substantial, but there are
practical limits.

Still, the acceleration is typically better than a traditional
multiprocessor system because
•
the coupling between cores in the same package is tighter and
•
there are shorter distances and fewer components between
cores.
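One common way to quantify the "not a direct multiple" point above is Amdahl's law, which the slides do not name explicitly; the hedged C++ sketch below assumes a fixed parallelizable fraction p and computes the resulting upper bound on speedup.

```cpp
// Minimal sketch (assumption: Amdahl's law, not named in the slides):
// speedup(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction.
#include <cstdio>

double speedup(double parallel_fraction, int cores) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}

int main() {
    // With 90% of the work parallelizable, extra cores give less than a direct multiple.
    std::printf("2 cores: %.2fx\n", speedup(0.9, 2));  // ~1.82x
    std::printf("4 cores: %.2fx\n", speedup(0.9, 4));  // ~3.08x
    std::printf("8 cores: %.2fx\n", speedup(0.9, 8));  // ~4.71x
    return 0;
}
```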
How do multicore processors work?
Contd..

Consider the analogy of cars on a road.

Each car can be considered as a processor,
•

but each car must share the common roads and traffic
limitations.
More cars can transport more people and goods in a given time,
•
but more cars also cause congestion.
Why are multicore processors used?

Multicore processors work on any modern computer hardware
platform.

Virtually all PCs and laptops today include some multicore
processor model.

However, the true power and benefit of these processors depend on
•
software applications designed to emphasize parallelism.
Why are multicore processors used?
Contd..

A parallel approach divides application work into numerous
processing threads,
•
and then distributes and manages those threads across
two or more processor cores.
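The C++ sketch below is a minimal illustration of this parallel approach: an array sum is divided into chunks and each chunk is handed to its own thread, one per available logical processor. It is a simplified example, not a production work-distribution scheme.

```cpp
// Minimal sketch: divide an array sum into chunks and give each chunk to its
// own thread, one thread per available logical processor.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1'000'000, 1);
    unsigned int workers = std::max(1u, std::thread::hardware_concurrency());

    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    std::size_t chunk = data.size() / workers;

    for (unsigned int i = 0; i < workers; ++i) {
        std::size_t begin = i * chunk;
        std::size_t end = (i + 1 == workers) ? data.size() : begin + chunk;
        threads.emplace_back([&, i, begin, end] {
            // Each thread sums its own slice; the results are combined afterwards.
            partial[i] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& t : threads) t.join();

    std::cout << "Sum = " << std::accumulate(partial.begin(), partial.end(), 0LL) << '\n';
    return 0;
}
```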
Major Use cases for Multicore Processors
There are several major use cases for multicore processors,
including the following five:
•
Virtualization
•
Databases
•
Analytics & HPC
•
Cloud
•
Visualization
Major Use cases for Multicore Processors
(Virtualization)
Virtualization:

A virtualization platform, such as VMware, is designed to
•

abstract the software environment from the underlying
hardware.
Virtualization is capable of abstracting physical processor cores into
•
virtual processors or central processing units (vCPUs)
❑
which are then assigned to Virtual Machines (VMs).
Major Use cases for Multicore Processors
(Virtualization) Contd..

Each VM becomes a virtual server capable of running its own
OS and application.

It is possible to assign more than one vCPU to each VM,
•
allowing each VM and its application to run parallel
processing software if required.
Major Use cases for Multicore Processors
(Databases)
Databases:

A database is a complex software platform that frequently needs
to run many simultaneous tasks such as queries.

As a result, databases are highly dependent on multicore
processors to
•
distribute and handle these many task threads.
Major Use cases for Multicore Processors
(Databases) Contd..

The use of multiple processors in databases is often coupled with
extremely high memory capacity
•
that can reach 1 terabyte or more on the physical server.
Major Use cases for Multicore Processors
(Analytics & HPC)
Analytics and HPC:

Big data analytics, such as
•
machine learning and High Performance Computing
(HPC) both require
❑
breaking large & complex tasks into smaller and
more manageable pieces.
Major Use cases for Multicore Processors
(Analytics & HPC) Contd..

Each piece of the computational effort can then be solved by
•

distributing each piece of the problem to a different
processor.
This approach enables each processor to work in parallel to
•
solve the overarching problem far faster and more
efficiently than with a single processor.
Major Use cases for Multicore Processors
(Cloud)
Cloud:

Organizations building a cloud adopt multicore processors to
•
support all the virtualization needed to
❑
accommodate the highly scalable,
❑
and highly transactional demands of cloud
software platforms such as OpenStack.
Major Use cases for Multicore Processors
(Cloud) Contd..

A set of servers with multicore processors can allow the cloud to
•
create and scale up more VM instances on demand.
Major Use cases for Multicore Processors
(Visualization)
Visualization:

Graphics applications, such as games and data-rendering engines,
•

have the same parallelism requirements as other HPC
applications.
Visual rendering is task-intensive,
•
So visualization applications can make extensive use of
multiple processors to distribute the calculations required.
Major Use cases for Multicore Processors
(Visualization) Contd..

Many graphics applications rely on Graphics Processing Units
(GPUs) rather than CPUs.

GPUs are tailored to optimize graphics-related tasks.

GPU packages often contain multiple GPU cores, similar in
principle to multicore processors.
Pros and cons of multicore processors

Multicore processor technology is mature and well-defined.

However, the technology poses its share of pros and cons,
•
which should be considered when buying and deploying
new servers.
Advantages of Multicore Processor
Some of the advantages of multicore processors are following:

Better application performance

Better hardware performance
Advantages of Multicore Processor
(Better Application Performance)
Better Application Performance:

The principal benefit of multicore processors is more potential
processing capability.

Each processor core is effectively a separate processor that
OSes and applications can use.

In a virtualized server, each VM can employ one or more
virtualized processor cores,
•
enabling many VMs to coexist and operate
simultaneously on a physical server.
Advantages of Multicore Processor Contd..
(Better Application Performance)

Similarly, an application designed for high levels of parallelism
may use any number of cores to
•
provide high application performance that would be
impossible with single-chip systems.
Advantages of Multicore Processor
(Better Hardware Performance)
Better Hardware Performance:

By placing two or more processor cores on the same device, the
processor can use shared components such as
•
Common internal buses,
•
and Processor caches more efficiently.
Advantages of Multicore Processor Contd..
(Better Hardware Performance)

It also benefits from superior performance compared with
multiprocessor systems
•
that have separate processor packages on the same
motherboard.
Disadvantages of Multicore Processor
Some of the disadvantages of multicore processor are following:

Software dependent,

Performance boosts are limited,

Power, heat and clock restrictions.
Disadvantages of Multicore Processor
(Software Dependent)
Software Dependent:

The application uses the processors, not the other way around.

OSes and applications always default to using the first
processor core, dubbed core 0.

Any additional cores in the processor package will remain unused
or idle
•
until software applications are enabled to use them.
Disadvantages of Multicore Processor
(Software Dependent)
Contd..

Such applications include database applications and big data
processing tools like Hadoop.

A business should consider what a server will be used for and
the applications it plans to run
•
before making a multicore system investment
❑
to ensure that the system delivers its optimum
computing potential.
Disadvantages of Multicore Processor
(Performance boosts are limited)
Performance boosts are limited:

Multiple processors in a processor package must share common
system buses and processor caches.

The more processor cores share a package,
•
the more sharing takes place across common processor
interfaces and resources.
Disadvantages of Multicore Processor
(Performance boosts are limited)
Contd..

This results in diminishing returns to performance as cores are
added.

For most situations, the performance benefit of having multiple
cores
•
far outweighs the performance lost to such sharing,
❑
but it's a factor to consider when testing application
performance.
Disadvantages of Multicore Processor
(Power, heat and clock restrictions)
Power, heat and clock restrictions:

A computer may not be able to drive a processor with many
cores
•

as hard as a processor with fewer cores or a single-core
processor.
A modern processor core may contain over 500 million
transistors.
Disadvantages of Multicore Processor
(Power, heat and clock restrictions)
Contd..

Each transistor generates heat when it switches,
•

and this heat increases as the clock speed increases.
All of that heat generation must be safely dissipated
•
from the core through the processor package.
Disadvantages of Multicore Processor
(Power, heat and clock restrictions)
Contd..

When more cores are running,
•

this heat can multiply and quickly exceed the cooling
capability of the processor package.
Thus, some multicore processors may actually reduce clock
speeds, for instance
•
from 3.5 GHz to 3.0 GHz, to help manage heat.
Disadvantages of Multicore Processor
(Power, heat and clock restrictions)
Contd..

This reduces the performance of all processor cores in the
package.

High-end multicore processors require
•
complex cooling systems,
•
and careful deployment & monitoring
to ensure long-term system reliability.
Architecture of Multicore Processors
Architecture of Multicore Processors
Contd..
The components of multicore processors are as follows:

Cores

Processor Support

Caches
Architecture of Multicore Processors
(Cores)
Cores:

Every multicore processor consists of two or more cores along
with a series of caches.

Cores are the central component of multicore processors.
Architecture of Multicore Processors
(Cores) Contd..

Cores contain
•
all of the registers and circuitry,
•
sometimes hundreds of millions of individual transistors
needed to
❑
perform the closely-synchronized tasks of ingesting
data and instructions,
❑
processing content and outputting logical decisions or
results.
Architecture of Multicore Processors
(Processor Support)
Processor Support:

Processor support circuitry includes an assortment of
input/output control and management circuitry, such as
•
Clocks,
•
Cache consistency,
•
Power & thermal control,
•
and External bus access.
Architecture of Multicore Processors
(Caches)
Caches:

Caches are relatively small areas of very fast memory.

A cache retains often-used instructions or data,
•
making that content readily available to the core
❑
without the need to access system memory.
Architecture of Multicore Processors
(Caches) Contd..

A processor checks the cache first.

If the required content is present,
•

the core takes that content from the cache, enhancing
performance benefits.
If the content is absent, the core will access system memory for
the required content.
Architecture of Multicore Processors
(Caches) Contd..

Level 1, or L1, cache is the smallest and fastest cache unique to
every core.

Level 2, or L2, cache is a larger storage space shared among
the cores.

Some multicore processor architectures may dedicate both L1 and
L2 caches to each core.
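To make the check-the-cache-first flow above concrete, here is a toy C++ model of a direct-mapped cache; the class name and the single-level design are illustrative assumptions and do not correspond to any specific processor's cache implementation.

```cpp
// Toy sketch (assumption: a simple direct-mapped, single-level cache model).
// It follows the flow described above: check the cache first, fall back to memory on a miss.
#include <cstddef>
#include <cstdint>
#include <vector>

struct CacheLine {
    bool valid = false;
    std::uint64_t tag = 0;
    int data = 0;
};

class ToyCache {
public:
    ToyCache(std::size_t num_lines, const std::vector<int>& memory)
        : lines_(num_lines), memory_(memory) {}

    // 'address' is assumed to be a valid index into the backing memory.
    int read(std::uint64_t address) {
        std::uint64_t index = address % lines_.size();
        std::uint64_t tag = address / lines_.size();
        CacheLine& line = lines_[index];
        if (line.valid && line.tag == tag) {
            return line.data;            // cache hit: content served from the cache
        }
        line.valid = true;               // cache miss: fetch from "system memory"
        line.tag = tag;
        line.data = memory_[address];
        return line.data;
    }

private:
    std::vector<CacheLine> lines_;
    const std::vector<int>& memory_;
};
```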
Homogeneous vs. Heterogeneous
Multicore Processors
Homogeneous vs. heterogeneous multicore processors:

The cores within a multicore processor may be homogeneous or
heterogeneous.

Mainstream Intel and AMD multicore processors
for x86 computer architectures
•
are homogeneous and provide identical cores.
Homogeneous vs. Heterogeneous
Multicore Processors
Contd..

However, dedicating a complex device to do a simple job is often
wasteful when the goal is the greatest efficiency.

There is a heterogeneous multicore processor market
•
that uses processors with different cores for different
purposes.
Homogeneous vs. Heterogeneous
Multicore Processors
Contd..

Heterogeneous cores are generally found in embedded or Arm
processors that
•
might mix microprocessor and microcontroller cores in
the same package.
Goals for Heterogeneous Multicore Processors
There are three general goals for heterogeneous multicore
processors:

Optimized performance

Optimized power

Optimized security
Goals for Heterogeneous Multicore Processors
(Optimized Performance)
Optimized Performance:

While homogeneous multicore processors are typically intended
to
•
provide universal processing capabilities,
❑
many processors are not intended for such
generic system use cases.
Goals for Heterogeneous Multicore Processors
(Optimized Performance) Contd..

Instead, they are designed and sold for use in embedded,
dedicated or task-specific systems
•
that can benefit from the unique strengths of different
processors.
Goals for Heterogeneous Multicore Processors
(Optimized Performance) Contd..

For example, a processor intended for a signal processing device
•
might use an Arm processor
❑
that contains a Cortex-A general-purpose processor,
❑
with a Cortex-M core for dedicated signal processing
tasks.
Goals for Heterogeneous Multicore Processors
(Optimized Power)
Optimized Power:

Providing simpler processor cores reduces the transistor count
and eases power demands.

This makes the processor package and the overall system cooler
and more power-efficient.
Goals for Heterogeneous Multicore Processors
(Optimized Security)
Optimized Security:

Jobs or processes can be divided among different types of cores,
•
enabling designers to deliberately build high levels of
isolation
❑
that tightly control access among the various
processor cores.
Goals for Heterogeneous Multicore Processors
(Optimized Security) Contd..

This greater control and isolation offer better stability and
security for the overall system,
•
though at the cost of general flexibility.
Examples of Multicore Processors
Examples of multicore processors:

Most modern processors designed and sold for general-purpose
x86 computing include multiple processor cores.

Examples of the latest Intel 12th-generation multicore
processors include the following:
•
Intel Core i9 12900 family provides 16 cores and 24 threads.
•
Intel Core i7 12700 family provides 12 cores and 20 threads.
•
Intel Core i5 12600 processors offer 6 to 10 cores and up to 16 threads.
Examples of Multicore Processors
Contd..

Examples of latest AMD Zen multicore processors include:
•
AMD Zen 3 family (provides 4 to 16 cores).
•
AMD Zen 2 family (provides up to 64 cores).
•
AMD Zen+ family (provides 4 to 32 cores).
Superscalar Processor

The first commercial single-chip superscalar microprocessor,
the MC88100, was developed by Motorola in 1988.

Later, Intel introduced its version, the i960CA, in 1989 &
•
AMD introduced its 29000-series in 1990.
Superscalar Processor
Contd..

The implementations of superscalar processors are, however, heading
toward increasing complexity.

The design of these processors normally refers to a set of methods
that permit the CPU of a computer to
•
attain a throughput of more than one instruction per cycle
❑
while executing a single sequential
program.
What is Superscalar Processor?
What is Superscalar Processor?

A type of microprocessor that is used to implement a type of
parallelism
•
known as instruction-level parallelism in a single processor
to
❑
execute more than one instruction during a clock
cycle by
▪
simultaneously dispatching various instructions
to special execution units on the processor.
What is Superscalar Processor?
Contd..

A scalar processor executes a single instruction for each clock
cycle;

A superscalar processor can execute more than one instruction
during a clock cycle.
Features of Superscalar Processors
Features of superscalar processors include the following:

Superscalar architecture is a parallel computing technique
utilized in various processors.

In a superscalar computer, the CPU manages several instruction
pipelines to
•
perform numerous instructions simultaneously during a
clock cycle.
Features of Superscalar Processors
Contd..

Superscalar architectures include all pipelining features,
•
although several instructions execute
simultaneously within the same pipeline.
Superscalar design methods normally comprise
•
Parallel register renaming,
•
Parallel instruction decoding,
•
Speculative execution & out-of-order execution.
Features of Superscalar Processors
Contd..

So, these methods are normally used with complementary
design methods like
•
Caching,
•
Pipelining,
•
Branch prediction & multi-core in recent microprocessor
designs.
Superscalar Processor Architecture

A superscalar processor is a CPU that
•
executes more than one instruction for each clock cycle
because
❑
processing speeds are simply measured in clock
cycles per second.
Compared to a scalar processor, this processor is faster.
Superscalar Processor Architecture
Contd..

Superscalar processor architecture mainly includes parallel
execution units
•
where these units can implement instructions
simultaneously.

So first, this parallel architecture was implemented within a
Reduced Instruction Set Computer (RISC) processor that
•
utilizes simple & short instructions to execute
calculations.
Superscalar Processor Architecture
Contd..

Due to their superscalar abilities,
•
normally, Reduced Instruction Set Computer
(RISC) processors have performed better as compared to
❑
Complex Instruction Set Computer (CISC) processors
which run at the same megahertz.

But most CISC processors now, like the Intel Pentium, comprise
some RISC architecture also,
•
which allows them to perform instructions in parallel.
Superscalar Processor Architecture
Contd..
Superscalar Processor Architecture
Contd..

The superscalar processor is equipped with several processing
units for handling
•

various instructions in parallel in every processing stage.
By using the above architecture,
•
a number of instructions start execution within a similar
clock cycle.
Superscalar Processor Architecture
Contd..

These processors are capable of
•
obtaining an instruction execution throughput of more than one
instruction for each cycle.
In the previous architecture diagram, a processor is used with
two execution units
•
where one is used for integer operations & the other one is
used for floating-point operations.
Superscalar Processor Architecture
Contd..

The Instruction Fetch Unit (IFU) is capable of
•
reading multiple instructions at a time & storing them within the
instruction queue.
In every cycle, the dispatch unit fetches & decodes
•
up to 2 instructions from the queue front.
Superscalar Processor Architecture
Contd..

If there is a single integer instruction, a single floating-point
instruction & no hazards,
•
then both instructions are dispatched within the same
clock cycle.
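The hedged C++ sketch below illustrates the "no hazards" condition in source-level terms: the first pair of statements is independent (one integer, one floating-point operation) and could be dispatched together by a 2-issue core like the one above, while the second pair has a read-after-write dependency; the actual dispatch decision is, of course, made by the hardware.

```cpp
// Hedged illustration of the dual-dispatch condition described above.
// The comments describe how a 2-issue core (one integer unit, one FP unit)
// could treat each statement pair; real scheduling is done by the hardware.

void independent(int a, int b, double x, double y, int& i_out, double& f_out) {
    i_out = a + b;   // integer add        } no data dependency: the integer and
    f_out = x * y;   // floating-point mul } FP operations can issue in the same cycle
}

void dependent(int a, int b, int& out) {
    int t = a + b;   // produces t
    out = t * 2;     // consumes t: a read-after-write hazard, so this operation
                     // must wait for the previous one to complete
}
```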
Scalar Pipelining
Pipelining:

Pipelining is the procedure of breaking down tasks into sub-steps &
•

executing them within different processor parts.
Pipelining architecture in the scalar processor and the superscalar
processor is shown in the next slides.
Scalar Pipelining Contd..
Scalar Pipelining Contd..

In the previous pipeline architecture, F is fetch, D is decode,
E is execute and W is register write-back.

In this pipeline architecture, I1, I2, I3 & I4 are instructions.

The scalar processor pipeline architecture includes a single
pipeline and
•
four stages: fetch, decode, execute & result write-back.
Scalar Pipelining Contd..


In the single-pipeline scalar processor, the pipeline for
instruction 1 (I1) works as follows;
•
in the first clock period I1 will fetch, in the second clock
period it will decode and
•
in the second clock period, I2 will fetch.

In the third clock period, the third instruction I3 will fetch, I2
will decode and I1 will execute.
Scalar Pipelining Contd..

In the fourth clock period, I4 will fetch, I3 will decode, I2 will
execute and I1 will write back its result.

So, in seven clock periods, 4 instructions are executed in a single
pipeline.
Super scalar Pipelining

The instructions in a superscalar processor are issued from a
sequential instruction stream.

It must allow multiple instructions for each clock cycle and
•
the CPU must check dynamically for data dependencies
between instructions.
Super scalar Pipelining Contd..

In the following superscalar pipeline,
•
two instructions can be fetched and dispatched at a time to
❑
complete a maximum of 2 instructions per cycle.
Super scalar Pipelining
Super scalar Pipelining Contd..

The superscalar processor pipeline architecture includes
•

two pipelines and four stages: fetch, decode, execute &
result write-back.
It is a 2-issue superscalar processor which means
•
at a time two instructions will fetch, decode, execute and
result write back.
Super scalar Pipelining Contd..

The two instructions I1 & I2 will together fetch, decode,
execute and write back, one stage in each clock period.

Similarly, starting in the next clock period,
•
the remaining two instructions I3 & I4 will together
fetch, decode, execute and write back.

So, in five clock periods, it will execute 4 instructions in the
two pipelines.
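The cycle counts worked out above (7 clock periods for the scalar pipeline, 5 for the 2-issue superscalar pipeline) follow from a simple formula for an ideal, hazard-free pipeline; the C++ sketch below computes it for both cases.

```cpp
// Minimal sketch of the cycle counts worked out above, assuming an ideal
// pipeline with no hazards: completion time = stages + (issue groups - 1).
#include <cstdio>

int pipeline_cycles(int instructions, int stages, int issue_width) {
    int groups = (instructions + issue_width - 1) / issue_width;  // ceiling division
    return stages + (groups - 1);
}

int main() {
    // Scalar (1-issue): 4 instructions, 4 stages -> 7 clock periods.
    std::printf("scalar:      %d cycles\n", pipeline_cycles(4, 4, 1));
    // 2-issue superscalar: 4 instructions, 4 stages -> 5 clock periods.
    std::printf("superscalar: %d cycles\n", pipeline_cycles(4, 4, 2));
    return 0;
}
```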
Super scalar Pipelining Contd..

A scalar processor issues a single instruction per clock cycle and
•
performs a single pipeline stage per clock cycle,

whereas a superscalar processor issues two instructions per
clock cycle in the previous example and
•
it executes two instances of each stage in parallel.
Super scalar Pipelining Contd..

So, the instruction execution in a scalar processor takes more
time
•
whereas in a superscalar it takes less time to execute
instructions.
Types of Superscalar Processors

Some of the different types of superscalar processors are as
follows:
•
Intel Core i7 processor
•
Intel Pentium Processor
•
IBM Power PC601
Types of Superscalar Processors
(Intel Core i7 Processor)
Intel Core i7 Processor:

Intel Core i7 is a superscalar processor that is based on the
Nehalem micro-architecture.

In a Core i7 design, there are various processor cores where
every processor core is a superscalar processor.

This is the fast version of the Intel processor used in consumer-end computers & devices.
Types of Superscalar Processors
(Intel Core i7 Processor)
Contd..

Similar to the Intel Core i5, this processor comes embedded with Intel
Turbo Boost Technology.

This processor is available in 2- to 6-core varieties which support up
to 12 different threads at once.
Types of Superscalar Processors
(Intel Core i7 Processor)
Contd..
Types of Superscalar Processors
(Intel Pentium Processor)
Intel Pentium Processor:

In the Intel Pentium processor's superscalar pipelined architecture,
•
the CPU executes two or more instructions
for each cycle.
This processor is widely used in personal computers.
Types of Superscalar Processors
(Intel Pentium Processor)
Contd..


Intel Pentium processor devices are normally built for
•
Online use,
•
Cloud computing,
•
& Collaboration.
So this processor works well for tablets and Chromebooks to
•
provide strong local performance & efficient online
interactions.
Types of Superscalar Processors
(Intel Pentium Processor)
Contd..
Types of Superscalar Processors
(IBM Power PC601)
IBM Power PC601:

The superscalar processor like IBM power PC601 is from the
family of PowerPC of RISC microprocessors.

This processor is capable of issuing as well as retiring three
instructions for each clock.
Types of Superscalar Processors
(IBM Power PC601)
Contd..

Instructions execute out of order for improved performance;
•
but the PC601 makes the execution appear to complete in order.
Types of Superscalar Processors
(IBM Power PC601)
Contd..
Types of Superscalar Processors
(IBM Power PC601)
Contd..

The power PC601 processor provides
•
32-bit logical addresses,
•
16- & 32-bit integer data types,
•
32- & 64-bit floating-point data types.
Types of Superscalar Processors
(IBM Power PC601)
Contd..

For the implementation of 64-bit PowerPC, the architecture of
this processor provides
•
64-bit floating-point data types, addressing & other
features necessary to
❑
complete the 64-bit based architecture.
Characteristics of Super scalar Processor
Superscalar processor characteristics include the following:

A superscalar processor is a super-pipelined model
•
where simply the independent instructions are
performed serially without any waiting situation.

A superscalar processor fetches & decodes at a time
•
several instructions of the incoming instruction stream.
Characteristics of Super scalar Processor
Contd..

The architecture of superscalar processor exploits
•
the potential of instruction-level parallelism.

Scalar processors mainly issue the single instruction for every
cycle.

The number of instructions issued mainly depends on
•
the instructions within the instruction stream.
Characteristics of Super scalar Processor
Contd..

Instructions are frequently reordered to fit the architecture of
the processor better.

The superscalar method is usually associated with some
identifying characteristics.

Instructions are normally issued from a sequential instruction
stream.
Characteristics of Super scalar Processor
Contd..

The CPU checks dynamically for data dependencies in between
instructions at run time.

The CPU executes multiple instructions for each clock cycle.
Advantages of Superscalar Processor
Advantages of the superscalar processor include the following:

A superscalar processor implements instruction-level
parallelism in a single processor.

These processors are simply made to perform any instruction
set.
Advantages of Superscalar Processor
Contd..

The superscalar processor including out-of-order execution,
branch prediction & speculative execution can
•
simply find parallelism across several basic blocks &
loop iterations.
Disadvantages of Superscalar Processor
Disadvantages of the superscalar processor include the following:

Superscalar processors are not used much in small embedded
systems due to power usage.

The problem with scheduling can happen in this architecture.

The superscalar processor increases the complexity level in the
design of the hardware.
Disadvantages of Superscalar Processor
Contd..

The instructions in this processor are simply fetched based on
their sequential program order
•
but this is not the best execution order.
Applications of Superscalar Processor
Applications of a superscalar processor include the following:

The superscalar execution is frequently used in a laptop or
desktop.

This processor simply scans the program in execution to
•
discover sets of instructions that can be executed as one.
Applications of Superscalar Processor
Contd..

A superscalar processor includes various data path hardware
copies
•

which execute various instructions at once.
This processor is mainly designed to achieve an execution
rate of more than one instruction for
•
each clock cycle of a single sequential program.
Introduction to VLIW Architecture

The limitations of the Superscalar processor are prominent
•
as the task of scheduling instructions becomes complex.
Introduction to VLIW Architecture
Contd..

Intrinsic parallelism in the instruction stream,

complexity,

cost,

and the branch instruction issue
•
get resolved by a higher instruction set architecture called
the Very Long Instruction Word (VLIW) or VLIW
Machines.
Introduction to VLIW Architecture
Contd..

VLIW uses Instruction Level Parallelism,
•
i.e., it has programs to control the parallel execution of
the instructions.
Introduction to VLIW Architecture
Contd..

In other architectures, the performance of the processor is
improved by using either of the following methods:
•
pipelining (break the instruction into subparts),
•
parallel processing (independently execute the instructions
in different parts of the processor),
•
out-of-order execution (execute instructions in a different
order than the program).
Introduction to VLIW Architecture
Contd..

But each of the previous methods adds a great deal of complexity to
the hardware.

VLIW Architecture deals with this by depending on the compiler.

The compiler decides the parallel flow of the instructions to
resolve conflicts.

This increases compiler complexity but greatly decreases hardware
complexity.
Features of VLIW Architecture
Features:

The processors in this architecture have multiple functional units,
•
which fetch Very Long Instruction Words from the
instruction cache.

Multiple independent operations are grouped together in a
single VLIW Instruction.

They are initialized in the same clock cycle.

Each operation is assigned an independent functional unit.
Features of VLIW Architecture
Contd..

All the functional units share a common register file.

Instruction words are typically of the length 64 to 1024 bits
depending on
•
the number of execution units,
•
and the code length required to control each unit.
Features of VLIW Architecture
Contd..

Instruction scheduling and parallel dispatch of the word is
done statically by the compiler.

The compiler checks for dependencies before scheduling
parallel execution of the instructions.
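The C++ sketch below shows a hypothetical layout of a VLIW instruction word with one slot per functional unit; the slot mix and field sizes are invented for illustration and do not describe any real VLIW instruction set.

```cpp
// Hypothetical sketch of a VLIW instruction word (not a real ISA): one wide
// word carries one operation slot per functional unit, all issued together.
#include <cstdint>

enum class Op : std::uint8_t { Nop, Add, Mul, Load, Store, Branch };

struct Slot {                 // one operation for one functional unit
    Op op = Op::Nop;
    std::uint8_t dest = 0;    // destination register
    std::uint8_t src1 = 0;
    std::uint8_t src2 = 0;
};

struct VliwWord {             // e.g. a 4-slot word: 2 integer ALUs, 1 load/store, 1 branch
    Slot int_alu0;
    Slot int_alu1;
    Slot load_store;
    Slot branch;
};

// The compiler, not the hardware, fills the slots after checking dependencies;
// units with no independent work for that cycle simply receive Op::Nop
// (the unfilled-slot waste mentioned later in these notes).
```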
Applications of VLIW Architecture
Some common applications of VLIW architecture include:

Digital Signal Processing

Multimedia Processing

Scientific Computing

Embedded Systems
Applications of VLIW Architecture
(Digital Signal Processing)
Digital Signal Processing (DSP):

VLIW processors are well-suited for DSP applications because of
•

their ability to perform multiple operations in parallel.
DSP applications require high computational power
•
and often involve multiple parallel data streams,
❑
which VLIW processors can handle efficiently.
Applications of VLIW Architecture
(Multimedia Processing)
Multimedia Processing:

VLIW processors are also used for multimedia applications such
as video and audio processing,
•
where high throughput and parallelism are required.
Applications of VLIW Architecture
(Scientific Computing)
Scientific Computing:

VLIW processors can be used for scientific computing
applications,
•
where high-performance computing is required to solve
complex numerical problems.
Applications of VLIW Architecture
(Embedded Systems)
Embedded Systems:

VLIW processors are used in many embedded systems, such as
•
Automotive control systems,
•
Medical devices,
•
and Industrial automation equipment.
Applications of VLIW Architecture
(Embedded Systems) Contd..

These systems require high-performance processors
•
that can execute multiple instructions in parallel while
consuming minimal power.
Advantages of VLIW Architecture
Advantages:

Reduces hardware complexity.

Reduces power consumption because of the reduction in hardware
complexity.
Advantages of VLIW Architecture
Contd..

Since the compiler takes care of
•
Data dependency checks,
•
Decoding,
•
Instruction issue,
the hardware becomes a lot simpler.
Advantages of VLIW Architecture
Contd..

Increases potential clock rate.

Functional units are positioned corresponding to the instruction
packet by the compiler.
Disadvantages of VLIW Architecture
Disadvantages:

Complex compilers are required which are hard to design.

Increased program code size.
Disadvantages of VLIW Architecture
Contd..

Unscheduled events,
•
for example a cache miss, could lead to a stall which will
stall the entire processor.

In case of unfilled opcodes in a VLIW word,
•
there is a waste of memory space and instruction
bandwidth.
Vector Processor

Vector processor is basically a central processing unit
•

that has the ability to execute the complete vector input
in a single instruction.
More specifically we can say, it is a complete unit of hardware
resources
•
that executes a sequential set of similar data items in
memory using a single instruction.
Vector Processor Contd..

Elements of the vector are ordered properly to have a successive
addressing format in the memory.
•
This is the reason that it processes the data sequentially.
It holds a single control unit but has multiple execution units
•
that perform the same operation on different data
elements of the vector.
Vector Processor Contd..

Unlike scalar processors that operate on only a single pair of
data, a vector processor operates on multiple pairs of data.

However, one can convert a scalar code into vector code.

This conversion process is known as vectorization.

Vector processing allows operation on multiple data elements
by the help of single instruction.
Vector Processor Contd..

These instructions are said to be single instruction, multiple
data (SIMD) or vector instructions.

CPUs used in recent times make use of vector processing as
it is more advantageous than scalar processing.
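As a hedged illustration of vectorization, the C++ sketch below contrasts a scalar loop (one element per instruction) with an equivalent version using x86 SSE intrinsics, where a single vector instruction adds four floats at a time; SSE SIMD is used here only as a readily available stand-in for the vector instructions the slides describe.

```cpp
// Hedged sketch of vectorization: the scalar loop handles one pair of elements
// per iteration, while the SSE version (x86 intrinsics from <immintrin.h>)
// applies the same addition to four floats with one vector instruction.
#include <immintrin.h>

void add_scalar(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];               // one element per instruction
}

void add_vector_sse(const float* a, const float* b, float* c, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // load 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  // 4 additions in one instruction
    }
    for (; i < n; ++i)                    // scalar tail for leftover elements
        c[i] = a[i] + b[i];
}
```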
Architecture and Working

The figure below represents the typical diagram showing vector
processing by a vector computer:
Architecture and Working
Contd..
Architecture and Working
Contd..

The functional units of a vector computer are as follows:
•
IPU or Instruction Processing Unit
•
Vector register
•
Scalar register
•
Scalar processor
Architecture and Working
Contd..
•
Vector instruction controller
•
Vector access controller
•
Vector processor
Architecture and Working
Contd..

As a vector computer has several functional pipes, it can
execute the instructions over the operands.

Both data and instructions are present in the memory at the
desired memory location.

So, the instruction processing unit i.e., IPU fetches the
instruction from the memory.
Architecture and Working
Contd..

Once the instruction is fetched,
•
then the IPU determines whether the fetched instruction is scalar
or vector in nature.
If it is scalar in nature, then
•
the instruction is transferred to the scalar register
•
and then further scalar processing is performed.
Architecture and Working
Contd..

Whereas, when the instruction is vector in nature,
•

then it is fed to the vector instruction controller.
This vector instruction controller first decodes the vector
instruction
•
then accordingly determines the address of the vector
operand present in the memory.
Architecture and Working
Contd..

Then it gives a signal to the vector access controller about
•
the demand of the respective operand.

This vector access controller then fetches the desired operand
from the memory.

Once the operand is fetched then it is provided to the
instruction register
•
so that it can be processed at the vector processor.
Architecture and Working
Contd..

At times, when multiple vector instructions are present,
•

then the vector instruction controller provides the
multiple vector instructions to the task system.
And in case, the task system shows that the vector task is very
long
•
then the processor divides the task into sub-vectors.
Architecture and Working
Contd..

These sub-vectors are fed to the vector processor
•
that makes use of several pipelines
❑

in order to execute the instruction over the operand
fetched from the memory at the same time.
The various vector instructions are scheduled by the vector
instruction controller.
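The C++ sketch below illustrates the idea of splitting a long vector into sub-vectors (often called strip mining); VECTOR_LENGTH is an assumed constant standing in for a machine's vector register length, not a value from any specific vector processor.

```cpp
// Minimal sketch of splitting a long vector into sub-vectors ("strip mining").
// VECTOR_LENGTH is an assumed hardware vector length, used only for illustration.
#include <cstddef>

constexpr std::size_t VECTOR_LENGTH = 64;   // assumed vector register length

void scale(float* data, std::size_t n, float factor) {
    for (std::size_t start = 0; start < n; start += VECTOR_LENGTH) {
        std::size_t len = (n - start < VECTOR_LENGTH) ? (n - start) : VECTOR_LENGTH;
        // Each pass over [start, start + len) corresponds to one sub-vector that
        // the vector processor's pipelines would execute with vector instructions.
        for (std::size_t i = 0; i < len; ++i)
            data[start + i] *= factor;
    }
}
```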
Classification of Vector Processor


The classification of vector processor relies on
•
the ability of vector formation,
•
as well as the presence of vector instruction for processing.
So, depending on these criteria, vector processor architecture is
classified as follows:
•
Register to Register Architecture
•
Memory to Memory Architecture
Classification of Vector Processor
Contd..
Register to Register Architecture

This architecture is widely used in vector computers.

In this architecture, fetching of the operand or the previous
results
•
indirectly takes place through the main memory by the
use of registers.
Register to Register Architecture
Contd..


Several vector pipelines present in the vector computer help in
•
retrieving data from the registers,
•
and also storing the results in the register.
These vector registers are user instruction programmable.
Register to Register Architecture
Contd..

This means that according to the register address present in the
instruction,
•

the data is fetched and stored in the desired register.
These vector registers have a fixed length,
•
like the register length in a normal processing unit.
Register to Register Architecture
Contd..

Some examples of a supercomputer using the register to register
architecture are following:
•
Cray-1
•
Fujitsu
Memory to Memory Architecture

In memory to memory architecture,
•

the operands or the results are directly fetched from the
memory instead of using registers.
However, it is to be noted here that the address of the desired
data to be accessed
•
must be present in the vector instruction.
Memory to Memory Architecture
Contd..

This architecture enables the fetching of data of size 512 bits
from memory to pipeline.

However, due to high memory access time,
•
the pipelines of the vector computer require a higher startup
time,
❑
as more time is required to initiate the vector
instruction.
Memory to Memory Architecture
Contd..

Some examples of supercomputers that possess memory to
memory architecture are following:
•
CDC Cyber 205
Characteristics of Vector Processor
Characteristics of Vector Processor:

Vector Processors are designed to process multiple data elements
in parallel,
•

while Scalar Processors process one data element at a time.
Vector Processors can be more efficient,
•
as they can complete a given task with fewer instructions
than a Scalar Processor.
Characteristics of Vector Processor
Contd..

Vector Processors are more complex than Scalar Processors,
•

and require more memory as well as power to operate.
Vector Processors are used for more demanding tasks, such as
•
scientific calculations,
•
3D game rendering.
Characteristics of Vector Processor
Contd..


while Scalar Processors are used for simpler tasks, such as
•
basic calculations,
•
and web browsing.
Vector Processors are more suitable for data-intensive
applications,
•
while Scalar Processors are better suited for applications
that require fewer calculations.
Characteristics of Vector Processor
Contd..

Vector Processors can be more expensive than Scalar Processors,
•

as they require more complex hardware and software.
Register to register architecture is better than memory to
memory architecture
•
because it offers a reduction in vector access time.
Advantages of Vector Processor

Better performance

Highly parallel

High memory bandwidth

Reduced software overhead

Improved accuracy
Advantages of Vector Processor
Contd..
Better Performance:

Vector processors can process multiple operations
simultaneously,
•
increasing the speed of calculations.
Highly Parallel:

Vector processors are able to handle multiple operations in
parallel,
•
allowing for faster computations.
Advantages of Vector Processor
Contd..
High Memory Bandwidth:

Vector processors are able to access large amounts of data at
once,
•
increasing the speed of computations.
Advantages of Vector Processor
Contd..
Reduced Software Overhead:

Vector processors can reduce the amount of software code
needed to complete tasks,
•
saving time and resources.
Improved Accuracy:

Vector processors are more accurate than scalar processors,
•
making them ideal for applications that require
precision.
Advantages of Vector Processor
Contd..

The vector processor uses vector instructions
•
by which the code density of the instructions can be
improved.

The sequential arrangement of data helps to handle data
•
by the hardware in a better way.

It offers a reduction in instruction bandwidth.
Applications of Vector Processor

Computer Aided Design

Image Processing

Virtual Reality

Scientific Computing

Artificial Intelligence

Data Analysis
Applications of Vector Processor
Contd..
Computer-Aided Design (CAD):

CAD software allows for the creation of realistic 3D models,
•

which can be used for product design, engineering, and
architecture.
Vector processor power makes it easier to manipulate complex
models and make changes quickly.
Applications of Vector Processor
Contd..
Image Processing:

Vector processors are used to manipulate and analyze images.

This can include tasks such as
•
edge detection,
•
object recognition,
•
and facial recognition.
Applications of Vector Processor
Contd..
Virtual Reality:

Vector processors are used to render realistic 3D graphics in
virtual reality applications.

This gives users a more immersive experience
when interacting with virtual environments.
Applications of Vector Processor
Contd..
Scientific Computing:

Vector processors are used to perform complex calculations in
scientific computing applications.

This can include tasks such as calculating weather patterns or
complex simulations.
Applications of Vector Processor
Contd..
Artificial Intelligence:

Vector processors are used to help train and test neural
networks for artificial intelligence applications.

This can include tasks such as
•
Facial recognition,
•
Object recognition,
•
and Natural language processing.
Applications of Vector Processor
Contd..
Data Analysis:

Vector processors are used to analyze large amounts of data
quickly.

This can include tasks such as analyzing customer data or
financial data.