Lecture 1

+
CS 325: CS Hardware and Software
Organization and Architecture
Introduction
+
Outline
 Course information
 Course website
 Syllabus
 Term project
+

Myself
Dr. Michael Galloway
COHH 4134
jeffrey.galloway@wku.edu
 Education
   Ph.D. – Computer Science – The University of Alabama, Aug. 2013
   M.S. – Computer Science – The University of Alabama, May 2008
   B.W.E – Wireless Engineering – Auburn University, December 2005
 Specializations
   Parallel and Distributed Computing, Cloud Computing, Local Infrastructure as a Service Cloud Architectures
   VANETs, MANETs, Network Protocols
   Wireless Communication Protocols, Resource Management of Handheld Devices
 Taught classes at 6 colleges and universities
 First time teaching computer organization and architecture
 Current Research: Infrastructure Cloud Architectures, Automated Resource Deployment, Vertical Educational Clouds, Power Modeling
 Chair: WKU ACM Student Chapter – Everyone should join!!
+
Course Description
 This course will provide a means for
   Coverage of computer systems and architecture
   Bridge between low-level hardware systems and operating systems programming
   Means for in-depth coverage of new generation hardware and computer systems.
 Topics include
   computer number representations
   computer arithmetic
   CPU operations
   instruction sets
   I/O memory management
   system performance
   parallelism.
+
Important Information
 Course Website:
   http://ip204-5.sth.wku.edu/cs325Web/
 Required Book:
   “Computer Organization and Architecture” by William Stallings, 9th or 10th edition.
 Office Hours:
   8:30am – 10:30am on Monday, Wednesday
+
Why Should We Study Computer Architecture?
 It’s required
 Understand computer performance and cost factors.
 Basis for understanding of OS and programming concepts.
 Understand how to write programs that are:
   Faster
   Smaller
   Less prone to error
 To appreciate the relative cost of operations and the effect of programming choices.
 Helps you to debug.
+
The Bad News…
 Digital Hardware
   Is complex
   Cannot be fully understood in one course
   Requires background in electrical engineering, physics, chemistry
 The CPU is the most complex device created by humans.
   Over 10 Billion transistors (2015)
   Transistor switching speed of over 4 billion/sec (4 GHz)
   14nm fabrication process, and getting smaller
   10nm scale projected for 2017
     ~43 Si atoms!
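The numbers on this slide can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: the 4 GHz and 10 nm figures come from the slide, while the silicon atom spacing of about 0.235 nm (the Si-Si nearest-neighbour distance) is an assumption added here to show where a figure like "~43 Si atoms" could come from.

```python
# Back-of-the-envelope check of the figures quoted on the slide.
CLOCK_HZ = 4e9            # 4 GHz, from the slide
FEATURE_NM = 10.0         # 10 nm process node, from the slide
SI_SPACING_NM = 0.235     # ASSUMPTION: Si-Si nearest-neighbour distance
SPEED_OF_LIGHT_M_PER_S = 299_792_458

# Duration of one clock cycle in nanoseconds.
cycle_time_ns = 1e9 / CLOCK_HZ

# How many silicon atoms fit side by side across a 10 nm feature.
atoms_across = FEATURE_NM / SI_SPACING_NM

# Distance light travels (in vacuum) during one clock cycle, in centimetres.
light_cm_per_cycle = SPEED_OF_LIGHT_M_PER_S / CLOCK_HZ * 100

print(f"cycle time at 4 GHz: {cycle_time_ns} ns")
print(f"Si atoms across 10 nm: about {round(atoms_across)}")
print(f"light travels about {light_cm_per_cycle:.1f} cm per cycle")
```

Under these assumptions a single clock cycle lasts a quarter of a nanosecond, which is one reason physical distances on a chip matter to architects.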
+
The Good News!
 It is possible to understand the architectural components without knowing all of the low-level details.
 Programmers only need to know the essentials
   Characteristics of major components
   Role in overall system
   Consequences for programmers
+
Why Study Computer Architecture?
 Understand where computers are going
   Future capabilities drive the (computing) world
   Real-world impact: no computer architecture → no computers!
 Understand high-level design concepts
   The best architects understand all the levels
     Devices, circuits, architecture, compiler, applications
 Write better software
   The best software designers also understand hardware
   Need to understand hardware to write fast software
 Design hardware
   Intel, AMD, IBM, ARM, Qualcomm, NVIDIA, Samsung
+
Course Goals
 See the big ideas in computer architecture
   Pipelining, parallelism, caching, abstraction, …
   Exposure to examples of good (and some bad) engineering
 Get exposure to research and cutting-edge ideas
   Read some research papers
+

Coursework
 Research Paper (4 drafts throughout the semester)
   In-depth research on an architecture related topic of your choice
   Oral presentation of your findings
 Homework assignments (~8 throughout semester)
   Short answer questions related to topics covered in class
   48-hour “grace” periods
     Hand in late, no questions asked
   No assignments accepted after solutions posted
 3 Exams
   In class
     Typically 5 questions, equally weighted.
     Answer 4 out of 5
   Cumulative final
+
Grading
 Paper reviews: 50%
   Drafts: 10% each, x4
   Final presentation: 10%
 Homework assignments: 15%
 Exams: 35%
   In-class: 20%
   Final: 15%
 Smiling: 5%
+
Computer Architecture Design Goals & Constraints
 Functional
   Needs to be correct
     And unlike software, difficult to update once deployed
   What functions should it support
 Reliable
   Does it continue to perform correctly?
   Hard fault vs transient fault
   Space satellites vs desktop vs server
 High performance
   Not just “Gigahertz” – 2.6GHz ARM vs. 2GHz Intel Xeon
   Impossible goal: fastest possible design for all programs
+
Design Goals & Constraints
 Low cost
   Per unit manufacturing cost (wafer cost)
   Cost of making first chip after design (mask cost)
   Design cost
 Low power/energy
   Energy in (battery life, cost of electricity)
   Energy out (cooling and related costs)
   Cyclic problem, very much a problem today
 Challenge: balancing the relative importance of these goals
   And the balance is constantly changing
   Our focus: performance, only touch on cost, power, reliability
+
Shaping Force: Applications/Domains
 Another shaping force: applications (usage and context)
 Scientific: weather prediction, genome sequencing
   First computing application domain: naval ballistics firing tables
   Need: large memory, heavy-duty floating point
   Examples: CRAY T3E, IBM BlueGene, Intel Xeon Phi, GPUs
 Commercial: database/web serving, e-commerce, Google
   Need: data movement, high memory + I/O bandwidth
   Examples: Sun Enterprise Server, AMD Opteron, Intel Xeon

+
More Recent Applications/Domains
 Desktop: home office, multimedia, games
   Need: Increasing memory bandwidth, computational performance, integrated graphics/network?
   Examples: Intel Core i*, AMD Athlon
 Mobile: laptops, tablets, phones
   Need: low power, computational performance, integrated wireless
   Laptops: Intel Core i*, Atom, AMD APUs
   Smaller devices: ARM chips by Samsung, Qualcomm, Apple
 Embedded: microcontrollers in automobiles, door knobs, robotics
   Need: low power, low cost
   Examples: ARM chips, dedicated digital signal processors (DSPs)
   10 billion ARM chips sold in 2013
 Deeply Embedded: disposable “smart dust” sensors
   Need: extremely low power, extremely low cost
+
Application Specific Designs
 This class is about general-purpose CPUs
   Processor that can do anything, run a full OS, etc.
   E.g., Intel Core i7, AMD Athlon, IBM Power, ARM, Intel Xeon
 In contrast to application-specific chips
   Examples: Video encoding, 3D graphics
 General rules
  - Hardware is less flexible than software
  + Hardware more effective (speed, power, cost) than software
  + Domain specific more “parallel” than general purpose
    • But general mainstream processors becoming more parallel
+
Technology Trends
 Moore’s Law
   Continued (so far) transistor miniaturization
   Number of transistors in an integrated circuit has doubled approximately every 18 months
 Some technology-based ramifications
   Annual improvements in density, speed, power, costs
   SRAM/logic: density: ~30%, speed: ~20%
   DRAM: density: ~60%, speed: ~4%
   Disk: density: ~60%, speed: ~10% (non-transistor)
   Big improvements in flash memory and network bandwidth, too
 Changing quickly and with respect to each other!!
   Example: density increases faster than speed
   Re-evaluate/re-design for each technology generation
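To see why differing growth rates force a re-design every generation, it helps to compound them. The sketch below is illustrative only: it assumes the slide's annual percentages compound once per year, and the helper names `annual_factor_from_doubling` and `compound` are made up for this example. The 18-month doubling period is the one quoted on the slide; a two-year period is also commonly quoted for Moore's Law.

```python
def annual_factor_from_doubling(months_to_double: float) -> float:
    """Yearly growth factor implied by doubling every `months_to_double` months."""
    return 2 ** (12 / months_to_double)


def compound(rate_per_year: float, years: int) -> float:
    """Total improvement factor after `years` of growth at `rate_per_year`."""
    return (1 + rate_per_year) ** years


# Doubling every 18 months is roughly 59% growth per year.
moore_annual = annual_factor_from_doubling(18)

# DRAM figures from the slide: density ~60%/year, speed ~4%/year.
dram_density_10y = compound(0.60, 10)
dram_speed_10y = compound(0.04, 10)

print(f"Moore's Law annual factor: {moore_annual:.3f}x")
print(f"DRAM density after 10 years: {dram_density_10y:.0f}x")
print(f"DRAM speed after 10 years:   {dram_speed_10y:.2f}x")
```

Under these assumptions, a decade makes DRAM roughly a hundred times denser but less than 1.5 times faster, which is the "density increases faster than speed" point in numbers.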
+
Revolution I: The Microprocessor
 Microprocessor revolution
   One significant technology threshold was crossed in 1970s
   Enough transistors (~25K) to put a 16-bit processor on one chip
   Huge performance advantages: fewer slow chip-crossings
 Microprocessors have allowed new market segments
   Desktops, CD/DVD players, laptops, game consoles, set-top boxes, mobile phones, digital camera, mp3 players, GPS, automotive
 And replaced incumbents in existing segments
   Microprocessor-based system replaced supercomputers, “mainframes”, “minicomputers”, etc.
+
First Microprocessor
 Intel 4004 (1971)
   Application: calculators
   Technology: 10000 nm
   2300 transistors
   13 mm²
   108 KHz
   12 Volts
   4-bit data
+
Pinnacle of Single-Core Microprocessors
 Intel Pentium4 (2003)
   Application: desktop/server
   Technology: 90nm (1/100th of 4004)
   55M transistors (20,000x)
   101 mm² (10x)
   3.4 GHz (~30,000x)
   1.2 Volts (1/10th)
   32/64-bit data (16x)
   22-stage pipelined datapath
   Two levels of on-chip cache
   hyperthreading
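The multipliers in parentheses are rounded comparisons against the Intel 4004. As a rough check, the sketch below recomputes them from the raw figures on the two slides; the dictionary keys are names chosen for this example, and the 64-bit width is used for the Pentium 4 data size.

```python
# Figures copied from the 4004 and Pentium 4 slides.
i4004 = {
    "feature_nm": 10_000,
    "transistors": 2_300,
    "area_mm2": 13,
    "clock_hz": 108e3,
    "volts": 12.0,
    "data_bits": 4,
}
pentium4 = {
    "feature_nm": 90,
    "transistors": 55e6,
    "area_mm2": 101,
    "clock_hz": 3.4e9,
    "volts": 1.2,
    "data_bits": 64,
}

# Ratio of each Pentium 4 figure to the corresponding 4004 figure.
ratios = {key: pentium4[key] / i4004[key] for key in i4004}

for key, ratio in ratios.items():
    print(f"{key:12s} {ratio:>12,.3f}x")
```

The feature size and voltage ratios come out below 1 because those quantities shrank, while transistor count and clock rate grew by four orders of magnitude.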
+
Modern Multicore Processor
 Intel Core i7 (2013)
   Application: desktop/server
   Technology: 22nm (25% of P4)
   1.4B transistors
   177 mm²
   3.5 GHz to 3.9 GHz
   1.8 Volts
   256-bit data
   14-stage pipelined datapath
   4 instructions per cycle
   Three levels of on-chip cache
   hyperthreading
   Four-core multicore (4x)
   ???
+
Tracing the Microprocessor Revolution
 How were growing transistor counts used?
 Initially to widen the datapath
   4004: 4 bits → Pentium4: 64 bits
 … and also to add more powerful instructions
   To reduce overhead of fetch and decode
   To simplify assembly programming (which was done by hand then)
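One way to see what a wider datapath buys is to count ALU steps. The sketch below is a simplified model, not a description of any real chip: the hypothetical function `add_on_narrow_datapath` adds two unsigned 64-bit values using an adder that is only `datapath_bits` wide, carrying between chunks the way multi-word arithmetic does on a narrow machine.

```python
def add_on_narrow_datapath(a: int, b: int, total_bits: int = 64,
                           datapath_bits: int = 4) -> tuple[int, int]:
    """Add two unsigned integers chunk by chunk.

    Returns (sum modulo 2**total_bits, number of ALU add steps used).
    """
    mask = (1 << datapath_bits) - 1
    result, carry, steps = 0, 0, 0
    for shift in range(0, total_bits, datapath_bits):
        # One ALU operation: add two chunks plus the incoming carry.
        chunk = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (chunk & mask) << shift
        carry = chunk >> datapath_bits  # carry into the next chunk
        steps += 1
    return result, steps


a = 0x0123456789ABCDEF
b = 0x0FEDCBA987654321

sum_4bit, steps_4bit = add_on_narrow_datapath(a, b, datapath_bits=4)
sum_64bit, steps_64bit = add_on_narrow_datapath(a, b, datapath_bits=64)

print(f"4-bit datapath:  {steps_4bit} add steps")
print(f"64-bit datapath: {steps_64bit} add step")
print(sum_4bit == sum_64bit == (a + b) % 2**64)
```

In this model the 4-bit datapath needs 16 dependent add steps for one 64-bit addition, while the 64-bit datapath needs one.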
+
IBM System/370 Architecture
 IBM System/370 architecture
   Was introduced in 1970
   Included a number of models
   Could upgrade to a more expensive, faster model without having to abandon original software
   New models are introduced with improved technology, but retain the same architecture so that the customer’s software investment is protected
   Architecture has survived to this day as the architecture of IBM’s mainframe product line
 370 Architecture still in use by IBM today!
   IBM zSeries
   IBM zEnterprise Series
+
Intel x86
Backward compatible instruction set architectures
 1978 – Introduced 8086, 16-bit
   80186, 80286
 1985 – 80386, 32-bit
   80486, Pentium, Pentium MMX
 1995 – Pentium Pro
   Pentium II, Pentium III
 2000 – Pentium 4
 2006 – Core 2, 64-bit, multi-core
 2008 – Core i3/i5/i7
   Atom
+
Function
A computer can perform four basic functions:
● Data processing
● Data storage
● Data movement
● Control
+
The Computer
There are four main structural components of the computer:
 CPU – controls the operation of the computer and performs its data processing functions
 Main Memory – stores data
 I/O – moves data between the computer and its external environment
 System Interconnection – some mechanism that provides for communication among CPU, main memory, and I/O
+
CPU
Major structural components:
 Control Unit
   Controls the operation of the CPU and hence the computer
 Arithmetic and Logic Unit (ALU)
   Performs the computer’s data processing function
 Registers
   Provide storage internal to the CPU
 CPU Interconnection
   Mechanisms that provide communication among the control unit, ALU, and registers
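The division of labour among these components can be sketched with a toy simulator. Everything below is a made-up illustration, not a real instruction set: the `ToyCPU` class, the `alu` function, and the five-instruction program are assumptions for this example. As a simplification, the program is kept in a separate list rather than in main memory.

```python
from dataclasses import dataclass, field


def alu(op: str, x: int, y: int) -> int:
    """ALU: performs the data processing function."""
    if op == "ADD":
        return x + y
    if op == "SUB":
        return x - y
    raise ValueError(f"ALU does not support {op}")


@dataclass
class ToyCPU:
    # Registers: storage internal to the CPU.
    registers: dict = field(default_factory=lambda: {"R0": 0, "R1": 0, "R2": 0})
    pc: int = 0  # program counter: index of the next instruction

    def run(self, program: list, memory: list) -> None:
        """Control unit: fetch, decode, and execute until HALT."""
        while self.pc < len(program):
            op, *args = program[self.pc]  # fetch and decode
            self.pc += 1
            if op == "LOAD":  # main memory -> register
                reg, addr = args
                self.registers[reg] = memory[addr]
            elif op == "STORE":  # register -> main memory
                reg, addr = args
                memory[addr] = self.registers[reg]
            elif op in ("ADD", "SUB"):  # hand the operands to the ALU
                dst, src1, src2 = args
                self.registers[dst] = alu(
                    op, self.registers[src1], self.registers[src2]
                )
            elif op == "HALT":
                break
            else:
                raise ValueError(f"unknown instruction {op}")


memory = [7, 35, 0]  # main memory: two inputs and one result slot
program = [
    ("LOAD", "R0", 0),
    ("LOAD", "R1", 1),
    ("ADD", "R2", "R0", "R1"),
    ("STORE", "R2", 2),
    ("HALT",),
]

cpu = ToyCPU()
cpu.run(program, memory)
print(memory)  # [7, 35, 42]
```

Here the `run` loop plays the control unit, `alu` plays the ALU, the `registers` dictionary plays the register file, and the Python variable passing between them stands in for the CPU interconnection.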
+
Internet Resources
- Web site for book
 http://WilliamStallings.com/COA/COA10e.html
   Links to sites of interest
   Links to sites for courses that use the book
 http://WilliamStallings.com/StudentSupport.html
   Math
   How-to
   Research resources
   Misc
+
First Assignment - in class
 Determine your term project topic
   Talk with others in the class about good ideas for a research topic
   Remember, all projects are to be completed individually
   No overlapping of focus in any specific research area
 Record your name and research topic/focus on the project sign-up sheet