AMD & Intel

An Overview of Networking Technology & New Generation Processors
Boxuan Gu
Chi Chau
CS-521
2-5-2004
Part 1
Networking Technology
The lecture consists of two parts


Network Architecture
Ethernet technology
Network Architecture-OSI
reference model
OSI


The OSI model provides a conceptual framework
for communication between computers, but the
model itself is not a method of communication.
Actual communication is made possible by using
communication protocols.
In the context of data networking, a protocol is a
formal set of rules and conventions that governs
how computers exchange information over a
network medium. A protocol implements the
functions of one or more of the OSI layers.
OSI-Interaction
OSI-Encapsulation
TCP/IP
TCP/IP-IP
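The Internet Protocol (IP) is a network-layer (Layer 3) protocol that contains
addressing information and some control
information that enables packets to be
routed.
IP has two primary responsibilities:
1. providing connectionless, best-effort
delivery of datagrams through an internetwork
2. providing fragmentation and reassembly of
datagrams to support data links with different
maximum transmission unit (MTU) sizes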

IP Packet Format
IP address format
IP address…
TCP/IP-TCP
Transmission Control Protocol
•TCP provides reliable transmission of data in an IP
environment. TCP corresponds to the transport layer
(Layer 4) of the OSI reference model. Among the
services TCP provides are stream data transfer,
reliability, efficient flow control, full-duplex operation,
and multiplexing.
•TCP offers reliability by providing connection-oriented,
end-to-end reliable packet delivery through an
internetwork.
TCP/IP-UDP
User Datagram Protocol


The User Datagram Protocol (UDP) is a
connectionless transport-layer protocol
(Layer 4) that belongs to the Internet
protocol family.
UDP is basically an interface between IP and
upper-layer processes. UDP protocol ports
distinguish multiple applications running on a
single device from one another.
UDP-packet header
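The header named on this slide can be sketched in a few lines. This is a minimal illustration, assuming the standard 8-byte UDP layout of four 16-bit fields (source port, destination port, length, checksum). The helper names, port numbers, and payload are made-up illustration values, and the checksum is left at zero rather than computed.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload):
    """Prepend a UDP header to payload (checksum left as 0 in this sketch)."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram):
    """Split a datagram into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }


# Hypothetical example: an arbitrary client port sending to port 53.
datagram = build_udp_datagram(40000, 53, b"hello")
fields = parse_udp_datagram(datagram)
print(fields)
```

The destination port is what lets the receiving host hand the payload to the right application, which is the multiplexing role described above.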
IPV6
Disadvantages of IPv4:
1. The 32-bit address space is limited
2. Routing is not efficient
3. Poor support for mobile devices
4. Security needs are growing
IPv6 Packet Header Format
4 bits version
8 bits traffic class
20 bits flow label
16 bits payload length
8 bits next header
8 bits hop limit
128 bits source address
128 bits destination address
IPV6


Version Number: The version is a 4-bit field as
in IPv4. The field contains the number 6 for
IPv6, instead of the number 4 for IPv4.
Traffic Class: The Traffic Class field is an 8-bit
field similar to the type of service (ToS) field in
IPv4. The Traffic Class field tags the packet with
a traffic class that can be used in Differentiated
Services. The functionalities are the same in
IPv4 and IPv6.
IPv6


Flow Label: The Flow Label field can be used
to tag packets of a specific flow to
differentiate the packets at the network layer.
Hence, the Flow Label field enables
identification of a flow and per-flow
processing by the routers in the path.
Payload Length: Similar to the Total Length
field in IPv4, the Payload Length field
indicates the total length of the data portion
of the packet.
IPV6


Next Header: Similar to the Protocol field in the
IPv4 packet header, the value of the Next Header
field in IPv6 determines the type of information
following the basic IPv6 header.
Hop Limit: Similar to the Time to Live field in the
IPv4 packet header, the value of the Hop Limit field
specifies the maximum number of routers (hops)
that an IPv6 packet can pass through before the
packet is considered invalid.
IPV6


Source Address: The IPv6 source address
field is similar to the Source Address field in
the IPv4 packet header, except that the field
contains a 128-bit source address for IPv6
instead of a 32-bit source address for IPv4.
Destination Address: The IPv6 destination
address field is similar to the Destination
Address field in the IPv4 packet header,
except that the field contains a 128-bit
destination address for IPv6 instead of a 32-bit
destination address for IPv4.
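The eight fields described above add up to a fixed 40-byte header. The sketch below packs and unpacks that header with Python's standard `struct` and `ipaddress` modules. The function names are hypothetical, and the addresses, flow label, and other values are arbitrary illustration values, not taken from the lecture.

```python
import ipaddress
import struct


def build_ipv6_header(traffic_class, flow_label, payload_length,
                      next_header, hop_limit, src, dst):
    """Pack the 40-byte fixed IPv6 header."""
    # First 32 bits: 4-bit version (6), 8-bit traffic class, 20-bit flow label.
    first_word = (6 << 28) | (traffic_class << 20) | flow_label
    return (
        struct.pack("!IHBB", first_word, payload_length, next_header, hop_limit)
        + ipaddress.IPv6Address(src).packed   # 128-bit source address
        + ipaddress.IPv6Address(dst).packed   # 128-bit destination address
    )


def parse_ipv6_header(header):
    """Unpack the fixed header back into named fields."""
    first_word, payload_length, next_header, hop_limit = struct.unpack_from(
        "!IHBB", header
    )
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "src": str(ipaddress.IPv6Address(header[8:24])),
        "dst": str(ipaddress.IPv6Address(header[24:40])),
    }


# Illustration values only: next header 17 is UDP, hop limit 64 is arbitrary.
header = build_ipv6_header(0, 0x12345, 13, 17, 64, "2001:db8::1", "2001:db8::2")
parsed = parse_ipv6_header(header)
print(len(header), parsed)
```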
IPv6-extension header
IPv6-extension header
1. Hop-by-Hop Options header.
2. Destination Options header.
3. Routing header.
4. Fragment header.
5. Authentication header and Encapsulating Security Payload header.
6. Upper-Layer header.
IPv6-Addressing scheme


IPv6 uses eight 16-bit hexadecimal number
fields separated by colons (:) to represent
the 128-bit address, for example:
2031:0000:130F:0000:0000:09C0:876A:130B
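As a quick check of the notation, the example address on this slide can be parsed with Python's standard `ipaddress` module. This sketch also prints the shortened form, in which leading zeros are dropped and one run of all-zero fields is replaced by `::`. The shortening rule is standard IPv6 notation rather than something stated on the slide.

```python
import ipaddress

addr = ipaddress.IPv6Address("2031:0000:130F:0000:0000:09C0:876A:130B")

fields = addr.exploded.split(":")   # the eight 16-bit fields, zero-padded
total_bits = len(addr.packed) * 8   # size of the raw address in bits

print(fields)
print(total_bits)
print(addr.compressed)              # shortened textual form
```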
IPv6-Addressing scheme






IPv6 addresses consist of a prefix and a
local part
(like in IPv4)
- Example:
3FFE:400:280:0:0:0:0:1/48
Here the first 48 bits are fixed (prefix) and
the other 80
bits will be assigned in the local subnet
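The prefix/local split in the example can be checked with Python's standard `ipaddress` module. This is a small sketch. The second address used for the membership test is a made-up value chosen to fall inside the same /48.

```python
import ipaddress

iface = ipaddress.IPv6Interface("3FFE:400:280:0:0:0:0:1/48")

prefix_bits = iface.network.prefixlen   # fixed part of the address
local_bits = 128 - prefix_bits          # bits left for the local subnet

print(iface.network)
print(prefix_bits, local_bits)

# Hypothetical second host: it shares the 48-bit prefix, so it is in the network.
other = ipaddress.IPv6Address("3ffe:400:280:abcd::5")
print(other in iface.network)
```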
IPv6-Addressing scheme
In IPv6, there are 3 types of addresses:
1. Unicast
2. Multicast
3. Anycast (new in IPv6)
IPv6-Addressing scheme
-unicast
IPv6-Addressing scheme
-Multicast
IPv6-Addressing scheme
-Anycast

Packets sent to an anycast address or list of
addresses are delivered to the nearest interface
identified by that address. Anycast is
communication between a single sender and a
list of addresses.
Part 2: Ethernet
Ethernet
Ethernet
MAC Data Frame Format
Ethernet-10 Gigabit Ethernet

10 Gigabit Ethernet is Ethernet. 10 Gigabit
Ethernet uses the IEEE 802.3 Ethernet media
access control (MAC) protocol, the IEEE 802.3
Ethernet frame format, and the IEEE 802.3
frame size. 10 Gigabit Ethernet is full duplex.
Ethernet-10 Gigabit Ethernet
Technology and Standard


The IEEE 802.3ae 10 Gigabit Ethernet Task
Force was chartered with developing the 10
Gigabit Ethernet Standard.
This group is a subcommittee of the larger
802.3 Ethernet Working Group. In contrast to
previous Ethernet standards, 10 Gigabit
Ethernet targets three application spaces: the
LANs, MANs, and WANs.
Cont.



Gigabit Ethernet is no longer a shared-domain, half-duplex technology.
Because there are no packet collisions in a full-duplex link, the link distances are determined by
optics and not by the diameter of an Ethernet
collision domain.
10 Gigabit Ethernet will also be a full-duplex,
switched technology, maintaining compatibility with
the 802.3 Ethernet MAC protocol and the Ethernet
frame format.
Cont.
10 Gigabit Ethernet
Layer 1: Physical Layer Devices
Contained within the PHY are several sublayers
that perform these functions, including the
physical coding sublayer (PCS) and the optical
transceiver or physical media dependent (PMD)
sublayer for fiber media. The PCS is made up
of coding (for example, 8b/10b) and serializer
or multiplexing functions.
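The cost of a block code such as 8b/10b can be worked out directly: every 8 data bits are sent as 10 bits on the line. The sketch below computes the resulting line rate and coding efficiency for a 10 Gb/s data rate. The function names are hypothetical, and which code a given 10 Gigabit PHY actually uses is not covered here.

```python
def line_rate_gbps(data_rate_gbps, data_bits, coded_bits):
    """Raw signalling rate needed to carry data_rate_gbps of payload."""
    return data_rate_gbps * coded_bits / data_bits


def coding_efficiency(data_bits, coded_bits):
    """Fraction of the line rate that carries actual data."""
    return data_bits / coded_bits


rate = line_rate_gbps(10.0, 8, 10)
efficiency = coding_efficiency(8, 10)
print(rate, efficiency)
```

With 8b/10b, carrying 10 Gb/s of data takes 12.5 Gb/s on the line, so one fifth of the raw bandwidth goes to coding overhead.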
Cont.
10 Gigabit Ethernet defines two kinds of PHY:
• the LAN PHY
• the WAN PHY
WAN PHY



SONET Friendly
Enables use of SONET infrastructure for Layer 1 transport:
SONET ADMs, DWDM transponders, optical regenerators

Not SONET Compliant
Connects to SONET access devices but not directly to SONET infrastructure
Cont. Not SONET
Compliant





SONET Friendly
Requires some SONET
features:
OC-192 link speed
SONET framing
MinimalPath/Section/Li
ne overheard processing





Avoids most costly
aspects of SONET:
No TDM support
Concatenated OC-192c
only
Does not require
meeting SONET grid
laser specifications, jitter
requirements, stratum
clocking
Minimal operations,
administration,
maintenance, and
provisioning (OAM&P)
LAN PHY


10 Gigabit Ethernet defines a LAN PHY that, with
simple encoding, will transmit Ethernet packets
on dark fiber and dark wavelengths.
The LAN PHY is intended to support the
existing Ethernet applications at ten times the
bandwidth with the most cost-effective solution.
Cont.
Cont.


Both the LAN and WAN PHY will support
each physical media dependent (PMD)
sublayer and, therefore, support the same
distances. These PHYs are distinguished
solely by the PCS.
The WAN PHY differs from the LAN PHY
by the inclusion of a simplified SONET
framer.
Cont.
10 Gigabit Ethernet Link Distance and Media Goals
At least 65 meters over multimode fiber
At least 300 meters over installed multimode fiber
At least 2 km over single-mode fiber
At least 10 km over single-mode fiber
At least 40 km over single-mode fiber
Application of 10GE
10 Gigabit in the LAN
Cont.
10 Gigabit Ethernet Metropolitan Network
Part 2
AMD & Intel
Latest Desktop & Server Processors



AMD
Desktop: AMD Athlon 64 FX, AMD Athlon 64
Server: AMD Opteron

Intel
Desktop: Intel Pentium 4 w/ HT, Intel Pentium 4 Extreme Edition
Server: Intel Itanium 2, Xeon
Desktop Processor Pricing




AMD Athlon 64 FX-51 $733
AMD Athlon 64 3400+ $417
AMD Athlon 64 3200+ $278
AMD Athlon 64 3000+ $218

Intel Pentium 4 Extreme Edition 3.4GHz $999
Intel Pentium 4 3.4GHz w/ HT $424
Intel Pentium 4 3.2GHz (Prescott) w/ HT $417
Processor Timeline
Date        Intel                                  AMD
2/2/2004    P4 3.4GHz, P4 3.2E GHz, P4 EE 3.4GHz   -
1/6/2004    -                                      Athlon 64 3400+
9/24/2003   P4 EE 3.2GHz                           Athlon 64 FX-51, 3200+
6/23/2003   P4 3.2GHz                              -
5/13/2003   -                                      Athlon XP 3200+
4/14/2003   P4 3.0GHz 800MHz                       -
2/10/2003   -                                      Athlon XP 3000+
11/14/2002  P4 HT 3.06GHz                          -
Traditional Intel roadmap



Historically, with each generation Intel would move to a smaller
process, double the cache, and increase clock speeds
This held through the first generation of Pentium 4,
while AMD was still struggling
It is not the case for Prescott
Intel Pentium 4 (Prescott)




Intel launched Pentium 4 Prescott on February
2nd
Not a P5, just the 3rd generation of P4
Intel President Paul Otellini has discussed 64-bit
extensions on Prescott
With enough cooling, Prescott can be overclocked to
5GHz
P4 Prescott
New Changes









Prescott uses a 90 nm process instead of a 130 nm process
Doubles the L2 cache to 1 MB
Expands the L1 data cache to 16 KB to improve AGUs
(address generation units)
Adds 13 new instructions, aka SSE3
Extends the pipeline from 20 to 31 stages
Process and die size drop
Increases the scheduler queue size
Adds a dedicated integer multiplier
A new shifter/rotator logic block is placed in the ALUs
SSE3






After the great success of the P4's SSE2 instruction set
(144 instructions), SSE3 adds 13 more to make
programmers' lives easier
fisttp: fp to int conversion
addsubps, addsubpd, movsldup, movshdup, movddup:
complex arithmetic
lddqu: video encoding
haddps, hsubps, haddpd, hsubpd: graphics (SIMD FP /
AOS)
monitor, mwait: thread synchronization
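To make the "complex arithmetic" group concrete, the sketch below emulates a few of these instructions on plain Python lists of four numbers, standing in for 128-bit registers of four single-precision lanes. It then uses them to multiply two pairs of complex numbers. The Python function names mirror the instruction mnemonics, but this is only an emulation of their lane behaviour; `swap_pairs` and `multiply` stand in for an ordinary shuffle and packed multiply, and the input values are arbitrary.

```python
def movsldup(a):
    """Duplicate the even-indexed lanes: [a0, a0, a2, a2]."""
    return [a[0], a[0], a[2], a[2]]


def movshdup(a):
    """Duplicate the odd-indexed lanes: [a1, a1, a3, a3]."""
    return [a[1], a[1], a[3], a[3]]


def addsubps(a, b):
    """Subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]


def haddps(a, b):
    """Horizontal add: sums of adjacent pairs from a, then from b."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def multiply(a, b):
    """Lane-wise multiply (stands in for a packed multiply)."""
    return [x * y for x, y in zip(a, b)]


def swap_pairs(a):
    """Swap real and imaginary parts (stands in for a shuffle)."""
    return [a[1], a[0], a[3], a[2]]


def complex_multiply(a, b):
    """Multiply two pairs of complex numbers stored as [re0, im0, re1, im1]."""
    t1 = multiply(movsldup(a), b)              # [ar*br, ar*bi, ...]
    t2 = multiply(movshdup(a), swap_pairs(b))  # [ai*bi, ai*br, ...]
    return addsubps(t1, t2)                    # [ar*br - ai*bi, ar*bi + ai*br, ...]


# (1+2j)*(5+6j) and (3+4j)*(7+8j)
product = complex_multiply([1, 2, 3, 4], [5, 6, 7, 8])
print(product)
print(haddps([1, 2, 3, 4], [5, 6, 7, 8]))
```

Without `addsubps`, the final step needs a separate sign flip before the add, which is the kind of extra work these instructions remove.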
31 Pipeline Stages
Hyper-Threading Technology



Can increase performance by up to 40%
HT enables multi-threaded software to execute
threads in parallel. It presents a single physical
processor as two logical processors, so that multiple
instruction streams can be worked on at once.
The problem is that not much software takes
advantage of HT. HT is big in the graphics arena, e.g.
Adobe software takes big advantage of HT
Prescott Problems






The 90 nm process is not yet mature, unlike 130 nm
The 90 nm process has heat and power problems
The 3.4E GHz part is being held back
Intel intends to produce only a limited edition
SSE3 will be useful down the road, but today's
software is not ready for it
The 31-stage pipeline slows performance when a
branch is mispredicted
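The pipeline-depth point can be illustrated with a very rough model. The sketch below assumes that a mispredicted branch costs about as many cycles as the pipeline has stages, and it uses made-up figures for the base cycles per instruction, the share of branches, and the misprediction rate. None of these numbers are measurements of Prescott; only the stage counts (20 and 31) come from the slides.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: 20% branches, 5% of them mispredicted, base CPI of 1.0.
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

cpi_20_stage = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)
cpi_31_stage = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 31)

print(f"20-stage pipeline: {cpi_20_stage:.2f} cycles per instruction")
print(f"31-stage pipeline: {cpi_31_stage:.2f} cycles per instruction")
```

Under these assumptions the deeper pipeline needs a higher clock speed or better branch prediction just to break even, which is the trade-off this slide describes.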
Should you get Prescott?



The real strength of Prescott is in its Hyper-Threading performance
Great for multitasking
In some applications Prescott beats the Extreme
Edition at multitasking
Pentium 4 Extreme Edition




Intel's top-of-the-line desktop processor
A “Xeon” processor with a P4 Extreme Edition
label
It is more like an “Emergency Edition” than an
“Extreme Edition”, a response to AMD 64
Optional 2 MB L3 cache
Intel Roadmap
AMD 64








AMD 64 builds a bridge from the 32-bit to the 64-bit
world
Provides great performance without parallel
Simultaneous 32-bit and 64-bit computing
More physical address space: 1 TB, not limited to 4GB
Applications can use up to 4GB instead of 2GB
Worry-free on memory
A lot less swapping to virtual memory
A single architecture designed to fit all
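The 4GB and 1 TB figures follow directly from the number of address bits. The short sketch below assumes a 32-bit address for the 4GB limit and a 40-bit physical address for the 1 TB limit. The 40-bit figure is inferred from the 1 TB number on the slide, not stated on it.

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary)


def addressable_gib(address_bits):
    """Gigabytes of memory reachable with the given number of address bits."""
    return 2 ** address_bits // GIB


limit_32_bit = addressable_gib(32)
limit_40_bit = addressable_gib(40)

print(limit_32_bit)   # gigabytes with 32 address bits
print(limit_40_bit)   # gigabytes with 40 address bits (1024 GB = 1 TB)
```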
AMD Athlon 64 & 64 FX
Athlon 64 is 754-pin; Athlon 64 FX is 940-pin
New Changes









1 MB L2 cache
Integrated memory controller
HyperTransport channel
Lower power requirements
New AMD core
Double the registers
Integrated DDR memory controller
Enlarged Translation Look-Aside Buffer (TLB)
Extends the pipeline from 10 to 12 stages
AMD 64 Processor Architecture
Integrated Memory Controller






Provides sufficient low-latency memory bandwidth to the processor
core
The integrated memory controller changes the way the
processor accesses main memory
It greatly increases bandwidth and reduces latency, thus speeding up
processing
Runs the memory controller at processor speed rather than FSB
speed
Boosts performance for many applications with intensive
memory use
Available memory bandwidth up to 6.4GB/s with Opteron and
FX and 3.2GB/s with Athlon 64
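The two bandwidth figures can be reproduced from the memory bus width. The sketch below assumes DDR400 memory performing 400 million transfers per second, on a 128-bit (dual-channel) bus for Opteron and FX and a 64-bit (single-channel) bus for Athlon 64. The bus widths come from the FX-51 comparison later in the deck, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


DDR400_TRANSFERS = 400e6  # DDR400: 400 million transfers per second

dual_channel = peak_bandwidth_gb_s(128, DDR400_TRANSFERS)
single_channel = peak_bandwidth_gb_s(64, DDR400_TRANSFERS)

print(f"128-bit bus: {dual_channel:.1f} GB/s")
print(f"64-bit bus:  {single_channel:.1f} GB/s")
```

These are peak figures. Sustained bandwidth in real applications will be lower.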
AMD 64 Core






Enables simultaneous 32-bit and 64-bit computing
The AMD 64 core supports 3 main categories of operation:
1. 32-bit applications under a 32-bit OS
2. 32-bit applications under a 64-bit OS
3. 64-bit applications under a 64-bit OS
Great for migration
HyperTransport




Increases overall system performance by reducing
I/O bottlenecks, increasing system bandwidth
and reducing system latency
High-speed I/O communication
Up to 6.4GB/s bandwidth per link, improving
interconnection with system components
Up to 3 HyperTransport links (only on Opteron)
SSE/SSE2 Registers


Double the number of registers
Double SSE registers to improve floating point
calculations
Enlarged Translation Look-Aside Buffer (TLB)

The enlarged TLB caches more virtual-to-physical address
translations, reducing lookups in system memory
Pipeline


Extends the pipeline from 10 to 12 stages to
increase clock speeds
Reworked branch prediction
Problems



AMD partnered with Nvidia, but the nForce3 chipset
is not mature
nForce3 has a low AGP performance bug in its
HyperTransport channel interface
As it turns out, VIA has the better chipset for AMD 64
AMD 64 FX-51





An “Opteron” processor with an FX label
Slight change to the DDR400 support (reduced
validation)
Major difference from Athlon 64 is the 128-bit
memory controller vs 64-bit
Works with dual-channel registered memory
Athlon 64 works with single-channel unbuffered
DDR memory
Final Word of FX-51




Athlon 64 3400+ brings the death of the FX-51
According to benchmarks from different areas,
Athlon 64 3400+ comes very close behind the FX-51
But it costs little more than half as much ($417 vs. $733)
Or you can wait until the FX-53 comes out
Watch Out!

AMD is talking about a new Socket 939 arriving
late this year
AMD Roadmap 754
AMD Roadmap 940
AMD Roadmap 939
Benchmarks - OpenGL
Benchmarks -
Benchmarks -
Benchmarks – Business App
Result Summary


AMD is strong when it comes to business/gaming/2D
work with respect to price/performance
ratio
Intel offers the best in encoding and 3D
performance as well as multitasking
Conclusion




It is very hard to compare new processors
AMD 64 lacks true 64-bit applications
Intel Prescott lacks SSE3-enhanced
applications and is held back by out-of-date video drivers and
DirectX
Hardware opens the door to the future, but until
software catches up we won't be able to truly
experience the great enhancements
Sources








Intel Corp - www.intel.com
AMD Corp - www.amd.com
Tom's Hardware - www.tomshardware.com
AnandTech - www.anandtech.com
ExtremeTech - www.extremetech.com
Tech Report - www.techreport.com
X-bit labs - www.xbitlabs.com
Opteronics - www.opteronics.com