ppt

advertisement
ECE 526 – Network
Processing Systems Design
IXP XScale and Microengines
Chapter 18 & 19: D. E. Comer
Overview
• Recalled
─
─
─
─
Packet processing functions (forwarding, queuing…)
Traditional network processing systems (CPU + NICs)
General network processor architecture and tradeoffs
Intel IXP network processors overall architecture
• Focus on individual components of Intel IXP chip
─ Control processor (slow path): XScale core
• Overall architecture
• Typical functions
• Processor features
─ Packet processing processor (fast path): Microengines
• Architecture and features
• Differences to conventional processors
• Pipelining and multi-threading
Ning Weng
ECE 526
2
Purpose of Control Processor
• Functions typically executed by embedded control proc:
─
─
─
─
─
─
─
─
Bootstrapping
Exception handling
Higher-layer protocol processing
Interactive debugging
Diagnostics and logging
Memory allocation
Application programs (if needed)
User interface and/or
interface to the GPP
─ Control of packet processors
─ Other administrative functions
Ning Weng
ECE 526
3
XScale Memory Architecture
• Memory architecture
─ Uses 32-bit linear address space
─ configurable endian mode
─ Byte addressable
• Memory Mapping
─ Allocation of address space (2^32) to different system
components
─ Accesses to memory is translated into access to component
─ Needs to be carefully crafted
• XScale assumes byte addressable memory
─ Underlying memory uses different size (SDRAM)
─ How does this work?
• Support for Virtual Memory
─ For demand paging to secondary storage
Ning Weng
ECE 526
4
Shared Memory Address Issues
• Memory is shared between XScale and Microengines
• Same data, but different addresses
• What impact does this have?
─ Pointers need to be translated
─ Data structures with pointers can not be shared
Ning Weng
ECE 526
5
Microengines
•
•
•
•
Microengines are data-path packet processors IXP
IXP 2400 have 8 Microengines
Simpler than XScale
Low level device
as a micro-sequencer
• Optimized for
packet processing
• More complex to use
• Often abbreviated as uE
Ning Weng
ECE 526
6
uE Functions
• uEs handle ingress and egress packet processing:
─
─
─
─
─
─
─
─
Packet ingress from physical layer hardware
Checksum verification
Header processing and classification
Packet buffering in memory
Table lookup and forwarding
Header modification
Checksum computation
Packet egress to physical layer hardware
Ning Weng
ECE 526
7
uE Architecture
• uE characteristics:
─
─
─
─
─
─
─
─
─
─
Programmable microcontroller
RISC design
256 general-purpose registers
512 transfer registers
128 next neighbor registers
Hardware support for 8 threads and context switching
640 words of local memory
Control of an Arithmetic and Logic Unit
Direct access to various functional units
A unit to compute a Cyclic Redundancy Check (CRC)
Ning Weng
ECE 526
8
uE as Micro-sequencer
• Micro-sequencer does not contain native instructions for
possible operations
─ Instead of using instructions, uE invokes functional units to
perform operations
─ Control unit is much “simpler”
• Example 1:
─
─
─
─
uE does not have ADD R2,R3 instruction
Instead: ALU ADD R2, R3
“ALU” indicates that ALU should be used
“ADD” is a parameter to ALU
• Example 2:
─ Memory access not by simple LOAD R2, 0xdeadbeef
─ Instead: SRAM LOAD R2, 0xdeadbeef
• Altogether similar to normal processor, but more basic
Ning Weng
ECE 526
9
uE Instruction Set
• General
─ ALU and etc
• Brach and Jump
─ BR: branch unconditionally
• CAM
─ CAM_CLEAR: clear all entries in local memories
• I/O and context swap
─ SCRATCH (read and write)
• For detail see Figure 19.1, 19.2, Comer.
Ning Weng
ECE 526
10
uE Memories
• uEs: viewing memories differently than XScale does
─ Does not map memories and I/O devices into a liner address
space
─ Does not view memories as a seamless, uniform repository
• uE ISA: requiring a separate instruction for each type of
memory and I/O device
─ SRAM[read, $$x, address1, address2…]
• Programmer: required binding of data items to specific
type of memory permanently.
Ning Weng
ECE 526
11
Execution Pipeline
• What is pipeline?
• Why pipeline is employed?
─ One instruction is executed per cycle if pipeline is proper
designed
• uEs use five-stage or six-stage pipeline:
Ning Weng
ECE 526
12
Pipelining
Ning Weng
ECE 526
13
Pipelining Problems
• Possible sources of pipelining problems
─
─
─
─
Data dependencies
Control dependencies
Resource dependencies
Memory accesses
• How pipelining problem impact system performance
• How these impact can be removed or reduced
─ Remove the sources so that no stall happened
─ Hide the impact of pipelining stall
Ning Weng
ECE 526
14
Pipeline Stalls
• K:
• K+1
ALU ADD R2, R1, R2
ALU ADD R3, R2, R3
• Control dependencies, memory have even bigger impact
Ning Weng
ECE 526
15
Threading Illustration
Ning Weng
ECE 526
16
Hardware Threads
• uEs support 8 hardware thread contexts
─ One thread can execute at any given time
─ When stall occurs, uE can switch to other thread (if not stalled)
• Very low overhead for context switch
─ “Zero-cycle context switch”
─ Effectively can take around three cycles due to pipeline flush
• Switching rules
─ If thread stalls, check if next is ready for processing
─ Keep trying until ready thread is found
─ If none is available, stall uE and wait for any thread to unblock
• Improves overall throughput
• Questions:
─ Why not 16, 32 threads
─ why not have 48 uEs with 1 thread?
Ning Weng
ECE 526
17
Summary
• Control processor (slow path): XScale core
• Overall architecture
• Typical functions
• Processor features
• Packet processing processor (fast path): Microengines
• Architecture and features
• Differences to conventional processors
• Pipelining and multi-threading
Ning Weng
ECE 526
18
Lab3 Brief
• Intel Reference Systems
• SDK Tutorial
• Lab 3
Ning Weng
ECE 526
19
Intel Reference Systems
• Hardware Testbed
─
─
─
─
─
─
IXP2400 network processors
QDRM-SRAM, Flash ROM and other memories
1G optical ethernet ports
100M ethernet management port
Serial interface
PCI interfaces
• SDK (software development kit)
─
─
─
─
Compiler
Assembler, linker
Simulator
Reference codes
Ning Weng
ECE 526
20
Lab3: Forwarding, Counting & Classification
• Goal: to explore the basic functionalities of the IXP2400 software
development kit and Microengines.
• 3 parts:
─ Part I: collecting a number of workload statistics from the IXP SDK
simulator. Follow steps of lab instruction.
─ Part II: adding one counting block to count the number of packets.
─ Part III: implementing a simple packet classification mechanism.
• Tools: All three parts require access to a machine that has the Intel
SDK installed. If you want, you can also request an installation CD
for your own machine, check with TA.
Ning Weng
ECE 526
21
Part I: Forwarding Simulation
• run an implementation of IP forwarding on the IXP2400
simulator. All the code is provided to you.
• collect a set of workload statistics that are reported by
the simulator.
Ning Weng
ECE 526
22
Part II: Forwarding and Counting
• modify above applications by adding counter block
• store how many packets are received.
Ning Weng
ECE 526
23
Part III: Classification and Counting
• classifying packets based on the packet header information.
There are four types of traffic that are considered in this lab:
─
─
─
─
Web traffic over TCP over IPv4
Non-Web traffic over TCP over IPv4
UDP over IPv4
IPv6
• modifying the code to report the number of packets
in each type.
Ning Weng
ECE 526
24
How to do Lab3
• Windows machine with SDK installed
• Download lab instructions and source code from
blackboard
• Start early.
• Very exciting lab.
• Due day
─ Part I and Part II 10/13
─ Part III 10/20
Ning Weng
ECE 526
25
Download