Oblivious Routing

advertisement

Routing Algorithms

ECE 284

On-Chip Interconnection Networks

Spring 2014

1

Routing

• Will assume 2D mesh in this talk

• How flits are routed from source to destination can greatly impact network congestion

• Two types of routing:

• Oblivious routing: routing does not consider or depend on the current traffic condition

• Adaptive routing: takes into consideration current traffic condition to determine the routing path

(tries to get around congested areas)

• Oblivious routing simpler (less expensive) to implement

• This talk will review existing oblivious routing algorithms

2

Routing Algorithm Objectives

• Maximize throughput

– How much load the network can handle

• Minimize latency

– Minimize routing delay between source and destination

3

Dimension-Ordered Routing (DOR)

(also called XY routing)

D

S either minimal XY or YX routing to the destination

(here it uses XY route with probability 1.0)

4

DOR (XY) Routing with Uniform Traffic

• For an N = K x K mesh, N/2 nodes are in the top half.

• 1/2 of its traffic will cross the bisection.

• Traffic crossing bisection uniformly distributed across K channels.

• Therefore, maximum channel load for DOR with uniform traffic is:

ϒ(DOR, uniform) = [ (N/2) * (1/2) ] / K = K/4

5

Problem with DOR (XY) Routing

• Minimal hop count

• But, in the worst-case, the links can get overly congested. e.g., transpose traffic pattern.

ϒ wc

(DOR) = K – 1 >> ϒ(DOR, uniform) in the worst-case.

6

Valiant Load-Balancing (VAL) [1981]

D

S randomly chosen intermediate node minimal XY routing to any intermediate node, then minimal XY routing to destination node

7

Valiant Load-Balancing (VAL)

• Works by turning any traffic pattern into

2 phases of uniform traffic patterns, even adversarial or worst-case traffic patterns.

• In effect, it evenly load-balances the traffic.

• Worst-case channel load

ϒ wc

(VAL) = 2 * ϒ(DOR, uniform) = 2 * (K/4) = K/2 which is 1/2 network capacity relative to DOR and uniform traffic.

8

Valiant Load-Balancing (VAL)

• Effective network capacity normalized throughput is 1/2 capacity.

• However, average hop count is 2X DOR.

• 1/2 capacity was thought to be the optimal worst-case throughput for any routing algorithm.

9

ROMM [1995]

S

D intermediate node randomly chosen only in the minimal direction to destination minimal XY routing to an intermediate node only in the minimal direction, then minimal XY routing to the final destination node

10

ROMM

• Tries to load-balance traffic by randomly distributing traffic along all possible minimal paths.

• Good that minimal number of hops is guaranteed.

• But, turns out in the worst-case, ROMM performs about as bad as DOR.

11

O1TURN [2005]

D

S use both minimal XY and YX routing to the destination

(0.5 XY + 0.5 YX )

12

O1TURN

• Even though it only considers XY or YX path, not all possible paths in VAL or all possible minimal paths in ROMM, it is guaranteed to achieve 1/2 capacity for the even radix case, which has been shown to be optimal.

• For the odd radix case, O1TURN is very near the optimal 1/2 capacity.

• Unlike VAL, O1TURN only uses minimal routing paths, thus no penalty in hop count.

13

0.5

0.4

0.3

0.2

0.1

VAL

DOR

ROMM

O1TURN

Comparison

Even Radix : Opt * 1

Odd Radix : Opt * (1 - 1 / K 2 )

[adapted from Seo et al., O1TURN talk, ISCA 2005] 14

Simulation Results

• 4 x 4 2D MESH – Uniform Random Traffic Pattern

200

150

DOR

ROMM

O1TURN

DUATO

100

50

0

0 0.2

0.4

0.6

0.8

Throughput (flits / node / cycle)

1

[adapted from Seo et al., O1TURN talk, ISCA 2005]

15

Simulation Results

• 4 x 4 2D MESH – Matrix Transpose Traffic Pattern

– One of the worst-case traffic pattern for DOR

200

150

DOR

ROMM

O1TURN

DUATO

100

50

0

0 0.2

0.4

0.6

0.8

Throughput (flits / node / cycle)

1

16

[adapted from Seo et al., O1TURN talk, ISCA 2005]

Simulation Results

• 4 x 4 2D MESH – Bit Complement Traffic Pattern

– Already balanced traffic pattern

200

150

100

50

DOR

ROMM

O1TURN

DUATO

0

0 0.2

0.4

0.6

0.8

Throughput (flits / node / cycle)

1

17

[adapted from Seo et al., O1TURN talk, ISCA 2005]

Simulation Results

• 4 x 4 2D MESH – HOT SPOT Traffic Pattern

– 2 nodes have 20% of traffic

200

150

100

50

DOR

ROMM

O1TURN

DUATO

0

0 0.2

0.4

0.6

0.8

Throughput (flits / node / cycle)

1

[adapted from Seo et al., O1TURN talk, ISCA 2005]

18

Simulation Results

• Delay penalty of adaptive routing

– How the complexity of router implementation affects on latency

– Hot Spot Traffic Pattern

2000

1500

DOR

ROMM

O1TURN

DUATO

1000

500

0

0 0.2

0.4

0.6

0.8

Throughput (flits / node / cycle)

[adapted from Seo et al., O1TURN talk, ISCA 2005]

1

19

U2TURN [2012]

• 1/2 capacity has been thought to be optimal worst-case throughput for both odd and even radices, and O1TURN is the state-of-the-art for achieving this for the even radix case.

• But, turns out 1/2 capacity is not optimal for the odd radix case.

20

U2TURN

• U2TURN considers all possible XYX and YXY 2-TURN paths, and selects these paths with equal probability.

• XYX paths: randomly select a node on the same row and route to it, followed by minimal YX routing to final destination.

• YXY paths: randomly select a node on the same column and route to it, followed by minimal XY routing to final destination.

21

Analytical Results

• For the even radix case, worst-case capacity of

U2TURN = 1/2, same as VAL and O1TURN, which is optimal.

• But, for the odd radix case, worst-case capacity of

U2TURN =

(K+1)/(2K+1) > 1/2 which is better than any existing routing algorithm.

22

Worst-Case Throughput

ROMM

VAL

DOR

U2TURN

O1TURN

Optimal routing

23

Throughput Comparison for Odd Radix

3X3 mesh

Worst-case

VAL DOR O1TURN U2TURN

0.5

0.33

0.44

0.57

Average-case

Transpose

Random

DOR-WC

0.5

0.5

0.5

0.5

0.405 0.477

0.33

1

0.33

0.67

1

0.67

Complement 0.5

0.67

0.67

Nearest-Neighbor 0.5

1.33

1.33

0.604

0.8

0.72

0.8

0.57

0.75

5X5 VAL DOR O1TURN U2TURN

0.5

0.3

0.48

0.55

0.5

0.44

0.53

0.5

0.3

0.6

0.5

1

0.5

0.3

0.5

0.6

0.5

2.4

1

0.6

0.6

2.4

0.632

0.75

0.685

0.75

0.55

1.17

24

Throughput Comparison for Even Radix

4X4 mesh

Worst-case

VAL DOR O1TURN U2TURN

0.5

0.33

0.5

0.5

Average-case

Transpose

0.5

0.48

0.54

0.5

0.33

0.67

Random

DOR-WC

Complement

0.5

0.5

0.5

1

0.33

0.5

Nearest-Neighbor 0.5

2

1

0.67

0.5

2

0.64

0.8

0.7

0.8

0.5

1.1

6X6 VAL DOR O1TURN U2TURN

0.5

0.3

0.5

0.5

0.5

0.47

0.556

0.5

0.3

0.5

1

0.6

1

0.5

0.3

0.5

0.5

0.5

3

0.6

0.5

3

0.65

0.75

0.682

0.75

0.5

1.27

25

Latency

• A potential concern about U2TURN is that it uses non-minimal routing paths.

• However, U2TURN does a better job of loadbalancing traffic than O1TURN for difficult traffic patterns.

• Hence, the queuing delay can be very high for

O1TURN for difficult traffic patterns, hence longer latency despite fewer number of hops.

• Surprisingly, latency better for both odd and even radix cases for difficult traffic patterns.

26

Latency for 7 x 7 Mesh

27

Latency for 8 x 8 Mesh

28

References

• Valiant [L.G.Valiant et. al, ACM 1981]

• ROMM [T.Nesson et. al, ACM 1995]

• O1TURN [D. Seo et. al, ISCA 2005]

• U2TURN [G. Sun et. Al, ICCD 2012]

29

Download