Orbit1000 Technology Discussion Overview

Opnix Smart Routing Technology
Overview
“There is more than one way to skin a cat…”
Aaron D. Britt
Opnix, Inc.
Orbit1000 Technology Discussion
NANOG -1-
Overview
• Orbit1000 CPE Overview
• Probing Method in More Detail
• Orbit1000 CORE Overview
• Things to Come…
• Let's Review - Q & A
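Orbit1000 CPE High Level Architecture
[Figure: The subscriber network (AS 100) advertises the IP block 24.10.0.0/16 and runs OSPF Area 0 and IBGP between its routers (A, B, C) and the Orbit 1000 on the LAN. Router B has an EBGP session with Carrier B (AS 200) over 20.20.20.2/20.20.20.1. Router C has an EBGP session with Carrier C (AS 300) over 30.30.30.2/30.30.30.1. The Orbit 1000 connects over an encrypted link to the Opnix CORE (Orbit, AS 64701). Other addresses shown: 24.10.1.1, 24.10.4.1, 10.10.10.2.]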
Functions of the Orbit1000 CPE
• Probe stuff
• Receive BGP Feed and Set Routes
• Communicate with the CORE
  – Send Raw Probe Data
  – Receive Optimized Routes
[Figure: The Orbit1000 CPE sends Discovery Probes and QA Probes into the Internet, talks to the CORE over an encrypted link, and sets BGP routes on the customer router(s).]
How we become one with the Packet
• UDP Probes – Proactive Philosophy using patented ActiveScan
  – We tried ICMP: routers drop ICMP despite what the RFC says.
  – We tried TCP: it set off IDS systems all over the place.
  – We tried the Force, but none of us had enough midi-chlorians.
  – We now use a UDP probe that, though proprietary in nature, is very similar to a typical traceroute.
  – During testing, we found that routing policy set using UDP probe data is within 2% of the routing policy set using TCP probe data, but it doesn't set off IDS systems!
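The slide says the proprietary probe resembles a typical traceroute. For reference, classic UDP traceroute sends TTL-limited UDP datagrams to high destination ports, starting at 33434 and using a new port for each probe. Each ICMP reply can then be matched back to the hop that triggered it: time exceeded comes from a router along the path, and port unreachable comes from the destination. The sketch below shows only that probe layout. It is not ActiveScan. The function names and the three-probes-per-hop default are assumptions borrowed from common traceroute behavior.

```python
import socket
from typing import List, Tuple

# Classic UDP traceroute convention (NOT Opnix ActiveScan): start at port
# 33434 and use a fresh destination port for every probe sent.
BASE_PORT = 33434

def probe_plan(max_ttl: int = 30, probes_per_hop: int = 3,
               base_port: int = BASE_PORT) -> List[Tuple[int, int]]:
    """Return (ttl, destination_port) pairs in the order they would be sent."""
    plan = []
    port = base_port
    for ttl in range(1, max_ttl + 1):
        for _ in range(probes_per_hop):
            plan.append((ttl, port))
            port += 1  # unique port lets replies be matched to this probe
    return plan

def send_probe(dst_ip: str, ttl: int, port: int, payload: bytes = b"probe") -> None:
    """Send one TTL-limited UDP datagram (hypothetical helper).

    Reading the ICMP time-exceeded / port-unreachable replies needs a raw
    ICMP socket (usually root privileges) and is omitted here.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        sock.sendto(payload, (dst_ip, port))

if __name__ == "__main__":
    for ttl, port in probe_plan(max_ttl=2, probes_per_hop=2):
        print(ttl, port)
```

Setting `IP_TTL` on the socket is what makes each datagram expire at a chosen hop.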
Probing Mechanism
• Where do we probe?
  – A prefix list based on the prefixes important to each customer, drawn from:
    • Top 500 trafficked sites, news groups, etc.
    • Route feed from customer routers
    • Traffic flow data (NetFlow, span port <sniff sniff>)
    • Logs (web, DNS, etc.)
    • We are capable of probing 110,000+ routes, but it doesn't make sense to (most of the time).
  – discovery.ignore and discovery.include lists
  – 'Prefix + 1' methodology, unless a more specific IP address is specified in the configuration
• We probe multiple prefixes over multiple upstreams in parallel, in a configurable amount: how much bandwidth do you want to spend on probes?
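'Prefix + 1' presumably means probing the first address after each prefix's network address. For 216.183.192.0/19 that would be 216.183.192.1. Below is a minimal sketch of building a probe target list under that reading, using Python's standard `ipaddress` module. The include/ignore/override semantics are hypothetical stand-ins for the discovery.include and discovery.ignore lists and the configuration override the slide names. The real file formats are not shown in the source.

```python
import ipaddress
from typing import Dict, Iterable, List, Optional, Tuple

def probe_targets(prefixes: Iterable[str],
                  include: Iterable[str] = (),
                  ignore: Iterable[str] = (),
                  overrides: Optional[Dict[str, str]] = None) -> List[Tuple[str, str]]:
    """Return (prefix, target_ip) pairs for discovery probing.

    Hypothetical semantics: `include` adds prefixes, any prefix that falls
    inside an `ignore` block is skipped, and `overrides` maps a prefix to a
    more specific address to probe instead of prefix + 1.
    """
    overrides = overrides or {}
    ignored = [ipaddress.ip_network(p) for p in ignore]
    targets: List[Tuple[str, str]] = []
    seen = set()
    for text in list(prefixes) + list(include):
        net = ipaddress.ip_network(text)
        if net in seen:
            continue  # same prefix already supplied by another source
        if any(b.version == net.version and net.subnet_of(b) for b in ignored):
            continue  # covered by an ignore-style entry
        seen.add(net)
        if str(net) in overrides:
            target = ipaddress.ip_address(overrides[str(net)])
        elif net.num_addresses > 1:
            target = net.network_address + 1  # the 'Prefix + 1' address
        else:
            target = net.network_address  # a /32 has no +1 inside it
        targets.append((str(net), str(target)))
    return targets

print(probe_targets(["216.183.192.0/19", "10.0.0.0/8"], ignore=["10.0.0.0/8"]))
# [('216.183.192.0/19', '216.183.192.1')]
```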
Metrics Gathered
• OpScore (an algorithm that weights the probe data and calculates a score based on customer-defined settings)
  – Latency
  – Unreliability
    • Link Unreliability
    • Probe Closure
    • Packet Loss
    • Routing Loops
  – Bad Hops
  – Layer 3 Hops
  – Carrier Preference
• Lowest score wins

Example: Prefix 216.183.192.0/19 over Carrier "B" vs. Carrier "C"

Metric (range)                  Weight   Carrier "B"         Carrier "C"
                                         Actual   Result     Actual   Result
Carrier Preference (100 - 1)    25%      25       6.25       75       18.75
Layer 3 Hops (2 to 30)          10%      15       1.5        20       2
Bad Hops (1 to 5)               10%      1        0.1        0        0
Unreliability (1 - 100)         25%      50       12.5       25       6.25
Latency (5 to 300 ms)           30%      125      37.5       50       15
OpScore                                           57.85               42.00

Result = Actual × Weight; OpScore = sum of the Results. Carrier "C" has the lower OpScore and wins for this prefix.
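The arithmetic in the table is a weighted sum. Each metric's actual value is multiplied by its customer-defined weight, the products are added, and the carrier with the lowest total wins. Below is a minimal sketch that reproduces the two OpScores for 216.183.192.0/19. The dictionary layout and function names are illustrative. The real OpScore algorithm may also normalize or clamp values to the stated ranges in ways the slide does not show.

```python
from typing import Dict

# Customer-defined weights from the example table (they sum to 100%).
WEIGHTS = {
    "carrier_preference": 0.25,
    "layer3_hops": 0.10,
    "bad_hops": 0.10,
    "unreliability": 0.25,
    "latency_ms": 0.30,
}

def opscore(actuals: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the metric values; lower is better."""
    return sum(actuals[name] * weight for name, weight in weights.items())

def best_carrier(candidates: Dict[str, Dict[str, float]]) -> str:
    """Pick the carrier with the lowest OpScore ("lowest score wins")."""
    return min(candidates, key=lambda carrier: opscore(candidates[carrier]))

# Actual values for prefix 216.183.192.0/19, copied from the table.
carriers = {
    "B": {"carrier_preference": 25, "layer3_hops": 15, "bad_hops": 1,
          "unreliability": 50, "latency_ms": 125},
    "C": {"carrier_preference": 75, "layer3_hops": 20, "bad_hops": 0,
          "unreliability": 25, "latency_ms": 50},
}

for name, metrics in carriers.items():
    print(name, round(opscore(metrics), 2))
print("selected:", best_carrier(carriers))
# B 57.85
# C 42.0
# selected: C
```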
QA Process (Testing the Active Link)
• UDP Based (just like our Discovery Probes)
• We QA everything!
• We send the QA probe to a TTL based on where we think the endpoint is, according to our discovery data.
• We check the latency and unreliability against the probe data we used to set the route.
• How many QA routes do we send, and how fast?
  – The QA Limit is configurable, like the Carrier Limit in the Client Config, which means you control how many routes we can QA in parallel.
• QA happens much faster than Discovery.
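Below is a minimal sketch of the QA idea as described. It re-measures the active path for each optimized prefix and compares latency and unreliability against the discovery data that justified the route. A thread pool sized by the QA Limit bounds how many checks run at once. The tolerance values, the `measure` callback and all names are hypothetical. The slide does not say how much drift triggers a route change.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class PathSample:
    latency_ms: float
    unreliability: float  # same 1-100 scale as the OpScore table

def route_still_valid(baseline: PathSample, measured: PathSample,
                      latency_slack: float = 1.25,
                      unreliability_slack: float = 10.0) -> bool:
    """Hypothetical QA rule: pass if the active path is at most 25% slower
    and at most 10 points less reliable than the discovery data."""
    return (measured.latency_ms <= baseline.latency_ms * latency_slack and
            measured.unreliability <= baseline.unreliability + unreliability_slack)

def run_qa(baselines: Dict[str, PathSample],
           measure: Callable[[str], PathSample],
           qa_limit: int = 4) -> List[str]:
    """QA every optimized prefix, at most `qa_limit` at a time.

    Returns the prefixes that failed QA and should be re-evaluated.
    """
    prefixes = list(baselines)
    with ThreadPoolExecutor(max_workers=qa_limit) as pool:
        samples = list(pool.map(measure, prefixes))  # order matches prefixes
    return [p for p, s in zip(prefixes, samples)
            if not route_still_valid(baselines[p], s)]
```

In use, `measure` would wrap the actual QA probe. A dictionary lookup of canned samples is enough to exercise the logic offline.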
Orbit1000 CORE
• 5 Pieces
  – Balancer (Communicates w/CPE)
  – Optimizer (Crunches Numbers)
  – View (Keeps Latest and Greatest Views per CPE)
  – SQL dB (Stores Stuff)
  – Customer Portal (Looks stuff up)
[Figure: block diagram of the CPE, the CORE (Balancer, Optimizer, View, SQL dB) and the Portal.]
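The slide lists the CORE pieces without describing their interfaces. One possible reading of "View (Keeps Latest and Greatest Views per CPE)" is a per-CPE store that keeps only the newest Optimizer output, so the Balancer can always hand a CPE its current routes. Below is a minimal sketch of that reading. The version numbers, class names and in-memory storage are assumptions. In the system described, persistent data lives in the SQL dB, not in a dictionary like this one.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class RouteView:
    version: int
    routes: Dict[str, str]  # prefix -> selected carrier

@dataclass
class ViewStore:
    """Hypothetical in-memory View: the newest optimized route set per CPE."""
    views: Dict[str, RouteView] = field(default_factory=dict)

    def publish(self, cpe_id: str, version: int, routes: Dict[str, str]) -> bool:
        """Store an Optimizer result unless an equal or newer one is held."""
        current = self.views.get(cpe_id)
        if current is not None and current.version >= version:
            return False  # stale or duplicate update: keep the latest view
        self.views[cpe_id] = RouteView(version, dict(routes))
        return True

    def latest(self, cpe_id: str) -> Optional[RouteView]:
        """What a Balancer-like component would return to a CPE."""
        return self.views.get(cpe_id)
```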
Data Access
• Portal
  – Access to Data, raw and graphical (Current and Historical)
  – All metrics and weights represented
  – Access to each CPE Client Config
  – RouteVision (Visualize over Multiple Paths)
  – Aggregate Summarizations
• SQL dB
  – Raw Data
    • Transactional Data (Real Time)
    • Warehoused Data (Portal)
    • Archival Data
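The deck's 30-day charts are the kind of aggregate summarization the Portal would serve from warehoused data. Below is a minimal sketch of going from a transactional probe-result table to that kind of summary (latency by TTL over 30 days), using Python's built-in `sqlite3`. The schema, table and column names, and sample rows are all hypothetical. The source does not describe the actual database or its layout.

```python
import sqlite3

# Hypothetical "transactional" table: one row per raw probe result.
SCHEMA = """
CREATE TABLE probe_result (
    cpe_id    TEXT    NOT NULL,
    prefix    TEXT    NOT NULL,
    carrier   TEXT    NOT NULL,
    ttl       INTEGER NOT NULL,
    latency_s REAL    NOT NULL,
    probed_at TEXT    NOT NULL  -- 'YYYY-MM-DD HH:MM:SS', so text order = time order
);
"""

# Hypothetical "warehoused" aggregate: mean latency per TTL over 30 days.
SUMMARY_SQL = """
SELECT ttl, AVG(latency_s) AS avg_latency_s, COUNT(*) AS samples
FROM probe_result
WHERE probed_at >= datetime(?, '-30 days')
GROUP BY ttl
ORDER BY ttl
"""

def summarize(conn: sqlite3.Connection, as_of: str):
    """Return (ttl, avg_latency_s, samples) rows for the 30 days before as_of."""
    return conn.execute(SUMMARY_SQL, (as_of,)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO probe_result VALUES (?, ?, ?, ?, ?, ?)",
        [("cpe-1", "216.183.192.0/19", "B", 15, 0.120, "2024-01-10 12:00:00"),
         ("cpe-1", "216.183.192.0/19", "C", 15, 0.130, "2024-01-11 12:00:00"),
         ("cpe-1", "24.10.0.0/16", "B", 9, 0.050, "2024-01-12 12:00:00")])
    for row in summarize(conn, "2024-01-31 00:00:00"):
        print(row)
```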
Fault Tolerance Stuff…
• If it goes up in smoke, the customer router reverts to standard BGP.
• Discovery Probes halt if the CPE loses the CORE connection: if keep-alives fail for a period of time, the product removes its routes and “sleeps” until communication with the CORE is reestablished.
• The CPE config is stored in the central dB for fault tolerance.
• Heartbeat / failover process between CPEs
• SNMP traps as an early warning system (RAM, hard disk, CPU, etc.)
• Always working on additional MIB support
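The keep-alive behavior reads like a small state machine. While keep-alives arrive, probing continues. Once they lapse for longer than a timeout, probing stops and the CPE's routes are withdrawn, presumably leaving the customer routers on standard BGP. The CPE then sleeps until the CORE is heard from again. Below is a minimal sketch of that logic. The class name, the `withdraw_routes` callback, the injectable clock and the timeout value are hypothetical. The slide does not give the actual period.

```python
import time
from typing import Callable

class CoreLink:
    """Hypothetical CPE-side watchdog for the CORE connection."""

    def __init__(self, timeout_s: float, withdraw_routes: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.withdraw_routes = withdraw_routes
        self.clock = clock  # injectable so the logic can be tested
        self.last_keepalive = clock()
        self.sleeping = False

    def on_keepalive(self) -> None:
        """Record a keep-alive; wakes the CPE if it was sleeping."""
        self.last_keepalive = self.clock()
        self.sleeping = False  # CORE is back: discovery probing may resume

    def tick(self) -> None:
        """Call periodically; enters sleep once keep-alives have lapsed."""
        lapsed = self.clock() - self.last_keepalive > self.timeout_s
        if not self.sleeping and lapsed:
            self.sleeping = True
            self.withdraw_routes()  # called once per outage

    @property
    def probing_allowed(self) -> bool:
        return not self.sleeping
```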
Things to Come…
• Probes to support Jumbo Frames (Adjustable Frame Size)
• Dedicated Jitter Metrics
• Black-hole and Routing Loop Discovery/reports via Website
• TCP Slow Start Algorithm emulation
• TCP and/or UDP probes (Pick your poison)
• TCP Sniffing for Active Links (Monitor Actual Data, Replace QA)
• Multicast Support
• IPv6 Support
• Additional MIB support
• NEBS Compliant (just kidding)
Contact Information
If you have any questions or would like to comment on and/or critique this method of ‘Cat Skinning’ (I would love for some hecklers to drop me a line; without peer review, no progress is possible), here is my contact info…
http://www.opnix.com
aaron@opnix.com
Case Studies available today…
• Tier 1 ISP
• Fortune 5 Enterprise
• Fortune 100 Financial Institution
• Internet2/Abilene Deployment
Layer 3 Hops vs Latency (30-Day Summary)
[Chart: latency by TTL (Layer 3 hop count); plotted values:]
TTL   Latency
3     0.020716
4     0.024832
5     0.033791
6     0.045662
7     0.055674
8     0.079405
9     0.109979
10    0.131937
11    0.141727
12    0.142373
13    0.143105
14    0.151558
15    0.177103
16    0.196629
17    0.216883
18    0.231439
19    0.244841
20    0.263682
21    0.268043
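One way to put a number on the relationship in this chart is Pearson's correlation coefficient plus a least-squares slope (extra latency per additional hop). Below is a minimal sketch that uses only the 19 plotted points. It works on the chart's aggregated values, not on the underlying per-probe data, so it describes the summary rather than individual paths. The function names are illustrative.

```python
import math

# (TTL, latency) pairs as plotted in the 30-day summary chart.
TTLS = list(range(3, 22))
LATENCY = [0.020716, 0.024832, 0.033791, 0.045662, 0.055674, 0.079405,
           0.109979, 0.131937, 0.141727, 0.142373, 0.143105, 0.151558,
           0.177103, 0.196629, 0.216883, 0.231439, 0.244841, 0.263682,
           0.268043]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope: added latency per additional hop."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

print(f"r = {pearson(TTLS, LATENCY):.3f}, "
      f"slope = {slope(TTLS, LATENCY):.4f} per hop")
```

On these points r comes out at roughly 0.99 and the slope at about 0.0145 latency units per hop. A strong correlation in aggregated data is consistent with hop count driving latency, but does not by itself prove it.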
Prefixes are how many hops away?
[Chart: number of prefixes at each TTL (Layer 3 hop count); plotted values:]
TTL   # Prefixes
3     2047
4     473
5     660
6     1621
7     2726
8     3601
9     4340
10    5527
11    7831
12    8761
13    9111
14    13756
15    9506
16    7743
17    7174
18    4679
19    4321
20    2881
21    1339
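The same plotted counts can be condensed into a few summary numbers: total prefixes, mean, median and most common hop count. Below is a minimal sketch over the chart's values. The function and variable names are illustrative.

```python
# Prefix counts per TTL as plotted in "Prefixes are how many hops away?".
PREFIXES_BY_TTL = dict(zip(range(3, 22), [
    2047, 473, 660, 1621, 2726, 3601, 4340, 5527, 7831, 8761,
    9111, 13756, 9506, 7743, 7174, 4679, 4321, 2881, 1339]))

def histogram_summary(hist):
    """Return (total, mean, median, mode) of a {value: count} histogram."""
    total = sum(hist.values())
    mean = sum(ttl * n for ttl, n in hist.items()) / total
    running, median = 0, None
    for ttl in sorted(hist):
        running += hist[ttl]
        if running * 2 >= total:  # first TTL reaching half the prefixes
            median = ttl
            break
    mode = max(hist, key=hist.get)
    return total, mean, median, mode

total, mean, median, mode = histogram_summary(PREFIXES_BY_TTL)
print(f"{total} prefixes, mean {mean:.2f} hops, median {median}, mode {mode}")
# 98097 prefixes, mean 13.31 hops, median 14, mode 14
```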
Other Questions to ask…
• Is there a direct correlation between hops and latency? Hop count seems anecdotal, yet the numbers are quite convincing…
• How accurate are UDP measurements compared with TCP measurements when it comes to latency, packet loss and throughput?
• How much of a part does asymmetric routing play in the world of suboptimal routing?
• From NetFlow stats, it seems that on average routers only forward packets to 10% or so of the global RIB, yet our routing tables are tenfold that size or more. It seems we can do something here; I just don't know what, yet…
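The last question can be measured directly. Given the RIB prefixes and the destination addresses seen in NetFlow records, count how many prefixes are the longest match for at least one flow. Below is a minimal sketch that does longest-prefix matching with the standard `ipaddress` module by trying each prefix length present in the table, longest first. The input formats and function name are assumptions. A production tool would more likely use a radix trie over a full table.

```python
import ipaddress
from typing import Iterable

def rib_coverage(rib: Iterable[str], flow_destinations: Iterable[str]) -> float:
    """Fraction of RIB prefixes that are the longest match for >= 1 flow."""
    table = {ipaddress.ip_network(p) for p in rib}
    lengths = sorted({net.prefixlen for net in table}, reverse=True)
    used = set()
    for dst in flow_destinations:
        addr = ipaddress.ip_address(dst)
        for length in lengths:  # longest prefix length first
            candidate = ipaddress.ip_network(f"{addr}/{length}", strict=False)
            if candidate in table:
                used.add(candidate)
                break  # longest match found for this destination
    return len(used) / len(table) if table else 0.0
```

For example, with the RIB `["24.10.0.0/16", "24.10.4.0/24", "216.183.192.0/19", "20.20.20.0/24"]` and flows to `24.10.4.1`, `24.10.1.1` and `216.183.200.7`, three of the four prefixes carry traffic, giving a coverage of 0.75.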