Evaluating the Latency Impact of IPv6 on a High Frequency Trading System
Nereus Lobo, Vaibhav Malik, Chris Donnally, Seth Jahne, Harshil Jhaveri
Nereus.Lobo@colorado.edu, Vaibhav.Malik@colorado.edu,
Chris.Donnally@colorado.edu, Seth.Jahne@colorado.edu,
Harshil.Jhaveri@colorado.edu
A capstone paper submitted in partial fulfillment of the requirements for the Master's degree in Interdisciplinary Telecommunications at the University of
Colorado, Boulder, 4 May 2012. Project directed by Dr. Pieter Poll and Professor
Andrew Crain.
1 Introduction
Employing latency-dependent strategies, financial trading firms rely on trade execution
speed to obtain a price advantage on an asset in order to earn a profit of a fraction of a cent per
asset share [1]. Through successful execution of these strategies, trading firms are able to realize
profits on the order of billions of dollars per year [2]. The key to success for these trading
strategies is ultra-low latency processing and networking systems, henceforth called High
Frequency Trading (HFT) systems, which enable trading firms to execute orders with the
necessary speed to realize a profit [1]. However, competition from other trading firms magnifies
the need to achieve the lowest latency possible. A 1 µs latency disadvantage can result in
unrealized profits on the order of $1 million per day [3]. For this reason, trading firms spend
billions of dollars on their data center infrastructure to ensure the lowest propagation delay
possible [4]. Further, trading firms have expanded their focus on latency optimization to other
aspects of their trading infrastructure including application performance, inter-application
messaging performance, and network infrastructure modernization [5].
As new networking technologies are introduced into the market, it is imperative that these
technologies are evaluated to assess the impact on the end-to-end system latency. The objective
of this research is to determine if there is a significant impact on HFT system latency from
emerging networking technologies that could result in a competitive disadvantage. Specifically,
this paper develops a latency-optimized HFT system performance model and contrasts latency
performance between IPv4 and IPv6 through this model. For the purposes of measurement and
evaluation, this paper defines a latency difference of 1 µs as significant, based on the magnitude of forgone profit that can result from such a disadvantage. Our research contributes a latency-optimized
end-to-end HFT system model for comparing latency performance with other HFT systems and
platforms. Additionally, this paper provides recommendations for HFT system optimization.
This research is important because the profitability of low latency trading strategies is highly
coupled to end-to-end HFT system latency performance, where a small change in latency
performance can result in a large swing in realized profits. Further, IP is a foundational
networking technology implemented across an HFT system. Changing the IP implementation
from IPv4 to IPv6 alters the latency characteristics of these devices due to additional overhead
processing and serialization delays. While the change in latency is arguably small, these small
latencies, accumulated over an end-to-end HFT system, may result in a significant 1 µs latency
increase.
The salient feature of an HFT system is the speed with which it is able to execute a trade. In an HFT system, pre-trade analytics occur to determine whether a trade should be executed. The
latency of pre-trade analytics, which occurs at the application layer inside an HFT market
analysis engine, is beyond the scope of this paper. Additionally, the inter-software messaging
and the trade execution software processing latency are beyond the scope of this paper. Instead,
this research focuses solely on the trade execution path latency performance across the
processing platform and networking devices that provide connectivity to the financial exchange.
The main audiences for this research are financial trading firms that employ low latency
trading strategies and network service providers that service these firms. The findings from this
research serve to assist with a financial trading firm’s HFT system IP modernization planning.
Further, the resulting latency-optimized HFT system model can be used by trading firms and
network service providers as a point of comparison to assess the performance of their HFT
systems and to identify latency optimization opportunities for those systems.
2 Assumptions
This paper makes two assumptions to establish a common networking interface across an
HFT system. The first assumption is that the message size for a financial trading firm's trade
execution order is 256 bytes. This assumption is based on the largest average trade order
message size used by a financial exchange [6]. The second assumption is that all processing and
networking devices implement a 10G networking adapter, which is derived from trading firms’
data center equipment expenditures [4].
3 Prior Study
Our initial literature review served to confirm the novelty of our research question. We
were able to find general latency studies for specific computing and networking technologies, but
did not find any studies specific to the latency of an end-to-end HFT system. We were also able
to find general studies on latency issues caused by IPv6, especially as they relate to transition
technologies; however, we were unable to find any studies that described potential IPv6 latency
issues in an HFT network.
4 Methodology and Results
To determine the latency impact of IPv6 on an HFT system, we decomposed the system into
three segments: the trade execution computing platform segment, the IP network segment, and
the optical transport segment. Next, we developed latency models to identify sources of latency
for processing, networking, and transmission platforms used in the HFT system segments. From
there, a literature survey was conducted to identify high performance platforms likely deployed
in HFT systems, and to identify where differences between IPv4 and IPv6 implementations
would be latency impacting.
With the HFT system segment platforms identified, our literature survey expanded to obtain
latency performance measures for each platform and to assess any performance differential
between IPv4 and IPv6. Finally, with consideration to the latency contributions of IPv4 and
IPv6, platform latency performances are compiled into the HFT system model to establish a
latency-optimized performance baseline.
4.1 Trade Execution Computing Platforms
Trade execution computing platforms are simply high performance computers that run a
financial trading firm’s trade execution software application [7]. To identify potential sources of
latency on the computing platform, an IP packet processing model was developed to illustrate
different potential paths through an operating system (OS) available to trade execution software
executing a trade order. The first path is through the OS’s networking stack, where the IP
protocol overhead processing occurs, along with processing of other protocols. The second path
is through a TCP Offload Engine (TOE), which instantiates the networking stack on the
platform’s network adapter. The computing platform IP packet processing model is illustrated in
Figure 1.
From this model, the pertinent computing platform processing aspects that contribute to
trade execution latency are the OS networking stack, TOE, IPSec and IPv6 transition protocol
processing. Further, HFT literature identifies that trade execution computing platform
deployment configurations are non-standard and can be implemented with a variety of OS and
hardware configurations [7]. Therefore, the literature survey focused on locating latency
performance measures for High Performance Computing (HPC) platforms, which run modern
versions of either Windows or Linux OS configured with a latency-optimized networking stack.
Figure 1: Computing Platform IP Packet Processing Model
[The figure depicts the Application, Socket, Transport (TCP offloading, zero copy), Network (IPSec, v6 transition), and Device Driver (interrupt coalescing) layers, annotated with system calls, task scheduling, and interrupts; the legend distinguishes packet processing, NIC hardware, and the operating system.]
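As an application-level illustration of the stack path in Figure 1, the sketch below (Python, purely illustrative and not taken from the paper) shows a socket setting commonly associated with a latency-optimized path for small, persistent order connections; the host name and port are hypothetical placeholders, and the kernel- and NIC-level optimizations shown in the figure (TOE, interrupt coalescing, zero copy) are configured outside the socket API.

    import socket

    # Hypothetical endpoint for illustration only; not a real exchange gateway.
    EXCHANGE_HOST, EXCHANGE_PORT = "exchange.example.net", 9001

    def open_order_session() -> socket.socket:
        """Open a persistent TCP session for submitting small (256-byte) trade orders."""
        s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        # Disable Nagle's algorithm so each small order message is transmitted
        # immediately instead of being coalesced with later writes.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        s.connect((EXCHANGE_HOST, EXCHANGE_PORT))
        return s

    # The TCP offload engine, interrupt coalescing, and zero-copy paths shown in
    # Figure 1 are enabled through driver and kernel settings and are not shown here.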
The literature survey produced two independent research studies identifying round-trip time
(RTT) latency performance on similar HPC platforms. The first study was conducted on a 3
GHz Intel Xeon 5160 processor with the following configuration: Linux 2.6.18, latency-optimized network stack, and a Chelsio T110 10 Gigabit Ethernet (GigE) adapter. The
platform’s RTT latency performance was 10.2 µs [8]. The second study was conducted on a 2.2
GHz AMD Opteron processor with the following configuration: Linux 2.6.6, latency-optimized
network stack, and a Chelsio T110 10GigE adapter. The platform’s RTT latency performance
was 10.37 µs [9]. Further, the first study contrasted the platform’s OS networking stack latency
performance against the TOE on the Chelsio T110 10GigE adapter. This resulted in a 1.3 µs
latency improvement, which lowered the total platform latency performance to 8.9 µs [9]. In
contrast, non-optimized computing platforms and networking stacks have a latency performance
on the order of 40 µs, which is significantly higher than that of optimized platforms [10][11].
Finally, the computing platform’s total latency performance is decomposed into transmit and
receive latency performance. Utilizing the TOE, the computing platform’s transmit latency is
2.9 µs and the receive latency is 6.0 µs [8][9].
The measured IPv4 processing latency for the optimized networking stack on the Intel
platform, which utilized the network adapter for IPv4 header checksum offloading, is 450 ns [8].
While latency performance data for IPv6 is unavailable from the platform study, investigation
into the Linux 2.6 network stack shows increased processing is needed for IPv6 socket creation
due to the inclusion of ICMP, NDP, and IGMP into IPv6 [12]. An independent study contrasting
IPv4 and IPv6 socket creation times confirms this finding [13]. However, a financial firm has a
persistent connection to a financial exchange during trading hours; i.e., socket creation latency is
not a factor in trade order execution speed. Continuing, once the socket is created, and assuming
IPv4 checksum offloading, the main processing performed at the IP layer is to fragment large
TCP segments [14]. Based on the assumption of a 256 byte message size, the IP layer will not
need to fragment the TCP segment. Therefore, the OS processing demands for IPv6 are equivalent to those for IPv4 and will not incur a significant latency penalty. However, a latency penalty is incurred when transmitting an IPv6 packet due to the serialization delay from the larger header: the IPv6 header is 20 bytes larger than the IPv4 header, and the additional 160 bits add 16 ns at 10 Gb/s. The IPv6 serialization latency therefore increases the computing platform latency performance by 0.016 µs. Table 1 identifies the trade execution computing platform latency performance values
for IPv4 and IPv6 used in the latency-optimized HFT system model.
Table 1: Trade Execution Computing Platform Latency Performance Model

  Trade Execution Computing Platform   IPv4 Latency (µs)   IPv6 Latency (µs)
  Model Tx latency                     2.9                 2.916
  Model Rx latency                     6.0                 6.016
  Total Latency                        8.9                 8.932
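The IPv6 column in Table 1 follows directly from the IPv4 measurements plus the 0.016 µs serialization penalty derived above; a minimal sketch (Python, for illustration) of that derivation:

    # Serialization penalty of the 20-byte larger IPv6 header on a 10 Gb/s adapter.
    IPV6_PENALTY_US = (40 - 20) * 8 / 10e9 * 1e6           # = 0.016 us

    ipv4 = {"tx": 2.9, "rx": 6.0}                          # measured values, Table 1 [8][9]
    ipv6 = {k: v + IPV6_PENALTY_US for k, v in ipv4.items()}
    ipv6["total"] = ipv6["tx"] + ipv6["rx"]                # 2.916 + 6.016 = 8.932 us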
Other optional IP technologies that may impact computing platform latency performance are
IPSec and IPv6 transition mechanisms. IPSec appends additional headers to IPv4 and IPv6 for
increased IP communications security [15]. When IPSec is configured to have the lowest impact on latency, i.e., is configured for tunnel mode and AES-128 Encapsulating Security Payload
(ESP), the Intel platform incurs an additional 1.9 µs latency performance penalty [8][16]. The
IPSec latency performance penalty is equal across IPv4 and IPv6 [15]. Further, IPSec use is
optional in both IPv4 and IPv6, even though its implementation in IPv6 is mandatory [17].
Finally, a survey of IPv6 transition mechanism performance studies exposed the potential for up
to a 200 ms latency penalty on non-optimized platforms [18][19].
4.2 IP Network
From the perspective of a financial trading firm, the main purpose of networking devices in
an HFT system is to process and route trading orders to the financial exchange as fast as
possible. A general architecture for networking devices was constructed to identify potential IP
packet processing latency sources [20]. Based on the functions identified in this architecture, the
potential latency sources are the networking device’s processing delay, fragmentation delay,
serialization delay, queuing delay and checksum delay. This architecture is illustrated in Fig. 2.
Figure 2: Networking Platform IP Packet Processing Model
When an IP packet enters a networking device, a processing latency is incurred because the
device must read in the IP packet header and make a routing decision based on the IP header
address [20]. Additionally, IPv4 packets incur further latency due to the verification of the
header checksum, which determines if any header errors were incurred during the transmission
of the packet [21]. However, IPv6 does not contain a header checksum and is therefore not
subject to the additional checksum latency penalty [16]. This paper combines IPv4 checksum
processing into the IPv4 processing latency for evaluation purposes.
The processing delay for a networking device is defined as the inverse of the device’s packet
processing speed [22]. Once the path through the networking device is determined, the IP packet
is sent to the queue on the selected egress port for transmission. The queuing delay is the
amount of time a packet is held prior to transmission and is determined by the queuing algorithm
efficiency and egress port buffer size [23]. The queuing delay for a networking device is defined
as the product of the network load and the store and forward switching latency [24]. As the IP
packet exits the networking device’s egress port, the packet is serialized and transmitted one bit
at a time onto the physical transmission medium. The serialization delay is defined as the sum of
the message size and the networking overhead, in bits, divided by the data rate [24].
The last source of latency identified is fragmentation delay, where a transmitting networking
device fragments a large IP packet into smaller packets for transmission over a path with a
smaller Maximum Transmission Unit (MTU). The receiving networking device must wait until
all packet fragments are received before a routing decision can be made. Based on the assumed
trade order packet size, IP fragmentation will not occur in an HFT system [6]. Additionally, it is
worth noting that while fragmentation is supported in IPv4, IPv6 implements MTU discovery
which mitigates the need for packet fragmentation on intervening devices such as routers [16].
From this model, the relevant networking device processing aspects that contribute to trade
execution latency are the processing delay, queuing delay, and serialization delay. The main
objective of the literature survey was to identify latency performance measures for the highest
performing networking devices. The router literature survey produced a study comparing the
performance between the Cisco NX7000 and Juniper EX8000 series routers, which are marketed
as high performance data center routers [25]. Based on the data provided in this study, and using
the processing delay calculation from the model, the processing delay for the Cisco NX7000
series router was calculated to be 16.67 ns for IPv4 and 33.3 ns for IPv6. The processing delay
for the Juniper EX8000 series router was calculated to be 8.3 ns for IPv4 and IPv6.
To find the worst case queuing and serialization latency for a trade execution order, the
largest average message size of 256 bytes is used in the calculations [6]. The queuing latency for
IPv4 is 66.2 ns and for IPv6 is 66.8 ns. The serialization latency over a 10GigE network adapter
using IPv4 is 251 ns and using IPv6 is 267 ns.
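These figures can be reproduced with a small model of the delay components defined above. The sketch below (Python) is illustrative only: the overhead values (38 bytes of assumed Ethernet framing and inter-frame gap plus a 20- or 40-byte IP header) and the packet-processing rate are assumptions chosen to be consistent with the quoted numbers rather than parameters published in the paper.

    # Delay components for a networking device, per the model above.
    LINK_RATE_BPS = 10e9                      # 10GigE interface (assumption [4])
    MESSAGE_BYTES = 256                       # largest average trade order message [6]
    FRAMING_BYTES = 38                        # assumed Ethernet framing + inter-frame gap
    IP_HEADER_BYTES = {"IPv4": 20, "IPv6": 40}

    def processing_delay_s(packets_per_second: float) -> float:
        """Processing delay: inverse of the device's packet-processing speed [22]."""
        return 1.0 / packets_per_second

    def queuing_delay_s(network_load: float, store_and_forward_s: float) -> float:
        """Queuing delay: network load times store-and-forward switching latency [24].
        (Included to complete the model; load figures are not published in the paper.)"""
        return network_load * store_and_forward_s

    def serialization_delay_s(message_bytes: int, overhead_bytes: int) -> float:
        """Serialization delay: (message + overhead) in bits, divided by the data rate [24]."""
        return (message_bytes + overhead_bytes) * 8 / LINK_RATE_BPS

    for version, header in IP_HEADER_BYTES.items():
        ser = serialization_delay_s(MESSAGE_BYTES, FRAMING_BYTES + header)
        # 120 Mpps is an illustrative rate implied by the ~8.3 ns processing delay
        # quoted for the Juniper router; it is not a vendor specification.
        proc = processing_delay_s(120e6)
        print(f"{version}: serialization {ser * 1e9:.0f} ns, processing {proc * 1e9:.1f} ns")
    # -> IPv4 serialization ~251 ns, IPv6 ~267 ns: the 20 extra header bytes add ~16 ns.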
Based on the latency sources identified in the model, the aggregate latency performance for
the Cisco NX7000 is 333.9 ns for IPv4 and 367.1 ns for IPv6. The aggregate latency
performance for the Juniper EX8000 is 325.5 ns for IPv4 and 342.1 ns for IPv6. Therefore, the
latency performance difference between IPv4 and IPv6 is not significant. However, a large
performance discrepancy exists between the networking device IP packet processing model and
the measured overall device latency. From the study, the aggregate latency performance of the
Juniper EX8000 ranges between 8 to 15 µs and the performance of the Cisco NX7000 ranges
between 20 to 40 µs [25]. Investigation into the difference between the modeled and measured
latency performance of the Cisco and Juniper routers is left to further study. Instead, the
literature survey expanded in scope to compare the latency performance of TCP/IP on latency-optimized technologies such as Infiniband and Myrinet. From the literature survey, a study
measured the latency performance for TCP/IP at 60 µs, for Myrinet at 8.3 µs, and for Infiniband
at 5.3 µs [8]. Another study evaluated networking devices specifically optimized for latency
performance. This study evaluated the Fujitsu XG1200 10GigE switch and measured the latency
performance at 0.45 µs [9]. The latency performance from optimized networking devices and
technologies more closely matches the model's calculated latency performance. Table 2 contains
the modeled IP network device latency performance values used in the latency-optimized HFT
system model.
Table 2: IP Network Device Latency Performance Model

  IP Network Device     IPv4 Latency (µs)   IPv6 Latency (µs)
  Processing Delay      0.008               0.008
  Queuing Delay         0.066               0.067
  Serialization Delay   0.251               0.267
  Total Latency         0.326               0.342

4.3 Optical Network
Financial trading firms are conscious of propagation delay and have been actively reducing
it, primarily through co-locating their datacenters with the financial exchanges [4]. Pragmatism
unfortunately dictates that trading firms cannot co-locate a datacenter at every exchange and that
they must therefore rely on high-speed long-haul optical networks for many connections. Given
the ubiquitous need for trade execution speed, latency sources in the long-haul optical network
need to be evaluated for impact on HFT system performance. The architecture model, illustrated
in Figure 3, identifies three categories of potential latency sources present in the optical network.
Figure 3: Optical Network Latency Model
[The figure depicts processing delay at ADM/multiplexor terminals, dispersion compensation modules, and optical amplifiers spaced every 80 to 150 km, together with the distance delay along the fiber route.]
The first category is distance delay, which is a function of lightwave propagation delay,
fiber cable construction, and cable routing directness. Lightwave propagation delay is a function
of the speed of light and the optical fiber refraction index. Modern optical fiber cables, including
OFS TrueWave Reach and Corning LEAF, exhibit latency characteristics of approximately 4.9
ms over 1,000 km [26][27]. Air Core Fiber is a future generation fiber technology that exhibits
significantly improved latency characteristics of 3.36 ms over 1,000 km [28]. Next, fiber cable
construction impacts latency because most deployed fiber cables are constructed with a loose buffer tube design, where several fiber-filled tubes are twisted around a central core. This
twisting approach increases the lightwave’s travel distance with respect to the fiber cable length.
Alternatively, fiber cables can be constructed with a single central tube that reduces the
lightwave’s travel distance within the fiber cable [29]. The loose buffer tube design can add 59
to 398 µs of latency over 1000 km, whereas single-tube ribbon cable adds only 15 to 24 µs over
1000 km [29]. Finally, cable routing directness relates the total installed fiber cable length to the
shortest distance between two geographic points. For example, based on the Level 3 fiber route
map between New York and Chicago, we estimated a distance of 1,485 km and a resultant
latency of 7.28 ms. One approach to reduce the overall fiber distance is to remove installed fiber
maintenance coils which would reduce the overall distance by 36 km and lower latency to 7.1 ms
[30]. Alternatively, new fiber routes could be installed to further reduce the fiber route distances
between two points. Spread Networks has applied this approach and was able to reduce the fiber
distance by 158 km and lower their network latency to 6.5 ms [31].
To construct an optimized distance latency model, the minimum distance between the New York and Chicago exchanges is calculated from the latitude and longitude coordinates of the two sites. Using this method, the distance between the exchanges is 1,040 km. This distance is the optimized lightwave propagation distance and is used
along with the optimized fiber index of refraction and cable construction type to find the
optimized lightwave distance latency. Table 3 identifies the lightwave distance latencies for the
optimized and non-optimized models.
Table 3: New York Exchange to Chicago Exchange Lightwave Distance Latency Model

  Optical Network Distance   Optimized Latency                            Non-Optimized Latency
  Refraction Index           3.36 ms per 1,000 km (Air Core Fiber)        4.9 ms per 1,000 km (OFS TrueWave)
  Helix Factor               0.015 ms per 1,000 km (Single Tube Ribbon)   0.398 ms per 1,000 km (Loose Buffer Tube)
  Routing Directness         1,040 km (Theoretical minimum)               1,485 km (Level 3 Fiber Map)
  Total Distance Latency     3.510 ms                                     7.868 ms
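As a cross-check on Table 3, the total distance latencies follow from multiplying the route distance by the sum of the per-kilometre propagation and helix-factor figures; a short sketch (Python, illustrative):

    def distance_latency_ms(route_km: float, fiber_ms_per_1000km: float,
                            helix_ms_per_1000km: float) -> float:
        """Lightwave distance latency: route length times (propagation + helix penalty)."""
        return route_km * (fiber_ms_per_1000km + helix_ms_per_1000km) / 1000.0

    # Optimized case: 1,040 km theoretical minimum, air-core fiber, single-tube ribbon.
    optimized = distance_latency_ms(1040, 3.36, 0.015)       # ~3.51 ms
    # Non-optimized case: 1,485 km Level 3 route, OFS TrueWave, loose buffer tube.
    non_optimized = distance_latency_ms(1485, 4.9, 0.398)    # ~7.87 ms

    # The route figures quoted in the text follow the same relation, e.g.
    # 1,485 km x 4.9 ms/1,000 km = ~7.28 ms and 1,327 km (Spread Networks) = ~6.5 ms.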
The second optical network latency category is the deployed optical transmission systems,
which include optical amplifiers, Add/Drop Multiplexers (ADMs), muxponders, dispersion
compensation modules (DCMs), and Forward Error Correction (FEC). For optical amplifiers,
the implementation of an Erbium Doped Fiber Amplifier (EDFA) adds approximately 30m of
erbium doped fiber per amplifier, which serves as an all optical amplification medium for the
optical signal. Over 1000 km, approximately 10 EDFAs are needed, which would increase the
overall fiber length by 300m resulting in an additional latency of 1 µs [32]. However, Raman
amplifiers are able to amplify the optical signal without introducing additional fiber length to the
route. For muxponders, which adapt layer 2 networking protocols to and from the optical
medium, the processing delay and queuing latency for adapting a 10GigE input onto the layer 1
optical network is 6 µs [33]. ADMs, which perform a similar adaptation function for SONET networks, did not have any 10GigE latency data available. Next, DCMs correct chromatic dispersion, which would otherwise render an optical signal unintelligible to the receiver. The latency characteristics of dispersion compensation systems are technology dependent. Based on Level 3's New York to Chicago fiber route, older-style DCMs on SMF-28e fiber will increase the latency performance by 1.09 ms [29]. Alternatively, along the same route, current-generation coherent compensation, based on specialized ASICs/DSPs, reduces the latency penalty for dispersion compensation to 3 µs. Finally, FEC processing, which is employed to improve the optical signal-to-noise ratio via coding gain, incurs a latency performance penalty ranging from 15 µs to 150 µs [28][34]. Table 4 identifies the optical equipment processing latencies for the
optimized and non-optimized models.
The final latency category is the serialization of the optical protocol onto the fiber, which is impacted by the overhead differences in the IP header and the choice of layer 2 adaptation protocol. To contrast the latency performance between IPv4 and IPv6, a 256-byte packet was modeled through an OC-192c (10G) Packet over SONET (POS) link and a 10G optical Ethernet
link. For the 10G POS link, we calculated the latency performance for IPv4 at 246 ns and for
IPv6 at 263 ns. For the 10G optical Ethernet link, we calculated the latency performance for IPv4
at 254 ns and for IPv6 at 270 ns. Table 5 identifies the optical network IP serialization latencies
for the optimized and non-optimized models.
Table 4: Optical Network Equipment Processing Latency Model

  Optical Network Equipment       Optimized Latency     Non-Optimized Latency
  Optical Amplifier               0.0 µs (Raman)        1.5 µs (EDFA)
  Terminal Equipment Processing   6.0 µs (ADM)          6.0 µs (Muxponder)
  Dispersion Compensation         3.0 µs (ASIC/DSP)     1,090.0 µs (DCM)
  FEC Processing                  15.0 µs               150.0 µs
  Total Processing Latency        24.0 µs               1,247.5 µs
Table 5: Optical Network IP Serialization Latency Model

  Optical Network        Optimized Latency (10G POS)    Non-Optimized Latency (10G Ethernet)
  IP Serialization       IPv4 (µs)      IPv6 (µs)       IPv4 (µs)      IPv6 (µs)
                         0.246          0.263           0.254          0.270
Table 6 identifies the composite optimized optical network model comprised of the optimized lightwave distance latency, optical equipment processing latency, and IP serialization latency. The optimized optical network latency performance values are used in the latency-optimized HFT system model.

Table 6: Composite Optimized Optical Network Latency Performance Model

  Optical Network Latency   IPv4 Model Latency (µs)   IPv6 Model Latency (µs)
  Lightwave distance        3,510.00                  3,510.00
  Equipment processing      24.00                     24.00
  IP serialization          0.25                      0.26
  Total                     3,534.25                  3,534.26

4.4 HFT System Model
Based on our research, the latency-optimized HFT system model provides the minimum
latency achievable between a trading firm and financial exchange. Some aspects of the HFT
system model are not immediately achievable in practice because they are based on either future
technologies or direct paths between two geographic locations. This paper examines two trading
scenarios for latency-dependent trading strategies to characterize the HFT system model’s
performance.
The first strategy is low latency arbitrage, which exploits a condition at financial exchanges
where trade orders at the front of a queue receive a lower asset price, in the range of a cent or
two per share, than orders deeper in the queue [1]. A typical HFT system implementation
strategy is to co-locate the trade execution computing platform with the financial exchange [1].
Under this scenario, the transmit latency of a trade order is critical. From the developed HFT
system model and our collocation assumption, the HFT system segments that apply to this
scenario are the trade execution computing platform and the IP network. Figure 4 illustrates the
HFT low latency arbitrage scenario.
Figure 4: High Frequency Trading Scenario – Low Latency Arbitrage
[The figure depicts a co-located HFT firm and the financial exchange: Firms A and B submit buy orders, and the order reaching the exchange queue first (Firm A) fills at $10.00 while the later order (Firm B) fills at $10.01.]
For the computing platform, the transmit latency for IPv4 is 2.9 µs and for IPv6 is 2.916 µs
[8][9]. For the IP network, the transmit latency for IPv4 is 326 ns and for IPv6 is 342 ns.
Therefore, based on the developed HFT system latency performance model for this scenario, the
total latency for IPv4 is 3.226 µs and for IPv6 is 3.258 µs. Table 7 contains the latency-optimized HFT system model values for IPv4 and IPv6. Under the low latency arbitrage
scenario, the HFT system latency model shows a non-significant latency performance penalty for
IPv6 network implementations.
Table 7: Latency Optimized HFT System Model for Low Latency Arbitrage Scenario

  HFT System Model                     IPv4 Latency (µs)   IPv6 Latency (µs)
  Trade Execution Computing Platform   2.900               2.916
  IP Network Device                    0.326               0.342
  Total Latency                        3.226               3.258
The second strategy is HFT scalping, where a trading firm identifies, between two distant exchanges, a higher asset bid price on one exchange than the asking price on the other. The firm then purchases the asset at the cheaper asking price on one exchange and immediately resells
the asset at the higher bid price on the other exchange, resulting in a profit of a few cents per
share [1]. A trading firm may capitalize on these scalping opportunities at any financial
exchange. Under this scenario, the round trip latency is critical because a trading firm must
complete the buy transaction before the selling opportunity disappears. In addition to the round
trip latency from the buy transaction, the transmit latency to the distant exchange for the sell
transaction must also be considered under this scenario. From the developed HFT system model,
the HFT system segments that apply to this scenario are the trade execution computing platform,
the IP network, and the optical network. Additionally, for this scenario we model the HFT
system latency performance of a trading firm in New York performing HFT scalping in Chicago.
Figure 5 illustrates the HFT scalping scenario.
Figure 5: High Frequency Trading Scenario – Scalping
For the trade execution computing platform, the total latency for IPv4 is 11.8 µs and for
IPv6 is 11.848 µs. For the IP network, the total latency for IPv4 is 0.978 µs and for IPv6 is 1.026
µs. For the optical network, the total latency for IPv4 is 3,534.25 µs and for IPv6 is 3,534.26 µs.
Therefore, based on the developed HFT system latency performance model for this scenario, the
total latency for IPv4 is 3,547.028 µs and for IPv6 is 3,547.134 µs. Table 8 contains the latency-optimized HFT system model values for IPv4 and IPv6. Under the scalping scenario, the HFT
system latency model shows a non-significant latency performance penalty for IPv6 network
implementations.
Table 8: Latency Optimized HFT System Model for Scalping Scenario

  HFT System Model                     IPv4 Latency (µs)   IPv6 Latency (µs)
  Trade Execution Computing Platform   11.800              11.848
  IP Network Device                    0.978               1.026
  Optical Network                      3,534.250           3,534.260
  Total Latency                        3,547.028           3,547.134
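The totals in Tables 7 and 8 follow from the segment values in Tables 1, 2, and 6. The sketch below (Python) reproduces them using the composition implied by the published totals: a one-way platform transmit plus one IP-network traversal for the arbitrage scenario, and a platform buy round trip plus sell transmit, three IP-network traversals, and the optical path counted once for the scalping scenario.

    # Segment latencies in microseconds, from Tables 1, 2, and 6.
    PLATFORM = {"IPv4": {"tx": 2.9,   "total": 8.9},
                "IPv6": {"tx": 2.916, "total": 8.932}}
    IP_NET   = {"IPv4": 0.326,   "IPv6": 0.342}
    OPTICAL  = {"IPv4": 3534.25, "IPv6": 3534.26}

    for v in ("IPv4", "IPv6"):
        # Low latency arbitrage (Table 7): co-located firm, one-way order transmission.
        arbitrage = PLATFORM[v]["tx"] + IP_NET[v]
        # Scalping (Table 8): buy round trip plus sell transmit on the platform,
        # three IP-network traversals, and the New York-Chicago optical path once,
        # matching the published totals.
        scalping = PLATFORM[v]["total"] + PLATFORM[v]["tx"] + 3 * IP_NET[v] + OPTICAL[v]
        print(f"{v}: arbitrage {arbitrage:.3f} us, scalping {scalping:.3f} us")
    # -> IPv4: 3.226 us and 3547.028 us; IPv6: 3.258 us and 3547.134 us.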
5 Conclusion
Based on 1 µs as the latency of significance for HFT systems, we conclude that a trading
firm implementing a latency-optimized native IPv6 HFT system will not incur a latency
disadvantage when compared to a native IPv4 implementation. Our results show that among all
of the various delay types, the serialization of the additional IP header bits is the prominent
performance differentiator between IPv4 and IPv6, producing a performance difference on the
order of tens of nanoseconds. Additionally, our results show that the optical layer is the largest
source of latency in an HFT network and provides the greatest opportunity for latency
optimization.
5.1 Recommendations
For HFT systems administrators reviewing their path to IPv6, we offer a few
recommendations. First, our research finds that transitional deployments will be disadvantaged
significantly; administrators would need to make plans for a direct cutover to a native IPv6
environment. Second, high performance platforms, which are generally hardware based and
software optimized, are essential to eliminate any processing delay differential between IPv4 and
IPv6. Third, faster interfaces, such as the 10GigE used in our model, are essential to minimize
the serialization delay difference between IPv4 and IPv6. Finally, rigorous lab testing, based on
the firm’s production network, will need to be performed to ensure that platforms and their
interoperation do not contain any latency degrading hardware or software bugs.
5.2 Further Study
During the course of our research, we found several areas that warrant further study. The
first area to investigate is the latency performance difference between IPv4 and IPv6 on layer 4
processing platforms, including firewalls and WAN accelerators. While it is expected, based on
our research findings, that the performance difference is negligible, a systematic study of these
platforms would serve to enhance the end-to-end HFT system latency performance model.
Next, our research focused on a static model of an HFT system. Dynamic modeling,
defined as characterizing the latency performance under different load scenarios, presents
another opportunity for further study. Due to the cost and complexity of latency-optimized
equipment, a study of this nature would need significant funding to be performed in an academic
environment.
Finally, while HFT firms are currently focused on microsecond optimizations, we project
that nanosecond latencies will be emphasized in the future. As latency becomes optimized at the
optical layer, or for direct co-location applications, the IPv4 to IPv6 serialization delay
differential, even at higher future interface speeds, may again become an open question of
significance. For this future possibility, further study into the latency advantages and feasibility
of Infiniband and Myrinet would be warranted.
References:

[1] A. Golub, "Overview of high frequency trading," presented at the Marie Curie Initial Training Network on Risk Management and Risk Reporting Mid-Term Conference, Berlin, 2011.
[2] C. Reed. (2012, Feb. 27). A fair transaction tax for U.S. stock trading (1st ed.) [Online]. Available: http://www.seekingalpha.com
[3] J. Goldstein. (2010, June 8). The million dollar microsecond (1st ed.) [Online]. Available: http://www.npr.org/
[4] K. McPartland. (2010, June 21). Long distance latency: Straightest and fastest equals profit (1st ed.) [Online]. Available: http://www.tabbgroup.com
[5] M. Rabkin. (2010, Sep. 27). TABB says banks are taking holistic approach to finding weak links within a trade's lifecycle (1st ed.) [Online]. Available: http://www.tabbgroup.com
[6] M. Simpson, "Market data optimized for high performance," presented at the FIA Futures and Options Expo Conference, Chicago, IL, 2005.
[7] "Low latency framework," presented at the Oracle Trading Applications Developer Workshop, London, 2011.
[8] S. Larsen et al., "Architectural breakdown of end-to-end latency in a TCP/IP network," in 19th Intl. Symp. on Computer Architecture and High Performance Computing, Rio Grande do Sul, 2007, pp. 195-202.
[9] W. Feng et al., "Performance characterization of a 10-gigabit ethernet TOE," in 13th Symp. on High Performance Interconnects, 2005, pp. 58-63.
[10] S. Narayan and Y. Shi, "TCP/UDP network performance analysis of Windows operating systems with IPv4 and IPv6," in 2nd Intl. Conf. on Signal Processing Systems, Dalian, 2010, pp. 219-222.
[11] S. Narayan et al., "Network performance evaluation of internet protocols IPv4 and IPv6 on operating systems," in Intl. Conf. on Wireless and Optical Communications Networks, Cairo, 2009, pp. 1-5.
[12] T. Herbert, "Internet protocol version 6 (IPv6)," in The Linux TCP/IP Stack: Networking for Embedded Systems, 1st ed. Hingham, 2004, ch. 11, sec. 11.8, pp. 425-426.
[13] S. Zeadally and L. Raicu, "Evaluating IPv6 on Windows and Solaris," IEEE Internet Computing, 2003, pp. 51-57.
[14] T. Herbert, "Internet protocol version 6 (IPv6)," in The Linux TCP/IP Stack: Networking for Embedded Systems, 1st ed. Hingham, 2004, ch. 11, sec. 11.5.1, pp. 417-420.
[15] S. Kent and R. Atkinson. (1998, Nov.). Security architecture for the internet protocol [Online]. Available: http://www.ietf.org/rfc/rfc2401.txt
[16] H. Niedermayer et al., "The networking perspective of security performance – a measurement study," in 13th GI/ITG Conference on Measuring, Modeling and Evaluation of Computer and Communication Systems, 2006, pp. 1-17.
[17] S. Deering and R. Hinden. (1998, Dec.). Internet protocol, version 6 (IPv6) specification [Online]. Available: http://www.ietf.org/rfc/rfc2460.txt
[18] S. Narayan and S. Tauch, "IPv4-v6 transition mechanisms network performance evaluation on operating systems," in 3rd IEEE Intl. Conf. on Computer Science and Information Technology, Chengdu, 2010, pp. 664-668.
[19] S. Tauch, "Performance evaluation of IP version 4 and IP version 6 transition mechanisms on various operating systems," M.S. thesis, Computing & Technology, Unitec Inst. of Technology, NZ, 2010.
[20] J. Aweya, "IP router architecture: An overview," J. of Systems Architecture, vol. 46, pp. 483-511, 1999.
[21] J. Postel. (1981, Sep.). Internet protocol: DARPA internet protocol specification [Online]. Available: http://www.ietf.org/rfc/rfc791.txt
[22] J. Kurose and K. Ross, "Delay and loss in packet-switched networks," in Computer Networking: A Top-Down Approach Featuring the Internet, Boston: Addison-Wesley, 2000.
[23] Latency on a switched Ethernet network, Ruggedcom, 2008 [Online]. Available: http://www.ruggedcom.com/pdfs/application_notes/latency_on_a_switched_ethernet_network.pdf
[24] E. Gamess and N. Morales, "Modeling IPv4 and IPv6 performance in Ethernet networks," International Journal of Computer and Electrical Engineering, vol. 3, no. 2, p. 285, 2011.
[25] Juniper EX8200 vs. Cisco Nexus 7000, Great Lakes Computer, 2012 [Online]. Available: www.glcomp.com
[26] Corning LEAF optical fiber product information, Corning Incorporated, 2002.
[27] TrueWave REACH fiber, OFS Fitel, LLC, 2011.
[28] P. Schoenau. (2011, Apr. 28). Optical networks for low latency applications [Online]. Available: http://www.a-teamgroup.com/
[29] J. Jay, "Low signal latency in optical fiber networks," in Proc. of the 60th IWCS Conference, Charlotte, NC, 2011, pp. 429-437.
[30] B. Quigley. (2011, May 4). Another building block of low-latency trading: efficient optical transport [Online]. Available: http://blog.advaoptical.com/another-building-block-low-latency-trading-efficientoptical-transport/
[31] Network map, Spread Networks, 2012 [Online]. Available: http://www.spreadnetworks.com/network-map/
[32] Low latency—how low can you go?, Transmode, 2011 [Online]. Available: http://www.transmode.com/doc_download/262-low-latency-design
[33] Cisco ONS 15454 40Gbps enhanced FEC full band tuneable muxponder cards, Cisco Systems, Inc., 2011.
[34] A sensible low-latency strategy for optical transport networks, Optelian, 2011.