Providing Reliability in Data Center Ethernet
QIN Dan
Department of Electrical and Electronic Engineering
The University of Hong Kong
May 4, 2019
ABSTRACT
With the vigorous development of various network technologies, data center
networks have been widely deployed. Ethernet has become the main
infrastructure of the data center network because of its wide deployment,
simplicity, low cost, and maturity. However, Ethernet cannot provide reliable
transmission, and some applications in the data center network cannot use the
Transmission Control Protocol (TCP) for reliable transmission. Therefore, it is
necessary to provide a link-layer Congestion Notification (CN) mechanism for
Data Center Ethernet. To support this congestion notification technique,
researchers have developed some approaches, such as Backward Congestion
Notification mechanism (BCN), Forward Explicit Congestion Notification
mechanism (FECN), Enhanced Forward Explicit Congestion Notification
mechanism (E-FECN) and Quantized Congestion Notification mechanism
(QCN). Each method has its own pros and cons. In this paper, we survey these
four mechanisms in the Data Center Ethernet and also propose a possible
optimization scheme for the Quantized Congestion Notification mechanism.
1. Introduction
Since the invention of Ethernet in the mid-1970s, Ethernet has continued to
develop and grow and has maintained its leading position in the field of
communication technology [1]. With the rise of the Internet, the degree of
automation in various workplaces has become increasingly high, and Ethernet
has also become increasingly common. Nowadays, Ethernet is the most
prevalent wired LAN technology, and almost all organizations use Ethernet to
connect computers in their workplaces [2]. Ethernet is so successful because it
was not only the first widely deployed high-speed local area network but is
also simple and cost-efficient compared to other technologies such as token
ring, FDDI and ATM [3]. Owing to these inherent advantages, Ethernet has
become the main infrastructure of the data center network.
Since the original design of traditional Ethernet is to provide best-effort
communication in the local area network, it cannot guarantee the delivery of
data packets to the destination, which makes Ethernet an unreliable
networking technology [4-7]. A reliable network needs to provide
reliability for every data packet injected into it, and in Ethernet this
reliability is provided by upper-layer protocols [7-9]. In
fact, Ethernet relies on the Transmission Control Protocol (TCP) to provide
reliability [7, 10]. When network congestion occurs, the switch discards the
data packets, and the TCP protocol determines whether network congestion
occurs by detecting the discarding of data packets. The main reason why
Ethernet cannot provide reliability is that the path between the source and the
destination consists of multiple hops, and the intermediate hops do not feed
back their buffer occupancy in time [6-8]. When the buffer of an intermediate
hop is full, the hop discards packets that arrive later, and the sender is unaware
of this. Therefore, a mechanism that can make Ethernet reliable is needed.
Some applications, such as the File Transfer Protocol (FTP), use the TCP
protocol of the transport layer to provide reliable transmission. For them,
discarding packets at the data link layer is acceptable because TCP
retransmits the dropped packets. However, some applications cannot delegate
reliability to upper-layer protocols, such as real-time applications or storage
traffic in the Data Center Ethernet [2]. These applications require high
throughput and low latency. In order to keep the latency low and avoid
excessive queuing delay, the queue length should be kept small. In addition,
packet loss is unacceptable because every packet is critical and retransmitting
a packet would cause unacceptable delay. Therefore, a hop-by-hop flow control
mechanism is needed to provide reliability for such applications [5, 7, 11]. The
mechanism can provide hop-by-hop reliability regardless of the transport
protocol. For this mechanism, some researchers have proposed a technology
called Congestion Notification (CN).
Congestion notification is a technology that runs on the link layer. This
technology detects the congestion level by monitoring the length of output
queue on the switch in the Data Center Ethernet and pushes the congestion
message to rate regulators running at the edge of the network. The rate
regulators adjust the flow rate according to the congestion feedback
received from the switch so as to avoid frame loss [12]. In this survey, we
summarize and discuss the congestion notification mechanisms in the Data
Center Ethernet. Researchers have proposed some approaches, such as BCN
[13, 14], FECN [15, 27], enhanced FECN (E-FECN) [16], and QCN [17].
The remainder of this paper is organized as follows. Section 2 elucidates the
Data Center Ethernet and its congestion control. Backward Congestion
Notification mechanism (BCN), Forward Explicit Congestion Notification
mechanism (FECN), Enhanced Forward Explicit Congestion Notification
mechanism (E-FECN) and Quantized Congestion Notification mechanism
(QCN) are described in Sections 3, 4, 5 and 6, respectively. Section 7 analyzes
the pros and cons of the four congestion notification mechanisms. Finally,
conclusions and future work are discussed in Section 8.
2. Background
2.1 Data Center Ethernet
With the vigorous development of various network technologies in recent
years, many companies have established large data centers composed of a
large number of storage devices, which can store, access and process large
amounts of data [18]. What plays a vital role in a large data center is the Data
Center Ethernet that supports internal communication between a large number
of computing resources. The Data Center Ethernet contains not only storage
devices but also servers that manage data and Ethernet switches that connect
the devices. It supports not only online services (such as web search) but also
distributed computing frameworks (such as MapReduce [20]) and distributed
file systems (such as GFS). Because of these powerful capabilities, the Data
Center Ethernet is widely used in the government, business, education and
financial sectors [19].
2.2 Congestion Control in Data Center Ethernet
Although the data center network is different from the traditional local area
network and wide area network, it still faces a series of problems. For example,
the number of servers in the data center network is increasing at an
exponential rate, which leads to problems in the interconnection, congestion
notification, and robustness of the network [21, 22]. In addition, various
applications in the data center network have different requirements for traffic,
such as avoiding data packet loss and providing low latency [1], which further
brings challenges to the network. More importantly, the data center network is
equipped with a large number of terminal stations, which makes the network
susceptible to link congestion. For example, frequent congestion of the output
link of the core switch will cause the loss and delay of a large number of data
packets, thereby greatly reducing the throughput of the entire data center
system. In order to achieve high reliability, data center networks have used
Fibre Channel. However, due to the high cost of that technology, the IEEE
802.1 Standards Committee decided to use Ethernet as the data center network
infrastructure [23, 24].
Although Ethernet was invented many years ago, it still lacks a congestion
control mechanism. TCP is used to provide reliability for Ethernet. It infers the
occurrence of congestion by detecting dropped packets, but the
retransmission of data packets leads to an increase in delay [25].
At the same time, the reliability of some applications using UDP cannot be
guaranteed. Therefore, the Data Center Ethernet must provide a congestion
control mechanism at the data link layer. According to the network topology,
each frame usually needs to pass through multiple intermediate switches [1].
The output queues of the intermediate switches grow when network
congestion occurs, and frames are discarded once the queue length exceeds a
defined threshold. When a downstream switch lacks sufficient buffer resources
to receive frames, a congestion control mechanism is needed to reduce frame
loss in the network.
To support this congestion control at the link layer, researchers have developed
some approaches, such as the Backward Congestion Notification mechanism
(BCN) [13, 14], Forward Explicit Congestion Notification mechanism (FECN)
[15, 27], Enhanced Forward Explicit Congestion Notification mechanism
(E-FECN) [16] and Quantized Congestion Notification mechanism (QCN) [17].
Among them, BCN and QCN perform flow control by sending a feedback
message to the upstream device. However, FECN and E-FECN send
congestion messages to end stations for flow control.
3. Backward Congestion Notification Mechanism (BCN)
Bergamasco at Cisco first proposed the Backward Congestion Notification
mechanism (BCN) [13, 14]. It is also called Ethernet Congestion Management
(ECM). BCN is a rate-based closed-loop feedback control mechanism. As
shown in Fig. 1, the sources are equipped with rate regulators, which are used
to adjust the rate of individual flows. The core switch, in which the congestion
detector is integrated, determines whether there is congestion by monitoring
the length of its output queue, which indicates the buffer utilization. When
congestion happens, the core switch sends the sources a BCN feedback
message that requires them to adjust their rates. Each source updates the
rate of its regulator according to the BCN feedback message received from
the core switch.
There are mainly three processes in the Backward Congestion Notification
mechanism: Congestion Detection, Congestion Signaling and Source
Reaction.
A. Congestion Detection
BCN detects congestion by checking the buffer utilization level against two
thresholds, Qeq and Qsc. Qeq denotes the equilibrium queue length, and Qsc
denotes the severe congestion queue length [26]. The core switch samples the
incoming packets with probability Pm. After sampling, the switch calculates
the congestion measure e_i to determine the congestion level on the link. The
core switch sends different BCN feedback messages according to different
congestion levels.
The congestion measure e_i is calculated as follows:
e_i = -(q_off(t) + W·q_delta(t)) = (Qeq - q(t)) - W(q_a - q_s)
Here, q_off(t) is the queue offset at a given moment, defined as:
q_off(t) = q(t) - Qeq
W is a non-negative constant weight, and q_delta(t) is the change in queue
length since the previous sample, defined as the difference between the queue
length at this moment, q_a, and the queue length at the last sampling, q_s. If
e_i is positive, the sources should increase their rates; otherwise, the sources
decrease their rates.
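For concreteness, the following is a minimal Python sketch of this detection
step, assuming illustrative values for Qeq, W and the sampling probability Pm;
it is not an implementation of the actual switch hardware.

import random

Q_EQ = 16      # equilibrium queue length (frames); assumed value
W    = 2.0     # non-negative weight on the queue-growth term; assumed value
P_M  = 0.01    # sampling probability for incoming frames; assumed value

def congestion_measure(q_now, q_last_sample):
    # e_i = -(q_off + W * q_delta); positive means the queue is below equilibrium
    q_off   = q_now - Q_EQ           # offset from the equilibrium length
    q_delta = q_now - q_last_sample  # queue growth since the last sample
    return -(q_off + W * q_delta)

def on_frame_arrival(q_now, q_last_sample):
    # Sample arriving frames with probability P_M and compute e_i for the sample
    if random.random() < P_M:
        return congestion_measure(q_now, q_last_sample)  # carried in a BCN message
    return None  # this frame was not sampled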
Fig. 1: Backward Congestion Notification (BCN)
B. Congestion Signaling
BCN has two types of signals: BCN normal messages and BCN STOP
messages. The congestion measure e_i and the ID of the switch (CPID) are
included in the BCN normal message. Note that the source tags each
packet that passes through it after receiving a BCN message. The different
BCN messages, which use the 802.1Q tag format [13], are generated as
follows:
If the packet is not tagged by the source:
• If q(t) < Qeq, the core switch does not send a BCN message.
• If Qeq < q(t) ≤ Qsc, the core switch sends a normal BCN message.
• If q(t) > Qsc, the core switch sends a BCN STOP message.
If the packet is tagged by the source:
• If q(t) < Qeq and the CPID field of the packet matches the CPID of the
core switch, the switch sends a positive BCN message.
• If Qeq < q(t) ≤ Qsc, the core switch sends a normal BCN message.
• If q(t) > Qsc, the core switch sends a BCN STOP message.
After receiving a BCN STOP message, the source sets the rate of its rate
regulator to 0 for a period of time and then restores the rate to C/K, where K is
the number of flows in the network and C is the capacity of the bottleneck
link.
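The signaling rules above can be summarized in the sketch below, which
assumes the simplified message names and threshold values shown; the real
BCN frames carry 802.1Q-tagged fields rather than Python tuples.

def bcn_signal(q_now, frame_tagged, frame_cpid, my_cpid, e_i, q_eq=16, q_sc=64):
    # Decide which BCN message (if any) to send for a sampled frame
    if q_now > q_sc:
        return "BCN_STOP"                    # severe congestion: tell the source to stop
    if q_eq < q_now <= q_sc:
        return ("BCN_NORMAL", e_i, my_cpid)  # carry e_i and this switch's CPID
    # queue below Qeq: only tagged frames whose CPID matches get positive feedback
    if frame_tagged and frame_cpid == my_cpid:
        return ("BCN_POSITIVE", e_i, my_cpid)
    return None                              # light load and untagged frame: no message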
C. Source Reaction
When the source receives a normal BCN message, it calculates a new rate
from e_i and adjusts the settings of its rate regulator. These adjustments follow
an Additive Increase Multiplicative Decrease (AIMD) algorithm.
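A minimal sketch of such an AIMD update is given below; the gains GI and GD
and the 10 Gb/s line rate are illustrative assumptions rather than values from
the BCN specification.

LINE_RATE = 10e9     # bps; assumed 10 Gb/s link
GI = 1e6             # additive-increase gain (bps per unit of positive e_i); assumed
GD = 0.5 / 64.0      # multiplicative-decrease gain; assumed

def source_reaction(current_rate, e_i):
    # AIMD update of the rate regulator driven by the congestion measure e_i
    if e_i > 0:
        # positive feedback: additive increase, capped at the line rate
        return min(current_rate + GI * e_i, LINE_RATE)
    # negative feedback: multiplicative decrease proportional to |e_i|
    return current_rate * max(1.0 - GD * abs(e_i), 0.0)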
4. Forward Explicit Congestion Notification Mechanism (FECN)
Forward Explicit Congestion Notification mechanism (FECN) is a closed-loop
explicit rate feedback control scheme [15, 27]. As shown in Fig. 2, the sources
use the rate-based load sensor to periodically detect congestion on the path
from the source to the destination. For this purpose, the sources periodically
send the probe message which contains the rate field. During the transmission
of the probe message from the source to the destination, each switch on the
path checks the rate field of the probe message. If the value of the rate field is
greater than the available bandwidth, the switch modifies the value and
advertises the rate to all flows which pass through the switch, so all flows are
treated fairly in the FECN mechanism. When the probe message finally
returns from the destination to the source, the value in its rate field is the exact
rate that the flow should follow. The rate regulator of the source updates the
sending rate according to this value. Because the source adjusts its rate using
an explicit rate learned from returning probes, the FECN mechanism has a
slow response that depends on the round-trip time (RTT). Note that a Rate
Discovery Tag (RDTag) is included in the probe message.
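As a rough illustration of this probe handling, the sketch below models the
probe as a simple record with a rate field; the field names and the example
bandwidth figures are assumptions.

def switch_process_probe(probe, available_bandwidth):
    # Each switch on the path clamps the advertised rate to what it can offer
    if probe["rate"] > available_bandwidth:
        probe["rate"] = available_bandwidth
    return probe

def source_update_on_probe_return(regulator, probe):
    # When the probe comes back, the source adopts the advertised rate directly
    regulator["rate"] = probe["rate"]
    return regulator

# Example: a probe sent at 10 Gb/s crossing switches with 6 and 4 Gb/s available
probe = {"rate": 10e9}
for bw in (6e9, 4e9):
    probe = switch_process_probe(probe, bw)
regulator = source_update_on_probe_return({"rate": 10e9}, probe)
# regulator["rate"] is now 4e9, the rate advertised by the bottleneck switch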
Fig. 2: Forward Explicit Congestion Notification (FECN)
5. Enhanced Forward Explicit Congestion Notification Mechanism (E-FECN)
Enhanced Forward Explicit Congestion Notification (E-FECN) mechanism is
a hybrid notification congestion management scheme [16]. As shown in Fig. 3,
E-FECN requires not only the normal probing message of FECN, but also the
BCN feedback message of BCN. When severe congestion happens, that is,
when the queue length is greater than the severe congestion threshold (Qsc),
the switch sends BCN messages to the source directly [28]. This BCN message
requires the rate regulator at the source to drop to a low initial rate.
The basic algorithm is as follows:
At the sources:
a. The sources start at full rate, and the rate drops after receiving BCN
messages or FECN messages.
b. The rate regulators limit the rates of congested flows.
c. If the source receives a FECN message, the flows are set to the proper
rates indicated in the FECN message.
At the switches:
a. The switches perform the same operations as in FECN.
b. The switches monitor the length of the queue.
c. If the queue length becomes greater than Qsc, the severe congestion
threshold, a BCN message is sent back to the source. To maintain rate
consistency, the switch sets the advertised rate carried in the FECN message
to the minimum value for all sources (see the sketch after this list).
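The switch-side combination of the two mechanisms might look roughly like
the sketch below; Q_SC, the message name and the min_advertised_rate
parameter are illustrative assumptions.

Q_SC = 64  # severe congestion threshold (frames); assumed value

def efecn_switch(probe, available_bandwidth, queue_length, min_advertised_rate):
    # Process a FECN probe and, under severe congestion, also emit a BCN message
    bcn_message = None
    if queue_length > Q_SC:
        # severe congestion: tell the source directly to fall back to a low rate
        # and advertise the minimum rate to keep all sources consistent
        bcn_message = "BCN_REDUCE_TO_INITIAL_RATE"
        probe["rate"] = min(probe["rate"], min_advertised_rate)
    elif probe["rate"] > available_bandwidth:
        # normal FECN behavior: clamp the advertised rate to the available bandwidth
        probe["rate"] = available_bandwidth
    return probe, bcn_message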
Fig. 3: Enhanced Forward Explicit Congestion Notification (E-FECN)
6. Quantized Congestion Notification Mechanism (QCN)
The Quantized Congestion Notification mechanism (QCN) is very similar to
the BCN mechanism. It is also a closed-loop, rate-based congestion control
scheme. The
basic principle of the QCN is to dynamically update the rate of flow according
to the state of the network. When network congestion occurs, the switch
(congestion point) sends the feedback message containing congestion
information to the source (reaction point) [18]. This scheme is mainly
composed of two parts: Congestion Point and Reaction Point.
As shown in Fig. 4, the congestion point detects congestion by checking the
output queue of the switch. The goal is to keep the size of the queue at a
balanced value, Qeq. The congestion point signals congestion to the
source by sending a feedback message reflecting the congestion level at that
point. The reaction point reduces the flow rate after receiving the
feedback message from the congestion point. Unlike in BCN, it increases the
rate when it detects additional available bandwidth or needs to recover lost
bandwidth. In fact, the QCN congestion point algorithm is almost the same as
that of BCN. The only difference is that when there is no congestion in the
network, the congestion point in QCN does not send positive feedback to the
source; that is, the reaction point must decide on its own when to increase the
rate.
The reaction point of QCN increases the rate through three phases: Fast
Recovery (FR), Active Increase (AI) and Hyper-Active Increase (HAI). FR
quickly recovers the rate that the source's rate regulator lost in the most recent
decrease. AI mainly probes for additional bandwidth. HAI takes effect when
multiple sessions at a congestion point have ended and the bandwidth they
released can be consumed by the remaining sessions.
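A rough sketch of such a reaction point is shown below; the cycle counts,
increase steps and the normalized feedback handling are illustrative
assumptions, not the constants from the QCN specification.

FR_CYCLES = 5    # Fast Recovery cycles before Active Increase; assumed value
R_AI  = 5e6      # Active Increase step (bps); assumed value
R_HAI = 50e6     # Hyper-Active Increase step (bps); assumed value

class ReactionPoint:
    def __init__(self, rate):
        self.current_rate = rate
        self.target_rate = rate
        self.cycles = 0   # increase cycles completed since the last rate decrease

    def on_negative_feedback(self, fb):
        # Multiplicative decrease; remember the pre-decrease rate as the target
        self.target_rate = self.current_rate
        self.current_rate *= max(1.0 - 0.5 * fb, 0.5)  # fb assumed normalized to [0, 1]
        self.cycles = 0

    def on_increase_event(self):
        # Called when the byte counter or the rate increase timer expires
        self.cycles += 1
        if self.cycles <= FR_CYCLES:
            pass                          # FR: move halfway back toward the target
        elif self.cycles <= 2 * FR_CYCLES:
            self.target_rate += R_AI      # AI: probe for additional bandwidth
        else:
            self.target_rate += R_HAI     # HAI: grab bandwidth released by ended flows
        self.current_rate = (self.current_rate + self.target_rate) / 2.0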
Fig. 4: Quantized Congestion Notification (QCN)
7. Performance Analysis and Discussion
We discuss and analyze the advantages and disadvantages of the BCN, FECN,
E-FECN, and QCN congestion notification mechanisms in the following aspects.
1. Fairness
BCN and QCN only send the feedback message to the source of the sampled
packet, so they achieve only proportional fairness. However, since the
switches in FECN and E-FECN set all flows passing through them to the same
rate, and all sources receive the same feedback, FECN and E-FECN can
reach a perfect fair state.
2. Feedback Control
BCN and QCN use backward feedback control, that is, the core switch sends
feedback messages to upstream devices. FECN uses forward feedback control,
that is, the source sends probe messages to the core switch. In addition to
using forward feedback control, E-FECN notifies the source to perform flow
control when the core switch is severely congested.
3. Rate of Convergence to Fair State
BCN converges to a fair state very slowly, because the source of BCN
calculates a new rate based on the AIMD algorithm which can only achieve
fairness in a long-term sense. In contrast, as all sources get the same feedback,
FECN and E-FECN can reach a perfect fair state after a few round-trip times.
4. Overhead
The overhead of BCN is not only high but also unpredictable. Although
the overhead of QCN is also unpredictable, it is moderate compared with that
of BCN, because QCN only sends negative feedback messages to the
source to reduce the sending rate. The overhead of FECN is low and
predictable, because the payload of the FECN feedback message is only about
20 bytes and the FECN message is sent every millisecond. The overhead of
E-FECN is larger than that of FECN, because E-FECN additionally sends BCN
messages to the source.
5. Congestion Regulation
The rate of the source rate regulator in BCN and QCN can be adjusted quickly,
because feedback messages are sent directly from the congestion point to the
source once network congestion occurs. The adjustment of the source rate in
FECN is very slow, because the probe message sent by the source must take a
round-trip time before it returns to the source. The adjustment of the source
rate in E-FECN is moderate, because it adds BCN feedback messages on top of
FECN; that is, when severe congestion occurs, a feedback message is sent
directly from the congestion point to the source.
6. Throughput Oscillation
The oscillations in the throughput of BCN are large. The throughput oscillation
of QCN is improved compared with that of BCN, because the source rate
increase is determined by the byte counter and the rate increase timer together.
In contrast, the oscillations in the throughput of FECN and E-FECN are small.
7. Fast Start
BCN, E-FECN, and QCN all support a fast start: the source starts at
full rate, and after receiving a negative feedback message from the
congestion point, the source's rate decreases. In FECN, the source starts at a
lower rate and gradually moves to the equilibrium rate as a series of probe
messages return.
8. Link Disconnection
If a link in the network is suddenly disconnected, BCN, E-FECN and QCN
can use feedback messages to notify the source to reduce or stop the
transmission of data packets. FECN lacks such feedback messages, and its
probe messages may not be able to return to the source due to the link
failure, so the source maintains its sending rate, resulting in packet loss.
8. Conclusion and Future Work
With the demand for high reliability and low latency in data centers, providing
a congestion control mechanism has become an urgent problem to be solved in
the Data Center Ethernet. This paper studies the congestion control problem of
the Data Center Ethernet. Backward congestion control, forward congestion
control and hybrid congestion control are currently the three main types of
technologies. We discussed and analyzed the advantages and disadvantages of
the four congestion notification mechanisms: BCN, FECN, E-FECN and QCN.
In BCN, the source updates the rate of its regulator according to the BCN
feedback message sent by the core switch to perform congestion control. In
FECN, the rate of the probe message periodically sent by the source is
modified on the way from the source to the destination. When the source
receives the probe message returned from the target, it updates the sending rate
according to the instructions in the probe message. E-FECN combines BCN
and FECN. When severe congestion happens, the switch sends BCN messages
to the source directly. QCN is very similar to the BCN. It also dynamically
adjusts the flow according to the network status. Unlike the BCN, when there
10
is no congestion in the network, the congestion point in QCN will not notify
the source to increase the sending rate.
Since QCN samples data packets randomly at the congestion point and
only sends feedback messages to the source that is considered to cause
congestion, the convergence to a fair sending rate is hindered. We may
address this problem through the following steps: first, identify the flows
whose sending rates are greater than the fair share; then monitor each of these
flows; and finally feed the congestion message of each flow back to its own
source to ensure fairness.
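The sketch below illustrates this proposed direction under a simple equal-share
assumption; the per-flow rate measurements, the fair-share estimate and all
names are hypothetical and only meant to make the idea concrete.

def flows_above_fair_share(flow_rates, link_capacity):
    # flow_rates maps a flow identifier to its measured sending rate in bps
    if not flow_rates:
        return []
    fair_share = link_capacity / len(flow_rates)  # equal split as a simple estimate
    return [fid for fid, rate in flow_rates.items() if rate > fair_share]

def per_flow_feedback(flow_rates, link_capacity, e_i):
    # Build one feedback message per over-share flow instead of relying on sampling
    over = flows_above_fair_share(flow_rates, link_capacity)
    return {fid: ("QCN_FEEDBACK", e_i) for fid in over}

# Example: three flows share a 10 Gb/s link; only the 6 Gb/s flow is notified
messages = per_flow_feedback({"f1": 6e9, "f2": 2e9, "f3": 2e9}, 10e9, e_i=-5)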
References
[1] Mliki, Hela, Lamia Chaari, and Lotfi Kamoun. "A comprehensive survey
on carrier ethernet congestion management mechanism." Journal of
Network and Computer Applications 47 (2015): 107-130.
[2] Khurshid, Waqar, et al. "Comparative study of congestion notification
techniques for hop-by-hop-based flow control in data centre Ethernet."
IET Networks 7.4 (2018): 248-257.
[3] Zhang, Yan, and Nirwan Ansari. "On architecture design, congestion
notification, TCP incast and power consumption in data centers." IEEE
Communications Surveys & Tutorials 15.1 (2012): 39-64.
[4] Lee, Duke, et al. "FLORAX-flow-rate based hop by hop backpressure
control for IEEE 802.3x." 5th IEEE International Conference on High
Speed Networks and Multimedia Communication (Cat. No. 02EX612).
IEEE, 2002.
[5] Wechta, Jerzy, Martin Fricker, and Fred Halsall. "Hop-by-hop flow control
as a method to improve QoS in 802.3 LANs." 1999 Seventh International
Workshop on Quality of Service. IWQoS'99.(Cat. No. 98EX354). IEEE,
1999.
[6] DeSanti, Claudio. "IEEE 802.1: 802.1Qbb - Priority-based Flow Control."
2011.
[7] Cisco White Papers: "Priority Flow Control: Build Reliable Layer 2
Infrastructure." CISCO, 2010.
[8] IEEE: "IEEE p802.1Qbb/D2.3, May 2010 (DRAFT Amendment to IEEE
Std 802.1Q-2005). " 2010, pp. 1–40.
[9] Malhotra, Richa, et al. "Modeling the interaction of IEEE 802.3x hop-by-hop flow control and TCP end-to-end flow control." Next Generation
Internet Networks, 2005. IEEE, 2005.
[10] de los Santos, G. Rodriguez, et al. "Buffer design under bursty traffic with
applications in FCoE storage area networks." IEEE communications
letters 17.2 (2013): 413-416.
[11] Feuser, Oliver, and Andre Wenzel. "On the effects of the IEEE 802.3x
flow control in full-duplex Ethernet LANs." Proceedings 24th Conference
on Local Computer Networks. LCN'99. IEEE, 1999.
[12] Pawar, Satish D., and Pallavi V. Kulkarni. "A survey on congestion
notification algorithm in data centers." International Journal of Computer
Applications 975 (2014): 8887.
[13] Bergamasco, Davide. "Data center ethernet congestion management:
Backward congestion notification." IEEE 802.1 Meeting. 2005.
[14] Bergamasco, Davide, and Rong Pan. "Backward congestion notification
version 2.0." IEEE 802.1 Meeting. 2005.
[15] Jiang, Jinjing, Raj Jain, and Chakchai So-In. "An explicit rate control
framework for lossless ethernet operation." 2008 ieee international
conference on communications. IEEE, 2008.
[16] So-In, Chakchai, Raj Jain, and Jinjing Jiang. "Enhanced forward explicit
congestion notification (E-FECN) scheme for datacenter Ethernet
networks." 2008 international symposium on performance evaluation of
computer and telecommunication systems. IEEE, 2008.
[17] Mliki, Hela, Lamia Chaari, and Lotfi Kamoun. "Modelling and evaluation
of QCN using colored petri nets." Peer-to-Peer Networking and
Applications 11.3 (2018): 486-503.
[18] Devkota, Prajjwal, and AL Narasimha Reddy. "Performance of quantized
congestion notification in TCP incast scenarios of data centers." 2010
IEEE International Symposium on Modeling, Analysis and Simulation of
Computer and Telecommunication Systems. IEEE, 2010.
[19] Zhang, Yan, and Nirwan Ansari. "On architecture design, congestion
notification, TCP incast and power consumption in data centers." IEEE
Communications Surveys & Tutorials 15.1 (2012): 39-64.
[20] Zhang, Yan. "Congestion control, energy efficiency and virtual machine
placement for data centers." (2014).
[21] Li, Weihe, et al. "Survey on Traffic Management in Data Center Network:
From Link Layer to Application Layer." IEEE Access 9 (2021): 38427-38456.
[22] Xia, Wenfeng, et al. "A survey on data center networking (DCN):
Infrastructure and operations." IEEE communications surveys & tutorials
19.1 (2016): 640-656.
[23] O'Hanlon, R. "Data Center Ethernet." TERENA Workshop, 2006.
[24] Cisco Systems Inc. "Data Center Ethernet (DCE)." www.cisco.com/go,
2008.
[25] Gusat, Mitchell, Cyriel Minkenberg, and Gergely János Paljak. "Flow and
congestion control for datacenter networks." RZ3742 2009 (2009).
[26] Jiang, J., and Raj Jain. "Simulation Modelling of BCN V2.0 Phase 1:
Model Validation." IEEE 802.1 Congestion Group Meeting. 2006.
[27] Jiang J, Jain R, So-In C. "Forward explicit congestion notification
(FECN) for datacenter Ethernet networks." IEEE 802.1au congestion
notification group meeting, 2007.
[28] Joglekar, Rohit P., and P. Game. "Managing congestion in data center
network using congestion notification algorithms." International Research
Journal of Engineering and Technology (IRJET) 3.5 (2016): 195-200.