COM 15 – LS 435 – E
INTERNATIONAL TELECOMMUNICATION UNION
TELECOMMUNICATION STANDARDIZATION SECTOR
STUDY PERIOD 2009-2012
English only
Question(s): 9/15
Original: English

LIAISON STATEMENT
Source: ITU-T Study Group 15
Title: Recommendation ITU-T G.8131/Y.1382 revision – Linear protection switching for MPLS-TP networks
For action to: IETF MPLS WG
For comment to: -
For information to:
Approval: Agreed to at Study Group 15 meeting (Geneva, 10-21 September 2012)
Deadline: 31 December 2012
Contact: Ghani Abbas, Ericsson UK
Tel: +44 7710 370 367
Email: Ghani.Abbas@ericsson.com

Attention: Some or all of the material attached to this liaison statement may be subject to ITU copyright. In such a case this will be indicated in the individual document. Such a copyright does not prevent the use of the material for its intended purpose, but it prevents the reproduction of all or part of it in a publication without the authorization of ITU.

Thank you for the reply liaison ‘Reply to ITU Liaison Statement regarding MPLS-TP Linear Protection’ (IETF MPLSLS77-E). We have discussed the responses in your liaison regarding RFC 6378 ‘MPLS Transport Profile (MPLS-TP) Linear Protection’. We have also identified additional issues, based on the technical background of linear protection as defined in G.808.1 ‘Generic protection switching - Linear trail and subnetwork protection’ and G.8031 ‘Ethernet linear protection switching’.

This liaison provides feedback on your liaison response and lists the additional issues that have been identified. The item numbers used below correspond to those used in your reply liaison, and new numbers have been added for new issues. Again, for convenience, we refer below to RFC 6378 as PSC and to G.808.1/G.8031 as APS. Note, however, that no decision on the format of the protection switching messages was reached at this meeting.

We request that you review the following issues and provide a solution to each of them. We look forward to working with you to complete the Recommendation.

1. This item clarifies an issue with your reply regarding the priority of FS and SF-P. Three issues that need to be solved in PSC in order to minimize service impact have been identified.
a. Initially, the working path (WP) and the protection path (PP) are both normal. Then a Forced Switch (FS) command is issued for maintenance on the WP, and the traffic moves from the WP to the PP. If a Signal Fail then occurs on the PP, the service cannot recover and is interrupted. This could occur, for example, as a result of accidentally unplugging a PP fiber.
b. If there is an existing signal fail on the protection path (SF-P) and an FS command is issued by accident, the traffic on the WP will move to the PP. This results in a service interruption from which there is no automatic recovery, because PSC should not have switched the traffic from the WP to the PP.
c. Another issue with the priority of FS and SF-P is identified in Annex 1 of this liaison.
It was noted that the operational scenarios shown above, which are common in APS, are very important in transport networks. Further analysis of RFC 4427 revealed a discrepancy between the RFC and G.808.1/G.841 concerning the description of ‘Forced switch-over for normal traffic’. It is noted that G.841 and G.808.1 are listed as references in RFC 4427.

2. Two issues are identified in Annex 2 of this liaison. In summary, these issues are:
a. Your liaison response claimed that the local status is supplied at all times. Figure 1 in Annex 2 shows a case in which the local status is not correctly reported.
b. As a result of this incorrect status reporting, a sequence of events (such as the one shown in Figure 2 in Annex 2) can result in unexpected protection switching behaviour, as described in Annex 2.

3. After reviewing Appendix B of RFC 6378, we have again identified that the exercise operation is incorrectly supported in PSC, and there seems to be a misunderstanding of the exercise operation in transport networks. The objective of the exercise (EXER) operation is to test whether the APS communication is operating correctly, i.e. both the APS process logic (including the state machine) and the APS channel on the protection path, without service disruption and without affecting any protection operation, unless the protection transport entity is in use. The functional requirement is described in clause 11.14 of G.8031. Some operators periodically (e.g. once a week or once a month) send this command and check that the APS communication is normal.
In Appendix B of RFC 6378, the Exercise operation is described as using the Lockout of Protection (LO) or Forced Switch (FS) command in combination with OAM functionalities. However, we do not think that this scenario fulfills the purpose of the EXER operation. Moreover, the exercise operation described in RFC 6378 carries a risk of traffic loss, as a signal failure might occur during the exercise operation. In that case, the LO or FS has to be cancelled to allow the PSC protocol to provide proper switching. The solution provided in Appendix B of RFC 6378 does not seem to satisfy requirement 84 of RFC 5654.

4. The concerns about the difference in priority between local and remote triggers are related to the signal degrade (SD) operation, and possibly also to MS. As far as we can see, RFC 6378 gives no clear description of the SD operation or of support for the MS-to-working-path command. Under multiple failure conditions, depending on the sequence of the failures, there can be operational and/or technical issues due to the difference in priority between local and remote triggers. In order to continue our analysis of the issue related to this difference in priority, we would like to know your plan for supporting SD and MS to the working path.
MS to the working path has to be supported so that both sides can be initially aligned in the non-revertive switching mode.

ITU-T\COM-T\COM15\LS\435E.DOC

5. No specific issue has been identified so far. We understand that Clear Signal Fail (SFc) in the local request logic described in section 3.1 is used to trigger the WTR in section 4.3.3.5 of RFC 6378. Is this understanding correct?

6. In ITU-T SG15, a Recommendation for linear protection (e.g. G.8131/Y.1382) needs to define the reporting of failures of protocol so that corrective action can be taken. The detection and alarm reporting of failures of the linear protection protocol are not specified in PSC. The specific failures of protocol that should be reported are specified in clause 11.15 of G.8031 and clause 6.2.7 of G.806. We request that the reporting of failures of protocol be specified.

7. Both APS and PSC allow local configuration of a number of similar parameters that control the operation of the protection scheme. Normally these parameters are set to the same values at each end of the link, and the protection scheme operates as intended. Provisioning mismatch (G.8031, clause 11.4) refers to the situation where, for example, one end of a link is configured for revertive switching and the other is configured for non-revertive switching. The mismatch behaviour for APS is described in clause 11.4 of G.8031 as follows:
a. Bridge type: if the bridge type configured at each end of the service is not the same, a Failure of Protocol (FOP) is declared.
b. Bidirectional versus unidirectional protection: in case of mismatch, the service operates in unidirectional protection switching mode.
c. Revertive versus non-revertive protection: in case of mismatch, the service operates in revertive protection switching mode.
Please specify the expected behaviour of the protection scheme in the event of a mismatch.
We recognize that RFC 6378 has language describing the notification of configuration inconsistencies to management systems, but we do not see any specification of the protection scheme's behaviour in such circumstances.

8. This is a new item, described in Annex 3. When the two end nodes are operated in revertive mode in the PSC protocol, there is a case in which reversion does not work correctly: when SF-W is cleared at the two end nodes simultaneously. The details are shown in Annex 3 of this liaison, and we request that this issue be solved. It was noted that the issue could be solved if the WTR timer were initiated by a remote request. However, this is not clarified in the current version of RFC 6378. It was also noted that allowing the WTR timer to be started by a remote request could cause another issue. We would like clarification on whether or not the WTR timer can be started by a remote request, such as a WTR message.

9. "Manual switch-over for recovery LSP/span" and "Freeze" do not seem to be implemented in PSC. As some operators currently use them in their operational processes, we request that PSC include these functionalities. It was also noted that these external commands are considered mandatory in RFC 5654, requirement 83.

Annex 1

Figure 1 depicts an example of a technical issue in PSC linear protection.
- In the Normal state, the end points transmit NR (0,0) messages.
- When a local Forced Switch command is issued on node B, node B goes into the local Protecting Administrative state (PA:F:L) and begins transmitting FS (1,1) messages.
- The remote Forced Switch message causes node A to go into the remote Protecting Administrative state (PA:F:R), and node A begins transmitting NR (0,1) messages.
- When node A detects a unidirectional Signal Fail on the protection path, node A keeps sending NR (0,1) messages because SF-P is ignored in the PA:F:R state.
- When a local Clear command is issued on node B, node B clears the local Forced Switch command, goes into the Normal state and begins transmitting NR (0,0) messages.
- However, node A cannot receive PSC messages because of the local unidirectional Signal Fail on the protection path. Because no valid PSC message is received over a period of several consecutive message intervals, the last valid received message remains applicable, and node A continues to transmit NR (0,1) messages in the PA:F:R state.

[Sequence diagram between node A and node B: states N, PA:F:L and PA:F:R; messages NR 0,0, FS 1,1 and NR 0,1; local SF on the protection path at node A; Clear command at node B]
Figure 1 – Example of technical deficiency in PSC linear protection

There is now a mismatch between the bridge/selector positions of node A (transmitting NR (0,1)) and node B (transmitting NR (0,0)). This results in an out-of-service condition even though there is neither a signal fail on the working path nor an FS. The main cause of the problem seems to be that the operator command has a higher priority than SF-P, while the protocol message for the operator command itself cannot be seen at the other end.

Annex 2

One of the differences between the APS and PSC protocols is the setting of the “Request/state” field in the protocol message. In the APS protocol, if the priority of the local request is lower than that of the remote request received from the far-end node, the local request is not sent to the far-end node. In this case, the “Request/state” field in the APS message is always filled with NR, DNR or RR. This behaviour is consistent, i.e. the principle is applied in every situation.
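As a rough illustration of the APS principle described in the preceding paragraph, the following Python sketch computes the request a node would place in the “Request/state” field. It is a simplification under stated assumptions, not text from G.8031 or G.8131: the `Request` enumeration, its reduced set of codes, its priority order (SF-P above FS, as discussed for APS in item 1) and the function name `aps_transmitted_request` are hypothetical, and the DNR and RR cases are not modeled.

```python
from enum import IntEnum


class Request(IntEnum):
    """Simplified request codes; a larger value means a higher priority.

    Assumption for illustration only: a reduced subset of codes, ordered
    with SF-P above FS as in APS. LO, EXER, RR and DNR are omitted.
    """
    NR = 0
    WTR = 1
    MS = 2
    SD = 3
    SF_W = 4
    FS = 5
    SF_P = 6


def aps_transmitted_request(local: Request, remote: Request) -> Request:
    """Return the request an APS-style node would transmit (sketch).

    A local request whose priority is lower than that of the received
    remote request is never sent; NR is sent instead. The rule does not
    depend on the node's state, which is what makes it consistent.
    """
    if local < remote:
        return Request.NR
    return local


# A node receiving a remote FS while detecting SF-W reports NR, not SF.
print(aps_transmitted_request(Request.SF_W, Request.FS).name)  # NR
# In this assumed ordering SF-P outranks FS, so SF-P is reported.
print(aps_transmitted_request(Request.SF_P, Request.FS).name)  # SF_P
```

Because the function takes no state argument, the same rule necessarily applies whether the lower-priority local request is SF-W or SF-P.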
However, in the PSC protocol, under some conditions the “Request” field in the PSC message reflects the local request even when the priority of the local request is lower than that of the remote request received from the far-end node. For example, if a near-end node detects an SF on the working path (SF-W) while it is in the “Protecting Administrative (PA)” state due to a remote Forced Switch (FS) command issued at the far-end node, the near-end node reflects its current request (SF-W) in the “Request” field of the PSC message and starts to send SF (1,1) to the far-end node. However, if the near-end node detects an SF on the protection path (SF-P) instead of SF-W in the above example, the near-end node does not reflect its current request (SF-P) in the “Request” field of the PSC message and keeps sending NR (0,1), as if no SF-P had been detected. This inconsistent definition of the protocol can cause the “Request” field of the PSC message to contain incorrect information, as shown in Figure 1, resulting in an unintended situation.

[Sequence diagrams (a) and (b) between node A and node B: states N, PA:F:L and PA:F:R; events FS, SF-W, SF-P and SF-W Clear; messages NR00, NR01, FS11 and SF11]
Figure 1

Figure 1(a) is the case where the PSC protocol operates as intended, and Figure 1(b) is the problematic case. The only difference between the two cases is the order in which SF-W and SF-P are detected at a node in the PA:F:R state (Protecting Administrative state due to a remote FS). In Figure 1(a), if node A detects SF-P while it is receiving FS (1,1) from node B, it ignores the detected SF-P and keeps sending NR (0,1). If node A then additionally detects SF-W, the detected SF-W does not enter the PSC control logic because the priority of SF-W is lower than that of SF-P, and node A keeps sending NR (0,1).
Likewise, the SF-W clear event does not enter the PSC control logic either, and node A keeps sending the same message, NR (0,1).

On the other hand, in Figure 1(b), if node A first detects SF-W while it is receiving FS (1,1) from node B, the local request (SF-W) is reflected in the “Request” field of the PSC message, and node A starts to send SF (1,1) to node B. If node A then additionally detects SF-P, the detected SF-P becomes the highest local request and enters the PSC control logic. However, the detected SF-P is ignored because node A is in the PA:F:R state, and node A keeps sending its current PSC message, SF (1,1). Later, when SF-W is cleared, this does not trigger any action, and node A keeps sending the same PSC message, SF (1,1). This means that node A wrongly reports SF-W through the “Request” field of the PSC message although there is no failure on the working path. The mismatch between the “Request” field in the PSC message and the actual local request can cause the protocol to operate incorrectly.

Figure 2 shows a scenario that follows the case shown in Figure 1(b).

[Sequence diagram between node A and node B: states N, PA:F:L, PA:F:R and PF:W:R; events FS, SF-W, SF-P, SF-W Clear, SF-P Clear and Clear; messages NR00, NR01, FS11 and SF11]
Figure 2

If an operator wants to resume the protection operation after SF-W is cleared at node A, he or she should issue a Clear command at node B to cancel the FS command. The expected behaviour in this case is that both nodes go into the “Unavailable” state due to SF-P at node A and the normal traffic signal is switched to the working path. However, the current PSC protocol does not operate as expected. In Figure 2, if a Clear command is issued at node B, node B goes into the Normal state as an intermediate state.
Since the last request received from node A was SF (1,1), node B goes into the “Protecting Failure (PF)” state due to a remote SF-W (PF:W:R) and starts to send NR (0,1) to node A. This NR (0,1) is not delivered to node A because the protection path from node B to node A has failed. Later, when SF-P is cleared at node A, the SF clear event enters the PSC control logic and node A starts to send NR (0,1). According to the PSC protocol, a received NR (0,1) message is ignored when a node is in either the PA:F:R or the PF:W:R state. This means that nodes A and B are both sending NR (0,1) messages but remain on the protection path. The expected and correct behaviour in the revertive mode of operation is that the normal traffic is on the working path when there is no failure on the working path and no command requesting a switch to the protection path.

Annex 3

Let us assume that two end nodes are operated in revertive mode and are experiencing a bidirectional signal fail on the working path (SF-W). When the two end nodes clear the SF-W simultaneously, right after transmitting SF (1,1) messages, each node transmits a WTR (0,1) message and enters the WTR state. Due to the propagation delay from the remote node to the local node, the SF (1,1) message transmitted by the remote node just before its Clear SF-W event can arrive while the local node is in the WTR state. When this SF (1,1) message arrives, the local node stops the WTR timer, transmits NR (0,1) messages, and enters the remote SF-W state, which is defined as the PF:W:R state in RFC 6378. When the WTR (0,1) message arrives, according to the state transitions defined in RFC 6378, the local node moves to the WTR state and continues to send its current message, NR (0,1). The WTR timer has now been cancelled, and each node is in the WTR state and keeps sending NR (0,1) messages. Reversion fails.
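The race described above can be reproduced with a small Python simulation. This is a sketch under stated assumptions, not an implementation of RFC 6378: the `Node` class, its state strings and the helper `simulate_clear` are hypothetical; only the transitions named in this Annex are modeled; a received WTR (0,1) is assumed not to start the local WTR timer (the open question in item 8); and any other received message, such as NR (0,1), is ignored. Setting `in_flight_sf=False` models clear events that are not simultaneous, so no SF (1,1) is still in flight.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, heavily simplified PSC end node in revertive mode."""

    name: str
    state: str = "SF"          # stable SF-W state before the clear
    tx: str = "SF(1,1)"        # message currently being transmitted
    wtr_running: bool = False  # is the Wait-to-Restore timer running?

    def clear_sf(self) -> str:
        # Local Clear SF-W: enter WTR state, start the WTR timer, send WTR(0,1).
        self.state, self.wtr_running, self.tx = "WTR", True, "WTR(0,1)"
        return self.tx

    def receive(self, msg: str) -> str | None:
        """Process a received message; return a newly transmitted one, if any."""
        if msg == "SF(1,1)":
            # Remote SF-W: stop the WTR timer, enter PF:W:R, send NR(0,1).
            self.state, self.wtr_running, self.tx = "PF:W:R", False, "NR(0,1)"
            return self.tx
        if msg == "WTR(0,1)":
            # Remote WTR: enter WTR state but keep sending the current message.
            # Assumption: a remote WTR does not start the local WTR timer.
            self.state = "WTR"
        # Any other message (e.g. NR(0,1)) is ignored in this sketch.
        return None

    def wtr_period_elapses(self) -> None:
        # Reversion to Normal happens only if the WTR timer is still running.
        if self.wtr_running:
            self.state, self.wtr_running, self.tx = "N", False, "NR(0,0)"


def simulate_clear(in_flight_sf: bool = True) -> tuple[Node, Node]:
    """Both nodes clear SF-W; optionally an SF(1,1) is still in flight."""
    west, east = Node("West"), Node("East")
    # Simultaneous clears: each link still carries the last SF(1,1)
    # that the far end sent just before its own clear event.
    initial = ["SF(1,1)"] if in_flight_sf else []
    to_west: deque[str] = deque(initial)
    to_east: deque[str] = deque(initial)
    to_east.append(west.clear_sf())
    to_west.append(east.clear_sf())
    # Deliver messages in FIFO order, alternating between directions.
    while to_west or to_east:
        if to_west:
            reply = west.receive(to_west.popleft())
            if reply:
                to_east.append(reply)
        if to_east:
            reply = east.receive(to_east.popleft())
            if reply:
                to_west.append(reply)
    # Let a full WTR period elapse at both nodes.
    for node in (west, east):
        node.wtr_period_elapses()
    return west, east


west, east = simulate_clear()
print(west.state, west.tx, west.wtr_running)  # WTR NR(0,1) False
print(east.state, east.tx, east.wtr_running)  # WTR NR(0,1) False
```

In this model, with `in_flight_sf=True` both nodes finish in the WTR state sending NR (0,1) with no WTR timer running, so the expiry step never restores the traffic to the working path. With `in_flight_sf=False` both timers keep running and both nodes return to the Normal state sending NR (0,0).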
The sequence diagram of this scenario is depicted in Figure 1.

[Sequence diagram between West and East: from a stable SF state exchanging SF 1,1, each node clears SF, goes to the WTR state and sends WTR 0,1; on receiving SF 1,1 each goes to PF:W:R, stops the WTR timer and sends NR 0,1; on receiving WTR 0,1 each goes to the WTR state and keeps sending NR 0,1; reversion fails]
Figure 1 – Reversion fails on simultaneous clear SFs

The term “simultaneous clear SFs” in this Annex means that the two clear SF events at both ends occur within the propagation delay of the SF message from one end to the other. It does not mean that the two clear SF events occur at exactly the same instant.

_________________