3G Cell Optimisation

INTRODUCTION

Objective

Statistical RNC/RXI/RBS counters reflect the performance of a 3G cell within the network. Specific counters are analysed to detect various faults. The objective of the procedure is therefore to trace any network-related faults to the source of the problem by analysing the fault symptoms evident in the counter values.

Scope

This procedure identifies and recommends solutions for network faults identified through the analysis of RNC/RXI/RBS statistical counters.

References

Additional references to this procedure are as follows:

ALEX Libraries:
  o Radio Network Controller (RNC) 3810 (CXP 901 2011 RXX)
  o RXI 820 ATM R4.1 (CXP 901 102/3 RXX)
  o Radio Base Station (RBS) 3202/3206/3402/3412 (CXP 901 0811/X RXX)
  o WCDMA RAN (CXS 101 06/4 RXX)
TEMS User’s Manual

For RNC/RXI/RBS counter descriptions refer to the Performance Statistics document within the relevant RNC/RXI/RBS ALEX Library.

PROCEDURE

ANALYSING COUNTERS

1. The FACTS tool interfaces to the RNC, RXI and RBS nodes and presents the relevant counters for each. Counters are collected every quarter hour (a 15 minute reporting period) and are stored from the operational date of the cell, thereby allowing for past analysis.

2. There are numerous counters available from an RNC/RXI/RBS. However, this procedure concentrates on counters reflecting the critical performance of the cells. These counters (and formulae derived therefrom) are best analysed graphically through the use of FACTS. The formulae used for statistics such as DCR and CSSR may be obtained within FACTS.

3. Both the NMC and the Planning & Optimization Engineer are responsible for monitoring counters. The NMC has the responsibility of maintaining the active status of all cells and therefore must act in accordance with all such related counters. The Planning and Optimization Engineer monitors and acts on counters reflecting the cell’s active performance.

4. 
It is possible to configure alarms to be generated for counters exceeding specific values. These alarms would then be monitored by the NMC.

5. For the Radio Planning & Optimisation Engineer the focus is on maintaining adequate cell performance in terms of Accessibility (call setup analysis), Retainability (drop call analysis) and Integrity (speech quality/video quality/packet throughput analysis).

ACCESSIBILITY

6. If a cell has poor accessibility it is typically due to either some form of congestion or a hardware/software fault or a misconfiguration. It is also possible that there is some external source of interference (such as a microwave link on the same frequency) affecting the accessibility.

7. Accessibility should be monitored independently for the different RAB types (e.g. Speech, CS Video, PS Interactive R99, PS Interactive HSDPA, etc.) as in certain situations only one of the RAB types will be affected. For example, a disabled HS-TXB will affect the accessibility of the PS Interactive HSDPA RAB, but if the RBS also has a TXB (non-HS) installed then the other RABs may continue to have an acceptable accessibility.

8. When a low CSSR is detected on a cell the first thing to check is if Admission Control is rejecting the RRC/RAB setup attempt (pmNoReqDeniedAdm) or if it is failing after admission (pmNoFailedAfterAdm). For high pmNoReqDeniedAdm refer to the “Admission Control” sections below. For high pmNoFailedAfterAdm refer to the “Failure After Admission” sections below.

Example: FACTS Report showing a low CSSR Speech caused by a high pmNoReqDeniedAdm. Note that pmNoReqDeniedAdm is not RAB specific so other RABs will most likely be affected in this case too.

Admission Control: DL Power

9. If Admission Control rejects a RAB establishment due to a lack of DL power then the counter pmNoFailedRabEstAttemptLackDlPwr is incremented. 
Check that the feeder losses are configured correctly in the RBS and that the parameter maximumTransmissionPower is set correctly (typically to maxDlPowerCapability minus 0.2 dB). The value of parameter pwrAdm should also be verified (typically set to 75%). Also, check for MCPA alarms: for example, RBS 3202 sites with high feeder losses are sometimes configured with two MCPAs per sector, and if one of the MCPAs fails the maxDlPowerCapability of the sector is greatly reduced, causing a lack of DL power. Long term solutions are to increase the power capability of the sector by adding or upgrading an MCPA (RBS 3202) or RU (RBS 3206), re-engineering the site to reduce feeder lengths, or perhaps to change the RBS type to one using RRUs (RBS 3402 or RBS 3412) if this provides higher power at the reference point. The short term solution is to reduce the traffic carried by the site (See the “Traffic Offload” sections).

Example: FACTS Report showing a high number of RAB establishment failures due to Admission Control rejections caused by a lack of DL power. In this situation the site had only one of the two MCPAs in sector 1 functioning correctly, causing the DL power congestion. This is shown in the cabinet viewer snapshot below (Red LED on MCPA). There was also an alarm in the RBS for the faulty MCPA.

Admission Control: DL Channelisation Codes

10. If Admission Control rejects a RAB establishment due to a lack of DL channelisation codes then the counter pmNoFailedRabEstAttemptLackDlChnlCode is incremented. This will typically affect the PS Interactive R99 (DCH/FACH) CSSR worse than the Speech CSSR as the PS Interactive R99 RAB requires channelisation codes at a lower spreading factor (using more of the code tree). In the P4 software release a cell that supports R99 and HSDPA typically has 5 spreading factor 16 DL channelisation codes reserved for HSDPA. This means that approximately 32% of available codes are reserved for HSDPA. 
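The share of the code tree taken by the HSDPA reservation can be checked with a short calculation. The sketch below is illustrative only: it relies on the general OVSF property that one code at spreading factor SF occupies 1/SF of the tree, and the function name code_tree_share is hypothetical. Five SF 16 codes alone come to 31.25%; the figure of approximately 32% quoted above would be consistent with one further SF 128 code being counted (for example for HS-SCCH), but that is an assumption and is not stated in this procedure.

```python
from fractions import Fraction


def code_tree_share(allocations):
    """Return the fraction of the DL code tree occupied by the given codes.

    allocations maps a spreading factor to the number of codes reserved
    at that spreading factor. One code at spreading factor SF blocks
    1/SF of the tree.
    """
    return sum(Fraction(count, sf) for sf, count in allocations.items())


# Five SF 16 codes reserved for HSDPA, as described for the P4 release.
hs_only = code_tree_share({16: 5})
print(f"{float(hs_only):.2%}")

# Assumption: one additional SF 128 code is also counted.
with_extra = code_tree_share({16: 5, 128: 1})
print(f"{float(with_extra):.2%}")
```

Either way, close to a third of the tree is unavailable to R99 traffic, which is why low spreading factor RABs are the first to be blocked.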
When this is the case it is common for DL channelisation code congestion to occur. Check the setting of parameter dlCodeAdm (typically set to 85% on MTN’s network). The long term solution is to add another cell in the coverage area to take some of the traffic; this may be achieved by introducing a second carrier, another sector, or another site. The short term solution is to reduce the traffic carried by the site (See the “Traffic Offload” sections).

Example: FACTS Reports showing a high number of RAB establishment failures due to Admission Control rejections caused by a lack of DL channelisation codes, and the corresponding decrease in CSSR for Packet Interactive. In this case a large portion of the speech calls were already redirected to GSM so the R99 Packet Interactive RAB was worst affected; the required solution is sectorisation of the inbuilding antenna system or implementation of a second carrier frequency.

Admission Control: UL/DL ASE

11. If Admission Control rejects a RAB establishment due to a lack of UL or DL air speech equivalent (ASE) then the counters pmNoFailedRabEstAttemptLackUlAse or pmNoFailedRabEstAttemptLackDlAse are incremented. The ASE monitor accounts for the air interface resource usage in a cell (separately for UL and DL) by means of an average static load estimation of each radio link in the cell; for more information refer to the “Capacity Management” document in the “WCDMA RAN” ALEX library. Because an RL’s ASE is an estimation, it is possible that in certain situations it over-estimates the load in a cell, e.g. for inbuilding cells on a different carrier frequency to the surrounding macro cells. In such situations it is possible to increase the UL/DL ASE admission control limit (parameters aseUlAdm/aseDlAdm) in order to prevent unnecessary admission control rejections of RAB establishments. 
So, a short term solution to relieve such congestion may be to increase aseUlAdm/aseDlAdm, but the effect on DCR/CSSR should be closely monitored (Note that the aseUlAdm default value on MTN’s network is already less stringent than the Ericsson default). Another short term solution is to reduce the traffic carried by the site (See the “Traffic Offload” sections). The long term solution is to add another cell in the coverage area to take some of the traffic; this may be achieved by introducing a second carrier, another sector, or another site.

Example: FACTS Report showing RAB establishment failures due to Admission Control rejections caused by a lack of UL ASE. In this case the UL ASE congestion was minor and lasted only a few days so no action was taken.

Admission Control: Connection Limits

12. If Admission Control rejects a RAB establishment due to exceeding the configured connection limit for SF 8, SF 16, or SF 32 then the counter pmNoFailedRabEstAttemptExceedConnLimit is incremented. These spreading factors are used by the PS64/384, PS64/128, and PS64/64 RBs respectively, so the connection limit blocking typically applies to channel switching between these RBs for an R99 packet interactive RAB. The connection limits are configured by parameters sf8Adm, sf16Adm and sf32Adm. The default settings allow the maximum possible number of RLs for each spreading factor, in which case Admission Control will not block for this reason. Lower settings have been tested (in combination with adjusted Class B QoS settings on the Iub interface), in which case some connection limit rejections were obtained. But this is a special situation and for the purpose of this document such connection limit rejections are not worth further consideration.

Admission Control: Hardware Usage (Channel Elements)

13. It is possible for Admission Control to reject a RAB establishment attempt due to insufficient UL or DL RBS hardware capacity i.e. too few channel elements available. 
The channel element capacity of an RBS may be software limited (according to the software license configured for the RBS) or hardware limited (according to the TXBs and RAXBs installed in the RBS). The two parameters that control the RBS hardware admission policy are ulHwAdm and dlHwAdm. By default these parameters should be set to 100% in which case no hardware is reserved for handovers and Admission Control will not block RAB establishment attempts for this reason (see “Failure After Admission: Hardware Usage”). In software revision P4 there is no specific counter to indicate this type of Admission Control rejection, so if pmNoReqDeniedAdm is triggered without any of the other relevant counters indicating a reason then it is likely that this is the cause and that ulHwAdm or dlHwAdm is incorrectly configured to a value below 100%. In the P5 software release there are new counters that indicate when lack of hardware capacity causes RAB establishment failures in a cell: pmNoFailedRabEstAttemptLackDlHw, pmNoFailedRabEstAttemptLackDlHwBest, pmNoFailedRabEstAttemptLackUlHw, pmNoFailedRabEstAttemptLackUlHwBest.

Example: FACTS Reports showing RAB establishment failures due to Admission Control with no counter showing the reason (this is for P4). In this case the HW admission limits were suspected and found to be ulHwAdm=70 and dlHwAdm=70 (instead of both being 100). After correcting these settings the Admission Control rejections disappeared and, as can be seen in the second plot below, the Packet CSSR improved. In the third plot below the UL CE Usage is seen to peak around 45 CEs. This RBS had a capacity of 64 UL CEs; 70% of 64 CEs is 44.8 CEs. In other words, the peak UL CE Usage matches the Admission Control limit.

Failure After Admission: Iub Congestion

14. If a UTRAN cell has a high number of RRC/RAB establishment request failures after being admitted by Admission Control (pmNoFailedAfterAdm), then a common reason is Iub Congestion. 
When considering the Iub interface it is important to remember that mainly RABs configured to use strict AAL2 QoS settings will be blocked at call setup by AAL2 CAC. Typically the R99 RABs (i.e. all RABs excluding HSDPA and EUL RABs) are configured to use AAL2 QoS class A or class B, with both classes configured to use a strict QoS. HSDPA and EUL will typically use AAL2 QoS class C and class D, with both classes configured to use a best effort QoS. Typically the R99 Packet Interactive RAB will be the first RAB to show signs of AAL2 congestion with a poor Packet Interactive CSSR and corresponding high pmNoFailedAfterAdm. The AAL2 Setup Success Rate statistics from the relevant RXI towards the RBS may then be investigated. This should typically be 99% and above, but if not and the counter pmUnSuccOutConnsLocal indicates that it is local rejections (on the RXI) by CAC, then there is congestion on the Iub interface.

Example: FACTS Reports showing high pmNoFailedAfterAdm (1st plot), low CSSR Packet Interactive (2nd plot), and low AAL2 Call Setup Success Rate with corresponding high pmUnSuccOutConnsLocal (3rd plot). From 2006-11-24 the problem disappears. In this case the solution was to activate Directed Retry to GSM and to change the AAL2 QoS class B traffic to use a best effort configuration thereby allowing more PS64/128 and PS64/384 users (as well as ordering a 2nd E1 to the site); note that this RBS did not have HSDPA configured therefore there was no concern about affecting the experience of HS users as described in section “Considerations For HSDPA: Iub Bandwidth”.

Failure After Admission: Core Transport Network Congestion

15. Related to the above point (“Failure After Admission: Iub Congestion”) is transport network congestion in links other than the Iub e.g. RNC<->MGW (Iu-cs), RNC<->SGSN (Iu-ps) and inter-MGW links. If this is the case then the CSSR of an entire RNC(s) will deteriorate along with the AAL2 Setup Success Rate for a major link to the RNC. 
It would then be necessary to look at the link utilisation in order to confirm such link congestion, but that is beyond the scope of this document.

Example: FACTS Reports showing poor CSSR Speech for CTRNC1 for two days and then an improvement for the next two days (1st plot); and the corresponding AAL2 Setup Success Rate for the CTMGW1->RBMGW1 (2nd plot) and RBMGW1->CTMGW1 (3rd plot) links for the same days. The CTMGW1<->RBMGW1 link had a high utilisation (>80%) so the peak cell rate (PCR) for the link was increased resulting in the noticeable improvement.

Failure After Admission: Hardware Usage (Channel Elements)

16. A high number of RRC/RAB setup failures after admission (pmNoFailedAfterAdm) could be due to insufficient UL or DL RBS hardware capacity i.e. too few channel elements available. The channel element capacity of an RBS may be software limited (according to the software license configured for the RBS) or hardware limited (according to the TXBs and RAXBs installed in the RBS). The two parameters that control the RBS hardware admission policy are ulHwAdm and dlHwAdm. If these parameters are set to a value lower than 100% then Admission Control should block any RRC/RAB setup attempts requiring more than the available channel elements (see “Admission Control: Hardware Usage”). By default, however, these parameters should be set to 100%, in which case no hardware is reserved for handovers and Admission Control will not block RAB establishment attempts for this reason, so the setup attempt fails after admission instead. The RBS counters pmSetupFailureSfXX in the UplinkBasebandPool (ULSETUPFAILURESSFXX) and pmSetupFailureSfXX in the DownlinkBasebandPool (DLSETUPFAILURESSFXX) indicate RL (at SF XX) setup failures due to a lack of UL and DL hardware capacity. If this is the case then a short term solution may be to reduce the traffic carried by the site (See the “Traffic Offload” sections). 
The long term solution is to upgrade the UL (RAXB) or DL (TXB) channel element capacity of the site. This may be achieved by swapping the relevant board with that of another site that has more capacity than it requires, or by sourcing a new board. Note that it is possible for these counters to increment even when there should be sufficient channel element capacity (for example due to a software bug in the software revision being used; see “Failure After Admission: Other”) so it is important to compare the channel element usage to the channel element capacity of the RBS to make sure that it makes sense for this to be the root of the problem.

Example: FACTS Reports showing poor CSSR Packet Interactive (1st plot); high pmNoFailedAfterAdm (2nd plot); and UL setup failures due to a lack of UL baseband hardware capacity (RAXB). Note that this RBS had 64 UL channel element capacity until 31st August when it was upgraded to 128 UL channel elements. The estimated UL CE Usage peaks above 64 channel elements even before the 31st confirming that RAXB congestion is the source of the problem, and then after the upgrade to 128 channel elements the UL CE Usage starts peaking above 100 indicating how necessary the upgrade was. The improvement to CSSR Packet Interactive and the decrease in pmNoFailedAfterAdm after the RAXB upgrade is clearly noticeable.

Failure After Admission: Other

17. If none of the above reasons for a poor CSSR are apparent, then it is likely to be a more complicated problem to resolve; often relating to a software/hardware fault, or perhaps an external source of interference in the area. At the time of writing, the 3G technology is not as mature as the current 2G system (as would be expected) and hence there are still numerous improvements being implemented in every software release, along with the continued development of new, more efficient and optimised hardware generations for the various 3G nodes. 
The example below illustrates one problem of this type that was encountered.

Example: FACTS Reports showing poor CSSR Speech with high pmNoFailedAfterAdm (1st plot); and high pmSetupFailureSfXX indicating TXB congestion. However, the DL CE Usage is very low, seldom peaking above 6 channel elements, so TXB congestion does not make sense as the cause. After investigating numerous RBSs showing these “symptoms” it was established that they all had a single HS-TXB as opposed to the other RBSs which all had a TXB as well as an HS-TXB. Both configurations are valid and have more than sufficient downlink channel element capacity. It was also noted that if the RBS was restarted then the problem disappeared for a few days and then re-appeared; this is clearly visible in the plots where the restart occurred on 2 January. This turned out to be a software fault for the single-board (HS-TXB only) configuration (due to a failure to release some resources on the TXB). The fix was delivered from software release P4.0.20 (whereas the release installed on the nodes at the time was P4.0.12).

RETAINABILITY

18. If a cell has poor retainability it is typically due to either missing neighbour definitions (WCDMA and/or GSM), overshooting cell(s), a misbehaving neighbour site, a hardware/software fault or a misconfiguration. It is also possible that there is some external source of interference (such as a microwave link on the same frequency) affecting the retainability.

19. Retainability should be monitored independently for the different RAB types (e.g. Speech, CS Video, PS Interactive R99, PS Interactive HSDPA, etc.) as in certain situations only one of the RAB types will be affected. For example, a cell may be configured with GSM as the preferred HO type in which case Speech calls will perform IRAT handovers to GSM rather than performing IFHOs, but CS Video calls will perform IFHOs. In such a situation, missing inter-frequency neighbour cell relation definitions will impact the DCR of CS Video calls, but not Speech calls.

20. 
However, in the majority of cases the factors that affect the Speech retainability will also affect the retainability of the other RABs. When a high speech DCR is detected on a cell the first thing to check is the type of drops occurring as indicated by the counters pmNoSysRelSpeechSoHo, pmNoSysRelSpeechNeighbr, pmNoSysRelSpeechUlSynch and pmNoOfTermSpeechCong; and then to analyse the situation with the following in mind…

Soft Handover Drops

21. Typically a cell that has a high number of dropped calls due to SOHO failures (pmNoSysRelSpeechSoHo) will also have a high number of drops due to missing neighbours (pmNoSysRelSpeechNeighbr), indicating that the SOHO failures are due to missing neighbour relations; however, there are situations where SOHO failures happen for other reasons. Two common reasons are a neighbouring cell that is misbehaving (often due to faulty hardware/software) or a misconfiguration resulting in a failure to perform an inter-RNC SOHO across the Iur interface. These two situations are illustrated in the following examples…

Example: FACTS Reports showing two cells in the same area (1st plot) with a high pmNoSysRelSpeechSoHo and a much lower pmNoSysRelSpeechNeighbr indicating that the soft handover failures are not due to missing neighbours (2nd & 3rd plot). After further investigation it was discovered that the cells on the neighbouring site U4554 were automatically locked (4th plot) and the Mub interface to the site was down. These cells were transmitting CPICH, yet multiple channels (RACH, FACH, etc.) were disabled, preventing the site from carrying any traffic. However, UEs in the neighbouring cells were measuring the CPICH from these cells and attempting to perform SOHO to them. Such SOHO attempts were failing, leading to the SOHO drops. 
As is clearly visible in the FACTS Reports, when the site U4554 came back on air on 22 Jan the SOHO drops on the neighbouring cells disappeared along with a huge reduction in the DCR experienced by these cells.

Example: Referring to the three sites shown in the figure below (1st plot): U1393 and U0547 are on CTRNC1 while U3970 is on TBRNC1. The three FACTS Reports below (2nd, 3rd & 4th plots) show cells from these sites with a high pmNoSysRelSpeechSoHo. Note that although there are some drops due to missing neighbours (pmNoSysRelSpeechNeighbr), most of the SOHO drops are for another reason. In this case the soft handover counters (pmRlAddAttemptBestCellSpeech and pmRlAddSuccessBestCellSpeech) indicated SOHO success between 3970C1 and 1393B1/547B1; however a GPEH trace of event INTERNAL_SOFT_HANDOVER_EXECUTION showed that these handovers actually failed (see snapshot of slide in 5th plot). It was established that a misconfiguration of an AAL2 routing case between the two RNCs caused all SOHO attempts across the Iur interface to fail. This was corrected on 9 Jan and from the FACTS Reports the improvement is obvious.

Missing Neighbour Drops

22. A cell that has a high number of dropped calls due to missing neighbour relations will have a high pmNoSysRelSpeechNeighbr. A missing neighbour relation will only cause a dropped call if two conditions are met. First, the RNC receives an Event 1a, 1c or 1d Measurement Report from the UE requesting the addition of a SC to the AS (or an HS cell change) for a SC that is not defined as a neighbour relation to any of the cells in the AS. Second, the Ec/No reported for that SC is at least releaseConnOffset above the Ec/No of the best serving cell in the AS, where the RNC parameter releaseConnOffset is typically set to 12 dB. The reason for this system release is to prevent excessive UL interference in the network. This type of dropped call is relatively easy to solve using the General Performance Event Handler (GPEH) tool in OSS-RC. 
With this tool all details on Event 1a, 1c or 1d Measurement Reports containing a SC not in the AS neighbour list may be captured and analysed using the INTERNAL_SOHO_DS_MISSING_NEIGHBOUR event (including those Measurement Reports that do not cause a system release of the call). In this way the missing neighbour or interfering cell may be established and appropriate action taken e.g. addition of the neighbour relation and/or antenna tilting, etc. For more information on the GPEH tool refer to the relevant documentation in the ALEX RNC and OSS-RC libraries. Because missing neighbour drops are relatively easy to solve, it is recommended to optimise the neighbour relations and antenna configuration until the percentage of drops due to missing neighbour relations is less than 10% of the total number of drops in each RNC.

Example: FACTS Report showing a high DCR Speech on cell 1379C1 with the majority of dropped calls due to missing neighbours as shown by the counter pmNoSysRelSpeechNeighbr (1st plot). A GPEH trace with event INTERNAL_SOHO_DS_MISSING_NEIGHBOUR was executed on 4 Jan where it was found that SC 24 and SC 88 were the major cause of these missing neighbour drops (2nd plot). The cells in the area with these scrambling codes were found to be 10C1 and 416B1 (3rd plot). With the addition of these two neighbours to 1379C1 on 4 Jan the improvement in the DCR Speech from around 5% to around 2% is clearly visible in the FACTS Report.

UL Synchronisation Drops

23. A connection is considered lost by the Radio Connection Supervision (RCS) function in the SRNC when the last radio link set (RLS) for the connection has been out-of-sync for a time given by the parameter dchRcLostT (or hsDschRcLostT for an HSDPA connection). When this occurs for a connection containing a speech RAB (or a multi-RAB containing speech) the counter pmNoSysRelSpeechUlSynch is incremented. 
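The drop-cause counters introduced so far (pmNoSysRelSpeechSoHo, pmNoSysRelSpeechNeighbr and pmNoSysRelSpeechUlSynch) are easiest to compare as shares of the total number of speech drops. The following is a minimal sketch and not a FACTS formula: the function names and the total_drops input are hypothetical, and the total is assumed to be obtained from whichever counter or formula FACTS uses for the DCR. Each share is calculated independently because this procedure does not state whether the causes are mutually exclusive, so the shares should not be expected to add up to 100%.

```python
def drop_cause_shares(total_drops, soho, neighbr, ul_synch):
    """Return each drop cause as a fraction of total_drops.

    soho, neighbr and ul_synch are the summed values of
    pmNoSysRelSpeechSoHo, pmNoSysRelSpeechNeighbr and
    pmNoSysRelSpeechUlSynch over the period being analysed.
    total_drops is a hypothetical input: the total number of speech
    drops for the same cells and period.
    """
    if total_drops <= 0:
        # No drops recorded, so every share is zero.
        return {"soho": 0.0, "neighbr": 0.0, "ul_synch": 0.0}
    return {
        "soho": soho / total_drops,
        "neighbr": neighbr / total_drops,
        "ul_synch": ul_synch / total_drops,
    }


def missing_neighbour_target_met(shares, target=0.10):
    """Check the recommendation that missing neighbour drops stay
    below 10% of all drops (applied per RNC in this procedure)."""
    return shares["neighbr"] < target


# Illustrative numbers only, not taken from a real report.
shares = drop_cause_shares(total_drops=200, soho=30, neighbr=50, ul_synch=80)
for cause, share in shares.items():
    print(f"{cause}: {share:.0%}")
print("missing neighbour target met:", missing_neighbour_target_met(shares))
```

With these illustrative numbers the missing neighbour share is 25%, so neighbour optimisation would come before any work on UL synchronisation drops.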
A cell that has a high DCR with a relatively high number of dropped calls due to loss of UL synchronisation may have various contributing factors. These may include missing IRAT neighbour relation definitions, resulting in the connection “hanging on” to the 3G network until the call is dropped when it would have been better served by a handover to the 2G network. This may be especially true for cells on the border of the 3G coverage area where the 3G signal may reach areas further away from the site than expected. By identifying such areas any missing 2G neighbour relations may be added; or perhaps a misconfiguration discovered, such as having an IRAT neighbour relation defined in an RNC towards a 2G cell that is not defined as an outer cell in the 3G MSC Server (see example below). Bear in mind that a maximum of 10 to 12 GSM neighbours is recommended for each 3G cell in order to avoid neighbour list truncation issues.

Another means to improve the situation may be to adjust the thresholds used to trigger IRAT HOs so that they trigger earlier (refer to the settings for parameters usedFreqThresh2dEcno, utranRelThresh3aEcno, usedFreqThresh2dRscp and utranRelThresh3aRscp). For example, triggering compressed mode at Ec/No = -11 dB instead of -12 dB may prevent drops as calls are handed over earlier to the GSM network, which typically has better coverage than the 3G network.

If the power of a RL is very low then that RL will be more sensitive to sudden interference changes, so a cell that has a high number of UL synchronisation drops may cover a radio environment with a relatively high number of UEs experiencing sudden interference changes (generally caused by bridges, buildings, tunnels, steep hilly terrain, etc.). In order to prevent the Power Control function from decreasing the power too much due to temporary good radio conditions, a minimum DL code power may be configured using the UtranCell parameter minPwrRl. 
Note that this parameter has an effect on the DL power capacity of the cell so should only be used if really necessary. An external interference source may also be the cause of UL sync drops; refer to the section ‘Other Reasons for Drops’ for an example of this. It is recommended to resolve any SOHO and missing neighbour drop problems before attempting to resolve UL sync drops, as fixing the cause of those drops will often resolve UL synchronisation drops too.

Example: FACTS Report showing a high DCR Speech on cell 4412A1 with the majority of dropped calls due to loss of UL synchronisation as shown by the counter pmNoSysRelSpeechUlSynch (1st plot). It was discovered that the IRAT HOs towards the 2G network had stopped functioning as shown in the 2nd plot below. After further investigation it was discovered that the definition of the 3G external cell on the 2G BSC/MSC had somehow been deleted leading to no 3G->2G handovers. After the 3G external cell definition was re-created in the 2G network the IRAT HOs were restored and the corresponding improvement to the DCR is clearly visible in the FACTS Report.

Example: FACTS Report showing the reduction in the number of UL synch drops and the corresponding improvement to the DCR Speech of cell 2440A1 when usedFreqThresh2dRscp was changed from -106dBm to -104dBm, minPwrRl was changed from -150 (-15 dB) to -140 (-14 dB) and two additional GSM neighbour relations were added. This cell was on the border of the 3G coverage area.

Release due to Congestion Resolve Action

24. When Congestion Control detects downlink cell congestion, it limits admission to the congested cell and also initiates congestion resolve actions in the cell. Initially non-guaranteed services are targeted which typically results in R99 packet interactive connections being channel switched to use the RACH/FACH common channels with the counter pmNoOfSwDownNgCong being incremented. 
If downlink cell congestion persists then RABs belonging to the other service classes (guaranteed and guaranteed-hs) may also be released with the following counters being incremented accordingly: pmNoOfTermSpeechCong, pmNoOfTermCsCong and pmNoOfTermHsCong. (Note that at the time of writing not all of these counters were available in FACTS). When such system releases are detected then the solution is to solve the congestion problems as described in the Accessibility section of this document. Normally the congestion should be detected and resolved through Accessibility monitoring before the DCR is significantly affected due to congestion resolve actions. Refer to the ‘Capacity Management’ document in the WCDMA RAN ALEX library for more information on the Congestion Control functionality.

Example: At the time of writing only the following counters related to Congestion Resolve Actions were available in FACTS. Note that in this example only the counter pmNoOfSwDownNgAdm has been incrementing. This counter shows the number of channel rate downswitches (PS64/384->PS64/128 or PS64/128->PS64/64) triggered by Admission Control, whereas the counter pmNoOfSwDownNgCong shows the number of channel type downswitches (PS64/XX->RACH/FACH) triggered by Congestion Control.

Other Reasons for Drops

25. If none of the above reasons for a poor DCR can be established, then it is likely to be a more complicated problem to resolve; often relating to a software/hardware fault, or perhaps an external source of interference in the area. At the time of writing, the 3G technology is not as mature as the current 2G system (as would be expected) and hence there are still numerous improvements being implemented in every software release, along with the continued development of new, more efficient and optimised hardware generations for the various 3G nodes. The examples below illustrate such problems with two difficult scenarios encountered. 

Example: FACTS Reports showing all three sectors of site U1619 suddenly experiencing a high DCR of around 80%. The counters give no indication as to the reason for the drops i.e. they are not SOHO, missing neighbour or UL synch drops. After restarting the RBS, the DCR returns to “normal” on all three sectors; as can be seen in the plots below where the RBS was restarted at around 18:00 on 29 Aug. This type of situation appeared randomly on sites throughout the network and turned out to be a software fault in the software release implemented on the RBSs at the time.

Example: FACTS Reports showing poor DCR performance with almost all drops being due to loss of UL synchronisation (1st plot). From the counters it is evident that the RRC connection setup success rate is lower than it should be (pmTotNoRrcConnectReq versus pmTotNoRrcConnectReqSuccess in 2nd plot); however, from drive tests it was only possible to establish a call if the UE was less than 100m from the site and with line of sight to the antenna. After checking all configurations and equipment no cause for the problem could be established. It was noted from the drive tests that when a call was established the UE TX power was constantly at maximum (or very close to it). Also, from the RBS counter pmAverageRSSI (not available in FACTS at the time of writing) it was established that the uplink RSSI was constantly greater than -80dBm. These two facts indicate the presence of high UL interference. A spectrum analyser was used to establish that there was another signal in the area being transmitted on the same frequency and, as it turned out, Transtel had a microwave link using this frequency that went straight through the site (3rd plot).

INTEGRITY

26. Integrity is defined as the ability of a user to receive the requested service at the desired quality. Typically the service quality is measured in terms of the transport block error rate (BLER) for the RAB. 
The BLER measurements obtainable are fairly limited but may be used as a benchmark of the service quality in the network. The general approach to improving Integrity is through network optimisation as described in the Retainability section of this document; improvements in Retainability KPIs should lead to improvements in Integrity KPIs too. UL Block Error Rate 27. From RNC counters, it is possible to obtain the UL BLER, after macrodiversity combining, per UeRc in each RNC. This effectively allows the BLER on each RNC to be monitored for each RB type, providing an indication of the service quality for each RB type. Refer to the ‘System Performance Statistics’ document in the ALEX WCDMA RAN library for a mapping of UeRc number to RB. However, at the time of writing these statistics were not available in FACTS. Drive Test Based Service Quality Measurements 28. From TEMS Investigation 7.1 onwards, the speech and video streaming service quality can be measured through two new KPIs: the WCDMA Speech Quality Index (SQI) and the Video Streaming Quality Index (VSQI). These two KPIs may be used to benchmark the service quality for speech and video streaming from drive test data. In addition to the above the DL BLER may also be obtained for R99 & HSDPA PS Interactive Throughput (RNC Level) 29. The throughput obtained on the packet interactive RABs (HSDPA and R99) is a good indication of the Integrity offered by these services. Counters are available at an RNC level to obtain the PS Interactive Average Throughput for R99 (DCH/FACH) and HSDPA, as well as the retransmission rate for these two services. Example: FACTS Report showing the weekly throughput and retransmission rate for HSDPA and R99 (DCH/FACH) PS Interactive RABs on RBRNC1. HSDPA (MAC-HS) Throughput (Cell Level) 30. The HSDPA service is marketed as a broadband mobile data service with superior performance to standard 3G (or R99) mobile packet data.
Of course this means that it is important to ensure that an HSDPA user actually experiences superior performance to an R99 3G user. While the HS Packet Interactive RAB is capable of much higher throughputs than the R99 Packet Interactive RAB, one has to bear in mind that HSDPA uses a shared channel (HS-DSCH), whereas in R99 the channel is dedicated (DCH) to the user. The total MAC-HS cell throughput before retransmissions (HSCELLDATARATE) and after retransmissions (HSCELLTHROUGHPUT), as well as the average MAC-HS throughput per user before retransmissions (HSUSERDATARATE) and after retransmissions (HSUSERTHROUGHPUT), are available from RBS counters and implemented in FACTS. Note that this reflects the throughput over the radio interface and generally does not reflect problems in the transport network (see “Considerations for HSDPA: Iub Bandwidth”). As would be expected, there are relationships between the CQI value reported by the UE, the retransmission rate, and the cell throughput. Low CQI values reflect a poor radio environment and hence a lower throughput with a higher retransmission rate. Bear in mind that this may be due to a single user in a poor coverage area, whereas the cell does provide good throughputs in other areas. In such a situation some antenna adjustments may improve performance; otherwise an additional site may be required to fill the coverage gap. Note too that for reliable throughput statistics a reasonable data volume is required.
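The data volume caveat above can be applied mechanically before throughput figures are interpreted. The following is a minimal Python sketch, assuming each 15 minute reporting period has been exported as a pair of (data volume in KByte, MAC-HS throughput in kbps). The function name and the 500 KByte cut-off are illustrative assumptions, not FACTS settings; the cut-off should be tuned per network.

```python
from statistics import mean

# Assumed minimum data volume per 15 minute reporting period (KByte).
MIN_VOLUME_KBYTE = 500


def reliable_mean_throughput(periods, min_volume_kbyte=MIN_VOLUME_KBYTE):
    """Average throughput (kbps) over periods that carried enough data.

    periods: iterable of (volume_kbyte, throughput_kbps) tuples.
    Returns None when no period meets the volume threshold.
    """
    kept = [tput for volume, tput in periods if volume >= min_volume_kbyte]
    return mean(kept) if kept else None


# Hypothetical sample data: two busy periods and two near-idle periods.
samples = [(1200, 1400.0), (300, 450.0), (900, 1250.0), (80, 200.0)]
print(reliable_mean_throughput(samples))  # 1325.0
```

Periods below the cut-off are excluded rather than averaged in, so a handful of near-idle intervals does not drag down the reported cell throughput.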
Example: FACTS Reports showing good MAC-HS Throughputs well above 1Mbps for the cell although dropping below 500kbps when the data volume drops below 500KByte per 15 minute interval (1st plot); good average reported CQI values mostly above 20 with the average used CQI value noticeably higher, showing that the MAC-HS scheduler is scheduling the users in the better instantaneous radio environment (2nd plot); retransmission ratio and NACK ratio mostly below 0.1 (or 10%) correlating to the good radio environment reflected by the CQI values (3rd plot); a noticeable split between the cell throughput and user throughput as the average number of UEs with data buffered in the RBS increases (4th plot). Example: FACTS Reports showing poor MAC-HS Throughputs below 300kbps (1st plot); corresponding poor average reported CQI values of around 8 with the average used CQI below the average reported CQI, possibly due to the poor ACK/NACK detection success rate and the fact that there is only one user in the cell (2nd plot); very poor retransmission ratio (above 0.5) and NACK ratio (above 0.4) reflecting the poor radio environment reported by the CQI values (3rd plot). OTHER Considerations for HSDPA: DL Channelisation Codes 31. Typically when HSDPA is launched in a network it is configured to share the same carrier as the R99 traffic. In software release P4 this normally means that five HS-PDSCHs at spreading factor 16 are reserved for the HS-DSCH channel and an HS-SCCH at spreading factor 128 is required too; i.e. 5/16 + 1/128, or about 32%, of the available downlink channelisation codes are reserved for HSDPA. When R99 and HSDPA share a carrier with this configuration, it is typical for downlink channelisation code congestion to be the limiting factor of the cell's capacity. The P5 software release introduces a feature called HSDPA Dynamic Code Allocation making it possible for the HSDPA Scheduler to only use the channelisation codes available after R99 usage.
This may help with downlink channelisation code congestion, but at the expense of reduced throughput for HSDPA users. It is still possible to reserve a limited set of codes for HSDPA. For more about downlink channelisation code congestion refer to section ‘Admission Control: Downlink Channelisation Codes’. To solve downlink channelisation code congestion in the long term the introduction of a second carrier frequency on the site is required. Typically the original carrier is configured to carry the R99 traffic with all HSDPA traffic being directed towards the newly introduced second carrier. Considerations for HSDPA: Iub Bandwidth 32. With the high throughputs that are achievable with HSDPA, the capacity of the Iub interface between RBS and RNC becomes an important factor affecting HSDPA performance and customer perception. The throughput on the radio interface may be monitored as described in the Integrity section ‘HSDPA (MAC-HS) Throughput’; however, even if the radio interface provides good throughputs the application layer may still experience sub-standard throughputs if the Iub interface is congested. Typically the R99 traffic is configured to use AAL2 QoS class A and B, whereas the HSDPA traffic uses AAL2 QoS class C; i.e. across the Iub interface the R99 traffic has priority over the HSDPA traffic. As the R99 traffic on the site increases, so the Iub bandwidth available for HSDPA decreases. The counter pmCapAllocIubHsLimitingRatio (in units of 0.1%) provides an indication of the percentage of time for which the Iub interface limits the HSDPA throughput. When a high pmCapAllocIubHsLimitingRatio is experienced in conjunction with lost HSDPA data frames (HSDATAFRAMESLOSTRATIO formula) then the Iub capacity is causing serious performance degradation.
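As a rough illustration of the check described above, the Python sketch below converts the Iub limiting ratio counter from units of 0.1% to percent and flags a reporting period only when both the limiting ratio and the lost HS data frame ratio are high. The function names and both thresholds (10% and 1%) are hypothetical values chosen for the example; they do not come from FACTS or the ALEX libraries.

```python
def limiting_ratio_percent(counter_value):
    """Convert a counter reported in units of 0.1 % into percent."""
    return counter_value / 10.0


def iub_limits_hsdpa(counter_value, lost_frame_ratio,
                     max_limiting_pct=10.0, max_lost_ratio=0.01):
    """True when both Iub indicators exceed their (assumed) thresholds.

    counter_value:    limiting ratio counter, in units of 0.1 %.
    lost_frame_ratio: lost HS data frames as a fraction (0.0 to 1.0).
    """
    return (limiting_ratio_percent(counter_value) > max_limiting_pct
            and lost_frame_ratio > max_lost_ratio)


print(limiting_ratio_percent(250))   # 25.0
print(iub_limits_hsdpa(250, 0.03))   # True
print(iub_limits_hsdpa(250, 0.0))    # False
```

Requiring both conditions mirrors the text: a high limiting ratio on its own shows that the Iub is shaping HSDPA traffic, while lost frames on top of it point to serious degradation.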
To alleviate the problem some traffic may be offloaded to GSM (see ‘Offloading Traffic’ sections), but ultimately additional Iub bandwidth is required unless an additional site, perhaps an in-building site at a shopping centre, may be commissioned to carry part of the load. Example: FACTS Reports showing the improvement to pmCapAllocIubHsLimitingRatio and the disappearance of lost HS data frames when the Iub interface was upgraded from one E1 to two E1s. In the second figure, notice that there is no obvious improvement to the HSDPA Throughput as these counters measure the throughput across the radio interface (at the MAC-HS layer) whereas the Iub was congested. Considerations for HSDPA: Throughputs Above 3.6Mbps (Cat 7-10 UEs) 33. To support HSDPA throughputs above 3.6Mbps (which is possible from the P5 software release through support of more than five HSDPA codes per cell) it is likely that many cells will require the introduction of a second carrier frequency in order to prevent downlink channelisation code congestion. Also, the Iub bandwidth will need to be upgraded to make such throughputs possible and the existing DL channel element capacity (TXB/HS-TXB combinations) will need to be reviewed in order to cater for more than five HSDPA codes per cell. Considerations for HSDPA: Channel Element Usage 34. Another consideration with the introduction of HSDPA is the channel element capacity required in the RBS to support the HS downlink channels and the A-DCH in the uplink. Because HSDPA is broadband in the downlink only, the tendency is to provide sufficient channel element capacity in the downlink without properly considering the uplink channel element requirements for the A-DCH. This is especially evident when the optional UL384 RAB (SF 4) is used as this RAB requires 16 channel elements, as opposed to the standard UL64 RAB (SF 16) that requires 4 channel elements.
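The effect of the two uplink channel element costs quoted above can be seen with some simple arithmetic. The Python sketch below uses the per-RAB costs from the text (4 for UL64, 16 for UL384); the pool size of 128 channel elements and the function names are hypothetical, and the calculation ignores channel elements consumed by speech, signalling and other traffic.

```python
# Uplink channel elements per RAB, as quoted in the text.
UL_CE_COST = {"UL64": 4, "UL384": 16}


def max_users(ul_ce_pool, rab):
    """Upper bound on simultaneous users of one RAB type in a CE pool."""
    return ul_ce_pool // UL_CE_COST[rab]


def ce_needed(users_by_rab):
    """Total uplink channel elements needed for a mix of RABs."""
    return sum(UL_CE_COST[rab] * count for rab, count in users_by_rab.items())


pool = 128  # assumed uplink channel element capacity of the RBS
print(max_users(pool, "UL64"))               # 32
print(max_users(pool, "UL384"))              # 8
print(ce_needed({"UL64": 10, "UL384": 3}))   # 88
```

Under these assumptions the same pool supports four times as many UL64 users as UL384 users, which is why uplink capacity can be exhausted long before the downlink.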
The number of SF 4 radio links allowed in a cell may be limited to reduce the channel element usage of this RAB; however, values that are too low lead to poor packet interactive SOHO performance: a UL384 user who is in a SOHO area, with one cell that already has its maximum allocation of SF 4 radio links, will continuously send Event 1a messages requesting the addition of this cell to the active set, but this will continuously fail, resulting in poor SOHO statistics. Example: FACTS Reports showing how the SF 4 UL setup failures (ULSETUPFAILURESSF4) increased rapidly after the introduction of HSDPA, and then disappeared when the UL384 RAB (SF 4) was deactivated throughout most of the network (by changing parameter sf4AdmUl from 2 to 0). The improvement was a combination of resolving the SOHO problem described above and the fact that many RBSs did not have sufficient channel element capacity. In the second plot notice how the Packet SOHO Success Rate was declining and then improved after the deactivation of the UL384 RAB in 2006 Week 37. Offloading Traffic: Directed Retry to GSM 35. When a UTRAN cell is congested, for whatever reason, one possibility to provide temporary (and often only partial) relief is to enable Directed Retry to GSM on that cell. In the P4 software release this is the best option; however, for the P5 software release refer to the section “Offloading Traffic: Service Based Handover” below. Only speech calls may be redirected to the GSM network and only one directed retry target GSM cell may be specified (typically the co-sited cell). It is a blind redirection so it is important that the target GSM cell has the same (or better) coverage area as the UTRAN cell. It is also very important to ensure that the GSM cell has the capacity to handle the additional speech traffic.
The redirection is triggered by a transmitted DL power threshold, which is not ideal for situations where DL power is not the cause of congestion; however, a suitable DL power threshold may be established for other forms of cell congestion. Example: Two slides showing an example calculation to estimate the power threshold to use, i.e. the setting for parameter loadSharingGsmThreshold. Example: FACTS Reports showing how the number of directed retries to GSM increased as the threshold specified by loadSharingGsmThreshold was changed to 15%, 14%, 13% and finally kept at 12% (1st plot); the corresponding improvement to CSSR Speech and the decline in pmNoReqDeniedAdm (Admission Control was rejecting due to a lack of DL channelisation codes) (2nd plot); the corresponding improvement to CSSR HS (3rd plot). Offloading Traffic: Service Based Handover (P5+) 36. With the introduction of the P5 software release comes an optional feature called Service Based Handover. Similar to the Directed Retry to GSM feature, Service Based Handover allows speech calls to be redirected to GSM, but unlike Directed Retry it is not a blind handover at call setup with only one GSM target cell per UTRAN cell. Instead, the speech call is established in the 3G cell and the RNC then instructs the UE to enter compressed mode and to start measuring the defined GSM neighbours. An IRAT HO to the best GSM cell candidate that fulfils the GSM threshold criteria (gsmThresh3a) is then performed (unless a timer times out). This feature can also be activated per subscriber via a service indicator.
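The target selection step described for Service Based Handover can be pictured with a small Python sketch. It assumes, purely for illustration, that the criterion is a measured GSM signal level above a single threshold and that the strongest qualifying neighbour is chosen; the actual RNC evaluation of gsmThresh3a, the timer handling and the measurement reporting are not modelled. The function name, cell names and values are hypothetical.

```python
def best_gsm_candidate(measurements_dbm, threshold_dbm):
    """Return the strongest GSM neighbour above the threshold, else None.

    measurements_dbm: mapping of GSM cell name to measured level in dBm.
    threshold_dbm:    assumed stand-in for the GSM threshold criterion.
    """
    eligible = {cell: level for cell, level in measurements_dbm.items()
                if level > threshold_dbm}
    if not eligible:
        return None  # no handover; the call stays on the 3G cell
    return max(eligible, key=eligible.get)


neighbours = {"GSM_A": -95, "GSM_B": -82, "GSM_C": -88}
print(best_gsm_candidate(neighbours, -90))  # GSM_B
print(best_gsm_candidate(neighbours, -80))  # None
```

This also highlights the contrast with Directed Retry, where a single preconfigured target is used without any measurement.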
DEFINITIONS
A-DCH      Associated DCH
AAL2       ATM Adaptation Layer type 2
ASE        Air Speech Equivalent
ATM        Asynchronous Transfer Mode
BLER       Block Error Rate
CAC        Call Admission Control
CE         Channel Element
CN         Core Network
CSSR       Call Setup Success Rate
DCH        Dedicated Channel (Transport Channel)
DCR        Drop Call Rate
DPCH       Dedicated Physical Channel (Physical Channel)
DL         Downlink
GPEH       General Performance Event Handler
HS-DSCH    High-Speed Downlink Shared Channel (Transport Channel)
HS-PDSCH   High-Speed Physical Downlink Shared Channel (Physical Channel)
HS-SCCH    High-Speed Shared Control Channel (Physical Channel)
HSDPA      High-Speed Downlink Packet Access
IFHO       Inter-Frequency Handover
IRAT       Inter-Radio Access Technology
Iu         Interface between an RNC and a CN
Iub        Interface between an RNC and an RBS
Iur        Interface between two RNCs
MCPA       Multi-Channel Power Amplifier (board in RBS 3202)
MGW        Media Gateway
Mub        Management (O&M) Interface towards RBS from OSS-RC
QoS        Quality of Service
RAB        Radio Access Bearer
RANAG      Radio Access Network Aggregator
RAXB       Random Access and Receiver Board
RBS        Radio Base Station
RL         Radio Link
RLS        Radio Link Set
RNC        Radio Network Controller
RRC        Radio Resource Control
RU         Radio Unit (board in RBS 3206)
RXI        Product instance of RANAG
SC         Scrambling Code
SF         Spreading Factor
SOHO       Soft(er) Handover
SRNC       Serving-RNC
TXB        Transmitter Board
UeRc       UE Radio Connection Configuration
UL         Uplink