ISS CRITICAL INCIDENT REPORT

INCIDENT DETAILS
Brief description: No access to staff email, external sites and WHD on Thursday 12th Sept 15:00 to 15:30; Friday 13th Sept 19:00 to 23:00; and Saturday 20th Sept 04:30 to 21:00
Start date and time: Thursday 12th Sept 15:00; Friday 13th Sept 19:00; Saturday 20th Sept 04:30
End date and time: Thursday 12th Sept 15:30; Friday 13th Sept 23:00; Saturday 20th Sept 21:00
Date of this report: 18 Oct 2013
Incident Number: CIR-1309-0001

INCIDENT SUMMARY
At the times specified above, connections to the internet from all staff desktops stopped. In parallel, user access to Exchange 2010 was lost and the service desk application was unresponsive.

INCIDENT ANALYSIS
How and when did ISS find out about the incident?
The issue became evident when email access was lost for everyone using Exchange 2010.

Which service(s) were affected?
Exchange 2010
Service desk application
Staff internet access

What locations were affected?
Campus wide

What was the user impact?
No internet access for any staff, and no email access for staff using Exchange 2010.

How was service restored?
The incident on Thursday 12th resolved itself without any user interaction. The incident on Friday 13th required a network configuration change to temporarily reroute all internet-bound traffic away from the proxy servers and load balancers. Traffic was then rerouted back through the load balancers, and a repeat of the incident on Saturday 20th provided time to find the root of the issue and effect a resolution.

If service restoration was a workaround, how and when will a permanent solution be effected?
N/A

What is needed to prevent a recurrence?
ACLs have been implemented to prevent a recurrence of this particular incident (an illustrative sketch of this kind of filtering logic is given in the appendix at the end of this report).

Are we currently at risk of recurrence or other service disruption following the incident? In what way? What needs to be done to mitigate the risk?
There is no increased risk of a recurrence of this type of incident over and above the normal risk that an incident of a similar nature can occur. The best mitigation is a robust endpoint patch management approach.

ROOT CAUSE
The root cause of this incident was a Trojan infection on a video conferencing (VC) device installed on the network. The VC unit was attempting a denial-of-service attack against a European web site by generating large numbers of small fragmented packets with a spoofed source address. These malformed packets travelled correctly through the network but caused a failure on the load balancers, which sit in the path of the client-to-internet connection. The load balancers dropped all traffic when hit by the large volume of fragmented traffic. The issue was resolved by removing the VC unit from the network and additionally upgrading the software version on the load balancers. A sketch of how a device generating such traffic can be traced appears in the appendix at the end of this report.

FOLLOW UP ACTIONS AND RECOMMENDATIONS
See Lessons Learned below.

LESSONS LEARNED
See attached document: Incident_WSA_lessons_learned.docx
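
APPENDIX: ILLUSTRATIVE SKETCHES

Tracing the offending device. The root cause analysis notes that the IP source addresses were spoofed, so the infected VC unit could not be identified from IP headers alone. The Python sketch below (using the Scapy library) shows one way such a device could be traced from a packet capture taken on the affected segment: because the spoofing happened at the IP layer, the layer-2 source MAC address still identifies the physical sender. The capture file name is a placeholder, and this is an illustrative sketch rather than the procedure actually used during the incident.

from collections import Counter
from scapy.all import Ether, IP, rdpcap

# "segment.pcap" is a placeholder; any capture taken on the affected
# VLAN would do. Because the IP source addresses were spoofed, the
# layer-2 source MAC is the reliable way to trace frames back to a
# physical device on the local segment.
packets = rdpcap("segment.pcap")

frag_sources = Counter()
for pkt in packets:
    if Ether in pkt and IP in pkt:
        ip = pkt[IP]
        # A frame belongs to a fragment train if the "more fragments"
        # flag is set or it carries a non-zero fragment offset.
        if ip.flags.MF or ip.frag > 0:
            frag_sources[pkt[Ether].src] += 1

# A single MAC emitting an outsized share of small fragments is the
# signature described in the root cause analysis.
for mac, count in frag_sources.most_common(10):
    print(f"{mac}: {count} fragmented packets")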
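
Modelling the preventive ACLs. The report states that ACLs were implemented to prevent a recurrence but does not detail them. The sketch below models, in Python, the kind of filtering logic such ACLs typically express: dropping traffic whose source address is not a campus prefix (anti-spoofing filtering in the spirit of BCP 38) and dropping the fragmented traffic that overwhelmed the load balancers. The campus prefix and the decision to drop all fragments are assumptions for illustration only, not the rules actually deployed.

import ipaddress

# Assumed campus prefix, standing in for the real address plan.
CAMPUS_NETS = [ipaddress.ip_network("10.0.0.0/8")]

def egress_permitted(src_ip: str, more_fragments: bool, frag_offset: int) -> bool:
    """Model of an anti-spoofing egress ACL applied at the campus edge.

    Deny traffic whose source address is not a campus prefix (spoofed),
    and deny fragments of the kind that overwhelmed the load balancers.
    """
    src = ipaddress.ip_address(src_ip)
    if not any(src in net for net in CAMPUS_NETS):
        return False  # non-campus source address: spoofed, drop
    if more_fragments or frag_offset > 0:
        return False  # fragment: drop before it reaches the load balancers
    return True

# A spoofed, fragmented packet like those the VC unit generated is denied.
assert egress_permitted("203.0.113.7", more_fragments=True, frag_offset=0) is False
# Ordinary campus-sourced, unfragmented traffic passes.
assert egress_permitted("10.20.30.40", more_fragments=False, frag_offset=0) is True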