Ignite Webcast - Office 365 Service Management

Region     AUG     SEP     OCT     AUG     SEP     OCT     AUG     SEP     OCT     AUG     SEP     OCT
Americas   99.99%  99.99%  99.99%  99.95%  99.97%  99.98%  99.99%  99.99%  99.99%  99.99%  99.95%  99.92%
EMEA       99.99%  99.99%  99.99%  99.95%  99.97%  99.98%  99.99%  99.99%  99.99%  99.99%  99.95%  99.92%
APAC       99.99%  99.99%  99.99%  99.95%  99.97%  99.98%  99.99%  99.99%  99.99%  99.99%  99.95%  99.99%
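To put the percentages above in perspective, the sketch below converts a monthly availability figure into the downtime it implies. It assumes availability is simple time-based uptime over a calendar month, which the slide does not state; the function name and parameters are illustrative only.

```python
def downtime_minutes(availability_pct: float, days_in_month: int) -> float:
    """Minutes of downtime implied by an availability percentage.

    Assumption: availability = uptime / total time over one calendar month.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# August and October have 31 days, September has 30.
for pct in (99.99, 99.95, 99.92):
    print(f"{pct}% over 31 days -> {downtime_minutes(pct, 31):.1f} min")
```

Under that assumption, the gap between 99.99% and 99.92% in a 31-day month is roughly four and a half minutes versus a little over half an hour.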
Implementing Resilience
Online and offline functionality
• Monitoring system attempts automated recovery actions and alerts 24x7 on-call engineer
• On-call engineers are core product group members in the relevant areas
Active load balancing
• Multiple levels of hardware and network
• Facilities and power redundancy with at least 2 datacenters per region
Detailed logging and tracing
• Recovery across “failure domains” regularly tested
• Service component isolation
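The "automated recovery, then alert the on-call engineer" pattern described above can be sketched roughly as follows. This is not Microsoft's monitoring system: the function, the action names, and the choice to page the engineer with the outcome in every case are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Tuple


def attempt_recovery(
    actions: Iterable[Tuple[str, Callable[[], None]]],
    is_healthy: Callable[[], bool],
    page_on_call: Callable[[str], None],
) -> bool:
    """Try each recovery action in order, then alert the on-call engineer.

    Hypothetical sketch: the engineer is always paged with the outcome,
    whether or not an automated action restored health.
    """
    recovered_by = None
    for name, action in actions:
        try:
            action()
        except Exception:
            continue  # this action failed; try the next one
        if is_healthy():
            recovered_by = name
            break

    if recovered_by is not None:
        page_on_call(f"Anomaly auto-recovered by '{recovered_by}'; please review.")
    else:
        page_on_call("Automated recovery failed; manual action required.")
    return recovered_by is not None
```

Passing the health check and the paging function as callables keeps the sketch independent of any particular monitoring or alerting product.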
Primary Channels
Additional Channels
Incident Status
Status: Description
Investigating: Monitors have indicated a service anomaly and/or Microsoft has received reports of a potential service incident. Microsoft is currently investigating.
Service Interruption: Microsoft has confirmed that normal services are being impacted. Microsoft is taking immediate action to understand the cause of the failure and determine the best course of action to restore service.
Service Degradation: Services are still active, but service responsiveness and/or delivery times may be slower than usual. Microsoft is working to restore normal service responsiveness.
Restoring Service: Microsoft has isolated the likely cause of the incident and is in the process of restoring service.
Extended Recovery: Services are restored but may be slower than usual.
Service Restored: Normal system services have been restored.
False Positive: The service is healthy and a service incident did not actually occur.
Additional Information: There is additional information provided.
Normal Service: The service is healthy.
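For admins who script against these status values, the list above maps naturally onto an enumeration. The sketch below is illustrative only: the status names come from the table, but the grouping into "ongoing" statuses is an assumption, not something the slide defines.

```python
from enum import Enum


class IncidentStatus(Enum):
    INVESTIGATING = "Investigating"
    SERVICE_INTERRUPTION = "Service Interruption"
    SERVICE_DEGRADATION = "Service Degradation"
    RESTORING_SERVICE = "Restoring Service"
    EXTENDED_RECOVERY = "Extended Recovery"
    SERVICE_RESTORED = "Service Restored"
    FALSE_POSITIVE = "False Positive"
    ADDITIONAL_INFORMATION = "Additional Information"
    NORMAL_SERVICE = "Normal Service"


# Assumed grouping: statuses in which users may still be affected.
ONGOING = frozenset({
    IncidentStatus.INVESTIGATING,
    IncidentStatus.SERVICE_INTERRUPTION,
    IncidentStatus.SERVICE_DEGRADATION,
    IncidentStatus.RESTORING_SERVICE,
    IncidentStatus.EXTENDED_RECOVERY,
})


def is_ongoing(status: IncidentStatus) -> bool:
    """True if the status suggests the incident may still affect users."""
    return status in ONGOING
```

A status string read from the dashboard can be parsed with `IncidentStatus("Service Degradation")`, which raises `ValueError` for any name not in the table.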
Roles and Responsibilities
Type: Planned Maintenance Update
Description:
• Notification at least 5 business days prior to planned service maintenance.
• Notification includes start and end time.
Channel:
• Service Health Dashboard
• RSS Admin Feed (for subscribed admins)
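The 5-business-day notice window can be worked out with simple date arithmetic. The sketch below assumes business days are Monday through Friday and ignores public holidays; neither detail is specified in the slide, and the function name is hypothetical.

```python
from datetime import date, timedelta


def latest_notice_date(maintenance_start: date, business_days: int = 5) -> date:
    """Latest date on which notice can go out and still give the full window.

    Assumption: business days are Mon-Fri; holidays are not considered.
    """
    current = maintenance_start
    remaining = business_days
    while remaining > 0:
        current -= timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current


# Maintenance starting Monday 2024-01-15 needs notice by Monday 2024-01-08.
print(latest_notice_date(date(2024, 1, 15)))
```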
Post-Incident Reviews (PIRs)
• Are published for Service Availability issues that span multiple customers
• Available within 5 business days
• Accessible via the Service Health Dashboard
A PIR includes:
• Incident Information
• Summary
• Customer Impact
• Incident Start Date and Time
• Root Cause
• Next Steps
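As an illustration of the PIR contents listed above, the six sections can be modelled as a small record with a completeness check. The class and field names below are hypothetical; the deck describes the sections but not any data format.

```python
from dataclasses import dataclass, fields


@dataclass
class PostIncidentReview:
    # One field per section named in the list; all names are illustrative.
    incident_information: str = ""
    summary: str = ""
    customer_impact: str = ""
    incident_start: str = ""  # incident start date and time, as text
    root_cause: str = ""
    next_steps: str = ""


def missing_sections(pir: PostIncidentReview) -> list[str]:
    """Return the names of sections that are still empty."""
    return [f.name for f in fields(pir) if not getattr(pir, f.name).strip()]


draft = PostIncidentReview(summary="Mail flow delayed", root_cause="Pending")
print(missing_sections(draft))
```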
More detailed information about service updates
Transparency about non-customer-impacting service hygiene
Evaluate peak usage times and, if needed, schedule service maintenance during non-peak times
PIRs downloadable in document format from the SHD
30-day view for PIRs
Office 365 Community
http://community.office365.com/en-us/preview/tools/troubleshooting.aspx
http://community.office365.com/en-us/preview/wikis/diagnostic_tools/2146.aspx#smallbusinesses
http://community.office365.com/en-us/preview/wikis/diagnostic_tools/2146.aspx#enterprises
https://outlook.com/owa
https://<domain>.sharepoint.com/<pagename>.aspx
https://<domain>.sharepoint.com/personal/<UserAlias>_<domain>/Documents/Forms/All.aspx
For Limited Set of Service Incidents
Explanation of Incident
Localized Content
Migration
A transfer of computer data from one system to another
• Moving from on-premises to Office 365
• Moving from a legacy platform to Office 365
Transition
A process in which something undergoes a change and passes from one state, stage, form, or activity to another
• Moving from BPOS service to the Office 365 service
• Moving from Live@Edu to the Office 365 service
Upgrade
The act or an instance of bringing something up to date
• Enabling new features for existing customers
https://spsites.microsoft.com/sites/bosm/boswiki/Pages/Update-to-Generic-MX-Records.aspx
http://support.microsoft.com/kb/2808208/EN-US
http://technet.microsoft.com/en-us/office/ee748587.aspx