Using Azure ExpressRoute with Microsoft 365
v1.0
Published: March 2021

© 2021 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. Some examples are for illustration only and are fictitious. No real association is intended or inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

Introduction
What is Azure ExpressRoute?
    Azure Private Peering for Virtual Networks
    Microsoft Peering for Office 365
Microsoft Policy for using ExpressRoute with Microsoft 365 and why it is often not the best option
    Requirements for inbound connectivity mean complex routing
    Direct and local Internet egress provides best performance
    ExpressRoute is not an availability solution and Internet access is required
    ExpressRoute is not a security solution
Where does using ExpressRoute with Microsoft 365 make sense?
Technical requirements for using ExpressRoute with Microsoft 365
    Multiple circuits per region
    Minimize network backhaul to the ExpressRoute circuit
    Use redundant Internet connections
    Produce detailed network flow diagrams to manage asymmetric routing
    Design each circuit with unique public NAT pools
    Provide public Autonomous System Number (ASN)
    Public DNS availability
Best Practices for Connecting to Microsoft 365
    Optimize Office 365 traffic
    Enable Local egress
    Enable Direct connectivity
    Modernize Security for SaaS
What does a modern, Internet-first, enterprise network look like?
Testing your enterprise connectivity
FAQ
    Does Microsoft recommend using ExpressRoute with Microsoft 365?
    How long does it typically take to implement ExpressRoute for use with Microsoft 365?
    Can ExpressRoute keep all my Microsoft 365 traffic off the internet and remove my need for an active internet connection?
    Does ExpressRoute allow Microsoft 365 usage when my Internet links are down due to DDoS attack?
    Do I need to provide public DNS to my users?
    Do I need public IP space to use ExpressRoute for Microsoft 365?
    Do I need a public ASN?
    Does ExpressRoute connect me directly into Microsoft’s datacenters?
    Is a single ExpressRoute circuit sufficient?
    Can I use ExpressRoute for inbound initiated connectivity from Microsoft 365?
    Does ExpressRoute provide me with additional security for Microsoft 365 traffic?
Next Steps if I think using ExpressRoute with Microsoft 365 is the right approach for my organization
Further Guidance

Introduction

Azure ExpressRoute is a Microsoft service which allows you to create private connectivity between your own networks and Microsoft’s backbone so that supported traffic can flow between the two networks privately. Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solutions running in Azure benefit from using ExpressRoute as it often addresses network architecture and performance concerns for these types of services. A detailed overview of ExpressRoute and its capabilities can be found here.

Unlike IaaS and PaaS, Software as a Service (SaaS), including Microsoft 365, is designed to be accessed securely and reliably via the Internet as its primary model, and it does not generally benefit from using ExpressRoute. As a result, ExpressRoute is not recommended for use with Microsoft 365 for most customers.

Using ExpressRoute with Microsoft 365 presents considerable technical challenges that must be overcome, and it carries a high risk of service outage if incorrectly configured and increases ongoing network complexity. In addition, ExpressRoute often creates an inefficient network model that is poorly suited for a globally distributed SaaS service like Microsoft 365. This inefficient network model often results in poorer performance, typically due to the use of backhauling to send traffic to a centralized ExpressRoute link.

For the majority of customers, using a direct Internet connection to Microsoft 365 provides the fastest, simplest, most extensible, and lowest cost connectivity model. Using local and direct Internet connectivity to Microsoft 365 is therefore strongly recommended for the majority of customers.
Using ExpressRoute with Microsoft 365 is blocked by default, and Microsoft has an authorization process in place for customers who wish to enable ExpressRoute for use with Microsoft 365. The authorization process is used to protect customers from service outages caused by misconfiguration or inadvertently enabling the Microsoft 365 routes, to allow Microsoft to provide relevant information, and to ensure the right Microsoft account staff are available before any investment in design or infrastructure. Our goal is to ensure that Microsoft 365 customers have all the information they need to deliver the best connectivity model for the enterprise, and to help them understand that ExpressRoute does not provide the best connectivity model for Microsoft 365 in almost all circumstances.

This document provides the detailed information required to understand the achievable outcomes from using ExpressRoute with Microsoft 365 and the steps required to successfully implement ExpressRoute. The goal is to allow you to assess whether your specific scenario is one of the few where ExpressRoute is the right connectivity model for Microsoft 365.

What is Azure ExpressRoute?

Azure ExpressRoute is a service offering from Microsoft that provides private network connectivity between your organization’s network and Microsoft’s backbone, facilitated by a connectivity provider. This configuration allows supported traffic to flow between the two networks, avoiding the use of the Internet for supported traffic.

ExpressRoute does not connect your network directly to a Microsoft datacenter, but instead connects it to Microsoft’s global backbone in the location of your choice. From there, the Microsoft backbone routes the traffic between your network and the destination within Microsoft’s infrastructure, as required.
Microsoft’s backbone network carries traffic peered via the Internet, so essentially ExpressRoute provides a private connection to and from the same network. ExpressRoute is an Azure service, and its circuits are built and configured within the Azure portal for a given subscription. As illustrated below, connectivity over these circuits falls into two categories: Azure Private Peering and Microsoft Peering.

Figure 1 - ExpressRoute Peering Types

Azure Private Peering for Virtual Networks

Azure compute services, namely virtual machines (IaaS) and cloud services (PaaS) that are deployed within a virtual network, can be connected through the private peering domain. The private peering domain can be considered a trusted extension of your core network into Azure. You can set up bi-directional connectivity between your core network and Azure virtual networks (VNets). This peering lets you connect to virtual machines and cloud services directly via their private IP addresses.

An example connection using this method would be a client on the organization network connecting to an application server running in a private VNet in Azure via its RFC 1918, non-publicly routable address, such as 10.1.1.2.

Enabling and using this model of peering is simple and low risk, as it is a straightforward extension of the existing corporate network to VNets in Azure, and it can be enabled and configured on demand by any ExpressRoute customer.

Microsoft Peering for Office 365

The second type of peering is called Microsoft Peering, which was previously split into two peering types: Public Peering and Microsoft Peering. Microsoft Peering allows connectivity to the public IP space in both Azure PaaS and Microsoft 365. Connectivity requires the use of NAT on the customer side of the circuit so that traffic sent to Microsoft has a publicly routable IP address.

There are two elements to Microsoft Peering: Azure routes and Microsoft 365 service routes.

1. Azure Routes

Like private peering, connectivity to public Azure IP ranges can be enabled and configured on demand by any ExpressRoute customer. The IP ranges are segmented into Border Gateway Protocol (BGP) communities, enabled by route filters in the Azure portal. For example, if you want to connect only to the Azure public IP space in the West US and West US 2 Azure regions, you can select the route filters that contain the IP ranges for those regions. This means a path to these IP addresses is available via ExpressRoute. Azure also operates a number of service-specific BGP communities, such as Azure Active Directory.

Figure 2 - Azure Region Route Filter Selection

2. Microsoft 365 Routes

Unlike Azure, Microsoft 365 does not operate regional IP space, so routes to Microsoft 365 services are grouped into BGP communities by service or service grouping rather than by region. Before you can add any route filters for Microsoft 365, you must first receive authorization from Microsoft. If you configure any of the following route filters without authorization, an error will occur:

• Exchange (12076:5010)
• SharePoint Online (12076:5020)
• Skype for Business (includes Teams) (12076:5030)
• Other Office 365 Services (12076:5100)

For example, if you want to configure your Exchange Online traffic to use ExpressRoute (once you’ve obtained authorization and done your planning), you select the Exchange Online BGP route filter on your ExpressRoute circuit, and Microsoft starts offering routes to Exchange Online via that circuit. You then decide how to handle these routes and where they are advertised into your internal network. Traffic sent to one of the Exchange Online IP addresses from Outlook or on-premises Exchange (in a hybrid environment) is then able to use the ExpressRoute circuit.

Note: CRM Online (12076:5040), which applies to Microsoft Dynamics v8.2 and below, does not need authorization to be enabled.
For versions higher than 8.2, select the regional Azure community containing your Dynamics deployment.

Figure 3 - Microsoft 365 Service - Route Filter Selection

ExpressRoute can also be used to provide inbound connectivity from Microsoft to supported on-premises elements such as Exchange Server in a hybrid configuration and Active Directory Federation Services (ADFS); however, this involves significant complexity and risk, as discussed in the next section.

Microsoft Policy for using ExpressRoute with Microsoft 365 and why it is often not the best option

Using ExpressRoute with Microsoft 365 is supported and in use by a number of Microsoft 365 customers. However, as described above, it requires approval from Microsoft for its use with Microsoft 365, which is not the case for Azure.

This policy exists because connectivity to Azure resources via ExpressRoute is generally quite simple for organizations to enable successfully. The majority of connections are outbound from the organization to Azure, and ExpressRoute simply becomes a BGP routing path to a defined set of IP addresses controlled by the customer. This makes the model low risk in terms of implementation and long-term management. In addition, these connections are typically point-to-point from an on-premises datacenter to an Azure datacenter; thus, network routing can be planned easily via ExpressRoute. For these reasons, ExpressRoute is the recommended connectivity model for most Azure resources.

For Microsoft 365, however, ExpressRoute is not the best model for most organizations for four primary reasons:

1. Requirements for inbound connectivity mean complex routing;
2. Direct and local Internet egress often provides better performance than backhauling traffic to a centralized ExpressRoute circuit;
3. ExpressRoute is not a network availability solution for Microsoft 365, and Internet access is still required for Microsoft 365 service operation; and
4. ExpressRoute is not a security solution, and there are no security benefits to using ExpressRoute with Microsoft 365.

Let’s examine these reasons in detail.

Requirements for inbound connectivity mean complex routing

Microsoft recommends that customers using ExpressRoute with Microsoft 365 keep inbound connectivity over the Internet to ensure that necessary endpoints are accessible, to minimize the changes required to the customer’s environment, and to simplify the routing model. The service is designed to operate over the Internet securely and you can read more about how Exchange does this in this article.

In an enterprise environment, Microsoft 365 connectivity often requires elements of inbound connectivity (traffic from Microsoft to the enterprise network). Examples of this include:

• SMTP services from Exchange Online tenant to an on-premises server, or mail sent from SharePoint Online to an on-premises server
• ADFS, during password validation for sign-in
• Exchange Server hybrid deployments
• SharePoint hybrid federated search results
• SharePoint business connectivity services hybrid solution
• Skype for Business hybrid connectivity and/or Skype for Business federation
• Skype for Business Cloud Connector Edition

The most common of these within a corporate environment are Exchange hybrid and SMTP-related traffic. These inbound connections are used by Microsoft 365 services to communicate with on-premises servers, but the same endpoints must often also be reachable by non-Microsoft resources, such as users connecting from outside of the corporate network. In addition, the SMTP protocol is used more broadly within Microsoft's network than the route prefixes shared over ExpressRoute, and advertising on-premises SMTP servers over ExpressRoute causes failures with these other services.
When introducing ExpressRoute into the environment, it is possible to move or expose these endpoints only to Microsoft via the ExpressRoute circuit by changing the public DNS record to point to an IP address only available via ExpressRoute. Unfortunately, this comes with the loss of essential connectivity from non-Microsoft endpoints. For example, the ADFS or Exchange Autodiscover endpoints may be necessary for users when authenticating or accessing their email from outside the organization’s network. Exposing these endpoints over ExpressRoute means that users can only access them from within the organization’s network. Moreover, cloud-based load balancing devices for managing inbound connectivity cannot work via ExpressRoute unless they are hosted within Azure, which severely restricts the available options.

There is an even more pressing issue, though: the high risk of outage, which is one of the reasons why authorization is required from Microsoft before ExpressRoute can be enabled for use with Microsoft 365.

Consider an enterprise network that has been successfully working with Microsoft 365 for many years, including using Exchange hybrid with its required endpoints advertised and available over the Internet. If an admin introduced Exchange Online routes via ExpressRoute without significant planning, it would likely introduce asymmetric routing and cause an outage of the service. Asymmetric routing in this example is when traffic from Microsoft 365 to on-premises Exchange servers arrives from the Internet and return traffic from the on-premises servers to Microsoft 365 travels via ExpressRoute.

The following diagram further outlines this scenario. Below, traffic from Exchange Online enters the on-premises network via the Internet but tries to return to Microsoft 365 via the ExpressRoute circuit.

Figure 4 - ExpressRoute Asymmetric Routing Issue

Asymmetric routing in itself is not an issue when the paths are fully traversable.
In fact, it is how many networks operate. Further, there are no stateful devices in Microsoft’s cloud services that prevent asymmetric routing from working in this scenario. However, in many enterprise environments, connectivity is controlled on stateful devices such as firewalls. If a device does not know about a TCP connection that arrived through another device, it will drop the outbound packets. When an admin enables the Exchange route filter in ExpressRoute, it modifies the designed flow of traffic to and from Microsoft 365, causes outbound packets to be discarded, and ultimately causes a service outage.

These problems can be resolved using solutions such as the source NAT or route segmentation approaches outlined below, but those solutions take time and add complexity to both network management and Office 365 design and administration.

• Using source NAT to hide the Microsoft IP from the internal network

This solution keeps the inbound flows symmetric over the Internet by applying source NAT to the inbound connections, replacing the originating Microsoft IP address with an internal IP address. This ensures that the response from the on-premises element does not pick up the BGP routes offered by ExpressRoute within the internal network, as the IP address used is not one from Microsoft. Thus, the traffic flows back out via the path it came in on (in this example, the Internet).

Figure 5 - Source NAT solution for Asymmetric Routing

• Route Segmentation

Another option is implementing route segmentation to prevent BGP routes from ExpressRoute being available on any network segment where traffic flows between Microsoft and on-premises. Using route segmentation prevents traffic from using the ExpressRoute circuit for the return path. Note that this model presents a higher risk than using NAT, because unexpected network changes or flows can remove the segmentation.
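The asymmetric routing failure and the source NAT remedy described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not a description of any real firewall product or of Microsoft's network: the addresses are placeholder values (the "Microsoft" address is taken from a documentation range), and the names `StatefulFirewall`, `egress_for`, and `simulate` are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical values for illustration only.
EXPRESSROUTE_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]  # stand-in for BGP routes learned via ExpressRoute
MICROSOFT_IP = "203.0.113.10"  # stand-in for an Exchange Online source address
ONPREM_SERVER = "10.1.1.2"     # on-premises Exchange server
NAT_ADDRESS = "10.255.0.1"     # internal address substituted by source NAT


@dataclass
class StatefulFirewall:
    """Allows a reply only if it saw the matching inbound connection."""
    name: str
    sessions: set = field(default_factory=set)

    def accept_inbound(self, src: str, dst: str) -> None:
        # Remember the connection so its reply can be matched later.
        self.sessions.add((src, dst))

    def allows_reply(self, src: str, dst: str) -> bool:
        # The reply reverses source and destination of the original flow.
        return (dst, src) in self.sessions


def egress_for(dst: str) -> str:
    """Choose the exit path: ExpressRoute if a learned prefix covers dst."""
    addr = ipaddress.ip_address(dst)
    if any(addr in net for net in EXPRESSROUTE_PREFIXES):
        return "expressroute"
    return "internet"


def simulate(source_nat: bool) -> bool:
    """Return True if the server's reply is delivered, False if it is dropped."""
    firewalls = {
        "internet": StatefulFirewall("internet-edge"),
        "expressroute": StatefulFirewall("expressroute-edge"),
    }
    # The inbound connection from Microsoft 365 arrives via the Internet edge.
    # With source NAT, the server sees an internal address instead.
    seen_src = NAT_ADDRESS if source_nat else MICROSOFT_IP
    firewalls["internet"].accept_inbound(seen_src, ONPREM_SERVER)
    # The server replies to the source address it saw; routing picks the exit.
    path = egress_for(seen_src)
    return firewalls[path].allows_reply(ONPREM_SERVER, seen_src)


print("reply delivered without source NAT:", simulate(source_nat=False))
print("reply delivered with source NAT:", simulate(source_nat=True))
```

Without source NAT, the reply is routed to the ExpressRoute edge, which has no record of the connection and drops it. With source NAT, the reply targets an internal address, so it returns through the Internet edge that holds the session state.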
Figure 6 - Route Segmentation solution for Asymmetric Routing

It is critical to understand that using ExpressRoute with Microsoft 365 adds significant complexity to the management of network traffic for Microsoft 365. Typically, network and architecture planning that incorporates ExpressRoute takes much longer to complete successfully. In contrast with Azure routing, it is not as easy as adding a route filter to the circuit.

Direct and local Internet egress provides best performance

ExpressRoute customers often implement a circuit at their main office location for a region, as this is where the majority of users and network traffic exists. Moreover, each circuit takes a certain level of investment and implementation work, often making it infeasible to deploy in more than one location per region. For example, a large multinational corporation with fifty global sites typically chooses to use ExpressRoute in three main locations, one per region: London (for EMEA), Singapore (for APAC), and New York (for North America).

As noted previously, SaaS services like Microsoft 365 don’t generally benefit from traditional point-to-point networking, where the client is connected to the perceived location of the data. For Microsoft 365, data and service entry points are not in a single place, and the active endpoint the client can use to connect to the service will very likely be local to the user, regardless of where their data at rest is held.

Customer Example: Contoso

Consider the scenario below. Contoso has an ExpressRoute circuit in New York at their USA headquarters. They also have several sites on the west coast of the US with local Internet egress.
If we compare the latency of direct Internet connectivity to Microsoft 365 from Los Angeles with that of using Contoso’s MPLS network to backhaul traffic to ExpressRoute in New York, we can see a significant reduction in latency when the direct Internet approach is used.

Connectivity from LA office to Microsoft 365 | Latency to Microsoft 365 front door | Microsoft 365 front door location used
Local Internet in LA | 5 ms | US West coast
ExpressRoute in NYC via Contoso MPLS | 80 ms | US East coast

Table 1 - Contoso Connectivity Details

Using local Internet egress enabled Contoso to eliminate a significant volume of traffic from their WAN, allowing Contoso to free up capacity for traffic that requires the MPLS route, or to reduce the size of their WAN circuits.

Providing cloud services local to users

Two important elements are in place that enable Microsoft to provide cloud services to Microsoft 365 customers around the world as if they were local to each user: Microsoft’s global network, and Microsoft 365 front door services.

a. Microsoft global network

Microsoft owns and operates one of the largest networks in the world. This global and sophisticated architecture connects our datacenters and customers, and it is part of Microsoft's multi-billion-dollar investments to deliver a global, dynamic, and resilient cloud infrastructure. Our network consists of very high bandwidth, low latency, failover-capable links with over 165,000 route miles of privately-owned fiber, with multi-terabit connections from each datacenter to each other, and to the Internet edge. At the time of writing, our network connects to over 4000 networks globally with more than 200 points of presence where we peer with ISPs and network providers. Our network is optimized to get your traffic to and from its destination as quickly as possible and covers all Microsoft traffic, including Azure, Microsoft 365, Xbox Live, Dynamics 365, Bing, and more.
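To give a rough sense of what the latency gap in the Contoso table means for a user, the sketch below multiplies each latency figure by the number of round trips needed to open a new secure connection and receive a first response. It is an illustration only and rests on assumptions not stated in this document: the table's latency figures are treated as round-trip times, and the round-trip counts (one for the TCP handshake, one for a TLS 1.3 handshake, one for the request) are typical values rather than measurements of Microsoft 365. The function name `first_response_ms` is hypothetical.

```python
# Assumed round-trip counts (illustrative, not taken from this document).
TCP_HANDSHAKE_RTTS = 1
TLS_HANDSHAKE_RTTS = 1  # TLS 1.3 full handshake; TLS 1.2 typically needs 2
REQUEST_RTTS = 1        # one request and the first byte of its response


def first_response_ms(rtt_ms: float, tls_rtts: int = TLS_HANDSHAKE_RTTS) -> float:
    """Estimate the time to a first response on a new connection."""
    return rtt_ms * (TCP_HANDSHAKE_RTTS + tls_rtts + REQUEST_RTTS)


local_egress = first_response_ms(5.0)  # local Internet egress in LA
backhauled = first_response_ms(80.0)   # backhaul to ExpressRoute in New York

print(f"local egress: {local_egress:.0f} ms")
print(f"backhauled:   {backhauled:.0f} ms")
print(f"added delay per new connection: {backhauled - local_egress:.0f} ms")
```

Under these assumptions, every new connection from the Los Angeles office pays the backhaul latency several times over, which is why the difference is noticeable to users even though each individual round trip is short.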
To understand where our network is relative to your users, the following table contains the list of public and private peer points (as of February 2021). We often have multiple peer points in each location. For the complete list, and to see which networks are present in each location, see PeeringDB (Microsoft's ASN/network ID is 8075).

Country: Locations
Argentina: Buenos Aires
Australia: Brisbane, Melbourne, Perth, Sydney
Austria: Vienna
Bahrain: Manama
Belgium: Brussels
Brazil: Rio de Janeiro, Sao Paulo
Bulgaria: Sofia
Canada: Montreal, Toronto, Vancouver
Chile: Santiago
Colombia: Bogotá
Croatia: Zagreb
Czech Republic: Prague
Denmark: Copenhagen
Egypt: Cairo
Finland: Helsinki
France: Marseille, Paris
Germany: Berlin, Dusseldorf, Frankfurt, Hamburg, Munich
Greece: Athens
Hong Kong SAR: Hong Kong
Hungary: Budapest
India: Chennai, Hyderabad, Mumbai, New Delhi
Indonesia: Jakarta
Ireland: Dublin
Israel: Tel Aviv
Italy: Rome, Milan, Turin
Japan: Osaka, Tokyo
Kenya: Nairobi
Malaysia: Cyberjaya, Johor Bahru, Kuala Lumpur
Mexico: Mexico City, Queretaro
Netherlands: Amsterdam
New Zealand: Auckland, Wellington
Nigeria: Lagos
Norway: Oslo, Stavanger
Philippines: Manila
Poland: Warsaw
Portugal: Lisbon
Romania: Bucharest
Russia: Moscow
Saudi Arabia: Jeddah
Singapore: Singapore
South Africa: Cape Town, Johannesburg
South Korea: Seoul
Spain: Barcelona, Madrid
Sweden: Stockholm
Switzerland: Geneva, Zurich
Taiwan: Taipei
Thailand: Bangkok
Turkey: Istanbul
UAE: Dubai, Fujairah
UK: London, Slough, Manchester
Ukraine: Kyiv
USA: Ashburn, Atlanta, Boston, Chicago, Dallas, Denver, Detroit, Honolulu, Houston, Jacksonville, Las Vegas, Los Angeles, Miami, Minneapolis, Nashville, New York, Newark, Palo Alto, Phoenix, Portland, Reston, San Antonio, San Diego, San Jose, Seattle
Vietnam: Ho Chi Minh City

Table 2 - Microsoft’s global Internet peering locations

This same network is used by ExpressRoute to reach destinations within Microsoft’s infrastructure. ExpressRoute circuits connect you to the edge of this network, avoiding the short leg over the public Internet for a subset of Microsoft 365 traffic.

Given the scale of this network, in most metropolitan areas of the globe, traffic should be on the Internet only until your network provider hands it off to Microsoft's network or to another network that will transmit it to Microsoft (we'll look at how you can assess that in the next section).

If egressed to the Internet locally to the user, Microsoft 365 traffic is not likely to be on the Internet for more than a few hops. In most cases, the traffic uses Microsoft's optimized network to backhaul to the location of your Office 365 data. By breaking out Microsoft 365 traffic to the Internet locally, organizations that use expensive MPLS circuits to backhaul traffic to a centralized egress can save money on private backhaul, free up headroom on the MPLS circuits for traffic that requires them, and deliver higher performance for Microsoft 365.

b. Local Service Front Doors

In addition to our global network that connects users to their data and services, Microsoft 365 also has distributed service entry points (referred to as front doors) for various services around the globe, in both datacenters and at the edge of the global network. This design delivers high performance to users regardless of where their tenant or data resides.

Service front doors operate differently for each workload, but all use the same principles from a connectivity perspective. The way in which these front doors operate may change over time, but Microsoft is constantly working to move these front doors closer and closer to users.
• Exchange Online uses local Client Access Front-End (CAFE) servers throughout the globe. When local egress occurs with local DNS resolution, clients use anycast DNS to find and connect to a CAFE server close to where the user is, regardless of where the user’s data resides. This provides low latency to the service front door. When required, data is backhauled between the CAFE server and the user’s mailbox server over Microsoft's managed network.
• SharePoint & OneDrive work differently from Exchange Online. They use an anycast IP method to find the nearest service front door, which normally resides at the edge of Microsoft’s global network. SharePoint and OneDrive traffic is connected through a distributed service front door where connectivity is performed locally and then optimized back to wherever the data is stored. This design means that users experience low latency to the service front door in most scenarios.
• Teams Media traffic has a complex set of connectivity possibilities depending on the scenario; however, in many scenarios it can relay calls through a local transport relay server, regardless of where the tenant resides.

Customer Example: Contoso

Consider another example regarding Contoso’s office in Sydney, Australia. Contoso implemented a network model that uses ExpressRoute in Singapore to egress all Office traffic in the region to Microsoft, with their MPLS circuits backhauling traffic from Sydney to Singapore. Teams calls, Outlook connectivity, and SharePoint/OneDrive traffic are all routed to a service front door near Singapore via Contoso’s expensive (and perhaps bandwidth-constrained) MPLS circuit. This design has around 100 ms of latency to reach the egress from Sydney, then a few more milliseconds to reach the service front doors.
With the front door at least 100 ms away and a longer routing path, jitter, packet loss, and the loss of key communication elements can occur, resulting in performance levels for Microsoft 365 below what Contoso and its users desire.

By following the network connectivity principles and egressing, at the very least, Optimize-marked endpoints locally in Sydney, latency to the service front door drops dramatically to around 5-10 ms. This results in a significant improvement in performance while also reducing the load on Contoso’s WAN and saving money when the MPLS circuit no longer needs the same level of bandwidth.

You can see a demonstration of the remarkable impact on Office 365 performance by delivering an optimal network model in this recorded Ignite session.

ExpressRoute is not an availability solution and Internet access is required

Some customers want to use ExpressRoute for their Microsoft 365 traffic to completely remove the reliance on public Internet connections for using the service. ExpressRoute for Microsoft 365 is not an availability solution, and reliable Internet connectivity is a critical dependency for using Microsoft 365.

Some Microsoft 365 endpoints are not reachable via ExpressRoute, and since Microsoft 365 over ExpressRoute is not a private network connectivity solution (that is, connections are to the same Internet-reachable public endpoints), clients must resolve endpoint names via Internet DNS. Moreover, certificate revocation list (CRL) checks performed by Microsoft 365 client applications use the Internet. Content delivery network (CDN) traffic, which is used to access static content associated with almost all the components of Microsoft 365, also requires the Internet, as CDNs are often provided by third-party solutions that are not accessible via the Microsoft global network.

If Internet connectivity is unavailable, Microsoft 365 is effectively unusable regardless of whether ExpressRoute is being used.
It is simply not possible to use Microsoft 365 without Internet connectivity.

Rather than trying to use ExpressRoute as an availability solution for Microsoft 365 traffic, we strongly recommend using local Internet break-out combined with geo-diversity of Internet connectivity and diversity of ISPs. This provides resiliency against failures due to service provider issues and natural disasters affecting a particular geographical area. The benefit of this investment also accrues to other (non-Microsoft) services that may require the Internet.

ExpressRoute is not a security solution

ExpressRoute is not, and never was, intended as a security solution for Microsoft 365. Microsoft 365 requires Internet connectivity, and as such, it operates with the highest levels of security in place to ensure data is transferred safely and securely over the Internet. Microsoft has published documentation that details encryption in the service and how Microsoft 365 protects data in transit using encryption. ExpressRoute does not provide any encryption levels or mechanisms greater than what Microsoft 365 provides natively over the Internet. There are also no differences between the public endpoints reached via the Internet path and those reached via the ExpressRoute path; ExpressRoute is simply a routing override of the default Internet path for what is normally a small number of hops until the traffic reaches Microsoft’s network. ExpressRoute with public peering can be viewed as a secondary Internet connection with connections scoped to Microsoft IPs. As with any public network, it needs to be secured, monitored, and managed just like an Internet circuit.

Where does using ExpressRoute with Microsoft 365 make sense?

ExpressRoute for Microsoft 365 should only be used in rare, specific scenarios where the Internet path isn’t viable for all of the organization’s traffic to the service. This is limited to scenarios such as:

1. Where there is a regulatory requirement that can only be met by using ExpressRoute.
This scenario is rare, as most regulations have been updated to recognize the use of cloud services and the Internet, and they typically require industry-standard protection mechanisms to protect traffic. Microsoft 365 complies with a large number of national, regional, and industry requirements and provides these protection mechanisms regardless of which connectivity path is used. Customers can also apply their own security controls where applicable regardless of network path. Before using ExpressRoute with Microsoft 365, it is important to verify that what may have been a regulatory requirement for private networking in the past still applies.
2. Where your Internet egress topology does not meet the minimum requirements for using Microsoft 365 now or in the future, while a specific network design based on ExpressRoute peering overcomes these constraints. This scenario is also rare, as it is almost always more cost effective, quicker, simpler, and more extensible to upgrade or modernize the existing Internet connection than it is to use ExpressRoute with Microsoft 365 to solve the problem.
3. Where Microsoft 365 performance is affected by network deficiencies that only ExpressRoute can address. In this scenario, it is critical to clearly understand the root cause of any performance issues by performing a complete network assessment, and to confirm that an ExpressRoute network design will remediate the performance issue.

Technical requirements for using ExpressRoute with Microsoft 365

There are a number of technical requirements to ensure effective use of Microsoft 365 over ExpressRoute. Detailed guidance can be found in Implementing ExpressRoute for Office 365 and the ExpressRoute FAQ. When determining whether using ExpressRoute with Microsoft 365 is right for your organization, it is important to understand and consider the requirements below, and to recognize that they are significantly greater than those for direct Internet connectivity.
Multiple circuits per region

Microsoft highly recommends deploying multiple ExpressRoute circuits per region, peered in different locations, to provide resiliency for the peering location and eliminate the peering location as a single point of failure. Without multiple circuits, you must provide a seamless Internet-based backup path capable of handling all Microsoft 365 traffic.

Minimize network backhaul to the ExpressRoute circuit

ExpressRoute should not be used for backhauling Microsoft 365 traffic over long distances. As outlined above, Microsoft has local resources in most metropolitan areas of the world, and backhauling to a distant ExpressRoute circuit instead of using direct local Internet egress will cause significant performance degradation. Either additional local ExpressRoute circuits or local Internet egress will be required for locations that currently need to backhaul traffic over long distances. A thorough network assessment should be performed to discover which sites can and cannot efficiently use the ExpressRoute circuits.

Use redundant Internet connections

As stated above, Internet connectivity is critical for Microsoft 365 service operation even with ExpressRoute in place. Without working Internet connectivity, the user experience will degrade very quickly, as users can’t reach critical elements such as DNS, CDNs, CRL endpoints, and other Internet-based endpoints. It is therefore necessary to invest in redundant Internet connectivity to provide high availability for the service, regardless of whether or not ExpressRoute is used.

Produce detailed network flow diagrams to manage asymmetric routing

One of the most critical steps when using ExpressRoute with Microsoft 365 is to clearly map inbound and outbound connections to ensure symmetry for these connections. This is especially important for inbound flows such as Exchange Hybrid connectivity, as misconfigurations can easily cause a service outage when ExpressRoute is implemented.
Note that mapping and designing symmetric routing can take many months to complete for large organizations.

Design each circuit with unique public NAT pools

As Microsoft 365 traffic uses Microsoft peering, which connects you to public IP space, NAT is required for traffic leaving customer networks. Each circuit should have a public NAT pool that is unique to that circuit and not advertised to the Internet. This ensures route symmetry and maintains connectivity to other Microsoft services. Microsoft will confirm ownership or authorized use of any IP blocks advertised via ExpressRoute before they can be used. If the subnets are not registered to your organization in the public registries, these checks will be done manually.

Provide public Autonomous System Number (ASN)

A public ASN is required, and Microsoft will confirm ownership. A private ASN can be used, but that requires manual validation and removes the ability to use AS PATH prepending to influence routing on circuits, which is a critical function for many customers.

Public DNS availability

Clients in your environment must have fully recursive public DNS available for the published FQDNs for direct connectivity via ExpressRoute.

Best Practices for Connecting to Microsoft 365

Using ExpressRoute with Microsoft 365 isn’t the right option for most customers. For optimal connectivity, Microsoft has developed a set of four network connectivity principles for Microsoft 365 that are designed with both service performance and modern networking trends in mind. Adhering to the four principles will provide you with the optimal experience with Microsoft 365. The closer you can align to these principles, the better off you’ll be in terms of end-user experience now and in the future. The four principles are described below:

Figure 7 - Microsoft 365 Network Connectivity Principles

Optimize Office 365 traffic

Office 365 URLs and IP address ranges are documented here.
This data is also available via a REST-based web service, making it consumable for automation. Microsoft 365 endpoints are organized into three categories: Optimize, Allow, and Default.

The first network connectivity principle is to use the endpoint categories (Optimize, Allow, and Default) to distinguish traffic destined for key Microsoft 365 endpoints from other Internet traffic and to apply optimizations to the endpoints that require it.

Optimize Endpoints

Optimize-marked endpoints are critical, Microsoft-owned and hosted, high-volume, and latency-sensitive. Microsoft strongly recommends that no traffic inspection or proxying of this traffic is performed; doing so is very likely to cause performance degradation and issues at the proxy itself. Comprehensive IP ranges are provided for all endpoints and they do not change often. When they do change, we provide one month’s notice.

The Optimize endpoints have the following characteristics and requirements:

• Highest impact on end user performance
• High volume
• Most sensitive to network latency/QoS
• Expect low rate of change
• Bypass of SSL break and inspect required
• Proxy bypass very strongly recommended
• First priority for local and direct Internet egress

We strongly recommend that these critical endpoints are optimized as described in the networking connectivity principles document:

• Exchange Online: 2 FQDNs / 11 IPv4 subnets
• OneDrive and SharePoint: 2 FQDNs / 5 IPv4 subnets
• Skype for Business and Teams: 3 IPv4 subnets

Even though the number of endpoints in this category is small, they account for 70-80% of the volume of traffic to the service, and they are the most critical in terms of performance. These endpoints are fully controlled by Microsoft, and they manage latency-sensitive transactions for Exchange, SharePoint, Skype for Business, and Teams media flows.
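As a rough illustration of how the endpoint categories can drive automation, the following Python sketch groups endpoint records by category and extracts the IP ranges of the Optimize category. It is a minimal sketch under stated assumptions: the sample records are invented placeholders (documentation IP ranges and `.invalid` host names), and the field names `category`, `urls`, and `ips` are assumed to match the records returned by the web service, which should be confirmed against the current web service documentation.

```python
import json

# Hypothetical sample shaped like endpoint records from the web service.
# The IP ranges and some host names are placeholders, not real published data.
SAMPLE_ENDPOINTS_JSON = """
[
  {"id": 1, "serviceArea": "Exchange", "category": "Optimize",
   "urls": ["outlook.office.com", "outlook.office365.com"],
   "ips": ["192.0.2.0/24"], "tcpPorts": "80,443"},
  {"id": 2, "serviceArea": "SharePoint", "category": "Allow",
   "urls": ["allow.example.invalid"],
   "ips": ["198.51.100.0/24"], "tcpPorts": "443"},
  {"id": 3, "serviceArea": "Common", "category": "Default",
   "urls": ["cdn.example.invalid"], "tcpPorts": "443"}
]
"""


def group_by_category(endpoint_sets):
    """Group endpoint records under Optimize, Allow, and Default."""
    grouped = {"Optimize": [], "Allow": [], "Default": []}
    for endpoint_set in endpoint_sets:
        # Records without a category are treated as Default (an assumption).
        category = endpoint_set.get("category", "Default")
        grouped.setdefault(category, []).append(endpoint_set)
    return grouped


def optimize_ip_ranges(endpoint_sets):
    """Return the sorted, de-duplicated IP ranges of Optimize records."""
    ranges = set()
    for endpoint_set in endpoint_sets:
        if endpoint_set.get("category") == "Optimize":
            # Default-category style records may have no "ips" field at all.
            ranges.update(endpoint_set.get("ips", []))
    return sorted(ranges)


if __name__ == "__main__":
    records = json.loads(SAMPLE_ENDPOINTS_JSON)
    for name, items in group_by_category(records).items():
        print(name, len(items))
    print(optimize_ip_ranges(records))
```

In practice the records would come from the web service rather than an inline sample; the grouping logic stays the same, and the Optimize ranges are the natural input for firewall, proxy bypass, or routing rules.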
Allow Endpoints

Allow endpoints are less critical than Optimize endpoints, but they are still required for the Microsoft 365 service to function. Like Optimize endpoints, Allow endpoints are Microsoft-owned and hosted, and Microsoft provides and publishes IP addresses for them. Allow endpoints tend to be more transactional and lower volume than Optimize endpoints. Allow endpoints change frequently, roughly once a month. Ideally, like Optimize endpoints, these endpoints are not proxied, but proxying them is possible if you want to work on a staged move to direct egress by concentrating on Optimize endpoints first. We do recommend that any possible optimizations are implemented on these endpoints.

Default Endpoints

Default-marked endpoints contain other dependencies, including third-party endpoints, and may not be hosted by Microsoft. Microsoft also does not publish IP address information for endpoints in this category. You can treat this traffic as general web browsing traffic and proxy, backhaul, or inspect it as required. We find that a majority of customers proxy these endpoints.

The key to this first principle is determining which traffic needs special treatment. Microsoft makes this simple to do using the Optimize, Allow, and Default categories. Optimize endpoints should be where you focus, as they have the potential for the greatest effect on performance for your users.

Enable Local egress

Once you have identified the traffic that needs to be optimized, the second principle is to enable egress as close as possible to the user. In addition to the benefits discussed previously in this document, traffic can be taken efficiently to wherever the endpoint is in the world on Microsoft’s managed network, and elements of services can be provisioned locally to improve performance.
Figure 8 - Principle #2: Enable local egress

In most metropolitan areas of the globe, the reach of the Microsoft global network and our peering agreements with 2700+ ISPs mean that traffic is typically on the Internet for only a few hops before it’s on Microsoft infrastructure. This provides users with the highest levels of performance. It is also critical that DNS resolution be performed at the egress, as DNS is used to find the nearest service front doors. Performing DNS resolution outside the client’s geographical location often causes suboptimal connectivity to Microsoft 365.

Enable Direct connectivity

The third principle is to enable direct connectivity for the endpoints that are being optimized. Generally, the shortest, most direct route between the user and the closest Microsoft 365 endpoint offers the best performance. It is particularly important to avoid network hairpins. A network hairpin happens when WAN or VPN traffic bound for a particular destination is directed to another intermediate location (such as a security stack, cloud access broker, or cloud-based web gateway), introducing latency and potential redirection to a geographically distant endpoint. Network hairpins can also be caused by routing/peering inefficiencies or suboptimal (remote) DNS lookups.

To ensure that Microsoft 365 connectivity is not subject to network hairpins, verify that the ISP used to provide Internet egress has a direct peering relationship with the Microsoft Global Network in close proximity to the user’s location. You should also configure egress routing to send trusted Microsoft 365 traffic directly, as opposed to proxying or tunnelling through a third-party cloud or cloud-based network security mechanism that processes Internet-bound traffic. As noted, local DNS name resolution of Microsoft 365 endpoints helps ensure that, in addition to direct routing, the closest Microsoft 365 entry points are being used for user connections.
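One common way to send selected traffic directly while other traffic still uses a proxy is a proxy auto-config (PAC) file. The Python sketch below generates such a file from a list of host patterns. It is only an illustration under stated assumptions: the proxy address `proxy.example.invalid:8080` is a placeholder, `contoso.sharepoint.com` stands in for a hypothetical tenant, and the host list should be taken from the currently published Optimize endpoints rather than from this example.

```python
def build_pac(direct_hosts, proxy="PROXY proxy.example.invalid:8080"):
    """Return PAC file text that sends matching hosts direct.

    direct_hosts: host name patterns (shell-style, as accepted by the
    standard PAC helper shExpMatch) that should bypass the proxy.
    proxy: the PAC return value used for all other traffic (placeholder).
    """
    if not direct_hosts:
        # With nothing to bypass, every request goes to the proxy.
        return (
            "function FindProxyForURL(url, host) {\n"
            f'    return "{proxy}";\n'
            "}\n"
        )
    conditions = " ||\n        ".join(
        f'shExpMatch(host, "{pattern}")' for pattern in direct_hosts
    )
    return (
        "function FindProxyForURL(url, host) {\n"
        f"    if ({conditions}) {{\n"
        '        return "DIRECT";\n'
        "    }\n"
        f'    return "{proxy}";\n'
        "}\n"
    )


if __name__ == "__main__":
    # Illustrative host list only; the tenant name is hypothetical.
    print(build_pac([
        "outlook.office.com",
        "outlook.office365.com",
        "contoso.sharepoint.com",
    ]))
```

A PAC file only controls proxy selection on the client. Local DNS resolution and local Internet egress, as described above, are still needed for the direct path to reach a nearby front door.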
Figure 9 - Principle #3: Enable Direct Connectivity

If you use cloud-based network or security services for your Microsoft 365 traffic, ensure that any hairpins are evaluated and their effect on Microsoft 365 performance is understood. To do this, examine the number and locations of service provider locations through which the traffic is forwarded in relation to the number of your branch offices and Microsoft Global Network peering points, the quality of the network peering relationship of the service provider with your ISP and Microsoft, and the performance effects of backhauling in the service provider infrastructure.

Due to the large number of distributed locations with Microsoft 365 endpoints and their proximity to users, routing Optimize-marked Microsoft 365 traffic to any third-party network or security provider will have a negative effect on Microsoft 365 performance.

The following graph shows the difference in latency between user devices working across many distributed locations and the nearest Microsoft 365 service front door, for each user, via various connectivity options.

Figure 10 - User to Front door latency via different connectivity models

Instead of using average numbers, the graph represents the full spectrum of observed latencies and the corresponding percentage of users being observed. The focus on the 90th percentile helps to estimate the difference in performance and user experience that accounts for most customer users and locations. It also helps to estimate the network performance that the “bottom 10% of customer locations” are achieving, which is what typically correlates with the volume of user escalations and performance complaints.

In the above chart:

• Red (1) represents a traditional backhaul/central proxy inspection path. We reach 240ms latency before 90% of connections are achieved. At this level of latency, many real-time and near real-time application experiences are considered unusable by users.
• Yellow (2) represents connections via a security as a service (SECaaS) mechanism, where latency is around 150ms before 90% of connections are achieved. At this latency, users may characterize many cloud experiences as slow and limiting to their productivity.
• Blue (3) shows connections via an IaaS mechanism, which shows similar figures to the SECaaS scenario.
• Green (4) and (5) represent direct egress that follows Microsoft’s network connectivity principles. For dark green (4), we see 70ms latency with 90% of connections under that figure. As Microsoft adds front door locations closer to users, we’re able to bring this green line forward with no changes required by customers. You can see this occurring in light green (5), where latency is only 30ms for 90% of the connections reaching Microsoft. Under those latency conditions, most user experiences are fast and snappy.

Modernize Security for SaaS

The final principle to be implemented is to take a modern security approach for the optimized endpoints that bypasses the traditional approach. Enterprise customers should review network security and risk reduction methods for cloud services in general, and specifically for Office 365, they should use Microsoft 365 security features to reduce their reliance on expensive, intrusive, and performance-draining network security technologies for cloud services.

Most enterprise networks enforce network security for Internet traffic using technologies like proxies, TLS inspection, packet inspection, and data loss prevention systems that are often deployed to a small number of locations. These technologies were generally designed for the pre-cloud world and they provide important risk mitigations for generic Internet requests. However, they also dramatically reduce performance, scalability, and the quality of user experience when used to connect to Microsoft 365 endpoints or to any other trusted cloud service.
Cloud services can also put a heavy load on these egress models, which were not designed for such use. This can also have an adverse effect on anything else using this environment. In short, these solutions do not scale or perform well when an enterprise starts to shift from on-premises to a cloud-based world where trust can no longer be attributed to a network.

Microsoft 365 provides built-in security for many of the functions delivered by a traditional network egress security model. For most service endpoints, traffic can take a traditional approach; however, for Optimize-marked endpoints, the traditional approach should be bypassed. Below are common reasons traffic is run through a centralized model, and details on how Microsoft 365 implements a more modern approach.

Malware Detection

By default, SharePoint automatically scans file uploads for known malware. Similarly, Exchange Online Protection scans email messages for malware. If your Microsoft 365 subscription includes Microsoft Defender for Office 365, you can also enable safe attachments to provide advanced protection against malware. If your organization uses Microsoft Defender ATP for endpoint protection, remember that each user is licensed for up to five company-managed devices.

Data Loss Prevention

To help prevent the accidental disclosure of sensitive information, Microsoft 365 has a rich set of built-in data loss prevention (DLP) tools. If required, you can use the built-in DLP capabilities of Teams and SharePoint to detect inappropriately stored or shared sensitive information. If part of your remote work strategy involves a bring-your-own-device (BYOD) policy, you can use Conditional Access to prevent sensitive data from being downloaded to users’ personal devices.

Prevention of unauthorized access

Multi-factor authentication (MFA) helps increase authentication assurance. We recommend requiring it for all users.
If you are not ready to deploy to all users, consider starting an emergency pilot for higher risk or more targeted users. Learn more about how to use Azure Active Directory (Azure AD) Conditional Access to enforce MFA. You will also want to block legacy authentication protocols that allow users to bypass MFA requirements.

Control of authorized user access

To reduce the risk that would be posed by resident malware or intruders, you should ensure that only registered devices that comply with your organization’s security policies can access your environment. Learn more about how to use Azure AD Conditional Access to enforce device health requirements. To further increase your level of assurance, you can evaluate user and sign-on risk to block or restrict risky user access.

Preventing access to other tenants

You may also want to prevent your users from accessing other organizations’ instances of the Office 365 applications. If you use Azure AD tenant restrictions to do this, a small number of authentication endpoints need to traverse the proxy; for remote users, this authentication traffic can traverse the VPN to reach the proxy. The use of tenant restrictions may be an element of the planning to ensure only authorized tenants are accessible by your users.

Additional security features that may be of interest when moving your network egress model toward zero trust include Azure Sentinel and Microsoft Cloud App Security, which bolster security in several ways and allow you to move away from applying security at the network egress, as that model becomes less effective the more you move to the cloud.

For more information about moving to a modern network egress model, see Alternative ways for security professionals and IT to achieve modern security controls in today’s unique remote work scenarios, which outlines common challenges when moving to this model, and the solutions for them.
Blocking access to consumer services

Restricting access to our consumer services should be done at your own risk. The method for blocking access to consumer services relies on blocking access to login.live.com from within the organization. This has consequences beyond consumer services such as consumer Outlook and OneDrive, as it prevents authentication to anything that uses login.live.com. An alternative is to block specific endpoints such as outlook.live.com or onedrive.live.com. For more information, see Managing Office 365 endpoints.

What does a modern, Internet-first, enterprise network look like?

Traditional enterprise networks generally form a hub and spoke model where traffic from remote sites and users is routed via a WAN, such as an MPLS circuit, back to a centralized point where inspection and security controls are applied before the traffic can leave the corporate network. This model worked well in the pre-cloud world when most traffic was for services and data hosted within the corporate network.

As the world moves to the cloud, this model makes it difficult for customers to rapidly adapt to changing needs and requirements. Microsoft consistently sees enterprises struggle with this model because it no longer serves their needs well. The global pandemic rapidly amplified this issue, and many customers found that this model was unable to deliver successfully when many remote workers were suddenly using the network at a scale it was never designed for, and that it could not rapidly adapt.

Below is a depiction of this traditional connectivity model, which is represented by red lines (1a, 1b). In this model, all traffic to Microsoft 365 is hairpinned, backhauled, and proxied, and therefore it carries significant latency and bottlenecks. In short, this model does not adhere to the four network connectivity principles for Microsoft 365.
Figure 11 - Connectivity/Egress options

Instead of using outdated and inefficient network connectivity models, you should allow cloud partners like Microsoft to handle security and efficiency so you can move away from a reliance on centralized on-premises network equipment.

When enterprises break away from the legacy connectivity model illustrated by 1a and 1b, the first challenge is providing security for connections coming over the Internet. This is where cloud-based SECaaS solutions, such as the secure web gateway services illustrated by 2a and 2b, can help. This model provides significant advantages over the on-premises model for general web connectivity, as it allows remote sites and users to connect to SECaaS resources close to where they are, without the need for VPN or routing through on-premises WAN networks. It provides scalability on demand while still allowing for central control and security for user connectivity.

That said, for key Microsoft 365 connectivity elements, such as the Optimize-marked endpoints, paths 2a/2b are not optimal. Microsoft strongly recommends that this traffic is sent directly to Microsoft 365 to deliver the highest level of performance to your users. Microsoft is constantly moving the edge of our network and cloud services closer and closer to users and ISPs; therefore, sending this critical traffic directly allows customers to continue to reap the benefits of our improvements, while reducing the burden of complex traffic routing through intermediaries.

For office locations, many customers are removing traditional MPLS circuits to centralized egress locations and replacing them with SD-WAN solutions in branch offices. This model allows for direct Internet egress, which can be coupled with the existing MPLS circuits where needed.
This enables critical traffic to Microsoft 365 (and other cloud services) to be egressed locally and therefore use Microsoft’s local resources that are close to the user where appropriate. Customers can also use this model to create a connection from the local SD-WAN to SECaaS services to provide local Internet connectivity for other traffic.

Customer example: Contoso

Consider the example below, which provides local egress in Sydney. Users in that branch office can now reach Microsoft service front doors in 10ms versus around 120ms using a backhaul model to the Singapore head office egress. An added benefit is that many customers reap considerable cost savings by significantly reducing the size and cost of their MPLS circuit while freeing up remaining capacity for business-critical traffic that requires it.

Figure 12 - Modern network architecture example

Microsoft works closely with key network partners to identify products and solutions that adhere to the Microsoft 365 network connectivity principles. An example is an SD-WAN vendor that integrates automatically with the REST-based web service containing the endpoint information for Microsoft 365. These solutions can automatically receive changes from Microsoft’s web service and dynamically allow direct egress for selected traffic, such as the Optimize-marked endpoints. The current partners and their solutions can be found at our Microsoft 365 Network Partner site.

This same model of selective local offload can easily be adopted in remote worker scenarios because the model used for branch offices is the same as that used for a remote user. Once you shift to a zero-trust model that doesn’t rely on on-premises elements for handling security or routing, your ability to use more efficient local paths for traffic increases. Microsoft’s guidance for VPN split tunnelling is outlined in detail here and includes a guide on how to achieve this for the most popular VPN partners.
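The selective offload described above comes down to a per-destination decision: traffic to Optimize IP ranges leaves locally, and everything else follows the existing path. The Python sketch below shows that decision using the standard `ipaddress` module. It is a minimal sketch, not a VPN or SD-WAN configuration: the CIDR ranges are documentation placeholders, and the labels `direct` and `vpn` are hypothetical names chosen for this example.

```python
import ipaddress


def build_path_selector(optimize_cidrs):
    """Return a function that maps a destination IP to 'direct' or 'vpn'.

    optimize_cidrs: iterable of CIDR strings for the Optimize category.
    The ranges passed in this example are placeholders, not published data.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in optimize_cidrs]

    def path_for(destination_ip):
        address = ipaddress.ip_address(destination_ip)
        # Membership tests across IP versions simply evaluate to False.
        if any(address in network for network in networks):
            return "direct"  # local Internet egress, bypassing the tunnel
        return "vpn"  # all other traffic keeps the existing path

    return path_for


if __name__ == "__main__":
    path_for = build_path_selector(["192.0.2.0/24", "2001:db8::/32"])
    for destination in ("192.0.2.10", "203.0.113.5", "2001:db8::1"):
        print(destination, path_for(destination))
```

Real products express the same decision as route tables or policy rules, and partner solutions that consume the web service keep those ranges current automatically.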
Testing your enterprise connectivity

Microsoft has released several comprehensive network connectivity tools that allow customers to gain detailed visibility into their network connectivity to the service. These consist of a standalone tool that can be used to look at connectivity from a particular device, and the new network connectivity dashboard in the Microsoft 365 admin center, which shows connectivity from corporate locations and provides advice where connectivity does not align with the network principles outlined above. Customers can use this data to highlight key areas for improvement of connectivity.

FAQ

Below is a summary of common questions around using ExpressRoute with Microsoft 365. These questions are answered in detail throughout this document.

Does Microsoft recommend using ExpressRoute with Microsoft 365?
No, using ExpressRoute with Microsoft 365 is not advised for most customers.

How long does it typically take to implement ExpressRoute for use with Microsoft 365?
There are many issues that must be resolved before ExpressRoute can be used with Microsoft 365. Microsoft typically sees a timeframe of six months or more using a large technical team.

Can ExpressRoute keep all my Microsoft 365 traffic off the Internet and remove my need for an active Internet connection?
No, Internet connectivity is a critical dependency for Microsoft 365 regardless of the use of ExpressRoute. Without Internet connectivity, Microsoft 365 becomes unusable.

Does ExpressRoute allow Microsoft 365 usage when my Internet links are down due to a DDoS attack?
No, redundancy and protection are required on your Internet links whether or not you use ExpressRoute.

Do I need to provide public DNS to my users?
Yes, clients must be able to resolve Microsoft 365 endpoint names to public IP addresses to connect to Microsoft 365 via ExpressRoute. It is strongly recommended not to proxy core traffic via any connectivity model.

Do I need public IP space to use ExpressRoute for Microsoft 365?
Yes, each circuit requires a unique NAT pool that is large enough to handle the volume of connections from users and that is not advertised to the Internet.

Do I need a public ASN?
Yes, if you want to use AS PATH prepending to influence routing from Microsoft to your network edge, which many customers do. A private ASN can be used, but that removes the ability to influence routing using AS PATH prepending.

Does ExpressRoute connect me directly into Microsoft’s datacenters?
No, ExpressRoute connects you to Microsoft’s global network backbone, which routes traffic to and from its destination globally.

Is a single ExpressRoute circuit sufficient?
No, as a single circuit is a single point of failure if the peering location suffers an outage. We recommend implementing multiple circuits in different peering locations to avoid single points of failure.

Can I use ExpressRoute for inbound initiated connectivity from Microsoft 365?
Yes; however, this means that Microsoft 365 is only accessible from Microsoft sources, which may not be possible in cases where connectivity to resources needs to be available via the Internet. Routing Microsoft 365 traffic inbound via ExpressRoute adds significant complexity and risk to connectivity, without adding much value.

Does ExpressRoute provide me with additional security for Microsoft 365 traffic?
From a Microsoft 365 perspective, ExpressRoute is not a security feature. Microsoft 365 is designed to operate securely over the Internet, and ExpressRoute does not change that in any way.

Next Steps if I think using ExpressRoute with Microsoft 365 is the right approach for my organization

Microsoft only authorizes the use of ExpressRoute with Microsoft 365 in certain rare scenarios. If you’ve read this document and still think using ExpressRoute with Microsoft 365 is right for your organization, please speak with your Microsoft account team and your request will be reviewed.
Your Microsoft account team is required to assist you through the onboarding process.

Further Guidance

Please review the Microsoft 365 Network Connectivity Video Series. This video series provides in-depth coverage of many of the elements discussed in this paper, including:

• Network Connectivity Principles overview
• Enterprise network design for the cloud era
• Remote workers (VPN split tunneling)
• Networking Partner Program
• Using the IP URL web service
• Tooling (Admin center dashboard, network onboarding tool)
• Zero Trust – Security for the Cloud Era