How to prioritize metrics as an e-commerce CTO

DANNY MILES
CTO, Dollar Shave Club

ABOUT THE AUTHOR
Danny Miles joined Dollar Shave Club as the company’s first Chief Technology Officer in 2017. Previously, Danny served as the Vice President of Direct-To-Consumer Technology at Nike, and has more than twenty years of experience in architecting, implementing and launching leading-edge technical solutions.

As CTO, I am accountable for the reliability and performance of Dollar Shave Club’s site. When I get to the office in the morning, I open up several tabs that display my metric dashboards. This morning ritual usually takes me about 10 minutes. I do this not to find problems but to get situational awareness to start my day.

Site reliability and performance is just one of my responsibilities, but it’s a foundational area that the whole business depends on. The graphic below illustrates why site reliability and performance is so important.

[Graphic: four rings of metrics. 1: Site reliability and performance. 2: On-site user behavior metrics (aka web analytics). 3: Upper marketing funnel metrics. 4: Business performance metrics.]

If the site isn’t doing well, users won’t be moving through the on-site sales funnel and converting (impacting the metrics in ring 2). Then, big drivers of traffic to our site like Google and Facebook will see that we aren’t converting, because we report that information back to them (impacting the metrics in ring 3). In addition, any ad spend will have been wasted. Then, traffic will dry up and business metrics (ring 4) will take a big hit. That’s why site reliability and performance is such a critical area to the business: any issues get significantly amplified beyond just the loss of a sale or conversion.

Incidentally, my accountability for these metrics also follows the order shown in the graphic. Therefore, I check my metrics in this order during my morning routine.
Sometimes I’ll see anomalies and trends that might not get picked up by alerts. If I do see something that doesn’t look right or something I don’t understand, I’ll ping a member of the team. But, generally, first thing in the morning is not the time to be discovering performance issues: this is a level-setting exercise to begin my day.

In the sections below, I dive into each of my metric areas and describe the specific metrics I look at in each ring during my morning routine.

Top site reliability and performance metrics

I could quickly get bogged down in system performance data but, as CTO, that’s not the best use of my time. Because I sit between the engineering team and other business functions, I need metrics that tell me whether we’re meeting our obligations to the business and hitting our goals. Over the years, I’ve zeroed in on the key metrics in the site reliability and performance space that matter most to me. Below, I describe the metrics I check regularly.

01 APPLICATION-LEVEL METRICS

I want to see some key application-level metrics. It’s important to go through the basics of the e-commerce application and make sure it’s serving users (and users are behaving on it) as expected.

Active sessions
We care a lot about this metric. This is the stateful part of the site where people are actually logged in and creating carts. If active sessions drop off, you know something is wrong.

Carts created
If you’re not seeing people create carts or orders for a period of time, that’s a signal that something’s amiss.

Orders processed
This tells me we’re able to make money and also tells me at what rate we’re making money.

I consider security to be a part of site reliability and performance engineering. While the following metrics provide indications of site functionality, I also use them to assess security issues.

Logins per day
Both failed and successful. This number should be fairly predictable.
Unexpected changes may indicate a security threat or breach.

Payment declines
If I get a spike in the number of payment declines, it could be an outage in our payment processing service, a spike in fraud activity, or a code or API break.

Changes to account
This is a big one from a fraud perspective. Account takeover threats have become fairly common in the direct-to-consumer e-commerce space. Members don't frequently change primary account fields like their email, password, address, or payment information. This is especially the case on a subscription site where members configure most of their settings at sign-up. If account changes spike above normal, you may have a malicious actor on your hands.

While these metrics are important to the business, they’re also important to the tech team because they can signal technical issues. Next, I’ll go a level deeper into system-level metrics.

02 SYSTEM-LEVEL METRICS

Uptime
We have a “Sunglasses Man” who pops up every once in a while telling customers the site’s not available. Our goal is to never see Sunglasses Man. We should be getting alerts about problems way before Sunglasses Man needs to show up.

My view on site downtime is a bit different from what I typically hear, but I think it reflects the reality of our digital business. For me, downtime isn’t as much about losing new orders and conversions, although it certainly does affect those metrics. In my experience, the biggest impact of downtime on an e-commerce site is the lost marketing dollars and wasted effort driving traffic to the site.

Marketing campaigns drive our business. The worst thing that can happen is an outage during a big campaign, like the ones we run around March Madness or, when I was at Nike, during the Olympics. If our site isn’t available and performant 100 percent of the time during those periods, the impact can ripple through partners, search engines, and ad servers.
Not only have you wasted your ad spend, but you are also projecting a poor-quality image of your brand while your digital presence is down.

The impact of downtime through the big ad platforms that drive traffic to our site, like Facebook and Google, lasts beyond just the time you are offline. We report back to Facebook in real time, and their algorithms get tuned based on whether people are landing on our site and completing our checkout. If we have an outage on our site or if we stop reporting back to our ad partners, their algorithms may stop serving impressions, and it will take days to re-tune and scale the traffic back up. Outside of our programmatic buying platforms, TV and radio spots have long lead times and can’t be halted during an outage, so at that point you are literally throwing money away.

So, downtime has a lot of ripple effects. It’s not just losing new customers and orders; it’s all the investment you’re putting into marketing, search, media, and social.

Requests & Traffic
I want to understand the load on the infrastructure and whether it matches up with expectations. I discussed several traffic metrics above. There are a few more technical traffic metrics that I look at:

Rate limits
I want to see if we’re triggering any of our rate limits on key APIs. For example, if a login endpoint is being hit more than 1,000 times per second from a specific network or client, that’s potentially a sign of malicious activity or an application error.

Unique users on the site
How many unique users are on the site, and does that match up with expectations? I have week-over-week and year-over-year comparisons for this metric, as well as a predicted range.

Requests served from edge versus origin (cache hits versus cache misses)
I look a lot at traffic served at the edge, via our CDN, versus our core infrastructure. I want to make sure requests are hitting the cache.
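A minimal Python sketch of this edge-versus-origin check: it computes the share of requests answered from the CDN cache in a time window and flags the window when that share drops below a threshold. The `EdgeStats` type, its field names, and the 0.90 threshold are hypothetical; real counts would come from your CDN's logs or metrics, and a sensible threshold depends on how much of your content is cacheable.

```python
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Request counts for one time window (hypothetical shape)."""
    edge_hits: int       # requests answered from the CDN cache
    origin_fetches: int  # requests the CDN had to forward to the origin


def cache_hit_ratio(stats: EdgeStats) -> float:
    """Fraction of all requests in the window served from the edge cache."""
    total = stats.edge_hits + stats.origin_fetches
    if total == 0:
        return 1.0  # no traffic at all, so nothing reached the origin
    return stats.edge_hits / total


def origin_load_too_high(stats: EdgeStats, min_hit_ratio: float = 0.90) -> bool:
    """True when too large a share of requests is reaching the origin."""
    return cache_hit_ratio(stats) < min_hit_ratio


if __name__ == "__main__":
    normal = EdgeStats(edge_hits=9_500, origin_fetches=500)
    degraded = EdgeStats(edge_hits=6_000, origin_fetches=4_000)
    print(cache_hit_ratio(normal), origin_load_too_high(normal))      # 0.95 False
    print(cache_hit_ratio(degraded), origin_load_too_high(degraded))  # 0.6 True
```

A ratio rather than a raw origin request count keeps the check meaningful as overall traffic rises and falls during campaigns.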
If too many requests are coming all the way to your origin web server for images and other content, then something may not be right and your end users could be experiencing slow performance. A lot of activity should be caught at the edge via the CDN, which also helps if you’re experiencing a DDoS attack.

Latency
The site needs to be fast; otherwise users will become frustrated and leave the site before they complete their purchases. I primarily focus on two performance metrics:

Response times
For every API and endpoint, I want to know the response times. Response times can correlate directly with conversion rates. In my experience, when response times start to deteriorate, they don't usually improve without intervention. Before you get to a point where your website’s not up, you’re usually going to see a slowdown somewhere. It’s very rare that the site just goes ‘click’ and it’s off. More often than not, things start to slow down and time out, and that ultimately results in the application not being able to load.

Page load time
I focus a lot on the time it takes for pages to actually be useful to the user, which is the time it takes to load and render a piece of content and for the page to become interactive. Page load time is a measure of speed from the user’s vantage point, and this information is captured by monitoring the client side.

The time it takes to transact or get through an API is more likely to degrade after we’ve pushed changes into production, and that in turn has a business impact. I recommend having an “always on” offense when it comes to speed, because the continuous deployment of new code, content, pixels, and tracking can quickly shift your site from a 2-3 second load time to a 7-8 second load time, where you’ll see your conversion rate go off a cliff.

I like to further break down page load time by device and by browser to ensure that users are having similar experiences.
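One way to do this breakdown, sketched in Python under stated assumptions: group client-side page load samples by (device, browser) and compare a high percentile per group, since an average can hide a slow tail. The sample format, the use of the 75th percentile, and the 3-second budget (loosely based on the 2-3 second range mentioned above) are illustrative choices, not a prescribed method. The percentile comes from the standard library's `statistics.quantiles` with the `inclusive` method.

```python
import statistics
from collections import defaultdict

# Hypothetical client-side monitoring sample: (device, browser, load_seconds)
Sample = tuple[str, str, float]
Segment = tuple[str, str]


def p75_by_segment(samples: list[Sample]) -> dict[Segment, float]:
    """75th-percentile page load time for each (device, browser) pair."""
    groups: dict[Segment, list[float]] = defaultdict(list)
    for device, browser, seconds in samples:
        groups[(device, browser)].append(seconds)

    result: dict[Segment, float] = {}
    for segment, times in groups.items():
        if len(times) == 1:
            result[segment] = times[0]  # too few points to interpolate
        else:
            # n=4 yields the three quartile cut points; index 2 is the 75th percentile
            result[segment] = statistics.quantiles(times, n=4, method="inclusive")[2]
    return result


def slow_segments(samples: list[Sample], budget_s: float = 3.0) -> list[Segment]:
    """Segments whose p75 load time exceeds the budget, slowest first."""
    p75 = p75_by_segment(samples)
    over = [segment for segment, t in p75.items() if t > budget_s]
    return sorted(over, key=lambda segment: p75[segment], reverse=True)


if __name__ == "__main__":
    samples = [("mobile", "safari", t) for t in (2.0, 4.0, 6.0, 8.0, 10.0)]
    samples += [("desktop", "chrome", t) for t in (1.0, 1.5, 2.0, 2.5, 3.0)]
    print(p75_by_segment(samples))
    print(slow_segments(samples))  # [('mobile', 'safari')]
```

In this example the mobile Safari segment has a p75 of 8 seconds and is reported, while desktop Chrome sits at 2.5 seconds and stays under the budget.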
I’ve definitely experienced situations where certain device types and browsers have prolonged load times and we’ve had to engineer device-specific fixes, especially now that most brands' traffic is coming from mobile devices that are less performant than desktops and tablets.

Errors
Errors will always occur on the site, but I really need to be aware of any systemic flaws that are blocking users from viewing content, using features, and making purchases. Are we getting malformed requests or behaviors in the request that don’t make sense? I check 8 to 10 error rates that capture key user requests and Tier 1 services. Then I check error message patterns to see any underlying issues.

I just described the site reliability metrics I pay the most attention to as CTO. Next, I describe some of the “outer ring” metrics I pay the most attention to.

Metrics the engineering team supports but does not own
While platform reliability and performance is my team’s responsibility, there are several other metric areas where my team shares some degree of accountability with other teams. I check these metrics next. This will vary slightly from one e-commerce firm to another and from one CTO to another. Here’s a peek at what matters to me at Dollar Shave Club.

CTO + Digital Product: On-site user behavior metrics (Web Analytics)
As Dollar Shave Club’s CTO, I manage our software development lifecycle. I spend a lot of time dealing with outstanding tickets, defects, and building out new features and functionality. However, the Digital Team, headed up by a peer of mine, establishes the roadmap and vision for the content and functionality of the site. My team implements that vision. So, the look, feel, and functionality of the site is a partnership between my team and the Digital Team.

Our collective goal is to move users seamlessly through the site towards a purchase.
This is all about pulling the right levers through tests, design, content, and features to increase AOV [average order value] and conversion rates. Many people associate this layer of monitoring with web analytics. Here are some of the metrics that matter to me in this layer. If any of these metrics deviate from expectations, there may be a technical issue (but not necessarily).

Time on site
This is active browsing time on the site.

Bounce rate
This is the percentage of visitors who land on a page and leave the site without proceeding to another page on the site.

Number of pages visited
This is the number of pages visited per session.

Product/content views
This tracks what users are looking at and how often.

Cart adds/removals
This tracks products added to and removed from carts across experiences.

Time to conversion
This is the amount of time it took for the user to make a purchase after landing on the site.

Abandonment rate
This is the percentage of site visitors who left the site without making a purchase. It’s also important to ask: are people abandoning the site in particular places? This will help us reduce leakage and keep users moving towards a purchase.

A/B Testing Groups
We are typically running multiple feature and content experiments across the site. Comparing the metrics above by testing group is key.

All of these metrics are a level up from the purely technical because they involve UX, content, and design. We’re constantly adding new site instrumentation to capture user behavior data. Every time the company comes up with a new theme or campaign, there’s work to do. Also, new features that my team is responsible for developing constantly impact these metrics.

CTO + Marketing Partnership: Upper marketing funnel metrics
The upper marketing funnel takes place largely off the site, and is primarily owned by the Marketing Team.
However, the Technology Team supports the Marketing Team in two main ways: setting up robust instrumentation to capture data, and supporting marketing analytics platforms.

SETTING UP INSTRUMENTATION AND MAINTAINING DATA PIPELINES
As I explained above, it’s critical to ensure our data pipelines back to referrers like Google, Facebook, and other large drivers of site traffic are operational. Any number of things can disrupt reporting back to ad platforms. The two most common issues are:

Misfiring pixels
The JavaScript we load in the browser from a site like Facebook isn’t working or running correctly.

Misconfigured pages
Deployed pages on our site, for example new landing pages, aren’t reporting accurately.

The way to find problems with reporting back to referrers is to set up robust alerts in your web analytics platform. For example, if I’m serving ad impressions but not converting at normal or expected rates, there’s likely something wrong with the site or the instrumentation. I would immediately want to stop spending money on ads until the problem is resolved. If nothing else catches the problem, Google and Facebook will eventually stop serving ads and driving traffic to the site, but I’d prefer to know well before that happens.

ANALYTICS: MULTI-TOUCH ATTRIBUTION
Assuming everything is reporting back as designed, we still have a lot of work to do to help optimize our ad spend through analytics. One of the toughest and most elusive things to capture in the e-commerce space is “multi-touch attribution” (MTA). This is about understanding what it costs to get someone to our site. It’s the holy grail for marketing when we can truly understand what impressions visitors saw on Facebook, organic search, and paid search, and how they got to the site. Did this person see a billboard, TV commercial, or Facebook ad? Where did they come from, and how much did we spend to get them to our site?
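To make the attribution question concrete, here is a deliberately simple Python sketch of one rule-based model, linear attribution, in which each converting customer's credit is split evenly across the touchpoints in their journey; dividing each channel's spend by its attributed conversions then gives a rough cost per acquisition. The channel names, spend figures, and the choice of a linear model are illustrative assumptions, not Dollar Shave Club's method. Real MTA also has to stitch vendor data together and cope with impressions (a billboard, a TV spot) that can't be tied to an individual user.

```python
from collections import defaultdict

# Ordered marketing touchpoints for one converting customer (hypothetical data)
Journey = list[str]


def linear_attribution(journeys: list[Journey]) -> dict[str, float]:
    """Split each conversion's single unit of credit evenly across its touchpoints.

    Credit is assigned per touchpoint, so a channel that appears twice in
    one journey receives two shares.
    """
    credit: dict[str, float] = defaultdict(float)
    for touches in journeys:
        if not touches:
            continue  # no known touchpoint: nothing to attribute
        share = 1.0 / len(touches)
        for channel in touches:
            credit[channel] += share
    return dict(credit)


def cost_per_conversion(spend: dict[str, float],
                        credit: dict[str, float]) -> dict[str, float]:
    """Channel spend divided by attributed conversions (channels with credit only)."""
    return {ch: spend[ch] / credit[ch] for ch in spend if credit.get(ch, 0.0) > 0}


if __name__ == "__main__":
    journeys = [
        ["facebook", "paid_search"],
        ["facebook"],
        ["tv", "facebook", "paid_search", "email"],
    ]
    spend = {"facebook": 350.0, "paid_search": 150.0, "tv": 1000.0}
    credit = linear_attribution(journeys)
    print(credit)
    print(cost_per_conversion(spend, credit))
```

Linear attribution is only one of several rule-based models (first-touch, last-touch, and position-based are common alternatives); swapping the rule changes how much each channel appears to cost.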
This is a complicated instrumentation and analytics exercise. It is the responsibility of the digital and marketing teams to drive this work, but my team supports them technically by stitching all the vendor data together with our on-site analytics. I use this to closely monitor where traffic is coming from, such as paid ads, organic (direct) search, or our social and email campaigns.

CTO + Operations Partnership: Post-funnel metrics
My job isn’t over when the customer converts on the site. I also need to make sure we’re performing after the order is taken and doing a proper handoff to fulfillment. That means making sure orders are processed and flow to the warehouse, and that credit cards are billed properly and on time. For a subscription business, we process orders for customers automatically on a recurring basis through large batch jobs that run every night. Here are the primary metrics I track in this area:

Orders sent to warehouse (expected and actual)
I pay close attention to the number of orders I expect to send to the warehouse, and the number actually received by the warehouse. If we miss a day, that’s bad because people are waiting idle at the warehouse one day and have a huge backlog the next day (sometimes requiring extra resources to get everything shipped out).

Payments processed (expected and actual)
Because we are a subscription business, there are some unique things I pay attention to. The worst day of the year for us is February 28, because we need to process payments and orders for all subscriptions for the 28th, 29th, 30th, and 31st. Our subscribers expect to be charged and to receive email confirmations consistently each month. If we start moving the billing day around, that creates a poor customer experience. I pay close attention to the number of payments I expect to process, the number actually processed, and the number that failed to process.
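The February 28 crunch comes from subscriptions whose billing day doesn't exist in a short month. A minimal Python sketch of one way to work out which subscriptions a nightly batch should charge, and then to reconcile expected, processed, and failed counts, assuming each subscription stores a day-of-month billing anchor and that a missing day rolls back to the month's last day. The function names and data shapes are hypothetical.

```python
import calendar
import datetime as dt


def billing_date(anchor_day: int, year: int, month: int) -> dt.date:
    """Date a subscription anchored on `anchor_day` is charged in a given month.

    Anchor days that don't exist in the month (e.g. the 30th in February)
    roll back to the month's last day.
    """
    last_day = calendar.monthrange(year, month)[1]  # (weekday, days_in_month)
    return dt.date(year, month, min(anchor_day, last_day))


def expected_for_run(anchors: dict[str, int], run: dt.date) -> set[str]:
    """IDs of subscriptions the nightly batch for `run` should charge."""
    return {
        sub_id
        for sub_id, day in anchors.items()
        if billing_date(day, run.year, run.month) == run
    }


def reconcile(expected: set[str], processed: set[str], failed: set[str]) -> dict[str, int]:
    """Compare what should have been charged with what the batch reported."""
    return {
        "expected": len(expected),
        "processed": len(processed & expected),
        "failed": len(failed & expected),
        # neither charged nor recorded as failed: dropped without a trace
        "missing": len(expected - processed - failed),
    }


if __name__ == "__main__":
    anchors = {"a": 28, "b": 29, "c": 30, "d": 31, "e": 15}
    due = expected_for_run(anchors, dt.date(2023, 2, 28))
    print(sorted(due))  # ['a', 'b', 'c', 'd']
    print(reconcile(due, processed={"a", "b"}, failed={"c"}))
    # {'expected': 4, 'processed': 2, 'failed': 1, 'missing': 1}
```

Under this roll-back rule, a non-leap-year February 28 run picks up subscriptions anchored on the 28th through the 31st; in a leap year, the 29th through the 31st land on February 29 instead.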
Then we need to dig into the number that failed to process and find out why. Holidays add similar complexity to our operations. Any issues with order or payment processing have ripple effects for fulfillment, operations, and customer service.

Business performance metrics where technology has an impact
The technology team plays a key role supporting business performance metrics, but so does virtually every other team. These metrics are lagging indicators of technology issues: I do not want to see technology issues reflected in business metrics. Technology issues should be detected well before they materially affect the business. If we do our part in the areas I described above, the business performance metrics will follow. Here are the top business metrics the tech team ultimately supports:

Average order value (AOV)
The average dollar value of orders in a given time period. Numerous technology issues can affect AOV, from site availability to site speed to errors to payment processing.

Order rate/Ship rate (orders per hour/day)
AOV tells us that we’re making money. Orders per hour/day tell us whether we’re making money at the rate we should be. Again, numerous technology problems can slow down our order rate.

Conversion rate
The percentage of users who complete a certain action (usually a purchase). In my experience, conversion rate is tricky to talk about because it depends on the team and the context. A marketer might care about the conversion rate of a single email. Others might care about the conversion rate of all site visitors (and even then there are different methods: some people include bounces, others do not). The key is to quickly get on the same page about which version of conversion rate you’re talking about and what assumptions you’re making.

Churn/cancellation rate
We closely monitor the rate of customer cancellations and try to correlate cancellations to root causes.
Technology issues leading to poor customer experiences could very well be a cause of churn if your recurring processes and systems are not reliable and predictable.

Return rate
We closely monitor the order return rate and try to correlate returns to root causes. Technology issues can sometimes interfere with getting an order right.

Technology problems create friction that will drive these business metrics in the wrong direction. This is why I’m obsessive about my platform reliability and performance metrics.

Conclusion
CTOs from e-commerce and retail firms swim in a sea of business and system health data. These leaders are ultimately responsible for very granular technical performance metrics (like disk space), but they also play a vital role supporting key business performance metrics (like conversion rate). That’s why I wrote this article: to offer what I think is an efficient framework for thinking about and prioritizing metrics, so that e-commerce CTOs can check what matters in 10 minutes, and then go about their days.

Datadog empowers organizations to easily monitor and secure their cloud-scale infrastructure and applications. Our SaaS-based observability platform provides real-time visibility into high-scale, dynamic IT environments. With 600+ out-of-the-box integrations, Dev, Sec, and Ops teams can simplify and automate their cloud operations, securely deliver software, and ensure exceptional digital experiences. Join the thousands of companies worldwide who trust Datadog to accelerate their go-to-market efforts and stay ahead of the competition.