Apigee Monitoring and Operations Strategy
<Company Name>
©2014 Apigee. All Rights Reserved.
The purpose of this document is to:
Describe key elements of Apigee’s monitoring and operations.
Define the specific strategy for monitoring and operations for your API program.
This document provides high-level information on what needs to be monitored and what needs to be done to define your
monitoring solution, and describes where the Apigee support model and support processes fit in.
The following topics are covered:
Monitoring Roles
Description of the split of monitoring responsibilities between Apigee and the Customer.
Monitoring Strategy
High-level description of the approach the Customer will be taking for their specific
monitoring implementation.
Operations Strategy
High-level descriptions of the Customer’s operational teams and processes.
Operational Support
High level overview of Apigee’s support processes and a framework for defining your
Further Reading
Additional reference information.
Hierarchy of Monitoring
Core concepts of monitoring and operations along with key definitions.
General Monitoring Strategy – Best Practices
Collection of high-level core monitoring and alerting approaches and principles.
Monitoring Roles
The level of detail required for Monitoring and Operations depends on whether the Customer is running Apigee on-premise or using the hosted Apigee solution.
Some of the sections in this template are divided into:
On-Premise specific content
Hosted specific content
Generic content applicable to both hosted and on-prem.
When completing this document, delete the headers / content that are not applicable.
On-Premise Specific:
Monitoring and Operations for Apigee on-premise requires implementing appropriate controls covering the infrastructure and
run-time components of Apigee (for example: servers, Cassandra, ZooKeeper, Postgres, load balancer, router, etc.) in addition
to API-level monitoring. A highly detailed operations guide exists for this and will be made available to help you
implement appropriate monitoring.
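As an illustration of an infrastructure-level check, the minimal Python sketch below tests whether each run-time component accepts TCP connections. It is not part of any Apigee tooling: the host names are hypothetical placeholders, the ports are the usual upstream defaults for Cassandra (CQL), ZooKeeper and Postgres and should be confirmed against your installation, and a successful connection only shows that a port is open, not that the component is healthy.

```python
import socket


def tcp_reachable(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the host and connects; the socket is closed on exit.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS resolution failures.
        return False


# Hypothetical inventory: host names are placeholders and the ports are the
# usual upstream defaults; confirm the real values for your installation.
COMPONENTS = {
    "cassandra": ("cassandra-1.example.internal", 9042),
    "zookeeper": ("zookeeper-1.example.internal", 2181),
    "postgres": ("postgres-1.example.internal", 5432),
}


def check_components(components: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Map each component name to whether its port accepted a connection."""
    return {name: tcp_reachable(host, port) for name, (host, port) in components.items()}
```

In practice such checks would run on a schedule alongside deeper, component-specific health checks, with failures feeding your alerting.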
You will be responsible for monitoring, alerting and responses to alerts on all aspects of the Apigee stack, from the base
infrastructure that Apigee software runs on, through the API proxies, Developer Portal and any other Apigee components all
the way to your API program service.
Apigee will play a limited role in your day-to-day operations, but will be available to provide additional remote support in the
event that you experience an issue which may require our assistance.
Hosted Specific:
When utilizing the hosted Apigee solution, you do not need to monitor the infrastructure and run-time components of Apigee.
The Apigee Global Support Center (GSC) will monitor and manage the operations of the Apigee infrastructure and run-time
components and play an active role in day-to-day operations. If you experience a service issue, the Apigee Support Team will
become involved in investigating and addressing it.
You will need to integrate Apigee Support Processes into your existing Support Processes.
Common: API Monitoring
A well-structured API monitoring capability is key to ensuring that your service is working and, in the event of service issues,
to rapidly identifying where the issue is so that you can focus your efforts on that area.
API monitoring can be implemented in a variety of ways but a key objective should be to define a monitoring strategy that
enables rapid isolation of the source of any issues. The remainder of this document provides a framework for doing that by
describing what needs to be monitored and describing Apigee’s role in operations.
Monitoring Strategy
Identify the specific types of monitoring that will be utilized for the Customer’s specific APIs.
Passive End to End Service Monitoring:
What, if any, passive monitoring will be utilized / which APIs will be monitored and what are the key specific metrics being
monitored?
Active End to End Service Monitoring:
What, if any, active monitoring will be utilized / which APIs will be monitored, where and how will traffic be generated and what
are the key specific metrics being monitored?
Back-End Service Monitoring:
What specific independent back-end service monitoring will be in place?
Threshold Definition and Management:
What are the key thresholds?
Alerting Management:
What mechanisms are going to be used to generate alerts, e.g. email, SMS, visual displays? What, if any, alert aggregation
will be utilized to apply prioritization rules to alerts?
Operations Strategy
Identify the Customer’s operational teams and processes.
Customer’s Operations Team Overview:
Description of operations team, including geographic region locations?
Customer’s Operational Processes:
High-level summary of core operational processes?
Integration with Apigee:
Description of how the customer’s operational teams will utilize Apigee Support, process changes and identification of any
training needs on how Apigee Support works?
Apigee Support
Brief overview of Apigee Support and how to integrate it into your existing support processes. There are two views
expressed here, one that is relevant to On-Prem Customers and one for Hosted Customers. Delete whichever is not applicable.
As an on-premise Customer, you will be responsible for the day-to-day operations and monitoring of all aspects of your Apigee
solution. The on-premise monitoring guide document provides you with all the details you need to get set up.
In the event that you experience an issue with your API service, the ideal first step is to review your monitoring and isolate
the source of the issue (e.g. back-end server offline, rogue app creating a traffic spike, Apigee gateway failure, etc.). If you
suspect that the issue is related to your Apigee software and you are unable to diagnose it further, then please engage Apigee
Support. The nature of on-premise deployments typically means that it is difficult or impossible for Apigee Support Engineers
to log in directly to your Apigee servers. You will need to be aware of these access restrictions and utilize remote-sharing
technologies as required so that Apigee Support Engineers can review your installation.
As a hosted Customer, you can rely on Apigee to constantly monitor all cloud instances, detect most service-impacting issues
automatically and initiate response protocols to resolve service incidents rapidly.
In the event that you experience an issue with your API service, the first step you should take is to review any existing
monitoring along with Apigee Analytics and attempt to isolate the source of the service issue. If you are unable to determine
the source and / or you determine that an Apigee component is not functioning, please engage Apigee Support.
Please refer to the following Further Reading material for details on the Apigee Support Processes including Service Levels
and Escalations.
Further Reading
Additional details are available on our website: apigee.com.
Landing page for a wealth of best practices, strategies, training materials and
Collection of videos covering multiple topics related to getting up and running
with Apigee. Of particular relevance is a video “Introduction to Support”.
Detailed support level agreement.
Product documentation.
APPENDIX: Hierarchy of Monitoring
Simplified hierarchy of monitoring.
There is no universally applicable monitoring tool or process. End-to-end service is delivered by multiple layers of
software and hardware that all work together but have very different monitoring and alerting solutions.
Your current IT systems will be operating over an existing infrastructure and software, potentially comprising a mix of third-party
hosted elements, open source elements, custom in-house elements and an uncontrolled wider developer community.
The picture below presents a simplified way of abstracting that complexity by decomposing the end-to-end journey into three layers:
The Consumption layer, illustrated by the Client(s) and / or App(s) that consume your services
The Digital Enablement layer, Apigee, providing API management, analytics, monetization, app services and insights
The Exposure layer, illustrated by the Back-End Services, that expose your internal data
In turn, each of those layers is itself composed of a series of layers:
Infrastructure, the physical and virtual compute layers
Run Time, the utility software, comprising operating systems, databases, web servers, etc.
Services, the consumable software, comprising back-end services, Apigee services, clients & end-user apps
Each layer serves a unique role in your end-to-end service, and each layer has different key operating metrics that are used to
determine whether it is functioning as expected. Metrics from monitoring at each layer are combined to provide an overall view
of the health of your end-to-end service.
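One way to picture how per-layer metrics combine is the minimal Python sketch below. It assumes your own tooling has already reduced each (tier, layer) cell of the hierarchy to a simple health status; the `Health` levels, the `roll_up` function and the rule of listing the lowest failing layers first are illustrative assumptions, not an Apigee feature.

```python
from enum import IntEnum


class Health(IntEnum):
    HEALTHY = 0
    DEGRADED = 1
    DOWN = 2


# The three end-to-end tiers and the three layers within each tier.
TIERS = ("Consumption", "Digital Enablement", "Exposure")
LAYERS = ("Infrastructure", "Run Time", "Services")


def roll_up(status: dict[tuple[str, str], Health]) -> tuple[Health, list[tuple[str, str]]]:
    """Combine per-layer health into an overall status plus a list of suspect cells.

    Missing cells are treated as HEALTHY. Suspects are ordered from the lowest
    layer (Infrastructure) upward, on the assumption that a fault lower in the
    stack often explains symptoms seen higher up.
    """
    overall = max(status.values(), default=Health.HEALTHY)
    suspects = [
        (tier, layer)
        for tier in TIERS
        for layer in LAYERS
        if status.get((tier, layer), Health.HEALTHY) > Health.HEALTHY
    ]
    suspects.sort(key=lambda cell: LAYERS.index(cell[1]))  # stable: tier order kept within a layer
    return overall, suspects
```

With a back-end infrastructure outage and a degraded Apigee service layer, `roll_up` reports the overall service as `DOWN` and lists `("Exposure", "Infrastructure")` first, which is where you would start isolating the issue.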
Great monitoring enables you to prioritize issues and trigger alerts when preset thresholds are breached and, in doing so, to predict
issues and take proactive action to resolve them before they threaten service. In the event that an issue becomes service-affecting,
great monitoring can make the difference in enabling rapid identification of the source of the service issue and in turn shorten
the duration of the service issue.
APPENDIX: General Monitoring Strategy – Best Practices
Decomposing your monitoring to enable the combination of overall service monitoring and isolation of components.
The most important aspect of monitoring is measuring service in the same way that Client(s) and / or App(s) are consuming it
or observing how those Client(s) and / or App(s) are currently consuming it, i.e. real-time usage analytics. This is your front line
in knowing whether your API is working.
Passive vs. Active Monitoring
With Passive monitoring, you are watching the current usage patterns and looking out for discrepancies, for example a
sudden drop or spike in traffic volumes, or a sudden increase in error rates or latency. Apigee Edge comes with a
sophisticated Analytics capability that enables creation of highly customized, near real-time passive monitoring.
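A minimal sketch of passive monitoring, assuming your analytics (for example Apigee Analytics or your own log pipeline) can already give you per-window request counts, error counts and p95 latency. The `Window` type, the `detect_discrepancies` function and every default tolerance are hypothetical choices for illustration. The function compares the current window against a non-empty baseline of recent windows and reports which kinds of discrepancy it sees; the `min_requests` guard reflects passive monitoring's dependence on adequate sample sizes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Window:
    """Aggregated traffic for one monitoring window (e.g. one minute)."""
    requests: int
    errors: int
    p95_latency_ms: float


def detect_discrepancies(
    baseline: list[Window],
    current: Window,
    volume_tolerance: float = 0.5,  # flag volume more than 50% away from the baseline mean
    error_factor: float = 2.0,      # flag an error rate above 2x the baseline rate...
    error_floor: float = 0.01,      # ...that is also above 1%
    latency_factor: float = 2.0,    # flag p95 latency above 2x the baseline mean
    min_requests: int = 100,        # skip the error-rate check on too small a sample
) -> list[str]:
    """Return the kinds of discrepancy found in `current` relative to `baseline`."""
    findings = []

    base_volume = mean(w.requests for w in baseline)
    if base_volume and abs(current.requests - base_volume) / base_volume > volume_tolerance:
        findings.append("traffic volume")

    total = sum(w.requests for w in baseline)
    base_error_rate = sum(w.errors for w in baseline) / total if total else 0.0
    if current.requests >= min_requests:
        current_error_rate = current.errors / current.requests
        if current_error_rate > max(base_error_rate * error_factor, error_floor):
            findings.append("error rate")

    base_latency = mean(w.p95_latency_ms for w in baseline)
    if current.p95_latency_ms > base_latency * latency_factor:
        findings.append("latency")

    return findings
```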
With Active monitoring, you are injecting specific requests that mimic the requests that Client(s) and / or App(s) are making
and then you are measuring the responses.
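A minimal active-monitoring probe, using only the Python standard library. It sends one synthetic GET request, records the status code and latency, and judges the result roughly as a consuming app would. The endpoint you point it at, the 5-second timeout and the 1,000 ms latency budget are assumptions to replace with your own values.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeResult:
    ok: bool
    status: Optional[int]  # None when no HTTP response was received
    latency_ms: float


def probe(url: str, timeout_s: float = 5.0, max_latency_ms: float = 1000.0) -> ProbeResult:
    """Send one synthetic GET request and judge the response."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include body transfer time in the measurement
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses are raised as HTTPError
    except OSError:
        status = None  # URLError, timeouts and connection failures
    latency_ms = (time.monotonic() - start) * 1000
    ok = status is not None and 200 <= status < 300 and latency_ms <= max_latency_ms
    return ProbeResult(ok, status, latency_ms)
```

You would run such a probe on a schedule, ideally from more than one network location, against a representative API call (a hypothetical example would be `https://api.example.com/v1/status`), and feed failed results into your alerting.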
The advantage of Passive monitoring is that you are not adding transactional load or overhead to your system. The
disadvantage is that it depends on establishing usage patterns, which in turn depends on having sufficient traffic
volumes to provide large enough sample sizes. Conversely, if your traffic volumes are very high, you may see aggregation
lags while large volumes of data are analyzed, and you may not detect an issue immediately.
The advantage of Active monitoring is that you know what your ‘uncertainty’ window is: it is the duration between two
monitoring requests, and you can tune the request frequency so that you get rapid detection of service issues. The disadvantages are
that you need to invest in custom load-generating solutions, you are unlikely to be able to generate traffic from every potential
source, and the load that you generate can mask real traffic patterns from which you could derive useful business insights.
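The uncertainty window can be made concrete with a little arithmetic. This sketch assumes fixed-interval probes and an alert rule that fires after a chosen number of consecutive failures; it extends the simple "time between two requests" definition by also counting the probe timeout. The function names are illustrative.

```python
def worst_case_detection_s(interval_s: float, timeout_s: float, failures_to_alert: int = 1) -> float:
    """Upper bound on the time from outage start to alert for fixed-interval probes.

    If the outage begins just after a successful probe, the first failing probe
    is sent up to one interval later; each further required failure adds another
    interval, and the last failing probe can take up to timeout_s to fail.
    Assumes timeout_s is shorter than interval_s.
    """
    return failures_to_alert * interval_s + timeout_s


def probes_per_day(interval_s: float, locations: int = 1) -> int:
    """Synthetic requests per day added by probing from `locations` sources."""
    return int(86_400 // interval_s) * locations
```

For example, probing every 60 seconds with a 10-second timeout detects an outage within 70 seconds when a single failure raises an alert; requiring three consecutive failures (to suppress one-off blips) stretches that to 190 seconds. Probing from three locations at that interval adds 4,320 requests per day.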
A third alternative is a hybrid of Active and Passive monitoring, where you utilize both and tune them to achieve a balance
between load and detection rate.
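One simple way to strike that balance, sketched here as an assumption rather than a prescribed method, is to send a synthetic probe only in windows where organic traffic is too thin for passive statistics to be trustworthy. The `hybrid_schedule` function and the `min_sample` value of 100 requests per window are hypothetical.

```python
def hybrid_schedule(organic_counts: list[int], min_sample: int = 100) -> list[bool]:
    """For each monitoring window, decide whether to send a synthetic probe.

    Windows with at least `min_sample` organic requests are left to passive
    monitoring; quieter windows get an active probe to keep detection fast.
    """
    return [count < min_sample for count in organic_counts]


# Example: organic request counts for six consecutive windows.
counts = [5, 40, 800, 1200, 950, 60]
schedule = hybrid_schedule(counts)  # [True, True, False, False, False, True]
```

Here only the three low-traffic windows receive a probe, halving the synthetic load compared with probing every window.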
As such, there is no single ideal monitoring solution for end-to-end service, and you will need to work through the pros and
cons of different options to arrive at the one that best meets your needs.
Utilizing Monitoring to Drive Alerts by Setting Thresholds
The next important aspect of your monitoring strategy is how you utilize the metrics you are monitoring. Even simple end-to-end
services have multiple components, each of which could be generating large amounts of monitoring metrics. You can very
rapidly become overwhelmed with data.
Defining thresholds is an important step in taming the data. Thresholds let you define what sort of alert is raised
when a measure starts to breach them.
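A minimal sketch of two-level thresholds, assuming higher values are worse for each metric. The metric names and the warning and critical values in `THRESHOLDS` are hypothetical; real values should come from your own baselines and service levels.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Threshold:
    metric: str
    warning: float   # reaching this value raises a WARNING alert
    critical: float  # reaching this value raises a CRITICAL alert


def evaluate(threshold: Threshold, value: float) -> Severity:
    """Classify a metric value against its warning and critical thresholds."""
    if value >= threshold.critical:
        return Severity.CRITICAL
    if value >= threshold.warning:
        return Severity.WARNING
    return Severity.OK


# Hypothetical thresholds for illustration only.
THRESHOLDS = [
    Threshold("p95_latency_ms", warning=500, critical=1000),
    Threshold("error_rate_pct", warning=1.0, critical=5.0),
]
```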
Smart Alerting – Building Intelligence into Alert Prioritization
The final aspect is building an intelligent, rule-based alert-processing system. Here, you identify which combinations of alerts
need to trigger immediate action, delayed action, or simply an informational notice. For example, if a single disk fails in a
multi-disk storage array, you may want to trigger a procurement request and a low-priority support request for an engineer to
replace the faulty disk, but there is no service issue. However, if multiple disks in the same array fail, the priority of the fix
and the urgency of the procurement increase. Smart, rule-based alert post-processing can help ensure focused and appropriate responses.
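The disk example can be expressed as a small rule set. This Python sketch is illustrative only: the `Alert` fields, the action strings and the one-versus-several rule are assumptions that mirror the example, and a production system would typically express such rules in your alert-management tooling.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    kind: str   # e.g. "disk_failed"
    array: str  # storage array the disk belongs to
    disk: str


def prioritize_disk_alerts(alerts: list[Alert]) -> dict[str, list[str]]:
    """Turn raw disk-failure alerts into per-array actions."""
    # De-duplicate repeated alerts for the same disk before counting.
    failed_disks = {(a.array, a.disk) for a in alerts if a.kind == "disk_failed"}
    failed_per_array = Counter(array for array, _disk in failed_disks)

    actions: dict[str, list[str]] = {}
    for array, count in failed_per_array.items():
        if count == 1:
            # Redundancy still covers the failure: no service issue yet.
            actions[array] = ["procurement request", "low-priority support request"]
        else:
            # Several failures in one array: redundancy is eroding, so escalate.
            actions[array] = ["expedited procurement request", "high-priority support request"]
    return actions
```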
About Apigee
Apigee empowers enterprises to thrive in today’s digital world by transforming digital assets into innovation engines. Results
are delivered as apps, optimized by data and catalyzed by APIs. Hundreds of companies including Walgreens, eBay, Shell,
Bechtel, Marks & Spencer and Vodafone partner with Apigee to accelerate their digital transformations.
To learn more, visit www.apigee.com
©2013 Apigee. All Rights Reserved.