Cloud Native NetflixOSS Services on Docker

Andrew Spyker (@aspyker)
Sudhir Tonse (@stonse)
Agenda
• Introduction
– NetflixOSS, Cloud Native with Operational Excellence, and IBM Cloud Services Fabric
• Docker Local Port
• Docker Cloud Port
About Andrew
• IBM – Cloud Performance Architecture and Strategy
• How did I get into cloud?
– Performance led to cloud scale, which led to cloud platforms
– Created the Mobile/Cloud Acme Air sample application
– Cloud platforms led to NetflixOSS, and to winning the Netflix Cloud Prize for best sample application
– Also ported Acme Air to IBM Cloud – SoftLayer
– Two years focused on IBM Cloud Services Fabric and Operations
• RTP dad who enjoys technology as well as running, wine and poker
@aspyker
ispyker.blogspot.com
About Sudhir
• Manages the Cloud Platform Infrastructure team at Netflix
• Many of the components his team builds have been open sourced under the NetflixOSS umbrella
• Sudhir is a weekend golfer and tries to make the most of the wonderful California weather and public courses
NetflixOSS on GitHub
• NetflixOSS is what it takes to run a cloud service and business with operational excellence
• netflix.github.io
– 40+ OSS projects
– Expanding every day
• Focusing more on interactive mid-tier server technology today
NetflixOSS Categorized
[Diagram: NetflixOSS projects grouped by category, from application services down to the IaaS layer]
NetflixOSS – Application Container/Services
[Diagram: an app instance composed of the libraries below, using IPC (smart LB) against Eureka server(s) and peer servers for service requests, data access/caching against Cassandra, config/insights, and the Hystrix and metrics dashboards for monitoring]

Function – NetflixOSS Library
• REST framework/bootstrapping/DI – Karyon/Governator
• Functional reactive programming – RxJava (sketch below)
• Resiliency/fallback – Hystrix
• RPC (routing/LB) – Ribbon/Eureka
• Distributed co-ordination (ZooKeeper) – Curator
• Distributed caching – EVCache
• NoSQL (Cassandra) persistence – Astyanax
• Monitoring – Turbine
• Metrics – Servo
• Logging – Blitz4J
• Properties/configuration – Archaius
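
To give a flavor of the functional reactive programming row, here is a minimal RxJava sketch (assuming the RxJava 1.x API; the stream contents and names are illustrative, not from any Netflix code):

    import java.util.Arrays;
    import rx.Observable;
    import rx.schedulers.Schedulers;

    public class RxJavaSketch {
        public static void main(String[] args) throws InterruptedException {
            // Compose an asynchronous pipeline over a stream of customer ids.
            Observable.from(Arrays.asList("cust-1", "cust-2", "cust-3"))
                .map(id -> "booking-for-" + id)      // transform each emitted item
                .subscribeOn(Schedulers.io())        // run the pipeline off the calling thread
                .subscribe(
                    booking -> System.out.println("got " + booking),
                    error -> System.err.println("failed: " + error));

            Thread.sleep(500); // give the async pipeline time to finish in this toy example
        }
    }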
Elastic, Web and Hyper Scale
[Slide contrasts two images: "Doing This" vs. "Not Doing That" (Source: Programmableweb.com, 2012)]
Elastic, Web and Hyper Scale
[Architecture diagram: load balancers in front of a front end API (browser and mobile), backed by booking and authentication services, with temporal caching and durable storage]
Strategy – Benefit
• Make deployments automated – Without automation, this is impossible
• Expose well designed APIs to users – Offloads presentation complexity to clients
• Remove state from mid tier services – Allows easy elastic scale out
• Push temporal state to the client and caching tier – Leverages clients, avoids data tier overload
• Use partitioned data storage – Data design and storage scale with HA
HA and Automatic Recovery
[Slide contrasts two images: "Feeling This" vs. "Not Feeling That"]
Highly Available Service Runtime Recipe
[Diagram: the web app front end executes the auth-service call through Hystrix (with a fallback implementation) and a Ribbon REST client with Eureka awareness; the call is routed to one of the Karyon-based auth-service micro service instances, which register themselves with the Eureka server(s)]
Implementation Detail – Benefits
• Decompose into micro services – Key user path always available; failure does not propagate across service boundaries
• Karyon with automatic Eureka registration – New instances are quickly found; failing individual instances disappear
• Ribbon client with Eureka awareness – Load balances and retries across instances with "smarts"; handles temporal instance failure
• Hystrix as dependency circuit breaker (sketch below) – Allows for fast failure; provides graceful cross service degradation/recovery
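
A minimal sketch of the Hystrix part of this recipe (the class, group, and helper names are illustrative, not taken from the Acme Air code); the run()/getFallback() pattern is standard HystrixCommand usage:

    import com.netflix.hystrix.HystrixCommand;
    import com.netflix.hystrix.HystrixCommandGroupKey;

    // Wraps the remote auth-service call so failures are isolated and degrade gracefully.
    public class AuthServiceCommand extends HystrixCommand<Boolean> {
        private final String token;

        public AuthServiceCommand(String token) {
            super(HystrixCommandGroupKey.Factory.asKey("auth-service"));
            this.token = token;
        }

        @Override
        protected Boolean run() throws Exception {
            // In the real recipe this would be a Ribbon/Eureka-routed REST call to auth-service.
            return callAuthServiceOverRest(token);
        }

        @Override
        protected Boolean getFallback() {
            // Fast failure path when auth-service is slow or failing:
            // e.g. treat the session as unauthenticated instead of erroring the whole page.
            return Boolean.FALSE;
        }

        private Boolean callAuthServiceOverRest(String token) {
            return Boolean.TRUE; // placeholder for the Ribbon client call
        }
    }

    // Usage: new AuthServiceCommand(sessionToken).execute();
    // The command's circuit breaker opens under sustained failure and the fallback takes over.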
IaaS High Availability
[Diagram: global load balancers route into the Dallas region's datacenters (DAL01, DAL05, DAL06); each datacenter runs local LBs, the web app, the auth service, and the booking service, with Eureka and the cluster auto recovery and scaling services spanning the region]
Rule – Why?
• Always > 2 of everything – 1 is a SPOF; 2 doesn't web scale and makes DR recovery slow
• Including IaaS and cloud services – You're only as strong as your weakest dependency
• Use auto scaler/recovery monitoring – Clusters guarantee availability and service latency
• Use application level health checks (sketch below) – An instance on the network != healthy
• The only proof is testing!
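
To make the application level health check rule concrete, here is a minimal sketch using only the JDK's built-in HTTP server (the port and the dependency checks are placeholders, not the NetflixOSS/Karyon health check hook):

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;

    public class HealthCheckEndpoint {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8077), 0);
            server.createContext("/healthcheck", exchange -> {
                // Application-level checks: can we reach dependencies, are caches warm, etc.
                boolean healthy = canReachCassandra() && registeredInEureka();
                byte[] body = (healthy ? "OK" : "UNHEALTHY").getBytes();
                exchange.sendResponseHeaders(healthy ? 200 : 500, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
        }

        private static boolean canReachCassandra() { return true; }  // placeholder check
        private static boolean registeredInEureka() { return true; } // placeholder check
    }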
Chaos Testing
[Diagram: Chaos Gorilla kills an entire Dallas datacenter; the global and local load balancers, Eureka, and the cluster auto recovery and scaling services shift traffic to the surviving datacenters running the web app, auth service, and booking service]
Videos: bit.ly/noss-sl-blog, http://bit.ly/sl-gorilla
Continuous Delivery
[Slide contrasts two images: "Reading This" vs. "Not This"]
[Pipeline diagram: a continuous build server bakes SoftLayer image templates (or AMIs), which are rolled out as cluster v1, canary v2, then cluster v2]
Step – Technology
• Developers test locally – Unit test frameworks
• Continuous build – Continuous build server based on Gradle builds
• Build "bakes" a full instance image – Imaginator (Aminator inspired) creates SoftLayer images
• Developers work across dev and test – Archaius allows for environment based context (sketch below)
• Developers do canary tests and red/black deployments in prod – The Asgard console provides a common app cluster devops approach, security patterns, and visibility
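
The Archaius row is what lets one baked image behave correctly in dev, test, and prod. A minimal sketch of reading an environment-scoped dynamic property (the property name and default are illustrative):

    import com.netflix.config.DynamicPropertyFactory;
    import com.netflix.config.DynamicStringProperty;

    public class EnvironmentAwareConfig {
        // Resolved at runtime from whatever configuration sources Archaius is wired to
        // (local files, environment-specific overrides, a remote config source, ...).
        private static final DynamicStringProperty AUTH_SERVICE_VIP =
            DynamicPropertyFactory.getInstance()
                .getStringProperty("acmeair.authservice.vip", "auth-service-dev");

        public static void main(String[] args) {
            // The value can change without a redeploy; get() always returns the current value.
            System.out.println("auth-service VIP: " + AUTH_SERVICE_VIP.get());
        }
    }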
Operational Visibility
If you can't see it, you can't improve it
[Diagram: the web app and auth service publish Hystrix/Turbine streams and Servo metrics to metric/event repositories, and logs to Logstash/Elasticsearch/Kibana; Uptime-style external monitoring and incident management sit on top]
Visibility Point – Technology
• Basic IaaS instance monitoring – Not enough (not scalable, not app specific)
• User-like external monitoring – SaaS offerings, or OSS like Uptime
• Service to service interconnects – Hystrix streams → Turbine aggregation → Hystrix dashboard
• Application centric metrics (sketch below) – Servo gauges, counters, and timers sent to a metrics store
• Remote logging – Logstash/Kibana
• Threshold monitoring and alerts – Services like PagerDuty for incident management
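
For the application centric metrics row, a minimal Servo sketch (the metric name is illustrative) that registers a counter so a publisher can ship it to the metrics store:

    import com.netflix.servo.DefaultMonitorRegistry;
    import com.netflix.servo.monitor.BasicCounter;
    import com.netflix.servo.monitor.MonitorConfig;

    public class BookingMetrics {
        // A simple counter; gauges and timers follow the same register-then-update pattern.
        private static final BasicCounter BOOKINGS =
            new BasicCounter(MonitorConfig.builder("acmeair.bookings.created").build());

        static {
            // Registered monitors are picked up by whatever publisher ships them to the store.
            DefaultMonitorRegistry.getInstance().register(BOOKINGS);
        }

        public static void recordBooking() {
            BOOKINGS.increment();
        }
    }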
Current IBM Cloud Services Fabric
[Diagram, numbered by component: 1. Eureka; 2. global load balancers and a local LB service; 3. the us-south-1 region, with apps and fabric services clustered across three datacenters (DAL01, DAL05, DAL06); 4. cluster auto recovery and scaling services; 5. the Asgard service for devops; 6. the Imaginator service, which builds tested base images (with agents) plus your built code – currently VM based; 7. the Uptime service; 8. Logstash/Kibana. Your front end and mid tier services run alongside the services they depend on.]
Agenda
• Introduction
• Docker Local Port
– Lessons Learned
– Open Source
• Docker Cloud Port
Demo Start
(start the demo loading here)
Docker “Local” Setup
[Diagram: a "region" (docker-local) with three "datacenters" (docker-local-1a/1b/1c) on a single Docker node; users reach the Acme Air web app and auth service containers through a Zuul load balancer, devops admins use the Asgard console, and the containers are backed by Eureka service discovery, Cassandra, the Microscaler cluster auto recovery and scaling service, Skydock, and SkyDNS; blue and green boxes in the diagram are container instances]
Why Docker for our work?
• Because we could, actually …
– To show the Netflix cloud platform is portable to non-VM clouds
– To help with NetflixOSS understanding inside of IBM
• Local testing – a "cloud in a box" that is more production like
– Developers are able to do larger scale testing
– Continuous build/test systems are able to run at "scale"
• Public cloud support
– To understand how a container IaaS layer could be implemented
• So far a proof of concept – you can help continue it
– More on that later (hint: open source!)
Two Service Location Technologies?
[Diagram: the web app front end executes the auth-service call through a Ribbon REST client with Eureka; the Karyon-based auth service registers itself with the Eureka server(s), while on the Docker host Skydock watches the Docker daemon's event API and registers the container in SkyDNS]
Service Location Lessons Learned
• Both did their job well
– SkyDNS/Skydock for basic container DNS
• Must be careful of DNS caching clients (sketch below)
– Eureka for application level routing
• Interesting to see the contrasts
– Intrusiveness (Eureka requires on-instance/in-app changes)
– Data available (DNS isn't application aware)
– Application awareness (a running container != healthy code)
• Points to the value of "above IaaS" service location registration
– Transparent IaaS implementations struggle to be as application aware
• More information on my blog: http://bit.ly/aws-sd-intr
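
On the DNS caching point: JVM clients can cache successful lookups for a long time (indefinitely when a security manager is installed), which works against SkyDNS-style re-registration when containers move. A small sketch of lowering the JVM's DNS cache TTL (the values are illustrative):

    import java.security.Security;

    public class DnsCacheSettings {
        public static void main(String[] args) {
            // Cache successful DNS lookups for only 5 seconds so containers that move
            // (and get re-registered in SkyDNS) are picked up quickly.
            Security.setProperty("networkaddress.cache.ttl", "5");
            // Do not cache failed lookups for long either.
            Security.setProperty("networkaddress.cache.negative.ttl", "1");

            // ... start the application after these are set, before any lookups happen
        }
    }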
Instance Auto Recovery / Scaling
• Auto scaling serves three important purposes
– Devops cluster rolling versions
– Auto recovery of instances after failure
– Auto scaling due to load
• Various NetflixOSS auto scalers
– For NetflixOSS proper – the Amazon Auto Scaler
– For the SoftLayer port – RightScale Server Arrays
– For the Docker local port – we implemented "Microscaler"
Microscaler Agent Architecture
[Diagram: the Microscaler service accepts REST or CLI requests and drives a Microscaler agent on each Docker host; the agent uses the Docker remote API to start and stop container instances such as Web App i001/i002 and Auth Service i001/i002]
• OSS at http://github.com/EmergingTechnologyInstitute/microscaler
• The Microscaler service and agent run as containers
• Microscaler has a remote CLI client and a REST interface
• Note:
– No IBM support; an OSS proof of concept of the auto scaler needed for local usage
– Works well for small scale Docker local testing
Microscaler CLI/REST usage
• Login CLI:
– ms login --target <API URL> --user user01 --key key
• Login REST:
– curl -X POST -H "Content-Type: application/json" -d '{"user":"user01","key":"key01"}' http://localhost:56785/asgcc/login
– {"status":"OK","token":"a28e7079-db0b-4235-8b9b-01c229e02e9a"}
• Launch Config CLI:
– ms add-lconf --lconf-name lconf1 --lconf-image-id cirros --lconf-instances-type m1.small --lconf-key key1
• Launch Config REST:
– curl -X POST -H "Content-Type: application/json" -H "authorization: a28…e9a" -d '{"name":"mylconf","image_id":"img1","instances_type":"m1.small","key":"keypair"}' http://localhost:56785/asgcc/lconfs
– {"status":"OK"}
• ASG CLI:
– ms add-ms --ms-name asg1 --ms-availability-zones docker01,docker02 --asg-launch-configuration lconf1 --asg-mininstances 1 --asg-max-instances 3 --asg-scale-out-cooldown 300 --asg-scale-in-cooldown 60 --asg-no-loadbalancer --asg-domain docker.local.io
– ms start-ms --ms-name asg1
• ASG REST:
– curl -X POST -H "Content-Type: application/json" -H "authorization: a28…e9a" -d '{"name":"asg1","availability_zones":["az1"],"launch_configuration":"lconf1","min_instances":1,"max_instances":3}' http://localhost:56785/asgcc/asgs
– {"status":"OK"}
– curl -X PUT -H "Content-Type: application/json" -H "authorization: a28…e9a" http://localhost:56785/asgcc/asgs/myasg/start
– {"status":"OK"}
Working with the Docker remote API
• Microscaler and Asgard need to work against the "IaaS" API
– The Docker remote API to the rescue
– Start and stop containers, query images and containers (sketch below)
• Exposed http://172.17.42.1:4243 to both
– Could (should) have used the socket
– Be careful of security once you do this
• Found that this needs to be easily configurable
– Boot2docker and docker.io default to different addresses
• Found that the current API isn't totally documented
– Advanced options not documented or shown in examples
– Open source to the rescue (we looked at the service code)
– Need to work on submitting pull requests for documentation
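
As a concrete example of the kind of call Microscaler and Asgard make, here is a minimal sketch that lists running containers through the Docker remote API endpoint mentioned above (the /containers/json call is part of the documented remote API; error handling is omitted):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ListContainers {
        public static void main(String[] args) throws Exception {
            // Same endpoint exposed to Microscaler and Asgard above.
            URL url = new URL("http://172.17.42.1:4243/containers/json");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON array describing the running containers
                }
            }
        }
    }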
Region and Availability Zones
• Coded Microscaler to assign availability zones
– Via user_data in an environment variable (sketch below)
– Need metadata about deployment in Docker eventually?
• Tested Chaos Gorilla
– Stop all containers in a single availability zone
• Tested Split Brain Monkey
– Jepsen inspired; used iptables to isolate the Docker network
• Eureka awareness of availability zones is not there yet
– Should be an easy change based on the similar SoftLayer port
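
Since the availability zone arrives as an environment variable set from user_data, reading it inside an instance is a one-liner; a small sketch (the variable name is an assumption, not necessarily what Microscaler injects):

    public class AvailabilityZoneInfo {
        public static String availabilityZone() {
            // Variable name is illustrative; Microscaler injects the zone via user_data.
            String zone = System.getenv("AVAILABILITY_ZONE");
            return zone != null ? zone : "unknown";
        }

        public static void main(String[] args) {
            System.out.println("running in zone: " + availabilityZone());
        }
    }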
Image management
• Docker and baked images are kindred spirits
• Using locally built images – easy for a simple demo
• Haven't yet pushed the images to Docker Hub
• Considering an Imaginator (Aminator) extension
– To allow Docker images to be built the same way our VM images are
– Considering http://www.packer.io/
– Or maybe the other way around?
• Dockerfiles for VM images?
Using Docker as an IaaS?
• We do all the bad things
– Our containers run multiple processes
– Our containers use unmanaged TCP ports
– Our containers run and allow ssh access
• Good
– Get all the benefits of Docker containers and images
– Only small changes to CSF/NetflixOSS cloud platform
• Bad
– Might not take full advantage of Docker
• Portability, container process optimizations, composability
• Considering more Docker centric approaches over time
Where can I play with this? All open source today!

# on boot2docker or docker.io under VirtualBox Ubuntu
git clone http://github.com/EmergingTechnologyInstitute/acmeair-netflixoss-dockerlocal
cd bin
# please read http://bit.ly/aa-noss-dl-license
./acceptlicenses.sh
# get coffee (or your favorite caffeinated drink); depending on download speed, ~30 min
./buildsimages.sh
# this is FAST! – but wait about eight minutes for cross topology registration
./startminimum.sh
# route your network from guest to docker network (http://bit.ly/docker-tcpdirect)
./showipaddrs.sh
# look at the environment (Zuul front end, Asgard console, Eureka console, etc.)
Then browse to http://172.17.0.X
Docker “Local” Setup (demo)
[Same docker-local topology diagram as shown earlier; the demo is shown here]
Agenda
• Introduction
• Docker Local Port
• Docker Cloud Port
– Lessons Learned
Docker Cloud on IBM SoftLayer
[Diagram: Docker hosts in the DAL05 and DAL06 datacenters on the SoftLayer private network, each exposing the Docker remote API; the app hosts run a Microscaler agent, Skydock, and Acme Air web app and auth service instances (i001–i004), while another host runs the shared services: SkyDNS, the API Proxy, Asgard, Microscaler, Cassandra, Eureka, Zuul, and a Docker registry]
Networking
• Docker starts the docker0 bridge to interconnect the instances on a single host
• We assigned the bridge's subnet to be a portable subnet on a VLAN within our SoftLayer account
– We routed all traffic to the actual private interface
• This allows the network to work seamlessly
– Between datacenters
– Across hardware firewall appliances
– To external load balancers
– To all other instances (VMs, bare metal) in SoftLayer
• This allowed for easy networking between multiple Docker hosts
Docker API and Multi-host
• Once you have multiple Docker hosts
– You have multiple Docker remote APIs
• Wrote an "API Proxy" to deal with this (illustrative sketch below)
• Not the best solution in the world, but it worked
• Considering how this fits with the existing IaaS API
– A single SoftLayer API handles bare metal and virtual machines
– How to keep the API Docker compatible?
• Maybe other, more Docker centric approaches are coming?
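
The API Proxy itself is not shown in this deck, so the following is only an illustrative sketch of the idea (not the actual implementation): a single front end that forwards Docker remote API calls to one of several Docker hosts. The host addresses and the routing rule are placeholders.

    import com.sun.net.httpserver.HttpServer;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.InetSocketAddress;
    import java.net.URL;

    public class DockerApiProxySketch {
        private static final String[] DOCKER_HOSTS = {
            "http://10.0.0.11:4243", "http://10.0.0.12:4243"  // placeholder host addresses
        };

        public static void main(String[] args) throws Exception {
            HttpServer proxy = HttpServer.create(new InetSocketAddress(4243), 0);
            proxy.createContext("/", exchange -> {
                // Placeholder routing rule: pick a host by hashing the request path.
                // (A real proxy would route based on where the target container lives.)
                String path = exchange.getRequestURI().toString();
                String target = DOCKER_HOSTS[Math.abs(path.hashCode()) % DOCKER_HOSTS.length];

                // Sketch handles only bodyless GET calls such as /containers/json.
                HttpURLConnection conn = (HttpURLConnection) new URL(target + path).openConnection();
                conn.setRequestMethod("GET");

                exchange.sendResponseHeaders(conn.getResponseCode(), 0);
                try (InputStream in = conn.getInputStream();
                     OutputStream out = exchange.getResponseBody()) {
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        out.write(buf, 0, n); // stream the Docker host's response back to the caller
                    }
                }
            });
            proxy.start();
        }
    }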
Image Management
• Currently using the standard Docker private registry
• Considering how this could be integrated with the SoftLayer image management system
– Use its optimized cross datacenter distribution network
– Expose Docker layered versions through the console
• Again, it is important not to lose Docker's value in image transparency and portability
Docker Cloud on IBM SoftLayer (demo)
[Same SoftLayer topology diagram as above]
Demos 1-1 today or tomorrow at Jerry's session
Questions?