Message Driven Architecture for Massive Service
Elastic Scalability, High Availability
2011.11.18
박혜웅

Massive Service: Think different
No good solution for all cases
• Good
  – The design is pretty.
  – There is no cord, so it is convenient.
  – It keeps the ears warm in winter.
• Bad
  – It is heavy on the ears.
  – The connection sometimes drops.
  – The ears sweat in summer.

Cloud Architecture
• Elastic Scalability
  – The system must be able to scale out and in quickly according to load.
• High Availability
  – 99% and 99.999% availability are very different (99% allows about 3.65 days of downtime per year).
    • Availability = service uptime / total time
    • 99.999% (non-stop system)
      – downtime: 26 seconds/month (about 5 minutes/year)
      – e.g. a nuclear power plant
    • Scheduled service maintenance also counts as downtime.
  – Removing every Single Point Of Failure (SPOF) is important.
• Automatic Resource Management
  – An architecture that can scale according to the type of load is needed.
    • Resources: CPU, MEM, Disk...
• Self-healing (automatic recovery)
Excerpted from p. 66 of "클라우드 컴퓨팅 구현 기술" (김형준 et al.)

What we need for Massive Service?
• coupled vs decoupled architecture
  – decoupled architecture
    • distributed coordinator
    • distributed data cache
    • distributed message queue
• systems for removing SPOF
  – for all systems
    • health-checking script
  – for the load balancer
  – for RDBMS/NoSQL
    • Hadoop/HBase dual namenode (next version, 0.23)
    • MySQL cluster, MySQL replication (+ heartbeat) or MySQL multiple-master
• blocking vs non-blocking (synchronous vs asynchronous)
  – blocking (synchronous): easy to code, uses more resources
  – non-blocking (asynchronous): hard to code, uses fewer resources
• multi-thread (single-port) vs single-thread (multi-port)
  – advantages of a single thread
    • cheap server
    • no locking, no synchronization
    • easy to code

What we need for Massive Service?
• low cost
  – money
    • hardware based vs software based
    • commercial software vs free software
  – time
    • development & debugging
    • management
  – human resources
• performance tuning
  – Linux options
    • ulimit, ...
  – JVM options
    • Xms, Xmx, GC option
  – socket options
    • TCP_NODELAY, SEND/RECV_BUFFERSIZE...
  – the number of processes and threads (for each system)
    • stress test
  – RDBMS/NoSQL options

What we need for leaving work on time (칼퇴근)?
• experts for each technical area = DRI (Directly Responsible Individual, as used at Apple Inc.)
  – coding & interface
    • code convention, design pattern, UML
  – DB & storage
    • RDBMS (MySQL, MyBatis)
    • NoSQL (HBase)
    • storage (DAS, NAS, HDFS, Haystack ...)
  – network & threading
    • Java NIO, Netty
  – data analysis
    • MapReduce, machine learning
  – distributed system software
    • coordinator (Zookeeper)
    • cache server (Redis, Memcached, Ehcache)
    • queue server (RabbitMQ, ZeroMQ)
  – util software
    • Google Protocol Buffer, Guice, Log4j, Slf4j, Xstream, Jackson, Java mail, ....
  – system management
    • Linux, monitoring tools, JMX
  – hardware
    • L4 switch

What we need for leaving work on time (칼퇴근)?
• fast & easy development/debugging
  – good architecture
    • system architecture
    • design pattern
    • code convention
  – common util classes
    • Apache Commons, Google Guava,...
  – Test Driven Development (TDD)
    • JUnit
  – well-known system or not?
    • RDBMS vs NoSQL
    • JSON vs Google Protocol Buffer
    • JUnit vs Guice
• easy management
  – logging system
    • logging, collecting, parsing, log visualization
  – JMX
  – Admin/Monitoring tools or web pages

many Kinds of Decoupling
• decoupling (removing) of SPOF and our system
  – Distributed Coordinator
  [Diagram: processes, Coordinator, SPOFs]
• decoupling of business logic and data
  – Distributed Cache
  [Diagram: processes holding logic, with data in the Cache and the DB]
• decoupling of function and control (message)
  – Message Queue
  [Diagram: processes holding functions, exchanging messages through a Queue]

the steps of Decoupling (step1)
• Distributed Coordinator
  – registry: important data (small size)
    • server status
    • server configuration
    • common data
  – removing SPOF from our system
[Diagram: Coordinator (registry); processes, each with functions, data and a registry; DBs]

the steps of Decoupling (step2)
• Distributed Data Cache
  – fast read/write in memory
    • 10 to 100 times faster than a DB query.
  – alleviate DB overload
    • read query: read from the cache instead of the DB.
    • write query: update the DB lazily through a write queue (in effect write-behind rather than strict write-through).
  – remove duplicated data
  – remove the overhead of data synchronization among processes.
  – fault tolerant system
    • it does not matter which process in the same cluster terminates.
[Diagram: Coordinator (registry); a cluster of processes holding only functions, with data in the Cache and the DB]

the steps of Decoupling (step3)
• Distributed Message Queue
  – scale out (elastic scalability)
    • auto scaling by fan-out exchange rule.
    • light-weight processes (daemons).
  – fault tolerant system
    • when all processes have terminated, the message queue server preserves the messages.
    • prevents server overload or failure.
      – but processing becomes lazy (delayed)
  – system monitoring
    • just monitor the queue status.
[Diagram: Coordinator (registry), Cache (data) and DB (data); a cluster of processes holding functions, exchanging messages through a Queue]

Scale Out
[Diagram: separate clusters of nodes for the Coordinator (registry), Cache (data), Queue (message) and DB (data); processes for task #1 and task #2 run on their own nodes and are linked by a work queue; label: n connections]

SEDA vs Message Driven Architecture
[Diagram, SEDA: one process whose threads pass events between functions through in-process Queues and share a data/heap area (global variables), backed by a DB]
[Diagram, MDA: a service made of nodes; processes pass messages between functions through a Queue, share data through the Cache and the DB, and use the Coordinator (registry)]

code of Message Driven Architecture
• simple chatting service (simple client-server based model vs MDA)

    /** Simple Client-Server Model **/
    /* Send Thread */
    myInfo = xml.getInfo(xmlFile);           // from local file
    db.setAlive(myInfo);                     // updates server status
    servers = connectAll(relayServers);      // connects to other servers

    while ((input = client.getInput()) != null) {
        roomInfo = localData.getRoomInfo(client.userId);
        for (userId : roomInfo.getUserIds()) {
            // every server must be asked whether it holds this user
            for (server : servers) {
                if (server.hasUser(userId)) server.send(userId, input);
            }
        }
    }

    /* Receive Thread */
    while (true) {
        message = socket.receive();                  // from other server
        user = localData.getUser(message.userId);    // from local data
        client = getClient(message.userId);
        client.send(user.name + ":" + message.input);
    }
    // => inter-server networking (p2p)

    /** Message Driven Architecture **/
    /* Send Thread (Process) */
    myInfo = Zookeeper.getInfo(zookeeperList, myIp, myPort);
    Zookeeper.setAlive(myInfo);
    queue = Queue.getQueue(myInfo.queue);
    cache = Cache.getCache(myInfo.cache);

    while ((input = client.getInput()) != null) {
        roomInfo = cache.getRoomInfo(client.userId);
        for (userId : cache.getUserIds(roomInfo.no)) {
            // no direct connection to other servers: just publish
            queue.publish(new Message(userId, input));
        }
    }

    /* Receive Thread (Process) */
    while (true) {
        message = queue.consume();                   // from queue
        user = cache.getUser(message.userId);        // from cache
        client = getClient(message.userId);
        client.send(user.name + ":" + message.input);
    }
    // => queueing/dequeuing (work queue)

Summary: developer perspective (Client-Server Based vs Message Driven)
• Division of systems/roles
  – Client-Server Based: per service
  – Message Driven: per function (e.g. API, file, DB, logging, ....)
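• Individual expertise
  – Client-Server Based: business logic (service flow)
  – Message Driven: technical knowledge
• Service development
  – Client-Server Based: done individually
  – Message Driven: done collaboratively
• Up-front development documents (mandatory)
  – Client-Server Based: development can start without them
  – Message Driven: inter-process integration documents are required
    • inter-process interface (queue)
    • shared data scheme (cache)
• Communication within the team
  – Client-Server Based: weak (each person runs their own project)
  – Message Driven: close (most members must agree on a single service)
• Coordination with other departments (PM)
  – Client-Server Based: all developers
  – Message Driven: a few designated people
[Diagram: the planning/marketing team, design team and client team talk to the PM (service manager); the API Part, Logic Part and DB Part are linked by inter-process interfaces (queue) and a shared data scheme (cache)]

Summary: system perspective (Client-Server Based vs Message Driven)
• Inter-server complexity
  – Client-Server Based: very complex (every server must connect to every other server)
  – Message Driven: less complex (connect only to the coordinator, cache and queue)
• Scalability/efficiency
  – Client-Server Based: low (unneeded logic also runs)
  – Message Driven: high (only processes with simple logic run)
• Server update
  – Client-Server Based: hard (only a full patch is possible)
  – Message Driven: easy (the queue server can hold tasks temporarily; processes for the newer version can be started in advance)
• Service failure
  – Client-Server Based: the whole service fails
  – Message Driven: partial failure (depends on the size of the logic)
• Protocol change
  – Client-Server Based: easy (redefine a function)
  – Message Driven: hard (the message scheme must be shared)
• Server status/logging
  – Client-Server Based: per service (per person)
  – Message Driven: centralized (only the queue server needs monitoring/logging)
• Business logic
  – Client-Server Based: any business logic is possible
  – Message Driven: business logic that needs loops or rollback is difficult
• Code complexity
  – Client-Server Based: complex
  – Message Driven: simple (units of simple logic)
• Coding style
  – Client-Server Based: differs per service
  – Message Driven: differs per function
• Thread/Worker Model
  – Client-Server Based: various models coexist within one process
  – Message Driven: a different thread model for each type of process

Appendix: Think deeply
Single-thread vs Multi-thread
• Multi-thread
  – I/O intensive task (blocked task)
  [Diagram: one process, three threads, data, DB]
• Single-thread
  – CPU/Mem intensive task (non-blocked task)
  [Diagram: three processes, each with one thread and its data, MEM]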
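The two thread models in the appendix can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not code from the slides: `ThreadModelSketch`, `runBlockingTasks` and `countOnSingleThread` are hypothetical names, and `Thread.sleep` only stands in for a blocking DB or socket call. The first method runs blocked tasks on a thread pool so that the waits overlap; the second lets a single thread own the in-memory data, so no locking or synchronization is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ThreadModelSketch {

    /** Multi-thread model: blocked (I/O-like) tasks share a pool. Returns elapsed milliseconds. */
    public static long runBlockingTasks(int taskCount, int poolSize, long blockMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        long start = System.nanoTime();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            futures.add(pool.submit(() -> {
                try {
                    Thread.sleep(blockMillis); // assumption: stands in for a blocking DB call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // wait until every task has finished
        }
        pool.shutdown();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Single-thread model: one owner thread mutates the map, so no lock is required. */
    public static int countOnSingleThread(int producers, int incrementsEach) throws Exception {
        Map<String, Integer> counters = new HashMap<>(); // touched only by the owner thread
        ExecutorService owner = Executors.newSingleThreadExecutor();
        ExecutorService producerPool = Executors.newFixedThreadPool(producers);
        List<Future<?>> submitted = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            submitted.add(producerPool.submit(() -> {
                for (int i = 0; i < incrementsEach; i++) {
                    // producers never touch the map; they only hand work to the owner
                    owner.execute(() -> counters.merge("hits", 1, Integer::sum));
                }
            }));
        }
        for (Future<?> f : submitted) {
            f.get(); // all increments are now queued on the owner thread
        }
        producerPool.shutdown();
        // The read is queued behind the increments and also runs on the owner thread.
        Future<Integer> result = owner.submit(() -> counters.getOrDefault("hits", 0));
        int total = result.get();
        owner.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking tasks took ms: " + runBlockingTasks(4, 4, 200));
        System.out.println("single-thread count: " + countOnSingleThread(4, 1000));
    }
}
```

With four 200 ms tasks on four threads, the elapsed time should be close to one wait rather than four, although the exact figure depends on the machine. The count is exactly `producers * incrementsEach`, because only the owner thread ever reads or writes the map.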