High Performance Logging System for Embedded UNIX and GNU

High Performance Logging System for Embedded UNIX and GNU/Linux Applications
IEEE RTCSA 2013 (8/21/13)
Cisco Systems
Jaein Jeong

Introduction - Embedded UNIX in many places
Traditional UNIX Logging System
[Diagram labels: app processes (log), syslogd, syslog, file system, USER/KERNEL boundary, buffer]

Problem Statement - Apps slow down with a large amount of logging
• Long latency to logging daemon
• Inefficiency of unbuffered writes to flash FS
• Long latency even with output buffering
[Diagram labels: app processes (log), syslogd, flash logger, syslog, flash file system, USER/KERNEL boundary, buffer, named pipe]

Our Approach
• Faster Message Transfer
• Compatibility with Existing Logging Apps
• Destination-Aware Message Formatting

Organization
• Related Work for UNIX Logging Systems
• Background
– Cisco UCS and Virtual Interface Card (VIC)
– Evolution of VIC Logging System
• Design Requirements and Implementation
• Evaluation and Optimization
• Conclusion

Related Work - Logging Methods for UNIX Apps
Syslog
• Introduced in early 80's
• Still most notable one
Syslog-ng
• An extension based on nsyslogd
• Reliable transport, encryption, and richer set of information and filtering
Rsyslog
• An extension used in latest distros
• Multi-threading
• Not designed for embedded/flash logging
– Slow msg passing (msg copying over kernel)
– Unbuffered message writes

Background - Cisco UCS and Virtual Interface Card
• Cisco UCS server: Cisco UCS datacenter server system
• 10GBASE-KR Unified Network Fabric, 1 to each Fabric Extender
• Cisco UCS Virtual Interface Card (VIC)
– VIC ASIC: Mgmt CPU, FCPU 0, FCPU 1
– Mgmt CPU: MIPS proc core (500 MHz, MIPS 24Kc)
– Embedded Linux (Linux kernel 2.6.23-rc5)
– 128 Programmable Virtual Interfaces: Ethernet NICs, Fibre Channel HBAs

Background - Evolution of VIC Logging System
Logd – a simple logging daemon
• Logging from Multiple Processes
• Different Severity Levels
• Formatting and flash writing
Unbuffered syslogd
• Forwards serious msgs to switches
• Functional, but with worse performance
Buffered syslogd
• Improves flash write performance of unbuffered syslogd
• Still suffers long latency
[Diagram labels: app processes (log), system processes, logd, syslogd, flash logger, syslog, switch, flash, JFFS2, USER/KERNEL boundary, buffer, named pipe]

Organization
• Related Work for UNIX Logging Systems
• Background
– Cisco UCS and Virtual Interface Card (VIC)
– Evolution of VIC Logging System
• Design Requirements & Implementation
• Evaluation & Optimization
• Conclusion

Design Requirements - Faster Message Transfer
• Avoid kernel-to-user space msg copying
[Diagram: Syslogd Logging (app and system processes, syslogd, syslog, named pipe, flash logger, JFFS2 flash, switch) versus Mqlogd Logging (processes enqueue logs, mqlogd dequeues, named pipe, flash logger, JFFS2 flash, switch)]
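
As a rough illustration of the memory-mapped queue that clients enqueue into and mqlogd dequeues from, the C sketch below maps a file with mmap() and keeps a small fixed-size circular queue in it. It is not the VIC implementation: the names (logq, logq_map, logq_enqueue, logq_dequeue), the slot count, and the message size are all hypothetical, and the locking that a multi-process deployment would need is left out for brevity.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define LOGQ_SLOTS   8    /* hypothetical queue depth (power of two) */
#define LOGQ_MSG_MAX 120  /* hypothetical maximum message length */

struct logq_entry {
    char msg[LOGQ_MSG_MAX];
};

/* Queue state lives entirely inside the mapped file. */
struct logq {
    uint32_t head;     /* count of messages dequeued so far */
    uint32_t tail;     /* count of messages enqueued so far */
    uint32_t dropped;  /* messages rejected because the queue was full */
    struct logq_entry slot[LOGQ_SLOTS];
};

/* Create (if needed) and map the backing file; returns NULL on failure. */
static struct logq *logq_map(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct logq)) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct logq), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (struct logq *)p;
}

/* Client side: copy the message straight into shared memory.
 * Returns 0 on success, -1 if the queue is full (message dropped). */
static int logq_enqueue(struct logq *q, const char *msg)
{
    assert(q != NULL && msg != NULL);
    if (q->tail - q->head == LOGQ_SLOTS) {
        q->dropped++;
        return -1;
    }
    struct logq_entry *e = &q->slot[q->tail % LOGQ_SLOTS];
    snprintf(e->msg, sizeof e->msg, "%s", msg);  /* truncates long messages */
    q->tail++;
    return 0;
}

/* Server side: copy the oldest message out.
 * Returns 0 on success, -1 if the queue is empty. */
static int logq_dequeue(struct logq *q, char *out, size_t out_len)
{
    assert(q != NULL && out != NULL && out_len > 0);
    if (q->head == q->tail)
        return -1;
    snprintf(out, out_len, "%s", q->slot[q->head % LOGQ_SLOTS].msg);
    q->head++;
    return 0;
}
```

In this sketch a client would call logq_enqueue() instead of syslog(), and the server would call logq_dequeue() and write each message to the named pipe. A full queue drops the message and counts it, which is the behavior the request drop rate metric measures.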
Design Requirements - Faster Message Transfer
• Reduce message copying from 4 to 2
Syslogd Logging
1. App local copy
2. Kernel buffer
3. Syslogd local copy
4. Write to named pipe
Mqlogd Logging
1'. To shared memory
2'. Write directly from shared memory to named pipe
[Diagram: the Syslogd Logging and Mqlogd Logging paths annotated with copy steps 1 to 4 and 1' to 2']

Design Requirements - Compatibility with Existing Logging Apps
• Thru Logging API
– Replace syslog() with shared memory lib calls
• Direct Syslog Calls
– Server receives msgs through UDP Unix socket
[Diagram: logging clients (app1, mcp, app2, fls) call the Logging API (log_info(), log_error(), …) in the shared memory logging library; logging clients (klogd, fls, xinetd, …) call syslog() over a UDP Unix socket; both reach the logging server (mqlogd)]

Design Requirements - Destination-Aware Message Formatting
• Syslogd
– Working but limited
– Redundant
– Coarse time granularity (in seconds)
• Mqlogd
– Destination-aware formatting with space saving
– Uses system supported timing (in microseconds)

Implementation - Shared Memory and Circular Queue
• Queue Memory Layout
– Header entry: circular queue header, notification disable flag
– Non-header entries: logging events
• Logging clients enqueue into shared memory; the logging server dequeues after a notification
• Notification Mechanism
– Write-and-select
– Signal
• Locking Mechanism
– Semaphore lock
– Pthread lock

Organization
• Related Work for UNIX Logging Systems
• Background
– Cisco UCS and Virtual Interface Card (VIC)
– Evolution of VIC Logging System
• Design Requirements & Implementation
• Evaluation & Optimization
• Conclusion

Evaluation
• Metrics
– Request Latency
– Request Drop Rate
• Parameters
– Number of clients
– Number of iterations (depth of queue size)
– Locking mechanism
– Notification mechanism

Performance Results - Performance compared to syslogd
• Avg Latency: >10x speed-up
• Min Latency: >20x speed-up
• Max Latency: >2x speed-up
[Figures: Maximum, Minimum, and Average Request Latency - 1 Client. Latency (us) vs. Number of Iterations (100 to 50000). Series: syslogd; mqlogd (select, semaphore); mqlogd (signal, semaphore); mqlogd (select, pthread); mqlogd (signal, pthread)]

Performance Results - Effect of Queue Size
• No drops within queue size (e.g. 10000)
• Queue size should be larger than max expected burst size
[Figure: Request Drop Rate - 1 Client. Percent vs. Number of Iterations (100 to 50000). Series: mqlogd (select, semaphore); mqlogd (signal, semaphore); mqlogd (select, pthread); mqlogd (signal, pthread)]

Performance Results - Effect of Multiple Clients
• Avg request latency increases proportionally
• With 2 clients, requests start to drop at a smaller number of iterations
[Figures: Avg Request Latency - 1 and 2 Clients (Latency, us) and Request Drop Rate - 1 and 2 Clients (Percent) vs. Number of Iterations (100 to 50000). Series: syslogd (1 client); syslogd (2 clients); mqlogd (select, 1 client); mqlogd (select, 2 clients)]

Performance Results - Effect of Notification Mechanisms
• Makes little difference
[Figures: Maximum, Minimum, and Average Request Latency - 1 Client. Latency (us) vs. Number of Iterations (100 to 50000). Series: mqlogd (select, semaphore); mqlogd (signal, semaphore)]
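
For readers unfamiliar with the write-and-select notification compared above, a minimal C sketch follows. It assumes the client and server share a pipe (or FIFO) descriptor pair: the client writes one byte after enqueueing, and the server blocks in select() until that byte arrives. The function names and the one-byte token are hypothetical, and the deck does not say how mqlogd wires this up, so treat it as one plausible shape rather than the actual code.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Client side: wake the server by writing one byte to the notification pipe.
 * Returns 0 on success, -1 on failure. */
static int notify_server(int wfd)
{
    const char token = 1;
    return write(wfd, &token, 1) == 1 ? 0 : -1;
}

/* Server side: wait up to timeout_ms for a notification.
 * Returns 1 if notified (the token is consumed), 0 on timeout, -1 on error. */
static int wait_for_notification(int rfd, int timeout_ms)
{
    assert(rfd >= 0 && rfd < FD_SETSIZE);

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(rfd, &rfds);

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    int n = select(rfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;  /* 0 = timeout, -1 = error */

    char token;
    return read(rfd, &token, 1) == 1 ? 1 : -1;
}
```

After wait_for_notification() returns 1, a server loop would drain the shared-memory queue before waiting again. The signal-based alternative replaces the pipe write with a signal sent to the server process.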
Performance Results - Effect of Lock Mechanisms
• Pthread mutex is 40% faster than semaphore.
• Semaphore is used for our production code due to a limitation of pthread mutex lock (Linux kernel 2.6.23-rc5).
[Figures: Maximum, Minimum, and Average Request Latency - 1 Client. Latency (us) vs. Number of Iterations (100 to 50000). Series: mqlogd (select, semaphore); mqlogd (select, pthread)]

Performance Results - Effect of Client Interface Type
• Logging using the UNIX socket interface (backward compatibility) is no faster
– About the same level as syslogd
– For compatibility, not for general use
[Figure: Average Request Latency - 1 Client. Latency (us) vs. Number of Iterations (100 to 50000). Series: syslogd; mqlogd (select, semaphore); mqlogd (Unix socket)]

Optimization - Effects of deferred notification
• Sends one notification for a batch of msgs
• Measured time for host-to-adapter commands (capability & macaddr) with and without logging
• 2x speed-up in latency
[Figures: Latency for 'capability' command and Latency for 'macaddr' command. Stacked bars of logging time (us) and msg xfer time (us) for write-and-select, deferred, and syslogd]

Future Work
• Reduce kernel msg copying even further
• Improve performance with faster lock
• Avoid loss of serious messages
[Diagram labels: app processes (log) enqueue, memory mapped file, mqlogd dequeue, memory mapped file, flash logger, file system, USER/KERNEL boundary, named pipe]

Conclusion
• Logging system for embedded UNIX apps
• Up to 100x speed-up in latency, 10x throughput
• Backward compatibility
• Commercially used in Cisco UCS Virtual Interface Cards
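
As a supplementary sketch of the deferred notification idea from the optimization slide (one notification for a batch of messages), the C fragment below counts enqueued messages and writes to the notification pipe only once per batch, with an explicit flush for any remainder. The batching policy, the batch size, and all names here are assumptions made for illustration; the deck does not describe how the real system decides when to notify.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define NOTIFY_BATCH 4  /* hypothetical number of messages per notification */

struct deferred_notifier {
    int wfd;           /* write end of the notification pipe */
    unsigned pending;  /* messages enqueued since the last notification */
    unsigned sent;     /* notifications actually written */
};

/* Send a notification now if any messages are pending.
 * Returns 0 on success (or nothing to do), -1 on write failure. */
static int notifier_flush(struct deferred_notifier *n)
{
    assert(n != NULL);
    if (n->pending == 0)
        return 0;
    const char token = 1;
    if (write(n->wfd, &token, 1) != 1)
        return -1;
    n->pending = 0;
    n->sent++;
    return 0;
}

/* Call after each enqueue: notifies only when a full batch has accumulated. */
static int notifier_message_enqueued(struct deferred_notifier *n)
{
    assert(n != NULL);
    if (++n->pending < NOTIFY_BATCH)
        return 0;
    return notifier_flush(n);
}
```

With a batch size of 4, ten enqueued messages produce two notifications, and a final notifier_flush() sends a third for the remaining two. The trade-off is fewer wake-ups for the server at the cost of slightly later delivery of the last messages in a burst.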
