Maintaining Large Vista Installations
Amy Edwards, Ezra Freelove, & George Hernandez
July 12, 2007

Agenda
• Comparisons
• Who is USG
• Automation
• Monitoring
• Maintenance
• More Tricks
• Questions?

Informal Poll - Number of nodes
• (All prod clusters) now:
  – 1-10
  – 11-20
  – 21-50
  – 50-70
  – 70+
• (All prod clusters) by December:
  – 1-10
  – 11-20
  – 21-50
  – 50-70
  – 70+
• Ours in bold

Informal Poll – Number of DB Instances
Including secondary and non-production
• 1-2
• 3-6
• 7-10
• 10+
• Ours in bold

Vista Architecture

GeorgiaVIEW Project
• University System of Georgia (USG)
• Vista 3.0.7
• Host 32 institutions & multiple consortial programs
• >150,000 active students
  – Active is 100+ actions
• >11,000 active sections / term

Issues
• Handling performance issues
• Capacity planning
• Upgrades
• Replication
• JMS sensitivity
• Integration

Automation
• Rolling Restarts
  – Managed nodes restarted weekly
    • except JMS
• Log cleanup to preserve space
• Error reporting
  – application, tracking, vulnerabilities
• Thread dumps
• Sync admin node with backup
• LDIS batch integration

Monitoring
• Nagios
  – http://www.nagios.org/
  – Sends alerts
• Stats
  – Custom AJAX web app
  – Watch changes over time
• AWStats
  – http://www.awstats.org/

Nagios Example

Nagios Monitors
• OS / Hardware
  – Load
  – Temperature
  – Free space
• Database
  – Tablespace free space
  – Listener
  – Oracle processes
• Application
  – Direct-login
  – WebLogic processes
  – Java MBeans
    • Default/Primary Pending Requests Current Count
    • Java Heap Current
    • JDBC Waiting for Connection Current Count
    • Multicast Messages Lost
    • Primary count

Stats
• Short and long term analysis
  – 21 months of data
• Graphs all Nagios data collected
• Flexible creation of reports
• Built with AJAX

Stats Examples I of III

Stats Examples II of III

Stats Examples III of III

AWStats
• Records data from web server logs
• Custom script grabs data from webserver.log files
• Runs daily

AWStats Examples I of II

AWStats Examples II of II

Specialized Nodes
• Admin
• JMS
• Institutional Admin
  – Integration
• Chat

JMS Node
• Provides special services
  – Mail, LC creation, chat
• Failure or migration of JMS node hinders usage
• Services do not migrate well
  – Allow targeted migration
  – OTHERS: Pin JMS to a specific node

Integration
• Batched LDIS data files
• Cron runs nightly
• Files broken up by:
  – type
  – “reasonable” number of records
• Done on Inst node
  – Issues with import can kill node

Touching Nodes
• ssh & dsh
  – Touch groups of nodes at once
  – Useful for:
    • Installs
    • Gathering logs
    • Locating a session

Maintenance Page
• Hosted on opposite f5
• Two versions
  – Scheduled maintenance
  – Unscheduled outage
• In an f5 outage, move DNS to other f5 so message still appears

Installs and Upgrades
• Silent install scripts
• Test in both development environments
  – Create against a small database
  – Get results of time to complete against a full size copy of production
• Install to production

Powerlinks and Custom Development
• Test in development
• Try to break
• Pilot in production
• Release to all

Questions?

Want More?
• To view my resources and references for this presentation, visit www.scholar.com
• Simply click “Advanced Search” and search by ezrafreelove and tag: ‘bbworld07’

Contact Information
• Ezra Freelove ezra.freelove@usg.edu
• Amy Edwards amy.edwards@usg.edu
• George Hernandez george.hernandez@usg.edu
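
Supplementary sketch: splitting integration batches

The Integration slide says the nightly LDIS files are broken up by type and by a “reasonable” number of records. The Python below is one possible way to do that grouping and chunking, not the presenters' actual script. The function name split_batch, the record types "person" and "section", and the default limit of 5000 are all hypothetical; the slides give no concrete threshold or file layout.

```python
from collections import defaultdict


def split_batch(records, max_records=5000):
    """Group (record_type, payload) pairs by type, then cut each group
    into chunks of at most max_records payloads.

    Returns a list of (record_type, chunk) tuples, with types in the
    order they were first seen. The 5000 default is an assumed value.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")

    # Collect payloads per record type (dicts keep insertion order).
    by_type = defaultdict(list)
    for record_type, payload in records:
        by_type[record_type].append(payload)

    # Slice each type's payloads into fixed-size chunks.
    chunks = []
    for record_type, payloads in by_type.items():
        for start in range(0, len(payloads), max_records):
            chunks.append((record_type, payloads[start:start + max_records]))
    return chunks


if __name__ == "__main__":
    # Hypothetical sample data; real LDIS records would be parsed from files.
    sample = [("person", "p1"), ("section", "s1"),
              ("person", "p2"), ("person", "p3")]
    for record_type, chunk in split_batch(sample, max_records=2):
        print(record_type, chunk)
```

Keeping each chunk small and single-typed means a failed import only has to be retried for one small file, which matters when a bad import can take down the node doing the work.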