Maintaining Large Vista Installations

Amy Edwards, Ezra Freelove, & George Hernandez
July 12, 2007
Agenda
• Comparisons
• Who is USG
• Automation
• Monitoring
• Maintenance
• More Tricks
• Questions?
Informal Poll - Number of nodes
(All prod clusters) now:
• 1-10
• 11-20
• 21-50
• 50-70
• 70+
(All prod clusters) by December:
• 1-10
• 11-20
• 21-50
• 50-70
• 70+
• Ours in bold
Informal Poll – Number of DB Instances
Including secondary and non-production
• 1-2
• 3-6
• 7-10
• 10+
• Ours in bold
Vista Architecture
GeorgiaVIEW Project
• University System of Georgia (USG)
• Vista 3.0.7
• Host 32 institutions & multiple consortial programs
• >150,000 active students
– Active is 100+ actions
• >11,000 active sections / term
Issues
• Handling performance issues
• Capacity planning
• Upgrades
• Replication
• JMS sensitivity
• Integration
Automation
• Rolling Restarts
– Managed nodes restarted weekly
• except JMS
• Log cleanup to preserve space
• Error reporting
– application, tracking, vulnerabilities
• Thread dumps
• Sync admin node with backup
• LDIS batch integration
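The weekly rolling restart of managed nodes (with the JMS node left out) could be sketched roughly as below. This is a minimal sketch, not the presenters' script: the node names, the batch size, and the `restart` and `wait_healthy` callables are all hypothetical stand-ins for whatever start/stop and health-check mechanism a site actually uses.

```python
from typing import Callable, Iterable, List


def rolling_restart_plan(nodes: Iterable[str], jms_node: str,
                         batch_size: int = 1) -> List[List[str]]:
    """Group managed nodes into restart batches, leaving the JMS node out."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    eligible = [n for n in nodes if n != jms_node]
    return [eligible[i:i + batch_size]
            for i in range(0, len(eligible), batch_size)]


def run_rolling_restart(plan: List[List[str]],
                        restart: Callable[[str], None],
                        wait_healthy: Callable[[str], bool]) -> List[str]:
    """Restart one batch at a time; halt if a node does not come back."""
    restarted: List[str] = []
    for batch in plan:
        for node in batch:
            restart(node)  # hypothetical: site-specific restart command
        for node in batch:
            if not wait_healthy(node):  # hypothetical: e.g. a login check
                raise RuntimeError(f"{node} did not return to service; halting")
            restarted.append(node)
    return restarted
```

Keeping the batch size small means most of the cluster keeps serving users while a few nodes restart, and halting on the first unhealthy node avoids taking down further capacity.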
Monitoring
• Nagios
– http://www.nagios.org/
– Sends alerts
• Stats
– Custom AJAX web app
– Watch changes over time
• AWStats
– http://www.awstats.org/
Nagios Example
Nagios Monitors
• OS / Hardware
– Load
– Temperature
– Free space
• Database
– Tablespace free space
– Listener
– Oracle processes
• Application
– Direct-login
– Weblogic processes
– Java MBeans
• Default/Primary Pending Requests Current Count
• Java Heap Current
• JDBC Waiting for Connection Current Count
• Multicast Messages Lost
• Primary count
Stats
• Short and long term analysis
– 21 months of data
• Graphs all Nagios data collected
• Flexible creation of reports
• Built with AJAX
Stats Examples I of III
Stats Examples II of III
Stats Examples III of III
AWStats
• Records data from web server logs
• Custom script grabs data from webserver.log files
• Runs daily
AWStats Examples I of II
AWStats Examples II of II
Specialized Nodes
• Admin
• JMS
• Institutional Admin
– Integration
• Chat
JMS Node
• Provides special services
– Mail, LC creation, chat
• Failure or migration of JMS node hinders usage
• Services do not migrate well
– Allow targeted migration
– OTHERS: Pin JMS to a specific node
Integration
• Batched LDIS data files
• Cron runs nightly
• Files broken up by:
– type
– “reasonable” number of records
• Done on Inst node
– Issues with import can kill node
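Breaking the nightly data up by type and by a "reasonable" number of records might look like the sketch below. The record representation (a type label plus an opaque payload), the example type names, and the chunk size are assumptions; the slides do not say what limit is used or how the LDIS files are structured.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_batches(records: Iterable[Tuple[str, str]],
                  max_records: int) -> Dict[str, List[List[str]]]:
    """Group records by type, then cut each group into bounded chunks.

    `records` yields (record_type, payload) pairs; both are hypothetical
    stand-ins for whatever the real import files contain.
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    by_type: Dict[str, List[str]] = defaultdict(list)
    for record_type, payload in records:
        by_type[record_type].append(payload)
    return {
        record_type: [items[i:i + max_records]
                      for i in range(0, len(items), max_records)]
        for record_type, items in by_type.items()
    }
```

Each inner list would then be written to its own file and imported separately, so a bad chunk fails on its own instead of tying up the node with one very large import.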
Touching Nodes
• ssh & dsh
– Touch groups of nodes at once
– Useful for:
• Installs
• Gathering logs
• Locating a session
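The slides use ssh and dsh directly. For illustration only, the same "touch a group of nodes at once" idea can be sketched in Python by running one ssh command per host. The group names and host names below are hypothetical, and the sketch assumes key-based ssh login is already set up (`BatchMode=yes` makes ssh fail instead of prompting for a password).

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical node groups; a real site would load these from its inventory.
NODE_GROUPS: Dict[str, List[str]] = {
    "managed": ["node01", "node02"],
    "jms": ["jms01"],
}


def build_ssh_command(host: str, remote_command: str) -> List[str]:
    """Build the argv for running one command on one host over ssh."""
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_local(argv: List[str]) -> str:
    """Run a command locally and return its standard output."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    return result.stdout


def run_on_group(group: str, remote_command: str,
                 groups: Dict[str, List[str]] = NODE_GROUPS,
                 runner: Callable[[List[str]], str] = run_local) -> Dict[str, str]:
    """Run the same command on every host in a group, keyed by host name."""
    return {host: runner(build_ssh_command(host, remote_command))
            for host in groups[group]}
```

Locating a session could then be a single call that greps each node's logs for a session identifier and reports which host returned a match; the log path and identifier format would be site-specific.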
Maintenance Page
• Hosted on opposite f5
• Two versions
– Scheduled maintenance
– Unscheduled outage
• In an f5 outage, move DNS to other f5 so message still appears
Installs and Upgrades
• Silent install scripts
• Test in both development environments
– Create against a small database
– Measure time to complete against a full-size copy of production
• Install to production
Powerlinks and Custom Development
• Test in development
• Try to break
• Pilot in production
• Release to all
Questions?
Want More?
• To view my resources and references for this presentation, visit www.scholar.com
• Simply click “Advanced Search” and search by ezrafreelove and tag: ‘bbworld07’
Contact Information
• Ezra Freelove
ezra.freelove@usg.edu
• Amy Edwards
amy.edwards@usg.edu
• George Hernandez
george.hernandez@usg.edu