PPT - SAS OPUS

SAS Grid at Statistics Canada
BY: Yves DeGuire
Statistics Canada
June 12, 2014
Agenda
• SAS at Statistics Canada
• What is the StatCan SAS Grid?
• Migration and Use Cases
• Lessons Learned
• Looking Forward
Statistics Canada
• Canada’s central statistical agency.
• Mandate to collect, compile, analyse and
publish statistical information on the
economic, social and general conditions of the
country and its citizens.
• Mandate is fulfilled under the authority of the
Statistics Act which prohibits the disclosure of
identifiable information.
Crunching numbers is our business!
SAS@StatCan Where?
Collection
Processing
Input
Database
Clean
Microdata
Analysis
Dissemination
Output
Database
Survey Lifecycle
SAS@StatCan What?
• Data processing
• Application development
• Query and reporting
• Statistical analysis
• Exploratory data analysis
• “Specialised” computations (time-series, optimization, matrix operations, etc.)
SAS@StatCan How?
• Base SAS
• SAS/ACCESS
• SAS/AF
• SAS/CONNECT
• SAS/ETS
• SAS/GRAPH
• SAS/IML
• SAS/IntrNet
• SAS/OR
• SAS/SHARE
• SAS/STAT
• SAS/TOOLKIT
• Integration Technologies
• Enterprise Guide
• Enterprise Platform
• DI Server
• JMP
• Grid Manager
SAS@StatCan Some Numbers!
• 2,500,000 SAS jobs run every year
• 4,000 PC-SAS installations
• 2,500 active SAS users
• 450 production applications
• 80 Windows servers
• 25 Unix servers
• 20 platforms
• 3 versions of SAS: 9.1.3, 9.2 and 9.3
• 1 grid!
SAS@StatCan More than 2,500 Users!
What is the StatCan SAS Grid?
• A complete SAS Platform deployment using SAS
Grid Manager 9.4.
• Available to the entire Agency via a Hosting service.
• Part of the Network Transformation Initiative (NTI)
• 3 objectives:
– Consolidate 100+ SAS servers (Phase 1)
– Migrate processing from workstations to the grid (Phase 2)
– Enable new computing initiatives/possibilities (Phase 1 & 2)
StatCan Grid Milestones
• 2005-2010: Several “home-made” grids developed
over the years using Base SAS and SAS/CONNECT
• 2011: first test grid based on Grid Manager
• 2013: enhanced test grid released
• May 2014: production grid released for IBSP (V1)
• Q3 2014: full production grid will be released for
general availability (V2)
A Few Impressive Results while
Testing the Grid
• Capital stock calculation: 89% improvement in
elapsed time (2005)
• Audit module in G-Confid: over 90% improvement
in elapsed time (2009)
• NHS-Tax Linkage project: from 59 hours to 50
minutes using G-Link V3 (2012)
• Simulations with CCHS data: hundreds of
simulations run in a few hours, compared to days on
a workstation (2013)
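The figures above mix percentages and before/after times, so it can help to put them on one scale. The short Python sketch below does that arithmetic. It assumes that "improvement in elapsed time" means a reduction in elapsed time, which the slide does not state explicitly; the function names are illustrative only.

```python
def reduction_pct(before, after):
    """Percentage reduction in elapsed time, given before/after durations."""
    return 100.0 * (before - after) / before


def speedup(before, after):
    """How many times faster the new run is."""
    return before / after


def speedup_from_reduction(pct):
    """Speedup factor implied by a percentage reduction in elapsed time."""
    return 100.0 / (100.0 - pct)


if __name__ == "__main__":
    # NHS-Tax Linkage: 59 hours before, 50 minutes after (both in minutes).
    before_min, after_min = 59 * 60, 50
    print(round(reduction_pct(before_min, after_min), 1))  # 98.6
    print(round(speedup(before_min, after_min), 1))        # 70.8
    # Capital stock calculation: an 89% reduction, under the assumption above.
    print(round(speedup_from_reduction(89), 1))            # 9.1
```

Under that assumption, the linkage project ran about 70 times faster, the capital stock calculation about 9 times faster, and the G-Confid audit module at least 10 times faster (a reduction of over 90%).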
Why the StatCan Grid?
• Reduced costs $ $ $
• Process higher volumes of data
• Process data in less time
• Scalable
• Secure
• Centrally managed
• Usage metrics
Implementation Highlights (Phase 1)
[Architecture diagram; labelled components:]
• SAS Platform clients; web clients and services
• SAS Metadata Server; SAS Mid-Tier
• Grid nodes: Node1 to Node16 (Intel x86_64, 16 cores, 256 GB RAM)
• Clustered shared file system: 2-tier storage, 80 TB
The Transparent Grid
One of the objectives of the grid is to make the
user experience as transparent as possible.
Single sign-on
Samba shares
Helpers (Macros, Stored Processes)
SAS Grid Data Tier
• Data Files (must “live” on the CFS)
– Flat files / SAS files
– PC files (Excel spreadsheets, etc.)
– Exposed to Windows via SAMBA
• Databases:
– SQL Server
– Oracle
– Sybase
Migration Requirements
Platform clients only such as Enterprise Guide
No host commands available
SAS/ACCESS to PC file formats, with limitations
No direct access to Windows Shares
SAS 9.4 and SAS 9.3M1 supported
The StatCan SAS grid is a “pure” SAS compute service!
Use Cases
• Use Case #1: Ad hoc users
– Users who need to process/analyze data “on-demand”
– Large number of concurrent users
• Use Case #2: Batch jobs
– SAS jobs that run unattended
– A new mainframe!!!
• Use Case #3: Parallel processing
– Jobs broken into smaller tasks and dispatched to the grid
– Myth: a SAS program will execute in parallel with no modifications!
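Use Case #3 can be illustrated with a small, language-neutral sketch. A program only runs in parallel once it has been explicitly split into independent tasks, dispatched, and recombined. The Python example below shows those three steps with a local thread pool standing in for grid nodes. It is not StatCan's implementation and does not use SAS Grid Manager; the function names and the partial-sum workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(records, n_tasks):
    """Split a list of records into at most n_tasks contiguous chunks."""
    if not records or n_tasks < 1:
        return []
    size = -(-len(records) // n_tasks)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_task(chunk):
    """Stand-in for one independent unit of work (here, a partial sum)."""
    return sum(chunk)


def run_in_parallel(records, n_tasks=4):
    """Dispatch each chunk to a worker, then combine the partial results."""
    tasks = split_into_tasks(records, n_tasks)
    # Worker threads play the role of grid nodes in this sketch.
    with ThreadPoolExecutor(max_workers=max(1, n_tasks)) as pool:
        partials = list(pool.map(run_task, tasks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1, 101))
    print(run_in_parallel(data))  # 5050, the same total as sum(data)
```

The split and the final combine step are the modifications the "myth" bullet refers to: the unmodified serial version is just `sum(data)`, and nothing distributes it automatically.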
Lessons Learned
• A SAS grid project is also an infrastructure project.
• Integrating Linux with a Windows environment poses some challenges.
• Managing users’ expectations is critical.
• Resistance to change must be managed.
• Start simple and build on success.
• Be proactive: plan/think about your next SAS environment.
Looking Forward
• Phase 1: consolidate 80 servers over the next 2 years.
• Phase 2:
• Introduce a new grid at SSC Data Centre.
• Complete the server consolidation started in Phase 1.
• Migrate workstation processing to the grid.
Are there opportunities to
collaborate with other
departments?
Thank You!
Yves DeGuire
Section Chief
System Engineering Division
Statistics Canada
R.-H.-Coats Building 14 A
100, Tunney’s Pasture driveway
Ottawa, Ont., K1A 0T6
(613) 951-1282
Yves.Deguire@statcan.gc.ca