SAS Grid at Statistics Canada
BY: Yves DeGuire
Statistics Canada
June 12, 2014

Agenda
• SAS at Statistics Canada
• What is the StatCan SAS Grid?
• Migration and Use Cases
• Lessons Learned
• Looking Forward

Statistics Canada
• Canada’s central statistical agency.
• Mandate to collect, compile, analyse and publish statistical information on the economic, social and general conditions of the country and its citizens.
• Mandate is fulfilled under the authority of the Statistics Act, which prohibits the disclosure of identifiable information.
Crunching numbers is our business!

SAS@StatCan Where?
Survey Lifecycle (diagram): Collection; Processing; Input Database; Clean Microdata; Analysis; Dissemination; Output Database

SAS@StatCan What?
• Data processing
• Application development
• Query and reporting
• Statistical analysis
• Exploratory data analysis
• “Specialised” computations (time-series, optimization, matrix operations, etc.)

SAS@StatCan How?
• Base SAS
• SAS/ACCESS
• SAS/AF
• SAS/CONNECT
• SAS/ETS
• SAS/GRAPH
• SAS/IML
• SAS/IntrNet
• SAS/OR
• SAS/SHARE
• SAS/STAT
• SAS/TOOLKIT
• Integration Technologies
• Enterprise Guide
• Enterprise Platform
• DI Server
• JMP
• Grid Manager

SAS@StatCan Some Numbers!
• 2,500,000 SAS jobs run every year
• 4,000 PC-SAS installations
• 2,500 active SAS users
• 450 production applications
• 80 Windows servers
• 25 Unix servers
• 20 platforms
• 3 versions of SAS: 9.1.3, 9.2 and 9.3
• 1 grid!

SAS@StatCan More than 2500 Users!

What is the StatCan SAS Grid?
• A complete SAS Platform deployment utilizing the SAS Grid Manager 9.4.
• Available to the entire Agency via a hosting service.
• Part of the Network Transformation Initiative (NTI)
• 3 objectives:
  – Consolidate 100+ SAS servers (Phase 1)
  – Migrate processing from workstations to the grid (Phase 2)
  – Enable new computing initiatives/possibilities (Phase 1 & 2)

StatCan Grid Milestones
• 2005-2010: Several “home-made” grids developed over the years using Base SAS and SAS/CONNECT
• 2011: First test grid based on Grid Manager
• 2013: Enhanced test grid released
• May 2014: Production grid released for IBSP (V1)
• Q3 2014: Full production grid will be released for general availability (V2)

A Few Impressive Results while Testing the Grid
• Capital stock calculation: 89% improvement on elapsed time (2005)
• Audit module in G-Confid: over 90% improvement on elapsed time (2009)
• NHS-Tax Linkage project: from 59 hours to 50 minutes using G-Link V3 (2012)
• Simulations with CCHS data: hundreds of simulations run in a few hours compared to days on a workstation (2013)

Why the StatCan Grid?
• Reduced costs $ $ $
• Process higher volume of data
• Process data in less time
• Scalable
• Secure
• Centrally managed
• Usage metrics

Implementation Highlights (Phase 1)
Architecture (diagram):
• SAS Platform Clients
• Web Clients and Services
• SAS Metadata Server
• SAS Mid-Tier
• Grid Nodes: Node1 to Node16 (Intel x86_64, 16 cores, 256 GB RAM)
• Shared File System: clustered, 2-tier storage, 80 TB

The Transparent Grid
One of the objectives of the grid is to make the user experience as transparent as possible.
• Single sign-on
• Samba shares
• Helpers (Macros, Stored Processes)

SAS Grid Data Tier
• Data Files (must “live” on the CFS)
  – Flat files / SAS files
  – PC files (Excel spreadsheets, etc.)
  – Exposed to Windows via Samba
• Databases:
  – SQL Server
  – Oracle
  – Sybase

Migration Requirements
• Platform clients only, such as Enterprise Guide
• No host commands available
• SAS/ACCESS to PC File Formats, with limitations
• No direct access to Windows shares
• SAS 9.4 and SAS 9.3M1 supported
The StatCan SAS grid is a “pure” SAS compute service!

Use Cases
• Use Case #1: Ad hoc users
  – Users who need to process/analyze data “on-demand”
  – Large number of concurrent users
• Use Case #2: Batch jobs
  – SAS jobs that run unattended
  – A new mainframe!!!
• Use Case #3: Parallel processing
  – Jobs broken into smaller tasks and dispatched to the grid
  – Myth: a SAS program will execute in parallel with no modifications!

Lessons Learned
• A SAS grid project is also an infrastructure project.
• Linux offers some challenges when integrating with a Windows environment.
• Managing users’ expectations is critical.
• Resistance to change must be managed.
• Start simple and build on success.
• Be proactive: plan/think about your next SAS environment.

Looking Forward
• Phase 1: Consolidate 80 servers over the next 2 years.
• Phase 2:
  – Introduce a new grid at the SSC Data Centre.
  – Complete the server consolidation started in Phase 1.
  – Migrate workstation processing to the grid.
Are there opportunities to collaborate with other departments?

Thank You!
Yves DeGuire
Section Chief
System Engineering Division
Statistics Canada
R.-H.-Coats Building 14 A
100, Tunney’s Pasture Driveway
Ottawa, Ont., K1A 0T6
(613) 951-1282
Yves.Deguire@statcan.gc.ca