Sundio Group SQL Server 2012 TAP Deployment
Lessons Learned

Bas Bruijninckx (Sundio Group)
Koen Reijns (Sundio Group)
Justin Langford (Coeo)

INTRODUCTION

Sundio Facts & Figures
Business
• 10 brands
• 9 countries
• 12 offices
• >400 million turnover
• >700K passengers in 2011
• 250 employees
• 450 tour guides
IT
• >100 servers
• >25 workstations
• 68 TB EMC storage
• >100 databases
• 4 TB Fusion-io ioMemory
• 25 IT staff

Sundio Strategy
Business
• Volume
• Efficiency (low cost)
• 2005 – 2011: Buy & build (investor)
• 2012 onwards: autonomous & foreign growth
IT
• Process automation
• One platform
• State of the art, proven technologies
• Performance engineering

IT Challenges
• Seasonal business pattern
  – €400k/day to €4 million/day
• 24hr business
• Massive amounts of data
• Performance is a key business success factor
  – Duplicate data for scalability

SQL Server Environment
[Diagram with numbered callouts 1 to 4; not recoverable from the extracted text]

SQL Server 2012 Rationale
• Already invested in platform stability, risk-ready
• Skip SQL 2008
  – One upgrade to SQL 2012 rather than migration path 2005-2008-2012
• Scale-out reporting to AlwaysOn replica
  – Off-load reporting, reduce OLTP contention
  – Utilize standby hardware
• Recognized as an advanced technology company
• Closer to Microsoft

PROJECT PLAN

Planning
• Hardware upgrade
  – New SAN / blade servers
• SQL upgrade
  – Basic preparations done for SQL 2008 upgrade
  – Preparations started in February 2011

Planning
• Restoring databases from SQL 2005 to SQL 2012 CTP version
• Upgrade SSIS packages
• Preparing connection strings & load balancer
• Basic testing
• Baseline performance test new hardware
• Running SSIS packages
• Applications – basic functions

Planning
• Performance testing
  – SQLCAT performance lab
• Focus: Price and availability engine
• Push the configuration to the max

Step                                            Performance
SQL 2005 + old hardware                         100%
SQL 2005 + new hardware                         +222%
SQL 2012 + new hardware                         +269%
SQL 2012 + new hardware + new functionality     …
An additional 50% increase!
…

Planning
• Functional testing
  – User testing
  – Vendor support
• Back-office process
  – Jobs
  – Price & Availability engine
  – Replication testing

Planning
• Upgrade (November 5th)
  – One 4-node cluster
  – Big bang upgrade
  – Temporary front-end servers
• Post-migration support
• Upgrade to RC1 (mid-February)

LESSONS LEARNED

Pre-Upgrade
• Installation procedure
  – CTP3 + Refresh install slow*
    • 5 instances on 4 nodes (20 installs)
    • *Fixed in RC0 (good experience with RC1 in test)
• Challenges with CTP3 in-place upgrade
• SSIS bugs
  – Password retention
  – Deadlocks with self
  – Job step properties

Pre-Upgrade
• AlwaysOn AG
  – Async readable secondary
  – 4-node cluster
  – Primary and replica may not co-exist on same node
  – Control failover via Possible owners & anti-affinity
  – Use trace flag 1448 (T1448) to avoid AlwaysOn latency affecting Log Reader performance

Pre-Upgrade
• SAN storage performance
  – Acceptance criteria
    • Max avg read/write latency < 20 ms
    • Target avg read/write latency < 10 ms
  – Significant challenges reaching target performance
  – Persisted with EMC and performance improved
  – Necessary to fix pre-go-live

Post-Upgrade
• Replication
  – Heavy users of transactional replication
  – Utilise stored procedure (SP) replication to improve performance
  – Transaction isolation READ UNCOMMITTED not supported with replicated SPs from SQL 2008 onwards

Database Administration
• Used native backup compression
  – Previously using Quest LiteSpeed
  – Problem: lack of disk space
  – Enable trace flag 3042 (T3042) to avoid pre-allocation

Spinlock Diagnosis: SOS_OBJECT_STORE
• Higher CPU with minimal increase in throughput
• Observed ~80 million spins and 8-10K backoffs per minute on SOS_OBJECT_STORE (the captured sample covers a 13.3-minute period)
• Used Xevents and spinlock_stats to identify the XVB structure as the source of the contention, and involved CSS and the product team via RFC
• After investigation, the team provided a private build to test

Conclusion
• SQL Server 2012 platform supports business growth
• Benefits from the TAP program
  – Working closer with Microsoft
  – Improved quality of released product (high CPU issue solved in CU1)
• Work ahead
  – RTM upgrade
  – Upgrade other environment and BI platform to SQL 2012
  – Deploy CU1 for contention fix

Q&A
Twitter
• Bas Bruijninckx: @basbrx
• Koen Reijns: @koenreijns
• Justin Langford: @justinlangford
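
Appendix sketch: spinlock rate arithmetic

The Spinlock Diagnosis slide quotes per-minute spin and backoff figures taken over a 13.3-minute period. Counters of this kind (for example those in the sys.dm_os_spinlock_stats view, which the slide's "spinlock_stats" presumably refers to) are cumulative, so a per-minute rate has to be derived by differencing two snapshots and dividing by the elapsed time. The following is a minimal sketch of that arithmetic only, not the procedure the team used. The names SpinlockSnapshot and per_minute_rates are hypothetical, and the snapshot values are made up for illustration; only the 13.3-minute window comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinlockSnapshot:
    """Cumulative counters for one spinlock type at one point in time."""
    spins: int
    backoffs: int


def per_minute_rates(first, second, elapsed_minutes):
    """Return (spins_per_minute, backoffs_per_minute) between two snapshots."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    delta_spins = second.spins - first.spins
    delta_backoffs = second.backoffs - first.backoffs
    if delta_spins < 0 or delta_backoffs < 0:
        # Cumulative counters only grow; a drop suggests they were reset.
        raise ValueError("counters decreased between snapshots")
    return delta_spins / elapsed_minutes, delta_backoffs / elapsed_minutes


if __name__ == "__main__":
    # Illustrative values only (assumption): not measurements from the deck.
    before = SpinlockSnapshot(spins=1_000_000_000, backoffs=50_000)
    after = SpinlockSnapshot(spins=2_064_000_000, backoffs=169_700)
    spins_pm, backoffs_pm = per_minute_rates(before, after, 13.3)
    print(f"{spins_pm:,.0f} spins/min, {backoffs_pm:,.0f} backoffs/min")
```

Differencing matters because the raw counters include everything since they were last reset, so a single reading says little about the load during the test window.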