Met Office Science Review Meetings 2012
MOSAC 17.5
Strategy for optimal use of HPC for NWP 2013-2015
Stuart Bell, Weather Science
© Crown copyright 2012 Met Office

Presentation overview
• NWP 2013
• IBM P7 Performance
• Resources for Operations
• Issues
• NWP 2015?

Production NWP (end FY2013-14)
12km NAE   18km MOGREPS-R
• 1.5km model: up to 36hr f/c, 3-hourly update
• 2.2km ensemble: up to 36hr f/c, 6-hourly update
• 4.4km model: up to 120hr f/c, 6-hourly update
• 33km ensemble: up to 3day f/c, 6-hourly update
• 60km coupled model: up to 6 months, daily lagged ensemble
• 17km model: up to 144hr f/c, 6-hourly update

Production NWP (end FY2013-14)

Model                     Resolution (km)  Forecast Length             Frequency (Members)  Other Details
UK                        1.5              T+36                        8                    3DVAR
UK ensemble               2.2              T+36                        4 (12)               LBC perturbed
Europe                    4.4              T+144 (T+72 at 06,18)       2+2                  No DA – global downscaler
Global                    17               T+144 (T+72 at 06,18)       2+2                  Hybrid 4DVAR
Global Ensemble           33               T+72                        4 (12+)              ETKF + Stochastic Phys
Global Extended Ensemble  60               T+72=>monthly and seasonal  tbd                  MOGREPS-15 and Glosea5 merger details tbd

Implementation Plan

By Dec 2012:
• 33km Global EPS
• 2.2km UK EPS
• Additional Global EPS members to T+12 for Hybrid DA
• Expanded domain for 4km global downscaler

Q1 & Q2 2013
• Business as usual science changes (e.g. UK model physics; new satellites)
• Global 4DVAR resolution increase
• Retire various deprecated model configurations
• New Suite Control System

Autumn 2013
• 17km Global (with ENDGAME dynamical core)
• Migrate Global EPS to ENDGAME

Spring 2014
• Migrate UK and UK EPS to ENDGAME

HPC Resource

                              IBM P6  IBM P7  Factor
Number of Compute Cores       7904    38912   4.9
Linpack Performance (TFlops)  120     879     7.3
Disk Capacity (TBytes)        860     1500    1.7 ?
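The Factor column of the HPC Resource table can be checked directly from the P6 and P7 figures. The following is a minimal Python sketch, assuming each factor is simply the P7/P6 ratio rounded to one decimal place. The names `HPC_RESOURCE` and `upgrade_factor` are illustrative and do not come from the slides, and the per-core Linpack figure is derived here rather than quoted in the presentation.

```python
# Figures quoted on the "HPC Resource" slide, as (IBM P6, IBM P7) pairs.
HPC_RESOURCE = {
    "Number of Compute Cores": (7904, 38912),
    "Linpack Performance (TFlops)": (120, 879),
    "Disk Capacity (TBytes)": (860, 1500),
}


def upgrade_factor(p6: float, p7: float) -> float:
    """Return the P7/P6 ratio rounded to one decimal place (assumed slide convention)."""
    return round(p7 / p6, 1)


# Recompute the Factor column from the raw P6 and P7 figures.
factors = {name: upgrade_factor(p6, p7) for name, (p6, p7) in HPC_RESOURCE.items()}

# Derived quantity (not on the slide): Linpack gain per core,
# i.e. the Linpack ratio divided by the core-count ratio.
cores_p6, cores_p7 = HPC_RESOURCE["Number of Compute Cores"]
linpack_p6, linpack_p7 = HPC_RESOURCE["Linpack Performance (TFlops)"]
per_core_linpack_gain = round((linpack_p7 / cores_p7) / (linpack_p6 / cores_p6), 1)

for name, factor in factors.items():
    print(f"{name}: x{factor}")
print(f"Linpack per core: x{per_core_linpack_gain}")
```

Under these assumptions the recomputed ratios agree with the quoted factors of 4.9, 7.3 and 1.7. The derived per-core Linpack gain of roughly 1.5 is only a peak-rate indication and says nothing by itself about how a given model configuration will scale.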
Benchmark performance

Speedup if same share of resource used

                                        Global N512     1.5km
Fraction of Cluster (416 nodes for P7)  50%     25%     25%     12.5%
Speedup (P7: P6_2009)                   2.5     2.0     3.6     4.1
Speedup (P7: P6_2012)                   2.8     2.0     2.4     3.3

Speedup if same number of compute cores used

                        1.5km  Global N512  Global N96
Nodes (32 cores)        52     26           2
Speedup (P7: P6_2009)   1.8    2.0          1.6
Speedup (P7: P6_2012)   1.9    1.3          1.4

ENDGAME Coupled Model Performance (N216L85 + O0.25L75)
• Coupled Core Count = Sum of components
• Coupled Time = Max Component time plus coupling overhead

ENDGAME Performance (N512L70)
• Currently more Cores needed
• But scalability much better
• Scope for more optimisation
• Scope for longer timestep
[Plot annotation: Planned Resource]

Operational Constraints
• Cluster Sizes : 6 frames + 5 frames (+ 2 frames)
• Seasonal Forecast System = 1 frame
• Operational NWP : max 2 frames
• < Half cluster limit keeps NWP R&D/Operations ratio in check at hopefully 3:1 ratio
• Delivery constraints for NWP
    • Many Operational Configurations => Forces overlap of suites => quarter cluster limit per configuration
    • Customers require end-product as soon as possible after data time => Maximum run length ~60 minutes

Resource Shares
Table 5 – Breakdown of resource allocation

Modelling System                                    Fraction of total    Projected fraction
                                                    resource on IBM P6   of resource on IBM P7
                                                    (Jan 2012)           (Apr 2014)
Global Deterministic                                3.5%                 ~2.5%
Global Ensemble (excluding MOGREPS-15)              1.0%                 ~1.5%
Monthly & Seasonal Ensembles (plus Hindcast)        2.1%                 ~5.5%
Regional and Local Deterministic (for UK & Europe)  4.9%                 ~2.5%
Regional and Local Ensembles (for UK)               0.8%                 ~2%
Other LAMs                                          1.3%                 <1%
Wet Models                                          0.6%                 <1%
Total Operational                                   14.3%                15%
NWP (R&D)                                           36.7%                30%
Climate (Production and R&D)                        49.0%                55%

Expected Usage Profile : March 2014
[Figure: Projected NWP Resource Usage. y-axis: P7 Nodes (0, 96, 192, 288); x-axis: time of day, hourly from 0:00 to 0:00; series: Seasonal, Global, Global EPS, UK, Euro, UK EPS]

WHY EURO4?
BBC 12=>4
• Increasing the domain
• Increasing the resolution

PRIORITIES FOR UK MODELLING RESOURCE 2014-15
Assuming Best Case for ENDGAME COST and OPTIMISATION PROGRESS
Priorities if affordable
1. Increasing the number of levels
2. Extending the forecast range beyond T+36
3. Increasing the domain
4. Increasing the resolution

PRIORITIES FOR GLOBAL ENSEMBLE 2014-15
Assuming Best Case for ENDGAME COST and OPTIMISATION PROGRESS
Priorities if affordable
1. Extending the forecast range to Day 5
    • Does Short Range Forecasting Extend Out to Day 5?
    • Tim Johns and Adam Scaife review Global EPS beyond Day 5 in a later talk
2. Increasing the number of levels =>85

What about 2015?
1/3 towards the 2020 vision:
• 10km Global Coupled EPS &
• 1km Regional Coupled EPS or 500m Local Coupled EPS – which?

2015 Strawman – Short-range NWP
• Global 12km with >100 levels
• Global EPS at 25km
• UK 1.5km ensemble + bigger domain [(x,y)=>(1.5x*1.5y)] + >100 levels
• UK DA =>4DVAR; hourly with NWP Nowcast
• Monthly / Seasonal 40km? or embedded 10km? – Question for a later presentation

Questions, answers & discussion