Resources on forecast evaluation/verification
Barbara Brown
Joint Numerical Testbed
NCAR, Boulder, CO
July 2011

WMO Verification web page
• Developed and maintained by the working group on forecast verification research
• Includes FAQs, references, and links to tutorial presentations
• Verification discussion group: vxdiscuss@rap.ucar.edu
• http://www.cawcr.gov.au/projects/verification/

EUMETCAL Online Learning Module
• Free online tutorial with exercises
• Includes modules on categorical, continuous, and probabilistic verification methods
• http://www.eumetcal.org/resources/ukmeteocal/verification/www/english/courses/msgcrs/index.htm

Books
• Wilks (2011): Statistical Methods in the Atmospheric Sciences
  – Published by Elsevier
  – Includes an extensive chapter on verification methods
• Jolliffe and Stephenson: Forecast Verification - A Practitioner’s Guide in Atmospheric Sciences
  – Published by Wiley (new version in 2011)
• Warner (2010): Numerical Weather and Climate Prediction
  – Published by Cambridge University Press
  – Includes a chapter on forecast evaluation

Tools
• R Statistics Package: includes many statistical tools, including the R Verification Library
• MET is freely available, with support provided to the community
  – Main focus: model verification
  – Includes tools for point and gridded data, ensemble forecasts, and spatial methods
  – http://www.dtcenter.org/met/users/

Introduction to R
Tara Jensen
NCAR/RAL/JNT

What is R?
• A statistical programming language
• Developed in part from the S programming language from Bell Labs (John Chambers)
• Created to allow rapid development of methods for use with different types of data.
• Create new graphics. Many default parameters are chosen, but users retain complete control.

Why R?
• R has become the dominant language in the statistical research community.
• R is open source and free.
• Runs on all major operating systems (Windows, Mac OS, Linux)
• Nearly 2,400 contributed packages.
• Packages and applications in nearly every field of science, business and economics.
• See R News, the R Journal, and the Journal of Statistical Software (www.jstatsoft.org)
• More than 100 books with accompanying code
• Very large, active user base.

Why not R?
• NCL, IDL, Matlab, SAS, … are all viable alternatives to R. If you are part of an active community of researchers using another language, do likewise.
• If we were biostatisticians we would be using SAS. Book title: “Analyzing Receiver Operating Characteristic Curves with SAS”
• Consider building verification functions and utilities as part of code development. Verification need not be a process external to forecasting.

The R Community
• Developers
  – R Core Group (17 members); only 2 have left since 1997
  – Major updates in April and October (freeze dates, beta versions, bug tracking, ...)
• Mailing lists
  – Help list: about 150 messages/day, archived and searchable.
• Conferences: 5 international, 2 in the US, 1 in China

Everything about R is at www.r-project.org
• Source code
• Binary compilations (Windows, Mac OS, Linux)
• Documentation (main manuals plus numerous contributed documents, some in languages other than English)
• Newsletter (replaced by the R Journal)
• Mailing lists (searchable through several search engines)
• Packages on every topic imaginable
• Wiki with examples
• Reference list of books using R (more than 100)
• Task Manager

Use R with scripts
• In Linux - Emacs Speaks Statistics
  – Provides syntax-based highlighting
  – Object name completion
  – Keystroke shortcuts
  – Command history
  – Alt-x R invokes R from within XEmacs.
• In Windows - use the editor
  – Added GUI features
  – <control>R sends a line or highlighted section to R.
  – Install packages through the GUI
  – Save graphics by point and click.
• Mac OS
  – Similar to Windows, with the advantage of system calls.

Coding principles
• Make verification code transparent and easy to read
• Comment and document liberally
• Archive your code
• Share your code
• Label and save your data
• Share your data

Packages in R
• Contributed by people worldwide.
• Allow scientists or statisticians to push their ideas.
• Apply and extend R capabilities to meet the needs of specific communities.
• Accompany many statistical textbooks
• Accompany applied articles (Adrian Raftery, Doug Nychka, Tilman Gneiting, Barbara Casati, Matt Briggs)

A sample of useful packages
• verification
• fields (spatial stats)
• RadioSonde
• extRemes
• BMA (Bayesian Model Averaging)
• ensembleBMA
• circular
• RSQLite
• Rgis, spatstat (GIS)
• ncdf (support for netCDF files)
• RColorBrewer
• randomForest

Packages
• Packages must be installed before they can be loaded.
• Packages must be loaded (with library()) before they can be used.
• Base packages are installed by default.

10 most useful functions in R
• aggregate - applies a function to groups of data, subset by categories.
• apply - a concise way to avoid explicit loops. Applies a function across the dimensions (rows or columns) of an array.
• layout - creatively divides a plotting region.
• xyplot (in the lattice package) - slightly more advanced graphics techniques
• %in% - returns a logical vector showing which elements of A are in B (e.g., A %in% B)

More top 10
• table - creates contingency table counts.
• boot (in the boot package) - applies bootstrap resampling correctly
• read.fwf - reads fixed-width-format data
• par - controls everything in a graph
• system( ) - lets you call a system command from R
• pairs - the most underutilized plot; draws every column of a matrix against every other column (a matrix with 4 columns gives a 4x4 plot layout)

R Exercises
• Choose groups of 3-4 and find a computer
• Log onto machines
• Bring up at least 2 xterms
• >cp /wrfhelp/MET/R-packages/example/* .
• >vi intro2R.2011comet.R or >kwrite intro2R.2011comet.R &
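
A quick warm-up before opening the exercise script: the sketch below tries a few of the base-R functions from the "most useful functions" slides (aggregate, apply, %in%, table) on a small made-up data set. It is not part of intro2R.2011comet.R. The data frame `obs`, its columns, and the threshold of 10 are hypothetical and chosen only for illustration. The package install/load lines are left as comments because they need network access.

```r
# One-time install from CRAN, then load in each session (commented out here):
# install.packages("verification")
# library(verification)

# Hypothetical toy data: forecasts and observations at three stations.
obs <- data.frame(
  station  = c("A", "A", "B", "B", "C", "C"),
  forecast = c(10, 12, 20, 18, 5, 7),
  observed = c(11, 15, 19, 18, 2, 9)
)
obs$error <- obs$forecast - obs$observed

# aggregate: mean forecast error for each station
mean_err <- aggregate(error ~ station, data = obs, FUN = mean)

# apply: column means of a numeric matrix without an explicit loop
# (MARGIN = 2 means "operate over columns")
col_means <- apply(as.matrix(obs[, c("forecast", "observed")]), 2, mean)

# %in%: logical vector marking rows that belong to a chosen set of stations
in_ab <- obs$station %in% c("A", "B")

# table: 2x2 contingency counts of forecast vs. observed exceeding
# an arbitrary threshold of 10
ctab <- table(forecast = obs$forecast > 10, observed = obs$observed > 10)

print(mean_err)
print(col_means)
print(in_ab)
print(ctab)
```

The 2x2 table built in the last step is the usual starting point for categorical verification scores. From there, functions in the verification package can be explored once it is installed.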