Resources on forecast evaluation/verification
Barbara Brown
Joint Numerical Testbed
NCAR, Boulder, CO
July 2011
WMO Verification web page
• Developed and maintained by the working group on forecast verification research
• Includes FAQs, references and links to tutorial presentations
• Verification discussion group: vxdiscuss@rap.ucar.edu
http://www.cawcr.gov.au/projects/verification/
EUMETCAL Online Learning Module
• Free online tutorial with exercises
• Includes modules on categorical, continuous and probabilistic verification methods
http://www.eumetcal.org/resources/ukmeteocal/verification/www/english/courses/msgcrs/index.htm
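As a small taste of the probabilistic category: the Brier score is the mean squared difference between forecast probabilities p and binary outcomes o, BS = mean((p - o)^2), and a skill score compares it with a reference forecast. A minimal R sketch, assuming the reference is a constant forecast equal to the sample event frequency; the function names and data are hypothetical.

```r
# Brier score: mean squared error of probability forecasts p against
# binary observations o (1 = event occurred, 0 = it did not).
brier_score <- function(p, o) {
  stopifnot(length(p) == length(o), all(p >= 0 & p <= 1), all(o %in% c(0, 1)))
  mean((p - o)^2)
}

# Brier skill score relative to a constant "climatology" forecast,
# here assumed to be the sample event frequency mean(o).
brier_skill_score <- function(p, o) {
  bs_ref <- brier_score(rep(mean(o), length(o)), o)
  1 - brier_score(p, o) / bs_ref
}

p <- c(0.9, 0.1, 0.8, 0.3)   # hypothetical forecast probabilities
o <- c(1,   0,   1,   0)     # hypothetical observed outcomes
brier_score(p, o)            # 0.0375
brier_skill_score(p, o)      # 0.85
```

A BSS of 1 is a perfect forecast; 0 means no improvement over the reference.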
Books
• Wilks (2011): Statistical Methods in the Atmospheric Sciences
– Published by Elsevier
– Includes extensive chapter on verification methods
• Jolliffe and Stephenson: Forecast Verification: A Practitioner’s Guide in Atmospheric Science
– Published by Wiley (new version in 2011)
• Warner (2010): Numerical Weather and Climate Prediction
– Published by Cambridge University Press
– Includes chapter on verification
Forecast Evaluation Tools
• R Statistics Package:
– Includes many statistical tools, including the R Verification Library
• MET (Model Evaluation Tools) is freely available and supported for the community
– Main focus: model verification
– Includes tools for point and gridded data, ensemble forecasts and spatial methods
– http://www.dtcenter.org/met/users/
Introduction to R
Tara Jensen
NCAR/RAL/JNT
What is R?
• A statistical programming language
• Developed in part from the S programming language from Bell Labs (John Chambers)
• Created to allow rapid development of methods for many different types of data
• Flexible graphics: many default parameters are chosen for you, but users retain complete control
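The same plot drawn twice, first with R's defaults and then with explicit control over symbols, colours, labels and axis limits. A minimal sketch: the data and labels are made up, and output goes to a temporary PDF file so it runs without a display.

```r
# Send plots to a temporary PDF so the example runs without a screen.
out <- tempfile(fileext = ".pdf")
pdf(out)

x <- 1:10
y <- x^2

# 1) Defaults: axis labels, limits and plotting symbols are chosen automatically.
plot(x, y)

# 2) Full control: override the defaults explicitly (labels are illustrative).
plot(x, y, type = "b", pch = 19, col = "steelblue",
     xlab = "Lead time (h)", ylab = "Error", main = "Custom plot",
     xlim = c(0, 12), ylim = c(0, 120))
abline(h = 50, lty = 2)   # add a dashed reference line

dev.off()
file.exists(out)          # TRUE
```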
Why R?
• R has become the dominant language in the statistical
research community.
• R is open source and free.
• Runs on all major operating systems (Windows, Mac OS, Linux).
• Nearly 2,400 contributed packages.
• Packages and applications in nearly every field of science, business and economics.
• See R News, The R Journal and the Journal of Statistical Software (www.jstatsoft.org).
• More than 100 books with accompanying code
• Very large, active user base.
Why not R?
• NCL, IDL, Matlab, SAS, … are all viable alternatives to R. If you are part of an active community of researchers using another language, use that language.
• If we were biostatisticians, we would probably be using SAS (e.g. the book “Analyzing Receiver Operating Characteristic Curves with SAS”).
• Consider building verification functions and utilities as part of code development. Verification need not be a process external to forecasting.
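As an example of a verification utility that lives alongside forecast code, here is a minimal sketch of a function returning bias, mean absolute error and root-mean-square error for matched forecast/observation pairs. The function name continuous_scores and the sample values are hypothetical.

```r
# A small, reusable set of continuous verification scores.
# 'fcst' and 'obs' are numeric vectors of matched forecast/observation pairs.
continuous_scores <- function(fcst, obs, na.rm = TRUE) {
  stopifnot(is.numeric(fcst), is.numeric(obs), length(fcst) == length(obs))
  err <- fcst - obs
  c(bias = mean(err, na.rm = na.rm),          # mean error
    mae  = mean(abs(err), na.rm = na.rm),     # mean absolute error
    rmse = sqrt(mean(err^2, na.rm = na.rm)))  # root-mean-square error
}

# Hypothetical use: score a model run as soon as it is produced.
fcst <- c(21.0, 18.5, 25.2, 19.9)
obs  <- c(20.0, 19.0, 24.0, 21.0)
continuous_scores(fcst, obs)   # bias 0.15, mae 0.95, rmse ~0.987
```

Keeping the checks (stopifnot) inside the function means mismatched inputs fail loudly instead of silently producing a score.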
The R Community
• Developers
– R Core Group (17 members); only 2 have left since 1997
– Major updates in April and October (freeze dates, beta versions, bug tracking, ...)
• Mailing lists
– Help list: ~150 messages/day, archived and searchable
• 5 international conferences, 2 US, 1 China
Everything about R is at www.r-project.org
• Source code
• Binary compilations (Windows, Mac OS, Linux)
• Documentation (main manuals, plus numerous contributed documents, some in languages other than English)
• Newsletter (R News, replaced by The R Journal)
• Mailing lists (searchable through several search engines)
• Packages on every topic imaginable
• Wiki with examples
• Reference list of books using R (more than 100)
• Task Views
Use R with scripts
• In Linux - Emacs Speaks Statistics (ESS)
– Provides syntax-based highlighting
– Object name completion
– Keystroke shortcuts
– Command history
– Alt-x R to invoke R within XEmacs
• In Windows, use the R editor
– Added GUI features
– <control>R sends a line or highlighted section into R
– Install packages with the GUI
– Save graphics by point and click
• Mac OS
– Similar to Windows, with the advantages of system calls
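Whatever the editor, the workflow is the same: keep the code in a script file and run it with source() from R (or with Rscript from a shell). A minimal sketch, writing a throwaway script with made-up data to a temporary file.

```r
# Write a tiny analysis script to a temporary file. In practice you would
# edit this file in ESS or another editor and archive it with your data.
script <- tempfile(fileext = ".R")
writeLines(c(
  "obs  <- c(2.1, 0.0, 5.4, 1.2)",
  "fcst <- c(1.8, 0.3, 4.9, 1.0)",
  "mae  <- mean(abs(fcst - obs))"
), script)

# Run the whole script; the objects it creates (obs, fcst, mae)
# are then available in the global workspace.
source(script)
print(mae)   # 0.325

# From a shell, the same file can be run non-interactively with:
#   Rscript path/to/script.R
```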
Coding principles
• Make verification code transparent and easy to read
• Comment and document liberally
• Archive your code
• Share your code
• Label and save your data
• Share your data
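One way to "label and save your data" in R, sketched with hypothetical column names, station ids and values: descriptive column names, units recorded as an attribute, an R-native copy (saveRDS keeps classes and attributes) and a plain CSV copy for sharing with non-R users.

```r
# Label the data: descriptive column names plus units stored as an attribute.
vx <- data.frame(
  valid_time = as.POSIXct(c("2011-07-01 00:00", "2011-07-01 12:00"), tz = "UTC"),
  station    = c("KBOU", "KDEN"),   # hypothetical station ids
  fcst_t2m   = c(24.3, 27.9),
  obs_t2m    = c(23.8, 28.4)
)
attr(vx, "units") <- c(fcst_t2m = "degC", obs_t2m = "degC")

# R-native copy: classes (POSIXct) and attributes (units) are preserved.
rds_file <- tempfile(fileext = ".rds")
saveRDS(vx, rds_file)

# Plain-text copy for sharing; attributes are not written to CSV.
csv_file <- tempfile(fileext = ".csv")
write.csv(vx, csv_file, row.names = FALSE)

identical(readRDS(rds_file), vx)   # TRUE
```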
Packages in R
• Contributed by people worldwide.
• Allow scientists and statisticians to disseminate their ideas.
• Apply and extend R capabilities to meet the
needs of specific communities.
• Accompany many statistical textbooks
• Accompany applied articles (Adrian Raftery,
Doug Nychka, Tilman Gneiting, Barbara Casati,
Matt Briggs)
A sample of useful packages
• verification
• fields (spatial statistics)
• RadioSonde
• extRemes
• BMA (Bayesian Model Averaging)
• ensembleBMA
• circular
• RSQLite
• Rgis, spatstat (GIS)
• ncdf (support for netCDF files)
• RColorBrewer
• randomForest
Packages
• Packages must be installed before they can be loaded.
• Packages must be loaded (e.g. with library()) before they can be used.
• Base packages are installed by default.
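To make the install/load distinction concrete: install.packages() is needed once per R library, library() in every session. The helper use_package() below is hypothetical, a sketch of a common pattern built on requireNamespace(); tools is used only because it is a base package that is installed but not attached by default.

```r
# Installing (once, needs internet access, so commented out here):
# install.packages("verification")

# Hypothetical helper: attach a package only if it is installed.
use_package <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)   # attach it to the search path
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; try install.packages(\"", pkg, "\")")
    FALSE
  }
}

use_package("tools")             # base package: installed, now attached
"package:tools" %in% search()    # TRUE
use_package("noSuchPackageXYZ")  # FALSE, with a message
```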
10 most useful functions in R
• aggregate - applies a function to subsets of the data grouped by categories.
• apply - applies a function across the rows, columns or other dimensions of an array, avoiding explicit loops.
• layout - divides the plotting region into panels of different sizes.
• xyplot (in the lattice package) - more advanced (trellis) graphics techniques.
• %in% - returns a logical vector showing which elements of A are in B (e.g. A %in% B).
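A minimal sketch of aggregate, apply, %in% and layout on a small, made-up set of forecast/observation pairs (the data frame and its column names are hypothetical). xyplot is left out because it needs the lattice package.

```r
# Hypothetical matched forecast/observation pairs at two stations.
vx <- data.frame(
  station = c("A", "A", "A", "B", "B", "B"),
  lead    = c(12, 24, 36, 12, 24, 36),
  fcst    = c(10.2, 11.0, 12.5, 8.1, 9.4, 7.7),
  obs     = c(10.0, 10.1, 11.0, 8.5, 9.0, 8.9)
)
vx$err <- vx$fcst - vx$obs

# aggregate: mean error (bias) for each station.
bias_by_station <- aggregate(err ~ station, data = vx, FUN = mean)
print(bias_by_station)

# apply: mean of each column (MARGIN = 2) without an explicit loop
# (colMeans() does the same job for this particular case).
m <- as.matrix(vx[, c("fcst", "obs")])
apply(m, 2, mean)

# %in%: which rows belong to the stations of interest?
vx$station %in% c("B", "C")

# layout: one wide panel on top, two panels below (temporary PDF output).
pdf(tempfile(fileext = ".pdf"))
layout(matrix(c(1, 1, 2, 3), nrow = 2, byrow = TRUE))
plot(vx$lead, vx$err, main = "Error vs lead time")
hist(vx$err, main = "Errors")
boxplot(err ~ station, data = vx, main = "By station")
dev.off()
```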
More top 10
• table – create contingency table counts
• boot – apply bootstrap resampling correctly
• read.fwf – read fixed-width format data
• par – control everything in a graph
• system( ) – call system commands from R
• pairs – the most underutilized plot – plots a matrix of 4 columns as a 4x4 grid of scatterplots
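A minimal sketch that exercises several of these functions together on a made-up five-station yes/no event dataset; the fixed-width layout, station ids and values are all hypothetical. read.fwf reads the records, table builds the 2x2 contingency table from which probability of detection (POD) and false alarm ratio (FAR) follow, and par plus pairs draw a scatterplot matrix. boot and system() are left out to keep the example short and platform-independent.

```r
# read.fwf: hypothetical layout, columns 1-4 station id,
# 5-9 forecast flag, 10-14 observed flag (1 = event, 0 = no event).
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("KBOU    1    1",
             "KDEN    1    0",
             "KCOS    0    0",
             "KGJT    0    1",
             "KPUB    1    1"), fwf_file)
vx <- read.fwf(fwf_file, widths = c(4, 5, 5),
               col.names = c("station", "fcst", "obs"),
               strip.white = TRUE)   # passed on to read.table()

# table: 2 x 2 contingency table of forecast vs observed events.
# Explicit factor levels keep all four cells even if a count is zero.
ct <- table(fcst = factor(vx$fcst, levels = c(1, 0)),
            obs  = factor(vx$obs,  levels = c(1, 0)))
print(ct)

hits         <- ct["1", "1"]
misses       <- ct["0", "1"]
false_alarms <- ct["1", "0"]
pod <- hits / (hits + misses)                # probability of detection
far <- false_alarms / (hits + false_alarms)  # false alarm ratio
c(POD = pod, FAR = far)                      # 0.667, 0.333

# par + pairs: 4 numeric columns of the built-in iris data give a
# 4 x 4 scatterplot matrix, drawn to a temporary PDF file.
pdf(tempfile(fileext = ".pdf"))
old_par <- par(cex = 0.8)   # par() returns the previous settings
pairs(iris[, 1:4])
par(old_par)                # restore them
dev.off()
```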
R Exercises
• Choose groups of 3-4 and find a computer
• Log onto the machines
• Bring up at least 2 xterms
• >cp /wrfhelp/MET/R-packages/example/* .
• >vi intro2R.2011comet.R
or
>kwrite intro2R.2011comet.R &