R Freeware Statistics Package
Tara Jenson
NCAR RAL JNT
Tom Hopson

What is R?
• A statistical programming language
• Developed in part from the S programming language from Bell Labs (John Chambers)
• Created to allow rapid development of methods for different types of data
• Create new graphics: many default parameters are chosen for you, but users retain complete control.

Why R?
• R has become the dominant language in the statistical research community.
• R is open source and free.
• Runs on all major operating systems
• Nearly 2,400 contributed packages
• Packages and applications in nearly every field of science, business and economics
• See R News, The R Journal and the Journal of Statistical Software (www.jstatsoft.org)
• More than 100 books with accompanying code
• Very large, active user base

Why not R?
• NCL, IDL, Matlab, SAS, … are all viable alternatives to R. If you are part of an active community of researchers using another language, do likewise.
• If we were biostatisticians, we would be using SAS. Book title: "Analyzing Receiver Operating Characteristic Curves with SAS"
• Consider building verification functions and utilities as part of code development. Verification need not be a process external to forecasting.

The R Community
• Developers
  – R Core Group (17 members); only 2 have left since 1997
  – Major updates in April/October (freeze dates, beta versions, bug tracking, ...)
• Mailing lists
  – Help list: ~150 messages/day, archived and searchable
• 5 international conferences, 2 in the US, 1 in China

Everything about R is at www.r-project.org
• Source code
• Binary compilations (Windows, Mac OS, Linux)
• Documentation (main documents plus numerous contributed ones, some in foreign languages)
• Newsletter (replaced by The R Journal)
• Mailing lists (several search engines)
• Packages on every topic imaginable
• Wiki with examples
• Reference list of books using R
  (more than 100)
• Task Manager

Use R with scripts
• In Linux: Emacs Speaks Statistics (ESS)
  – Provides syntax-based editing
  – Object name completion
  – Keystroke shortcuts
  – Command history
  – Alt-x R to invoke R within XEmacs
• In Windows, use the R GUI's script editor
  – Added GUI features
  – <Control>R sends a line or highlighted section into R
  – Install packages through the GUI
  – Save graphics by point and click
• Mac OS
  – Similar to Windows, with the added advantage of system calls

Packages in R
• Contributed by people worldwide
• Allow scientists and statisticians to push their ideas
• Apply and extend R's capabilities to meet the needs of specific communities
• Accompany many statistical textbooks

A sample of useful packages
• verification
• fields (spatial statistics)
• radiosondes
• extRemes
• BMA (Bayesian Model Averaging)
• ensembleBMA
• circular
• RSQLite
• Rgis, spatstat (GIS)
• ncdf (support for netCDF files)
• RColorBrewer
• randomForest

Packages
• Packages must be installed before they can be loaded.
• Packages must be loaded (e.g., with library()) before they can be used.
• Base packages are installed by default.

10 most useful functions in R
• aggregate - applies a function to groups of data subset by categories
• apply - applies a function across the dimensions (rows, columns, ...) of an array, avoiding explicit loops
• layout - creatively divides a plotting region into panels
• xyplot (in the lattice package) - slightly more advanced graphics techniques
• %in% - returns a logical vector showing which elements of A are in B (e.g., A %in% B)

More top 10
• table - creates contingency table counts
• boot (in the boot package) - applies the bootstrap correctly
• read.fwf - reads fixed-width format data
• par - controls everything in a graph
• system() - calls a system command from R
• pairs - the most underutilized plot; plots every pair of columns of a matrix, e.g. a 4-column matrix as a 4x4 grid of plots

Log in and start your windowing system.
$ R
Start R as appropriate for your platform. The R program begins with a banner.
(Within R, the prompt on the left-hand side will not be shown, to avoid confusion.)
help.start()
Start the HTML interface to the online help (using a web browser available on your machine). Briefly explore the features of this facility with the mouse. In particular, work through sections 1.5 and 2.1-2.3, and Appendix A (just the first one or two sections).

R Exercises
• Form groups of 3-4 and find a computer
• Log onto the machines
• Bring up at least 2 xterms
• > cd /home/user/Desktop/longlead
• > vi intro2R.2013.R
And work through the commands given …
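Several of the "top 10" functions listed above can be tried in a few lines of base R. The sketch below is illustrative only: the data frame of forecast errors, the station names, the months and the values are all hypothetical, made up to show what aggregate(), apply(), %in% and table() return.

```r
# Hypothetical data: forecast errors at 3 stations, 4 days each
df <- data.frame(
  station = rep(c("A", "B", "C"), each = 4),  # 3 stations x 4 days
  month   = rep(c(1, 2), times = 6),          # alternating months 1 and 2
  error   = c(1.2, -0.5, 0.3, 0.8,
              -1.1, 0.4, 0.9, -0.2,
              0.0, 0.6, -0.7, 1.5)
)

# aggregate(): mean forecast error for each station (formula interface)
station_means <- aggregate(error ~ station, data = df, FUN = mean)
print(station_means)

# apply(): MARGIN = 2 applies mean() to each column of a matrix, no explicit loop
m <- matrix(1:6, nrow = 2)
col_means <- apply(m, 2, mean)
print(col_means)

# %in%: which of the requested stations occur in the data?
wanted  <- c("A", "D")
present <- wanted %in% df$station
print(present)

# table(): contingency counts of station by month
counts <- table(df$station, df$month)
print(counts)
```

Here apply() over columns gives the same result as colMeans(m); apply() earns its keep when the function has no built-in vectorized equivalent.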
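read.fwf() and boot() from the list above fit together naturally: read fixed-width observations, then bootstrap a statistic. This is a minimal sketch assuming a hypothetical file layout (a 3-character station id followed by a 5-character temperature) written to a temporary file, with made-up values; it also assumes the boot package, a recommended package normally installed alongside R, is available.

```r
library(boot)  # recommended package, normally shipped with R

# Hypothetical fixed-width file: cols 1-3 station id, cols 4-8 temperature
tmp <- tempfile(fileext = ".txt")
writeLines(c("ABC 12.5",
             "DEF  9.0",
             "XYZ -3.0",
             "QRS 15.5"), tmp)

# widths gives the number of characters in each field
obs <- read.fwf(tmp, widths = c(3, 5), col.names = c("station", "temp"))
print(obs)

# boot() calls the statistic with the data and a vector of resampled indices
mean_stat <- function(data, idx) mean(data[idx])

set.seed(42)  # make the resampling reproducible
b <- boot(obs$temp, statistic = mean_stat, R = 999)
print(b)

# Percentile bootstrap confidence interval for the mean
ci <- boot.ci(b, type = "perc")
print(ci)
```

boot() evaluates the statistic once on the original data (stored in b$t0) and R times on resamples (stored in b$t). With only four made-up values the interval is a demonstration of the mechanics, not a meaningful estimate.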
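The graphics entries above (layout, par, pairs and lattice's xyplot) are easiest to understand by drawing something. This sketch uses the built-in iris data set and writes to a temporary PDF so it also runs without a screen; the panel arrangement and plot choices are arbitrary, and it assumes the lattice package (a recommended package, normally installed with R) is available.

```r
library(lattice)  # provides xyplot()

pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file, width = 8, height = 8)

# layout(): 3 panels; panel 1 spans the whole top row. Returns the panel count.
n_panels <- layout(matrix(c(1, 1,
                            2, 3), nrow = 2, byrow = TRUE))

# par(): change margins, keeping the old value so it can be restored
old_par <- par(mar = c(4, 4, 2, 1))
plot(iris$Sepal.Length, iris$Petal.Length,
     xlab = "Sepal length", ylab = "Petal length", main = "Panel 1")
hist(iris$Sepal.Width, main = "Panel 2", xlab = "Sepal width")
boxplot(Petal.Width ~ Species, data = iris, main = "Panel 3")
par(old_par)

# pairs(): scatterplot matrix; 4 numeric columns -> 4 x 4 grid of panels
layout(matrix(1))  # back to one figure per page
pairs(iris[, 1:4], col = as.integer(iris$Species))

# xyplot(): one panel per species; lattice plots must be print()ed in scripts
p <- xyplot(Petal.Length ~ Sepal.Length | Species, data = iris)
print(p)

dev.off()
```

Saving the value returned by par() and passing it back afterwards is the usual way to undo temporary graphical settings without disturbing anything else.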