What is R

What is R?
Official website: http://www.cran.r-project.org
1. R is free, open-source statistical analysis software. It is
a competitor to many commercial software packages
such as MATLAB, Microsoft Excel, SPSS, etc.
2. R is open source and has very active user groups and
contributors
3. R architecture: basic functionality + additional
packages (optional)
4. Basic functionality comes with installation
5. Additional packages are installed as needed and
loaded before they can be used
6. R has good online help!
Example of Utility of R : Statistical Data Analysis & Data
Visualization
• Exploratory data analysis
When analyzing data in the sciences (data mining, machine learning,
social science, etc.), most researchers use MS Excel, MATLAB, SPSS,
etc. to store, edit, and analyze their data. For example, if a researcher is
studying students' appraisal of a course, they may have participants
complete an online survey. The researcher might combine individual
answers to create a global course score. The next step would be
to perform a statistical test to look for group differences among
students on a particular course, or to calculate summary statistics and
correlations with other items of interest (mean, median, standard
deviation, correlation, etc.).
• Data visualization
In many cases one would like to visualize the dataset.
R provides a great way to achieve the above objectives.
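The exploratory steps described above can be sketched in a few lines of R. This is only an illustration: it assumes R's built-in iris data set as a stand-in for the survey data, and the variable names (sepal_mean, group_test, and so on) are made up for the example.

```r
# Assumption: the built-in iris data set stands in for the survey data.
data(iris)

# Summary statistics for one numeric item
sepal_mean   <- mean(iris$Sepal.Length)
sepal_median <- median(iris$Sepal.Length)
sepal_sd     <- sd(iris$Sepal.Length)
print(c(mean = sepal_mean, median = sepal_median, sd = sepal_sd))

# Correlation with another item of interest
sepal_petal_cor <- cor(iris$Sepal.Length, iris$Petal.Length)
print(sepal_petal_cor)

# Group differences: one-way ANOVA of sepal length across species
group_test <- aov(Sepal.Length ~ Species, data = iris)
print(summary(group_test))

# Visualization: one box per group
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal length by species")
```

Each step is a single function call, which is a large part of why R suits this kind of analysis.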
Basic structure of R
[Diagram: R = basic packages (available after installation) + add-on packages (imported as needed)]
• Basic packages are available after installation.
Usually located at ‘C:/Program Files/R/R-2.15.1/library’
This location could be different depending on your particular installation
• Additional libraries are imported as needed
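To see where packages live on a particular installation, something like the following sketch may help. It uses only functions that ship with R; the variable names are illustrative.

```r
# Directories where this R installation looks for packages
lib_dirs <- .libPaths()
print(lib_dirs)

# Packages that come with the basic installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages currently attached (loaded and visible) in this session
loaded_now <- search()
print(loaded_now)
```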
Illustrating a Few Key Features of R
1-Scatterplot
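library(scatterplot3d)  # add-on package; must be installed first
# SepalWidth, PetalLength and SepalLength are assumed to be columns of an attached data frame
s3d <- scatterplot3d(SepalWidth, PetalLength, SepalLength, pch=16, highlight.3d=TRUE,
                     type="h", main="3D Iris Scatterplot")
fit <- lm(SepalLength ~ SepalWidth + PetalLength)  # regression plane
s3d$plane3d(fit)                                   # draw the plane on the plot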
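The lm() call above fits the regression plane that is drawn on the plot. A minimal sketch of inspecting such a fit follows. It assumes R's built-in iris data, whose column names (Sepal.Length, etc.) differ from those in the presenter's file, and a made-up new_flower observation.

```r
# Assumption: built-in iris data; column names differ from the slide's CSV.
data(iris)

# Regress sepal length on sepal width and petal length
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

# Intercept and slopes of the fitted plane
print(coef(fit))

# Share of variance explained by the model
print(summary(fit)$r.squared)

# Predict sepal length for a hypothetical new flower
new_flower <- data.frame(Sepal.Width = 3.0, Petal.Length = 4.5)
predicted <- predict(fit, newdata = new_flower)
print(predicted)
```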
[Figure: 3D Iris Scatterplot; axes: SepalWidth, PetalLength, SepalLength]
Scatterplot
library(car)  # scatter3d() comes from the car package and needs rgl
scatter3d(SepalWidth, PetalLength, SepalLength, sphere.size=2, surface=TRUE,
fit="linear", model.summary=TRUE, parallel=FALSE,
ellipsoid=TRUE, surface.col=c("green", "red", "blue", "gold", "firebrick3"))
• 2-Object manipulation/Regression analysis
• 2-Object manipulation/clustering analysis
#Demonstrating k-means clustering
library(cluster)
library(fpc)
dataset <- read.csv("C:/Users/paul/Desktop/R_wd/Lab/iris.csv")
mysubSet <- dataset[1:4]
obj <- kmeans(mysubSet, centers=3)
#plot result (color and symbol by cluster)
plot(mysubSet, col=obj$cluster, pch=obj$cluster)
plotcluster(mysubSet, obj$cluster)
#Clustering quality result
obj$centers
obj$totss
obj$withinss
obj$size
[Figure: plotcluster output, points labeled by cluster (1, 2, 3); axes: dc 1, dc 2]
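A self-contained variant of the clustering demonstration, for readers without the iris.csv file, might look like the sketch below. It assumes the built-in iris data and base R's kmeans() only. The seed and nstart values are arbitrary choices for reproducibility, not part of the original slides.

```r
# Assumption: built-in iris data replaces the presenter's iris.csv.
set.seed(1)                      # k-means starts from random centers
features <- iris[, 1:4]          # the four numeric measurements

# Fit 3 clusters, keeping the best of 10 random starts
km <- kmeans(features, centers = 3, nstart = 10)

# Cluster sizes and cluster centers
print(km$size)
print(km$centers)

# Compare clusters with the known species labels
print(table(cluster = km$cluster, species = iris$Species))

# Fraction of total variation explained by the clustering
explained <- 1 - km$tot.withinss / km$totss
print(explained)
```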
Installation
1. Go to http://cran.r-project.org/mirrors.html.
The R installations are distributed by the Comprehensive R Archive Network (CRAN). CRAN is a
collection of sites that carry identical material; they were created as mirror sites to lessen the load on
any one server.
2. Click on one of the USA links, for example http://lib.stat.cmu.edu/R/CRAN/, which brings you
to Carnegie Mellon University's StatLib mirror site. (Select a site close to where you are.)
3. In the ‘Download and Install R’ box, click on the ‘Windows’ link.
4. Click on the ‘base’ link.
5. Right-click on the ‘Download R 2.9.1 for Windows’ link and choose ‘Save Link As...’.
6. Save the .exe file to your Desktop (the R-2.9.1-win32.exe file is approximately 36 MB;
R-3.0.1-win.exe is approximately 52 MB).
7. Double-click on the .exe icon and follow the instructions.
8. When asked to ‘Select the components you want to install’, choose the (default) ‘User’ installation.
Don't worry about ‘customizing the startup options’.
In general, you can install R successfully by clicking the ‘Okay’, ‘Next’, or ‘Finish’ button at each
step and letting the R setup use the default choices.
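Once the installer has finished, a quick check at the R prompt can confirm that the installation works. The lines below are a minimal sketch using functions that ship with R.

```r
# Which version of R is running?
print(R.version.string)

# Where was R installed?
print(R.home())

# A trivial computation to confirm the interpreter responds
check <- 1 + 1
print(check)
```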
Beginner reference
http://cran.r-project.org/doc/manuals/R-intro.html#The-R-environment
• R is an interpreter for a dialect of the S language
Command line: you type a command and R executes it
Preliminaries
• R is case sensitive
• # is the comment tag
• R is installed with a default library of packages. You
add additional packages to the library
using the command install.packages()
• To use a function from an add-on package you must first load the package into
memory using the command library() (load() restores saved objects, not packages)
• Basic instructions are memory resident (you do
not need to load them)
• Variable names cannot start with a digit, “_” (underscore), “+”
(plus sign) or “-” (minus sign); a name may start with “.” (dot)
if the dot is not followed by a digit
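A few of these rules can be tried directly at the prompt. The sketch below uses illustrative object names (total, Total, .hidden_value) and the base function make.names(), which leaves a valid name unchanged and alters an invalid one.

```r
# R is case sensitive: total and Total are two different objects
total <- 10
Total <- 20
print(total == Total)

# library() loads a package; stats ships with R, so no installation is needed
library(stats)

# A name may start with a dot (if no digit follows the dot)
.hidden_value <- 5
print(.hidden_value)

# make.names() changes a name only if it is not valid as written
candidates <- c("x1", ".rate", "1x", "_y")
is_valid <- make.names(candidates) == candidates
print(data.frame(name = candidates, valid = is_valid))
```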
Preliminaries
• No variable declaration is needed. Variables are called objects and are
memory resident.
• Assignment can be done with the command assign()
For example: assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7)) puts the vector into x
The assignment operator <- is the usual shortcut: x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
• To print on screen, type the variable name followed by ENTER, or use
cat() or print()
Example
A <- 2
cat(A) will display 2 on the screen
• Commands are separated by “;” or by a newline
• Use the setwd() command to set the working directory
For example: setwd("C:/Documents and Settings/username/My Documents/xyz/")
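The points above can be combined into one short session. This is a sketch only: it switches to a temporary directory (via tempdir()) instead of the Windows path shown above, so that it runs on any machine.

```r
# Two equivalent ways to assign a vector
assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
y <- c(10.4, 5.6, 3.1, 6.4, 21.7)
print(identical(x, y))

# Two commands on one line, separated by ";"
A <- 2; cat(A, "\n")

# print() shows the object with its index prefix
print(x)

# Change the working directory, check it, then restore the old one
old_dir <- setwd(tempdir())   # setwd() returns the previous directory
print(getwd())
setwd(old_dir)
```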
A Few Basic Commands
• help(topic) or ?topic
If you type help(topic) at the command line, R
will fetch information about the topic you need
help with. A topic is usually the name of a function or other
documented object; use help(package = "name") for a package
For example, ‘help()’ or ‘help(help)’ will provide
help about the help instruction
• example(topic) # runs the examples from the help page of
the instruction (topic)
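As a small illustration of the help commands above, the following sketch looks up the documentation for mean(), chosen here only as a familiar example topic. Setting give.lines = TRUE makes example() return the example code as text rather than run it.

```r
# Locate the help page for a topic (printing the result displays the page)
help_page <- help("mean")
print(class(help_page))

# Fetch the example code from the help page without executing it
example_code <- example("mean", give.lines = TRUE)
print(example_code)
```

At the interactive prompt, typing ?mean or example(mean) is the more usual form.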
q() to quit R.
source(path) to execute several lines of
instructions stored in a file (sometimes better
than interactive mode).
path: the path to the file.
Files containing R scripts have the extension .r (or .R)
For example
source("source1.r") will execute everything in the
file, assuming source1.r is in the working
directory
source() is also available under the menu
(Windows):
File >> Source R Code … then select the file
sink(outputFile) will redirect all
output to outputFile. For example
sink("record.lis") will send output to record.lis
sink() (with no argument) will restore output to the
console.
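The source() and sink() commands can be tried together in a short sketch. To stay self-contained it first writes a two-line script into a temporary directory. The file names source1.r and record.lis follow the slides, while the script contents are invented for the example.

```r
# Create a small script file to run (contents are illustrative)
script_path <- file.path(tempdir(), "source1.r")
writeLines(c("a <- 1 + 1", "print(a)"), script_path)

# Execute every line of the script
source(script_path)

# Redirect output to a file, print something, then restore the console
log_path <- file.path(tempdir(), "record.lis")
sink(log_path)
print("this line goes to the file")
sink()

# Show what was captured in the file
captured <- readLines(log_path)
print(captured)
```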
Data permanency and removing objects
ls(): to print the workspace on the console
workspace: list of most objects currently in
memory (names starting with a dot are hidden)
rm(object names separated by commas): to
remove one or more objects from the
workspace
You may use the File menu to save or load
workspaces
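The same workspace operations are available from the command line. The sketch below uses made-up objects (scores, label) and a temporary file, and relies on the base functions save() and load() as a scripted counterpart to the File menu entries.

```r
# Create two objects and list the workspace
scores <- c(90, 85, 77)
label <- "midterm"
print(ls())

# Save selected objects to a file (path chosen for illustration)
ws_file <- file.path(tempdir(), "workspace_demo.RData")
save(scores, label, file = ws_file)

# Remove the objects, then confirm that scores is gone
rm(scores, label)
print(exists("scores"))

# Restore the saved objects from the file
load(ws_file)
print(scores)
```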
Non interactive mode: R-editor
• You may use the R-editor to edit a script then
run/save. (File >> New Script)
Non interactive mode: RCommander
RCommander (package Rcmdr) is an external R editor package that needs to be
installed and loaded
Packages >> Load package…
scroll until you find Rcmdr
RCommander screen
[Screenshot: script window (you type your script here), output window, and a window for compilation errors and warnings]
Links
1- Official website:
http://www.cran.r-project.org
2- Quick-R: to learn about graphics
http://www.statmethods.net/graphs/
3- Beginner reference
http://cran.r-project.org/doc/manuals/R-intro.html#The-R-environment
THANK YOU!