Uploaded by angelica.correa.c54

Summary R Programming Basics

Module overview
Julien Gagneur
Tidy data and combining tables
Introduction to R, RStudio and R markdown
RStudio
RStudio is software for programming in R and interactively analyzing data with R.
It organizes the R session into 4 panels.
Julien Gagneur
Lecture 1 - R Basics
R markdown
R markdown allows us to combine R commands with natural text
Create an R markdown file: File -> New file -> R markdown
Use Shift+Ctrl+K to start the knit process or press the button Knit
Each R markdown document contains a YAML header, defining document-wide settings like
title, author and output type (html, pdf, doc, ...)
R markdown cheatsheet
[https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf]
Installing and loading packages
Packages are the fundamental units of reproducible R code. Several packages are automatically
included when installing R.
We can install and load new packages by typing:
install.packages("vegan") # install a new package called vegan, do this only once!
library(vegan) # and load it (in every script)
Vegan is a package to analyze biodiversity. To learn more about an installed package try:
browseVignettes("vegan")
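Because installation is needed only once but loading is needed in every script, one possible pattern is to install only when the package is missing. The sketch below assumes this is the desired behavior; the helper name load_or_install is hypothetical and not part of R or of the lecture.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs internet access; runs only when pkg is missing
  }
  library(pkg, character.only = TRUE)  # load a package whose name is stored in a variable
  invisible(TRUE)
}

# "stats" ships with R, so nothing is installed here.
load_or_install("stats")
```

Calling load_or_install("vegan") would follow the same path as the two commands above, but skip the installation on later runs.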
TOPIC 1: R BASICS
Lecture, central exercise and tutorial. All questions: Slack.
First steps with R
Assignments
We use <- to assign values to variables.
Objects
We use the term variable or object to describe stuff that is stored in R.
To see the value stored in a variable, print it: print(variable).
Functions
Functions perform tasks in R. They take inputs, called arguments, and return outputs.
A function's arguments can be specified manually or left at their default values.
Example: sqrt() returns the square root of a value.
Use help("sqrt") or ?sqrt to find out what the function expects and what it does.
Use the command ls() to see all the variables saved in my workspace.
Other prebuilt objects
Use data() to see all the available datasets.
Variable name convention
Use meaningful words that describe what is stored, use lower case, and use underscores (_) as a substitute for spaces.
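The first steps above can be sketched as follows. The variable names and values are made up for illustration; log() is used here only as an example of a function with a default argument.

```r
# Assign a value to a variable with <- (lower case, underscores for spaces)
side_length <- 16

# Print the stored value
print(side_length)

# Call a function: sqrt() takes one argument and returns its square root
root <- sqrt(side_length)

# log() has a default for its second argument (the natural log);
# the default can be overridden by naming the argument
log_default <- log(8)
log_base_2 <- log(8, base = 2)

# Quick look at the arguments of a function, and at the workspace
args(log)
ls()
```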
Reusing scripts
A script allows us to redefine the variables we already created and recompute the solution.
Commenting your code
Text after # is not evaluated. Use this to write reminders of why we wrote particular code.
Data Types
Variables can be of different types, e.g. numbers, characters, tables, lists.
class() helps determine what type of object we have.
Data Frame
A data frame is a table with rows (representing the observations) and columns (representing the different variables reported for each observation).
Data frames are useful because we can combine different data types into one object.
Loading the dslabs library and the murders dataset:
library(dslabs)
data(murders)
Find out more about the structure of an object with the function str:
str(murders)
Get the column names of a data frame with the function names:
names(murders)
Show the first lines of a data frame:
head(murders) # shows the first six lines of the data frame
head(murders, n = 3) # or murders[1:3, ]
Access a variable in a data frame
Use the accessor operator $ to access the different variables, represented by the columns included in the dataset:
murders$population
Access single rows and columns of a data frame
Use square brackets [ ].
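The same inspection steps can be tried without installing dslabs. The sketch below builds a small toy data frame by hand; its column names and values are invented for illustration and are not the murders data.

```r
# Toy data frame; all values are invented for illustration
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  region = c("north", "south", "south", "west"),
  population = c(120, 450, 300, 80)
)

str(toy)            # structure: 4 observations of 3 variables
names(toy)          # column names
head(toy, n = 3)    # first three rows

pop <- toy$population   # one column, extracted with the accessor $
second_row <- toy[2, ]  # a single row: [row, column] with the column spot empty
one_cell <- toy[2, 3]   # a single entry: 2nd row, 3rd column
```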
Vectors
Vectors are objects with several entries.
pop <- murders$population
length(pop) # tells us how many entries are in the vector
Creating vectors
We create vectors using the function c, which stands for combine or concatenate.
Example 1: numerics. Population numbers, for example, are numeric.
Example 2: characters. We use quotes to denote that the entries are characters rather than variable names.
country <- c("italy", "canada", "egypt")
We can also use single quotes ' '.
Logical vectors hold the values TRUE and FALSE. For example, the comparison 3 != 3 is FALSE.
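A short sketch of the three vector types. The country vector is the one from the notes; the numeric codes are illustrative values added here.

```r
# Character vector: quotes mark the entries as text, not variable names
country <- c("italy", "canada", "egypt")

# Numeric vector (illustrative values)
codes <- c(380, 124, 818)

# Logical vector, produced by comparing each entry with 200
is_large <- codes > 200

length(country)   # number of entries
class(country)    # "character"
class(codes)      # "numeric"
class(is_large)   # "logical"
```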
Helper functions to create numeric vectors
Create a sequence using seq(): the first argument is the start, the second argument is the end, and the third argument says how much to jump by. The default is an increment of 1.
The same sequence can be written in 2 different ways:
seq(6, 9)
## [1] 6 7 8 9
6:9
## [1] 6 7 8 9
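A minimal sketch of seq() with and without an explicit increment; rep() is added here as a related helper for repeating a sequence. The variable names are illustrative.

```r
a <- seq(6, 9)           # start 6, end 9, default increment of 1
b <- 6:9                 # shorthand for the same sequence
odd <- seq(1, 9, by = 2) # third argument: how much to jump by

repeated <- rep(6:9, times = 2)  # the whole sequence, twice
```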
Accessing vectors
Access specific elements of a vector using square brackets [ ], e.g. the second element of my vector.
Access more than one entry by using a multi-entry vector as an index.
Functions can be applied on a numerical vector.
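As a small illustration of bracket indexing, using a three-entry numeric vector (the variable names are illustrative):

```r
v <- c(8, 3, 4)

second <- v[2]               # a single element
first_and_third <- v[c(1, 3)] # a multi-entry vector as an index
last_two <- v[2:3]           # a sequence as an index
without_first <- v[-1]       # a negative index drops that entry

total <- sum(v)              # functions apply to the whole vector
```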
Sorting
If we want to rank a variable from least to most, we can use the function sort. Sometimes the result of sort is not enough, because it does not give us enough information.
Order
order takes a vector x as input and returns the vector of the indexes that sorts the input vector:
index <- order(x)
x[index]
Naming the entries of a vector
Define a vector and use the names to connect the country codes with the countries, for example by using the names function.
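The difference between sort, order and rank is easiest to see on a small vector. The numbers below are arbitrary, and the country codes are illustrative values.

```r
x <- c(31, 4, 15, 92, 65)

sorted <- sort(x)    # the values, from least to most
index <- order(x)    # the positions that would sort x
ranks <- rank(x)     # the rank of each entry, in the original order

# Indexing x with the result of order() gives the same result as sort()
same_as_sorted <- x[index]

# Named vector: connect each code with its country
codes <- c(380, 124, 818)
names(codes) <- c("italy", "canada", "egypt")
canada_code <- codes["canada"]
```

The value of order() is that the same index can be used to reorder a second vector, for example sorting names by a numeric column.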
Other functions to apply on numerical vectors
max and which.max
v
## [1] 8 3 4
# maximal value
max(v)
## [1] 8
# index containing the maximal value
which.max(v)
## [1] 1
Try also min(v) and which.min(v).
Other useful functions: which, match and %in%, applied for example to murders$state.
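A sketch of the three functions on a toy character vector, used here in place of murders$state so that it runs without dslabs:

```r
# Toy vector standing in for a column of state names
state <- c("Alabama", "Alaska", "Arizona", "Texas")

# which(): positions where a logical condition is TRUE
pos_arizona <- which(state == "Arizona")

# match(): positions of the first vector's entries inside the second
pos_lookup <- match(c("Texas", "Alaska"), state)

# %in%: is each entry of the first vector contained in the second?
is_state <- c("Texas", "Berlin") %in% state
```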
Factors
Factors are useful for storing categorical data. In the murders dataset, murders$region is a factor.
Inspect the levels (categories) of a factor by using the function levels:
levels(murders$region)
By default the levels are sorted in alphanumerical order.
Construct a factor
Reorder a factor
The default in R is for the levels to follow alphabetical order. However,
often we want the levels to follow a different order. You can specify an
order through the levels argument when creating the factor with the factor
function. For example, in the murders dataset regions are ordered from
east to west. The function reorder lets us change the order of the levels
of a factor variable based on a summary computed on a numeric vector.
Example: to do this, the values of the numeric vector are associated with each level.
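Both ways of controlling the level order can be sketched on a toy factor. The region labels and values below are invented for illustration; FUN = sum is one possible choice of summary.

```r
region <- factor(c("west", "south", "west", "northeast", "south", "west"))
default_levels <- levels(region)   # alphabetical by default

# Specify the order explicitly through the levels argument
region_manual <- factor(region, levels = c("west", "south", "northeast"))

# Reorder the levels by a summary of a numeric vector:
# each value is grouped with the level at the same position
value <- c(10, 2, 30, 5, 1, 20)
region_by_total <- reorder(region, value, FUN = sum)
levels(region_by_total)
```

Here the totals are 3 for south, 5 for northeast and 60 for west, so the reordered levels run from the smallest total to the largest.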
Data Types in R
Not available (NA)
There exists a special value called NA (“not available”) to handle missing data, among other
scenarios. We encounter NAs often as missing data is a very common problem in real-world datasets.
For example, in the following, there is no temperature measurement on day 2.
##   day temperature
## 1  d1          28
## 2  d2          NA
## 3  d3          31
Another example of NA is when a function tries to coerce one type to another that is not possible.
For example:
x <- c("1", "y", "3")
as.numeric(x)
## Warning: NAs introduced by coercion
## [1]  1 NA  3
R does not have any guesses for what number "y" can be, so it is not able to coerce it. More info
in the script!
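A minimal sketch of how NAs behave in practice, reusing the temperature values and the character vector from the examples above. Wrapping the conversion in suppressWarnings() is a choice made here only to keep the output quiet.

```r
temperature <- c(28, NA, 31)

missing <- is.na(temperature)                 # which entries are missing
mean_with_na <- mean(temperature)             # NA: a missing value propagates
mean_without_na <- mean(temperature, na.rm = TRUE)  # ignore missing values

x <- c("1", "y", "3")
x_num <- suppressWarnings(as.numeric(x))      # "y" cannot be coerced and becomes NA
```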
Data Types in R
Further data types
Other data types in R include:
lists, which can contain different types of data
matrices, for two-dimensional data
See the script for more information about them!
Lists
Lists are useful because they can store any combination of different types, e.g. a character, a number and a vector with 5 numbers.
Data frames are a special case of lists.
Extract the components of a list with the accessor operator $, using the variable names:
record$student_id
Alternatively, use double square brackets [[ ]]:
record[["student_id"]]
A list without variable names:
record2 <- list("John", 1234)
Its components are selected with double square brackets and numbers:
record2[[1]]
## [1] "John"
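A runnable sketch of the list accessors. The component names follow the notes; the grades are invented values for illustration.

```r
# A list combining a character, a number and a numeric vector
record <- list(
  name = "John",
  student_id = 1234,
  grades = c(95, 82, 91, 97, 93)  # invented values
)

id_by_dollar <- record$student_id         # accessor $
id_by_bracket <- record[["student_id"]]   # double square brackets with a name

# A list without variable names is accessed by position
record2 <- list("John", 1234)
first_component <- record2[[1]]

# Single brackets return a shorter list, not the component itself
sub_list <- record[1]
```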
Matrices
Matrices are similar to data frames in that they are two-dimensional: they have rows
and columns. However, like numeric, character and logical vectors, entries in matrices
have to be all the same type. For this reason data frames are much more useful for
storing data, since we can have characters, factors, and numbers in them.
Yet matrices have a major advantage over data frames: we can perform matrix algebra
operations, a powerful type of mathematical technique.
mat <- matrix(1:12, nrow = 4, ncol = 3)
Access specific entries
Use square brackets: [row, column].
mat[2, 3] # 2nd row, 3rd column
## [1] 10
Entire 2nd row, leaving the column spot empty:
mat[2, ]
Entire 3rd column, leaving the row spot empty:
mat[, 3]
More than one column:
mat[, 2:3]
Subset both rows and columns:
mat[1:2, 2:3]
Converting a matrix into a data frame:
as.data.frame(mat)
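The subsetting rules and one matrix algebra operation can be checked on the same 4 x 3 matrix. The product t(mat) %*% mat is chosen here only as a simple example of matrix multiplication.

```r
mat <- matrix(1:12, nrow = 4, ncol = 3)  # filled column by column

entry <- mat[2, 3]        # 2nd row, 3rd column
second_row <- mat[2, ]    # entire 2nd row
third_col <- mat[, 3]     # entire 3rd column
block <- mat[1:2, 2:3]    # rows 1-2, columns 2-3

# Matrix algebra: transpose and matrix multiplication (3 x 4 times 4 x 3)
gram <- t(mat) %*% mat

# Convert to a data frame; columns get the default names V1, V2, V3
mat_df <- as.data.frame(mat)
```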
Coercion
Coercion is an attempt by R to be flexible with data types.
Helper functions change data from one type to another, e.g. as.character and as.numeric.
Indexing with logical operators
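A short sketch of explicit and implicit coercion; the values are arbitrary.

```r
x <- 1:3
x_chr <- as.character(x)   # numbers to text
x_back <- as.numeric(x_chr) # text back to numbers

# Implicit coercion: mixing numbers and text in c() turns everything into text
mixed <- c(1, "canada", 3)
mixed_class <- class(mixed)

# Logicals are coerced to 1 and 0 in arithmetic
n_true <- sum(c(TRUE, FALSE, TRUE))
```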
SUMMARY OF COMMANDS
print(a) # see the value stored in a variable
ls() # see the variables stored in my workspace
?log # open the help page of a function
args(log) # quick look at the arguments of a function
class(a) # type of an object
library(dslabs) # loading the dslabs library
data(murders) # loading the murders dataset
str(murders) # see more about the structure of an object
head(murders) # show the first six lines of a data frame
names(murders) # reveal the names of the variables stored in the table
murders$population # how we can extract a variable with the accessor $
length(pop) # how many entries are in the vector pop
levels(murders$region) # the categories of a factor
seq() # creating a sequence
sort(x)
order(x)
max(x) # the largest value
which.max(x) # index of the entry with the largest value
min(x) and which.min(x) # same for the smallest value
rank(x) # returns a vector with the rank of the first entry, second entry, etc.
match()
which()
%in%
table(x) # takes a vector and returns the frequency of each element
install.packages("Package Name") # install one or multiple packages
list() # creating a list
matrix(1:12, nrow, ncol) # creating a matrix
as.data.frame(mat) # converting a matrix into a data frame
as.character() and as.numeric() # converting between data types
Base R
Cheat Sheet
Getting Help
Accessing the help files
?mean
Get help of a particular function.
help.search('weighted mean')
Search the help files for a word or phrase.
help(package = 'dplyr')
Find help for a package.
More about an object
str(iris)
Get a summary of an object’s structure.
class(iris)
Find the class an object belongs to.
Using Packages
install.packages('dplyr')
Download and install a package from CRAN.
library(dplyr)
Load the package into the session, making all
its functions available to use.
dplyr::select
Use a particular function from a package.
data(iris)
Load a built-in dataset into the environment.
Working Directory
getwd()
Find the current working directory (where inputs are found and outputs are sent).
setwd('C://file/path')
Change the current working directory.
Use projects in RStudio to set the working directory to the folder you are working in.

Vectors
Creating Vectors
c(2, 4, 6)           2 4 6          Join elements into a vector
2:6                  2 3 4 5 6      An integer sequence
seq(2, 3, by=0.5)    2.0 2.5 3.0    A complex sequence
rep(1:2, times=3)    1 2 1 2 1 2    Repeat a vector
rep(1:2, each=3)     1 1 1 2 2 2    Repeat elements of a vector
Vector Functions
sort(x)
Return x sorted.
rev(x)
Return x reversed.
table(x)
See counts of values.
unique(x)
See unique values.
Selecting Vector Elements
By Position
x[4]
The fourth element.
x[-4]
All but the fourth.
x[2:4]
Elements two to four.
x[-(2:4)]
All elements except two to four.
x[c(1, 5)]
Elements one and five.
By Value
x[x == 10]
Elements which are equal to 10.
x[x < 0]
All elements less than zero.
x[x %in% c(1, 2, 5)]
Elements in the set 1, 2, 5.
Named Vectors
x['apple']
Element with name 'apple'.

Programming
For Loop
for (variable in sequence){
  Do something
}
Example
for (i in 1:4){
  j <- i + 10
  print(j)
}
While Loop
while (condition){
  Do something
}
Example
while (i < 5){
  print(i)
  i <- i + 1
}
If Statements
if (condition){
  Do something
} else {
  Do something different
}
Example
if (i > 3){
  print('Yes')
} else {
  print('No')
}
Functions
function_name <- function(var){
  Do something
  return(new_variable)
}
Example
square <- function(x){
  squared <- x*x
  return(squared)
}

Reading and Writing Data
Also see the readr package.
Input                           Output                           Description
df <- read.table('file.txt')    write.table(df, 'file.txt')      Read and write a delimited text file.
df <- read.csv('file.csv')      write.csv(df, 'file.csv')        Read and write a comma separated value file. This is a special case of read.table/write.table.
load('file.RData')              save(df, file = 'file.Rdata')    Read and write an R data file, a file type special for R.

Conditions
a == b       Are equal
a != b       Not equal
a > b        Greater than
a < b        Less than
a >= b       Greater than or equal to
a <= b       Less than or equal to
is.na(a)     Is missing
is.null(a)   Is null

Types
Converting between common data types in R. Can always go
from a higher value in the table to a lower value.
as.logical      TRUE, FALSE, TRUE                   Boolean values (TRUE or FALSE).
as.numeric      1, 0, 1                             Integers or floating point numbers.
as.character    '1', '0', '1'                       Character strings. Generally preferred to factors.
as.factor       '1', '0', '1', levels: '1', '0'     Character strings with preset levels. Needed for some statistical models.

Maths Functions
log(x)         Natural log.
exp(x)         Exponential.
max(x)         Largest element.
min(x)         Smallest element.
round(x, n)    Round to n decimal places.
signif(x, n)   Round to n significant figures.
cor(x, y)      Correlation.
sum(x)         Sum.
mean(x)        Mean.
median(x)      Median.
quantile(x)    Percentage quantiles.
rank(x)        Rank of elements.
var(x)         The variance.
sd(x)          The standard deviation.

Variable Assignment
> a <- 'apple'
> a
[1] 'apple'

The Environment
ls()
List all variables in the environment.
rm(x)
Remove x from the environment.
rm(list = ls())
Remove all variables from the environment.
You can use the environment panel in RStudio to
browse variables in your environment.

Matrices
m <- matrix(x, nrow = 3, ncol = 3)
Create a matrix from x.
m[2, ] - Select a row
m[ , 1] - Select a column
m[2, 3] - Select an element
t(m)
Transpose
m %*% n
Matrix Multiplication
solve(m, n)
Find x in: m * x = n

Lists
l <- list(x = 1:5, y = c('a', 'b'))
A list is a collection of elements which can be of different types.
l[[2]]
Second element of l.
l[1]
New list with only the first element.
l$x
Element named x.
l['y']
New list with only element named y.

Data Frames
Also see the dplyr package.
df <- data.frame(x = 1:3, y = c('a', 'b', 'c'))
A special case of a list where all elements are the same length.
x  y
1  a
2  b
3  c
List subsetting
df$x
df[[2]]
Matrix subsetting
df[ , 2]
df[2, ]
df[2, 2]
Understanding a data frame
View(df)
See the full data frame.
head(df)
See the first 6 rows.
nrow(df)
Number of rows.
ncol(df)
Number of columns.
dim(df)
Number of rows and columns.
cbind - Bind columns.
rbind - Bind rows.

Strings
Also see the stringr package.
paste(x, y, sep = ' ')
Join multiple vectors together.
paste(x, collapse = ' ')
Join elements of a vector together.
grep(pattern, x)
Find regular expression matches in x.
gsub(pattern, replace, x)
Replace matches in x with a string.
toupper(x)
Convert to uppercase.
tolower(x)
Convert to lowercase.
nchar(x)
Number of characters in a string.

Factors
factor(x)
Turn a vector into a factor. Can set the levels of the factor and the order.
cut(x, breaks = 4)
Turn a numeric vector into a factor by 'cutting' into sections.

Statistics
lm(y ~ x, data=df)
Linear model.
glm(y ~ x, data=df)
Generalised linear model.
summary
Get more detailed information out of a model.
t.test(x, y)
Perform a t-test for difference between means.
pairwise.t.test
Perform pairwise t-tests between group levels, with correction for multiple testing.
prop.test
Test for a difference between proportions.
aov
Analysis of variance.

Distributions
            Random Variates   Density Function   Cumulative Distribution   Quantile
Normal      rnorm             dnorm              pnorm                     qnorm
Poisson     rpois             dpois              ppois                     qpois
Binomial    rbinom            dbinom             pbinom                    qbinom
Uniform     runif             dunif              punif                     qunif

Plotting
Also see the ggplot2 package.
plot(x)
Values of x in order.
plot(x, y)
Values of x against y.
hist(x)
Histogram of x.

Dates
See the lubridate package.

RStudio® is a trademark of RStudio, Inc. • CC BY Mhairi McNeill • mhairihmcneill@gmail.com • 844-448-1212 • rstudio.com
Learn more at web page or vignette • package version • Updated: 3/15
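The loop, if and function templates from the cheat sheet can be combined into one small runnable sketch. The square function is the cheat sheet's own example; the running total and the starting value of i are additions made here so that the code runs on its own.

```r
# Function: square its input
square <- function(x) {
  squared <- x * x
  return(squared)
}

# For loop: add up the squares of 1 to 4
total <- 0
for (i in 1:4) {
  total <- total + square(i)
}

# While loop: i must be defined before the loop starts
i <- 1
while (i < 5) {
  print(i)
  i <- i + 1
}

# If statement: i is 5 after the while loop
if (i > 3) {
  answer <- "Yes"
} else {
  answer <- "No"
}
print(answer)
```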
Curious about learning more R basics?
Read the first chapter and appendix of our script!
Actively participate in the exercise sessions
Ask questions on Slack
Practice with DataCamp