Tidy Portfoliomanagement in R

Sebastian Stöckl
DEDICATION
Contents
List of Tables
List of Figures
0.1 Introduction
  0.1.1 Introduction to Timeseries
    0.1.1.1 Date and Time
    0.1.1.2 eXtensible Timeseries
    0.1.1.3 Downloading timeseries and basic visualization with quantmod
  0.1.2 Introduction to the tidyVerse
    0.1.2.1 Tibbles
    0.1.2.2 Summary statistics
    0.1.2.3 Plotting
0.2 Managing Data
  0.2.1 Getting Data
    0.2.1.1 Downloading from Online Datasources
    0.2.1.2 Manipulate Data
0.3 Exploring Data
  0.3.1 Plotting Data
    0.3.1.1 Time-series plots
    0.3.1.2 Box-plots
    0.3.1.3 Histogram and Density Plots
    0.3.1.4 Quantile Plots
  0.3.2 Analyzing Data
    0.3.2.1 Calculating Statistics
    0.3.2.2 Testing Data
    0.3.2.3 Exposure to Factors
0.4 Managing Portfolios
  0.4.1
    0.4.1.1 The portfolio.spec() Object
    0.4.1.2 Constraints
    0.4.1.3 Objectives
    0.4.1.4 Solvers
  0.4.2 Mean-variance Portfolios
    0.4.2.1 Introduction and Theoretics
  0.4.3 Mean-CVaR Portfolios
0.5 Managing Portfolios in the Real World
  0.5.1 Rolling Portfolios
  0.5.2 Backtesting
0.6 Further applications in Finance
  0.6.1 Portfolio Sorts
  0.6.2 Fama-MacBeth-Regressions
  0.6.3 Risk Indices
0.7 References
.0.1 Introduction to R
  .0.1.1 Getting started
  .0.1.2 Working directory
  .0.1.3 Basic calculations
  .0.1.4 Mapping variables
  .0.1.5 Sequences, vectors and matrices
  .0.1.6 Vectors and matrices
  .0.1.7 Functions in R
  .0.1.8 Plotting
  .0.1.9 Control Structures
Bibliography
Preface
This book should accompany my lectures “Research Methods”, “Quantitative Analysis”, “Portfoliomanagement and Financial Analysis” and (to a smaller degree) “Empirical Methods in Finance”. In the past years I have been a heavy promoter of the Rmetrics[1] tools for my lectures and research. However, in the last year the development of the project has stagnated due to the tragic death of its founder Prof. Dr. Diethelm Würtz[2]. As a consequence, code from past semesters and lectures stopped working several times, and no more support for the project was available.
Also, in the past year I have become a heavy user of the tidyverse[3] and the financial packages that have been developed on top of it (e.g. tidyquant). Therefore I have taken the chance to put together some material from my lectures and start writing this book. In structure it is kept similar to the excellent Rmetrics book Würtz et al. (2015) on Portfolio Optimization with R/Rmetrics[4], which I have been heavily using and recommending to my students in the past years!

[1] https://www.rmetrics.org/
[2] https://www.rmetrics.org/about
[3] https://www.tidyverse.org/
[4] https://www.rmetrics.org/ebooks-portfolio
Why read this book
Because it may help my students :-)
Structure of the book
Not yet fixed. But the book will start with an introduction to the most important tools for portfolio analysis: timeseries and the tidyverse. Afterwards, the possibilities of managing and exploring financial data will be developed. Then we do portfolio optimization for mean-variance and mean-CVaR portfolios. This will be followed by a chapter on backtesting, before I show further applications in finance, such as predictions, portfolio sorting, Fama-MacBeth regressions etc.
Prerequisites
To start, install/load all necessary packages using the pacman package (the list will be expanded as the book grows):
pacman::p_load(tidyverse,tidyquant,PortfolioAnalytics,quantmod,PerformanceAna
tibbletime,timetk,ggthemes,timeDate,Quandl,alphavantager,readx
DEoptim,pso,GenSA,Rglpk,ROI,ROI.plugin.glpk,ROI.plugin.quadpro
Acknowledgments
I thank my family…
I especially thank the developers of:
• the excellent fPortfolio-Book
• the tidyquant package and its vignettes
• the PerformanceAnalytics developers and the package vignettes
• the PortfolioAnalytics developers (currently working very hard on the package) and its package vignettes
Sebastian Stöckl University of Liechtenstein Vaduz, Liechtenstein
0.1 Introduction
0.1.1 Introduction to Timeseries
For an introduction to R, see the appendix “Introduction to R”.
Many of the datasets we will be working with have a (more or less regular) time dimension and are therefore called timeseries. In R there is a variety of classes available to handle data, such as vector, matrix, data.frame or their more modern implementation, tibble (see the xts vignette[5]). Adding a time dimension creates a timeseries from these objects. The most common and flexible package in R that handles timeseries based on the first three formats is xts, which we will discuss in the following. Afterwards we will introduce the timetk package, which allows xts to interplay with tibbles and thereby creates a powerful framework to handle (even very large) time-based datasets, as we often encounter them in finance.
The community is currently working hard on time-aware tibbles that bring together the powerful grouping features of the dplyr package (for tibbles) with the abilities of xts, which to date is the most powerful and most used timeseries class in finance due to its interplay with quantmod and other financial packages. See also this link[6] for more information.

[5] https://cran.r-project.org/web/packages/xts/vignettes/xts.pdf

All information regarding tibbles and the financial universe is summarized and updated on the business-science.io website[7].
In the following, we will define a variety of date and time classes before we introduce xts, tibble and tibbletime. Most of these packages come with excellent vignettes that I will reference for further reading, while I will only pick up the features necessary for portfolio management, which is the focus of this book.
0.1.1.1 Date and Time
There are some basic functionalities in base R, but most of the time we will need additional functions to perform all necessary tasks. Available date (time) classes are Date, POSIXct, (chron), yearmon, yearqtr and timeDate (from the Rmetrics bundle).
0.1.1.1.1 Basic Date and Time Classes
There are several date and time classes in R that can all be used as time index for xts. We start with the most basic one, as.Date():
d1 <- "2018-01-18"
str(d1) # str() checks the structure of the R-object
##  chr "2018-01-18"
d2 <- as.Date(d1)
str(d2)
##  Date[1:1], format: "2018-01-18"
In the second case, R automatically detects the format of the Date object, but if something more complex is involved, you can specify the format (for all available format definitions, see ?strptime):
d3 <- "4/30-2018"
as.Date(d3, "%m/%d-%Y") # as.Date(d3) will not work
## [1] "2018-04-30"
If you are working with monthly or quarterly data, yearmon and
yearqtr will be your friends (both coming from the zoo-package
that serves as foundation for xts)
as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))
## [1] "Jan 2018"
## [1] "Apr 2018"
as.yearqtr(d1); as.yearqtr(as.Date(d3, "%m/%d-%Y"))
## [1] "2018 Q1"
## [1] "2018 Q2"
Note that as.yearmon shows dates in terms of the current locale of your computer (e.g. Austrian German). You can find out your locale with Sys.getlocale() and set a different locale with Sys.setlocale():
Sys.setlocale("LC_TIME","German_Austria")
## [1] "German_Austria.1252"
as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))
## [1] "Jän 2018"
## [1] "Apr 2018"
Sys.setlocale("LC_TIME","English")
## [1] "English_United States.1252"
as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))
## [1] "Jan 2018"
## [1] "Apr 2018"
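Because the labels printed by yearmon and yearqtr depend on the locale, it can help to see how similar labels come out of base R alone. The following is a minimal sketch that assumes only base R (no zoo): format() with "%Y-%m" gives a locale-independent month label, and quarters() returns "Q1" to "Q4". The variable names are purely illustrative.

```r
# Base-R sketch: month and quarter labels similar to zoo's yearmon/yearqtr,
# but independent of the locale (numeric month instead of a month name)
d <- as.Date(c("2018-01-18", "2018-04-30"))

month_label   <- format(d, "%Y-%m")                   # "2018-01" "2018-04"
quarter_label <- paste(format(d, "%Y"), quarters(d))  # "2018 Q1" "2018 Q2"

# First day of the month, e.g. to use as a monthly time index
month_start <- as.Date(format(d, "%Y-%m-01"))

month_label
quarter_label
month_start
```

Unlike yearmon, these are plain character labels (or Dates), so they do not carry a special class that xts could use as an index.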
When your data also includes information on time, you will need either the POSIXct class (the basic class behind all date-times in R, together with its sibling POSIXlt, which strptime() returns) or the timeDate package. The latter includes excellent abilities to work with financial data (see below). Note that talking about time also requires you to talk about timezones! We start with several examples of the POSIXct class:
strptime("2018-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS") # parses a character string into a POSIXlt date-time
## [1] "2018-01-15 13:55:23 CET"
as.POSIXct("2009-01-05 14:19:12", format="%Y-%m-%d %H:%M:%S", tz="UTC")
## [1] "2009-01-05 14:19:12 UTC"
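The point about timezones can be made concrete in base R: a POSIXct value stores one instant, and the timezone only changes how that instant is displayed. A minimal sketch, assuming the Olson timezone names Europe/Zurich and America/New_York are available on your system (they normally are):

```r
# One instant, parsed as UTC ...
t_utc <- as.POSIXct("2015-01-01 16:00:00", tz = "UTC")

# ... displayed in two other timezones (the stored number of seconds is unchanged)
fmt <- "%Y-%m-%d %H:%M:%S"
zurich   <- format(t_utc, fmt, tz = "Europe/Zurich")     # CET, UTC+1 in winter
new_york <- format(t_utc, fmt, tz = "America/New_York")  # EST, UTC-5 in winter

# Changing only the tzone attribute re-labels the display, not the instant
t_zh <- t_utc
attr(t_zh, "tzone") <- "Europe/Zurich"

zurich
new_york
t_zh
```

The format string is given explicitly because format() drops the seconds by default when they are all zero. The Zurich result corresponds to the timeDate example with FinCenter = "Zurich" further below.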
We will mainly use the timeDate-package that provides many useful functions for financial timeseries.
An introduction to timeDate by the Rmetrics group can be
found at https://www.rmetrics.org/sites/default/files/201002-timeDateObjects.pdf.
Dates <- c("1989-09-28","2001-01-15","2004-08-30","1990-02-09")
Times <- c( "23:12:55", "10:34:02", "08:30:00", "11:18:23")
DatesTimes <- paste(Dates, Times)
as.Date(DatesTimes)
## [1] "1989-09-28" "2001-01-15" "2004-08-30" "1990-02-09"
as.timeDate(DatesTimes)
## GMT
## [1] [1989-09-28 23:12:55] [2001-01-15 10:34:02] [2004-08-30 08:30:00]
## [4] [1990-02-09 11:18:23]
You see that timeDate comes with timezone information (GMT by default). timeDate allows you to specify both the timezone of origin (zone) and the financial center to transfer the data to (FinCenter):
timeDate(DatesTimes, zone = "Tokyo", FinCenter = "Zurich")
## Zurich
## [1] [1989-09-28 15:12:55] [2001-01-15 02:34:02] [2004-08-30 01:30:00]
## [4] [1990-02-09 03:18:23]
timeDate(DatesTimes, zone = "Tokyo", FinCenter = "NewYork")
## NewYork
## [1] [1989-09-28 10:12:55] [2001-01-14 20:34:02] [2004-08-29 19:30:00]
## [4] [1990-02-08 21:18:23]
timeDate(DatesTimes, zone = "NewYork", FinCenter = "Tokyo")
## Tokyo
## [1] [1989-09-29 12:12:55] [2001-01-16 00:34:02] [2004-08-30 21:30:00]
## [4] [1990-02-10 01:18:23]
listFinCenter("Europe/Vi*") # get a list of all financial centers available
## [1] "Europe/Vaduz"     "Europe/Vatican"   "Europe/Vienna"
## [4] "Europe/Vilnius"   "Europe/Volgograd"
Both the Date class and the timeDate package allow you to create time sequences (necessary if you want to create timeseries manually):
dates1 <- seq(as.Date("2017-01-01"), length=12, by="month"); dates1 # alternatively, give an end date with to= instead of length=
## [1] "2017-01-01" "2017-02-01" "2017-03-01" "2017-04-01" "2017-05-01"
## [6] "2017-06-01" "2017-07-01" "2017-08-01" "2017-09-01" "2017-10-01"
## [11] "2017-11-01" "2017-12-01"
dates2 <- timeSequence(from = "2017-01-01", to = "2017-12-31", by = "month"); dates2
## GMT
## [1] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01] [2017-05-01]
## [6] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01] [2017-10-01]
## [11] [2017-11-01] [2017-12-01]
Now, there are several very useful functions in the timeDate package to determine the first/last days of months/quarters/… (I'll let them speak for themselves):
timeFirstDayInMonth(dates1 - 7) # note: dates1 - 7 shifts every date back by seven days before the first day of the month is taken
## GMT
## [1] [2016-12-01] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01]
## [6] [2017-05-01] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01]
## [11] [2017-10-01] [2017-11-01]
timeFirstDayInQuarter(dates1)
## GMT
## [1] [2017-01-01] [2017-01-01] [2017-01-01] [2017-04-01] [2017-04-01]
## [6] [2017-04-01] [2017-07-01] [2017-07-01] [2017-07-01] [2017-10-01]
## [11] [2017-10-01] [2017-10-01]
timeLastDayInMonth(dates1)
## GMT
## [1] [2017-01-31] [2017-02-28] [2017-03-31] [2017-04-30] [2017-05-31]
## [6] [2017-06-30] [2017-07-31] [2017-08-31] [2017-09-30] [2017-10-31]
## [11] [2017-11-30] [2017-12-31]
timeLastDayInQuarter(dates1)
## GMT
## [1] [2017-03-31] [2017-03-31] [2017-03-31] [2017-06-30] [2017-06-30]
## [6] [2017-06-30] [2017-09-30] [2017-09-30] [2017-09-30] [2017-12-31]
## [11] [2017-12-31] [2017-12-31]
timeNthNdayInMonth("2018-01-01",nday = 5, nth = 3) # third Friday of the month, useful for option expiration dates
## GMT
## [1] [2018-01-19]
timeNthNdayInMonth(dates1,nday = 5, nth = 3)
## GMT
## [1] [2017-01-20] [2017-02-17] [2017-03-17] [2017-04-21] [2017-05-19]
## [6] [2017-06-16] [2017-07-21] [2017-08-18] [2017-09-15] [2017-10-20]
## [11] [2017-11-17] [2017-12-15]
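To see what these helpers compute, here is a base-R sketch of two of them, assuming that plain Date vectors are enough (no timezone handling). The functions last_day_in_month() and nth_weekday_in_month() are hypothetical and only mimic timeLastDayInMonth() and timeNthNdayInMonth(); the weekday is coded like timeDate's nday argument (0 = Sunday, ..., 5 = Friday, 6 = Saturday).

```r
# Hypothetical helper: last calendar day of the month containing each date
last_day_in_month <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))
  # day 1 + 31 always lies in the following month; step back one day from its first day
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  next_first - 1
}

# Hypothetical helper: nth given weekday in the month of each date
# weekday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
nth_weekday_in_month <- function(d, weekday, nth) {
  first <- as.Date(format(d, "%Y-%m-01"))
  wday_first <- as.POSIXlt(first)$wday     # weekday of the first day of the month
  offset <- (weekday - wday_first) %% 7    # days until the first matching weekday
  first + offset + 7 * (nth - 1)
}

last_day_in_month(as.Date(c("2017-02-10", "2016-02-10", "2017-12-05")))
nth_weekday_in_month(as.Date("2018-01-01"), weekday = 5, nth = 3)  # third Friday
```

The third-Friday result, 2018-01-19, agrees with the timeNthNdayInMonth() output above.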
If one wants to create a more specific sequence of times, this can
be done with timeCalendar using time ‘atoms’:
timeCalendar(m = 1:4, d = c(28, 15, 30, 9), y = c(1989, 2001, 2004, 1990), FinCenter = "Europe/Zurich")
## Europe/Zurich
## [1] [1989-01-28 01:00:00] [2001-02-15 01:00:00] [2004-03-30 02:00:00]
## [4] [1990-04-09 02:00:00]
timeCalendar(d=1, m=3:4, y=2018, h = c(9, 14), min = c(15, 23), s=c(39,41), FinCenter = "Europe/Zurich")
## Europe/Zurich
## [1] [2018-03-01 10:15:39] [2018-04-01 16:23:41]
0.1.1.1.2 Week-days and Business-days
One of the most important functionalities that only the timeDate package offers is checking for business days in almost any financial center. Holiday calendars for the most important exchanges and centers can be called via the holidayXXX() functions, e.g. holidayNYSE():
holidayNYSE()
## NewYork
## [1] [2018-01-01] [2018-01-15] [2018-02-19] [2018-03-30] [2018-05-28]
## [6] [2018-07-04] [2018-09-03] [2018-11-22] [2018-12-25]
holiday(year = 2018, Holiday = c("GoodFriday","Easter","FRAllSaints"))
## GMT
## [1] [2018-03-30] [2018-04-01] [2018-11-01]
dateSeq <- timeSequence(from = Easter(2018, -14), to = Easter(2018, 14), by = "day"); dateSeq # two weeks around Easter 2018 (fixed year so the output is reproducible)
## GMT
##  [1] [2018-03-18] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22]
##  [6] [2018-03-23] [2018-03-24] [2018-03-25] [2018-03-26] [2018-03-27]
## [11] [2018-03-28] [2018-03-29] [2018-03-30] [2018-03-31] [2018-04-01]
## [16] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [21] [2018-04-07] [2018-04-08] [2018-04-09] [2018-04-10] [2018-04-11]
## [26] [2018-04-12] [2018-04-13] [2018-04-14] [2018-04-15]
dateSeq2 <- dateSeq[isWeekday(dateSeq)]; dateSeq2 # select only weekdays
## GMT
##  [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
##  [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-03-30]
## [11] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [16] [2018-04-09] [2018-04-10] [2018-04-11] [2018-04-12] [2018-04-13]
dayOfWeek(dateSeq2)
## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26
##      "Mon"      "Tue"      "Wed"      "Thu"      "Fri"      "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-03-30 2018-04-02 2018-04-03
##      "Tue"      "Wed"      "Thu"      "Fri"      "Mon"      "Tue"
## 2018-04-04 2018-04-05 2018-04-06 2018-04-09 2018-04-10 2018-04-11
##      "Wed"      "Thu"      "Fri"      "Mon"      "Tue"      "Wed"
## 2018-04-12 2018-04-13
##      "Thu"      "Fri"
dateSeq3 <- dateSeq[isBizday(dateSeq, holidayZURICH(2018))]; dateSeq3 # business days in Zurich
## GMT
##  [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
##  [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-04-03]
## [11] [2018-04-04] [2018-04-05] [2018-04-06] [2018-04-09] [2018-04-10]
## [16] [2018-04-11] [2018-04-12] [2018-04-13]
dayOfWeek(dateSeq3)
## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26
##      "Mon"      "Tue"      "Wed"      "Thu"      "Fri"      "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-04-03 2018-04-04 2018-04-05
##      "Tue"      "Wed"      "Thu"      "Tue"      "Wed"      "Thu"
## 2018-04-06 2018-04-09 2018-04-10 2018-04-11 2018-04-12 2018-04-13
##      "Fri"      "Mon"      "Tue"      "Wed"      "Thu"      "Fri"
One of the strongest points of the timeDate package shows when one puts together times and dates from different timezones. This can be a challenging task (imagine hourly stock prices from London, Tokyo and New York). Luckily, the timeDate package handles this easily:
ZH <- timeDate("2015-01-01 16:00:00", zone = "GMT", FinCenter = "Zurich")
NY <- timeDate("2015-01-01 18:00:00", zone = "GMT", FinCenter = "NewYork")
c(ZH, NY)
## Zurich
## [1] [2015-01-01 17:00:00] [2015-01-01 19:00:00]
c(NY, ZH) # it always takes the Financial Center of the first entry
## NewYork
## [1] [2015-01-01 13:00:00] [2015-01-01 11:00:00]
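The weekday and business-day filters shown earlier (dateSeq2 and dateSeq3) boil down to two set operations on a date vector. A minimal base-R sketch, assuming the holiday dates are typed in by hand (here Good Friday and Easter Monday 2018, the two Zurich holidays that drop out in that example) instead of coming from holidayZURICH():

```r
days <- seq(as.Date("2018-03-18"), as.Date("2018-04-15"), by = "day")

# Keep Monday to Friday (POSIXlt wday: 0 = Sunday, 6 = Saturday)
wday <- as.POSIXlt(days)$wday
weekdays_only <- days[wday %in% 1:5]

# Hand-typed holiday list (assumption: Good Friday and Easter Monday 2018)
holidays <- as.Date(c("2018-03-30", "2018-04-02"))
business_days <- weekdays_only[!(weekdays_only %in% holidays)]

length(weekdays_only)   # 20 weekdays, as for dateSeq2
length(business_days)   # 18 business days, as for dateSeq3
```

The advantage of timeDate is precisely that you do not have to maintain such holiday lists yourself.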
0.1.1.1.3 Assignments
Create a daily time series for 2018:
1. Find the subset of first and last days per month/quarter (uniquely).
2. Take December 2017 and remove all weekends and holidays in Zurich (Tokyo).
3. Create a series of five dates & times in New York. Show them for New York, London and Belgrade.
0.1.1.2 eXtensible Timeseries
The xts format is based on the timeseries format zoo, but extends its power to be more compatible with other data classes. For example, if one converts dates from timeDate, xts is flexible enough to memorize the financial center the dates were coming from; upon retransformation to timeDate, this information is reassigned, whereas it would have been lost upon transformation to a pure zoo object. As we quite often want to transform our data to and from xts, this is a great feature that makes our lives a lot easier. xts also comes with a bundle of other features. For the reader who wants to dig deeper, we recommend the excellent zoo vignettes (vignette("zoo-quickref"), vignette("zoo"), vignette("zoo-faq"), vignette("zoo-design") and vignette("zoo-read")). Read up on xts in vignette("xts") and vignette("xts-faq").
To start, we create an xts object consisting of a series of randomly
created data points:
data <- rnorm(5) # 5 std. normally distributed random numbers
dates <- seq(as.Date("2017-05-01"), length=5, by="days")
xts1 <- xts(x=data, order.by=dates); xts1
##                   [,1]
## 2017-05-01  0.72838032
## 2017-05-02  0.47100977
## 2017-05-03 -0.04537768
## 2017-05-04  1.61845234
## 2017-05-05  0.07191067
coredata(xts1) # access data
##             [,1]
## [1,]  0.72838032
## [2,]  0.47100977
## [3,] -0.04537768
## [4,]  1.61845234
## [5,]  0.07191067
index(xts1) # access time (index)
## [1] "2017-05-01" "2017-05-02" "2017-05-03" "2017-05-04" "2017-05-05"
Here, the xts object was built from a vector and a series of Dates.
We could also have used timeDate, yearmon or yearqtr and a
data.frame:
s1 <- rnorm(5); s2 <- 1:5
data <- data.frame(s1,s2)
dates <- timeSequence("2017-01-01",by="months",length.out=5,zone = "GMT")
xts2 <- xts(x=data, order.by=dates); xts2
## Warning: timezone of object (GMT) is different than current timezone ().
##                    s1 s2
## 2017-01-01  0.7462329  1
## 2017-02-01 -0.1551448  2
## 2017-03-01 -0.9693310  3
## 2017-04-01  0.3428151  4
## 2017-05-01  0.4692079  5
dates2 <- as.yearmon(dates)
xts3 <- xts(x=data, order.by = dates2)
In the next step we evaluate the merging of two timeseries:
set.seed(1)
xts3 <- xts(rnorm(6), timeSequence(from = "2017-01-01", to = "2017-06-01", by = "month"))
xts4 <- xts(rnorm(5), timeSequence(from = "2017-04-01", to = "2017-08-01", by = "month"))
colnames(xts3) <- "tsA"; colnames(xts4) <- "tsB"
merge(xts3,xts4)
Please be aware that joining timeseries in R sometimes requires you to choose between a left, right, inner or outer join of the two objects:
merge(xts3,xts4,join = "left")
merge(xts3,xts4,join = "right")
merge(xts3,xts4,join = "inner")
merge(xts3,xts4,join="outer",fill=0)
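The four join types can be illustrated without xts by merging two data.frames on a date column. In this base-R sketch (an analogy, not how merge.xts is implemented), the two monthly series overlap from April to June 2017 just like xts3 and xts4; the values 1:6 and 11:15 are arbitrary placeholders.

```r
a <- data.frame(date = seq(as.Date("2017-01-01"), by = "month", length.out = 6), tsA = 1:6)
b <- data.frame(date = seq(as.Date("2017-04-01"), by = "month", length.out = 5), tsB = 11:15)

joined_outer <- merge(a, b, by = "date", all = TRUE)    # union of dates (Jan-Aug), NA where missing
joined_inner <- merge(a, b, by = "date")                # common dates only (Apr-Jun)
joined_left  <- merge(a, b, by = "date", all.x = TRUE)  # all dates of a (Jan-Jun)
joined_right <- merge(a, b, by = "date", all.y = TRUE)  # all dates of b (Apr-Aug)

# Counterpart of fill = 0 for the outer join
filled_outer <- joined_outer
filled_outer[is.na(filled_outer)] <- 0

sapply(list(outer = joined_outer, inner = joined_inner,
            left = joined_left, right = joined_right), nrow)
```

The row counts (8, 3, 6 and 5) are what you should also see for the corresponding merge() calls on xts3 and xts4.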
In the next step, we subset and replace parts of xts objects
xts5 <- xts(rnorm(24), timeSequence(from = "2016-01-01", to = "2017-12-01", by = "month"))
xts5["2017-01-01"]
xts5["2017-05-01/2017-08-12"]
xts5[c("2017-01-01","2017-05-01")] <- NA
xts5["2016"] <- 99
xts5["2016-05-01/"]
first(xts5)
last(xts5)
first(xts5,"3 months")
xts6 <- last(xts5,"1 year")
Now let us handle the missing values we introduced. One possibility is simply to omit them using na.omit(). Other possibilities are to carry the last observation forward with na.locf() or to interpolate linearly with na.approx():
na.omit(xts6)
na.locf(xts6)
na.locf(xts6,fromLast = TRUE,na.rm = TRUE)
na.approx(xts6,na.rm = FALSE)
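What na.omit(), na.locf() and na.approx() do can be spelled out on a plain numeric vector. A minimal base-R sketch with a hypothetical helper locf(); unlike na.locf() with its default na.rm = TRUE, it keeps leading NAs instead of dropping them:

```r
x <- c(1, NA, 3, NA, NA, 6)

# Hypothetical helper: last observation carried forward
locf <- function(v) {
  idx <- cumsum(!is.na(v))       # number of non-missing values seen so far
  c(NA, v[!is.na(v)])[idx + 1]   # idx = 0 (leading NAs) maps to NA
}

dropped      <- x[!is.na(x)]                                    # like na.omit(): 1 3 6
carried      <- locf(x)                                         # 1 1 3 3 3 6
interpolated <- approx(seq_along(x), x, xout = seq_along(x))$y  # 1 2 3 4 5 6
```

Here the interpolation runs over positions; na.approx() on an xts object uses the time index instead, which only makes a difference for unequally spaced dates.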
Standard calculations can also be done on xts objects, and there are some handy helper functions to make life easier:
periodicity(xts5)
nmonths(xts5); nquarters(xts5); nyears(xts5)
to.yearly(xts5)
to.quarterly(xts6)
round(xts6^2,2)
xts6[which(is.na(xts6))] <- rnorm(2)
# For aggregation of timeseries
ep1 <- endpoints(xts6,on="months",k = 2) # for aggregating timeseries
ep2 <- endpoints(xts6,on="months",k = 3) # for aggregating timeseries
period.sum(xts6, INDEX = ep2)
period.apply(xts6, INDEX = ep1, FUN=mean) # 2month means
period.apply(xts6, INDEX = ep2, FUN=mean) # 3month means
# Lead, lag and diff operations
cbind(xts6,lag(xts6,k=-1),lag(xts6,k=1),diff(xts6))
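The aggregation and lag logic can be mirrored in base R to check what the xts helpers return. This sketch assumes twelve monthly observations with the values 1 to 12 and groups them into calendar-aligned blocks of k months with tapply(); endpoints() may align its blocks differently, so treat this as an illustration of the idea rather than a replica. The shifts follow the xts convention, where lag(x, k = 1) moves values one period forward and lag(x, k = -1) one period backward.

```r
dates  <- seq(as.Date("2017-01-01"), by = "month", length.out = 12)
values <- 1:12

# Aggregation: calendar-aligned blocks of k months
month_no <- as.POSIXlt(dates)$mon                # 0 = January ... 11 = December
means_2m <- tapply(values, month_no %/% 2, mean) # Jan-Feb, Mar-Apr, ...
sums_3m  <- tapply(values, month_no %/% 3, sum)  # calendar quarters

# Lead, lag and diff on a plain vector
x <- c(10, 11, 13, 12)
lag1  <- c(NA, head(x, -1))  # previous period's value, like xts lag(x, k = 1)
lead1 <- c(tail(x, -1), NA)  # next period's value, like xts lag(x, k = -1)
d1    <- c(NA, diff(x))      # x[t] - x[t - 1], padded with NA like xts diff()
```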
Next, I will show some applications that go beyond xts, for example the use of lapply() to operate on a list:
# splitting timeseries (the result is a list)
xts6_yearly <- split(xts5,f="years")
lapply(xts6_yearly,FUN=mean,na.rm=TRUE)
# using elaborate functions from the zoo-package
rollapply(as.zoo(xts6), width=3, FUN=sd) # rolling standard deviation
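A rolling statistic simply re-applies a function to a moving window. The base-R sketch below uses a hypothetical helper rolling_sd() with right-aligned windows and pads the first width - 1 positions with NA; rollapply() by default centres the window (align = "center") and drops incomplete windows, so its output is shorter and shifted.

```r
# Hypothetical helper: right-aligned rolling standard deviation
rolling_sd <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= width) {
    for (i in width:n) {
      out[i] <- sd(x[(i - width + 1):i])  # window ending at position i
    }
  }
  out
}

rolling_sd(c(1, 2, 3, 4, 6), width = 3)
```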
Last but not least, we save xts data to a (csv) file, read it back in and plot it:
tmp <- tempfile()
write.zoo(xts2,sep=",",file = tmp)
xts8 <- as.xts(read.zoo(tmp, sep=",", FUN=as.yearmon))
plot(xts8)
0.1.1.3 Downloading timeseries and basic visualization with quantmod
Many downloading and plotting functions are (still) available in quantmod. We first load the package, then download data for Google, Apple and the S&P 500 from Yahoo Finance. By default, getSymbols() stores each of these “Symbols” as its own xts object (named after the symbol) in your workspace. For plotting, a large variety of technical indicators is available; for an overview see [8].
Quantmod is developed by Jeffrey Ryan and Joshua Ulrich[9] and has a homepage[10]. The homepage includes an Introduction[11], describes how data can be handled between xts and quantmod[12] and has examples of Financial Charting with quantmod and TTR[13]. More documents will be developed within 2018.

[8] https://www.r-bloggers.com/a-guide-on-r-quantmod-package-how-to-get-started/
require(quantmod)
# Yahoo Finance is the easiest source if you know the ticker symbol
getSymbols(Symbols = "AAPL", from="2010-01-01", to="2018-03-01", periodicity=
head(AAPL)
is.xts(AAPL)
plot(AAPL[, "AAPL.Adjusted"], main = "AAPL")
chartSeries(AAPL, TA=c(addVo(),addBBands(), addADX())) # plot and add technical indicators
getSymbols(Symbols = c("GOOG","^GSPC"), from="2000-01-01", to="2018-03-01", p
getSymbols('DTB3', src='FRED') # fred does not recognize from and to
Now we create an xts from all relevant parts of the data
stocks <- cbind("Apple"=AAPL[,"AAPL.Adjusted"],"Google"=GOOG[,"GOOG.Adjusted"
rf.daily <- DTB3["2010-01-01/2018-03-01"]
rf.monthly <- to.monthly(rf.daily)[,"rf.daily.Open"]
rf <- xts(coredata(rf.monthly),order.by = as.Date(index(rf.monthly)))
One possibility (adopted from https://www.quantinsti.com/blog/an-example-of-a-trading-strategy-coded-in-r/) is to use the technical indicators provided by quantmod to devise a technical trading strategy. We make use of a fast and a slow moving average, combined in the MACD indicator (function MACD() from the TTR package, on which quantmod builds). The MACD line is the difference (by default in percent) between the fast and the slow moving average, and the signal line is a moving average of the MACD line. The code below takes a long position (+1) whenever the MACD line is at or above its signal line, i.e. when there is a short-term upward trend to exploit, and a short position (-1) once it falls below; Lag() shifts the signal by one period so that we only trade on information from the previous day. To evaluate the trading strategy we also need to calculate returns for the S&P 500 index using ROC().
chartSeries(GSPC, TA=c(addMACD(fast=3, slow=12,signal=6,type=SMA)))
macd <- MACD(GSPC[,"GSPC.Adjusted"], nFast=3, nSlow=12,nSig=6,maType=SMA, per
buy_sell_signal <- Lag(ifelse(macd$macd < macd$signal, -1, 1))
buy_sell_returns <- (ROC(GSPC[,"GSPC.Adjusted"])*buy_sell_signal)["2001-06-01
portfolio <- exp(cumsum(buy_sell_returns)) # cumulative wealth of one unit invested (log returns are summed, then exponentiated)
plot(portfolio)
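To make the signal logic explicit without quantmod/TTR, here is a simplified base-R sketch on a plain price vector. Assumptions: simple moving averages computed with stats::filter(), a MACD line defined as the plain difference fast minus slow (TTR's MACD() returns a percentage difference by default, so crossings can differ slightly), the long/short rule of buy_sell_signal above, and continuous returns as produced by ROC()'s default. The helpers sma() and macd_strategy() are hypothetical.

```r
# Hypothetical helper: trailing simple moving average (NA until n values are available)
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

# Hypothetical helper: long (+1) when MACD >= signal, short (-1) otherwise
macd_strategy <- function(price, fast = 3, slow = 12, sig = 6) {
  macd   <- sma(price, fast) - sma(price, slow)  # MACD line (absolute difference)
  signal <- sma(macd, sig)                       # signal line: moving average of the MACD line
  pos    <- ifelse(macd < signal, -1, 1)
  pos    <- c(NA, head(pos, -1))                 # act on the previous period's signal
  ret    <- c(NA, diff(log(price)))              # continuous returns
  strat  <- pos * ret
  ok     <- !is.na(strat)
  list(returns = strat, wealth = exp(cumsum(strat[ok])))  # wealth of 1 unit invested
}

# Usage on an artificial, steadily rising price path
price <- 100 * exp(seq(0, 0.5, length.out = 40))
res <- macd_strategy(price)
tail(res$wealth, 1)
```

On this trending path the strategy stays long throughout, so its final wealth equals the price ratio over the traded period; on real data the position flips whenever the MACD line crosses its signal line.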
For the evaluation of trading strategies/portfolios and other financial timeseries, almost every tool is available in the package PerformanceAnalytics. In this case, charts.PerformanceSummary() calculates cumulative returns (similar to above), per-period returns and drawdowns (the loss relative to the previous peak; the largest one is the maximum drawdown, see [14]).
PerformanceAnalytics is a large package with an almost uncountable variety of tools. There are vignettes on the estimation of higher-order (co)moments (vignette("EstimationComoments")), performance attribution measures according to Bacon (2008) (vignette("PA-Bacon")), charting (vignette("PA-charts")) and more, which can be found on the PerformanceAnalytics CRAN page[15].
require(PerformanceAnalytics)
rets <- cbind(buy_sell_returns,ROC(GSPC[,"GSPC.Adjusted"]))
colnames(rets) <- c("investment","benchmark")
charts.PerformanceSummary(rets,colorset=rich6equal)
chart.Histogram(rets, main = "Risk Measures", methods = c("add.density", "ad
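The maximum drawdown shown by charts.PerformanceSummary() can be reproduced in a few lines. This is a minimal sketch under two assumptions: the inputs are continuous returns (as produced by ROC() above), and wealth starts at 1 before the first return. The helper max_drawdown() is hypothetical; PerformanceAnalytics provides its own maxDrawdown().

```r
# Hypothetical helper: maximum drawdown from continuous (log) returns
max_drawdown <- function(returns) {
  wealth <- exp(cumsum(returns))       # wealth index, starting value 1 before the first return
  peak   <- cummax(c(1, wealth))[-1]   # running maximum, including the starting value
  max(1 - wealth / peak)               # largest relative loss from a previous peak
}

# Wealth goes 1 -> 1.2 -> 0.9 -> 0.99: the worst decline is from 1.2 to 0.9, i.e. 25%
max_drawdown(log(c(1.2, 0.75, 1.1)))
```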
[14] https://de.wikipedia.org/wiki/Maximum_Drawdown
0.1.2 Introduction to the tidyVerse
0.1.2.1 Tibbles
Since the middle of 2017, a lot of programmers have put in a huge effort to rewrite many R functions and data objects in a tidy way and thereby created the tidyverse[16].
For updates, check the tidyverse homepage[17]. A very well written book introducing the tidyverse can be found online: R for Data Science[18]. The core of the tidyverse currently contains several packages:
– ggplot2 for creating powerful graphs[19] (see vignette("ggplot2-specs"))
– dplyr for data manipulation[20] (see vignette("dplyr"))
– tidyr for tidying data[21]
– readr for importing datasets[22] (see vignette("readr"))
– purrr for programming[23] (see the …)
– tibble for modern data.frames[24] (see vignette("tibble"))
and many more[25].
require(tidyverse) # install first if not yet there, and update regularly
require(tidyquant) # wraps quantmod and related financial packages for use with the tidyverse
Most of the following is adapted from “An Introduction to Statistical Learning with Applications in R” by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani at http://www.science.smith.edu/~jcrouser/SDS293/labs/. We begin by loading the Auto data set, which is part of the ISLR package.

[16] https://www.tidyverse.org/
require(ISLR)
data(Auto)
Nothing happens when you run this, but now the data is available in your environment. (In RStudio, you would see the name of the data in your Environment tab.) To view the data, we can either print the entire dataset by typing its name, or “slice” off a subset by piping the data with the %>% operator into the slice() function. The pipe operator is one of the most useful tools of the tidyverse: it lets you pass the result of one command into the next, and the next, without saving and naming each intermediate step. The first step is to transform this data.frame into a tibble (a similar concept, but better[26]). A tibble has observations in rows and variables in columns. Those variables can have many different formats:
Auto %>% slice(1:10)
tbs1 <- tibble(
Date = seq(as.Date("2017-01-01"), length=12, by="months"),
returns = rnorm(12),
letters = sample(letters, 12, replace = TRUE)
)
As you can see, the three columns of tbs1 have different formats. One can access the variables by name and by position. If you want to use the pipe operator, you need the special placeholder . (a single dot):

[26] http://r4ds.had.co.nz/tibbles.html
tbs1$returns
tbs1[[2]]
tbs1 %>% .[[2]]
Before we go on to analyze a large tibble such as Auto, we briefly talk about reading and saving files with tools from the tidyverse. We save the file as csv using write_csv() and read it back using read_csv(). Because the columns of the read file are not in exactly the same format as before, we use mutate() to transform the columns.
Auto <- as_tibble(Auto) # make a tibble from Auto
tmp <- tempfile()
write_csv(Auto, file = tmp) # write
Auto2 <- read_csv(tmp)
Auto2 <- Auto2 %>%
mutate(cylinders=as.double(cylinders),horsepower=as.double(horsepower),yea
all.equal(Auto,Auto2) # only the factor levels differ
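Why the types have to be repaired after reading becomes clear in a small base-R analogue with write.csv() and read.csv(): a CSV file stores only text, so factor levels are lost. A minimal sketch with a made-up two-column data.frame (not the Auto data):

```r
df <- data.frame(cyl = c(4L, 6L, 8L), name = factor(c("ford", "fiat", "honda")))

tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
df2 <- read.csv(tmp)

str(df2)   # name comes back as plain text (character in R >= 4.0), not as a factor

# Restore the factor with the original levels, analogous to the mutate() step above
df2$name <- factor(df2$name, levels = levels(df$name))
identical(df, df2)
```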
Notice that the data looks just the same as when we loaded it from
the package. Now that we have the data, we can begin to learn
things about it.
dim(Auto)
str(Auto)
names(Auto)
The dim() function tells us that the data has 392 observations and nine variables. The original data contained some incomplete rows, which have already been removed from the version shipped with the ISLR package. The str() function tells us that most of the variables are numeric, although the name variable is a factor. names() lets us check the variable names.
0.1.2.2 Summary statistics
Often, we want to know some basic things about variables in our
data. summary() on an entire dataset will give you an idea of some
of the distributions of your variables. The summary() function produces a numerical summary of each variable in a particular data
set.
summary(Auto)
The summary suggests that origin might be better thought of
as a factor. It only seems to have three possible values, 1, 2 and
3. If we read the documentation about the data (using ?Auto)
we will learn that these numbers correspond to where the car is
from: 1. American, 2. European, 3. Japanese. So, let's mutate() that variable into a factor (categorical) variable.
Auto <- Auto %>%
mutate(origin = factor(origin))
summary(Auto)
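Instead of keeping the numeric codes as factor levels, the labels from ?Auto can be attached directly. A minimal base-R sketch on a few made-up codes; applied to the data, the same idea would read mutate(origin = factor(origin, levels = 1:3, labels = c("American", "European", "Japanese"))):

```r
origin_codes <- c(1, 3, 2, 1, 1)   # made-up codes: 1 = American, 2 = European, 3 = Japanese

origin <- factor(origin_codes, levels = 1:3,
                 labels = c("American", "European", "Japanese"))

levels(origin)
table(origin)   # American 3, European 1, Japanese 1
```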
0.1.2.3 Plotting
We can use the ggplot2 package to produce simple graphics.
ggplot2 has a particular syntax, which looks like this
ggplot(Auto) + geom_point(aes(x=cylinders, y=mpg))
The basic idea is that you need to initialize a plot with ggplot()
and then add “geoms” (short for geometric objects) to the plot.
The ggplot2 package is based on the Grammar of Graphics[27], a famous book on data visualization theory. It is a way to map attributes in your data (like variables) to “aesthetics” on the plot. The function aes() is short for aesthetics.
For more about the ggplot2 syntax, view the help by typing ?ggplot or ?geom_point. There are also great online resources for ggplot2, like the R graphics cookbook[28].
The cylinders variable is stored as a numeric vector, so R has
treated it as quantitative. However, since there are only a small
number of possible values for cylinders, one may prefer to treat it
as a qualitative variable. We can turn it into a factor, again using
a mutate() call.
Auto = Auto %>%
mutate(cylinders = factor(cylinders))
To view the relationship between a categorical and a numeric variable, we might want to produce boxplots. As usual, a number of
options can be specified in order to customize the plots.
ggplot(Auto) + geom_boxplot(aes(x=cylinders, y=mpg)) + xlab("Cylinders") + y
The geom geom_histogram() can be used to plot a histogram.
[27] https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448
[28] http://www.cookbook-r.com/Graphs/
ggplot(Auto) + geom_histogram(aes(x=mpg), binwidth=5)
For small datasets, we might want to see all the bivariate relationships between the variables. The GGally package has an extension of the scatterplot matrix that can do just that. We make use
of the select operator to only select the two variables mpg and
cylinders and pipe it into the ggpairs() function
Auto %>% select(mpg, cylinders) %>% GGally::ggpairs()
Because there are not many cars with 3 and 5 cylinders, we use filter() to keep only the cars with 4, 6 and 8 cylinders.
Auto %>% select(mpg, cylinders) %>% filter(cylinders %in% c(4,6,8)) %>% GGally::ggpairs()
Sometimes, we might want to save a plot for use outside of R. To
do this, we can use the ggsave() function.
ggsave("histogram.png", ggplot(Auto) + geom_histogram(aes(x=mpg), binwidth=5))
TO DO:
* Tidyquant: Document more technical features.
* For extensive time-series manipulations there is an extension of tibble objects, the time-aware tibbles of tibbletime [29]. They provide much of the xts functionality without requiring a conversion.
[29] https://github.com/business-science/tibbletime
0.2 Managing Data
In this chapter we will learn how to download and import data from various sources. Most importantly, we will use the quantmod library through tidyquant to download financial data from a variety of sources. We will also learn how to import ‘.xlsx’ (Excel) files.
0.2.1 Getting Data
0.2.1.1 Downloading from Online Datasources
The tidyquant package comes with a variety of readily compiled datasets and data sources. For whole collections of data, the following commands are available:
tq_exchange_options() # find all exchanges available
## [1] "AMEX"   "NASDAQ" "NYSE"
tq_index_options() # find all indices available
## [1] "RUSSELL1000" "RUSSELL2000" "RUSSELL3000" "DOW"         "DOWGLOBAL"
## [6] "SP400"       "SP500"       "SP600"       "SP1000"
tq_get_options() # find all data sources available
##  [1] "stock.prices"  "stock.prices.google" "stock.prices.japan"
##  [4] "financials"    "key.ratios"          "dividends"
##  [7] "splits"        "economic.data"       "exchange.rates"
## [10] "metal.prices"  "quandl"              "quandl.datatable"
## [13] "alphavantager" "rblpapi"
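The commands tq_exchange() and tq_index() return all symbols, together with some additional information, for the stocks listed on a given exchange or contained in a given index [30]. The objects sp500, nyse and nasdaq used below are presumably the results of tq_index("SP500"), tq_exchange("NYSE") and tq_exchange("NASDAQ"). The code that creates them is not shown here.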
glimpse(sp500)
## Observations: 504
## Variables: 5
## $ symbol      <chr> "AAPL", "MSFT", "AMZN", "BRK.B", "FB", "JPM", "JNJ...
## $ company     <chr> "Apple Inc.", "Microsoft Corporation", "Amazon.com...
## $ weight      <dbl> 0.044387857, 0.035053855, 0.032730459, 0.016868330...
## $ sector      <chr> "Information Technology", "Information Technology"...
## $ shares_held <dbl> 53939268, 84297440, 4418447, 21117048, 26316160, 3...
glimpse(nyse)
## Observations: 3,139
## Variables: 7
## $ symbol          <chr> "DDD", "MMM", "WBAI", "WUBA", "EGHT", "AHC", "...
## $ company         <chr> "3D Systems Corporation", "3M Company", "500.c...
## $ last.sale.price <dbl> 18.4800, 206.7100, 11.6400, 68.1800, 23.2000, ...
## $ market.cap      <chr> "$2.11B", "$121.26B", "$491.85M", "$10.06B", "...
## $ ipo.year        <dbl> NA, NA, 2013, 2013, NA, NA, 2014, 2014, NA, NA...
## $ sector          <chr> "Technology", "Health Care", "Consumer Service...
## $ industry        <chr> "Computer Software: Prepackaged Software", "Me...
glimpse(nasdaq)
## Observations: 3,405
## Variables: 7
## $ symbol          <chr> "YI", "PIH", "PIHPP", "TURN", "FLWS", "FCCY", ...
## $ company         <chr> "111, Inc.", "1347 Property Insurance Holdings...
## $ last.sale.price <dbl> 13.800, 6.350, 25.450, 2.180, 11.550, 20.150, ...
## $ market.cap      <chr> NA, "$38M", NA, "$67.85M", "$746.18M", "$168.8...
## $ ipo.year        <dbl> 2018, 2014, NA, NA, 1999, NA, NA, 2011, 2014, ...
## $ sector          <chr> NA, "Finance", "Finance", "Finance", "Consumer...
## $ industry        <chr> NA, "Property-Casualty Insurers", "Property-Ca...
[30] Note that tq_index() unfortunately makes use of the package XLConnect, which requires Java to be installed on your system.
The dataset we will be using consists of the ten largest stocks within the S&P500 that had an IPO before January 2000. We therefore need to merge both datasets using inner_join(), because we only want to keep symbols from the S&P500 that are also traded on NYSE or NASDAQ:
stocks.selection <- sp500 %>%
  inner_join(rbind(nyse,nasdaq) %>%
               select(symbol,last.sale.price,market.cap,ipo.year),
             by = "symbol") %>%                  # keep symbols listed on NYSE or NASDAQ
  filter(ipo.year<2000&!is.na(market.cap)) %>%   # filter years with ipo<2000
  arrange(desc(weight)) %>%                      # sort in descending order
  slice(1:10)                                    # keep the ten largest
The ten largest stocks in the S&P500 with a price history going back before January 2000:
symbol  company                     weight  sector                  shares_held  last.sale.price  market.cap  ipo.year
AAPL    Apple Inc.                  0.044   Information Technology  53939268     221.07           $1067.75B   1980
MSFT    Microsoft Corporation       0.035   Information Technology  84297440     111.71           $856.62B    1986
AMZN    Amazon.com Inc.             0.033   Consumer Discretionary  4418447      1990.00          $970.6B     1997
CSCO    Cisco Systems Inc.          0.009   Information Technology  51606584     46.89            $214.35B    1990
NVDA    NVIDIA Corporation          0.007   Information Technology  6659463      268.20           $163.07B    1999
ORCL    Oracle Corporation          0.006   Information Technology  32699620     49.34            $196.43B    1986
AMGN    Amgen Inc.                  0.005   Health Care             7306144      199.50           $129.13B    1983
ADBE    Adobe Systems Incorporated  0.005   Information Technology  5402625      267.79           $131.13B    1986
QCOM    QUALCOMM Incorporated       0.004   Information Technology  15438597     71.75            $105.41B    1991
GILD    Gilead Sciences Inc.        0.004   Health Care             14310276     73.97            $95.89B     1992
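The inner join above keeps only those S&P500 symbols that are also listed on NYSE or NASDAQ. To show the mechanics without downloading anything, here is a minimal base-R sketch of the same select, join, filter and sort logic using merge(). The two small data frames and their symbols, weights and IPO years are made-up placeholders, not the real data:

```r
# Hypothetical stand-in for the index constituents (sp500)
index_tbl <- data.frame(symbol = c("AAA", "BBB", "CCC", "DDD", "FFF"),
                        weight = c(0.30, 0.25, 0.20, 0.15, 0.10),
                        stringsAsFactors = FALSE)
# Hypothetical stand-in for the exchange listings (rbind(nyse, nasdaq))
exchange_tbl <- data.frame(symbol   = c("AAA", "BBB", "CCC", "DDD", "EEE"),
                           ipo.year = c(1985, 1995, NA, 2005, 1990),
                           stringsAsFactors = FALSE)

# Inner join on symbol: FFF (not listed) and EEE (not in the index) are dropped
joined <- merge(index_tbl, exchange_tbl, by = "symbol")

# IPO before 2000; rows with a missing ipo.year are dropped, as filter() would do
selected <- joined[!is.na(joined$ipo.year) & joined$ipo.year < 2000, ]

# Sort by index weight (descending) and keep at most the ten largest
selected <- head(selected[order(selected$weight, decreasing = TRUE), ], 10)
selected
```

Only AAA and BBB survive. CCC has no IPO year, and DDD went public after 2000.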
Next, we download stock prices from Yahoo Finance. Data from that source usually comes in the OHLC format (open, high, low, close) with additional information (volume, adjusted close). We additionally download data for the S&P500 index itself (ticker ^GSPC). Note that we get daily prices:
stocks.prices <- stocks.selection$symbol %>%
tq_get(get = "stock.prices",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
index.prices <- "^GSPC" %>%
tq_get(get = "stock.prices",from = "2000-01-01",to = "2017-12-31")
stocks.prices %>% slice(1:2) # show the first two entries of each group
## # A tibble: 20 x 8
## # Groups:   symbol [10]
##    symbol date        open  high   low close    volume adjusted
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1 AAPL   2000-01-03  3.75  4.02  3.63  4.00 133949200     3.54
##  2 AAPL   2000-01-04  3.87  3.95  3.61  3.66 128094400     3.24
##  3 ADBE   2000-01-03 16.8  16.9  16.1  16.4    7384400    16.1
##  4 ADBE   2000-01-04 15.8  16.5  15.0  15.0    7813200    14.8
##  5 AMGN   2000-01-03 70    70    62.9  62.9   22914900    53.5
##  6 AMGN   2000-01-04 62    64.1  57.7  58.1   15052600    49.4
##  7 AMZN   2000-01-03 81.5  89.6  79.0  89.4   16117600    89.4
##  8 AMZN   2000-01-04 85.4  91.5  81.8  81.9   17487400    81.9
##  9 CSCO   2000-01-03 55.0  55.1  51.8  54.0   53076000    43.6
## 10 CSCO   2000-01-04 52.8  53.5  50.9  51     50805600    41.2
## 11 GILD   2000-01-03  1.79  1.80  1.72  1.76  54070400     1.61
## 12 GILD   2000-01-04  1.70  1.72  1.66  1.68  38960000     1.54
## 13 MSFT   2000-01-03 58.7  59.3  56    58.3   53228400    42.5
## 14 MSFT   2000-01-04 56.8  58.6  56.1  56.3   54119000    41.0
## 15 NVDA   2000-01-03  3.94  3.97  3.68  3.90   7522800     3.61
## 16 NVDA   2000-01-04  3.83  3.84  3.60  3.80   7512000     3.51
## 17 ORCL   2000-01-03 31.2  31.3  27.9  29.5   98114800    26.4
## 18 ORCL   2000-01-04 28.9  29.7  26.2  26.9  116824800    24.0
## 19 QCOM   2000-01-03 99.6 100    87    89.7   91334000    65.7
## 20 QCOM   2000-01-04 86.3  87.7  80    81.0   63567400    59.4
Dividends and stock splits can also be downloaded:
stocks.dividends <- stocks.selection$symbol %>%
tq_get(get = "dividends",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
stocks.splits <- stocks.selection$symbol %>%
tq_get(get = "splits",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
We can also download financial data for the different stocks, namely key ratios (financials, profitability, growth, cash flow, financial health, efficiency ratios and valuation ratios). These ratios come from Morningstar [31] in a nested form, which we have to ‘dig out’ using unnest().
stocks.ratios <- stocks.selection$symbol %>%
tq_get(get = "key.ratios",from = "2000-01-01",to = "2017-12-31") %>%
group_by(symbol)
## # A tibble: 42 x 3
## # Groups:   symbol [6]
##    symbol section           data
##    <chr>  <chr>             <list>
##  1 AAPL   Financials        <tibble [150 x 5]>
##  2 AAPL   Profitability     <tibble [170 x 5]>
##  3 AAPL   Growth            <tibble [160 x 5]>
##  4 AAPL   Cash Flow         <tibble [50 x 5]>
##  5 AAPL   Financial Health  <tibble [240 x 5]>
##  6 AAPL   Efficiency Ratios <tibble [80 x 5]>
##  7 AAPL   Valuation Ratios  <tibble [40 x 5]>
##  8 MSFT   Financials        <tibble [150 x 5]>
##  9 MSFT   Profitability     <tibble [170 x 5]>
## 10 MSFT   Growth            <tibble [160 x 5]>
## # ... with 32 more rows
[31] http://www.morningstar.com/
We find that financial ratios are only available for a subset of
the ten stocks. We first filter for the ‘Growth’-information, then
unnest() the nested tibbles and filter again for ‘EPS %’ and the
‘Year over Year’ information. Then we use ggplot() to plot the
timeseries of Earnings per Share for the different companies.
stocks.ratios %>% filter(section=="Growth") %>% unnest() %>%
filter(sub.section=="EPS %",category=="Year over Year") %>%
ggplot(aes(x=date,y=value,color=symbol)) + geom_line(lwd=1.1) +
labs(title="Year over Year EPS in %", x="",y="") +
theme_tq() + scale_color_tq()
[Figure: Year over Year EPS in %, 2010 to 2018, one line per symbol: AAPL, AMZN, MSFT, AMGN, CSCO, NVDA]
A variety of other (professional) data services are integrated into tidyquant. I list them in the following subsections.
0.2.1.1.1 Quandl
Quandl [32] provides access to many different financial and economic databases. To use it, you should acquire an API key by creating a Quandl account [33]. Searches can be done using quandl_search(), although I personally would use their homepage for that. Data can be downloaded as before with tq_get(). Use the argument get = "quandl" to download single time series and get = "quandl.datatable" to download entire datatables. Note that in the example for ‘Apple’ below, the adjusted close prices differ from those of Yahoo. An example of a datatable is the Zacks Fundamentals Collection B [34].
quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
  tq_get(get          = "quandl",
         from         = "2000-01-01",
         to           = "2017-12-31",
         column_index = 11,      # numeric column number (e.g. 1)
         collapse     = "daily", # can be "none", "daily", "weekly", "monthly", "quarterly", "annual"
         transform    = "none")  # for summarizing data: "none", "diff", "rdiff", "cumul", "normalize"
## Oil India Limited
## Code: NSE/OIL
## Desc: Historical prices for Oil India Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Oil Country Tubular Limited
## Code: NSE/OILCOUNTUB
## Desc: Historical prices for Oil Country Tubular Limited<br><br>National St
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Essar Oil Limited
## Code: NSE/ESSAROIL
## Desc: Historical prices for Essar Oil Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
## # A tibble: 3 x 13
##      id dataset_code database_code name  description refreshed_at
## * <int> <chr>        <chr>         <chr> <chr>       <chr>
## 1  6668 OIL          NSE           Oil ~ Historical~ 2018-09-13T~
## 2  6669 OILCOUNTUB   NSE           Oil ~ Historical~ 2018-09-13T~
## 3  6041 ESSAROIL     NSE           Essa~ Historical~ 2016-02-09T~
## # ... with 7 more variables: newest_available_date <chr>,
## #   oldest_available_date <chr>, column_names <list>, frequency <chr>,
## #   type <chr>, premium <lgl>, database_id <int>
## # A tibble: 5 x 12
##   date        open  high   low close  volume ex.dividend split.ratio
##   <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>       <dbl>       <dbl>
## 1 2000-01-03 105.   112.  102.  112.  4.78e6           0           1
## 2 2000-01-04 108.   111.  101.  102.  4.57e6           0           1
## 3 2000-01-05 104.   111.  103   104   6.95e6           0           1
## 4 2000-01-06 106.   107    95    95   6.86e6           0           1
## 5 2000-01-07  96.5  101    95.5  99.5 4.11e6           0           1
## # ... with 4 more variables: adj.open <dbl>, adj.high <dbl>,
## #   adj.low <dbl>, adj.close <dbl>
[32] https://www.quandl.com/
[33] If you do not use an API key, you are limited to 50 calls per day.
[34] https://www.quandl.com/databases/ZFB/documentation/about
0.2.1.1.2 Alpha Vantage
Alpha Vantage [35] provides access to real-time and historical financial data. Here, too, we need to obtain and set a (free) API key.
[35] https://www.alphavantage.co
av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
  tq_get(get    = "alphavantager",
         av_fun = "TIME_SERIES_DAILY_ADJUSTED")   # for daily data
alpha.aapl.id <- c("AAPL") %>%
  tq_get(get      = "alphavantager",
         av_fun   = "TIME_SERIES_INTRADAY",       # for intraday data
         interval = "5min")                       # 5 minute intervals
## # A tibble: 5 x 9
##   timestamp   open  high   low close adjusted_close  volume dividend_amount
##   <date>     <dbl> <dbl> <dbl> <dbl>          <dbl>   <int>           <dbl>
## 1 2018-04-24  166.  166.  161.  163.           162.  3.37e7               0
## 2 2018-04-25  163.  165.  162.  164.           162.  2.84e7               0
## 3 2018-04-26  164.  166.  163.  164.           163.  2.80e7               0
## 4 2018-04-27  164   164.  161.  162.           161.  3.57e7               0
## 5 2018-04-30  162.  167.  162.  165.           164.  4.24e7               0
## # ... with 1 more variable: split_coefficient <dbl>
## # A tibble: 5 x 6
##   timestamp            open  high   low close volume
##   <dttm>              <dbl> <dbl> <dbl> <dbl>  <int>
## 1 2018-09-11 14:25:00  224.  224.  224.  224. 261968
## 2 2018-09-11 14:30:00  224.  224.  224.  224. 334069
## 3 2018-09-11 14:35:00  224.  224.  224.  224. 285138
## 4 2018-09-11 14:40:00  224.  224.  224.  224. 229329
## 5 2018-09-11 14:45:00  224.  224.  224.  224. 193316
0.2.1.1.3 FRED (Economic Data)
A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database [36]. Below we download the 1-year (TB1YR) and 3-month (TB3MS) Treasury bill rates for the US as risk-free rates. Note that these are annualized rates quoted in percent!
[36] https://fred.stlouisfed.org/
ir <- tq_get(c("TB1YR","TB3MS"), get = "economic.data") %>%
group_by(symbol)
## # A tibble: 6 x 3
## # Groups:   symbol [2]
##   symbol date       price
##   <chr>  <date>     <dbl>
## 1 TB1YR  2018-08-01  2.36
## 2 TB1YR  2018-07-01  2.31
## 3 TB1YR  2018-06-01  2.25
## 4 TB3MS  2018-08-01  2.03
## 5 TB3MS  2018-07-01  1.96
## 6 TB3MS  2018-06-01  1.9
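FRED reports these rates as annualized percentages, so they must be rescaled before they can be compared with monthly returns. FRED quotes these T-bill rates on a discount basis, and the sketch below ignores that detail. The helper annual_pct_to_monthly() is hypothetical and made up for illustration. It shows the simple conversion (divide by 12) and the compounded conversion:

```r
# Hypothetical helper: convert an annualized rate quoted in percent
# into a monthly rate in decimals.
annual_pct_to_monthly <- function(rate_pct, method = c("simple", "compound")) {
  method <- match.arg(method)
  r <- rate_pct / 100                 # percent -> decimal
  if (method == "simple") {
    r / 12                            # simple de-annualization
  } else {
    (1 + r)^(1 / 12) - 1              # geometric (compounded) de-annualization
  }
}

# e.g. the 3-month rate of 2.03 (August 2018) printed above
annual_pct_to_monthly(2.03, "simple")
annual_pct_to_monthly(2.03, "compound")
```

The Fama-French RF series used later in this chapter is already a monthly rate in percent. That is why the code there only divides it by 100.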
0.2.1.1.4 OANDA (Exchange Rates and Metal Prices)
Oanda [37] provides a large quantity of exchange rates (currently only for the last 180 days). Enter them as currency pairs using “/” notation (e.g. “EUR/USD”) and set get = "exchange.rates". Metal prices are retrieved analogously with get = "metal.prices"; the example below retrieves platinum ("plat") in EUR. Note that most of the data (with a much longer history) is also available on FRED.
eur_usd <- tq_get("EUR/USD",
get = "exchange.rates",
from = Sys.Date() - lubridate::days(10))
plat_price_eur <- tq_get("plat", get = "metal.prices",
from = Sys.Date() - lubridate::days(10),
base.currency = "EUR")
eur_usd %>% arrange(desc(date)) %>% slice(1:3)
## # A tibble: 3 x 2
##   date       exchange.rate
##   <date>             <dbl>
## 1 2018-09-12          1.16
## 2 2018-09-11          1.16
## 3 2018-09-10          1.16
plat_price_eur %>% arrange(desc(date)) %>% slice(1:3)
## # A tibble: 3 x 2
##   date       price
##   <date>     <dbl>
## 1 2018-09-12  681.
## 2 2018-09-11  681.
## 3 2018-09-10  680.
[37] https://www.oanda.com
0.2.1.1.5 Bloomberg and Datastream
Bloomberg is officially integrated into the tidyquant package (get = "rblpapi"), but you need to have Bloomberg running on the terminal you are using. Datastream is not integrated but has a nice R interface in the package rdatastream [38]. It requires a license for the (non-free) Thomson Dataworks Enterprise SOAP API [39]. With that license, the package allows convenient retrieval of data. Without it, you have to retrieve your data manually and save it as an “.xlsx” Excel file, which you can then import with readxl::read_xlsx() from the readxl package.
[38] https://github.com/fcocquemas/rdatastream
[39] http://dataworks.thomson.com/Dataworks/Enterprise/1.0/
0.2.1.1.6 Fama-French Data (Kenneth French’s Data Library)
To download Fama-French data in batch there is a package, FFdownload, that I updated and that can now be installed via devtools::install_bitbucket("sstoeckl/FFdownload"). Currently you can either download all data or skip the (large) daily files using the argument exclude_daily=TRUE. The result is a list of data.frames that still needs some cleaning but is nonetheless quite usable.
FFdownload(output_file = "FFdata.RData",  # output file for the final data
           tempdir = NULL,                # directory for the temporary downloads
           exclude_daily = TRUE,          # exclude daily data
           download = FALSE)              # if false, data already in the temp-directory
load(file = "FFdata.RData")
factors <- FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>%
  tk_tbl(rename_index="date") %>%         # make tibble
  mutate(date=as.Date(date, frac=1)) %>%  # make proper month-end date
  gather(key=FFvar,value = price,-date)   # gather into tidy format
factors %>% group_by(FFvar) %>% slice(1:2)
## # A tibble: 8 x 3
## # Groups:   FFvar [4]
##   date       FFvar  price
##   <date>     <chr>  <dbl>
## 1 1926-07-31 HML    -2.87
## 2 1926-08-31 HML     4.19
## 3 1926-07-31 Mkt.RF  2.96
## 4 1926-08-31 Mkt.RF  2.64
## 5 1926-07-31 RF      0.22
## 6 1926-08-31 RF      0.25
## 7 1926-07-31 SMB    -2.3
## 8 1926-08-31 SMB    -1.4
0.2.1.2 Manipulate Data
A variety of transformations can be applied to (financial) time-series data. We present some examples that merge our stock file with the index, the risk-free rate from FRED and the Fama-French factors.
Data transformations on tidy datasets come in two forms. A transmute() changes variables and returns only the calculated columns, while a mutate() adds the transformed variables to the existing ones. In the tidyquant package the corresponding functions are tq_transmute() and tq_mutate(). tq_transmute() can additionally change the periodicity (e.g. from daily to monthly), so the returned dataset can have fewer rows than the input. tq_mutate() adds columns and therefore has to return the same number of rows. The core of both functions is a mutate_fun, which can come from the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R [40]) and TTR (Technical Trading Rules [41]) packages.
In the examples below, we show how to change the periodicity of the data (keeping the adjusted close price and the volume information) and how to calculate monthly arithmetic returns for the ten stocks and the index. We then merge the price and return information for each stock and, at each point in time, add the return of the S&P500 index and the three Fama-French factors.
stocks.prices.monthly <- stocks.prices %>%
  tq_transmute(select     = c(adjusted,volume), # which columns to choose
               mutate_fun = to.monthly,         # function: make monthly series
               indexAt    = "lastof") %>%       # 'yearmon', 'yearqtr', 'firstof', 'lastof', 'startof' or 'endof'
  ungroup() %>% mutate(date=as.yearmon(date))
stocks.returns <- stocks.prices %>%
  tq_transmute(select     = adjusted,
               mutate_fun = periodReturn,       # create monthly returns
               period     = "monthly",
               type       = "arithmetic") %>%
  ungroup() %>% mutate(date=as.yearmon(date))
index.returns <- index.prices %>%
  tq_transmute(select = adjusted, mutate_fun = periodReturn,
               period = "monthly", type = "arithmetic") %>%
  mutate(date=as.yearmon(date))
factors.returns <- factors %>% mutate(price=price/100) %>% # already monthly; percent -> decimal
  mutate(date=as.yearmon(date))
[40] https://www.quantmod.com/
[41] https://www.rdocumentation.org/packages/TTR/
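This sketch makes the return definition explicit. With periodReturn(..., type = "arithmetic"), the monthly return is R_t = P_t / P_{t-1} - 1, where P_t is the last adjusted price of month t. type = "log" would give r_t = log(P_t / P_{t-1}) = log(1 + R_t). For the very first month, periodReturn() (with its default leading = TRUE) uses the first available price of the series as the starting value. The first AAPL return of -0.0731 in this chapter is consistent with this: up to the rounding of the printed prices, it equals the month-end price 3.28 relative to the first daily adjusted price 3.54. The following base-R sketch reproduces that logic on a short series of made-up daily prices:

```r
# Made-up daily adjusted prices covering three months
dates  <- as.Date(c("2000-01-03", "2000-01-31",
                    "2000-02-01", "2000-02-29",
                    "2000-03-01", "2000-03-31"))
prices <- c(100, 110, 108, 99, 101, 120)

# Last price within each calendar month (data are sorted by date)
month_id  <- format(dates, "%Y-%m")
month_end <- tapply(prices, month_id, function(p) p[length(p)])
months    <- names(month_end)
month_end <- as.vector(month_end)

# Starting value per month: the previous month-end price; for the first
# month, the first price in the sample (like periodReturn(leading = TRUE))
start_val <- c(prices[1], month_end[-length(month_end)])

arith_ret <- month_end / start_val - 1    # R_t = P_t / P_{t-1} - 1
log_ret   <- log(month_end / start_val)   # r_t = log(P_t / P_{t-1}) = log(1 + R_t)

data.frame(month = months, arithmetic = arith_ret, log = log_ret)
```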
stocks.prices.monthly %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 4
##   symbol date          adjusted    volume
##   <chr>  <S3: yearmon>    <dbl>     <dbl>
## 1 AAPL   Jan 2000          3.28 175420000
## 2 AAPL   Feb 2000          3.63  92240400
## 3 AAPL   Mar 2000          4.30 101158400
## 4 AAPL   Apr 2000          3.93  62395200
## 5 AAPL   May 2000          2.66 108376800
stocks.returns %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 3
##   symbol date          monthly.returns
##   <chr>  <S3: yearmon>           <dbl>
## 1 AAPL   Jan 2000              -0.0731
## 2 AAPL   Feb 2000               0.105
## 3 AAPL   Mar 2000               0.185
## 4 AAPL   Apr 2000              -0.0865
## 5 AAPL   May 2000              -0.323
index.returns %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 2
##   date          monthly.returns
##   <S3: yearmon>           <dbl>
## 1 Jan 2000              -0.0418
## 2 Feb 2000              -0.0201
## 3 Mar 2000               0.0967
## 4 Apr 2000              -0.0308
## 5 May 2000              -0.0219
factors.returns %>% ungroup() %>% slice(1:5) # show first 5 entries
## # A tibble: 5 x 3
##   date          FFvar    price
##   <S3: yearmon> <chr>    <dbl>
## 1 Jul 1926      Mkt.RF  0.0296
## 2 Aug 1926      Mkt.RF  0.0264
## 3 Sep 1926      Mkt.RF  0.0036
## 4 Oct 1926      Mkt.RF -0.0324
## 5 Nov 1926      Mkt.RF  0.0253
Now we merge all the information together into the dataset stocks.final used in the following sections (the code for this merging step is not shown):
## # A tibble: 5 x 10
##   symbol date    return adjusted volume   sp500  Mkt.RF     SMB     HML
##   <chr>  <S3:>    <dbl>    <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 AAPL   Jan ~  -0.0731     3.28 1.75e8 -0.0418 -0.0474  0.0505 -0.0045
## 2 AAPL   Feb ~   0.105      3.63 9.22e7 -0.0201  0.0245  0.221  -0.106
## 3 AAPL   Mar ~   0.185      4.30 1.01e8  0.0967  0.052  -0.173   0.0794
## 4 AAPL   Apr ~  -0.0865     3.93 6.24e7 -0.0308 -0.064  -0.0771  0.0856
## 5 AAPL   May ~  -0.323      2.66 1.08e8 -0.0219 -0.0442 -0.0501  0.0243
## # ... with 1 more variable: RF <dbl>
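The merging code is not shown, but the key idea is a many-to-one join on the date: every stock-month row receives that month's index return and factor values. The sketch below shows one possible way to do this in base R on made-up data, not necessarily the way stocks.final was built. dplyr::left_join(by = "date") would be the tidyverse counterpart. The sketch also assumes the factor table has already been spread into one column per factor, e.g. with spread(FFvar, price):

```r
# Hypothetical long table of stock returns (one row per symbol and month)
stock_ret <- data.frame(symbol = rep(c("AAA", "BBB"), each = 2),
                        date   = rep(c("2000-01", "2000-02"), times = 2),
                        return = c(0.01, -0.02, 0.03, 0.00),
                        stringsAsFactors = FALSE)
# Hypothetical index returns (one row per month)
index_ret <- data.frame(date  = c("2000-01", "2000-02"),
                        sp500 = c(-0.04, -0.02),
                        stringsAsFactors = FALSE)
# Hypothetical factor returns, already in wide form (one row per month)
factor_ret <- data.frame(date   = c("2000-01", "2000-02"),
                         Mkt.RF = c(-0.047, 0.025),
                         RF     = c(0.0041, 0.0043),
                         stringsAsFactors = FALSE)

# Left joins on date: each stock-month row gets that month's index and
# factor values (many stock rows map to one date row)
merged <- merge(stock_ret, index_ret, by = "date", all.x = TRUE)
merged <- merge(merged, factor_ret, by = "date", all.x = TRUE)
merged <- merged[order(merged$symbol, merged$date), ]
merged
```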
Now we can calculate and add further information, such as the MACD (Moving Average Convergence/Divergence [42]) and its signal line. Be aware that you have to group_by() symbol. Otherwise the indicator would be calculated over one long stacked time series:
[42] https://en.wikipedia.org/wiki/MACD
stocks.final %>% group_by(symbol) %>%
  tq_mutate(select     = adjusted,
            mutate_fun = MACD,
            col_rename = c("MACD", "Signal")) %>%
  select(symbol,date,adjusted,MACD,Signal) %>%
  tail() # show last part of the dataset
## # A tibble: 6 x 5
## # Groups:   symbol [1]
##   symbol  date adjusted  MACD Signal
##   <chr>  <dbl>    <dbl> <dbl>  <dbl>
## 1 GILD   2018.     73.4 -5.40  -4.38
## 2 GILD   2018.     80.8 -3.86  -4.27
## 3 GILD   2018.     78.7 -2.85  -3.99
## 4 GILD   2018.     72.8 -2.68  -3.73
## 5 GILD   2018.     72.6 -2.52  -3.49
## 6 GILD   2018.     70.0 -2.66  -3.32
save(stocks.final,file="stocks.RData")
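The MACD line compares a fast and a slow exponential moving average (EMA) of the price, and the signal line is an EMA of the MACD line itself. The base-R sketch below spells this out with TTR's default parameters: nFast = 12, nSlow = 26, nSig = 9 and percent = TRUE, so the MACD is the percentage difference between the two averages. Seeding each EMA with the simple mean of its first n values is my assumption about TTR's convention, so the numbers need not match TTR::MACD() exactly. The second part applies the indicator per symbol to a made-up stacked dataset, which is what group_by(symbol) achieves in the pipeline above:

```r
# EMA with smoothing factor 2 / (n + 1), seeded with the simple mean of the
# first n non-missing values (assumed to mirror TTR's convention)
ema <- function(x, n) {
  out   <- rep(NA_real_, length(x))
  start <- which(!is.na(x))[1]              # first non-missing observation
  if (is.na(start) || length(x) - start + 1 < n) return(out)
  alpha <- 2 / (n + 1)
  seed  <- start + n - 1
  out[seed] <- mean(x[start:seed])
  if (seed < length(x)) {
    for (i in (seed + 1):length(x)) {
      out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
    }
  }
  out
}

# MACD line (fast EMA relative to slow EMA) and its signal line
macd <- function(price, n_fast = 12, n_slow = 26, n_sig = 9, percent = TRUE) {
  fast <- ema(price, n_fast)
  slow <- ema(price, n_slow)
  line <- if (percent) 100 * (fast / slow - 1) else fast - slow
  data.frame(MACD = line, Signal = ema(line, n_sig))
}

# Two made-up price paths stacked in one long table
set.seed(1)
long <- data.frame(symbol   = rep(c("AAA", "BBB"), each = 60),
                   adjusted = c(100 * cumprod(1 + rnorm(60, 0.01, 0.05)),
                                50 * cumprod(1 + rnorm(60, 0, 0.05))),
                   stringsAsFactors = FALSE)

# Compute the indicator separately for each symbol (what group_by() ensures)
parts     <- lapply(split(long, long$symbol),
                    function(d) cbind(d, macd(d$adjusted)))
with_macd <- do.call(rbind, parts)
tail(with_macd)
```

Because each symbol is processed separately, the first 25 MACD values of BBB stay missing instead of being computed from AAA's prices.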
0.2.1.2.1 Rolling functions
One of the most important tools you will need in practice is rolling analysis. One example is a rolling regression that yields time-varying α and β of each stock with respect to the index or the Fama-French factors. To do that, we need to create a function that does everything we want in one step:
regr_fun <- function(data,formula) {
coef(lm(formula, data = timetk::tk_tbl(data, silent = TRUE)))
}
This function takes a dataset and a regression formula as input, performs a regression and returns the estimated coefficients. This version does not return the residual standard deviation or the R². An extended function could extract them from summary(lm(...)).
As a concrete example, consider a data frame returns_combined with the return columns fb.returns and xlk.returns. This data frame is not constructed in this chapter.
First, create a custom regression function, which will be applied over the rolling window in the next step. An important point is that the “data” will be passed to the regression function as an xts object. The timetk::tk_tbl function converts it to a data frame so that lm works properly with the columns “fb.returns” and “xlk.returns”.
regr_fun <- function(data) {
  coef(lm(fb.returns ~ xlk.returns,
          data = timetk::tk_tbl(data, silent = TRUE)))
}
Now we can use tq_mutate() to apply the custom regression function over a rolling window using rollapply from the zoo package. Because we left select = NULL, the returns_combined data frame is passed automatically to the data argument of the rollapply function. All you need to specify is mutate_fun = rollapply and any additional arguments that rollapply needs. We specify a 12-week window via width = 12. The FUN argument is our custom regression function, regr_fun. It is extremely important to specify by.column = FALSE, which tells rollapply to perform the computation on the data as a whole rather than applying the function to each column independently. The col_rename argument renames the added columns.
returns_combined %>%
  tq_mutate(mutate_fun = rollapply,
            width      = 12,
            FUN        = regr_fun,
            by.column  = FALSE,
            col_rename = c("coef.0", "coef.1"))
The rolling regression coefficients are added to the data frame as the new columns coef.0 and coef.1.
Also check out the functionality of tibbletime for this task (rollify())!
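The rolling estimation that rollapply(..., by.column = FALSE) performs can be spelled out as a plain loop. The following base-R sketch re-estimates alpha and beta on each window of 12 observations; the helper rolling_coef() and the synthetic data are made up for illustration. The windows are right-aligned, so each estimate uses only the current and the previous 11 observations. zoo::rollapply() centers the window by default (align = "center"), so align = "right" has to be passed explicitly to avoid using future data:

```r
# Rolling OLS: for each t >= width, regress y on x using observations
# (t - width + 1):t and store intercept and slope (right-aligned window)
rolling_coef <- function(y, x, width) {
  n <- length(y)
  stopifnot(length(x) == n, width <= n)
  out <- matrix(NA_real_, nrow = n, ncol = 2,
                dimnames = list(NULL, c("alpha", "beta")))
  for (t in width:n) {
    idx <- (t - width + 1):t
    out[t, ] <- coef(lm(y[idx] ~ x[idx]))
  }
  out
}

# Made-up example: beta switches from 0.5 to 1.5 halfway through
set.seed(42)
mkt   <- rnorm(48, 0.01, 0.04)
beta  <- rep(c(0.5, 1.5), each = 24)
stock <- 0.002 + beta * mkt        # no noise, so in-regime estimates are exact
coefs <- rolling_coef(stock, mkt, width = 12)
tail(coefs)
```

Windows that straddle the switch in month 25 produce intermediate betas. This is the smoothing that rolling estimates impose on abrupt changes.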
0.3 Exploring Data
In this chapter we show how to explore and analyze data using the dataset created in Chapter 0.2 (Managing Data):
load("stocks.RData")
glimpse(stocks.final)
## Observations: 2,160
## Variables: 10
## $ symbol   <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date     <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return   <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume   <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500    <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF   <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB      <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML      <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF       <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...
stocks.final %>% slice(1:2)
## # A tibble: 2 x 10
##   symbol date    return adjusted volume   sp500  Mkt.RF    SMB     HML
##   <chr>  <S3:>    <dbl>    <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
## 1 AAPL   Jan ~  -0.0731     3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL   Feb ~   0.105      3.63 9.22e7 -0.0201  0.0245 0.221  -0.106
## # ... with 1 more variable: RF <dbl>
0.3.1 Plotting Data
In this chapter we show how to create various graphs of financial
timeseries and their properties, which should help us to get a better
understanding of their properties, before we go on to calculate and
test their statistics.
0.3.1.1 Time-series plots
0.3.1.2 Box-plots
0.3.1.3 Histogram and Density Plots
0.3.1.4 Quantile Plots
0.3.2 Analyzing Data
0.3.2.1 Calculating Statistics
0.3.2.2 Testing Data
0.3.2.3 Exposure to Factors
The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset). Let us estimate these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML:
library(broom) # provides tidy() for lm objects
stocks.factor_exposure <- stocks.final %>%
  nest(-symbol) %>%                       # one nested dataset per stock
  mutate(model  = map(data, ~ lm(return ~ Mkt.RF + SMB + HML, data = .x)),
         tidied = map(model, tidy)) %>%
  unnest(tidied, .drop=TRUE) %>%
  filter(term != "(Intercept)") %>%
  select(symbol,term,estimate) %>%
  spread(term,estimate) %>%               # one column per factor
  select(symbol,Mkt.RF,SMB,HML)
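For each stock i, the code above estimates return_{i,t} = a_i + b_i Mkt.RF_t + s_i SMB_t + h_i HML_t + e_{i,t} by OLS. The dependent variable there is the raw return. The Fama-French three-factor model is usually estimated on excess returns, which would only require the formula I(return - RF) ~ Mkt.RF + SMB + HML. The sketch below applies the same split-apply-combine logic in base R to a synthetic panel with known loadings; the symbols and loadings are made up. It recovers the loadings exactly because no noise is added:

```r
# Synthetic factor returns and two hypothetical stocks with known loadings
set.seed(7)
n   <- 60
fac <- data.frame(Mkt.RF = rnorm(n, 0.005, 0.04),
                  SMB    = rnorm(n, 0, 0.03),
                  HML    = rnorm(n, 0, 0.03))
true_b <- list(AAA = c(1.2, 0.3, -0.4), BBB = c(0.8, -0.2, 0.5))
panel  <- do.call(rbind, lapply(names(true_b), function(s) {
  b <- true_b[[s]]
  data.frame(symbol = s, fac,
             return = 0.001 + b[1] * fac$Mkt.RF + b[2] * fac$SMB + b[3] * fac$HML,
             stringsAsFactors = FALSE)
}))

# One regression per symbol (the base-R analogue of nest() + map(lm))
exposures <- t(sapply(split(panel, panel$symbol), function(d) {
  coef(lm(return ~ Mkt.RF + SMB + HML, data = d))[c("Mkt.RF", "SMB", "HML")]
}))
round(exposures, 4)
```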
0.4 Managing Portfolios
In this chapter we show how to optimize and manage portfolios using the dataset created in Chapter 0.2 (Managing Data).
First we will learn how to optimize portfolios over the full sample. In the next chapters we will do the same in a rolling analysis and also perform some backtesting. The main workhorse of this chapter is the PortfolioAnalytics package developed by Peterson and Carl (2018).
PortfolioAnalytics comes with an excellent introductory vignette, vignette("portfolio_vignette"). It also includes further documents on the use of ROI solvers, vignette("ROI_vignette"), on how to create custom moment functions, vignette("custom_moments_objectives"), and on how to introduce CVaR budgets, vignette("risk_budget_optimization").
0.4.1 Introduction
SHORT INTRODUCTION TO PORTFOLIOMANAGEMENT
We start by first creating a portfolio object, before we…
0.4.1.1 The portfolio.spec() Object
The portfolio object is a so-called S3 object [43]. This means that it has a certain class (portfolio) describing its properties, behavior and relation to other objects. Such an object usually comes with a variety of methods. To create such an object, we reuse the stock dataset that we created in Chapter 0.2 (Managing Data):
[43] http://adv-r.had.co.nz/S3.html
load("stocks.RData")
glimpse(stocks.final)
## Observations: 2,160
## Variables: 10
## $ symbol   <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date     <S3: yearmon> Jan 2000, Feb 2000, Mar 2000, Apr 2000, May 2...
## $ return   <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume   <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500    <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF   <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB      <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML      <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF       <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...
stocks.final %>% slice(1:2)
## # A tibble: 2 x 10
##   symbol date    return adjusted volume   sp500  Mkt.RF    SMB     HML
##   <chr>  <S3:>    <dbl>    <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
## 1 AAPL   Jan ~  -0.0731     3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL   Feb ~   0.105      3.63 9.22e7 -0.0201  0.0245 0.221  -0.106
## # ... with 1 more variable: RF <dbl>
The PortfolioAnalytics package needs our data in xts format (see Section 0.1.1.2, eXtensible Timeseries). We therefore first spread the returns into one column per stock and then convert the result to xts using tk_xts() from the timetk package.
returns <- stocks.final %>%
select(symbol,date,return) %>%
spread(symbol,return) %>%
tk_xts(silent = TRUE)
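Spreading turns the long table (one row per symbol and month) into a wide one (one row per month, one column per symbol), which is the layout the optimizer expects. The base-R sketch below performs this transformation with stats::reshape() on a made-up three-stock table. Missing symbol-month combinations would become NA, as with spread():

```r
# Hypothetical long table: one row per symbol and month
long_ret <- data.frame(symbol = rep(c("AAA", "BBB", "CCC"), each = 3),
                       date   = rep(c("2000-01", "2000-02", "2000-03"), times = 3),
                       return = c(0.01, 0.02, -0.01, 0.03, 0.00, 0.05, -0.02, 0.01, 0.04),
                       stringsAsFactors = FALSE)

# Wide layout: one row per date, one column per symbol (what spread() produces)
wide <- reshape(long_ret, idvar = "date", timevar = "symbol", direction = "wide")
names(wide)    <- sub("^return\\.", "", names(wide))   # "return.AAA" -> "AAA"
rownames(wide) <- wide$date
returns_mat    <- as.matrix(wide[, -1])                # numeric matrix, dates as row names
returns_mat
```

Turning such a matrix into an xts object additionally requires a proper date index, which tk_xts() takes from the date column in the code above.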
Now it is time to initialize the portfolio.spec() object, passing along the names of our assets (and their sectors as category labels). Afterwards we print the object. Most S3 objects come with a print method that displays the most important information in a readable form.
pspec <- portfolio.spec(assets = stocks.selection$symbol,
category_labels = stocks.selection$sector)
print(pspec)
## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
##  [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD
str(pspec)
## List of 6
##  $ assets         : Named num [1:10] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0
##   ..- attr(*, "names")= chr [1:10] "AAPL" "MSFT" "AMZN" "CSCO" ...
##  $ category_labels:List of 3
##   ..$ Information Technology: int [1:7] 1 2 4 5 6 8 9
##   ..$ Consumer Discretionary: int 3
##   ..$ Health Care           : int [1:2] 7 10
##  $ weight_seq     : NULL
##  $ constraints    : list()
##  $ objectives     : list()
##  $ call           : language portfolio.spec(assets = stocks.selection$symb
##  - attr(*, "class")= chr [1:2] "portfolio.spec" "portfolio"
Checking the structure of the object with str(), we find that it contains several elements:
* assets: the asset names and initial weights, which are equally distributed unless otherwise specified (e.g. portfolio.spec(assets=c(0.6,0.4)));
* category_labels: categorizes the assets by sector (or geography etc.);
* weight_seq: a sequence of weights for later use by random_portfolios;
* constraints: which we will set soon;
* objectives;
* call: the call that initialized the object.
Before we optimize any portfolio, we show how to set constraints.
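As the str() output shows, pspec is essentially a list that carries a class attribute. That class is what lets a generic function such as print() dispatch to a specialized method. A minimal toy class shows the mechanism. The names toy_portfolio() and print.toy_portfolio() are hypothetical and made up for illustration; PortfolioAnalytics' own constructor and print method are considerably more elaborate:

```r
# Hypothetical constructor for a toy S3 class (not part of PortfolioAnalytics)
toy_portfolio <- function(assets) {
  weights <- rep(1 / length(assets), length(assets))   # equal initial weights
  names(weights) <- assets
  structure(list(assets = weights, constraints = list()),
            class = "toy_portfolio")
}

# print() dispatches to this method for objects of class "toy_portfolio"
print.toy_portfolio <- function(x, ...) {
  cat("Toy portfolio with ", length(x$assets), " assets: ",
      paste(names(x$assets), collapse = " "), "\n", sep = "")
  invisible(x)
}

p <- toy_portfolio(c("AAPL", "MSFT", "AMZN", "CSCO"))
p            # auto-printing calls print.toy_portfolio()
unclass(p)   # the underlying plain list
```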
0.4.1.2 Constraints
Constraints define restrictions and boundary conditions on the weights of a portfolio. They are added with add.constraint(), which takes the type of the constraint, the arguments for that type, and whether the constraint should be enabled or not (enabled=TRUE is the default).
0.4.1.2.1 Sum of Weights Constraint
Here we define how much of the available budget can or must be invested by specifying the maximum/minimum sum of portfolio weights. Usually we want to invest our entire budget and therefore set type="full_investment", which sets the sum of weights to 1. Alternatively, we can set type="weight_sum" with min_sum and max_sum both equal to 1.
pspec <- add.constraint(portfolio=pspec,
type="full_investment")
# print(pspec)
# pspec <- add.constraint(portfolio=pspec,type="weight_sum", min_sum=1, max_sum=1)
Another common constraint is to make the portfolio dollar-neutral with type="dollar_neutral" (or the equivalent formulations shown below):
# pspec <- add.constraint(portfolio=pspec,
#                         type="dollar_neutral")
# print(pspec)
# pspec <- add.constraint(portfolio=pspec, type="active")
# pspec <- add.constraint(portfolio=pspec, type="weight_sum", min_sum=0, max_sum=0)
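The sum-of-weights variants above differ only in the bounds they put on the total weight. For N assets with weights w_1, ..., w_N they are written out below. The summary reflects my understanding of PortfolioAnalytics' constraint types, in which full_investment, dollar_neutral and active are special cases of weight_sum:

```latex
\begin{aligned}
\texttt{weight\_sum}: &\quad \text{min\_sum} \le \sum_{i=1}^{N} w_i \le \text{max\_sum} \\
\texttt{full\_investment}: &\quad \sum_{i=1}^{N} w_i = 1 \qquad (\text{min\_sum} = \text{max\_sum} = 1) \\
\texttt{dollar\_neutral},\ \texttt{active}: &\quad \sum_{i=1}^{N} w_i = 0 \qquad (\text{min\_sum} = \text{max\_sum} = 0)
\end{aligned}
```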
0.4.1.2.2 Box Constraint
Box constraints specify upper and lower bounds on the asset weights. If we pass min and max as scalars, the same minimum and maximum weights apply to every asset. If we pass vectors (of the same length as the number of assets), we can specify position limits on individual stocks:
pspec <- add.constraint(portfolio=pspec,
                        type="box",
                        min=0,
                        max=0.4)
# print(pspec)
# pspec <- add.constraint(portfolio=pspec,
#                         type="box",
#                         min=c(0.05, 0, rep(0.05,8)),
#                         max=c(0.4, 0.3, rep(0.4,8)))
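A box constraint is checked asset by asset. This base-R sketch uses the vector bounds from the commented example above. The helper in_box() and the weights are hypothetical. Scalar bounds are simply recycled to every asset:

```r
# Per-asset bounds from the commented example above (10 assets)
box_min <- c(0.05, 0, rep(0.05, 8))
box_max <- c(0.4, 0.3, rep(0.4, 8))

# Hypothetical helper: TRUE for each asset whose weight lies inside its box
in_box <- function(w, lower, upper, tol = 1e-8) {
  w >= lower - tol & w <= upper + tol
}

w_equal <- rep(0.1, 10)                        # equal weights
all(in_box(w_equal, box_min, box_max))         # TRUE

w_tilted <- c(0.45, 0, rep(0.06875, 8))        # still sums to 1
which(!in_box(w_tilted, box_min, box_max))     # 1: asset 1 exceeds its 40% cap
all(in_box(w_tilted, 0, 0.4))                  # FALSE: scalar bounds are recycled
```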
Another special type of box constraint is the long-only constraint, which only allows non-negative weights. It is set automatically if no min and max are given, or explicitly with type="long_only".
# pspec <- add.constraint(portfolio=pspec, type="box")
# pspec <- add.constraint(portfolio=pspec, type="long_only")
0.4.1.2.3 Group Constraints
Group constraints allow the user to specify constraints per group of assets, such as industries, sectors or geography.[44] The groups can be defined freely. Below, we set group constraints for the sectors specified above. The input arguments are:

- groups: a list of vectors specifying the groups of the assets.
- group_labels: a character vector to label the groups (e.g. size, asset class, style).
- group_min and group_max: the minimum and maximum weight per group.
- group_pos (optional): the number of non-zero weights per group.

pspec <- add.constraint(portfolio=pspec,
                        type="group",
                        groups=list(pspec$category_labels$`Information Technology`,
                                    pspec$category_labels$`Consumer Discretionary`,
                                    pspec$category_labels$`Health Care`),
                        group_min=c(0.1, 0.15, 0.1),
                        group_max=c(0.85, 0.55, 0.4),
                        group_labels=names(pspec$category_labels))
# print(pspec)
[44] Note that only the ROI, DEoptim and random portfolio solvers support group constraints. See also the section on Solvers (0.4.1.4).
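Group constraints act on the sum of the weights within each group. The base-R sketch below recomputes the group weights for an equal-weight portfolio. It uses the sector assignment shown in the print(pspec) output further down and the group_min/group_max values above. It only illustrates the arithmetic and is not how PortfolioAnalytics evaluates the constraint internally:

```r
assets <- c("AAPL", "MSFT", "AMZN", "CSCO", "NVDA",
            "ORCL", "AMGN", "ADBE", "QCOM", "GILD")
groups <- list(
  "Information Technology" = c("AAPL", "MSFT", "CSCO", "NVDA", "ORCL", "ADBE", "QCOM"),
  "Consumer Discretionary" = "AMZN",
  "Health Care"            = c("AMGN", "GILD"))
group_min <- c(0.1, 0.15, 0.1)
group_max <- c(0.85, 0.55, 0.4)

w <- setNames(rep(1 / length(assets), length(assets)), assets)   # equal weights

# Sum the weights of the assets in each group
group_weight <- sapply(groups, function(g) sum(w[g]))
group_weight                                    # about 0.7, 0.1 and 0.2
group_weight >= group_min & group_weight <= group_max
```

With equal weights, Consumer Discretionary (AMZN alone) holds only 10% and violates its 15% minimum. An optimizer therefore has to tilt the portfolio towards AMZN.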
0.4.1.2.4 Position Limit Constraint
The position limit constraint allows the user to specify limits on the number of assets with non-zero, long, or short positions. Its arguments are max_pos, which defines the maximum number of assets with non-zero weights, and max_pos_long/max_pos_short, which specify the maximum number of assets with long (i.e. buy) and short (i.e. sell) positions.[45]
pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos=3)
# pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos_l
# print(pspec)
[45] Note that not all solvers support the different options. All of them are supported by the DEoptim and random portfolio solvers. The ROI solvers do not support the long/short position limits. The max_pos argument is only handled in the linear ROI problems (maximum return and minimum ETL; see the list of ROI methods in the Solvers section).
0.4.1.2.5 Diversification Constraint
The diversification constraint allows the user to target a level of diversification. It is implemented by penalizing the optimizer if the deviation from the target is larger than 5%.[46] Diversification is defined as $\sum_{i=1}^{N} w_i^2$ for $N$ assets. Its only argument is the diversification target div_target.
pspec <- add.constraint(portfolio=pspec, type="diversification", div_target=
# print(pspec)
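Both the position limits and the diversification measure defined above are simple functions of the weight vector. The following base-R sketch uses hypothetical weights. It counts positions for comparison with max_pos, max_pos_long and max_pos_short, and it computes the sum of squared weights as defined in the text:

```r
w <- c(0.3, 0.25, 0.2, -0.15, 0, 0, 0.4)   # hypothetical weights, sum to 1

# Position limits: number of non-zero, long and short positions
n_pos   <- sum(w != 0)   # compare with max_pos
n_long  <- sum(w > 0)    # compare with max_pos_long
n_short <- sum(w < 0)    # compare with max_pos_short
c(n_pos, n_long, n_short)    # 5 4 1

# Diversification measure as defined above: sum of squared weights
sum(w^2)                     # 0.375

# For any fully invested portfolio, equal weights give the smallest value, 1/N
N <- 10
sum(rep(1 / N, N)^2)         # 0.1
```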
0.4.1.2.6 Turnover Constraint
The turnover constraint allows the user to specify a maximum turnover relative to a set of initial weights. These weights can either be passed explicitly or default to the weights initially specified in the portfolio object. Like the diversification constraint, it is implemented as an optimization penalty if the turnover deviates by more than 5% from turnover_target.[47]
pspec <- add.constraint(portfolio=pspec, type="turnover", turnover_target=0.
# print(pspec)
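Turnover measures how far the weights move away from the initial weights. One common definition, used in this base-R sketch, is the sum of absolute weight changes. The example weights are hypothetical. PortfolioAnalytics may scale its internal turnover measure differently, so check its documentation before comparing numbers with turnover_target:

```r
w_init <- rep(0.25, 4)                 # initial weights (equal weights)
w_new  <- c(0.40, 0.30, 0.20, 0.10)    # proposed new weights

trades         <- w_new - w_init       # 0.15 0.05 -0.05 -0.15
total_turnover <- sum(abs(trades))     # 0.4: weight bought plus weight sold
total_turnover / 2                     # 0.2: one-way turnover (sold = bought here)
```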
0.4.1.2.7 Target Return Constraint
The target return constraint allows the user to target an average
return specified by return_target.
pspec <- add.constraint(portfolio=pspec, type="return", return_target=0.007)
# print(pspec)
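The return constraint works with the portfolio mean return, and so does the mean return objective further below. This quantity is linear in the weights: the mean of the portfolio return series equals the weighted average of the asset means. Below is a base-R sketch with simulated, purely hypothetical monthly returns:

```r
set.seed(1)
# 120 hypothetical monthly returns for three assets
R <- matrix(rnorm(120 * 3, mean = 0.008, sd = 0.05), ncol = 3,
            dimnames = list(NULL, c("A", "B", "C")))
w <- c(0.5, 0.3, 0.2)

port_ret <- drop(R %*% w)              # portfolio return in every month
mean(port_ret)                         # mean portfolio return
sum(colMeans(R) * w)                   # same value: weighted average of asset means
mean(port_ret) >= 0.007                # would this portfolio meet return_target = 0.007?
```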
[46] Note that the diversification constraint is only supported by the global numeric solvers (not by the ROI solvers).
[47] Note that the turnover constraint is currently not supported by the ROI solver for quadratic utility and minimum variance problems.
0.4.1.2.8 Factor Exposure Constraint
The factor exposure constraint allows the user to set upper and lower bounds on exposures to risk factors. We will use the factor exposures calculated in section 0.3.2.3 (Exposure to Factors). The main input is a vector or matrix B together with lower and upper bounds for the portfolio factor exposure:

- If B is a vector (with length equal to the number of assets), lower and upper must be scalars.
- If B is a matrix, the number of rows must equal the number of assets, and the number of columns equals the number of factors. In this case, lower and upper must have a length equal to the number of factors.

B should have column names specifying the factors and row names specifying the assets.
B <- stocks.factor_exposure %>% as.data.frame() %>% column_to_rownames("symb
pspec <- add.constraint(portfolio=pspec, type="factor_exposure",
B=B,
lower=c(0.8,0,-1),
upper=c(1.2,0.8,0))
# print(pspec)
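The portfolio factor exposure is the weighted sum of the asset exposures, i.e. t(B) %*% w. This base-R sketch uses a small hypothetical exposure matrix with four assets and three factors. The factor names are only illustrative. The lower/upper bounds are the ones used above:

```r
# Hypothetical exposures: rows = assets, columns = factors
B <- matrix(c( 1.1,  0.2, -0.3,
               0.9,  0.5,  0.1,
               1.2, -0.1, -0.4,
               0.8,  0.3,  0.2),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("Asset", 1:4), c("MKT", "SMB", "HML")))
lower <- c(0.8, 0, -1)
upper <- c(1.2, 0.8, 0)

w <- rep(0.25, 4)                          # equal weights
exposure <- drop(crossprod(B, w))          # t(B) %*% w, one value per factor
exposure                                   # MKT 1.0, SMB 0.225, HML -0.1
exposure >= lower & exposure <= upper      # all TRUE
```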
0.4.1.2.9 Transaction Cost Constraint
The transaction cost constraint enables the user to specify (proportional) transaction costs.[48] Here we assume the proportional transaction cost ptc to be equal to 1%.
pspec <- add.constraint(portfolio=pspec, type="transaction_cost", ptc=0.01)
# print(pspec)
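Under a simple proportional cost model, the cost of moving from the initial to the new weights is ptc times the absolute weight traded. This base-R sketch shows the arithmetic with hypothetical weights and the 1% cost rate from above. It does not show how PortfolioAnalytics feeds ptc into a particular optimization problem:

```r
ptc    <- 0.01                          # proportional transaction cost (1%)
w_init <- rep(0.25, 4)                  # hypothetical initial weights
w_new  <- c(0.40, 0.30, 0.20, 0.10)     # hypothetical new weights

cost_per_asset <- ptc * abs(w_new - w_init)
sum(cost_per_asset)                     # 0.004, i.e. 0.4% of portfolio value
```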
[48] When using the ROI (quadprog) solvers, transaction costs are currently only supported for global minimum variance and quadratic utility problems.
0.4.1.2.10 Leverage Exposure Constraint
The leverage exposure constraint specifies a maximum level of leverage, measured as the sum of absolute weights. A 130/30 portfolio (130% long, 30% short) therefore requires a leverage of 1.3 + 0.3 = 1.6.
pspec <- add.constraint(portfolio=pspec, type="leverage_exposure", leverage=
# print(pspec)
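A 130/30 structure can be checked directly from the weights. In this base-R sketch with hypothetical weights, the long side sums to 1.3 and the short side to -0.3. The net exposure is 1 (fully invested), and the gross exposure, i.e. the sum of absolute weights, is 1.6:

```r
w <- c(0.50, 0.45, 0.35, -0.20, -0.10)   # hypothetical 130/30 weights

long_side  <- sum(w[w > 0])     # 1.3
short_side <- sum(w[w < 0])     # -0.3
net        <- sum(w)            # 1: fully invested
gross      <- sum(abs(w))       # 1.6: gross exposure
c(long = long_side, short = short_side, net = net, gross = gross)
```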
0.4.1.2.11 Checking and enabling/disabling constraints
Every constraint added to the portfolio object gets a number according to the order in which it was added. To update (or enable/disable) a specific constraint, pass this number as the indexnum argument.
summary(pspec)
# To get an overview of the constraints, their indexnum and whether they are enabled
consts <- plyr::ldply(pspec$constraints, function(x){c(x$type, x$enabled)})
consts
# update the box constraint
pspec$constraints[[which(consts$V1=="box")]]
pspec <- add.constraint(pspec, type="box",
                        min=0, max=0.5,
                        indexnum=which(consts$V1=="box"))
pspec$constraints[[which(consts$V1=="box")]]
# to disable constraints
pspec$constraints[[which(consts$V1=="position_limit")]]
pspec <- add.constraint(pspec, type="position_limit", enabled=FALSE, # only s
                        indexnum=which(consts$V1=="position_limit"))
pspec$constraints[[which(consts$V1=="position_limit")]]
0.4.1.3 Objectives
Before optimizing a portfolio, we first have to specify what optimal means in terms of the relevant (business) objective. Such objectives (target functions) are added to the portfolio object with add.objective, where the user specifies the type of objective. Currently available types are ‘return’, ‘risk’, ‘risk budget’, ‘quadratic utility’, ‘weight concentration’, ‘turnover’ and ‘minmax’. Each type of objective has additional arguments that need to be specified. Several objectives can be added. As with constraints, each one can be updated, enabled or disabled by specifying the indexnum argument.
0.4.1.3.1 Portfolio Risk Objective
Here, the user can specify a risk function that should be minimized. We start by adding a risk objective to minimize portfolio variance (minimum variance portfolio). Another example is the expected tail loss (ETL) with a confidence level of 0.95. Any function can be used, including user-defined ones, as long as the name corresponds to a function in R. Additional arguments to that function have to be passed as a named list to arguments. The examples below use var (portfolio variance) and ETL, the latter added but disabled.
pspec <- add.objective(portfolio=pspec,
type='risk',
name='var')
pspec <- add.objective(portfolio=pspec,
type='risk',
name='ETL',
arguments=list(p=0.95),
enabled=FALSE)
# print(pspec)
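The two risk measures named above can be computed by hand. The base-R sketch below uses simulated, hypothetical daily returns. It computes the portfolio variance as w'Σw and a simple historical estimate of the expected tail loss at p = 0.95, i.e. the average loss in the worst 5% of periods. The historical estimator is an assumption for illustration only. PerformanceAnalytics' ES()/ETL() functions offer several estimation methods (e.g. historical, gaussian, modified) and may give different numbers:

```r
set.seed(42)
# 250 hypothetical daily returns for three assets
R <- matrix(rnorm(250 * 3, mean = 0.0005, sd = 0.01), ncol = 3)
w <- c(0.5, 0.3, 0.2)

# Portfolio variance: w' Sigma w equals the variance of the portfolio returns
Sigma    <- cov(R)
port_var <- drop(t(w) %*% Sigma %*% w)
rp       <- drop(R %*% w)
all.equal(port_var, var(rp))                        # TRUE

# Historical expected tail loss at p = 0.95
p <- 0.95
q <- quantile(rp, probs = 1 - p, names = FALSE)    # 5% return quantile
etl_hist <- -mean(rp[rp <= q])                     # average loss beyond it
etl_hist
```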
0.4.1.3.2 Portfolio Return Objective
The return objective allows the user to specify a return function to
maximize. Here we add a return objective to maximize the portfolio mean return.
pspec <- add.objective(portfolio=pspec,
type='return',
name='mean')
# print(pspec)
0.4.1.3.3 Portfolio Risk Budget Objective
The portfolio risk budget objective allows the user to specify constraints that minimize component contribution (i.e. equal risk contribution), or to specify upper and lower bounds on the percentage risk contribution. Here we specify that no asset can contribute more than 30% to total portfolio risk.
See the risk budget optimization vignette for more detailed examples of portfolio optimizations with risk budgets.
pspec <- add.objective(portfolio=pspec,
type="risk_budget",
name="var",
max_prisk=0.3)
pspec <- add.objective(portfolio=pspec,
type="risk_budget",
name="ETL",
arguments=list(p=0.95),
max_prisk=0.3,
enabled=FALSE)
# for an equal risk contribution portfolio, set min_concentration=TRUE
pspec <- add.objective(portfolio=pspec,
type="risk_budget",
name="ETL",
arguments=list(p=0.95),
min_concentration=TRUE,
enabled=FALSE)
print(pspec)
## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
##  [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD
##
## Constraints
## Enabled constraint types
##      - full_investment
##      - box
##      - group
##      - position_limit
##      - diversification
##      - turnover
##      - return
##      - factor_exposure
##      - transaction_cost
##      - leverage_exposure
##
## Objectives:
## Enabled objective names
##      - var
##      - mean
##      - var
## Disabled objective names
##      - ETL
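The percentage risk contributions that max_prisk caps can be sketched in base R. For portfolio variance, asset i contributes $w_i (\Sigma w)_i$. These contributions add up to $w'\Sigma w$, so the percentages sum to one. The covariance matrix and weights below are hypothetical:

```r
Sigma <- matrix(c(0.040, 0.006, 0.002,
                  0.006, 0.090, 0.010,
                  0.002, 0.010, 0.010),
                nrow = 3, byrow = TRUE)          # hypothetical covariance matrix
w <- c(0.3, 0.2, 0.5)

port_var    <- drop(t(w) %*% Sigma %*% w)
contrib     <- w * drop(Sigma %*% w)             # component contributions
pct_contrib <- contrib / port_var                # percentage contributions
sum(pct_contrib)                                 # 1
pct_contrib <= 0.3                               # FALSE FALSE TRUE
```

In this example, the first two assets each carry more than 30% of the risk, even though the third asset has the largest weight. This portfolio would therefore violate max_prisk=0.3. Measuring risk by the standard deviation instead of the variance gives the same percentages.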
0.4.1.4 Solvers
Solvers are the workhorses of our portfolio optimization framework, and a variety of them is available through the PortfolioAnalytics package. I will briefly introduce the available solvers. They are selected via the optimize_method argument of optimize.portfolio and optimize.portfolio.rebalancing.
0.4.1.4.1 DEoptim
This solver comes from the R package DEoptim by Ardia et al. (2016). It implements differential evolution, a global stochastic optimization algorithm. The help page ?DEoptim gives many more references. There is also a nice vignette("DEoptimPortfolioOptimization") on large-scale portfolio optimization using the PortfolioAnalytics package.
0.4.1.4.2 Random Portfolios
There are three methods to generate random portfolios in PortfolioAnalytics:
1. The most flexible but also slowest method is ‘sample’. It can take leverage, box, group, and position limit constraints into account.
2. The ‘simplex’ method is useful to generate random portfolios with the full investment and min box constraints (values for min_sum/max_sum are ignored). Other constraints (box max, group and position limit constraints) are handled by elimination, which might leave only very few feasible portfolios. Sometimes it will also lead to suboptimal solutions.
3. Using grid search, the ‘grid’ method only satisfies the min and max box constraints.
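To see why elimination can leave few feasible portfolios, here is a base-R sketch of the general idea. It draws portfolios uniformly from the simplex using normalised exponential draws. This is a standard technique and not necessarily the algorithm inside PortfolioAnalytics' random_portfolios(). Every draw is therefore fully invested and long-only. The sketch then discards draws that break a hypothetical 25% box maximum:

```r
set.seed(2024)
n_assets <- 10
n_draws  <- 5000

# Uniform draws on the simplex: i.i.d. exponentials divided by their row sum
E <- matrix(rexp(n_draws * n_assets), nrow = n_draws)
W <- E / rowSums(E)        # each row: weights >= 0 that sum to 1

# Elimination step: keep only portfolios that respect the box maximum
box_max  <- 0.25
feasible <- apply(W <= box_max, 1, all)
mean(feasible)             # share of draws that survive
W_feasible <- W[feasible, , drop = FALSE]
```

With 10 assets, roughly a third of the draws survive. Tighter caps or additional group constraints shrink this share quickly.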
0.4.1.4.3 pso
The psoptim function from the R package pso (Bendtsen, 2012) uses particle swarm optimization.
0.4.1.4.4 GenSA
The GenSA function from the R package GenSA (Gubian et al., 2018) is based on generalized simulated annealing (a generic probabilistic heuristic optimization algorithm).
0.4.1.4.5 ROI
ROI (R Optimization Infrastructure) is a framework to handle optimization problems in R. It serves as an interface to the Rglpk package and the quadprog package, which solve linear and quadratic programming problems. The methods available in the context of the PortfolioAnalytics package are given below (see section 0.4.1.3 Objectives for the available objectives).
1. Maximize portfolio return subject to leverage, box, group, position limit, target mean return, and/or factor exposure constraints on weights.
2. Globally minimize portfolio variance subject to leverage, box, group, turnover, and/or factor exposure constraints.
3. Minimize portfolio variance subject to leverage, box, group, and/or factor exposure constraints given a desired portfolio return.
4. Maximize quadratic utility subject to leverage, box, group, target mean return, turnover, and/or factor exposure constraints and a risk aversion parameter. (The risk aversion parameter is passed into optimize.portfolio as an added argument to the portfolio object.)
5. Minimize ETL subject to leverage, box, group, position limit, target mean return, and/or factor exposure constraints and a target portfolio return.
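Consider the simplest case of item 2: a global minimum variance portfolio with only the full-investment constraint. It has the closed-form solution $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}'\Sigma^{-1}\mathbf{1})$, which can serve as a sanity check for solver output. The base-R sketch below uses a hypothetical covariance matrix. Without a box constraint, weights can turn negative; here the second asset gets a small short position. Once box, group or other constraints enter, a quadratic programming solver such as quadprog (via ROI) is needed instead:

```r
Sigma <- matrix(c(0.040, 0.006, 0.002,
                  0.006, 0.090, 0.010,
                  0.002, 0.010, 0.010),
                nrow = 3, byrow = TRUE)   # hypothetical covariance matrix
ones <- rep(1, nrow(Sigma))

x     <- solve(Sigma, ones)       # Sigma^{-1} 1 without forming the inverse
w_gmv <- x / sum(x)               # normalise: weights sum to one
w_gmv

# First-order condition: Sigma %*% w is the same for every asset
drop(Sigma %*% w_gmv)

# Variance is no larger than that of equal weights
w_eq <- rep(1 / 3, 3)
c(gmv   = drop(t(w_gmv) %*% Sigma %*% w_gmv),
  equal = drop(t(w_eq) %*% Sigma %*% w_eq))
```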
0.4.2 Mean-variance Portfolios
0.4.2.1 Introduction and Theoretics
0.4.2.1.1 The minimum risk mean-variance portfolio
0.4.2.1.2 Feasible Set and Efficient Frontier
0.4.2.1.3 Minimum variance portfolio
0.4.2.1.4 Capital market line and tangency portfolio
0.4.2.1.5 Box and Group Constrained mean-variance portfolios
0.4.2.1.6 Maximum return mean-variance portfolio
0.4.2.1.7 Covariance risk budget constraints
0.4.3 Mean-CVaR Portfolios
0.5 Managing Portfolios in the Real World
0.5.1 Rolling Portfolios
0.5.2 Backtesting
0.6 Further applications in Finance
0.6.1 Portfolio Sorts
0.6.2 Fama-MacBeth-Regressions
0.6.3 Risk Indices
0.7 References
Appendix
.0.1 Introduction to R
For everyone who wants to go deeper into these topics, I strongly recommend the eBook R for Data Science.[49]
[49] http://r4ds.had.co.nz/
.0.1.1 Getting started
Once you have started R, there are several ways to find help:

- (Almost) every command is equipped with a help page that can be accessed via ?... (if the package is loaded).
- If the command is part of a package that is not loaded, or you have no clue about the command itself, you can search the entire help (full text) using ??....
- Be aware that certain very high-level commands need to be put in quotation marks: ?'function'.
- Many packages come with a demo() (get a list of all available demos using demo(package=.packages(all.available = TRUE))).
- Many packages also come with a vignette(), a document explaining the purpose of the package and demonstrating its use with suitable examples (find all available vignettes with vignette(package=.packages(all.available = TRUE))). Vignettes are especially helpful if you want to learn how to do a certain task (e.g. conducting an event study: vignette("eventstudies")[50]).

Executing code in RStudio is simple. Either highlight the exact portion of the code that you want to execute and hit ctrl+enter, or place the cursor anywhere in a line and use the same shortcut to execute that line.[51]
[50] If this command shows an error message, you need to install the package first; see further down for how to do that.
[51] Under certain circumstances (either using pipes or within loops), RStudio will execute the entire loop/pipe structure. In this case you have to highlight the particular line that you want to execute.
.0.1.2 Working directory
Before we start to learn how to program, we have to set a working directory. First, create a folder “researchmethods” somewhere on citrix/your laptop (preferably never use directory names containing special characters or empty spaces). This will be your working directory. R looks there for code and files to load, and it saves there everything that is not designated by a full path (e.g. “D:/R/LAB/SS2018/…”). Note: in contrast to Windows paths, you have to use either “/” instead of “\”, or two backslashes “\\”. Now set the working directory using setwd() and check it with getwd().
setwd("D:/R/researchmethods")
getwd()
.0.1.3 Basic calculations
3+5; 3-5; 3*5; 3/5
# More complex including brackets
(5+3-1)/(5*10)
# is different to
5+3-1/5*10
# power of a variable
4*4*4
4^300
# root of a variable
sqrt(16)
16^(1/2)
16^0.5
# exponential and logarithms
exp(3)
log(exp(3))
exp(1)
# logarithm to base 2
log2(8)
2^log2(8)
# raise the number of digits shown
options(digits=6)
exp(1)
# Rounding
20/3
round(20/3,2)
floor(20/3)
ceiling(20/3)
.0.1.4 Mapping variables
Defining variables (objects) in R is done via the arrow operator <-, which also works in the other direction (->). Sometimes you will see someone use the equal sign =, but for several (more complex) reasons this is not advisable.
n <- 10
n
n <- 11
n
12 -> n
n
n <- n^2
n
In the last case, we overwrite a variable recursively. You might
want to do that for several reasons, but I advise you to rarely
do that. The reason is that - depending on how often you have
executed this part of the code already - n will have a different value.
In addition, if you are checking the output of some calculation, it
is not nice if one of the input variables always has a different value.
Next, we compare variables using logical operators. This is a very important part of programming.
# check if m==10
m <- 11
m==10 # is equal to
m==11
m!=11 # is not equal to
m>10  # is larger than
m<10  # is smaller than
m<=11 # is smaller than or equal to
m>=12 # is larger than or equal to
If one wants to find out which variables are already set use ls().
Delete (Remove) variables using rm() (you sometimes might want
to do that to save memory - in this case always follow the rm()
command with gc()).
ls() # list variables
rm(m) # remove m
ls() # list variables again (m is missing)
Of course, we often want to store not only numbers but also characters. In this case, enclose the value in quotation marks: name <- "test". To check whether a variable has a certain format, use the commands starting with is. (e.g. is.numeric()). To change the format of a variable, use the corresponding as. commands (e.g. as.character()).
name <- "test"
is.numeric(n)
is.numeric(name)
is.character(n)
is.character(name)
If you want to find out the format of a variable, you can use class(). Slightly different information is given by mode() and typeof().
class(n)
class(name)
mode(n)
mode(name)
typeof(n)
typeof(name)
# Let's change formats:
n1 <- n
is.character(n1)
n1 <- as.character(n)
is.character(n1)
as.numeric(name) # New thing: NA
Before we learn about NA, we have to define logical variables that
are very important when programming (e.g., as options in a function). Logical (boolean) variables will either assume TRUE or FALSE.
# last but not least we need boolean (logical) variables
n2 <- TRUE
is.numeric(n2)
class(n2)
is.logical(n2)
as.logical(2) # all values except 0 will be converted to TRUE
as.logical(0)
Now we can check whether a condition holds true. In this case, we check if n is equal to 10. The output (as you have seen before) is of type logical.
is.logical(n==10)
n3 <- n==10 # we can assign the logical output to a new variable
is.logical(n3)
Assignment: Create numeric variable x, set x equal to 5/3. What
happens if you divide by 0? By Inf? Set y<-NA. What could this
mean? Check if the variable is “na”. Is Inf numeric? Is NA numeric?
.0.1.5 Sequences, vectors and matrices
In this chapter, we are going to learn about higher-dimensional
objects (storing more information than just one number).
.0.1.5.1 Sequences
We define sequences of elements (numbers/characters/logicals) via
the concatenation operator c() and assign them to a variable. If
one of the elements of a sequence is of type character, the whole
sequence will be converted to character, else it will be of type
numeric (for other possibilities check the help ?vector). At the same time, it is a vector.
x <- c(1,3,5,6,7)
class(x)
is.vector(x)
is.numeric(x)
To create ordered sequences make use of the command
seq(from,to,by). Please note that often programmers are lazy
and just write seq(1,10,2) instead of seq(from=1,to=10,by=2).
However it makes code much harder to understand, can produce
unintended results, and if a function is changed (which happens as
R is always under construction) yield something very different to
what was intended. Therefore I strongly encourage you to always
specify the arguments of a function by name. To do this, I advise you to make heavy use of the Tab key. Tab completes commands, lists the commands starting with the same letters (for example, if you do not remember the exact spelling), helps you find out about the arguments and even gives information about their intended/possible values.
A nice way and shortcut for creating ordered/regular sequences
with distance (by=) one is given by the : operator: 1:10 is equal
to seq(from=1,to=10,by=1).
x1 <- seq(from=1,to=5,by=1)
x2 <- 1:5
One can operate with sequences in the same way as with numbers.
Be aware of the order of the commands and use brackets where
necessary!
1:10-1
1:(10-1)
1:10^2-2 *3
Assignment: 1. Create a series from -1 to 5 with distances 0.5?
Can you find another way to do it using the : operator and standard mathematical operations? 2. Create the same series, but this
time using the “length”-option 3. Create 20 ones in a row (hint:
find a function to do just that)
Of course, all logical operations are possible for vectors, too. In
this case, the output is a vector of logicals having the same size as
the input vector. You can check if a condition is true for any() or
all() parts of the vector.
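A short example of element-wise comparisons on a vector. It shows any() and all(), together with two related helpers, which() and sum(), that are not mentioned above:

```r
x <- c(1, 3, 5, 6, 7)
x > 4          # FALSE FALSE  TRUE  TRUE  TRUE
any(x > 4)     # TRUE: at least one element is larger than 4
all(x > 4)     # FALSE: not every element is larger than 4
which(x > 4)   # 3 4 5: positions where the condition holds
sum(x > 4)     # 3: TRUE counts as 1, FALSE as 0
```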
.0.1.5.2 Random Sequences
One of the most important tasks of any programming language
that is used for data analysis and research is the ability to generate random numbers. In R all the random number commands
start with an r..., e.g. random normal numbers rnorm(). To find
out more about a command, use the help, e.g. ?rnorm. All of these commands are part of the stats package, whose commands you can list via the package help: library(help=stats).
Notice that whenever you generate random numbers, they are different. If you prefer to work with the same set of random numbers (e.g. for testing purposes), you can fix the starting value of the random number generator by setting the seed to a chosen number, e.g. set.seed(123). Note that you have to execute set.seed() again every time you want to reproduce the same random numbers.
rand1 <- rnorm(n = 100) # 100 standard normally distributed random numbers
set.seed(134) # fix the starting value of the random number generator (then
rand1a <- rnorm(n = 100)
Assignment: 1. Create a random sequence of 20 N(0,2)-distributed variables and assign it to the variable rand2. 2. Create
a random sequence of 200 Uniform(-1,1) distributed variables and
save to rand3. 3. What other distributions can you find in the stats
package? 4. Use the functions mean and sd. Manipulate the random variables to have a different mean and standard deviation.
Do you remember the normalization process (z-score)?
As in the last assignment you can use all the functions you learned
about in statistics to calculate the mean(), the standard deviation
sd(), skewness() and kurtosis() (the latter two after installing and loading the moments package). To install a package, we use install.packages() (only once) and then load it with require().
#install.packages("moments") # only once, no need to reinstall every time
require(moments)
mean(rand1a)
sd(rand1a)
skewness(rand1a)
kurtosis(rand1a)
summary(rand1a)
.0.1.6 Vectors and matrices
We have created (random) sequences above and can determine
their properties, such as their length(). We also know how to
manipulate sequences through mathematical operations, such as
+-*/^. If you want to calculate a vector product, R provides the
%*% operator. In many cases (such as %*%), vectors behave like matrices, and R automatically decides whether they are treated as row or column vectors. However, to make this explicit, transform your vector into a matrix using as.matrix(). Now it has a dimension and is of class matrix. You can transpose the matrix using t(), calculate its
inverse using solve() and manipulate in any other way imaginable. To create matrices use matrix() and be careful about the
available options!
x <- c(2,4,5,8,10,12)
length(x)
dim(x)
x^2/2-1
x %*% x # R automatically multiplies row and column vector
is.vector(x)
y <- as.matrix(x)
is.matrix(y); is.matrix(x)
dim(y); dim(x)
t(y) %*% y
y %*% t(y)
mat <- matrix(data = x,nrow = 2,ncol = 3, byrow = TRUE)
dim(mat); ncol(mat); nrow(mat)
mat2 <- matrix(c(1,2,3,4),2,2) # set a new (square) matrix
mat2i <- solve(mat2)
mat2 %*% mat2i
mat2i %*% mat2
Assignment:
1. Create this matrix matrix(c(1,2,2,4),2,2) and try to calculate its inverse. What is the problem? Remember the determinant? Calculate it using det(). What do you learn?
2. Create a 4x3 matrix of ones and/or zeros. Try to matrix-multiply it with any of the vectors/matrices used before.
3. Try to add/subtract/multiply matrices, vectors and scalars.
A variety of special matrices is available, such as diagonal matrices using diag(). You can glue matrices together columnwise
(cbind()) or rowwise (rbind()).
diag(3)
diag(c(1,2,3,4))
mat4 <- matrix(0,3,3)
mat5 <- matrix(1,3,3)
cbind(mat4,mat5)
rbind(mat4,mat5)
.0.1.6.1 The indexing system
We can access the row/column elements of any object with at least
one dimension using [].
########################################################
### 8) The INDEXING System
# We can access the single values of a vector/matrix
x[2] # one-dim
mat[,2] # two-dim column
mat[2,] # two-dim row
i <- c(1,3)
mat[i] # vector-style indexing: elements 1 and 3, counted column by column
mat[1,2:3] # two-dim select second and third column, first row
mat[-1,] # two-dim suppress first row
mat[,-2] # two-dim suppress second column
Now we can use logical vectors/matrices to subset vectors/matrices. This is very useful for data mining.
mat>=5 # which elements are larger than or equal to 5?
mat[mat>=5] # What are these elements?
which(mat>=5, arr.ind = TRUE) # another way with more explicit information
We can do something even more useful and name the rows and columns of a matrix using colnames() and rownames().
colnames(mat) <- c("a","b","c")
rownames(mat) <- c("A","B")
mat["A",c("b","c")]
.0.1.7 Functions in R
.0.1.7.1 Useful Functions
Of course, there are thousands of functions available in R, especially through the use of packages. In the following you find a demo
of the most useful ones.
x <- c(1,2,4,-1,2,8) # example vector 1
x1 <- c(1,2,4,-1,2,8,NA,Inf) # example vector 2 (more complex)
sqrt(x) # square root of x
x^3 # x to the power of ...
sum(x) # sum of the elements of x
prod(x) # product of the elements of x
max(x) # maximum of the elements of x
min(x) # minimum of the elements of x
which.max(x) # returns the index of the greatest element of x
which.min(x) # returns the index of the smallest element of x
# statistical functions - use rand1 and rand1a created before
range(rand1)   # returns the minimum and maximum of the elements
mean(rand1)    # mean of the elements
median(rand1)  # median of the elements
var(rand1)     # variance of the elements
sd(rand1)      # standard deviation of the elements
cor(cbind(rand1, rand1a)) # correlation matrix of the columns
cov(rand1, rand1a)        # covariance between rand1 and rand1a
cor(rand1, rand1a)        # linear correlation between rand1 and rand1a
# more complex functions
round(x, 2) # rounds the elements of x to 2 decimals
rev(x) # reverses the elements of x
sort(x) # sorts the elements of x in increasing order
rank(x) # ranks of the elements of x
log(x) # computes natural logarithms of x
cumsum(x) # a vector which ith element is the sum from x[1] to x[i]
cumprod(x) # id. for the product
cummin(x) # id. for the minimum
cummax(x) # id. for the maximum
unique(x) # duplicate elements are suppressed
.0.1.7.2 More complex objects in R
Next to numbers, sequences/vectors and matrices R offers a variety of different and more complex objects that can stow more
complex information than just numbers and characters (e.g. functions, output text. etc). The most important ones are data.frames
(extended matrices) and lists. Check the examples below to see
how to create these objects and how to access specific elements.
df <- data.frame(col1=c(2,3,4), col2=sin(c(2,3,4)), col3=c("a","b", "c"))
li <- list(x=c(2,3,4), y=sin(c(2,3,4)), z=c("a","b", "c","d","e"), fun=mean)
# to grab elements from a list or dataframe use $ or [[]]
df$col3; li$x # get variables
df[,"col3"]; li[["x"]] # get specific elements that can also be numbered
df[,3]; li[[1]]
Assignment: 1. Get the second entry of element y of list li
.0.1.7.3 Create simple functions in R
To create our own functions in R we need to give them a name,
determine necessary input variables and whether these variables
should be pre-specified or not. I use a couple of examples to show
how to do this below.
?"function" # "function" is such a high-level object that it is interpreted
# 1. Let's create a function that squares an entry x and name it square
square <- function(x){x^2}
square(5)
square(c(1,2,3))
# 2. Let us define a function that returns a list of several different resu
stats <- function(v){
v.m <- mean(v) # create a variable that is only valid in the function
v.sd <- sd(v)
v.var <- var(v)
v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
return(v.output)
}
v <- rnorm(1000,mean=1,sd=5)
stats(v)
stats(v)$Mean
# 3. A function can have standard arguments.
### This time we also create a random vector within the function and use its
stats2 <- function(n,m=0,s=1){
v <- rnorm(n,mean=m,sd=s)
v.m <- mean(v) # create a variable that is only valid in the function
v.sd <- sd(v)
v.var <- var(v)
v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
return(v.output)
}
stats2(1000000)
stats2(1000,m=1)
stats2(1000,m=1,s=10)
stats2(m=1) # what happens if an obligatory argument is left out?
Assignment: 1. Create a function that creates two random samples with length n and m from the normal and the uniform distribution resp., given the mean and sd for the first and min and max
for the second distribution. The function shall then calculate the
covariance-matrix and the correlation-matrix which it gives back
in a named list.
.0.1.8 Plotting
Plotting in R is very easy. Check the examples below to get an idea of the plotting capabilities in R. A very good source for color names (that work in R) is http://en.wikipedia.org/wiki/Web_colors.
?plot
?colors # very good source for colors:
y1 <- rnorm(50,0,1)
plot(y1)
# set title, and x- and y-labels
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample")
# now make a line between elements, and color everything blue
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# if you want to save plots or open them in separate windows you can use x1
?Devices
# x11 (opens a separate window)
x11(8,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# pdf
pdf("plot1.pdf",6,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',l
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
dev.off()
# more extensive example
X11(6,6)
par(mfrow=c(2,1),cex=0.9,mar=c(3,3,1,3)+0.1)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',l
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
barplot(y1,col="blue") # making a barplot
# plotting a histogram
hist(y1) # there is a nicer version available once we get to time series an
# create a second sample
y2 <- rnorm(50)
# scatterplot
plot(y1,y2)
# boxplot
boxplot(y1,y2)
.0.1.9 Control Structures
Last but not least, we learn about control structures. These structures (for-loops, if/else checks etc.) are very useful if you want to turn a tedious manual task (e.g. in Excel) into something R does for you step by step (e.g. column by column). Again, see below for a variety of simple examples.
x <- sample(-15:15,10) # sample() randomly draws 10 numbers from -15:15 (without replacement)
# 1. We square every element of vector x in a loop
y <- NULL # 1.a) create an empty variable (NULL means it is truly empty)
is.null(y)
# 1.b) Use a simple for-loop (seq_along(x) gives 1, 2, ..., length(x)):
for (i in seq_along(x)){
y[i] <- x[i]^2
}
# 2. Now we use an if-condition to square only the negative values
y <- NULL
for (i in seq_along(x)){
y[i] <- x[i]
if(x[i]<0) {y[i] <- x[i]^2}
}
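# ASSIGNMENT: starting from 500, let's take the square root 100 times in a row
y <- rep(NA,101) # preallocate: the start value plus 100 results
y[1] <- 500
for (i in 1:100){
print(i) # show the progress of the loop
y[i+1] <- sqrt(y[i])
}
plot(y,type="l") # the values quickly converge towards 1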
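Loops are easy to read, but the tasks above also have vectorized base R equivalents that do the same work in one line. The sketch below (variable names are illustrative) checks that both approaches agree. For the assignment it uses the fact that taking the square root $k$ times turns 500 into $500^{1/2^k}$, which is why the sequence converges to 1.

```r
x <- sample(-15:15, 10)                  # 10 random integers, as above

# 1. squaring every element: loop vs. vectorized arithmetic
y_loop <- NULL
for (i in seq_along(x)) y_loop[i] <- x[i]^2
y_vec <- x^2                             # ^ works element by element

# 2. squaring only negative elements: loop vs. ifelse()
y_loop2 <- NULL
for (i in seq_along(x)) y_loop2[i] <- if (x[i] < 0) x[i]^2 else x[i]
y_vec2 <- ifelse(x < 0, x^2, x)          # x^2 where x < 0, otherwise x

# Assignment: after k square roots the start value 500 equals 500^(1/2^k)
y_sqrt <- rep(NA, 101)
y_sqrt[1] <- 500
for (i in 1:100) y_sqrt[i + 1] <- sqrt(y_sqrt[i])
y_closed <- 500^(1 / 2^(0:100))          # all 101 values at once

all.equal(y_loop, y_vec)
all.equal(y_loop2, y_vec2)
all.equal(y_sqrt, y_closed)
```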
Bibliography
Ardia, D., Mullen, K., Peterson, B., and Ulrich, J. (2016). DEoptim: Global Optimization by Differential Evolution. R package version 2.2-4.
Bacon, C. R. (2008). Practical Portfolio Performance Measurement and Attribution: plus CD-ROM. 2nd edition. Wiley, Chichester, England; Hoboken, NJ.
Bendtsen, C. (2012). pso: Particle Swarm Optimization. R package version 1.0.3.
Gubian, S., Xiang, Y., Suomela, B., Hoeng, J., and PMP SA (2018). GenSA: Generalized Simulated Annealing. R package version 1.1.7.
Peterson, B. G. and Carl, P. (2018). PortfolioAnalytics: Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios. R package version 1.1.0.
Würtz, D., Chalabi, Y., Chen, W., and Ellis, A. (2015). Portfolio Optimization with R/Rmetrics. Rmetrics.
Index
constraint
  diversification, lxii
  factor exposure, lxiv
  leverage exposure, lxv
  position limit, lxii
  target return, lxiii
  transaction cost, lxiv
  turnover, lxiii
constraints, lix–lxv
  active, lx
  box, lx
  dollar-neutral, lx
  full investment, lx
  group, lix, lxi
  long-only, lxi
  sum of weights, lix
date and time, x
  as.Date(), x
  business days, xvii
  holidays, xvii
  POSIXct, xii
  Sys.setlocale(), xi
  timeDate, xiii
  yearmon, xi, xxii
  yearqtr, xi, xxii
factor exposure, lv, lxiv
ggplot2, xxxi
objective, lxvi–lxix
  return, lxvii
  risk, lxvi
  risk budget, lxvii
PerformanceAnalytics, xxvii
quantmod, xxv
  TTR, xxvi
risk factors, lv
solver, lxix–lxxi
  GenSA, lxx
  pso, lxx
  random portfolios, lxix
    grid, lxx
    sample, lxx
    simplex, lxx
  Rglpk, lxix
  ROI, lxx
    quadprog, lxx
    Rglpk, lxx
tidyverse, xxviii–xxxiii
  ggplot2, xxxi
timeDate, xiii
  business days, xvii
  FinCenter, xiv
  holidays, xvii
  origin, xiv
timetk, lvii
TTR, xxvi
xts, xxi, lvii
  import/export, xxv
  join (inner/outer/left/right/full), xxiii
  merge, xxiii
  missing values, xxiv
  replace, xxiii
  subset, xxiii
  vignettes, xxi
zoo
  vignettes, xxi