Tidy Portfoliomanagement in R

Sebastian Stöckl

Contents

List of Tables
List of Figures
0.1 Introduction
    0.1.1 Introduction to Timeseries
        0.1.1.1 Date and Time
        0.1.1.2 eXtensible Timeseries
        0.1.1.3 Downloading timeseries and basic visualization with quantmod
    0.1.2 Introduction to the tidyverse
        0.1.2.1 Tibbles
        0.1.2.2 Summary statistics
        0.1.2.3 Plotting
0.2 Managing Data
    0.2.1 Getting Data
        0.2.1.1 Downloading from Online Datasources
        0.2.1.2 Manipulate Data
0.3 Exploring Data
    0.3.1 Plotting Data
        0.3.1.1 Time-series plots
        0.3.1.2 Box-plots
        0.3.1.3 Histogram and Density Plots
        0.3.1.4 Quantile Plots
    0.3.2 Analyzing Data
        0.3.2.1 Calculating Statistics
        0.3.2.2 Testing Data
        0.3.2.3 Exposure to Factors
0.4 Managing Portfolios
    0.4.1
        0.4.1.1 The portfolio.spec() Object
        0.4.1.2 Constraints
        0.4.1.3 Objectives
        0.4.1.4 Solvers
    0.4.2 Mean-Variance Portfolios
        0.4.2.1 Introduction and Theoretics
    0.4.3 Mean-CVaR Portfolios
0.5 Managing Portfolios in the Real World
    0.5.1 Rolling Portfolios
    0.5.2 Backtesting
0.6 Further applications in Finance
    0.6.1 Portfolio Sorts
    0.6.2 Fama-MacBeth-Regressions
    0.6.3 Risk Indices
0.7 References
Appendix
    .0.1 Introduction to R
        .0.1.1 Getting started
        .0.1.2 Working directory
        .0.1.3 Basic calculations
        .0.1.4 Mapping variables
        .0.1.5 Sequences, vectors and matrices
        .0.1.6 Vectors and matrices
        .0.1.7 Functions in R
        .0.1.8 Plotting
        .0.1.9 Control Structures
Bibliography

Preface

This book accompanies my lectures "Research Methods", "Quantitative Analysis", "Portfoliomanagement and Financial Analysis" and (to a smaller degree) "Empirical Methods in Finance". In past years I have been a heavy promoter of the Rmetrics¹ tools for my lectures and research. In the last year, however, the development of the project has stagnated due to the tragic death of its founder, Prof. Dr. Diethelm Würtz². It has therefore happened several times that code from past semesters and lectures stopped working, with no support for the project available anymore. Also, in the past year I have become a heavy user of the tidyverse³ and the financial packages that have been developed on top of it (e.g. tidyquant). I have therefore taken the chance to put together some material from my lectures and start writing this book. In structure it is kept similar to the excellent Rmetrics book on Portfolio Optimization with R/Rmetrics⁴ (Würtz et al., 2015), which I have been heavily using and recommending to my students in the past years!
¹ https://www.rmetrics.org/
² https://www.rmetrics.org/about
³ https://www.tidyverse.org/
⁴ https://www.rmetrics.org/ebooks-portfolio

Why read this book

Because it may help my students :-)

Structure of the book

Not yet fixed. But the book will start with an introduction to the most important tools for portfolio analysis: timeseries and the tidyverse. Afterwards, the possibilities for managing and exploring financial data will be developed. Then we perform portfolio optimization for mean-variance and mean-CVaR portfolios. This will be followed by a chapter on backtesting, before I show further applications in finance, such as predictions, portfolio sorts, Fama-MacBeth regressions etc.

Prerequisites

To start, install/load all necessary packages using the pacman package (the list will be expanded as the book grows).

```r
pacman::p_load(tidyverse, tidyquant, PortfolioAnalytics, quantmod,
               PerformanceAnalytics, tibbletime, timetk, ggthemes, timeDate,
               Quandl, alphavantager, readxl, DEoptim, pso, GenSA, Rglpk,
               ROI, ROI.plugin.glpk, ROI.plugin.quadprog)
```

Acknowledgments

I thank my family… I especially thank the developers of:

• the excellent fPortfolio book
• the tidyquant package and its vignettes
• the PerformanceAnalytics package and its vignettes
• the PortfolioAnalytics package (currently under heavy development) and its vignettes

Sebastian Stöckl
University of Liechtenstein, Vaduz, Liechtenstein

0.1 Introduction

0.1.1 Introduction to Timeseries

For an introduction to R see the Appendix @ref(ss_991IntrotoR).

Many of the datasets we will be working with have a (somehow regular) time dimension and are therefore often called timeseries. In R there is a variety of classes available to handle data, such as vector, matrix, data.frame or their more modern implementation, the tibble (according to the vignette⁵ of the xts package). Adding a time dimension creates a timeseries from these objects.
The most common and most flexible package in R that handles timeseries based on the first three formats is xts, which we will discuss in the following. Afterwards we will introduce the timetk package, which allows xts to interplay with tibbles, creating a very powerful framework for handling (even very large) time-based datasets, as we often encounter them in finance. The community is currently working hard on time-aware tibbles to bring together the powerful grouping features of the dplyr package (for tibbles) with the abilities of xts, which is to date the most powerful and most used timeseries class in finance due to its interplay with quantmod and other financial packages. See also this link⁶ for more information.

⁵ https://cran.r-project.org/web/packages/xts/vignettes/xts.pdf

All information regarding tibbles and the financial universe is summarized and kept up to date on the business-science.io website⁷. In the following, we will define a variety of date and time classes before introducing xts, tibble and tibbletime. Most of these packages come with excellent vignettes that I will reference for further reading, while I will only pick up the features necessary for portfolio management, the focus of this book.

0.1.1.1 Date and Time

There are some basic date and time functionalities in base R, but most of the time we will need additional packages to perform all necessary tasks. Available date (and time) classes are Date, POSIXct, (chron), yearmon, yearqtr and timeDate (from the Rmetrics bundle).

0.1.1.1.1 Basic Date and Time Classes

There are several date and time classes in R that can all be used as the time index of an xts object.
We start with the most basic, as.Date():

```r
d1 <- "2018-01-18"
str(d1)  # str() checks the structure of the R object
## chr "2018-01-18"
d2 <- as.Date(d1)
str(d2)
## Date[1:1], format: "2018-01-18"
```

In the second case, R automatically detects the format of the Date object, but if something more complex is involved you can specify the format yourself (for all available format definitions, see ?strptime):

```r
d3 <- "4/30-2018"
as.Date(d3, "%m/%d-%Y")  # as.Date(d3) will not work
## [1] "2018-04-30"
```

If you are working with monthly or quarterly data, yearmon and yearqtr will be your friends (both coming from the zoo package that serves as foundation for xts):

```r
as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))
## [1] "Jan 2018"
## [1] "Apr 2018"
as.yearqtr(d1); as.yearqtr(as.Date(d3, "%m/%d-%Y"))
## [1] "2018 Q1"
## [1] "2018 Q2"
```

Note that as.yearmon shows dates in terms of the current locale of your computer (e.g. Austrian German). You can find out about your locale with Sys.getlocale() and set a different locale with Sys.setlocale():

```r
Sys.setlocale("LC_TIME", "German_Austria")
## [1] "German_Austria.1252"
as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))
## [1] "Jän 2018"
## [1] "Apr 2018"
Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"
as.yearmon(d1); as.yearmon(as.Date(d3, "%m/%d-%Y"))
## [1] "Jan 2018"
## [1] "Apr 2018"
```

When your data also includes information on time, you will need either POSIXct (the base-R class behind all dates and times) or the timeDate package. The latter includes excellent abilities to work with financial data (see the next section). Note that talking about time also requires you to talk about timezones!
We start with several examples of the POSIXct class:

```r
strptime("2018-01-15 13:55:23.975", "%Y-%m-%d %H:%M:%OS")  # converts from character
## [1] "2018-01-15 13:55:23 CET"
as.POSIXct("2009-01-05 14:19:12", format="%Y-%m-%d %H:%M:%S", tz="UTC")
## [1] "2009-01-05 14:19:12 UTC"
```

We will mainly use the timeDate package, which provides many useful functions for financial timeseries. An introduction to timeDate by the Rmetrics group can be found at https://www.rmetrics.org/sites/default/files/201002-timeDateObjects.pdf.

```r
Dates <- c("1989-09-28","2001-01-15","2004-08-30","1990-02-09")
Times <- c( "23:12:55", "10:34:02", "08:30:00", "11:18:23")
DatesTimes <- paste(Dates, Times)
as.Date(DatesTimes)
## [1] "1989-09-28" "2001-01-15" "2004-08-30" "1990-02-09"
as.timeDate(DatesTimes)
## GMT
## [1] [1989-09-28 23:12:55] [2001-01-15 10:34:02] [2004-08-30 08:30:00]
## [4] [1990-02-09 11:18:23]
```

You see that the timeDate object comes along with timezone information (GMT) that is set according to your computer's locale.
timeDate allows you to specify the timezone of origin (zone) as well as the timezone to convert the data to (FinCenter):

```r
timeDate(DatesTimes, zone = "Tokyo", FinCenter = "Zurich")
## Zurich
## [1] [1989-09-28 15:12:55] [2001-01-15 02:34:02] [2004-08-30 01:30:00]
## [4] [1990-02-09 03:18:23]
timeDate(DatesTimes, zone = "Tokyo", FinCenter = "NewYork")
## NewYork
## [1] [1989-09-28 10:12:55] [2001-01-14 20:34:02] [2004-08-29 19:30:00]
## [4] [1990-02-08 21:18:23]
timeDate(DatesTimes, zone = "NewYork", FinCenter = "Tokyo")
## Tokyo
## [1] [1989-09-29 12:12:55] [2001-01-16 00:34:02] [2004-08-30 21:30:00]
## [4] [1990-02-10 01:18:23]
listFinCenter("Europe/Vi*")  # get a list of all financial centers available
## [1] "Europe/Vaduz"     "Europe/Vatican"   "Europe/Vienna"
## [4] "Europe/Vilnius"   "Europe/Volgograd"
```

Both the Date class and the timeDate package allow you to create time sequences (necessary if you want to manually create timeseries):

```r
dates1 <- seq(as.Date("2017-01-01"), length=12, by="month"); dates1  # or to=
##  [1] "2017-01-01" "2017-02-01" "2017-03-01" "2017-04-01" "2017-05-01"
##  [6] "2017-06-01" "2017-07-01" "2017-08-01" "2017-09-01" "2017-10-01"
## [11] "2017-11-01" "2017-12-01"
dates2 <- timeSequence(from = "2017-01-01", to = "2017-12-31", by = "month"); dates2
## GMT
##  [1] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01] [2017-05-01]
##  [6] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01] [2017-10-01]
## [11] [2017-11-01] [2017-12-01]
```

Now there are several very useful functions in the timeDate package to determine the first/last days of months/quarters/… (I let them speak for themselves):

```r
timeFirstDayInMonth(dates1 - 7)  # btw check the difference between "dates1-7"
## GMT
##  [1] [2016-12-01] [2017-01-01] [2017-02-01] [2017-03-01] [2017-04-01]
##  [6] [2017-05-01] [2017-06-01] [2017-07-01] [2017-08-01] [2017-09-01]
## [11] [2017-10-01] [2017-11-01]
timeFirstDayInQuarter(dates1)
## GMT
##  [1] [2017-01-01] [2017-01-01] [2017-01-01] [2017-04-01] [2017-04-01]
##  [6] [2017-04-01] [2017-07-01] [2017-07-01] [2017-07-01] [2017-10-01]
## [11] [2017-10-01] [2017-10-01]
timeLastDayInMonth(dates1)
## GMT
##  [1] [2017-01-31] [2017-02-28] [2017-03-31] [2017-04-30] [2017-05-31]
##  [6] [2017-06-30] [2017-07-31] [2017-08-31] [2017-09-30] [2017-10-31]
## [11] [2017-11-30] [2017-12-31]
timeLastDayInQuarter(dates1)
## GMT
##  [1] [2017-03-31] [2017-03-31] [2017-03-31] [2017-06-30] [2017-06-30]
##  [6] [2017-06-30] [2017-09-30] [2017-09-30] [2017-09-30] [2017-12-31]
## [11] [2017-12-31] [2017-12-31]
timeNthNdayInMonth("2018-01-01", nday = 5, nth = 3)  # useful for option expiry dates
## GMT
## [1] [2018-01-19]
timeNthNdayInMonth(dates1, nday = 5, nth = 3)
## GMT
##  [1] [2017-01-20] [2017-02-17] [2017-03-17] [2017-04-21] [2017-05-19]
##  [6] [2017-06-16] [2017-07-21] [2017-08-18] [2017-09-15] [2017-10-20]
## [11] [2017-11-17] [2017-12-15]
```

If one wants to create a more specific sequence of times, this can be done with timeCalendar using time 'atoms':

```r
timeCalendar(m = 1:4, d = c(28, 15, 30, 9), y = c(1989, 2001, 2004, 1990), FinCenter = "Europe/Zurich")
## Europe/Zurich
## [1] [1989-01-28 01:00:00] [2001-02-15 01:00:00] [2004-03-30 02:00:00]
## [4] [1990-04-09 02:00:00]
timeCalendar(d=1, m=3:4, y=2018, h = c(9, 14), min = c(15, 23), s=c(39,41), FinCenter = "Europe/Zurich")
## Europe/Zurich
## [1] [2018-03-01 10:15:39] [2018-04-01 16:23:41]
```

0.1.1.1.2 Week-days and Business-days

One of the most important functionalities that exists only in the timeDate package is the possibility to check for business days in almost any timezone.
The most important holiday calendars can be called via holidayXXX():

```r
holidayNYSE()
## NewYork
## [1] [2018-01-01] [2018-01-15] [2018-02-19] [2018-03-30] [2018-05-28]
## [6] [2018-07-04] [2018-09-03] [2018-11-22] [2018-12-25]
holiday(year = 2018, Holiday = c("GoodFriday","Easter","FRAllSaints"))
## GMT
## [1] [2018-03-30] [2018-04-01] [2018-11-01]
dateSeq <- timeSequence(Easter(year(Sys.time()), -14), Easter(year(Sys.time()), 14)); dateSeq
## GMT
##  [1] [2018-03-18] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22]
##  [6] [2018-03-23] [2018-03-24] [2018-03-25] [2018-03-26] [2018-03-27]
## [11] [2018-03-28] [2018-03-29] [2018-03-30] [2018-03-31] [2018-04-01]
## [16] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [21] [2018-04-07] [2018-04-08] [2018-04-09] [2018-04-10] [2018-04-11]
## [26] [2018-04-12] [2018-04-13] [2018-04-14] [2018-04-15]
dateSeq2 <- dateSeq[isWeekday(dateSeq)]; dateSeq2  # select only weekdays
## GMT
##  [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
##  [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-03-30]
## [11] [2018-04-02] [2018-04-03] [2018-04-04] [2018-04-05] [2018-04-06]
## [16] [2018-04-09] [2018-04-10] [2018-04-11] [2018-04-12] [2018-04-13]
dayOfWeek(dateSeq2)
## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26
##      "Mon"      "Tue"      "Wed"      "Thu"      "Fri"      "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-03-30 2018-04-02 2018-04-03
##      "Tue"      "Wed"      "Thu"      "Fri"      "Mon"      "Tue"
## 2018-04-04 2018-04-05 2018-04-06 2018-04-09 2018-04-10 2018-04-11
##      "Wed"      "Thu"      "Fri"      "Mon"      "Tue"      "Wed"
## 2018-04-12 2018-04-13
##      "Thu"      "Fri"
dateSeq3 <- dateSeq[isBizday(dateSeq, holidayZURICH(year(Sys.time())))]; dateSeq3  # business days in Zurich
## GMT
##  [1] [2018-03-19] [2018-03-20] [2018-03-21] [2018-03-22] [2018-03-23]
##  [6] [2018-03-26] [2018-03-27] [2018-03-28] [2018-03-29] [2018-04-03]
## [11] [2018-04-04] [2018-04-05] [2018-04-06] [2018-04-09] [2018-04-10]
## [16] [2018-04-11] [2018-04-12] [2018-04-13]
dayOfWeek(dateSeq3)
## 2018-03-19 2018-03-20 2018-03-21 2018-03-22 2018-03-23 2018-03-26
##      "Mon"      "Tue"      "Wed"      "Thu"      "Fri"      "Mon"
## 2018-03-27 2018-03-28 2018-03-29 2018-04-03 2018-04-04 2018-04-05
##      "Tue"      "Wed"      "Thu"      "Tue"      "Wed"      "Thu"
## 2018-04-06 2018-04-09 2018-04-10 2018-04-11 2018-04-12 2018-04-13
##      "Fri"      "Mon"      "Tue"      "Wed"      "Thu"      "Fri"
```

Now one of the strongest points in favor of the timeDate package becomes apparent when one puts together times and dates from different timezones. This can be a challenging task (imagine hourly stock prices from London, Tokyo and New York). Luckily the timeDate package handles it easily:

```r
ZH <- timeDate("2015-01-01 16:00:00", zone = "GMT", FinCenter = "Zurich")
NY <- timeDate("2015-01-01 18:00:00", zone = "GMT", FinCenter = "NewYork")
c(ZH, NY)
## Zurich
## [1] [2015-01-01 17:00:00] [2015-01-01 19:00:00]
c(NY, ZH)  # it always takes the financial center of the first entry
## NewYork
## [1] [2015-01-01 13:00:00] [2015-01-01 11:00:00]
```

0.1.1.1.3 Assignments

Create a daily time series for 2018:

1. Find the subset of first and last days per month/quarter (uniquely).
2. Take December 2017 and remove all weekends and holidays in Zurich (Tokyo).
3. Create a series of five dates & times in New York. Show them for New York, London and Belgrade.

0.1.1.2 eXtensible Timeseries

The xts format is based on the timeseries format zoo, but extends its power to be more compatible with other data classes. For example, if one converts dates from the timeDate class, xts is flexible enough to memorize the financial center the dates came from, and upon retransformation to that class it reassigns information that would have been lost upon transformation to a pure zoo object. As we (might) quite often want to transform our data to and from xts, this is a great feature and makes our lives a lot easier. xts also comes with a bundle of other features.
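To see this memorization at work, here is a small sketch; the object names and example dates are mine, not the book's, and the checks shown are one plausible way to inspect the stored index class:

```r
library(xts)
library(timeDate)

# a timeDate index with an explicit financial center (hypothetical example data)
td <- timeDate(c("2018-01-02 10:00:00", "2018-01-03 10:00:00"),
               zone = "GMT", FinCenter = "Zurich")
x <- xts(c(1.5, 2.5), order.by = td)

tclass(x)        # xts remembers which class the index came from
class(index(x))  # ... and hands the index back in that class
```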
For the reader who wants to dig deeper, we recommend the excellent zoo vignettes (vignette("zoo-quickref"), vignette("zoo"), vignette("zoo-faq"), vignette("zoo-design") and vignette("zoo-read")). Read up on xts in vignette("xts") and vignette("xts-faq").

To start, we create an xts object from a series of randomly created data points:

```r
data <- rnorm(5)  # 5 standard normally distributed random numbers
dates <- seq(as.Date("2017-05-01"), length=5, by="days")
xts1 <- xts(x=data, order.by=dates); xts1
##                   [,1]
## 2017-05-01  0.72838032
## 2017-05-02  0.47100977
## 2017-05-03 -0.04537768
## 2017-05-04  1.61845234
## 2017-05-05  0.07191067
coredata(xts1)  # access the data
##             [,1]
## [1,]  0.72838032
## [2,]  0.47100977
## [3,] -0.04537768
## [4,]  1.61845234
## [5,]  0.07191067
index(xts1)  # access the time index
## [1] "2017-05-01" "2017-05-02" "2017-05-03" "2017-05-04" "2017-05-05"
```

Here, the xts object was built from a vector and a series of Dates. We could also have used timeDate, yearmon or yearqtr together with a data.frame:

```r
s1 <- rnorm(5); s2 <- 1:5
data <- data.frame(s1, s2)
dates <- timeSequence("2017-01-01", by="months", length.out=5, zone = "GMT")
xts2 <- xts(x=data, order.by=dates); xts2
## Warning: timezone of object (GMT) is different than current timezone ().
##                    s1 s2
## 2017-01-01  0.7462329  1
## 2017-02-01 -0.1551448  2
## 2017-03-01 -0.9693310  3
## 2017-04-01  0.3428151  4
## 2017-05-01  0.4692079  5
dates2 <- as.yearmon(dates)
xts3 <- xts(x=data, order.by = dates2)
```

In the next step we evaluate the merging of two timeseries:

```r
set.seed(1)
xts3 <- xts(rnorm(6), timeSequence(from = "2017-01-01", to = "2017-06-01", by = "months"))
xts4 <- xts(rnorm(5), timeSequence(from = "2017-04-01", to = "2017-08-01", by = "months"))
colnames(xts3) <- "tsA"; colnames(xts4) <- "tsB"
merge(xts3, xts4)
```

Please be aware that joining timeseries in R sometimes requires you to choose a left/right/inner/outer join of the two objects:

```r
merge(xts3, xts4, join = "left")
merge(xts3, xts4, join = "right")
merge(xts3, xts4, join = "inner")
merge(xts3, xts4, join = "outer", fill = 0)
```

In the next step, we subset and replace parts of xts objects:

```r
xts5 <- xts(rnorm(24), timeSequence(from = "2016-01-01", to = "2017-12-01", by = "months"))
xts5["2017-01-01"]                          # a single date
xts5["2017-05-01/2017-08-12"]               # a date range
xts5[c("2017-01-01","2017-05-01")] <- NA    # replace values
xts5["2016"] <- 99
xts5["2016-05-01/"]                         # from a date onwards
first(xts5)
last(xts5)
first(xts5, "3 months")
xts6 <- last(xts5, "1 year")
```

Now let us handle the missing values we introduced. One possibility is just to omit them using na.omit().
Other possibilities would be to carry forward the last observation (na.locf()) or to interpolate linearly (na.approx()):

```r
na.omit(xts6)
na.locf(xts6)
na.locf(xts6, fromLast = TRUE, na.rm = TRUE)
na.approx(xts6, na.rm = FALSE)
```

Finally, standard calculations can be done on xts objects, AND there are some pretty helper functions to make life easier:

```r
periodicity(xts5)
nmonths(xts5); nquarters(xts5); nyears(xts5)
to.yearly(xts5)
to.quarterly(xts6)
round(xts6^2, 2)
xts6[which(is.na(xts6))] <- rnorm(2)
# for aggregating timeseries
ep1 <- endpoints(xts6, on="months", k = 2)
ep2 <- endpoints(xts6, on="months", k = 3)
period.sum(xts6, INDEX = ep2)
period.apply(xts6, INDEX = ep1, FUN=mean)  # 2-month means
period.apply(xts6, INDEX = ep2, FUN=mean)  # 3-month means
# lead, lag and diff operations
cbind(xts6, lag(xts6, k=-1), lag(xts6, k=1), diff(xts6))
```

Finally, I will show some applications that go beyond xts, for example the use of lapply to operate on a list:

```r
# splitting a timeseries (the result is a list)
xts6_yearly <- split(xts5, f="years")
lapply(xts6_yearly, FUN=mean, na.rm=TRUE)
# using elaborate functions from the zoo package
rollapply(as.zoo(xts6), width=3, FUN=sd)  # rolling standard deviation
```

Last but not least, we plot xts data, save it to a (csv) file and then open it again:

```r
tmp <- tempfile()
write.zoo(xts2, sep=",", file = tmp)
xts8 <- as.xts(read.zoo(tmp, sep=",", FUN=as.yearmon))
plot(xts8)
```

0.1.1.3 Downloading timeseries and basic visualization with quantmod

Many downloading and plotting functions are (still) available in quantmod. We first require the package, then download data for Google, Apple and the S&P500 from Yahoo Finance. Each of these "Symbols" is downloaded into its own environment. For plotting, a large variety of technical indicators is available; for an overview see here⁸. quantmod is developed by Jeffrey Ryan and Joshua Ulrich⁹ and has a homepage¹⁰.
The homepage includes an Introduction¹¹, describes how data can be passed between xts and quantmod¹², and has examples of financial charting with quantmod and TTR¹³. More documents will be developed within 2018.

⁸ https://www.r-bloggers.com/a-guide-on-r-quantmod-package-how-to-get-started/

```r
require(quantmod)
# the easiest way of getting data is from Yahoo Finance where you know the ticker symbol
getSymbols(Symbols = "AAPL", from="2010-01-01", to="2018-03-01", periodicity="monthly")
head(AAPL)
is.xts(AAPL)
plot(AAPL[, "AAPL.Adjusted"], main = "AAPL")
chartSeries(AAPL, TA=c(addVo(), addBBands(), addADX()))  # plot and add technical indicators
getSymbols(Symbols = c("GOOG","^GSPC"), from="2000-01-01", to="2018-03-01", periodicity="monthly")
getSymbols('DTB3', src='FRED')  # FRED does not recognize from and to
```

Now we create an xts object from all relevant parts of the data:

```r
stocks <- cbind("Apple"=AAPL[,"AAPL.Adjusted"], "Google"=GOOG[,"GOOG.Adjusted"])
rf.daily <- DTB3["2010-01-01/2018-03-01"]
rf.monthly <- to.monthly(rf.daily)[,"rf.daily.Open"]
rf <- xts(coredata(rf.monthly), order.by = as.Date(index(rf.monthly)))
```

One possibility (that I adopted from here: https://www.quantinsti.com/blog/an-example-of-a-trading-strategy-coded-in-r/) is to use the technical indicators provided by quantmod to devise a technical trading strategy. We make use of a fast and a slow moving average (function MACD in the TTR package that belongs to quantmod). Whenever the fast moving average crosses the slow moving one from below, we invest (there is a short-term trend to exploit), and we drop out of the investment once the red (fast) line falls below the grey (slow) line. To evaluate the trading strategy we also need to calculate returns for the S&P500 index using ROC().
```r
chartSeries(GSPC, TA=c(addMACD(fast=3, slow=12, signal=6, type=SMA)))
macd <- MACD(GSPC[,"GSPC.Adjusted"], nFast=3, nSlow=12, nSig=6, maType=SMA, percent=FALSE)
buy_sell_signal <- Lag(ifelse(macd$macd < macd$signal, -1, 1))
buy_sell_returns <- (ROC(GSPC[,"GSPC.Adjusted"])*buy_sell_signal)["2001-06-01/"]
portfolio <- exp(cumsum(buy_sell_returns))  # ROC gives log returns, so exp(cumsum()) yields the portfolio value
plot(portfolio)
```

For the evaluation of trading strategies/portfolios and other financial timeseries, almost every tool is available through the package PerformanceAnalytics. In this case charts.PerformanceSummary() plots cumulative returns (similar to above), monthly returns and the maximum drawdown (the maximum loss relative to the previous peak, see here¹⁴). PerformanceAnalytics is a large package with an uncountable variety of tools. There are vignettes on the estimation of higher-order (co)moments (vignette("EstimationComoments")), on performance attribution measures according to Bacon (2008) (vignette("PA-Bacon")), on charting (vignette("PA-charts")) and more, all to be found on the PerformanceAnalytics CRAN page¹⁵.

```r
require(PerformanceAnalytics)
rets <- cbind(buy_sell_returns, ROC(GSPC[,"GSPC.Adjusted"]))
colnames(rets) <- c("investment","benchmark")
charts.PerformanceSummary(rets, colorset=rich6equal)
chart.Histogram(rets, main = "Risk Measures", methods = c("add.density", "add.normal"))
```

¹⁴ https://de.wikipedia.org/wiki/Maximum_Drawdown
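As a further taste of the breadth of PerformanceAnalytics beyond the two charting functions above, the following sketch uses the managers dataset that ships with the package; the particular measures chosen here are my picks for illustration, not the book's:

```r
require(PerformanceAnalytics)
data(managers)  # monthly returns of several fund managers plus benchmark series

table.AnnualizedReturns(managers[, 1:3])  # annualized return, volatility and Sharpe ratio
maxDrawdown(managers[, 1])                # the maximum drawdown discussed above
SharpeRatio(managers[, 1], Rf = 0)        # Sharpe ratios based on std dev, VaR and ES
```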
The core of the tidyverse currently contains the following packages:

• ggplot2 for creating powerful graphs¹⁹ (see vignette("ggplot2-specs"))
• dplyr for data manipulation²⁰ (see vignette("dplyr"))
• tidyr for tidying data²¹
• readr for importing datasets²² (see vignette("readr"))
• purrr for programming²³
• tibble for modern data.frames²⁴ (see vignette("tibble"))

and many more²⁵.

```r
require(tidyverse)  # install first if not yet there, and update regularly
require(tidyquant)  # wraps quantmod and related financial packages into the tidyverse
```

Most of the following is adapted from "Introduction to Statistical Learning with Applications in R" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani at http://www.science.smith.edu/~jcrouser/SDS293/labs/. We begin by loading the Auto data set, which is part of the ISLR package.

¹⁶ https://www.tidyverse.org/

```r
require(ISLR)
data(Auto)
```

Nothing happens when you run this, but the data is now available in your environment. (In RStudio you would see the name of the data in your Environment tab.) To view the data, we can either print the entire dataset by typing its name, or we can "slice" off some of the data to look at just a subset by piping the data into the slice function using the %>% operator. The piping operator is one of the most useful tools of the tidyverse: you can pipe command into command into command without saving and naming each intermediate step. The first step is to transform this data.frame into a tibble (a similar concept, but better²⁶). A tibble has observations in rows and variables in columns. Those variables can have many different formats:

```r
Auto %>% slice(1:10)
tbs1 <- tibble(
  Date = seq(as.Date("2017-01-01"), length=12, by="months"),
  returns = rnorm(12),
  letters = sample(letters, 12, replace = TRUE)
)
```

As you can see, all three columns of tbs1 have different formats. One can get the different variables by name and position.
If you want to use the pipe operator for this, you need to use the special placeholder `.`:

```r
tbs1$returns
tbs1[[2]]
tbs1 %>% .[[2]]
```

Before we go on to analyze a large tibble such as Auto, we quickly talk about reading and saving files with tools from the tidyverse. We save the file as csv using write_csv and read it back using read_csv. Because the columns of the read file are not in exactly the same format as before, we use mutate to transform them:

```r
Auto <- as.tibble(Auto)  # make a tibble from Auto
tmp <- tempfile()
write_csv(Auto, path = tmp)  # write
Auto2 <- read_csv(tmp)       # read back
Auto2 <- Auto2 %>%
  mutate(cylinders=as.double(cylinders), horsepower=as.double(horsepower), year=as.double(year))
all.equal(Auto, Auto2)  # only the factor levels differ
```

²⁶ http://r4ds.had.co.nz/tibbles.html

Notice that the data looks just the same as when we loaded it from the package. Now that we have the data, we can begin to learn things about it:

```r
dim(Auto)
str(Auto)
names(Auto)
```

The dim() function tells us that the data has 392 observations and nine variables. The original data had some empty rows, but R knew to ignore them when reading the data. The str() function tells us that most of the variables are numeric or integer, although the name variable is a character vector. names() lets us check the variable names.

0.1.2.2 Summary statistics

Often we want to know some basic things about the variables in our data. Calling summary() on an entire dataset gives an idea of the distributions of your variables: it produces a numerical summary of each variable in a particular data set.

```r
summary(Auto)
```

The summary suggests that origin might be better thought of as a factor. It only seems to have three possible values: 1, 2 and 3. If we read the documentation about the data (using ?Auto) we learn that these numbers correspond to where the car is from: 1. American, 2. European, 3. Japanese. So let's mutate() that variable into a factor (categorical) variable.
```r
Auto <- Auto %>% mutate(origin = factor(origin))
summary(Auto)
```

0.1.2.3 Plotting

We can use the ggplot2 package to produce simple graphics. ggplot2 has a particular syntax, which looks like this:

```r
ggplot(Auto) + geom_point(aes(x=cylinders, y=mpg))
```

The basic idea is that you need to initialize a plot with ggplot() and then add "geoms" (short for geometric objects) to the plot. The ggplot2 package is based on the Grammar of Graphics²⁷, a famous book on data visualization theory. It is a way to map attributes in your data (like variables) to "aesthetics" on the plot; the parameter aes() is short for aesthetic. For more about the ggplot2 syntax, view the help by typing ?ggplot or ?geom_point. There are also great online resources for ggplot2, like the R Graphics Cookbook²⁸.

The cylinders variable is stored as a numeric vector, so R has treated it as quantitative. However, since there is only a small number of possible values for cylinders, one may prefer to treat it as a qualitative variable. We can turn it into a factor, again using a mutate() call:

```r
Auto <- Auto %>% mutate(cylinders = factor(cylinders))
```

To view the relationship between a categorical and a numeric variable, we might want to produce boxplots. As usual, a number of options can be specified in order to customize the plots:

```r
ggplot(Auto) + geom_boxplot(aes(x=cylinders, y=mpg)) + xlab("Cylinders") + ylab("MPG")
```

The geom geom_histogram() can be used to plot a histogram.

²⁷ https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448
²⁸ http://www.cookbook-r.com/Graphs/
The GGally package has an extension of the scatterplot matrix that can do just that. We make use of the select operator to only select the two variables mpg and cylinders and pipe it into the ggpairs() function Auto %>% select(mpg, cylinders) %>% GGally::ggpairs() Because there are not many cars with 3 and 5 cylinders we use filter to only select those cars with 4, 6 and 8 cylinders. Auto %>% select(mpg, cylinders) %>% filter(cylinders %in% c(4,6,8)) %>% GGal Sometimes, we might want to save a plot for use outside of R. To do this, we can use the ggsave() function. ggsave("histogram.png",ggplot(Auto) + geom_histogram(aes(x=mpg), binwidth=5) TO DO: * Tidyquant: Document more technical features. * For extensive manipulations a la timeseries, there is an extension of the tibble objects: time aware tibbles, that allow for many of the xts functionality without the necessary conversion tibbletime29 . 29 https://github.com/business-science/tibbletime xxxvi 0.2 List of Figures Managing Data In this chapter we will learn how to download/import data from various sources. Most importantly we will use the quantmod library through tidyquant to download financial data from a variety of sources. We will also lear how to import ‘.xlsx’ (Excel) files. 0.2.1 Getting Data 0.2.1.1 Downloading from Online Datasources The tidyquant package comes with a variiety of readily compiled datasets/datasources. 
For whole collections of data, the following commands are available:

tq_exchange_options() # find all exchanges available

## [1] "AMEX"   "NASDAQ" "NYSE"

tq_index_options() # find all indices available

## [1] "RUSSELL1000" "RUSSELL2000" "RUSSELL3000" "DOW"         "DOWGLOBAL"
## [6] "SP400"       "SP500"       "SP600"       "SP1000"

tq_get_options() # find all data sources available

##  [1] "stock.prices"        "stock.prices.google" "stock.prices.japan"
##  [4] "financials"          "key.ratios"          "dividends"
##  [7] "splits"              "economic.data"       "exchange.rates"
## [10] "metal.prices"        "quandl"              "quandl.datatable"
## [13] "alphavantager"       "rblpapi"

The commands tq_exchange() and tq_index() will now get you all symbols and some additional information on the stocks listed at that exchange or contained in that index.30

glimpse(sp500)

## Observations: 504
## Variables: 5
## $ symbol      <chr> "AAPL", "MSFT", "AMZN", "BRK.B", "FB", "JPM", "JNJ...
## $ company     <chr> "Apple Inc.", "Microsoft Corporation", "Amazon.com...
## $ weight      <dbl> 0.044387857, 0.035053855, 0.032730459, 0.016868330...
## $ sector      <chr> "Information Technology", "Information Technology"...
## $ shares_held <dbl> 53939268, 84297440, 4418447, 21117048, 26316160, 3...

glimpse(nyse)

## Observations: 3,139
## Variables: 7
## $ symbol          <chr> "DDD", "MMM", "WBAI", "WUBA", "EGHT", "AHC", "...
## $ company         <chr> "3D Systems Corporation", "3M Company", "500.c...
## $ last.sale.price <dbl> 18.4800, 206.7100, 11.6400, 68.1800, 23.2000, ...
## $ market.cap      <chr> "$2.11B", "$121.26B", "$491.85M", "$10.06B", "...
## $ ipo.year        <dbl> NA, NA, 2013, 2013, NA, NA, 2014, 2014, NA, NA...
## $ sector          <chr> "Technology", "Health Care", "Consumer Service...
## $ industry        <chr> "Computer Software: Prepackaged Software", "Me...

glimpse(nasdaq)

30 Note that tq_index() unfortunately makes use of the package XLConnect that requires Java to be installed on your system.
## Observations: 3,405
## Variables: 7
## $ symbol          <chr> "YI", "PIH", "PIHPP", "TURN", "FLWS", "FCCY", ...
## $ company         <chr> "111, Inc.", "1347 Property Insurance Holdings...
## $ last.sale.price <dbl> 13.800, 6.350, 25.450, 2.180, 11.550, 20.150, ...
## $ market.cap      <chr> NA, "$38M", NA, "$67.85M", "$746.18M", "$168.8...
## $ ipo.year        <dbl> 2018, 2014, NA, NA, 1999, NA, NA, 2011, 2014, ...
## $ sector          <chr> NA, "Finance", "Finance", "Finance", "Consumer...
## $ industry        <chr> NA, "Property-Casualty Insurers", "Property-Ca...

The dataset we will be using consists of the ten largest stocks within the S&P500 that had an IPO before January 2000. Therefore we need to merge both datasets using inner_join(), because we only want to keep symbols from the S&P500 that are also traded on NYSE or NASDAQ:

stocks.selection <- sp500 %>%
  inner_join(rbind(nyse,nasdaq) %>%
               select(symbol,last.sale.price,market.cap,ipo.year),
             by="symbol") %>%
  filter(ipo.year<2000&!is.na(market.cap)) %>% # filter years with ipo<2000
  arrange(desc(weight)) %>%                    # sort in descending order
  slice(1:10)                                  # keep the ten largest

The ten largest stocks in the S&P500 with a history longer than January 2000:

symbol  company                     weight  sector                  shares_held  last.sale.price  market.cap  ipo.year
AAPL    Apple Inc.                  0.044   Information Technology  53939268     221.07           $1067.75B   1980
MSFT    Microsoft Corporation       0.035   Information Technology  84297440     111.71           $856.62B    1986
AMZN    Amazon.com Inc.             0.033   Consumer Discretionary  4418447      1990.00          $970.6B     1997
CSCO    Cisco Systems Inc.          0.009   Information Technology  51606584     46.89            $214.35B    1990
NVDA    NVIDIA Corporation          0.007   Information Technology  6659463      268.20           $163.07B    1999
ORCL    Oracle Corporation          0.006   Information Technology  32699620     49.34            $196.43B    1986
AMGN    Amgen Inc.                  0.005   Health Care             7306144      199.50           $129.13B    1983
ADBE    Adobe Systems Incorporated  0.005   Information Technology  5402625      267.79           $131.13B    1986
QCOM    QUALCOMM Incorporated       0.004   Information Technology  15438597     71.75            $105.41B    1991
GILD    Gilead Sciences Inc.        0.004   Health Care             14310276     73.97            $95.89B     1992

In a next step, we will download stock prices from Yahoo Finance. Data from that source usually comes in the OHLC format (open, high, low, close) with additional information (volume, adjusted). We will additionally download data for the S&P500 index itself. Note that we get daily prices:

stocks.prices <- stocks.selection$symbol %>%
  tq_get(get = "stock.prices",from = "2000-01-01",to = "2017-12-31") %>%
  group_by(symbol)
index.prices <- "^GSPC" %>%
  tq_get(get = "stock.prices",from = "2000-01-01",to = "2017-12-31")

stocks.prices %>% slice(1:2) # show the first two entries of each group

## # A tibble: 20 x 8
## # Groups:   symbol [10]
##    symbol date        open  high   low close    volume adjusted
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1 AAPL   2000-01-03  3.75  4.02  3.63  4.00 133949200     3.54
##  2 AAPL   2000-01-04  3.87  3.95  3.61  3.66 128094400     3.24
##  3 ADBE   2000-01-03 16.8  16.9  16.1  16.4    7384400    16.1
##  4 ADBE   2000-01-04 15.8  16.5  15.0  15.0    7813200    14.8
##  5 AMGN   2000-01-03 70    70    62.9  62.9   22914900    53.5
##  6 AMGN   2000-01-04 62    64.1  57.7  58.1   15052600    49.4
##  7 AMZN   2000-01-03 81.5  89.6  79.0  89.4   16117600    89.4
##  8 AMZN   2000-01-04 85.4  91.5  81.8  81.9   17487400    81.9
##  9 CSCO   2000-01-03 55.0  55.1  51.8  54.0   53076000    43.6
## 10 CSCO   2000-01-04 52.8  53.5  50.9  51     50805600    41.2
## 11 GILD   2000-01-03  1.79  1.80  1.72  1.76  54070400     1.61
## 12 GILD   2000-01-04  1.70  1.72  1.66  1.68  38960000     1.54
## 13 MSFT   2000-01-03 58.7  59.3  56    58.3   53228400    42.5
## 14 MSFT   2000-01-04 56.8  58.6  56.1  56.3   54119000    41.0
## 15 NVDA   2000-01-03  3.94  3.97  3.68  3.90   7522800     3.61
## 16 NVDA   2000-01-04  3.83  3.84  3.60  3.80   7512000     3.51
## 17 ORCL   2000-01-03 31.2  31.3  27.9  29.5   98114800    26.4
## 18 ORCL   2000-01-04 28.9  29.7  26.2  26.9  116824800    24.0
## 19 QCOM   2000-01-03 99.6 100    87    89.7   91334000    65.7
## 20 QCOM   2000-01-04 86.3  87.7  80    81.0   63567400    59.4

Dividends and stock splits can also be downloaded:

stocks.dividends <- stocks.selection$symbol %>%
  tq_get(get = "dividends",from = "2000-01-01",to = "2017-12-31") %>%
  group_by(symbol)
stocks.splits <- stocks.selection$symbol %>%
  tq_get(get = "splits",from = "2000-01-01",to = "2017-12-31") %>%
  group_by(symbol)

We can additionally download financials for the different stocks. Therein we have key ratios (financials, profitability, growth, cash flow, financial health, efficiency ratios and valuation ratios). These ratios are from Morningstar31 and come in a nested form that we will have to 'dig out' using unnest().

stocks.ratios <- stocks.selection$symbol %>%
  tq_get(get = "key.ratios",from = "2000-01-01",to = "2017-12-31") %>%
  group_by(symbol)

## # A tibble: 42 x 3
## # Groups:   symbol [6]
##    symbol section           data
##    <chr>  <chr>             <list>
##  1 AAPL   Financials        <tibble [150 x 5]>
##  2 AAPL   Profitability     <tibble [170 x 5]>
##  3 AAPL   Growth            <tibble [160 x 5]>
##  4 AAPL   Cash Flow         <tibble [50 x 5]>
##  5 AAPL   Financial Health  <tibble [240 x 5]>
##  6 AAPL   Efficiency Ratios <tibble [80 x 5]>
##  7 AAPL   Valuation Ratios  <tibble [40 x 5]>
##  8 MSFT   Financials        <tibble [150 x 5]>
##  9 MSFT   Profitability     <tibble [170 x 5]>
## 10 MSFT   Growth            <tibble [160 x 5]>
## # ... with 32 more rows

We find that financial ratios are only available for a subset of the ten stocks. We first filter for the 'Growth' information, then unnest() the nested tibbles and filter again for 'EPS %' and the 'Year over Year' information. Then we use ggplot() to plot the timeseries of Earnings per Share for the different companies.

31 http://www.morningstar.com/
stocks.ratios %>% filter(section=="Growth") %>% unnest() %>%
  filter(sub.section=="EPS %",category=="Year over Year") %>%
  ggplot(aes(x=date,y=value,color=symbol)) +
  geom_line(lwd=1.1) +
  labs(title="Year over Year EPS in %", x="",y="") +
  theme_tq() + scale_color_tq()

(Figure: Year over Year EPS in %, 2010 to 2018, for the symbols AAPL, AMGN, AMZN, CSCO, MSFT and NVDA.)

A variety of other (professional) data services that are integrated into tidyquant are available; I will list them in the following subsections.

0.2.1.1.1 Quandl

Quandl32 provides access to many different financial and economic databases. To use it, one should acquire an API key by creating a Quandl account.33 Searches can be done using quandl_search() (I personally would use their homepage to do that). Data can be downloaded as before with tq_get(); be aware that you can download either single timeseries or entire datatables with the arguments get = "quandl" and get = "quandl.datatable". Note that in the example for 'Apple' below, the adjusted close prices are different from the ones of Yahoo. An example for a datatable is Zacks Fundamentals Collection B34.

quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
  tq_get(get          = "quandl",
         from         = "2000-01-01",
         to           = "2017-12-31",
         column_index = 11,      # numeric column number (e.g. 1)
         collapse     = "daily", # can be "none", "daily", "weekly", "monthly", "quarterly", "annual"
         transform    = "none")  # for summarizing data: "none", "diff", "rdiff", "cumul", "normalize"

## Oil India Limited
## Code: NSE/OIL
## Desc: Historical prices for Oil India Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Oil Country Tubular Limited
## Code: NSE/OILCOUNTUB
## Desc: Historical prices for Oil Country Tubular Limited<br><br>National St
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur
##
## Essar Oil Limited
## Code: NSE/ESSAROIL
## Desc: Historical prices for Essar Oil Limited<br><br>National Stock Exchan
## Freq: daily
## Cols: Date | Open | High | Low | Last | Close | Total Trade Quantity | Tur

## # A tibble: 3 x 13
##      id dataset_code database_code name  description refreshed_at
## * <int> <chr>        <chr>         <chr> <chr>       <chr>
## 1  6668 OIL          NSE           Oil ~ Historical~ 2018-09-13T~
## 2  6669 OILCOUNTUB   NSE           Oil ~ Historical~ 2018-09-13T~
## 3  6041 ESSAROIL     NSE           Essa~ Historical~ 2016-02-09T~
## # ... with 7 more variables: newest_available_date <chr>,
## #   oldest_available_date <chr>, column_names <list>, frequency <chr>,
## #   type <chr>, premium <lgl>, database_id <int>

## # A tibble: 5 x 12
##   date        open  high   low close volume ex.dividend split.ratio
##   <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>       <dbl>       <dbl>
## 1 2000-01-03  105.  112.  102.  112. 4.78e6           0           1
## 2 2000-01-04  108.  111.  101.  102. 4.57e6           0           1
## 3 2000-01-05  104.  111.  103   104  6.95e6           0           1
## 4 2000-01-06  106.  107    95    95  6.86e6           0           1
## 5 2000-01-07   96.5 101    95.5  99.5 4.11e6          0           1
## # ... with 4 more variables: adj.open <dbl>, adj.high <dbl>,
## #   adj.low <dbl>, adj.close <dbl>

32 https://www.quandl.com/
33 If you do not use an API key, you are limited to 50 calls per day.
34 https://www.quandl.com/databases/ZFB/documentation/about

0.2.1.1.2 Alpha Vantage

Alpha Vantage35 provides access to real-time and historical financial data. Here we also need to get and set an API key (for free).
35 https://www.alphavantage.co

av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
  tq_get(get = "alphavantager",
         av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
  tq_get(get = "alphavantager",
         av_fun="TIME_SERIES_INTRADAY", # for intraday data
         interval="5min")               # 5 minute intervals

## # A tibble: 5 x 9
##   timestamp   open  high   low close adjusted_close volume dividend_amount
##   <date>     <dbl> <dbl> <dbl> <dbl>          <dbl>  <int>           <dbl>
## 1 2018-04-24  166.  166.  161.  163.           162. 3.37e7               0
## 2 2018-04-25  163.  165.  162.  164.           162. 2.84e7               0
## 3 2018-04-26  164.  166.  163.  164.           163. 2.80e7               0
## 4 2018-04-27  164   164.  161.  162.           161. 3.57e7               0
## 5 2018-04-30  162.  167.  162.  165.           164. 4.24e7               0
## # ... with 1 more variable: split_coefficient <dbl>

## # A tibble: 5 x 6
##   timestamp            open  high   low close volume
##   <dttm>              <dbl> <dbl> <dbl> <dbl>  <int>
## 1 2018-09-11 14:25:00  224.  224.  224.  224. 261968
## 2 2018-09-11 14:30:00  224.  224.  224.  224. 334069
## 3 2018-09-11 14:35:00  224.  224.  224.  224. 285138
## 4 2018-09-11 14:40:00  224.  224.  224.  224. 229329
## 5 2018-09-11 14:45:00  224.  224.  224.  224. 193316

0.2.1.1.3 FRED (Economic Data)

A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database36. Below we download the 1-year (TB1YR) and 3-month (TB3MS) Treasury bill rates for the US. Note that these are annualized rates!

ir <- tq_get(c("TB1YR","TB3MS"), get = "economic.data") %>%
  group_by(symbol)

## # A tibble: 6 x 3
## # Groups:   symbol [2]
##   symbol date       price
##   <chr>  <date>     <dbl>
## 1 TB1YR  2018-08-01  2.36
## 2 TB1YR  2018-07-01  2.31
## 3 TB1YR  2018-06-01  2.25
## 4 TB3MS  2018-08-01  2.03
## 5 TB3MS  2018-07-01  1.96
## 6 TB3MS  2018-06-01  1.9

0.2.1.1.4 OANDA (Exchange Rates and Metal Prices)

Oanda37 provides a large quantity of exchange rates (currently only for the last 180 days). Enter them as currency pairs using "/" notation (e.g. "EUR/USD"), and set get = "exchange.rates". Note that most of the data (having a much larger horizon) is also available on FRED.

eur_usd <- tq_get("EUR/USD",
                  get = "exchange.rates",
                  from = Sys.Date() - lubridate::days(10))
plat_price_eur <- tq_get("plat",
                         get = "metal.prices",
                         from = Sys.Date() - lubridate::days(10),
                         base.currency = "EUR")

eur_usd %>% arrange(desc(date)) %>% slice(1:3)

## # A tibble: 3 x 2
##   date       exchange.rate
##   <date>             <dbl>
## 1 2018-09-12          1.16
## 2 2018-09-11          1.16
## 3 2018-09-10          1.16

plat_price_eur %>% arrange(desc(date)) %>% slice(1:3)

## # A tibble: 3 x 2
##   date       price
##   <date>     <dbl>
## 1 2018-09-12  681.
## 2 2018-09-11  681.
## 3 2018-09-10  680.

36 https://fred.stlouisfed.org/
37 https://www.oanda.com

0.2.1.1.5 Bloomberg and Datastream

Bloomberg is officially integrated into the tidyquant package, but one needs to have Bloomberg running on the terminal one is using. Datastream is not integrated, but has a nice R interface in the package rdatastream38. However, you need to have the Thomson Dataworks Enterprise SOAP API (non-free)39 licensed; then the package allows for convenient retrieval of data. If this is not the case, you have to manually retrieve your data and save it as an ".xlsx" Excel file that we can import using readxl::read_xlsx() from the readxl package.

0.2.1.1.6 Fama-French Data (Kenneth French's Data Library)

To download Fama-French data in batch there is a package FFdownload that I updated and that now can be installed via devtools::install_bitbucket("sstoeckl/FFdownload"). Currently you can either download all data or skip the (large) daily files using the command exclude_daily=TRUE. The result is a list of data.frames that has to be cleaned somehow but nonetheless is quite usable.

38 https://github.com/fcocquemas/rdatastream
39 http://dataworks.thomson.com/Dataworks/Enterprise/1.0/
FFdownload(output_file = "FFdata.RData", # output file for the final dataset
           tempdir = NULL,               # where the temporary downloads go (created automatically if NULL)
           exclude_daily = TRUE,         # exclude daily data
           download = FALSE)             # if FALSE, the data is already in the temp directory
load(file = "FFdata.RData")
factors <- FFdownload$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>%
  tk_tbl(rename_index="date") %>%        # make tibble
  mutate(date=as.Date(date, frac=1)) %>% # make proper month-end dates
  gather(key=FFvar,value = price,-date)  # gather into tidy format

factors %>% group_by(FFvar) %>% slice(1:2)

## # A tibble: 8 x 3
## # Groups:   FFvar [4]
##   date       FFvar  price
##   <date>     <chr>  <dbl>
## 1 1926-07-31 HML    -2.87
## 2 1926-08-31 HML     4.19
## 3 1926-07-31 Mkt.RF  2.96
## 4 1926-08-31 Mkt.RF  2.64
## 5 1926-07-31 RF      0.22
## 6 1926-08-31 RF      0.25
## 7 1926-07-31 SMB    -2.3
## 8 1926-08-31 SMB    -1.4

0.2.1.2 Manipulate Data

A variety of transformations can be applied to (financial) timeseries data. We will present some examples merging together our stock file with the index, the risk-free rate from FRED and the Fama-French factors. Doing data transformations in tidy datasets is either called a transmute (change variable/dataset, only return the calculated column) or a mutate() (add the transformed variable). In the tidyquant package these functions are called tq_transmute() and tq_mutate(), because they simultaneously allow changes of periodicity (e.g. daily to monthly), and therefore the returned dataset can have fewer rows than before. The core of these functions is the provision of a mutate_fun that can come from the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R40) and TTR (Technical Trading Rules41) packages. In the examples below, we show how to change the periodicity of the data (where we keep the adjusted close price and the volume information) and calculate monthly returns for the ten stocks and the index.
We then merge the price and return information for each stock, and at each point in time add the return of the S&P500 index and the three Fama-French factors.

stocks.prices.monthly <- stocks.prices %>%
  tq_transmute(select = c(adjusted,volume), # which columns to keep
               mutate_fun = to.monthly,     # function: aggregate to monthly
               indexAt = "lastof") %>%      # index at the last day of the month
  ungroup() %>% mutate(date=as.yearmon(date))
stocks.returns <- stocks.prices %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,   # create monthly returns
               period="monthly",
               type="arithmetic") %>%
  ungroup() %>% mutate(date=as.yearmon(date))
index.returns <- index.prices %>%
  tq_transmute(select = adjusted,mutate_fun = periodReturn,
               period="monthly", type="arithmetic") %>%
  mutate(date=as.yearmon(date))
factors.returns <- factors %>%
  mutate(price=price/100) %>% # already are monthly returns, but in percent
  mutate(date=as.yearmon(date))

stocks.prices.monthly %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 4
##   symbol date          adjusted    volume
##   <chr>  <S3: yearmon>    <dbl>     <dbl>
## 1 AAPL   Jan 2000          3.28 175420000
## 2 AAPL   Feb 2000          3.63  92240400
## 3 AAPL   Mrz 2000          4.30 101158400
## 4 AAPL   Apr 2000          3.93  62395200
## 5 AAPL   Mai 2000          2.66 108376800

stocks.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 3
##   symbol date          monthly.returns
##   <chr>  <S3: yearmon>           <dbl>
## 1 AAPL   Jan 2000              -0.0731
## 2 AAPL   Feb 2000               0.105
## 3 AAPL   Mrz 2000               0.185
## 4 AAPL   Apr 2000              -0.0865
## 5 AAPL   Mai 2000              -0.323

index.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 2
##   date          monthly.returns
##   <S3: yearmon>           <dbl>
## 1 Jan 2000              -0.0418
## 2 Feb 2000              -0.0201
## 3 Mrz 2000               0.0967
## 4 Apr 2000              -0.0308
## 5 Mai 2000              -0.0219

factors.returns %>% ungroup() %>% slice(1:5) # show first 5 entries

## # A tibble: 5 x 3
##   date          FFvar    price
##   <S3: yearmon> <chr>    <dbl>
## 1 Jul 1926      Mkt.RF  0.0296
## 2 Aug 1926      Mkt.RF  0.0264
## 3 Sep 1926      Mkt.RF  0.0036
## 4 Okt 1926      Mkt.RF -0.0324
## 5 Nov 1926      Mkt.RF  0.0253

Now, we merge all the information together:

## # A tibble: 5 x 10
##   symbol date   return adjusted volume   sp500  Mkt.RF     SMB     HML
##   <chr>  <S3:>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 AAPL   Jan ~ -0.0731     3.28 1.75e8 -0.0418 -0.0474  0.0505 -0.0045
## 2 AAPL   Feb ~  0.105      3.63 9.22e7 -0.0201  0.0245  0.221  -0.106
## 3 AAPL   Mrz ~  0.185      4.30 1.01e8  0.0967  0.052  -0.173   0.0794
## 4 AAPL   Apr ~ -0.0865     3.93 6.24e7 -0.0308 -0.064  -0.0771  0.0856
## 5 AAPL   Mai ~ -0.323      2.66 1.08e8 -0.0219 -0.0442 -0.0501  0.0243
## # ... with 1 more variable: RF <dbl>

Now we can calculate and add additional information, such as the MACD (Moving Average Convergence/Divergence42) and its driving signal. Be aware that you have to group_by symbol, or the signal would just be calculated for one large stacked timeseries:

stocks.final %>% group_by(symbol) %>%
  tq_mutate(select = adjusted,
            mutate_fun = MACD,
            col_rename = c("MACD", "Signal")) %>%
  select(symbol,date,adjusted,MACD,Signal) %>%
  tail() # show last part of the dataset

## # A tibble: 6 x 5
## # Groups:   symbol [1]
##   symbol  date adjusted  MACD Signal
##   <chr>  <dbl>    <dbl> <dbl>  <dbl>
## 1 GILD   2018.     73.4 -5.40  -4.38
## 2 GILD   2018.     80.8 -3.86  -4.27
## 3 GILD   2018.     78.7 -2.85  -3.99
## 4 GILD   2018.     72.8 -2.68  -3.73
## 5 GILD   2018.     72.6 -2.52  -3.49
## 6 GILD   2018.     70.0 -2.66  -3.32

save(stocks.final,file="stocks.RData")

40 https://www.quantmod.com/
41 https://www.rdocumentation.org/packages/TTR/
42 https://en.wikipedia.org/wiki/MACD

0.2.1.2.1 Rolling functions

One of the most important functions you will need in reality is the possibility to perform a rolling analysis. One example would be a rolling regression to get time-varying α and β of each stock with respect to the index or the Fama-French factors. To do that we need to create a function that does everything we want in one step:

regr_fun <- function(data,formula) {
  coef(lm(formula, data = timetk::tk_tbl(data, silent = TRUE)))
}

This function takes a dataset and a regression formula as input, performs the regression and returns the coefficients.

The tidyquant documentation illustrates the same pattern with a fixed formula. An important point is that the "data" will be passed to the regression function as an xts object; timetk::tk_tbl() takes care of converting it to a data frame so that lm() works properly with the columns "fb.returns" and "xlk.returns".

regr_fun <- function(data) {
  coef(lm(fb.returns ~ xlk.returns, data = timetk::tk_tbl(data, silent = TRUE)))
}

Now we can use tq_mutate() to apply the custom regression function over a rolling window using rollapply from the zoo package. Internally, since we leave select = NULL, the returns_combined data frame is passed automatically to the data argument of the rollapply function. All you need to specify is mutate_fun = rollapply and any additional arguments necessary to apply the rollapply function. We specify a 12-week window via width = 12. The FUN argument is our custom regression function, regr_fun. It is extremely important to specify by.column = FALSE, which tells rollapply to perform the computation using the data as a whole rather than applying the function to each column independently. The col_rename argument is used to rename the added columns.

returns_combined %>%
  tq_mutate(mutate_fun = rollapply,
            width      = 12,
            FUN        = regr_fun,
            by.column  = FALSE,
            col_rename = c("coef.0", "coef.1"))

This way, the rolling regression coefficients are added to the data frame. Also check out the functionality of tibbletime for that task (rollify)!
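Applied to this chapter's own data, the generic regr_fun() from above can be combined with tq_mutate() in the same way to estimate rolling factor exposures. The following is only a sketch: the 36-month window is an arbitrary illustration value, the formula argument is assumed to be forwarded through rollapply's ... to regr_fun(), and the merged stocks.final dataset from above must be loaded.

```r
# Sketch: rolling 36-month regression of each stock's return on the market
# factor (Mkt.RF), yielding time-varying alpha and beta per stock.
# Window width (36) and column names are illustration choices, not from the text.
library(tidyquant)

regr_fun <- function(data, formula) {
  coef(lm(formula, data = timetk::tk_tbl(data, silent = TRUE)))
}

stocks.final %>%
  group_by(symbol) %>%
  tq_mutate(mutate_fun = rollapply,
            width      = 36,              # 36-month estimation window
            FUN        = regr_fun,
            formula    = return ~ Mkt.RF, # forwarded to regr_fun via ...
            by.column  = FALSE,           # pass the whole dataset, not columns
            col_rename = c("alpha", "beta")) %>%
  select(symbol, date, alpha, beta) %>%
  tail()
```

The same pattern extends to the three-factor model by changing the formula to return ~ Mkt.RF + SMB + HML and renaming four coefficient columns.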
0.3 Exploring Data

In this chapter we show how to explore and analyze data using the dataset created in Chapter @ref(#s_2Data):

load("stocks.RData")
glimpse(stocks.final)

## Observations: 2,160
## Variables: 10
## $ symbol   <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date     <S3: yearmon> Jan 2000, Feb 2000, Mrz 2000, Apr 2000, Mai 2...
## $ return   <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume   <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500    <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF   <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB      <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML      <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF       <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...

stocks.final %>% slice(1:2)

## # A tibble: 2 x 10
##   symbol date   return adjusted volume   sp500  Mkt.RF    SMB     HML
##   <chr>  <S3:>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
## 1 AAPL   Jan ~ -0.0731     3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL   Feb ~  0.105      3.63 9.22e7 -0.0201  0.0245 0.221  -0.106
## # ... with 1 more variable: RF <dbl>

0.3.1 Plotting Data

In this chapter we show how to create various graphs of financial timeseries, which should help us to get a better understanding of their properties before we go on to calculate and test their statistics.

0.3.1.1 Time-series plots

0.3.1.2 Box-plots

0.3.1.3 Histogram and Density Plots

0.3.1.4 Quantile Plots

0.3.2 Analyzing Data

0.3.2.1 Calculating Statistics

0.3.2.2 Testing Data

0.3.2.3 Exposure to Factors

The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset).
Let us quantify these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML:

stocks.factor_exposure <- stocks.final %>%
  nest(-symbol) %>%
  mutate(model = map(data, ~ lm(return ~ Mkt.RF + SMB + HML, data= .x)),
         tidied = map(model, tidy)) %>%
  unnest(tidied, .drop=TRUE) %>%
  filter(term != "(Intercept)") %>%
  select(symbol,term,estimate) %>%
  spread(term,estimate) %>%
  select(symbol,Mkt.RF,SMB,HML)

0.4 Managing Portfolios

In this chapter we show how to construct and optimize portfolios using the dataset created in Chapter @ref(#s_2Data). At first we will learn how to optimize portfolios over the full sample, then (in the next chapters) we will do the same thing in a rolling analysis and also perform some backtesting. The major workhorse of this chapter is the PortfolioAnalytics package developed by Peterson and Carl (2018). PortfolioAnalytics comes with an excellent introductory vignette vignette("portfolio_vignette") and includes more documents detailing the use of ROI solvers vignette("ROI_vignette"), how to create custom moment functions vignette("custom_moments_objectives") and how to introduce CVaR budgets vignette("risk_budget_optimization").

0.4.1 Introduction

SHORT INTRODUCTION TO PORTFOLIOMANAGEMENT

We start by first creating a portfolio object, before we…

0.4.1.1 The portfolio.spec() Object

The portfolio object is a so-called S3 object43, which means that it has a certain class (portfolio) describing its properties, behavior and relation to other objects. Usually such an object comes with a variety of methods. To create such an object, we reuse the stock dataset that we have created in Chapter @ref(#s_2Data):

load("stocks.RData")
glimpse(stocks.final)

## Observations: 2,160
## Variables: 10
## $ symbol   <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL...
## $ date     <S3: yearmon> Jan 2000, Feb 2000, Mrz 2000, Apr 2000, Mai 2...
## $ return   <dbl> -0.07314358, 0.10481916, 0.18484202, -0.08651613, -0....
## $ adjusted <dbl> 3.283894, 3.628109, 4.298736, 3.926826, 2.658767, 3.3...
## $ volume   <dbl> 175420000, 92240400, 101158400, 62395200, 108376800, ...
## $ sp500    <dbl> -0.041753145, -0.020108083, 0.096719828, -0.030795756...
## $ Mkt.RF   <dbl> -0.0474, 0.0245, 0.0520, -0.0640, -0.0442, 0.0464, -0...
## $ SMB      <dbl> 0.0505, 0.2214, -0.1728, -0.0771, -0.0501, 0.1403, -0...
## $ HML      <dbl> -0.0045, -0.1057, 0.0794, 0.0856, 0.0243, -0.1010, 0....
## $ RF       <dbl> 0.0041, 0.0043, 0.0047, 0.0046, 0.0050, 0.0040, 0.004...

stocks.final %>% slice(1:2)

## # A tibble: 2 x 10
##   symbol date   return adjusted volume   sp500  Mkt.RF    SMB     HML
##   <chr>  <S3:>   <dbl>    <dbl>  <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
## 1 AAPL   Jan ~ -0.0731     3.28 1.75e8 -0.0418 -0.0474 0.0505 -0.0045
## 2 AAPL   Feb ~  0.105      3.63 9.22e7 -0.0201  0.0245 0.221  -0.106
## # ... with 1 more variable: RF <dbl>

For the PortfolioAnalytics package we need our data in xts format (see @ref(#sss_112xts)); we therefore first spread() the dataset into columns of stock returns and then convert to xts using tk_xts() from the timetk package.

returns <- stocks.final %>%
  select(symbol,date,return) %>%
  spread(symbol,return) %>%
  tk_xts(silent = TRUE)

Now it is time to initialize the portfolio.spec() object, passing along the names of our assets. Afterwards we print the object (most S3 objects come with a print method that nicely displays some information).

pspec <- portfolio.spec(assets = stocks.selection$symbol,
                        category_labels = stocks.selection$sector)
print(pspec)

## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.

43 http://adv-r.had.co.nz/S3.html
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD

str(pspec)

## List of 6
##  $ assets         : Named num [1:10] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0
##   ..- attr(*, "names")= chr [1:10] "AAPL" "MSFT" "AMZN" "CSCO" ...
##  $ category_labels:List of 3
##   ..$ Information Technology: int [1:7] 1 2 4 5 6 8 9
##   ..$ Consumer Discretionary: int 3
##   ..$ Health Care           : int [1:2] 7 10
##  $ weight_seq     : NULL
##  $ constraints    : list()
##  $ objectives     : list()
##  $ call           : language portfolio.spec(assets = stocks.selection$symb
##  - attr(*, "class")= chr [1:2] "portfolio.spec" "portfolio"

Checking the structure of the object with str(), we find that it contains several elements: assets, which contains the asset names and initial weights that are equally distributed unless otherwise specified (e.g. portfolio.spec(assets=c(0.6,0.4))); category_labels to categorize assets by sector (or geography etc.); weight_seq (a sequence of weights for later use by random_portfolios); constraints, which we will set soon; objectives; and the call that initialized the object. Before we go and optimize any portfolio, we will show how to set constraints.

0.4.1.2 Constraints

Constraints define restrictions and boundary conditions on the weights of a portfolio. Constraints are defined by add.constraint(), specifying a certain type and arguments for each type, as well as whether the constraint should be enabled or not (enabled=TRUE is the default).

0.4.1.2.1 Sum of Weights Constraint

Here we define how much of the available budget can/must be invested by specifying the maximum/minimum sum of portfolio weights. Usually we want to invest our entire budget and therefore set type="full_investment", which sets the sum of weights to 1.
ALternatively we can set the type="weight_sum" to have mimimum/maximum weight_sum equal to 1. pspec <- add.constraint(portfolio=pspec, type="full_investment") # print(pspec) # pspec <- add.constraint(portfolio=pspec,type="weight_sum", min_sum=1, max Another common constraint is to have the portfolio dollar-neutral type="dollar_neutral" (or equivalent formulations specified below) # # # # # pspec <- add.constraint(portfolio=pspec, type="dollar_neutral") print(pspec) pspec <- add.constraint(portfolio=pspec, type="active") pspec <- add.constraint(portfolio=pspec, type="weight_sum", min_sum=0, ma 0.4.1.2.2 Box Constraint Box constraints specify upper and lower bounds on the asset weights. If we pass min and max as scalars then the same max and min weights are set per asset. If we pass vectors (that should be of the same length as the number of assets) we can specify position limits on individual stocks pspec <- add.constraint(portfolio=pspec, type="box", min=0, Managing Portfolios lxiii max=0.4) # print(pspec) # add.constraint(portfolio=pspec, # type="box", # min=c(0.05, 0, rep(0.05,8)), # max=c(0.4, 0.3, rep(0.4,8))) Another special type of box constraints are long-only constraints, where we only allow positive weights per asset. These are set automatically, if no min and max are set or when we use type="long_only" # pspec <- add.constraint(portfolio=pspec, type="box") # pspec <- add.constraint(portfolio=pspec, type="long_only") 0.4.1.2.3 Group Constraints Group constraints allow the user to specify constraints per groups, such as industries, sectors or geography.44 These groups can be randomly defined, below we will set group constraints for the sectors as specified above. The input arguments are the following: groupslist of vectors specifying the groups of the assets, group_labels character vector to label the groups (e.g. 
size, asset class, style, etc.); group_min and group_max, specifying the minimum and maximum weight per group; and (optionally) group_pos, specifying the number of non-zero weights per group. (Note that only the ROI, DEoptim and random portfolio solvers support group constraints; see also @(#sss_4solvers).)

pspec <- add.constraint(portfolio=pspec,
                        type="group",
                        groups=list(pspec$category_labels$`Information Technology`,
                                    pspec$category_labels$`Consumer Discretionary`,
                                    pspec$category_labels$`Health Care`),
                        group_min=c(0.1, 0.15, 0.1),
                        group_max=c(0.85, 0.55, 0.4),
                        group_labels=pspec$category_labels)
# print(pspec)

0.4.1.2.4 Position Limit Constraint

The position limit constraint allows the user to specify limits on the number of assets with non-zero, long, or short positions. Its arguments are max_pos, which defines the maximum number of assets with non-zero weights, and max_pos_long/max_pos_short, which specify the maximum number of assets with long (i.e. buy) and short (i.e. sell) positions. (Note that not all solvers support all of these options: DEoptim and the random portfolio solvers support them all, the ROI solvers do not support the long/short position limit constraints, and only quadprog allows for the max_pos argument.)

pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos=3)
# pspec <- add.constraint(portfolio=pspec, type="position_limit", max_pos_l
# print(pspec)

0.4.1.2.5 Diversification Constraint

The diversification constraint enables the user to set a minimum diversification limit by penalizing the optimizer if the deviation is larger than 5%. Diversification is defined as the sum of squared weights, ∑_{i=1}^{N} w_i^2, for N assets. Its only argument is the diversification target div_target.
pspec <- add.constraint(portfolio=pspec, type="diversification", div_target=
# print(pspec)

(Note that the diversification constraint is only supported by the global numeric solvers, not the ROI solvers.)

0.4.1.2.6 Turnover Constraint

The turnover constraint allows the user to specify a maximum turnover relative to a set of initial weights, which can either be given explicitly or default to the weights initially specified for the portfolio object. It is also implemented as an optimization penalty if the turnover deviates by more than 5% from the turnover_target. (Note that the turnover constraint is currently not supported by the ROI solvers for quadratic utility and minimum variance problems.)

pspec <- add.constraint(portfolio=pspec, type="turnover", turnover_target=0.
# print(pspec)

0.4.1.2.7 Target Return Constraint

The target return constraint allows the user to target an average return specified by return_target.

pspec <- add.constraint(portfolio=pspec, type="return", return_target=0.007)
# print(pspec)

0.4.1.2.8 Factor Exposure Constraint

The factor exposure constraint allows the user to set upper and lower bounds on exposures to risk factors. We will use the factor exposures that we calculated in @(#sss_3FactorExposure). The major input is a vector or matrix B together with upper/lower bounds for the portfolio factor exposure. If B is a vector (with length equal to the number of assets), the lower and upper bounds must be scalars. If B is a matrix, the number of rows must equal the number of assets and the number of columns represents the number of factors; in this case, the lengths of the lower and upper bounds must equal the number of factors. B should have column names specifying the factors and row names specifying the assets.
B <- stocks.factor_exposure %>% as.data.frame() %>% column_to_rownames("symbol")
pspec <- add.constraint(portfolio=pspec,
                        type="factor_exposure",
                        B=B,
                        lower=c(0.8,0,-1),
                        upper=c(1.2,0.8,0))
# print(pspec)

0.4.1.2.9 Transaction Cost Constraint

The transaction cost constraint enables the user to specify (proportional) transaction costs. (For the ROI (quadprog) solvers, transaction costs are currently only supported for global minimum variance and quadratic utility problems.) Here we will assume the proportional transaction cost ptc to be equal to 1%.

pspec <- add.constraint(portfolio=pspec, type="transaction_cost", ptc=0.01)
# print(pspec)

0.4.1.2.10 Leverage Exposure Constraint

The leverage exposure constraint specifies a maximum level of leverage. Below we set the leverage to 1.3 to create a 130/30 portfolio.

pspec <- add.constraint(portfolio=pspec, type="leverage_exposure", leverage=1.3)
# print(pspec)

0.4.1.2.11 Checking and en-/disabling constraints

Every constraint that is added to the portfolio object gets a number according to the order in which it was set. If one wants to update (enable/disable) a specific constraint, this can be done with the indexnum argument.

summary(pspec) # overview of the specs, their indexnum and whether they are enabled
consts <- plyr::ldply(pspec$constraints, function(x){c(x$type,x$enabled)})
consts
pspec$constraints[[which(consts$V1=="box")]]
pspec <- add.constraint(pspec, type="box", min=0, max=0.5,
                        indexnum=which(consts$V1=="box"))
pspec$constraints[[which(consts$V1=="box")]]
# to disable constraints
pspec$constraints[[which(consts$V1=="position_limit")]]
pspec <- add.constraint(pspec, type="position_limit", enabled=FALSE,
                        indexnum=which(consts$V1=="position_limit"))
pspec$constraints[[which(consts$V1=="position_limit")]]

0.4.1.3 Objectives

Before we can optimize a portfolio, we first have to specify what "optimal" means in terms of the relevant (business) objective.
Such objectives (target functions) can be added to the portfolio object with add.objective. With this function, the user specifies the type of objective to add to the portfolio object; currently available are 'return', 'risk', 'risk budget', 'quadratic utility', 'weight concentration', 'turnover' and 'minmax'. Each type of objective has additional arguments that need to be specified. Several objectives can be added, and they can be enabled or disabled by specifying the indexnum argument.

0.4.1.3.1 Portfolio Risk Objective

Here, the user can specify a risk function that should be minimized. We start by adding a risk objective to minimize portfolio variance (the minimum variance portfolio). Another example could be the expected tail loss with a confidence level of 0.95. Any function can be used (even user-defined ones are possible; the name must correspond to a function available in R), and any additional arguments the function needs are passed as a named list to arguments:

pspec <- add.objective(portfolio=pspec, type='risk', name='var')
pspec <- add.objective(portfolio=pspec,
                       type='risk',
                       name='ETL',
                       arguments=list(p=0.95),
                       enabled=FALSE)
# print(pspec)

0.4.1.3.2 Portfolio Return Objective

The return objective allows the user to specify a return function to maximize. Here we add a return objective to maximize the portfolio mean return.

pspec <- add.objective(portfolio=pspec, type='return', name='mean')
# print(pspec)

0.4.1.3.3 Portfolio Risk Budget Objective

The portfolio risk budget objective allows the user to specify constraints to minimize component contribution (i.e. equal risk contribution) or to set upper and lower bounds on the percentage risk contribution. Here we specify that no asset may contribute more than 30% to total portfolio risk. See the risk budget optimization vignette for more detailed examples of portfolio optimizations with risk budgets.
pspec <- add.objective(portfolio=pspec,
                       type="risk_budget",
                       name="var",
                       max_prisk=0.3)
pspec <- add.objective(portfolio=pspec,
                       type="risk_budget",
                       name="ETL",
                       arguments=list(p=0.95),
                       max_prisk=0.3,
                       enabled=FALSE)
# for an equal risk contribution portfolio, set min_concentration=TRUE
pspec <- add.objective(portfolio=pspec,
                       type="risk_budget",
                       name="ETL",
                       arguments=list(p=0.95),
                       min_concentration=TRUE,
                       enabled=FALSE)
print(pspec)

## **************************************************
## PortfolioAnalytics Portfolio Specification
## **************************************************
##
## Call:
## portfolio.spec(assets = stocks.selection$symbol, category_labels = stocks.
##
## Number of assets: 10
## Asset Names
## [1] "AAPL" "MSFT" "AMZN" "CSCO" "NVDA" "ORCL" "AMGN" "ADBE" "QCOM" "GILD"
##
## Category Labels
## Information Technology : AAPL MSFT CSCO NVDA ORCL ADBE QCOM
## Consumer Discretionary : AMZN
## Health Care : AMGN GILD
##
## Constraints
## Enabled constraint types
## - full_investment
## - box
## - group
## - position_limit
## - diversification
## - turnover
## - return
## - factor_exposure
## - transaction_cost
## - leverage_exposure
##
## Objectives:
## Enabled objective names
## - var
## - mean
## - var
## Disabled objective names
## - ETL

0.4.1.4 Solvers

Solvers are the workhorses of our portfolio optimization framework, and a variety of them is available to us through the PortfolioAnalytics-package. I will briefly introduce the available solvers. Note that the solver is specified through optimize_method in the optimize.portfolio and optimize.portfolio.rebalancing functions.

0.4.1.4.1 DEoptim

This solver comes from the R package DEoptim and is a differential evolution algorithm (a global stochastic optimization algorithm) developed by Ardia et al. (2016). The help on ?DEoptim gives many more references.
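As a hedged sketch of how such a solver is invoked (the return series stocks.returns is a hypothetical xts object of asset returns matching the assets in pspec, not created above), a call might look like this:

```r
library(PortfolioAnalytics)
library(DEoptim)

# Hypothetical: stocks.returns is an xts object of (monthly) asset returns
# for the same assets as in pspec.
opt <- optimize.portfolio(R = stocks.returns,
                          portfolio = pspec,
                          optimize_method = "DEoptim",
                          search_size = 2000, # number of candidate portfolios
                          trace = TRUE)
extractWeights(opt) # optimal weights found by the solver
```

Because differential evolution is stochastic, results will vary between runs unless a seed is fixed via set.seed().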
There is also a nice vignette("DEoptimPortfolioOptimization") on large-scale portfolio optimization using the PortfolioAnalytics-package.

0.4.1.4.2 Random Portfolios

There are three methods to generate random portfolios contained in PortfolioAnalytics:

1. The most flexible but also slowest method is 'sample'. It can take leverage, box, group, and position limit constraints into account.
2. The 'simplex' method is useful to generate random portfolios with the full investment and min box constraints (values for min_sum/max_sum are ignored). Other constraints (box max, group and position limit) are handled by elimination, which might leave only very few feasible portfolios; it can also lead to suboptimal solutions.
3. Using grid search, the 'grid' method only satisfies the min and max box constraints.

0.4.1.4.3 pso

The psoptim function comes from the R package pso (Bendtsen, 2012) and uses particle swarm optimization.

0.4.1.4.4 GenSA

The GenSA function comes from the R package GenSA (Gubian et al., 2018) and is based on generalized simulated annealing (a generic probabilistic heuristic optimization algorithm).

0.4.1.4.5 ROI

ROI (the R Optimization Infrastructure) is a framework to handle optimization problems in R. It serves as an interface to the Rglpk package and the quadprog package, which solve linear and quadratic programming problems. The available methods in the context of the PortfolioAnalytics-package are given below (see section @(#sss_4Objectives) for available objectives):

1. Maximize portfolio return subject to leverage, box, group, position limit, target mean return, and/or factor exposure constraints on weights.
2. Globally minimize portfolio variance subject to leverage, box, group, turnover, and/or factor exposure constraints.
3. Minimize portfolio variance subject to leverage, box, group, and/or factor exposure constraints given a desired portfolio return.
4. Maximize quadratic utility subject to leverage, box, group, target mean return, turnover, and/or factor exposure constraints and a risk aversion parameter. (The risk aversion parameter is passed into optimize.portfolio as an added argument to the portfolio object.)
5. Minimize ETL subject to leverage, box, group, position limit, target mean return, and/or factor exposure constraints and a target portfolio return.

0.4.2 Mean-variance Portfolios

0.4.2.1 Introduction and Theoretics

0.4.2.1.1 The minimum risk mean-variance portfolio
0.4.2.1.2 Feasible Set and Efficient Frontier
0.4.2.1.3 Minimum variance portfolio
0.4.2.1.4 Capital market line and tangency portfolio
0.4.2.1.5 Box and Group Constrained mean-variance portfolios
0.4.2.1.6 Maximum return mean-variance portfolio
0.4.2.1.7 Covariance risk budget constraints

0.4.3 Mean-CVaR Portfolios

0.5 Managing Portfolios in the Real World

0.5.1 Rolling Portfolios

0.5.2 Backtesting

0.6 Further applications in Finance

0.6.1 Portfolio Sorts

0.6.2 Fama-MacBeth-Regressions

0.6.3 Risk Indices

0.7 References

# Appendix{#s_99Appendix}

.0.1 Introduction to R

For everyone who is interested in more on all of these topics, I strongly recommend the eBook R for Data Science (http://r4ds.had.co.nz/).

.0.1.1 Getting started

Once you have started R, there are several ways to find help. First of all, (almost) every command is equipped with a help page that can be accessed via ?... (if the package is loaded). If the command is part of a package that is not loaded, or you have no clue about the command itself, you can search the entire help (full-text) using ??.... Be aware that certain very high-level commands need to be put in quotation marks, e.g. ?'function'.
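A quick sketch of these help lookups at the console (the search terms are just illustrative):

```r
?mean                   # help page for a function in a loaded package
??"random number"       # full-text search across all installed packages
?'function'             # high-level commands must be quoted
help(package = "stats") # overview of everything a package exports
```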
Many of the packages you find are equipped with a demo() (get a list of all available demos using demo(package=.packages(all.available = TRUE))) and/or a vignette(), a document explaining the purpose of the package and demonstrating its use with suitable examples (find all available vignettes with vignette(package=.packages(all.available = TRUE))). Vignettes are also useful if you want to learn how to do a certain task, e.g. conducting an event study with vignette("eventstudies"). (If this command shows an error message, you need to install the package first; see further down for how to do that.)

Executing code in RStudio is simple. Either you highlight the exact portion of the code that you want to execute and hit ctrl+enter, or you place the cursor somewhere in a line to execute this particular line of code with the same command. (Under certain circumstances - either using pipes or within loops - RStudio will execute the entire loop/pipe structure. In this case you have to highlight the particular line that you want to execute.)

.0.1.2 Working directory

Before we start to learn how to program, we have to set a working directory. First, create a folder "researchmethods" (preferably never use directory names containing special characters or empty spaces) somewhere on citrix/your laptop; this will be your working directory, where R looks for code and files to load and saves everything that is not designated by a full path (e.g. "D:/R/LAB/SS2018/…"). Note: In contrast to Windows paths, you have to use either "/" instead of "\" or a double backslash "\\".
Now set the working directory using setwd() and check it with getwd().

setwd("D:/R/researchmethods")
getwd()

.0.1.3 Basic calculations

3+5; 3-5; 3*5; 3/5
# More complex, including brackets
(5+3-1)/(5*10) # is different to
5+3-1/5*10
# power of a variable
4*4*4
4^300
# root of a variable
sqrt(16)
16^(1/2)
16^0.5
# exponential and logarithms
exp(3)
log(exp(3))
exp(1)
# Log to the basis of 2
log2(8)
2^log2(8)
# raise the number of digits shown
options(digits=6)
exp(1)
# Rounding
20/3
round(20/3,2)
floor(20/3)
ceiling(20/3)

.0.1.4 Mapping variables

Defining variables (objects) in R is done via the arrow operator <-, which works in both directions (->). Sometimes you will see someone use the equal sign =, but for several (more complex) reasons this is not advisable.

n <- 10
n
n <- 11
n
12 -> n
n
n <- n^2
n

In the last case, we overwrite a variable recursively. You might want to do that for several reasons, but I advise you to do it rarely. The reason is that - depending on how often you have already executed this part of the code - n will have a different value. In addition, if you are checking the output of some calculation, it is not nice if one of the input variables always has a different value.

In a next step, we will check variables. This is a very important part of programming.

# check if m==10
m <- 11
m==10  # is equal to
m==11
m!=11  # is not equal to
m>10   # is larger than
m<10   # is smaller than
m<=11  # is smaller or equal than
m>=12  # is larger or equal than

If one wants to find out which variables are already set, use ls(). Delete (remove) variables using rm() (you sometimes might want to do that to save memory - in this case always follow the rm() command with gc()).

ls() # list variables
rm(m) # remove m
ls() # list variables again (m is missing)

Of course, often we do not only want to store numbers but also characters. In this case enclose the value by quotation marks: name <- "test".
If you want to check whether a variable has a certain format, use the available commands starting with is.; if you want to change the format of a variable, use the corresponding as. command.

name <- "test"
is.numeric(n)
is.numeric(name)
is.character(n)
is.character(name)

If you want to find out the format of a variable, you can use class(). Slightly different information is given by mode() and typeof().

class(n)
class(name)
mode(n)
mode(name)
typeof(n)
typeof(name)
# Lets change formats:
n1 <- n
is.character(n1)
n1 <- as.character(n)
is.character(n1)
as.numeric(name) # New thing: NA

Before we learn about NA, we have to define logical variables, which are very important when programming (e.g., as options in a function). Logical (boolean) variables will either assume TRUE or FALSE.

# last but not least we need boolean (logical) variables
n2 <- TRUE
is.numeric(n2)
class(n2)
is.logical(n2)
as.logical(2) # all values except 0 will be converted to TRUE
as.logical(0)

Now we can check whether a condition holds true. In this case, we check if n is equal to 10. The output (as you have seen before) is of type logical.

is.logical(n==10)
n3 <- n==10 # we can assign the logical output to a new variable
is.logical(n3)

Assignment: Create a numeric variable x and set x equal to 5/3. What happens if you divide by 0? By Inf? Set y<-NA. What could this mean? Check if the variable is "na". Is Inf numeric? Is NA numeric?

.0.1.5 Sequences, vectors and matrices

In this chapter, we are going to learn about higher-dimensional objects (storing more information than just one number).

.0.1.5.1 Sequences

We define sequences of elements (numbers/characters/logicals) via the concatenation operator c() and assign them to a variable. If one of the elements of a sequence is of type character, the whole sequence will be converted to character; otherwise it will be of type numeric (for other possibilities check the help ?vector). At the same time it will be of type vector.
x <- c(1,3,5,6,7)
class(x)
is.vector(x)
is.numeric(x)

To create ordered sequences, make use of the command seq(from,to,by). Please note that programmers are often lazy and just write seq(1,10,2) instead of seq(from=1,to=10,by=2). However, this makes code much harder to understand, can produce unintended results, and, if a function is changed (which happens, as R is always under development), may yield something very different from what was intended. Therefore I strongly encourage you to always specify the arguments of a function by name. To do this, I advise you to make heavy use of the Tab key. Tab helps you to complete commands, produces a list of different commands starting with the same letters (if you do not completely remember the spelling, for example), helps you to find out about the arguments, and even gives information about the intended/possible values of the arguments. A nice shortcut for creating ordered/regular sequences with distance (by=) one is given by the : operator: 1:10 is equal to seq(from=1,to=10,by=1).

x1 <- seq(from=1,to=5,by=1)
x2 <- 1:5

One can operate with sequences in the same way as with numbers. Be aware of the order of operations and use brackets where necessary!

1:10-1
1:(10-1)
1:10^2-2*3

Assignment: 1. Create a series from -1 to 5 with distance 0.5. Can you find another way to do it using the : operator and standard mathematical operations? 2. Create the same series, but this time using the "length"-option. 3. Create 20 ones in a row (hint: find a function to do just that).

Of course, all logical operations are possible for vectors, too. In this case, the output is a vector of logicals having the same size as the input vector. You can check if a condition is true for any() or all() elements of the vector.

.0.1.5.2 Random Sequences

One of the most important tasks of any programming language that is used for data analysis and research is the ability to generate random numbers.
In R all the random number commands start with an r, e.g. random normal numbers are generated with rnorm(). To find out more about the command use the help ?rnorm. All of these commands are part of the stats package, where you find the available commands using the package help: library(help=stats).

Notice that whenever you generate random numbers, they are different. If you prefer to work with the same set of random numbers (e.g. for testing purposes), you can fix the starting value of the random number generator by setting the seed to a chosen number, set.seed(123). Notice that you have to execute set.seed() every time before (re)using the random number generator.

rand1 <- rnorm(n = 100) # 100 standard normally distributed random numbers
set.seed(134) # fix the starting value of the random number generator
rand1a <- rnorm(n = 100)

Assignment: 1. Create a random sequence of 20 N(0,2)-distributed variables and assign it to the variable rand2. 2. Create a random sequence of 200 Uniform(-1,1)-distributed variables and save it to rand3. 3. What other distributions can you find in the stats package? 4. Use the functions mean and sd. Manipulate the random variables to have a different mean and standard deviation. Do you remember the normalization process (z-score)?

As in the last assignment, you can use all the functions you learned about in statistics to calculate the mean(), the standard deviation sd(), skewness() and kurtosis() (the latter two after installing and loading the moments package). To install a package we use install.packages() (only once) and then load the package with require().

#install.packages("moments") # only once, no need to reinstall every time
require(moments)
mean(rand1a)
sd(rand1a)
skewness(rand1a)
kurtosis(rand1a)
summary(rand1a)

.0.1.6 Vectors and matrices

We have created (random) sequences above and can determine their properties, such as their length().
We also know how to manipulate sequences through mathematical operations, such as +-*/^. If you want to calculate a vector product, R provides the %*% operator. In many cases (such as with %*%) vectors behave like matrices, and R automatically decides whether they should be treated as row or column vectors. To make this more explicit, transform your vector into a matrix using as.matrix(). Now it has a dimension and the property matrix. You can transpose a matrix using t(), calculate its inverse using solve(), and manipulate it in any other way imaginable. To create matrices use matrix() and be careful about the available options!

x <- c(2,4,5,8,10,12)
length(x)
dim(x)
x^2/2-1
x %*% x # R automatically multiplies row and column vector
is.vector(x)
y <- as.matrix(x)
is.matrix(y); is.matrix(x)
dim(y); dim(x)
t(y) %*% y
y %*% t(y)
mat <- matrix(data = x,nrow = 2,ncol = 3, byrow = TRUE)
dim(mat); ncol(mat); nrow(mat)
mat2 <- matrix(c(1,2,3,4),2,2) # set a new (quadratic) matrix
mat2i <- solve(mat2)
mat2 %*% mat2i
mat2i %*% mat2

Assignment: 1. Create the matrix matrix(c(1,2,2,4),2,2) and try to calculate its inverse. What is the problem? Remember the determinant? Calculate it using det(). What do you learn? 2. Create a 4x3 matrix of ones and/or zeros. Try to matrix-multiply it with any of the vectors/matrices used before. 3. Try to add/subtract/multiply matrices, vectors and scalars.

A variety of special matrices is available, such as diagonal matrices using diag(). You can glue matrices together columnwise (cbind()) or rowwise (rbind()).

diag(3)
diag(c(1,2,3,4))
mat4 <- matrix(0,3,3)
mat5 <- matrix(1,3,3)
cbind(mat4,mat5)
rbind(mat4,mat5)

.0.1.6.1 The indexing system

We can access the row/column elements of any object with at least one dimension using [].
########################################################
### 8) The INDEXING System
# We can access the single values of a vector/matrix
x[2] # one-dim
mat[,2] # two-dim column
mat[2,] # two-dim row
i <- c(1,3)
mat[i]
mat[1,2:3] # two-dim: select second and third column, first row
mat[-1,] # two-dim: suppress first row
mat[,-2] # two-dim: suppress second column

Now we can use logical vectors/matrices to subset vectors/matrices. This is very useful for data mining.

mat>=5 # which elements are larger or equal to 5?
mat[mat>=5] # What are these elements?
which(mat>=5, arr.ind = TRUE) # another way with more explicit information

We can do something even more useful and name the rows and columns of a matrix using colnames() and rownames().

colnames(mat) <- c("a","b","c")
rownames(mat) <- c("A","B")
mat["A",c("b","c")]

.0.1.7 Functions in R

.0.1.7.1 Useful Functions

Of course, there are thousands of functions available in R, especially through the use of packages. In the following you find a demo of the most useful ones.

x <- c(1,2,4,-1,2,8) # example vector 1
x1 <- c(1,2,4,-1,2,8,NA,Inf) # example vector 2 (more complex)
sqrt(x) # square root of x
x^3 # x to the power of ...
sum(x) # sum of the elements of x
prod(x) # product of the elements of x
max(x) # maximum of the elements of x
min(x) # minimum of the elements of x
which.max(x) # returns the index of the greatest element of x
which.min(x) # returns the index of the smallest element of x
# statistical functions - use rand1 and rand2 created before
range # returns the minimum and maximum of the elements of x
mean # mean of the elements of x
median # median of the elements of x
var # variance of the elements of x
sd # standard deviation of the elements of x
cor # correlation matrix of x
cov # covariance between x and y
cor # linear correlation between x and y
# more complex functions
round(x, n) # rounds the elements of x to n decimals
rev(x) # reverses the elements of x
sort(x) # sorts the elements of x in increasing order
rank(x) # ranks of the elements of x
log(x) # computes natural logarithms of x
cumsum(x) # a vector whose i-th element is the sum from x[1] to x[i]
cumprod(x) # id. for the product
cummin(x) # id. for the minimum
cummax(x) # id. for the maximum
unique(x) # duplicate elements are suppressed

.0.1.7.2 More complex objects in R

Next to numbers, sequences/vectors and matrices, R offers a variety of more complex objects that can store more than just numbers and characters (e.g. functions, output, text etc.). The most important ones are data.frames (extended matrices) and lists. Check the examples below to see how to create these objects and how to access specific elements.

df <- data.frame(col1=c(2,3,4), col2=sin(c(2,3,4)), col3=c("a","b", "c"))
li <- list(x=c(2,3,4), y=sin(c(2,3,4)), z=c("a","b", "c","d","e"), fun=mean)
# to grab elements from a list or dataframe use $ or [[]]
df$col3; li$x # get variables
df[,"col3"]; li[["x"]] # get specific elements that can also be numbered
df[,3]; li[[1]]

Assignment: 1.
Get the second entry of element y of list li.

.0.1.7.3 Create simple functions in R

To create our own functions in R, we need to give them a name, determine the necessary input variables, and decide whether these variables should be pre-specified (given default values) or not. I use a couple of examples below to show how to do this.

?"function" # "function" is such a high-level object that it is interpreted

# 1. Let's create a function that squares an entry x and name it square
square <- function(x){x^2}
square(5)
square(c(1,2,3))

# 2. Let us define a function that returns a list of several different results
stats <- function(v){
  v.m <- mean(v) # create a variable that is only valid in the function
  v.sd <- sd(v)
  v.var <- var(v)
  v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
  return(v.output)
}
v <- rnorm(1000,mean=1,sd=5)
stats(v)
stats(v)$Mean

# 3. A function can have standard arguments.
### This time we also create a random vector within the function and use its
stats2 <- function(n,m=0,s=1){
  v <- rnorm(n,mean=m,sd=s)
  v.m <- mean(v) # create a variable that is only valid in the function
  v.sd <- sd(v)
  v.var <- var(v)
  v.output <- list(Mean=v.m, StandardDeviation=v.sd, Variance=v.var)
  return(v.output)
}
stats2(1000000)
stats2(1000,m=1)
stats2(1000,m=1,s=10)
stats2(m=1) # what happens if an obligatory argument is left out?

Assignment: 1. Create a function that creates two random samples of lengths n and m from the normal and the uniform distribution respectively, given the mean and sd for the first and min and max for the second distribution. The function shall then calculate the covariance matrix and the correlation matrix, which it returns in a named list.

.0.1.8 Plotting

Plotting in R can be done very easily. Check the examples below to get a reference and an idea of the plotting capabilities of R. A very good source for color names (that work in R) is http://en.wikipedia.org/wiki/Web_colors.
?plot
?colors # very good source for colors:

y1 <- rnorm(50,0,1)
plot(y1)
# set title, and x- and y-labels
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample")
# now make a line between elements, and color everything blue
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# if you want to save plots or open them in separate windows you can use x11
?Devices
# x11 (opens separate window)
x11(8,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l')
# pdf
pdf("plot1.pdf",6,6)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
dev.off()
# more extensive example
X11(6,6)
par(mfrow=c(2,1),cex=0.9,mar=c(3,3,1,3)+0.1)
plot(y1,main="normal RV",xlab="Point No.",ylab="Sample",col="blue",type='l',lty=2)
legend("topleft",col=c("blue"), lty=2, legend=c("normal Sample"))
barplot(y1,col="blue") # making a barplot
# plotting a histogram
hist(y1) # there is a nicer version available once we get to time series analysis
# create a second sample
y2 <- rnorm(50)
# scatterplot
plot(y1,y2)
# boxplot
boxplot(y1,y2)

.0.1.9 Control Structures

Last but not least for this lecture, we learn about control structures. These structures (for-loops, if/else checks etc.) are very useful if you want to translate a tedious manual task (e.g. in Excel) into something R should do for you, step by step (e.g. column by column). Again, see below for a variety of examples and commands used in easy examples.

x <- sample(-15:15,10) # randomly draw 10 numbers from -15:15
# 1. We square every element of vector x in a loop
y <- NULL # 1.a) set an empty variable (NULL means it is truly nothing)
is.null(y)
# 1.b) Use an easy for-loop:
for (i in 1:length(x)){
  y[i] <- x[i]^2
}
# 2.
# Now we use an if-condition to only replace negative values
y <- NULL
for (i in 1:length(x)){
  y[i] <- x[i]
  if(x[i]<0) {y[i] <- x[i]^2}
}
# ASSIGNMENT: let's calculate the 100th square root of the square root of th
y <- rep(NA,101)
y[1] <- 500
for (i in 1:100){
  print(i)
  y[i+1] <- sqrt(y[i])
}
plot(y,type="l")

Bibliography

Ardia, D., Mullen, K., Peterson, B., and Ulrich, J. (2016). DEoptim: Global Optimization by Differential Evolution. R package version 2.2-4.

Bacon, C. R. (2008). Practical Portfolio Performance Measurement and Attribution: plus CD-ROM. Wiley, Chichester, England; Hoboken, NJ, 2nd edition.

Bendtsen, C. (2012). pso: Particle Swarm Optimization. R package version 1.0.3.

Gubian, S., Xiang, Y., Suomela, B., Hoeng, J., and PMP SA (2018). GenSA: Generalized Simulated Annealing. R package version 1.1.7.

Peterson, B. G. and Carl, P. (2018). PortfolioAnalytics: Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios. R package version 1.1.0.

Würtz, D., Chalabi, Y., Chen, W., and Ellis, A. (2015). Portfolio Optimization with R/Rmetrics. Rmetrics.