STATS 782 Lab 2 2023SC

This lab is about R basics as covered in Chapter 1 of the course book. The questions given below are fairly basic exercises. They are intended to help you understand and use R better, so that you are in a better position to solve harder and more complicated problems on your own.

1. Operator Precedences

Without making use of operator precedence, one has to use redundant parentheses, which make the code cumbersome and harder to read for trained eyes. For example,

> (1:5) + ((x * 2) / (-3))

does just the same thing as

> 1:5 + x * 2 / -3

independently of what variable x might store (as long as either expression can be evaluated). Certainly, the latter, simpler expression is preferred.

The following R expressions are some examples that contain redundant parentheses. Try to simplify them by removing all such parentheses (but do not change the expressions). All variables are assumed to have appropriate values for evaluation.

(a) ((x + (1:10)) * (3 / 4)) + 5
(b) (x / (3 * (1:5))) + (2^(1:5))
(c) (x / (y + 10)) - ((-5):(-1))
(d) x^(2^(1:10)) * (y %/% 2)
(e) ((!i) & j) | ((x >= 8) & (y < 5))

To check your answers, you may set, say,

> x = 1:10
> y = 10:1
> i = rep(c(TRUE,FALSE), c(4,6))
> j = rep(c(TRUE,FALSE), each=5)

and run both a given expression and your simplified one. When not sure, try removing one pair of parentheses at a time to see if the result changes.
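A quick way to check such an equivalence in the console is to evaluate both forms and compare them with identical(). The sketch below reuses the worked example from the text; the value assigned to x is only an assumption for illustration. It also shows two precedence rules that often surprise: ^ binds tighter than unary minus, and : binds tighter than binary + and -.

```r
x = 1:5                         # assumed test value; any compatible numeric vector works
a = (1:5) + ((x * 2) / (-3))    # fully parenthesised form
b = 1:5 + x * 2 / -3            # simplified form
identical(a, b)                 # TRUE

-2^2                            # -4: parsed as -(2^2)
n = 4
1:n - 1                         # 0 1 2 3: parsed as (1:n) - 1
1:(n - 1)                       # 1 2 3
```

identical() is strict about type and value, which suits this check because both forms carry out exactly the same operations in the same order.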
Answer:

(a) (x + 1:10) * 3 / 4 + 5
(b) x / (3 * 1:5) + 2^(1:5)
(c) x / (y + 10) - -5:-1
(d) x^2^(1:10) * y %/% 2
(e) !i & j | x >= 8 & y < 5

(a)
> (x + 1:10) * 3 / 4 + 5
 [1]  6.5  8.0  9.5 11.0 12.5 14.0 15.5 17.0 18.5 20.0

(b)
> x / (3 * 1:5) + 2^(1:5)
[1]  2.3333  4.3333  8.3333 16.3333 32.3333  4.0000  5.1667  8.8889
[9] 16.7500 32.6667

(c)
> x / (y + 10) - -5:-1
 [1] 5.0500 4.1053 3.1667 2.2353 1.3125 5.4000 4.5000 3.6154 2.7500
[10] 1.9091

(d)
> x^2^(1:10) * y %/% 2
[1]  5.0000e+00  6.4000e+01  2.6244e+04  1.2885e+10  6.9849e+22
[6]  1.2668e+50 2.9756e+108 1.5525e+231         Inf         NaN

(e)
> !i & j | x >= 8 & y < 5
 [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

2. Subsetting and Logical Values

To generate 500 random values from the standard normal distribution, run

> set.seed(782)
> x = rnorm(500)    # what does this mean?

To find those values that are greater than 2.5,

> x[x > 2.5]
[1] 2.9186 2.7818 2.8383 2.5736 2.5535 2.6083

Write some simple R expressions to do the following about the values saved in x.

(a) Find the values that are less than −2.
(b) Find the values that are less than −2 or greater than 2.
(c) Find the number of values in interval (−1.96, 1.96).
(d) Find the proportion of values that are in interval (−1.96, 1.96). Do not use the sample size 500 directly in your code. Should the proportion be close to 0.95?
(e) Replace all negative values with 0. Then replace all values greater than 2 with 2. Compute the mean of the new sample, which should be about 0.37052.
(f) For Part (e), can you think of one (simple) expression that produces the same new sample? Hint: Consider using pmax() and pmin().
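The hint in Part (f) relies on pmax() and pmin() working element by element, unlike max() and min(), which reduce their arguments to a single value. Parts (c) and (d) rely on logical values being treated as 0 and 1 in arithmetic. A minimal sketch on a small made-up vector v (not the lab's x):

```r
v = c(-1.5, 0.3, 2.7, 1.0)   # made-up values for illustration only

pmax(v, 0)      # elementwise maximum with 0:  0.0 0.3 2.7 1.0
pmin(v, 2)      # elementwise minimum with 2: -1.5 0.3 2.0 1.0
max(v, 0)       # a single value: 2.7

sum(v > 0)      # number of TRUEs: 3
mean(v > 0)     # proportion of TRUEs: 0.75
```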
Answer:

(a)
> x[x < -2]
 [1] -2.4253 -2.2406 -2.3229 -2.0144 -2.0044 -2.3708 -2.2505 -2.2447
 [9] -2.1887 -2.4154 -3.4992 -2.2445 -2.3082 -2.5469 -2.2981 -2.3633

(b)
> x[x < -2 | x > 2]
 [1] -2.4253 -2.2406 -2.3229 -2.0144 -2.0044 -2.3708  2.9186  2.7818
 [9] -2.2505  2.1346 -2.2447 -2.1887 -2.4154 -3.4992 -2.2445 -2.3082
[17]  2.4810  2.8383  2.5736 -2.5469 -2.2981  2.5535 -2.3633  2.6083

(c)
> sum(x > -1.96 & x < 1.96)
[1] 474

(d)
> sum(x > -1.96 & x < 1.96) / length(x)
[1] 0.948
> mean(x > -1.96 & x < 1.96)    # second solution
[1] 0.948

(e)
> x[x < 0] = 0
> x[x > 2] = 2
> mean(x)
[1] 0.37052

(f)
> set.seed(782); x = rnorm(500);    # regenerate the sample
> x = pmin(pmax(x, 0), 2)
> mean(x)
[1] 0.37052

3. Creating Patterned Sequences

Because of the nature of vectorised computation, patterned sequences of numbers are highly useful in R. Such sequences can be created by, e.g., :, seq(), rep() and other operators/functions. For example, to create the sequence 1, 2, ..., 10, we can run

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10

and, to create the sequence 2, 4, 6, ..., 20,

> 1:10 * 2
 [1]  2  4  6  8 10 12 14 16 18 20

or equivalently

> seq(2, 20, by=2)
 [1]  2  4  6  8 10 12 14 16 18 20

Create the following patterned sequences using these operators/functions, but definitely not c() or any explicit loop.

(a) 1 3 5 7 9 11 13 15 17 19
(b) 2.0 2.5 3.0 3.5 4.0 4.5 5.0
(c) 1e-01 2e-02 3e-03 4e-04 5e-05 6e-06
(d) 1 3 6 10 15 21 28
(e) 1 2 3 1 2 3 1 2 3
(f) 1 1 1 2 2 2 3 3 3
(g) "a" "a" "a" "b" "b" "c"

Note: The built-in R variable letters stores the 26 lowercase Roman/English letters. Also have a look at variable LETTERS, which stores the uppercase ones.
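rep() accepts a times argument that may itself be a vector, giving a separate repeat count for each element, and seq() can take length.out in place of by. A small sketch of these building blocks, using values chosen only for illustration (none of them is one of the sequences asked for above):

```r
rep(c("x", "y"), times = c(2, 1))   # "x" "x" "y"
rep(1:2, each = 2, times = 2)       # 1 1 2 2 1 1 2 2
seq(0, 1, length.out = 5)           # 0.00 0.25 0.50 0.75 1.00
LETTERS[1:3]                        # "A" "B" "C"
cumprod(1:5)                        # running product: 1 2 6 24 120
```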
Answer:

(a)
> 1:10 * 2 - 1
 [1]  1  3  5  7  9 11 13 15 17 19
> seq(1, 19, by=2)    # second solution
 [1]  1  3  5  7  9 11 13 15 17 19

(b)
> seq(2, 5, by=0.5)
[1] 2.0 2.5 3.0 3.5 4.0 4.5 5.0

(c)
> 0.1^(1:6) * 1:6
[1] 1e-01 2e-02 3e-03 4e-04 5e-05 6e-06

(d)
> cumsum(1:7)
[1]  1  3  6 10 15 21 28

(e)
> rep(1:3, 3)
[1] 1 2 3 1 2 3 1 2 3

(f)
> rep(1:3, each=3)
[1] 1 1 1 2 2 2 3 3 3

(g)
> letters[rep(1:3, 3:1)]
[1] "a" "a" "a" "b" "b" "c"

4. Computing with Vectors

Typically in R, computing is carried out directly on vectors, rather than resorting to an explicit loop as is often needed in other programming languages. For example, to find the square roots of 1, 2, ..., 10, we can simply run

> sqrt(1:10)
 [1] 1.0000 1.4142 1.7321 2.0000 2.2361 2.4495 2.6458 2.8284 3.0000
[10] 3.1623

Almost all R built-in functions (and operators) are vectorised in this way: a function takes a vector of values as its input and outputs a vector of the computed values, each corresponding to one given in the input vector. Mathematical expressions can thus be evaluated conveniently for all values given in a vector, or vectors. For example, to find the values of $2\sqrt{x + 1} + x$ for x = 1, 2, ..., 10, we can run

> 2 * sqrt(1:10 + 1) + 1:10
[1]  3.8284  5.4641  7.0000  8.4721  9.8990 11.2915 12.6569 14.0000
[9] 15.3246 16.6332

or

> x = 1:10
> 2 * sqrt(x + 1) + x    # just like the mathematical expression
[1]  3.8284  5.4641  7.0000  8.4721  9.8990 11.2915 12.6569 14.0000
[9] 15.3246 16.6332

Use this property to do the following computing (and you should completely avoid using any explicit loop).

(a) Let r = 1.05. Compute $\sum_{j=0}^{n} r^j$, for n = 10. You should consider computing all terms in the sum in one expression, and then sum up all terms.
(b) Re-do Part (a), for n = 20, 30, 40, 50, respectively.
(c) Alternatively, we can use cumsum() to compute $\sum_{j=0}^{n} r^j$, for all n = 0, 1, 2, ..., 50, and then extract the results for n = 10, 20, 30, 40, 50. Do this.
Is this a better solution than that done in Parts (a) and (b), if one is to obtain all five results?

(d) The Stirling numbers of the second kind are defined as

$$S(n, k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n.$$

Calculate S(5, 2) and S(10, 6), which are 15 and 22827, respectively. Note that R functions factorial() and choose() can be used to compute factorials and binomial coefficients, respectively, in a vectorised fashion. For example,

> factorial(0:5)    # k!, for k = 0:5
[1]   1   1   2   6  24 120
> choose(10, 0:5)   # (k choose j), for k = 10, j = 0:5
[1]   1  10  45 120 210 252

Answer:

(a)
> r = 1.05
> sum(r^(0:10))
[1] 14.207

(b)
> sum(r^(0:20))
[1] 35.719
> sum(r^(0:30))
[1] 70.761
> sum(r^(0:40))
[1] 127.84
> sum(r^(0:50))
[1] 220.82

(c)
> cumsum(r^(0:50))[1:5 * 10 + 1]
[1]  14.207  35.719  70.761 127.840 220.815

This is a better solution than what was done in Parts (a) and (b), because there is no duplicated computing here. In Part (b), the computing of $\sum_{j=0}^{10} r^j$ has been repeated four additional times to Part (a), $\sum_{j=11}^{20} r^j$ three additional times, and so on. Extra computing can sometimes be beneficial in R programming when dealing with vectors. However, one should avoid unnecessary computing. Computing time is the factor of concern; more on this later.

(d)
> n = 5     # for S(5, 2)
> k = 2
> j = 0:k
> sum((-1)^(k-j) * choose(k, j) * j^n) / factorial(k)
[1] 15
> n = 10    # for S(10, 6)
> k = 6
> j = 0:k
> sum((-1)^(k-j) * choose(k, j) * j^n) / factorial(k)
[1] 22827

If S(n, k) needs to be computed more than a couple of times, we'd better turn the above code into an R function, which is fairly straightforward (check Chapter 2).
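One possible way to wrap the Part (d) code in a function is sketched below. The name stirling2 is hypothetical (it is not taken from the course book), and the body is simply the expression used above with n and k as arguments. It assumes n and k are single non-negative integers.

```r
# Hypothetical helper: Stirling number of the second kind, S(n, k).
# Assumes n and k are single non-negative integers.
stirling2 = function(n, k) {
  j = 0:k                                   # summation index
  sum((-1)^(k - j) * choose(k, j) * j^n) / factorial(k)
}

stirling2(5, 2)                             # 15
stirling2(10, 6)                            # 22827

# The function handles one (n, k) pair at a time, so sapply() is used
# to get S(5, k) for k = 1, ..., 5: 1 15 25 10 1
sapply(1:5, function(k) stirling2(5, k))
```

Because the sum alternates in sign and is evaluated in floating point, results for large n and k may lose accuracy; the values above are small enough to be computed exactly.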