STATS 782
Lab 2
2023SC
This lab is about R basics as covered in Chapter 1 of the course book. The questions given below
are fairly basic exercises. They are intended to help you understand and use R better so that
you are in a better position to solve harder and more complicated problems on your own.
1. Operator Precedences
Without making use of operator precedence, one has to write redundant parentheses, which make
the code look rather cumbersome and harder to read for trained eyes. For example,
> (1:5) + ((x * 2) / (-3))
does just the same thing as
> 1:5 + x * 2 / -3
independently of what variable x might store (as long as either expression can be evaluated).
Certainly, the latter, simpler expression is preferred.
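A few expressions typed at the R prompt make the precedence rules concrete. This is an illustrative sketch rather than part of the lab; R's full precedence table is in the `?Syntax` help page.

```r
# ^ binds tighter than unary minus
-2^2        # -4, i.e. -(2^2)
(-2)^2      # 4
# unary minus binds tighter than :
-1:3        # -1 0 1 2 3, i.e. (-1):3
-(1:3)      # -1 -2 -3
# : binds tighter than binary arithmetic
1:3 - 1     # 0 1 2, i.e. (1:3) - 1
1:(3 - 1)   # 1 2
# ^ is right-associative
2^3^2       # 512, i.e. 2^(3^2)
```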
The following R expressions contain redundant parentheses. Simplify them by removing all such
parentheses, without changing what the expressions compute. All variables
are assumed to have appropriate values for evaluation.
(a) ((x + (1:10)) * (3 / 4)) + 5
(b) (x / (3 * (1:5))) + (2^(1:5))
(c) (x / (y + 10)) - ((-5):(-1))
(d) x^(2^(1:10)) * (y %/% 2)
(e) ((!i) & j) | ((x >= 8) & (y < 5))
To check your answers, you may set, say,
> x = 1:10
> y = 10:1
> i = rep(c(TRUE,FALSE), c(4,6))
> j = rep(c(TRUE,FALSE), each=5)
and run both the given expression and your simplified one. If unsure, remove one pair of
parentheses at a time and check whether the result changes.
Answer:
(a) (x + 1:10) * 3 / 4 + 5
(b) x / (3 * 1:5) + 2^(1:5)
(c) x / (y + 10) - -5:-1
(d) x^2^(1:10) * y %/% 2
(e) !i & j | x >= 8 & y < 5
(a)
> (x + 1:10) * 3 / 4 + 5
[1] 6.5 8.0 9.5 11.0 12.5 14.0 15.5 17.0 18.5 20.0
(b)
> x / (3 * 1:5) + 2^(1:5)
[1] 2.3333 4.3333 8.3333 16.3333 32.3333 4.0000 5.1667 8.8889
[9] 16.7500 32.6667
(c)
> x / (y + 10) - -5:-1
[1] 5.0500 4.1053 3.1667 2.2353 1.3125 5.4000 4.5000 3.6154 2.7500
[10] 1.9091
(d)
> x^2^(1:10) * y %/% 2
[1] 5.0000e+00 6.4000e+01 2.6244e+04 1.2885e+10 6.9849e+22
[6] 1.2668e+50 2.9756e+108 1.5525e+231 Inf NaN
(e)
> !i & j | x >= 8 & y < 5
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE TRUE
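Instead of comparing printed outputs by eye, all five simplifications can be checked in one go. This is a sketch using the test values suggested above; the names `checks` and `same` are arbitrary.

```r
# Test values suggested in the lab
x = 1:10
y = 10:1
i = rep(c(TRUE, FALSE), c(4, 6))
j = rep(c(TRUE, FALSE), each = 5)

# Each element pairs an original (parenthesised) expression with its simplified form
checks = list(
  a = list(((x + (1:10)) * (3 / 4)) + 5,      (x + 1:10) * 3 / 4 + 5),
  b = list((x / (3 * (1:5))) + (2^(1:5)),     x / (3 * 1:5) + 2^(1:5)),
  c = list((x / (y + 10)) - ((-5):(-1)),      x / (y + 10) - -5:-1),
  d = list(x^(2^(1:10)) * (y %/% 2),          x^2^(1:10) * y %/% 2),
  e = list(((!i) & j) | ((x >= 8) & (y < 5)), !i & j | x >= 8 & y < 5)
)

# all.equal() tolerates tiny floating-point differences, and matching
# Inf/NaN entries (as in part (d)) count as equal
same = sapply(checks, function(p) isTRUE(all.equal(p[[1]], p[[2]])))
same   # named logical vector, one entry per part
```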
2. Subsetting and Logical Values
To generate 500 random values from the standard normal distribution, run
> set.seed(782)   # what does this mean?
> x = rnorm(500)
To find those values that are greater than 2.5,
> x[x > 2.5]
[1] 2.9186 2.7818 2.8383 2.5736 2.5535 2.6083
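`set.seed()` fixes the starting state of R's random number generator, so the same "random" values are produced each time the code is run (with the same R version and RNG settings); this is what makes the outputs in this section reproducible. A minimal check, where the names `a`, `b` and `c2` are arbitrary:

```r
set.seed(782)
a = rnorm(5)
set.seed(782)
b = rnorm(5)      # same seed, so the same five values as a
identical(a, b)   # TRUE
c2 = rnorm(5)     # no reset: the generator moves on and gives new values
identical(a, c2)  # FALSE
```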
Write some simple R expressions to do the following about the values saved in x.
(a) Find the values that are less than −2.
(b) Find the values that are less than −2 or greater than 2.
(c) Find the number of values in the interval (−1.96, 1.96).
(d) Find the proportion of values that are in the interval (−1.96, 1.96). Do not use the sample
size 500 directly in your code. Should the proportion be close to 0.95?
(e) Replace all negative values with 0. Then replace all values greater than 2 with 2. Compute
the mean of the new sample, which should be about 0.37052.
(f) For Part (e), can you think of one (simple) expression that produces the same new sample?
Hint: Consider using pmax() and pmin().
Answer:
(a)
> x[x < -2]
[1] -2.4253 -2.2406 -2.3229 -2.0144 -2.0044 -2.3708 -2.2505 -2.2447
[9] -2.1887 -2.4154 -3.4992 -2.2445 -2.3082 -2.5469 -2.2981 -2.3633
(b)
> x[x < -2 | x > 2]
[1] -2.4253 -2.2406 -2.3229 -2.0144 -2.0044 -2.3708 2.9186 2.7818
[9] -2.2505 2.1346 -2.2447 -2.1887 -2.4154 -3.4992 -2.2445 -2.3082
[17] 2.4810 2.8383 2.5736 -2.5469 -2.2981 2.5535 -2.3633 2.6083
(c)
> sum(x > -1.96 & x < 1.96)
[1] 474
(d)
> sum(x > -1.96 & x < 1.96) / length(x)
[1] 0.948
> mean(x > -1.96 & x < 1.96)   # second solution
[1] 0.948
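For the interval questions, `abs()` gives a shorter condition, and `pnorm()` gives the theoretical probability that a standard normal value lies in (−1.96, 1.96), which is why 0.948 should indeed be close to 0.95. A sketch; `inside` and `outside2` are arbitrary names:

```r
set.seed(782)
x = rnorm(500)
inside = abs(x) < 1.96          # same as x > -1.96 & x < 1.96
outside2 = x[abs(x) > 2]        # same values as x[x < -2 | x > 2]
mean(inside)                    # sample proportion in (-1.96, 1.96)
pnorm(1.96) - pnorm(-1.96)      # theoretical probability, about 0.95
```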
(e)
> x[x < 0] = 0
> x[x > 2] = 2
> mean(x)
[1] 0.37052
(f)
> set.seed(782); x = rnorm(500)   # regenerate the sample
> x = pmin(pmax(x, 0), 2)
> mean(x)
[1] 0.37052
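To confirm that Parts (e) and (f) really produce the same sample, not just the same mean, both versions can be built from one draw and compared. A sketch; `y1` and `y2` are arbitrary names:

```r
set.seed(782)
x = rnorm(500)
y1 = x
y1[y1 < 0] = 0                 # Part (e), step 1
y1[y1 > 2] = 2                 # Part (e), step 2
y2 = pmin(pmax(x, 0), 2)       # Part (f): clamp to [0, 2] in one expression
identical(y1, y2)              # TRUE
```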
3. Creating Patterned Sequences
Because of the nature of vectorised computation, patterned sequences of numbers are highly useful in R. Such sequences can be created by, e.g., :, seq(), rep() and other operators/functions.
For example, to create the sequence 1, 2, . . . , 10, we can run
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
and, to create the sequence 2, 4, 6, . . . , 20,
> 1:10 * 2
[1] 2 4 6 8 10 12 14 16 18 20
or equivalently
> seq(2, 20, by=2)
[1] 2 4 6 8 10 12 14 16 18 20
Create the following patterned sequences using these operators/functions, but definitely not
c() or any explicit loop.
(a) 1 3 5 7 9 11 13 15 17 19
(b) 2.0 2.5 3.0 3.5 4.0 4.5 5.0
(c) 1e-01 2e-02 3e-03 4e-04 5e-05 6e-06
(d) 1 3 6 10 15 21 28
(e) 1 2 3 1 2 3 1 2 3
(f) 1 1 1 2 2 2 3 3 3
(g) "a" "a" "a" "b" "b" "c"
Note: The built-in R variable letters stores the 26 lowercase Roman/English letters. Also
have a look at variable LETTERS which stores the uppercase ones.
Answer:
(a)
> 1:10 * 2 - 1
[1] 1 3 5 7 9 11 13 15 17 19
> seq(1, 19, by=2)   # second solution
[1] 1 3 5 7 9 11 13 15 17 19
(b)
> seq(2, 5, by=0.5)
[1] 2.0 2.5 3.0 3.5 4.0 4.5 5.0
(c)
> 0.1^(1:6) * 1:6
[1] 1e-01 2e-02 3e-03 4e-04 5e-05 6e-06
(d)
> cumsum(1:7)
[1] 1 3 6 10 15 21 28
(e)
> rep(1:3, 3)
[1] 1 2 3 1 2 3 1 2 3
(f)
> rep(1:3, each=3)
[1] 1 1 1 2 2 2 3 3 3
(g)
> letters[rep(1:3, 3:1)]
[1] "a" "a" "a" "b" "b" "c"
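Several of these sequences have other equally valid constructions. The following are alternatives only, not the lab's official answers; the variable names are arbitrary.

```r
a = seq(1, by = 2, length.out = 10)    # (a) start, step and length instead of an end point
c3 = 1:6 / 10^(1:6)                    # (c) k / 10^k, written as a division
d = choose(2:8, 2)                     # (d) the n-th triangular number 1 + ... + n is choose(n + 1, 2)
g = rep(letters[1:3], times = 3:1)     # (g) repeat the letters themselves instead of indexing
a; c3; d; g
```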
4. Computing with Vectors
Typically in R, computation is carried out directly on vectors, rather than through an explicit
loop as is often needed in other programming languages. For example, to find the square roots
of 1, 2, . . . , 10, we can simply run
> sqrt(1:10)
[1] 1.0000 1.4142 1.7321 2.0000 2.2361 2.4495 2.6458 2.8284 3.0000
[10] 3.1623
Almost all built-in R functions (and operators) are vectorised in this sense: a function takes a
vector of values as its input and outputs a vector of computed values, each corresponding to one
element of the input vector. Mathematical expressions can thus be evaluated conveniently for all
values given in a vector, or in several vectors. For example, to find the values of
$2\sqrt{x + 1} + x$ for x = 1, 2, . . . , 10, we can run
> 2 * sqrt(1:10 + 1) + 1:10
[1] 3.8284 5.4641 7.0000 8.4721 9.8990 11.2915 12.6569 14.0000
[9] 15.3246 16.6332
or
> x = 1:10
> 2 * sqrt(x + 1) + x   # just like the mathematical expression
[1] 3.8284 5.4641 7.0000 8.4721 9.8990 11.2915 12.6569 14.0000
[9] 15.3246 16.6332
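When an expression combines vectors of different lengths, R recycles the shorter one; this is what happens with `3 * 1:5` and `2^(1:5)` against the length-10 `x` in Question 1(b). A minimal sketch, with arbitrary names:

```r
x = 1:6
y = c(10, 20)
x + y    # 11 22 13 24 15 26: y is reused as 10 20 10 20 10 20
# If the longer length is not a multiple of the shorter one,
# R still recycles but also signals a warning; capture its message
msg = tryCatch(1:5 + 1:2, warning = function(w) conditionMessage(w))
msg
```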
Use this property to do the following computations (and completely avoid using any
explicit loop).
(a) Let r = 1.05. Compute $\sum_{j=0}^{n} r^j$ for n = 10. Consider computing all terms of
the sum in one expression, and then summing them up.
(b) Re-do Part (a) for n = 20, 30, 40, 50, respectively.
(c) Alternatively, we can use cumsum() to compute $\sum_{j=0}^{n} r^j$ for all n = 0, 1, 2, . . . , 50, and
then extract the results for n = 10, 20, 30, 40, 50. Do this. Is this a better solution than
that of Parts (a) and (b), if one is to obtain all five results?
(d) The Stirling numbers of the second kind are defined as
$$S(n, k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n.$$
Calculate S(5, 2) and S(10, 6), which are 15 and 22827, respectively. Note that R functions
factorial() and choose() can be used to compute factorials and binomial coefficients,
respectively, in a vectorised fashion. For example,
> factorial(0:5)    # k!, for k = 0:5
[1] 1 1 2 6 24 120
> choose(10, 0:5)   # (k choose j), for k = 10, j = 0:5
[1] 1 10 45 120 210 252
Answer:
(a)
> r = 1.05
> sum(r^(0:10))
[1] 14.207
(b)
> sum(r^(0:20))
[1] 35.719
> sum(r^(0:30))
[1] 70.761
> sum(r^(0:40))
[1] 127.84
> sum(r^(0:50))
[1] 220.82
(c)
> cumsum(r^(0:50))[1:5 * 10 + 1]
[1] 14.207 35.719 70.761 127.840 220.815
This is a better solution than what has been done in Parts (a) and (b), because there is no
duplicated computing here. In Part (b), the computation of $\sum_{j=0}^{10} r^j$ has been repeated
four additional times relative to Part (a), that of $\sum_{j=11}^{20} r^j$ three additional times, and so on.
Extra computing can sometimes be beneficial in R programming when dealing with vectors, but
unnecessary computing should be avoided. Computing time is the main concern; more on this later.
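As a cross-check, the standard geometric-series identity $\sum_{j=0}^{n} r^j = (r^{n+1} - 1)/(r - 1)$ for $r \neq 1$ gives all five results in one vectorised expression without summing any terms. This is a sketch for verification, not something the lab asks for; `closed` and `cs` are arbitrary names.

```r
r = 1.05
n = 1:5 * 10                          # n = 10, 20, 30, 40, 50
closed = (r^(n + 1) - 1) / (r - 1)    # closed form, vectorised over n
cs = cumsum(r^(0:50))[n + 1]          # Part (c): element n + 1 holds the sum up to j = n
closed
all.equal(closed, cs)                 # TRUE, up to floating-point tolerance
```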
(d)
> n = 5    # for S(5, 2)
> k = 2
> j = 0:k
> sum((-1)^(k-j) * choose(k, j) * j^n) / factorial(k)
[1] 15
> n = 10   # for S(10, 6)
> k = 6
> j = 0:k
> sum((-1)^(k-j) * choose(k, j) * j^n) / factorial(k)
[1] 22827
If S(n, k) needs to be computed more than a couple of times, it is better to turn the above
code into an R function, which is fairly straightforward (see Chapter 2).
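One way to wrap the Part (d) code in a function is sketched below. The name `stirling2` is hypothetical; the body is the same vectorised expression used above.

```r
# stirling2() is a hypothetical name; the body is the expression from Part (d)
stirling2 = function(n, k) {
  j = 0:k
  sum((-1)^(k - j) * choose(k, j) * j^n) / factorial(k)
}

stirling2(5, 2)                 # 15
stirling2(10, 6)                # 22827
sapply(1:5, stirling2, n = 5)   # S(5, k) for k = 1, ..., 5: 1 15 25 10 1
```

Because the terms alternate in sign, this direct formula can lose accuracy in double precision when n is large; the cases here involve only integers small enough to be represented exactly.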