R exercises
Dr. Paolo Coletti – Free University of Bolzano Bozen
10 February 2016
Periodically save the instructions you have used to a text file on your Desktop called exam.R, and
periodically save the workspace you are using to file exam.RData on your Desktop. These operations will
help you become familiar with saving files and with the technical problems that can unpredictably occur
when working with a computer program.
Variables and vectors
1. Build these variables or vectors.

Name  Value
E1    173.45
E2    Bozen
E3    FALSE
E4    E1<192
E5    Numbers from 10 to 27
E6    NA
E7    NaN
E8    Inf
E9    Natural logarithm of E5
E10   Vector with 10 names
E11   Logical vector E10="Julia"
E12   Third element of E10 is "Julia"
E13   All elements of E10 except the fourth
E14   All elements of E10 except "Julia" (regardless of where it is)
E16   Square root of (elements of E9 – 3)
E17   E16 without the missing values
E18   With $x$ denoting the elements of E17, it should contain
      $e^{x} + 1 - \left[ x + \frac{(x-1)^2}{2} + \frac{(x-1)^3}{6} \right]$
E19   Elements of E18 which are larger than 0
E20   Numbers from 1 to 20, then from 20 to 1
E21   Elements from E20 with values larger than 15 set to NA
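
Several entries in the table involve the special values NA, NaN and Inf. The sketch below uses a made-up vector v (not one of the exercise variables) to show how these values typically arise and one possible way to drop the missing ones. It is only an illustration under these assumptions, not the intended solution.

```r
# v is a hypothetical vector used only for illustration
v <- c(4, -1, NA, 0)

# sqrt(-1) produces NaN (R also emits a warning, silenced here); sqrt(NA) stays NA
r <- suppressWarnings(sqrt(v))
r                        # 2 NaN NA 0

is.na(r)                 # is.na() is TRUE for both NaN and NA
clean <- r[!is.na(r)]    # keep only the non-missing elements
clean                    # 2 0

1 / 0                    # Inf
log(0)                   # -Inf
```

Note that is.nan() is stricter than is.na(): it is TRUE only for NaN, not for NA.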
Loops
Load workspace basic.RData.
2. Build vector E101 which is example31 – its index + example32 (i.e. 1-1+32, 3-2+38, …).
3. Calculate the sum of the first 10 elements of example31, of the second 10 elements, of the third 10
elements, and so on, putting the results into a vector with 5 elements called E102. Hint: take a piece
of paper and try to do it manually, step by step. Do it really! You will see that you are doing two
loops, one inside the other: the external one is repeated 5 times, while the internal one calculates
the numbers. The difficult part is telling the computer which numbers should be added: doing it
manually is easier because you visualize the “next 10 numbers”. However, in order to tell it to a
computer, we need a precise formula, which you have to invent after having correctly written the two
loops. E102 will be 23 21 30 19 21.
4. Build vector E103 with 30 elements where each element is equal to 2 to the power of its index minus
its index squared (i.e. $2^1-1^2$, $2^2-2^2$, $2^3-3^2$, …).
5. Build vector E104 with the 30 perfect squares starting from 100 (i.e. 100, 121, 144, …). Try to build
it having 100 in position 1 (instead of in position 10, which would be easier), 121 in position 2
(instead of 11), etc.
6. Build vector E105 with the first 30 triangular numbers, i.e. each number is the sum of all the indexes
from 1 up to its index (i.e. 1, 1+2, 1+2+3, 1+2+3+4, …). You probably know that there is a mathematical
formula for it, but the exercise makes sense only if you do not use it. Suppose you do not know that it
exists and let the computer calculate each sum for you.
7. Build vector E106 where each element is the sum of the previous element plus the current index (i.e.
1, 1+2, 3+3, 6+4, 10+5, 15+6, …). Build logical vector E107 with the check whether E105 is equal to
E106.
8. Build vector E108 as long as you like, then consider the following sequence 2:length(your
vector)-1 and discover why it is not what you expect and how to solve the issue.
9. Build vector E109 where, starting from 0 for the first element, each subsequent element is the sum of
the previous element plus twice the current index (i.e. 0, 0+2*2, 4+2*3, 10+2*4, 18+2*5, 28+2*6, …).
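
The hint in exercise 3 describes two nested loops, the outer one moving from block to block and the inner one adding up the numbers of the current block. The sketch below illustrates that structure on a made-up vector v with blocks of 5 elements. Because basic.RData is not available here, v, block and sums are hypothetical names and the data are not those of example31.

```r
# Hypothetical data standing in for example31
v <- 1:20
block <- 5                          # elements per block (the exercise uses 10)
nblocks <- length(v) / block        # 4 blocks in this example
sums <- numeric(nblocks)            # one result per block

for (i in 1:nblocks) {              # outer loop: one pass per block
  total <- 0
  for (j in 1:block) {              # inner loop: the elements inside block i
    # (i - 1) * block is the number of elements that come before block i
    total <- total + v[(i - 1) * block + j]
  }
  sums[i] <- total
}
sums                                # 15 40 65 90
```

The index expression (i - 1) * block + j is one possible form of the "precise formula" mentioned in the hint; other formulations are equally valid.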
Functions
Load workspace basic.RData and, where applicable, reuse the code written for the Loops section.
10. Build function triangular which accepts as input a number N (default value 10) and returns a vector
with the first N triangular numbers. Test that it works.
11. Build function sumUpTo which accepts as input a vector A and a number N (default 1) and returns the
sum of the values of A from 1 to N. Test that it works.
12. Build function equationSolutions which accepts as input a (default 1), b (default 0), c (default 0) and
returns a two-element vector with the two solutions of the equation $ax^2 + bx + c = 0$ (they are
$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$, in the rare event that you do not remember them from high school).
Test that it works with a = 1, b = 0, c = −1 and with a = 1, b = 3, c = 2.
13. Build function GreekPi which accepts as input N (default 3) and returns the approximation for π using
the formula
$\sum_{j=1}^{N} (-1)^{j+1} \frac{4}{2j-1} = \frac{4}{1} - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \frac{4}{9} - \cdots + (-1)^{N+1} \frac{4}{2N-1}$.
Test that it works when N is 5 (result is 4/1-4/3+4/5-4/7+4/9=3.339).
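
As a quick arithmetic check of the value quoted in exercise 13, the five terms for N = 5 can be added up directly. The sketch below is not the requested function; the names j, terms and partial are made up for the illustration.

```r
j <- 1:5                                   # term indices for N = 5
terms <- (-1)^(j + 1) * 4 / (2 * j - 1)    # 4/1, -4/3, 4/5, -4/7, 4/9
partial <- sum(terms)
partial                                    # about 3.3397, quoted as 3.339 in the text
```

The exact value of this partial sum is 1052/315.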
Advanced function exercises
14. The typical trick to check whether number n is even is the expression (n/2)==round(n/2). Build
function oddVector which receives as input two integers a and b and returns a vector containing the
odd numbers in sequence between a and b. For example, oddVector(4,11) returns vector 5, 7, 9, 11.
15. Build function appearing which receives as input two vectors and returns how many elements of the
first vector appear in the second one in any position (not only those in the same position, all of them).
Then provide two meaningful examples to check whether the function works.
16. Build function appears which receives as input variable p and vector a and returns TRUE if p appears in
a, FALSE otherwise.
17. Build function commonVectors which receives as input two vectors of the same length and compares
them position by position, returning a vector with the common elements. Then, build
commonVectors2 which is also able to handle vectors of different lengths, examining only the
positions which exist in both. Hint: trim the longest vector using function length and, if you need it,
min.
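
The even-number trick quoted in exercise 14 works element by element on a whole vector. The sketch below applies it to a made-up range n; the variable names are hypothetical, and the comparison with the remainder operator %% is an added assumption about an equivalent test, not something the exercise asks for.

```r
n <- 4:11                             # hypothetical input range
is_even <- (n / 2) == round(n / 2)    # the trick from exercise 14, applied to each element
n[is_even]                            # 4 6 8 10
n[!is_even]                           # 5 7 9 11

# Assumption: the remainder of the division by 2 gives the same answer
identical(is_even, n %% 2 == 0)       # TRUE
```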
If control
18. Modify function equationSolutions of exercise 12 in such a way that it returns a vector which is
empty in case the equation has no solution ($b^2 - 4ac < 0$), has one element in case the equation has
only one solution ($b^2 - 4ac = 0$), and has two elements in case the equation has two solutions.
19. Modify function GreekPi of exercise 13 in such a way that it checks whether N is larger than 1. In
case it is not, the function returns 3.14.
20. Solve exercise 14 using instead an if control to check which numbers are odd.
21. Solve exercise 16 using if control. Hint: this time it is better to rewrite it than to use function appearing.
22. Build function IRPEF which receives as input the total earn and deduction and calculates a person’s
IRPEF according to this table
− earn-deduction less than 0: 0
− earn-deduction up to 15000 : 23% of earn-deduction
− earn-deduction from 15001 to 28000: 3450 + 27% on the part of earn-deduction which exceeds 15000
− earn-deduction from 28001 to 55000: 6960 + 38% on the part of earn-deduction which exceeds 28000
− earn-deduction from 55001 to 75000: 17220 + 41% on the part of earn-deduction which exceeds
55000
− earn-deduction larger than 75000: 25420 + 43% on the part of earn-deduction which exceeds 75000.
23. Unfortunately Italian IRPEF is more complicated. Build a new function IRPEF2, copying and modifying
the previous code, to consider also the “no tax area”, which follows these rules: calculate the
coefficient (33500-earn)/26000 and round it to 4 decimals; if it is larger than 1, increase the
deduction by 7500; if it is smaller than 0, do not modify the deduction; if it is between 0 and 1,
increase the deduction by 7500 multiplied by the coefficient. Hint: it is probably more natural to solve
this exercise using two ifs, but after having solved it in this way try to solve it using functions
max(0,x) and min(1,x) appropriately instead of the two ifs.
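
The hint in exercise 23 amounts to clamping the coefficient to the interval from 0 to 1. The sketch below shows that idea in isolation; noTaxIncrease is a hypothetical helper name, it handles a single number rather than a vector, and it computes only the increase of the deduction, not the tax itself.

```r
# Hypothetical helper: increase of the deduction given by the "no tax area" rule
noTaxIncrease <- function(earn) {
  coef <- round((33500 - earn) / 26000, 4)   # coefficient rounded to 4 decimals
  coef <- min(1, max(0, coef))               # clamp to [0, 1] without any if
  7500 * coef
}

noTaxIncrease(5000)     # coefficient above 1, so the full 7500
noTaxIncrease(20000)    # coefficient 0.5192, so 7500 * 0.5192 = 3894
noTaxIncrease(40000)    # coefficient below 0, so 0
```

For vector input, pmin and pmax would be the element-wise counterparts of min and max.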
Factors and data frames
Load workspace basic.RData.
24. Using vector example31 build ordered vector E301, using labels XS, S, M, L, XL.
25. Build data frame E302 using the first 5 elements of vectors example04, example05, example06,
example08, example10, example11, assigning names e04, e05, e06, e08, e10, e11 and using character
vector A, B, C, D, E for row names.
26. Attach data frame E302. Display (using a loop and function sumUpTo, not manually!) the sum of the
first one, two, three, four, five values of column e05. Detach data frame E302.
27. Load dataset UScereal from package MASS. Convert all sugars below 10 to NA. Build new vector
logsugar inside dataset UScereal with the logarithm of sugar. Export the dataset into text file
cereals1.txt, using semicolon as separator, with headers, without quoting, using a dash for NA. Export
it to file cereals2.txt using tab as separator, without headers, without quoting, using a dot for NA.
28. Paste this table
x,y,ID,color
8.6,5.6,001,blue
99.3,77.0,002,red
8.01,44.3,003,orange
12.1,42.3,004,red
-0.2,2,005,red
0.8,-31.3,007,blue
into a text file, save it, and import it into dataset E303 with ID column for row names. Colors should be
a factor variable.
29. Build vector E304 going from 7 to 60, vector E305 equal to the exponential of E304 and vector E306
equal to the sum of the previous elements of E304 (use a loop and an appropriate function to build it;
it must be 7, 7+8, 7+8+9, …). Put the three vectors into time series E307 with monthly data from March
2003 to August 2016. Export the time series to text file timeseries.txt with comma delimiter, no
headers, no quoting.
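
The export options requested in exercise 27 (separator, headers, quoting, NA symbol) map onto arguments of write.table. The sketch below illustrates them on a made-up data frame d written to a temporary file, so that nothing is saved on the Desktop. The data, the names d, out and back, and the choice row.names = FALSE are assumptions made for the example.

```r
# Hypothetical small data frame standing in for a real dataset
d <- data.frame(name = c("a", "b", "c"), sugars = c(12, NA, 15))

out <- tempfile(fileext = ".txt")     # temporary file used only for this sketch
write.table(d, file = out,
            sep = ";",                # semicolon as separator
            col.names = TRUE,         # write the header line
            row.names = FALSE,        # assumption: row names are left out
            quote = FALSE,            # no quotes around text
            na = "-")                 # dash for missing values
readLines(out)                        # "name;sugars" "a;12" "b;-" "c;15"

# Reading the file back, telling R that a dash means NA
back <- read.table(out, sep = ";", header = TRUE, na.strings = "-")
is.na(back$sugars[2])                 # TRUE
```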
Data modifications
Load dataset Chile from package “car”
30. Build a new dataset Chile2 selecting only subjects who voted N.
a. Build a new dataset Chile3 filtering out subjects with income larger than 50000.
b. Build a new dataset Chile4 excluding subjects with education P and income larger than 10000 and
excluding also subjects with education PS and income larger than 5000.
c. Build a new dataset Chile5 inserting only all the subjects from region C (regardless of sex and age)
and all the female subjects who are at least 35 years old (regardless of the region).
31. Build a new dataset Chile6 with cases with missing data filtered out.
32. Copy dataset Chile to Chile7. Modify Chile7’s vectors:
a. vote: A and N into “Right”, U, Y, and NA into “Left”;
b. region: C, M and N into “North”, leave the other two unchanged;
c. age: all values below 26 to NA;
d. statusquo: convert all NA to 0. Then all values below -1 get their value with a positive sign;
e. income: all female subjects increase their value by 20%;
f. income: all subjects with education S and population above 200000 must have at least 34000;
g. income2: compute this new vector equal to the logarithm of income;
h. group vector income into ordered vector income_grouped with 5 intervals, using equal count,
calling them “very poor”… “very rich”;
i. bin vector population into factor type with 4 intervals, using K-means clustering, using ranges as
labels.
Load dataset Baumann from package “car”
33. Recode vectors post.test.1, post.test.2, pretest.1 and pretest.2 into factors rec_post.test.1,
rec_post.test.2, rec_pretest.1 and rec_pretest.2 following these rules: 1:3 to "low", 4:6 to "medium",
7:9 to "high", 10:12 to "very high", 13:15 to "excellent" and 16 to "super", without repeating the menu
command four times!
Solution written in light grey: you either use the prefix box in the menu or use the written command
and edit the vectors’ names.
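
Row selections such as those in exercises 30 and 31 combine logical conditions with & (and) and | (or), and missing values need some care. The sketch below shows the idea on a made-up data frame d. Its column names only mimic those of Chile and the values are invented, so it is an illustration rather than a solution.

```r
# Hypothetical data frame; the values are invented
d <- data.frame(region = c("C", "N", "S", "C"),
                sex    = c("M", "F", "F", "F"),
                age    = c(30, 40, 20, NA),
                income = c(15000, 60000, NA, 35000))

# Rows from region C, or females who are at least 35 years old
keep <- d$region == "C" | (d$sex == "F" & d$age >= 35)
keep                          # TRUE TRUE FALSE TRUE
sel <- d[which(keep), ]       # which() also drops rows where the condition is NA

# Rows without any missing value
full <- d[complete.cases(d), ]
nrow(full)                    # 2
```

In the fourth row the age is missing, but the condition is still TRUE because the region alone decides it.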
Graphs
Load dataset Mroz from package “car”, containing data on US women’s labor force. hc is whether husband
went to college, inc is other members of the family’s income, k5 is the number of children below 6, k618 the
number of children above 5, lwg the woman’s income, wc whether woman went to college.
34. Build all the appropriate graphs you know for all the individual variables of the dataset, experimenting
with color changes (coloring all the elements of the same colors as well as coloring all the elements of
different colors) and experimenting with axes’ labels or titles whenever possible.
35. Build all the appropriate graphs you know to depict the relation between the following variables,
experimenting as before:
a. lwg and wc
b. inc and hc
c. lwg and inc
d. k618 and wc
e. k5 and wc
f. inc, lwg and hc.
Others
36. If you have done all the exercises so far and you are sure they are done correctly, you can go through
all R packages and invent suitable exercises. It is very easy to do this for Graphs, but it is a good idea to
invent new exercises also for data modifications.