R exercises
Dr. Paolo Coletti – Free University of Bolzano Bozen
10 February 2016

Periodically save the instructions you use to a text file called exam.R on your Desktop, and periodically save the workspace you are using to the file exam.RData on your Desktop. These operations will help you get familiar with saving files and with all the technical problems that can unpredictably happen when working with a computer program.

Variables and vectors
1. Build these variables or vectors.
Name   Value
E1     173.45
E2     Bozen
E3     FALSE
E4     E1<192
E5     Numbers from 10 to 27
E6     NA
E7     NaN
E8     Inf
E9     Natural logarithm of E5
E10    Vector with 10 names
E11    Logical vector E10=="Julia"
E12    Third element of E10 is "Julia"
E13    All elements of E10 except the fourth
E14    All elements of E10 except "Julia" (regardless of where it is)
E16    Square root of (elements of E9 - 3)
E17    E16 without the missing values
E18    Letting x be the elements of E17, it should contain $e^{x} + 1 - \left[x + \frac{(x-1)^2}{2} + \frac{(x-1)^3}{6}\right]$
E19    Elements of E18 which are larger than 0
E20    Numbers from 1 to 20, then from 20 to 1
E21    Elements of E20 with values larger than 15 set to NA

Loops
Load workspace basic.RData.
2. Build vector E101 which is example31 minus its index plus example32 (i.e. 1-1+32, 3-2+38, …).
3. Calculate the sum of the first 10 elements of example31, of the second 10 elements, of the third 10 elements, and so on, putting the results into a vector with 5 elements called E102.
Hint: take a piece of paper and try to do it manually, step by step. Really do it! You will see that you are doing two loops, one inside the other: the outer one is repeated 5 times, while the inner one calculates the sum. The difficult part is telling the computer which numbers should be added: doing it manually is easier because you can see the "next 10 numbers", but to tell it to a computer you need a precise formula, which you have to invent after having written the two loops correctly. E102 will be 23 21 30 19 21.
4.
Build vector E103 with 30 elements, where each element is equal to 2 to the power of its index minus its index squared (i.e. $2^1-1^2$, $2^2-2^2$, $2^3-3^2$, …).
5. Build vector E104 with the 30 perfect squares starting from 100 (i.e. 100, 121, 144, …). Try to build it with 100 in position 1 (instead of position 10, which would be easier), 121 in position 2 (instead of 11), and so on.
6. Build vector E105 with the first 30 triangular numbers, i.e. each number is the sum of all the indexes from 1 up to its own index (1, 1+2, 1+2+3, 1+2+3+4, …). You probably know that there is a mathematical formula for it, but the exercise only makes sense if you do not use it: pretend it does not exist and let the computer calculate each sum for you.
7. Build vector E106 where each element is the sum of the previous element plus the current index (i.e. 1, 1+2, 3+3, 6+4, 10+5, 15+6, …). Build logical vector E107 checking whether E105 is equal to E106.
8. Build vector E108 of any length you like, then consider the sequence 2:length(E108)-1: discover why it is not what you expect and how to solve the issue.
9. Build vector E109 where, starting from 0 for the first element, each subsequent element is the sum of the previous element plus twice the current index (i.e. 0, 0+2*2, 4+2*3, 10+2*4, 18+2*5, 28+2*6, …).

Functions
Load workspace basic.RData and, where appropriate, reuse the code written for the Loops section.
10. Build function triangular which accepts as input a number N (default value 10) and returns a vector with the first N triangular numbers. Test that it works.
11. Build function sumUpTo which accepts as input a vector A and a number N (default 1) and returns the sum of the values of A from position 1 to position N. Test that it works.
12.
Build function equationSolutions which accepts as input a (default 1), b (default 0) and c (default 0) and returns a two-element vector with the two solutions of the equation $ax^2 + bx + c = 0$ (they are $\frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$, in the rare event that you do not remember them from high school). Test that it works with a = 1, b = 0, c = -1 and with a = 1, b = 3, c = 2.
13. Build function GreekPi which accepts as input N (default 3) and returns the approximation of π given by the formula $\sum_{j=1}^{N} (-1)^{j+1} \frac{4}{2j-1} = \frac{4}{1} - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \frac{4}{9} - \cdots + (-1)^{N+1} \frac{4}{2N-1}$. Test that it works with N = 5 (the result is 4/1-4/3+4/5-4/7+4/9=3.339).

Advanced function exercises
14. The typical trick to check whether a number n is even is the expression (n/2)==round(n/2). Build function oddVector which receives as input two integers a and b and returns a vector containing the odd numbers, in sequence, between a and b. For example, oddVector(4,11) returns the vector 5, 7, 9, 11.
15. Build function appearing which receives as input two vectors and returns how many elements of the first vector appear in the second one in any position (not only in the same position). Then provide two meaningful examples to check whether the function works.
16. Build function appears which receives as input a variable p and a vector a and returns TRUE if p appears in a, FALSE otherwise.
17. Build function commonVectors which receives as input two vectors of the same length and compares them position by position, returning a vector with the common elements. Then build commonVectors2, which can also handle vectors of different lengths, examining only the positions that exist in both. Hint: trim the longer vector using function length and, if you need it, min.

If control
18.
Modify function equationSolutions of exercise 12 so that it returns a vector that is empty if the equation has no solution ($b^2 - 4ac < 0$), has one element if the equation has only one solution ($b^2 - 4ac = 0$), and has two elements if the equation has two solutions.
19. Modify function GreekPi of exercise 13 so that it checks that N is larger than 1. If it is not, the function returns 3.14.
20. Solve exercise 14 again, this time using an if control to check which numbers are odd.
21. Solve exercise 16 using an if control. Hint: this time it is better to rewrite the function from scratch than to use function appearing.
22. Build function IRPEF which receives as input the total earnings (earn) and the deduction and calculates a person's IRPEF according to this table:
- earn - deduction less than 0: 0
- earn - deduction up to 15000: 23% of earn - deduction
- earn - deduction from 15001 to 28000: 3450 + 27% of the part of earn - deduction exceeding 15000
- earn - deduction from 28001 to 55000: 6960 + 38% of the part of earn - deduction exceeding 28000
- earn - deduction from 55001 to 75000: 17220 + 41% of the part of earn - deduction exceeding 55000
- earn - deduction larger than 75000: 25420 + 43% of the part of earn - deduction exceeding 75000
23. Unfortunately, Italian IRPEF is more complicated. Build a new function IRPEF2 by copying and modifying the previous code so that it also considers the "no tax area", which follows these rules: calculate the coefficient (33500-earn)/26000 and round it to 4 decimals; if it is larger than 1, increase the deduction by 7500; if it is smaller than 0, do not modify the deduction; if it is between 0 and 1, increase the deduction by 7500 multiplied by the coefficient. Hint: it is probably more natural to solve this exercise using two ifs, but after solving it that way, try to solve it by appropriately using functions max(0,x) and min(1,x) instead of the two ifs.

Factors and data frames
Load workspace basic.RData.
24.
Using vector example31, build ordered factor E301 with labels XS, S, M, L, XL.
25. Build data frame E302 using the first 5 elements of vectors example04, example05, example06, example08, example10, example11, assigning names e04, e05, e06, e08, e10, e11 and using the character vector A, B, C, D, E as row names.
26. Attach data frame E302. Display (using a loop and function sumUpTo, not manually!) the sum of the first one, two, three, four and five values of column e05. Detach data frame E302.
27. Load dataset UScereal from package MASS. Convert all values of sugars below 10 to NA. Build a new vector logsugar inside dataset UScereal with the logarithm of sugars. Export the dataset to text file cereals1.txt using semicolon as separator, with headers, without quoting, and using a dash for NA. Export it to file cereals2.txt using tab as separator, without headers, without quoting, and using a dot for NA.
28. Paste this table into a text file, save it, and import it into dataset E303 using the ID column for row names. Colors should be a factor variable.
x,y,ID,color
8.6,5.6,001,blue
99.3,77.0,002,red
8.01,44.3,003,orange
12.1,42.3,004,red
-0.2,2,005,red
0.8,-31.3,007,blue
29. Build vector E304 going from 7 to 60, vector E305 equal to the exponential of E304, and vector E306 in which each element is the sum of the elements of E304 up to that position (use a loop and an appropriate function to build it; it must be 7, 7+8, 7+8+9, …). Put the three vectors into time series E307 with monthly data from March 2003 to August 2016. Export the time series to text file timeseries.txt with comma delimiter, no headers, no quoting.

Data modifications
Load dataset Chile from package "car".
30. Build a new dataset Chile2 selecting only subjects who voted N.
a. Build a new dataset Chile3 filtering out subjects with income larger than 50000.
b. Build a new dataset Chile4 excluding subjects with education P and income larger than 10000, and also excluding subjects with education PS and income larger than 5000.
c.
Build a new dataset Chile5 including all the subjects from region C (regardless of sex and age) and all the female subjects who are at least 35 years old (regardless of region).
31. Build a new dataset Chile6 with the cases with missing data filtered out.
32. Copy dataset Chile to Chile7. Modify Chile7's vectors:
a. vote: A and N into "Right"; U, Y and NA into "Left";
b. region: C, M and N into "North", leaving the other two unchanged;
c. age: all values below 26 to NA;
d. statusquo: convert all NA to 0, then make all values below -1 positive (change their sign);
e. income: increase the value of all female subjects by 20%;
f. income: all subjects with education S and population above 200000 must have at least 34000;
g. income2: compute this new vector equal to the logarithm of income;
h. group vector income into ordered factor income_grouped with 5 intervals, using equal count, calling them "very poor" … "very rich";
i. bin vector population into a factor with 4 intervals, using K-means clustering and using ranges as labels.
Load dataset Baumann from package "car".
33. Recode vectors post.test.1, post.test.2, pretest.1 and pretest.2 into factors rec_post.test.1, rec_post.test.2, rec_pretest.1 and rec_pretest.2 following these rules: 1:3 to "low", 4:6 to "medium", 7:9 to "high", 10:12 to "very high", 13:15 to "excellent" and 16 to "super", without repeating the menu command four times! Solution written in light grey: either use the prefix box in the menu, or use the written command and edit the vectors' names.

Graphs
Load dataset Mroz from package "car", containing data on US women's labor force participation. hc is whether the husband went to college, inc is the family income excluding the woman's income, k5 is the number of children below 6, k618 the number of children aged 6 to 18, lwg the logarithm of the woman's (expected) wage rate, and wc whether the woman went to college.
34.
Build all the appropriate graphs you know for each individual variable of the dataset, experimenting with colors (coloring all the elements with the same color as well as coloring each element with a different color) and with axis labels and titles whenever possible.
35. Build all the appropriate graphs you know to depict the relation between the following variables, experimenting as before:
a. lwg and wc
b. inc and hc
c. lwg and inc
d. k618 and wc
e. k5 and wc
f. inc, lwg and hc

Others
36. If you have done all the exercises so far and you are sure they are correct, you can go through all the R packages and invent suitable exercises. It is very easy to do this for Graphs, but it is a good idea to invent new exercises for Data modifications as well.
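For exercise 3, here is one possible sketch of the two nested loops. The workspace basic.RData is not reproduced here, so example31 is replaced by a hypothetical stand-in (the numbers 1 to 50). With the real workspace loaded, drop that line and E102 should come out as 23 21 30 19 21.

```r
# Hypothetical stand-in for example31; the real vector comes from basic.RData
example31 <- 1:50

E102 <- numeric(5)
for (block in 1:5) {                  # outer loop: one pass per group of 10
  total <- 0
  for (j in 1:10) {                   # inner loop: the 10 elements of the group
    index <- (block - 1) * 10 + j     # position of the j-th element of this group
    total <- total + example31[index]
  }
  E102[block] <- total
}
print(E102)   # 55 155 255 355 455 for the stand-in vector
```

The key formula is (block - 1) * 10 + j, which maps the j-th element of a group to its position in example31. Once the loops make sense, colSums(matrix(example31[1:50], nrow = 10)) gives the same result in one line, because matrix() fills its columns with consecutive groups of 10.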
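For exercise 9, once you have tried it yourself, the surprise comes from operator precedence. In R the sequence operator : binds more tightly than binary minus, so 2:length(E108)-1 is read as (2:length(E108))-1. Here is a minimal sketch with an arbitrary six-element E108:

```r
E108 <- c(4, 8, 15, 16, 23, 42)     # any vector will do; this one has 6 elements

wrong <- 2:length(E108) - 1         # parsed as (2:6) - 1
right <- 2:(length(E108) - 1)       # parentheses make the subtraction happen first

print(wrong)   # 1 2 3 4 5
print(right)   # 2 3 4 5
```

Writing seq(2, length(E108) - 1) is an equivalent fix.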
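For exercises 10 and 11, here is a sketch that reuses the loop idea from the Loops section instead of the closed formula. It uses seq_len(N) rather than 1:N, so that N = 0 gives an empty loop instead of looping over c(1, 0).

```r
# Exercise 10: the first N triangular numbers, each computed by an explicit sum
triangular <- function(N = 10) {
  result <- numeric(N)
  for (i in seq_len(N)) {
    total <- 0
    for (k in seq_len(i)) total <- total + k   # 1 + 2 + ... + i
    result[i] <- total
  }
  result
}

# Exercise 11: sum of the elements of A in positions 1 to N
sumUpTo <- function(A, N = 1) {
  total <- 0
  for (i in seq_len(N)) total <- total + A[i]
  total
}

print(triangular(5))            # 1 3 6 10 15
print(sumUpTo(c(2, 4, 6), 2))   # 6
```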
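For exercises 12 and 18, here is a sketch of the version with the if control. The length of the result depends on the sign of the discriminant $b^2 - 4ac$. One parameter is named c, but the call c(...) inside the function still works, because R skips non-function objects when looking up a function name.

```r
equationSolutions <- function(a = 1, b = 0, c = 0) {
  delta <- b^2 - 4 * a * c          # discriminant
  if (delta < 0) {
    numeric(0)                      # no real solution: empty vector
  } else if (delta == 0) {
    -b / (2 * a)                    # one solution
  } else {
    c((-b + sqrt(delta)) / (2 * a), # two solutions
      (-b - sqrt(delta)) / (2 * a))
  }
}

print(equationSolutions(1, 0, -1))  # 1 -1
print(equationSolutions(1, 3, 2))   # -1 -2
print(equationSolutions(1, 0, 1))   # numeric(0)
```

The plain version of exercise 12 keeps only the last branch without the checks. With a negative discriminant, sqrt() then returns NaN (with a warning) instead of an empty vector.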
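For exercises 13 and 19, here is a sketch that adds the terms $(-1)^{j+1} \frac{4}{2j-1}$ one at a time. As exercise 19 requires, it returns 3.14 when N is not larger than 1.

```r
GreekPi <- function(N = 3) {
  if (N <= 1) return(3.14)                  # exercise 19: N must be larger than 1
  total <- 0
  for (j in 1:N) {
    total <- total + (-1)^(j + 1) * 4 / (2 * j - 1)
  }
  total
}

print(GreekPi(5))   # about 3.3397
```

Without the loop, j <- 1:N followed by sum((-1)^(j + 1) * 4 / (2 * j - 1)) computes the same series. Because the series alternates, the error after N terms is at most the first omitted term, 4/(2N+1), so convergence is slow.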
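For exercises 14 and 20, here are two sketches. The first uses the evenness trick from exercise 14 as a logical index. The second walks through the numbers with an if control. The name oddVector2 is only a hypothetical label for the second variant.

```r
# Exercise 14: keep the numbers for which (n/2)==round(n/2) is FALSE, no if needed
oddVector <- function(a, b) {
  x <- a:b
  x[(x / 2) != round(x / 2)]
}

# Exercise 20: same result with a loop and an if control (hypothetical name)
oddVector2 <- function(a, b) {
  result <- c()
  for (n in a:b) {
    if ((n / 2) != round(n / 2)) result <- c(result, n)   # n is odd
  }
  result
}

print(oddVector(4, 11))    # 5 7 9 11
print(oddVector2(4, 11))   # 5 7 9 11
```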
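For exercises 15 to 17, here is a compact sketch based on the %in% operator and on logical indexing. It assumes that each element of the first vector is counted every time it occurs, including repeats. If you have not met %in% yet, a loop over the first vector with an any() check on the second gives the same count.

```r
# Exercise 15: how many elements of v1 appear anywhere in v2
appearing <- function(v1, v2) {
  sum(v1 %in% v2)          # %in% gives TRUE/FALSE for each element of v1
}

# Exercise 16: does p appear in a? Re-uses appearing()
appears <- function(p, a) {
  appearing(p, a) > 0
}

# Exercise 17: common elements, compared position by position
commonVectors <- function(v1, v2) {
  v1[v1 == v2]
}

# Exercise 17, second part: trim both vectors to the shorter length first
commonVectors2 <- function(v1, v2) {
  n <- min(length(v1), length(v2))
  commonVectors(v1[seq_len(n)], v2[seq_len(n)])
}

print(appearing(c(1, 2, 3, 4), c(4, 3, 9)))          # 2
print(appears(3, c(1, 2, 3)))                        # TRUE
print(commonVectors2(c(1, 2, 3, 4, 5), c(1, 0, 3)))  # 1 3
```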
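For exercises 22 and 23, here is a sketch of the bracket table as an if/else if chain, followed by the max/min version of the no-tax-area adjustment suggested in the hint. The brackets use <= on the upper bounds. This is an assumption that also covers non-integer amounts between, say, 15000 and 15001; since the formulas agree at each boundary, the result is the same. This IRPEF2 calls IRPEF instead of copying its code, which is a small deviation from the exercise text.

```r
# Exercise 22: IRPEF from the bracket table
IRPEF <- function(earn, deduction) {
  taxable <- earn - deduction
  if (taxable < 0) {
    0
  } else if (taxable <= 15000) {
    0.23 * taxable
  } else if (taxable <= 28000) {
    3450 + 0.27 * (taxable - 15000)
  } else if (taxable <= 55000) {
    6960 + 0.38 * (taxable - 28000)
  } else if (taxable <= 75000) {
    17220 + 0.41 * (taxable - 55000)
  } else {
    25420 + 0.43 * (taxable - 75000)
  }
}

# Exercise 23: "no tax area" adjustment, clamping the coefficient to [0, 1]
IRPEF2 <- function(earn, deduction) {
  coefficient <- round((33500 - earn) / 26000, 4)
  coefficient <- max(0, min(1, coefficient))   # replaces the two ifs
  IRPEF(earn, deduction + 7500 * coefficient)
}

print(IRPEF(20000, 0))    # 4800
print(IRPEF2(20500, 0))   # 3922.5
```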
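For exercise 28, here is a sketch of the import. As an assumption, it writes the table to a temporary file from tempfile() instead of a file saved on the Desktop. Without colClasses, read.csv would turn 001 into the number 1, and the row names would lose their leading zeros.

```r
# The table from exercise 28, saved to a temporary text file
csvText <- "x,y,ID,color
8.6,5.6,001,blue
99.3,77.0,002,red
8.01,44.3,003,orange
12.1,42.3,004,red
-0.2,2,005,red
0.8,-31.3,007,blue"
path <- tempfile(fileext = ".txt")
writeLines(csvText, path)

E303 <- read.csv(path,
                 colClasses = c(ID = "character"),  # keep the leading zeros of ID
                 row.names = "ID")                   # use ID as row names
E303$color <- factor(E303$color)                      # colors as a factor variable
str(E303)
```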
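For exercise 32, items a, d and e, here is a sketch of the recoding logic. It uses a tiny hypothetical data frame called toy that mimics only a few columns of Chile (vote, sex, statusquo, income). With the real data, replace toy with Chile7. Note that NA %in% c("A", "N") is FALSE, so ifelse() sends the missing votes to "Left" as required.

```r
# Hypothetical miniature of the Chile columns used below
toy <- data.frame(
  vote      = factor(c("A", "N", "U", "Y", NA)),
  sex       = factor(c("F", "M", "F", "M", "F")),
  statusquo = c(NA, -1.5, 0.3, -0.8, -2),
  income    = c(15000, 35000, 7500, 75000, 2500)
)

# a. A and N become "Right"; U, Y and NA become "Left"
toy$vote <- factor(ifelse(toy$vote %in% c("A", "N"), "Right", "Left"))

# d. NA to 0, then values below -1 get a positive sign
toy$statusquo[is.na(toy$statusquo)] <- 0
below <- toy$statusquo < -1
toy$statusquo[below] <- -toy$statusquo[below]

# e. female subjects get 20% more income
female <- toy$sex == "F"
toy$income[female] <- toy$income[female] * 1.2

print(toy)
```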