Workflow
1. To execute the current R script in the RStudio script editor, you can use the keyboard shortcut Cmd+Shift+S (Mac) or Ctrl+Shift+S (Windows)
2. An R variable name must start with a letter (or a period that is not followed by a digit)
3. To execute the current R expression in the RStudio script editor, you can use the keyboard shortcut Cmd+Enter (Mac) or Ctrl+Enter (Windows)
4. R variable names can contain letters, numbers, underscore (_), and period (.)
5. To assign a value to an R variable, you should use <-

Data Types
1. To display what kind of value is stored in x, you can use the R statement class(x)
2. To clear all variables from the R environment, you can use the statement rm(list = ls())
3. To display the values of x, y, and z, you can use the R statement cat(x, y, z)
4. To display all of the visible variables currently defined in the R workspace, you can use the statement print(ls())
5. To set the variable x to 5, you can use x = 5, x <- 5, or 5 -> x
6. To set x to 5 as an integer, you can use the R statement x <- 5L
7. To add metadata to an R object, you can use the R function attr()
8. To create a vector of R objects, you can use the function c()
9. The values TRUE and FALSE are known in R as logical
10. To represent the absence of a vector, you would use NULL
11. To display the value of x, you can use the R statement print(x)
12. To delete a variable x from the R environment, you can use the statement rm(x)
13. A variable name in R must start with a letter or period (.) followed by letters, digits, underscores, and periods
14. The R object that can hold multiple types of objects is called a list
15. To represent a missing value in R, you can use NA
16. To determine how many elements there are in a vector, you can use the R function length()
17. To display all the variables that start with the letter "a" in the R workspace, you can use the statement print(ls(pattern = "^a")) (without the ^ anchor, the pattern matches any name that contains "a")
18. To display the internal storage type of the value stored in x, you can use the R statement typeof(x)
19. To display all variables (including "hidden variables") in the R workspace, you can use the statement print(ls(all.names = TRUE))
20. The R object that can hold tabular data of different types is called a data frame (created with data.frame())
21. The R object that holds distinct values along with labels is called a factor
22. Values like 5.3, 7, and 1e3 are known in R as numeric
23. To coerce a variable to a string, you can use the R function as.character()

Programming
1. To annotate your functions, comments start with #
2. When should you consider writing a function?
   a. Whenever you have copied and pasted the same block of code more than twice
3. To conditionally execute code in R, you use if
4. To check in R if condition1 is true or condition2 is true, you would use if (condition1 || condition2)
5. To check in R if condition1 and condition2 are true, you would use if (condition1 && condition2)
6. A pipeable function that performs a side-effect accepts an object and performs an action on it (such as drawing a plot or saving a file) without transforming it
7. A pipeable function that performs a transformation accepts an object and returns a modified version
8. In R, to keep looping for as long as a condition is true, you use while (condition) {}
9. In R, you can loop through a vector V with for (k in seq_along(V)) {}
10. In R, you can loop through 1 through 10 with the statement for (k in 1:10) {}

Data Transformation
1. Reading a file with columns separated by white space in R is done with read_table() (read_fwf() is for fixed-width files)
2. To save data as an Excel CSV file, you can use write_excel_csv()
3. To convert a vector of character data into dates, you can use parse_date()
4. Reading a comma delimited file in R is done with read_csv()
5. To ignore lines in a file that start with "#", you can use comment = "#"
6. To compute the average departure delay (dep_delay) for each month from flights, you can use flights %>% group_by(year, month) %>% summarize(delay = mean(dep_delay, na.rm = TRUE))
7. To select all elements that are in y and not in x, you would use !x & y
8. Using dplyr in R, you add variables with the function mutate()
9. Using dplyr in R, you describe statistics of data with the function summarise()
10. To select all elements that are in x or y but not both x and y, you would use xor(x, y)
11. To reverse sort the nycflights13::flights data set by month and then by day, you would use arrange(flights, desc(month), desc(day))
12. To select all elements that are in x or y, you would use x | y
13. To add a column that calculates the sum of all the values of x up to that row, you could use mutate(data, summation = cumsum(x))
14. To sort the nycflights13::flights data set by month and then by day, you would use arrange(flights, month, day)
15. To get all the variables named x1 through x15 (out of 30 variables that start with x) from data, you could use select(data, num_range("x", 1:15))

Data Cleaning
1. The R function that pulls apart a column into multiple ones is separate()
2. The first step of tidying data is to figure out what the variables and observations are
3. pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns
4. Most data you encounter will be in a tidy format.
   a. False
5. Some reasons why data is stored in a non-tidy format are
   a. specialized fields have conventions for storing data
   b. alternative representations may have substantial performance or space advantages
6. pivot_wider() makes datasets wider by decreasing the number of rows and increasing the number of columns
7. A data set is tidy if
   a. each observation has its own row
   b. each variable has its own column
   c. each value has its own cell
8. A set operation treats observations as if they were set elements
9. A primary key uniquely identifies an observation in its own table
10. The R function right_join(x, y) keeps all observations from the second table (y)
11. The R function anti_join(x, y) drops all observations from the first table (x) that have a match in y
12. The R function setdiff(x, y) returns observations in x, but not in y
13. A key is a variable (or set of variables) that uniquely identifies an observation in a table
14. The R function full_join(x, y) keeps all observations from both tables
15. You cannot define relationships between three or more tables.
   a. False

Graphs
1. To specify displ and hwy to use on the x and y axes respectively, you use the function aes(x = displ, y = hwy)
2. The function for plotting a line graph is geom_line()
3. To swap the x and y axes, you can use the function coord_flip()
4. The function for plotting a bar chart is geom_bar()
5. To make all the stacked bars in a geom_bar() plot the same height, you can use the adjustment position = "fill"
6. To create an empty graph using the mpg dataset, you would use ggplot(data = mpg)
7. The function for plotting a regression line is geom_smooth() (with method = "lm" for a straight-line fit)
8. An aesthetic is a visual property of objects in a plot
9. To display a Coxcomb chart, you need to use the function coord_polar()
10. A geom is the geometrical object that a plot uses to represent data
11. The first step in creating a graph in R with ggplot is creating a coordinate system with the ggplot() function
12. The ggplot2 library uses layers to define the components of a graph
13. To avoid overplotting, you can use the geom_point() position adjustment position = "jitter"
14. ggplot2 implements the grammar of graphics
15. To facet plots on the combination of two variables, you can use facet_grid()
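
As a quick check of the Data Types items, the base R sketch below runs through integer assignment, class() versus typeof(), coercion, length(), and the three forms of ls(). The function name inspect_types and the variable names inside it are made up for illustration. The code sits inside a function because ls() called with no arguments inside a function lists that function's local variables, which keeps the example independent of whatever is in your own workspace.

```r
# Hypothetical helper: builds a few local variables and inspects them.
inspect_types <- function() {
  x <- 5L            # integer, because of the L suffix
  y <- 5.3           # numeric (stored as a double)
  apple <- "fruit"   # name that starts with "a"
  banana <- TRUE     # name that contains "a" but does not start with it
  .hidden <- NULL    # names starting with a period are hidden from plain ls()

  list(
    class_x    = class(x),               # kind of value stored in x
    type_y     = typeof(y),              # internal storage type of y
    as_text    = as.character(y),        # coerce to a string
    n_elements = length(c(1, 2, 3)),     # number of elements in a vector
    visible    = ls(),                   # visible names only
    everything = ls(all.names = TRUE),   # includes .hidden
    starts_a   = ls(pattern = "^a")      # only names beginning with "a"
  )
}

result <- inspect_types()
print(result$class_x)
print(result$starts_a)
```

Dropping the ^ from the pattern would also return banana, since the pattern is a regular expression matched anywhere in the name.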
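
The Programming items on if, ||, &&, for, and while can be seen together in one short base R sketch. The function classify and the sample vector V are hypothetical; they exist only to give the control-flow constructs something to act on.

```r
# Hypothetical helper: classify a single number using if / else with || and &&.
classify <- function(n) {
  if (n < 0 || n > 100) {
    "out of range"
  } else if (n >= 50 && n %% 2 == 0) {
    "large even"
  } else {
    "other"
  }
}

# Loop over the positions of a vector with seq_along().
V <- c(10, 60, 150)
labels <- character(length(V))
for (k in seq_along(V)) {
  labels[k] <- classify(V[k])
}

# Loop over a fixed range of integers.
total <- 0
for (k in 1:10) {
  total <- total + k
}

# Keep looping for as long as the condition stays TRUE.
halvings <- 0
value <- 64
while (value > 1) {
  value <- value / 2
  halvings <- halvings + 1
}

print(labels)
print(total)
print(halvings)
```

seq_along(V) is used instead of 1:length(V) because it yields an empty sequence when V is empty, so the loop body is skipped rather than run with invalid indices.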
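
The logical selections and the running sum from the Data Transformation items can be tried in base R without loading any packages. The vectors x, y, and values below are invented sample data; x and y are treated as membership flags for the same five elements.

```r
# Hypothetical membership flags for five elements.
x <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
y <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

in_y_not_x <- !x & y     # in y and not in x
in_either  <- x | y      # in x or y
in_one     <- xor(x, y)  # in x or y, but not both

# Running total: each entry is the sum of all values up to that position.
values  <- c(2, 4, 6, 8)
running <- cumsum(values)

print(in_y_not_x)
print(in_one)
print(running)
```

Inside mutate(), cumsum(x) behaves the same way, applied to the column x row by row.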
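
The join and sorting items can be compared side by side on two tiny tables. This is a minimal sketch that assumes the dplyr package is installed; the tables x and y and their columns (id, name, score) are made up for illustration.

```r
library(dplyr)

# Hypothetical tables sharing the key column "id".
x <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), score = c(20, 30, 40))

right  <- right_join(x, y, by = "id")  # keeps every row of y: ids 2, 3, 4
full   <- full_join(x, y, by = "id")   # keeps every row of both: ids 1, 2, 3, 4
only_x <- anti_join(x, y, by = "id")   # rows of x with no match in y: id 1

# arrange() sorts rows; desc() reverses the order of a column.
sorted <- arrange(full, desc(id))

print(right)
print(only_x)
print(sorted)
```

Rows that have no partner in the other table are filled with NA in the columns that come from that table, which is why name is NA for id 4 and score is NA for id 1 in the full join.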