R Programming Assignment: Data Analysis with Airline Data

Exercise Problems – R Programming Assignment-3
Name: Boobathy A
Reg. no.: 191921047
Subject Code: ITA0411
Date of Submission: Tue, 11 May 2021

1. Briefly describe your respective dataset. Identify which package the dataset belongs to. Display the structure of your dataset and also print the variables (fields) included in it. Import the dataset into the R environment before data analysis. The working directory should be set with your registration number.

Solution:
Description: The classic Box & Jenkins airline data. Monthly totals of international airline passengers, 1949 to 1960.
Format: A monthly time series, in thousands.

Program:
# Getting and setting the working directory
getwd()
setwd('C:/Users/BOOBATHY A/Documents/191921047')
getwd()

Output:
# Import the CSV file into R using the read.csv() function:
# Once the file has been read, it appears in the Environment pane:
# View(df):
# str(): displays the internal structure of the dataset.
# dim(): returns the number of rows and columns of the dataset.
# class(): returns the class of an object, which determines the properties and methods common to all objects of that type.
# head(): returns the first 6 rows of the dataset.
# tail(): returns the last 6 rows of the dataset.
# names(): gives the variable names in the dataset.
# summary(): produces summary statistics for each column (minimum, quartiles, median, mean and maximum for numeric columns).
# $: accesses an individual column of a dataset or data frame.
# unique(): returns the unique values in a particular column.
# length(): returns the length of a variable.
Here, the time column has 144 unique values and the value column has 118 unique values.

2. Perform data analysis by computing the measures of central tendency as well as measures of dispersion (for a minimum of 2 variables under consideration). Substantiate your answers with appropriate reasoning.
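The inspection calls from question 1 and the measures requested in question 2 can be sketched together as follows. This is a minimal sketch, not the author's program: it assumes the built-in AirPassengers series (shipped in R's datasets package) as a stand-in for the imported CSV, and rebuilds a data frame with the same time and value column names used in this assignment. The names center and spread are hypothetical helpers introduced only for illustration.

```r
# Assumption: AirPassengers (datasets package) stands in for the author's CSV.
# time() extracts the decimal-year time stamps of the monthly series.
df <- data.frame(time  = as.numeric(time(AirPassengers)),
                 value = as.numeric(AirPassengers))

# Inspect the data frame as in question 1
str(df)      # internal structure
dim(df)      # number of rows, then number of columns
names(df)    # variable names

# Measures of central tendency for both columns
center <- sapply(df, function(x) c(mean = mean(x), median = median(x)))

# Measures of dispersion for both columns
spread <- sapply(df, function(x) c(sd  = sd(x),
                                   var = var(x),
                                   min = min(x),
                                   max = max(x),
                                   IQR = IQR(x)))

print(center)
print(spread)
```

Comparing the mean and median of a column gives a first hint about skew: a mean noticeably above the median suggests a longer right tail, which the skewness value can then confirm.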
Solution:
# mean(): calculates the sum of the values divided by the number of values in a data series.
# median(): returns the middle value of a data series.
# mode: the value that has the highest number of occurrences in a set of data. R has no built-in function for the statistical mode (base mode() returns the storage mode of an object), so it is computed from a frequency table.
# na.rm: used to omit missing values from the calculation. My dataset has no NA values, so I have not used it.

Program:
# Tabulate how often each value occurs
y <- table(df$value)
print(y)
# The mode is the value (or values) with the highest count
modal_value <- names(y)[which(y == max(y))]
print(modal_value)

Output:

3. What is the significance of the measurement you had explored with respect to your data? Write a report on the same.

Report on Dataset
# sd(): used to find the standard deviation.

Additional:

Program:
library(moments)
# Save the histogram of the value column to a PNG file
png(file = "values.png")
print(skewness(df$value))
hist(df$value)
dev.off()

Program:
library(moments)
# Save the boxplot of the time column to a separate PNG file,
# so that it does not overwrite values.png
png(file = "time.png")
print(skewness(df$time))
boxplot(df$time)
dev.off()

ggplot: ggplot2 is an open-source data visualization package for the statistical programming language R.

4. Find out how many missing values (NA) there are in the dataset. Create a subset of your dataset after removing the missing values.

Solution:
# subset(): used to create subsets of a dataset or data frame.

5. Create a contingency table using crosstab between the two columns that are most influential and calculate the correlation between them.
# cor.test(): used to evaluate the association between two variables (corr.test() from the psych package handles more than two).
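Questions 4 and 5 are described above without a program, so the following is a minimal sketch of one way to approach them, not the author's solution. It assumes the built-in AirPassengers series as a stand-in for the imported CSV, with the same time and value column names. Because both columns are continuous, the sketch bins them before cross-tabulating; the four-year periods and the median split are illustrative assumptions, and the names df_clean, period, level and tab are hypothetical.

```r
# Assumption: AirPassengers (datasets package) stands in for the author's CSV.
df <- data.frame(time  = as.numeric(time(AirPassengers)),
                 value = as.numeric(AirPassengers))

# Question 4: count missing values overall and per column
total_na      <- sum(is.na(df))
na_per_column <- colSums(is.na(df))
print(total_na)
print(na_per_column)

# Keep only the rows where neither column is missing
df_clean <- subset(df, !is.na(time) & !is.na(value))

# Question 5: bin both columns so that a contingency table is meaningful
# (illustrative bins: four-year periods and a split at the median)
period <- cut(df_clean$time,
              breaks = c(1949, 1953, 1957, 1961),
              right  = FALSE,
              labels = c("1949-1952", "1953-1956", "1957-1960"))
level <- ifelse(df_clean$value > median(df_clean$value),
                "above median", "at or below median")
tab <- table(period, level)
print(tab)

# Pearson correlation between the two columns, plus a significance test
r <- cor(df_clean$time, df_clean$value)
print(r)
print(cor.test(df_clean$time, df_clean$value))
```

With a dataset that does contain NA values, the same subset() call drops the incomplete rows, and passing use = "complete.obs" to cor() is an alternative to subsetting first.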
