Guide Questions

Reading the data

o How would you read a dataset using the read_csv method? What are the necessary arguments in using the read_csv method?
  The pandas function read_csv() reads values from a text file in which the delimiter is, by default, a comma. The only required argument is the first one, the path to the file (for example data.csv). read_csv() returns a new DataFrame with the data and labels from that file.
o How would you read a dataset using the open method in Python? What are the necessary arguments in using the open method?
o How would you access data from a URL? What other arguments are necessary in reading data from a URL?
o How would you read an Excel dataset? What other arguments are necessary in reading data in Excel format?
  read_excel() returns a new DataFrame that contains the values from data.xlsx. You can also use read_excel() with OpenDocument spreadsheets, or .ods files.

Summary, dimensions and structure of data

o How would you get the summary, dimensions and structure of your data?
  The pandas DataFrame.info() function gives a concise summary of the DataFrame and is a quick way to get an overview of the dataset. The .size, .shape and .ndim attributes return the size, shape and number of dimensions of DataFrames and Series.
  Syntax: dataframe.size
  Return: the total number of elements; for a DataFrame, rows x columns.
  Syntax: dataframe.shape
  Return: a tuple giving the shape, (rows, columns) for a DataFrame and (rows,) for a Series.
  Syntax: dataframe.ndim
  Return: the number of dimensions, 1 for a Series and 2 for a DataFrame.
o How would you get the type of data in each column?
  DataFrame.dtypes returns a Series with the data type of each column. The result's index is the original DataFrame's columns.

Data Cleaning activities: Handling missing values

o What are the approaches in handling missing values?
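
The reading and inspection calls described above can be sketched as follows. This is a minimal sketch, not a definitive recipe: the CSV text is hypothetical sample data read from an in-memory buffer instead of data.csv, and the isna(), dropna() and fillna() calls at the end are only two common starting points for the missing-values question, not approaches stated in this guide.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as data.csv.
csv_text = """name,age,city
Ana,23,Manila
Ben,,Cebu
Cara,31,
"""

# The first argument of read_csv can be a path, a URL, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Summary, dimensions and structure.
df.info()                  # prints column names, non-null counts and dtypes
n_elements = df.size       # total number of elements (rows x columns)
shape = df.shape           # tuple of (rows, columns)
ndim = df.ndim             # 2 for a DataFrame
column_types = df.dtypes   # Series with the dtype of each column

# Missing values: count them per column first.
missing_per_column = df.isna().sum()

# Assumed approach 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Assumed approach 2: fill gaps, here with the column mean and a placeholder.
filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})
```

Whether to drop or fill depends on how much data is missing and why, so treat both calls as options to compare on your own dataset.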

Understanding your data through fundamentals of statistics

o How would you get the following statistical descriptions from your dataset?
  Mean
  Median
  Mode
  Variance
  Standard deviation
  Percentiles
  Ranges
o Identify possible outliers in your dataset
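
One possible way to compute the statistics listed above with pandas is sketched below. The values in scores are hypothetical sample data, and the 1.5 x IQR rule used to flag outliers is a common rule of thumb chosen here for illustration, not a method prescribed by this guide.

```python
import pandas as pd

# Hypothetical sample values; substitute a numeric column from your own dataset.
scores = pd.Series([12, 15, 15, 18, 20, 22, 95])

mean = scores.mean()
median = scores.median()
mode = scores.mode()        # a Series, because a dataset can have several modes
variance = scores.var()     # sample variance (ddof=1) by default
std_dev = scores.std()      # sample standard deviation (ddof=1) by default
percentiles = scores.quantile([0.25, 0.5, 0.75])
value_range = scores.max() - scores.min()

# Assumed outlier rule: flag values more than 1.5 x IQR outside the quartiles.
q1 = percentiles.loc[0.25]
q3 = percentiles.loc[0.75]
iqr = q3 - q1
outliers = scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)]
```

For a quick first look, scores.describe() reports the count, mean, standard deviation, minimum, quartiles and maximum in a single call.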