Uploaded by Casey Ageas

Guide Questions Datasets

Guide Questions
Reading the data
o
How would you read a dataset using the read_csv method? What are the necessary arguments in
using the read_csv method?
The pandas function read_csv() reads delimited values, where the default delimiter is a comma
character. It returns a new DataFrame with the data and labels from the file (for example,
data.csv) specified by the first argument, which is the only required argument.
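A minimal sketch of the call described above. The CSV contents and column names are made up for illustration, and an in-memory buffer stands in for a file such as data.csv so the example runs on its own.

```python
import io

import pandas as pd

# Stand-in for a file such as data.csv (contents are illustrative only).
csv_text = "name,age,city\nAna,23,Manila\nBen,31,Cebu\n"

# The first argument (a path, URL, or file-like object) is the only required one.
df = pd.read_csv(io.StringIO(csv_text))

# Optional keyword arguments adjust parsing, for example a different
# delimiter (sep) and a column to use as the row labels (index_col).
df_semicolon = pd.read_csv(
    io.StringIO(csv_text.replace(",", ";")),
    sep=";",
    index_col="name",
)

print(df.shape)
print(df_semicolon.columns.tolist())
```

With a real file, the same call would be pd.read_csv("data.csv").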
o
How would you read a dataset using the open() function in Python? What are the necessary
arguments when using open()?
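One possible answer, as a minimal sketch: the built-in open() needs only the file path, and the mode defaults to "r" (read text). The file name and contents here are assumptions created on the fly, and the standard csv module is used to split each line into fields.

```python
import csv
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, mode="w", newline="", encoding="utf-8") as f:
    f.write("name,age\nAna,23\nBen,31\n")

# open() requires the path; mode and encoding are optional but worth stating.
# The with-statement closes the file automatically when the block ends.
with open(path, mode="r", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

header, records = rows[0], rows[1:]
print(header)
print(records)
```

Values read this way are all strings, so numeric columns still need to be converted by hand.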
o
How would you access data from a URL? What other arguments are necessary when reading
data from the URL?
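One possible answer, as a minimal sketch: pandas read_csv() accepts a URL string in place of a file path, and the usual parsing arguments (sep, header, and so on) apply unchanged. To keep the example runnable without network access, it builds a file:// URL from a temporary file; an http or https address to a CSV file would be passed in the same way. The file contents are illustrative only.

```python
import pathlib
import tempfile

import pandas as pd

# Write a small CSV and turn its location into a URL (file:// scheme).
tmp = pathlib.Path(tempfile.mkdtemp()) / "data.csv"
tmp.write_text("id,score\n1,9.5\n2,7.0\n", encoding="utf-8")
url = tmp.resolve().as_uri()

# The URL is passed as the first argument; sep and header are optional
# and shown here with their default values.
df_url = pd.read_csv(url, sep=",", header=0)

print(df_url)
```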
o
How would you read an Excel dataset? What other arguments are necessary when reading data in
Excel format?
The pandas function read_excel() returns a new DataFrame that contains the values from the
file passed as the first argument, for example data.xlsx. You can also use read_excel() with
OpenDocument spreadsheets (.ods files).
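A minimal sketch of a wrapper around read_excel(). The helper name load_sheet is hypothetical; sheet_name and usecols are real read_excel() parameters. Reading .xlsx files typically requires the openpyxl package and .ods files the odfpy package, which is an assumption about the installed environment.

```python
import pandas as pd


def load_sheet(path, sheet_name=0, usecols=None):
    """Read one worksheet from a spreadsheet file into a DataFrame.

    path       -- file to read, e.g. "data.xlsx" or "data.ods"
    sheet_name -- sheet position (0 = first sheet) or sheet name
    usecols    -- optional subset of columns, e.g. "A:C"
    """
    # Only the path is required; the other arguments select what to load.
    return pd.read_excel(path, sheet_name=sheet_name, usecols=usecols)
```

A call might look like load_sheet("data.xlsx", sheet_name="Sheet1"), assuming a file and sheet with those names exist.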
Summary, dimensions and structure of data
o
How would you get the summary, dimensions and structure of your data?
The pandas DataFrame.info() method prints a concise summary of the DataFrame: the index
range, each column's name, non-null count and dtype, and the memory usage. Use it to get a
quick overview of the dataset.
The .size, .shape and .ndim attributes return the size, shape and number of dimensions of
DataFrames and Series.
Syntax: dataframe.size
Return: The total number of elements. For a DataFrame, that is rows x columns.
Syntax: dataframe.shape
Return: A tuple giving the shape: (rows, columns) for a DataFrame, (rows,) for a Series.
Syntax: dataframe.ndim
Return: The number of dimensions: 1 for a Series, 2 for a DataFrame.
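The attributes above can be tried on a small example. The DataFrame contents are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben", "Cy"], "age": [23, 31, 27]})

# info() prints the summary directly; it does not return a DataFrame.
df.info()

print(df.size)   # total number of elements (rows x columns)
print(df.shape)  # tuple of (rows, columns)
print(df.ndim)   # number of dimensions of a DataFrame

# A single column is a Series, which has one dimension.
ages = df["age"]
print(ages.size, ages.shape, ages.ndim)
```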
o
How would you get the type of data in each column?
DataFrame.dtypes
Returns the dtypes in the DataFrame.
The result is a Series with the data type of each column. Its index is the original
DataFrame's columns.
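A short sketch of reading the column types. The column names and values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ana", "Ben"], "age": [23, 31], "height_m": [1.62, 1.75]}
)

# dtypes is an attribute, not a method, so there are no parentheses.
col_types = df.dtypes

print(col_types)
print(list(col_types.index))  # same labels as df.columns
```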
Data Cleaning activities: Handling missing values
o
What are the approaches in handling missing values?
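Common approaches, shown as a minimal sketch rather than a complete answer: first count the missing values, then either drop the affected rows or fill the gaps with a substitute value. The data, the choice of the column mean for "age", and the placeholder "Unknown" are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "age": [23, np.nan, 27, 31],
        "city": ["Manila", "Cebu", None, "Davao"],
    }
)

# 1. Detect: count missing values in each column.
missing_per_column = df.isna().sum()

# 2. Delete: drop every row that contains at least one missing value.
dropped = df.dropna()

# 3. Impute: fill numeric gaps with the column mean and text gaps
#    with a placeholder label.
filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping rows is simplest but discards data, while filling keeps every row at the cost of introducing estimated values.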
Understanding your data through fundamentals of statistics
o
How would you get the following statistical description from your dataset?
 Mean
 Median
 Mode
 Variance
 Standard deviation
 Percentiles
 Ranges
o
Identify possible outliers in your dataset.
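The statistics listed above are available as pandas Series methods. The sketch below uses made-up scores and, for the outlier step, assumes the common 1.5 x IQR rule, which is one convention among several and not necessarily the one this guide intends.

```python
import pandas as pd

scores = pd.Series([10, 12, 12, 13, 14, 15, 16, 48], name="score")

mean = scores.mean()
median = scores.median()
mode = scores.mode().tolist()   # mode() can return more than one value
variance = scores.var()         # sample variance (ddof=1 by default)
std_dev = scores.std()          # sample standard deviation
q1 = scores.quantile(0.25)      # 25th percentile
q3 = scores.quantile(0.75)      # 75th percentile
value_range = scores.max() - scores.min()

# Flag values more than 1.5 x IQR outside the quartiles as possible outliers.
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = scores[(scores < lower) | (scores > upper)]

print(mean, median, mode, value_range)
print(outliers.tolist())
```

Calling scores.describe() returns the count, mean, standard deviation, minimum, quartiles, and maximum in a single step.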