Pandas Indexing Tutorial

INDEXING

Indexing in Pandas

Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of the rows and some of the columns. Indexing is also known as subset selection.

Selecting some rows and some columns

Let’s take a DataFrame with some fake data and perform indexing on it. Here we select some rows and some columns from the DataFrame. Suppose we want to select the columns Age, College and Salary for only the rows with the labels Amir Johnson and Terry Rozier. The final DataFrame contains just those two rows and three columns.

Selecting some rows and all columns

Let’s say we want to select the rows Amir Johnson, Terry Rozier and John Holland with all columns in the DataFrame. The final DataFrame contains those three rows and every column.

Selecting some columns and all rows

Let’s say we want to select the columns Age, Height and Salary with all rows in the DataFrame. The final DataFrame contains every row and only those three columns.

Pandas Indexing using [ ], .loc[ ], .iloc[ ], .ix[ ]

There are many ways to pull elements, rows, and columns from a DataFrame. pandas provides several indexing methods for this; they look very similar but behave very differently. pandas supports three types of multi-axes indexing (a fourth, .ix[ ], was deprecated in pandas 0.20 and removed in pandas 1.0):

• DataFrame[ ]: also known as the indexing operator.
• DataFrame.loc[ ]: used for label-based selection.
• DataFrame.iloc[ ]: used for position-based (integer) selection.

Collectively, they are called the indexers. These are by far the most common ways to index data, and all three help in getting elements, rows, and columns from a DataFrame.

Indexing a DataFrame using the indexing operator [ ]

The indexing operator refers to the square brackets following an object.
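
The three selection patterns described in the overview above can be sketched on a small stand-in frame. This is only an illustration: the player names and column names come from the tutorial, but every value in the table is invented, and the real nba.csv may differ in its columns and contents.

```python
import pandas as pd

# Stand-in for the tutorial's data set. All values below are made up
# for illustration; only the names and column labels come from the text.
data = pd.DataFrame(
    {
        "Age": [25, 22, 29, 27],
        "Height": ["6-2", "6-2", "6-9", "6-5"],
        "College": ["College A", "College B", "College C", "College D"],
        "Salary": [100, 200, 300, 400],
    },
    index=pd.Index(
        ["Avery Bradley", "Terry Rozier", "Amir Johnson", "John Holland"],
        name="Name",
    ),
)

# Some rows and some columns (label-based selection).
some_rows_some_cols = data.loc[
    ["Amir Johnson", "Terry Rozier"], ["Age", "College", "Salary"]
]

# Some rows and all columns.
some_rows_all_cols = data.loc[["Amir Johnson", "Terry Rozier", "John Holland"]]

# Some columns and all rows (indexing operator with a list of column names).
all_rows_some_cols = data[["Age", "Height", "Salary"]]

print(some_rows_some_cols)
print(some_rows_all_cols)
print(all_rows_some_cols)
```

Each result is itself a DataFrame, so the same operators can be applied to it again.
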
The .loc and .iloc indexers also use the indexing operator to make selections. In this section, the indexing operator refers to df[ ].

Selecting a single column

In order to select a single column, we simply put the name of the column between the brackets.

    # importing pandas package
    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving a column by indexing operator
    first = data["Age"]
    print(first)

Selecting multiple columns

In order to select multiple columns, we pass a list of column names to the indexing operator.

    # importing pandas package
    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving multiple columns by indexing operator
    first = data[["Age", "College", "Salary"]]
    print(first)

Indexing a DataFrame using .loc[ ]

This indexer selects data by the labels of the rows and columns. The df.loc indexer selects data in a different way than the plain indexing operator. It can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.

Selecting a single row

In order to select a single row using .loc[ ], we pass a single row label to .loc[ ].

    # importing pandas package
    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving rows by loc method
    first = data.loc["Avery Bradley"]
    second = data.loc["R.J. Hunter"]
    print(first, "\n\n\n", second)

Each call returns a Series, since only a single row label was passed both times.

Selecting multiple rows

In order to select multiple rows, we put all the row labels in a list and pass that list to .loc[ ].

    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving multiple rows by loc method
    first = data.loc[["Avery Bradley", "R.J. Hunter"]]
    print(first)

Selecting two rows and three columns

In order to select two rows and three columns, we put the two row labels in one list and the three column labels in a separate list, like this:

    Dataframe.loc[["row1", "row2"], ["column1", "column2", "column3"]]

    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving two rows and three columns by loc method
    first = data.loc[["Avery Bradley", "R.J. Hunter"],
                     ["Team", "Number", "Position"]]
    print(first)

Selecting all of the rows and some columns

In order to select all of the rows and some columns, we use a single colon [:] to select all rows and a list of the columns we want, like this:

    Dataframe.loc[:, ["column1", "column2", "column3"]]

    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving all rows and some columns by loc method
    first = data.loc[:, ["Team", "Number", "Position"]]
    print(first)

Indexing a DataFrame using .iloc[ ]

This indexer allows us to retrieve rows and columns by position. To do that, we specify the integer positions of the rows we want and the integer positions of the columns we want. The df.iloc indexer is very similar to df.loc but uses only integer locations to make its selections.

Selecting a single row

In order to select a single row using .iloc[ ], we pass a single integer to .iloc[ ]. Positions start at 0, so 3 refers to the fourth row.

    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving a row by iloc method
    row2 = data.iloc[3]
    print(row2)

Selecting multiple rows

In order to select multiple rows, we pass a list of integers to .iloc[ ].

    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving multiple rows by iloc method
    row2 = data.iloc[[3, 5, 7]]
    print(row2)

Selecting two rows and two columns

In order to select two rows and two columns, we create a list of two integers for the rows and a list of two integers for the columns, then pass both to .iloc[ ].

    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving two rows and two columns by iloc method
    row2 = data.iloc[[3, 4], [1, 2]]
    print(row2)

Selecting all the rows and some columns

In order to select all rows and some columns, we use a single colon [:] to select all rows and a list of integers for the columns, then pass both to .iloc[ ].

    import pandas as pd

    # making data frame from csv file
    data = pd.read_csv("nba.csv", index_col="Name")

    # retrieving all rows and some columns by iloc method
    row2 = data.iloc[:, [1, 2]]
    print(row2)

MULTI-INDEXING

A MultiIndex lets you use more than one level of labels to index the rows or columns of a pandas object. It is a multi-level, or hierarchical, index object. There are several constructors for it, such as MultiIndex.from_arrays, MultiIndex.from_tuples, MultiIndex.from_product and MultiIndex.from_frame, which create a multi-level index from arrays, tuples, the product of iterables, and DataFrames respectively.

Syntax:

    pandas.MultiIndex(levels=None, codes=None, sortorder=None, names=None, dtype=None, copy=False, name=None, verify_integrity=True)

• levels: A sequence of arrays holding the unique labels for each level.
• codes: A sequence of arrays of integers; for each level, the integers designate which label is at each location.
• sortorder: Optional int. The level of sortedness; the index must be lexicographically sorted by that level.
• dtype: Data type (size of the data, which can be 32 bits or 64 bits).
• copy: A boolean parameter with a default value of False. It controls whether the metadata is copied.
• verify_integrity: A boolean parameter with a default value of True. It checks the integrity of the levels and codes, i.e. whether they are valid.

Let us see an example to understand the concept better.

Example 1:

In this example, we create a multi-index from arrays. Arrays (here, Python lists) are preferred over tuples because tuples are immutable, whereas the value of an element in an array can be changed. After importing pandas, we create an array of names along with arrays of ages and marks. Then, with MultiIndex.from_arrays, we combine the three arrays so that the elements at each position form one multi-level index entry. Finally, we print the result.

    # importing pandas library
    import pandas as pd

    # Creating an array of names
    arrays = ['Sohom', 'Suresh', 'kumkum', 'subrata']

    # Creating an array of ages
    age = [10, 11, 12, 13]

    # Creating an array of marks
    marks = [90, 92, 23, 64]

    # Using MultiIndex.from_arrays, we combine the arrays
    # along with their names and create a multi-index in which
    # each entry takes one element from each of the 3 arrays
    index = pd.MultiIndex.from_arrays([arrays, age, marks],
                                      names=('names', 'age', 'marks'))
    print(index)
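
As a follow-up to Example 1, the sketch below shows how such a multi-index can be inspected and then used for label-based selection. The names, ages and marks are the tutorial's own values; the `passed` Series and its pass mark of 40 are hypothetical additions made only for this illustration.

```python
import pandas as pd

# Same values as in Example 1.
names = ["Sohom", "Suresh", "kumkum", "subrata"]
age = [10, 11, 12, 13]
marks = [90, 92, 23, 64]

index = pd.MultiIndex.from_arrays(
    [names, age, marks], names=("names", "age", "marks")
)

# Inspecting the index: number of levels and the labels of one level.
print(index.nlevels)                            # 3
print(index.get_level_values("age").tolist())   # [10, 11, 12, 13]

# Hypothetical data attached to the index: a pass mark of 40 is assumed.
passed = pd.Series([m >= 40 for m in marks], index=index, name="passed")

# A label from the first level selects every entry under that label;
# the remaining levels (age, marks) stay in the result's index.
suresh = passed.loc["Suresh"]
print(suresh)

# A full tuple of labels, one per level, selects a single value.
kumkum_passed = passed.loc[("kumkum", 12, 23)]
print(kumkum_passed)
```

Selecting with a partial key keeps the remaining levels, which is what makes a hierarchical index convenient for drilling down one level at a time.
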
