Uploaded by Kalpana Vijayaraghavan

INDEXING AND MULTIINDEXING

INDEXING
Indexing in Pandas :
Indexing in pandas means simply selecting particular rows and columns of
data from a DataFrame. Indexing could mean selecting all the rows and
some of the columns, some of the rows and all of the columns, or some of
each of the rows and columns. Indexing can also be known as Subset
Selection.
Selecting some rows and some columns
Let’s take a DataFrame with some fake data and perform indexing on it. Here
we select some rows and some columns from the DataFrame.
Suppose we want to select the columns Age, College and Salary for only the
rows with the labels Amir Johnson and Terry Rozier.
Our final DataFrame would look like this:
Selecting some rows and all columns
Let’s say we want to select the rows Amir Johnson, Terry Rozier and John
Holland with all columns in a DataFrame.
Our final DataFrame would look like this:
Selecting some columns and all rows
Let’s say we want to select columns Age, Height and Salary with all rows in a
dataframe.
Our final DataFrame would look like this:
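The three kinds of subset selection described above can be sketched on a small
stand-in DataFrame. The row labels are taken from the text, but every value
below is a made-up placeholder and is not the real dataset; the selections use
.loc[ ] and [ ], which are explained in the following sections.

```python
import pandas as pd

# Toy stand-in for the dataset: row labels come from the text,
# every value is a made-up placeholder.
df = pd.DataFrame(
    {
        "Age": [25, 29, 22, 27],
        "Height": ["6-2", "6-9", "6-1", "6-5"],
        "College": ["College A", "College B", "College C", "College D"],
        "Salary": [100, 200, 300, 400],
    },
    index=pd.Index(
        ["Avery Bradley", "Amir Johnson", "Terry Rozier", "John Holland"],
        name="Name",
    ),
)

# Some rows and some columns
rows_and_cols = df.loc[["Amir Johnson", "Terry Rozier"],
                       ["Age", "College", "Salary"]]

# Some rows and all columns
rows_only = df.loc[["Amir Johnson", "Terry Rozier", "John Holland"]]

# Some columns and all rows
cols_only = df[["Age", "Height", "Salary"]]

print(rows_and_cols.shape)  # (2, 3)
print(rows_only.shape)      # (3, 4)
print(cols_only.shape)      # (4, 3)
```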
Pandas Indexing using [ ], .loc[ ], .iloc[ ], .ix[ ]
There are many ways to pull elements, rows, and columns from a DataFrame.
Pandas provides several indexers for this purpose. They appear very similar
but behave very differently. The three in current use are:
• DataFrame[ ] : the indexing operator.
• DataFrame.loc[ ] : used for label-based selection.
• DataFrame.iloc[ ] : used for integer position-based selection.
A fourth indexer, .ix[ ], accepted both labels and positions. It was
deprecated in pandas 0.20 and removed in pandas 1.0, so it is not covered
further.
Collectively, these are called the indexers. They are by far the most common
ways to get elements, rows, and columns from a DataFrame.
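To make the difference between labels and positions concrete, here is a
minimal sketch. It assumes a hypothetical DataFrame whose row labels are
integers that do not match the row positions, which is where .loc[ ] and
.iloc[ ] diverge most visibly.

```python
import pandas as pd

# Hypothetical frame: the row labels (10, 20, 30) differ from positions (0, 1, 2).
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[10, 20, 30])

column = df["value"]            # [] selects a column by its name
by_label = df.loc[20, "value"]  # .loc looks up the row labelled 20
by_position = df.iloc[1, 0]     # .iloc looks up the second row, first column

print(by_label, by_position)    # b b

# Position 1 exists, but there is no row *labelled* 1, so .loc raises KeyError.
try:
    df.loc[1]
except KeyError:
    print("no row labelled 1")
```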
Indexing a DataFrame using the indexing operator [ ] :
The term indexing operator refers to the square brackets following an object.
The .loc and .iloc indexers also use the indexing operator to make
selections. In this section, indexing operator refers to df[ ].
Selecting a single column
In order to select a single column, we simply put the name of the column in
between the brackets.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
Output:
Selecting multiple columns
In order to select multiple columns, we have to pass a list of columns in an
indexing operator.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving multiple columns by indexing operator
first = data[["Age", "College", "Salary"]]
first
Output:
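One detail worth checking when using the indexing operator is the type of the
result. As a rough sketch on a made-up two-row DataFrame (the values are
placeholders, not the contents of nba.csv): a single column name gives a
Series, while a list of names gives a DataFrame, even if the list holds only
one name.

```python
import pandas as pd

# Placeholder data standing in for nba.csv.
df = pd.DataFrame(
    {
        "Age": [25, 29],
        "College": ["College A", "College B"],
        "Salary": [100, 200],
    },
    index=pd.Index(["Player A", "Player B"], name="Name"),
)

one_column = df["Age"]               # single name -> Series
several = df[["Age", "Salary"]]      # list of names -> DataFrame
one_as_frame = df[["Age"]]           # one-element list -> one-column DataFrame

print(type(one_column).__name__)     # Series
print(type(several).__name__)        # DataFrame
print(type(one_as_frame).__name__)   # DataFrame
```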
Indexing a DataFrame using .loc[ ] :
The .loc[ ] indexer selects data by the labels of the rows and columns.
It selects data in a different way than the plain indexing operator: it can
select subsets of rows or columns, and it can also select subsets of rows
and columns simultaneously.
Selecting a single row
In order to select a single row using .loc[ ], we put a single row label
inside the brackets.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)
Output:
Two Series are returned, one for each call, because a single row label was
passed each time.
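This behaviour can be checked with a small sketch. The DataFrame below is a
placeholder with invented values; only the row labels are borrowed from the
example above.

```python
import pandas as pd

# Placeholder data; the values are invented for illustration.
df = pd.DataFrame(
    {
        "Team": ["Team X", "Team X"],
        "Number": [0, 28],
        "Position": ["PG", "SG"],
    },
    index=pd.Index(["Avery Bradley", "R.J. Hunter"], name="Name"),
)

row = df.loc["Avery Bradley"]        # single label -> Series
frame = df.loc[["Avery Bradley"]]    # one-element list -> one-row DataFrame

print(type(row).__name__)    # Series
print(row.name)              # Avery Bradley
print(list(row.index))       # ['Team', 'Number', 'Position']
print(type(frame).__name__)  # DataFrame
```

The returned Series uses the column names as its index and the row label as
its name.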
Selecting multiple rows
In order to select multiple rows, we put all the row labels in a list and pass
that list to .loc[ ].
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving multiple rows by loc method
first = data.loc[["Avery Bradley", "R.J. Hunter"]]
print(first)
Output:
Selecting two rows and three columns
In order to select two rows and three columns, we put the two row labels in
one list and the three column labels in a separate list, like this:
Dataframe.loc[["row1", "row2"], ["column1", "column2", "column3"]]
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving two rows and three columns by loc method
first = data.loc[["Avery Bradley", "R.J. Hunter"],
["Team", "Number", "Position"]]
print(first)
Output:
Selecting all of the rows and some columns
In order to select all of the rows and some columns, we use a single
colon (:) to select all of the rows and a list of the columns which we want
to select, like this:
Dataframe.loc[:, ["column1", "column2", "column3"]]
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving all rows and some columns by loc method
first = data.loc[:, ["Team", "Number", "Position"]]
print(first)
Output:
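The .loc[ ] patterns from this section can be tried without nba.csv on a small
placeholder DataFrame. The values, and the labels "Player C" and "Player D",
are invented. The last two selections go slightly beyond the text: a
row-and-column pair of single labels returns one cell, and a label slice with
.loc[ ] includes both of its endpoints.

```python
import pandas as pd

# Placeholder data; "Player C" and "Player D" are hypothetical labels.
df = pd.DataFrame(
    {
        "Team": ["Team X", "Team X", "Team Y", "Team Y"],
        "Number": [0, 28, 7, 12],
        "Position": ["PG", "SG", "PF", "C"],
    },
    index=pd.Index(
        ["Avery Bradley", "R.J. Hunter", "Player C", "Player D"], name="Name"
    ),
)

# Two rows and two columns: one list of row labels, one list of column labels
subset = df.loc[["Avery Bradley", "R.J. Hunter"], ["Team", "Position"]]

# All rows and some columns
all_rows = df.loc[:, ["Team", "Number"]]

# A single cell
cell = df.loc["R.J. Hunter", "Number"]

# Label slice: both endpoints are included
label_slice = df.loc["Avery Bradley":"Player C"]

print(subset.shape)      # (2, 2)
print(all_rows.shape)    # (4, 2)
print(cell)              # 28
print(len(label_slice))  # 3
```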
Indexing a DataFrame using .iloc[ ] :
This function allows us to retrieve rows and columns by position. In order to
do that, we’ll need to specify the positions of the rows that we want, and the
positions of the columns that we want as well. The df.iloc indexer is very
similar to df.loc but only uses integer locations to make its selections.
Selecting a single row
In order to select a single row using .iloc[], we can pass a single integer
to .iloc[] function.
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving rows by iloc method
row2 = data.iloc[3]
print(row2)
Output:
Selecting multiple rows
In order to select multiple rows, we can pass a list of integers
to .iloc[ ].
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving multiple rows by iloc method
row2 = data.iloc[[3, 5, 7]]
row2
Output:
Selecting two rows and two columns
In order to select two rows and two columns, we create a list of two integers
for the rows and a list of two integers for the columns, then pass both
to .iloc[ ].
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving two rows and two columns by iloc method
row2 = data.iloc[[3, 4], [1, 2]]
print(row2)
Output:
Selecting all the rows and some columns
In order to select all rows and some columns, we use a single colon (:) to
select all of the rows, and for the columns we make a list of integers, then
pass both to .iloc[ ].
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name")
# retrieving all rows and some columns by iloc method
row2 = data.iloc[:, [1, 2]]
print(row2)
Output:
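The .iloc[ ] patterns from this section can be sketched on a small placeholder
DataFrame (all names and values are invented). Note that when a column is used
as the index, as with index_col="Name" above, it is not counted among the
column positions, so position 0 refers to the first remaining column.

```python
import pandas as pd

# Placeholder data with five rows and three columns.
df = pd.DataFrame(
    {
        "Team": ["Team X", "Team X", "Team Y", "Team Y", "Team Z"],
        "Number": [0, 28, 7, 12, 30],
        "Position": ["PG", "SG", "PF", "C", "SF"],
    },
    index=pd.Index(
        ["Player A", "Player B", "Player C", "Player D", "Player E"],
        name="Name",
    ),
)

one_row = df.iloc[3]             # fourth row, returned as a Series
some_rows = df.iloc[[0, 2, 4]]   # first, third and fifth rows
block = df.iloc[[3, 4], [1, 2]]  # fourth and fifth rows, second and third columns
all_rows = df.iloc[:, [1, 2]]    # every row, second and third columns

print(one_row.name)              # Player D
print(list(block.columns))       # ['Number', 'Position']
print(some_rows.shape)           # (3, 3)
print(all_rows.shape)            # (5, 2)
```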
MULTI-INDEXING
A multi-index lets you label the rows or columns of a pandas object with more
than one level of labels. It is a multi-level, or hierarchical, index object.
There are various constructors for it, such as MultiIndex.from_arrays,
MultiIndex.from_tuples, MultiIndex.from_product, and MultiIndex.from_frame,
which create a multi-index from arrays, from tuples, from the Cartesian
product of several iterables, and from a DataFrame respectively.
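As a brief sketch of three of these constructors, the snippet below builds the
same two-level index in three ways. The level names "group" and "num" and the
labels are arbitrary choices for illustration.

```python
import pandas as pd

# From a list of tuples: one tuple per index entry
tuples = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
from_tuples = pd.MultiIndex.from_tuples(tuples, names=["group", "num"])

# From the Cartesian product of the labels for each level
from_product = pd.MultiIndex.from_product(
    [["A", "B"], [1, 2]], names=["group", "num"]
)

# From a DataFrame: each column becomes one level, named after the column
frame = pd.DataFrame({"group": ["A", "A", "B", "B"], "num": [1, 2, 1, 2]})
from_frame = pd.MultiIndex.from_frame(frame)

print(from_tuples.equals(from_product))  # True
print(from_product.equals(from_frame))   # True
print(list(from_frame.names))            # ['group', 'num']
```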
Syntax: pandas.MultiIndex(levels=None, codes=None, sortorder=None,
names=None, dtype=None, copy=False, name=None, verify_integrity=True)
• levels: A sequence of arrays holding the unique labels for each level.
• codes: A sequence of arrays of integers. For each level, the integers
designate which label of that level appears at each location.
• sortorder: optional int. The level of sortedness; the index must be
lexicographically sorted by that level.
• names: optional sequence giving a name for each of the index levels.
• dtype: the data type (for example, a 32-bit or 64-bit type).
• copy: A boolean parameter with default value False. It controls whether
the metadata is copied.
• verify_integrity: A boolean parameter with default value True. It checks
the integrity of the levels and codes, i.e. whether they are valid.
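To show how levels and codes fit together, here is a minimal sketch that calls
the constructor directly. The labels and the level names "group" and "num" are
arbitrary. Each code is a position into the corresponding level, so a code
that points past the end of its level is rejected when verify_integrity is
left at True.

```python
import pandas as pd

# levels hold the unique labels; codes pick one label per level for each entry
mi = pd.MultiIndex(
    levels=[["A", "B"], [1, 2]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=["group", "num"],
)
entries = mi.tolist()  # the entries as a list of (group, num) tuples
print(len(mi))         # 4

# Code 2 is out of range for a level with two labels, so this is rejected.
try:
    pd.MultiIndex(levels=[["A", "B"]], codes=[[0, 2]])
except ValueError:
    print("invalid codes rejected")
```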
Let us see some examples to understand the concept better.
Example 1:
In this example, we create a multi-index from arrays (here, Python lists).
Lists are preferred over tuples because tuples are immutable, whereas the
value of an element in a list can be changed. So let us move to the code and
its explanation:
After importing pandas, we create a list of names along with lists of ages
and marks. With the help of MultiIndex.from_arrays, we combine the three
lists so that the elements at the same position in each list together form
one entry of the multi-index. After that, we display the result.
# importing the pandas library
import pandas as pd
# Creating an array of names
names = ['Sohom', 'Suresh', 'kumkum', 'subrata']
# Creating an array of ages
age = [10, 11, 12, 13]
# Creating an array of marks
marks = [90, 92, 23, 64]
# Using MultiIndex.from_arrays, we combine the three arrays,
# one per level, and name the levels. The elements at the same
# position in each array form one entry of the multi-index.
index = pd.MultiIndex.from_arrays([names, age, marks],
                                  names=('names', 'age', 'marks'))
print(index)
Output:
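As a rough way to inspect the resulting index and put it to use, the sketch
below rebuilds it and attaches it to a small DataFrame. The "grade" column and
its values are hypothetical additions for illustration and are not part of the
original example.

```python
import pandas as pd

names = ["Sohom", "Suresh", "kumkum", "subrata"]
age = [10, 11, 12, 13]
marks = [90, 92, 23, 64]

index = pd.MultiIndex.from_arrays(
    [names, age, marks], names=("names", "age", "marks")
)

print(index.nlevels)                            # 3
print(index.get_level_values("age").tolist())   # [10, 11, 12, 13]

# Hypothetical data column, indexed by the three-level index
frame = pd.DataFrame({"grade": ["A", "A", "D", "C"]}, index=index)

# A full key is a tuple with one label per level
grade = frame.loc[("Sohom", 10, 90), "grade"]
print(grade)                                    # A
```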