Uploaded by Joan Souza

A Practical Introduction to Pandas pivot_table() function | by BChen | Towards Data Science

28/03/2023, 10:55
Published in Towards Data Science
BChen
Oct 6, 2020 · 5 min read
A Practical Introduction to Pandas pivot_table() function
7 hands-on tricks to effectively use the Pandas pivot_table() function to summarize data
Photo by William Iven on Unsplash
https://towardsdatascience.com/a-practical-introduction-to-pandas-pivot-table-function-3e1002dcd4eb
Pivot tables are one of Excel’s most powerful features. A pivot table allows us to draw insights from data. Pandas provides a similar function called pivot_table(). Pandas pivot_table() is a simple function but can produce very powerful analysis very quickly.
In this article, we’ll explore how to use Pandas pivot_table() with the help of examples. Examples cover the following tasks:
1. Simplest pivot table
2. Specifying values and performing aggregation
3. Seeing break down using columns
4. Replacing missing values
5. Displaying multiple values and adjusting view
6. Showing total
7. Generating a monthly report
Please check out the Notebook for the source code.
More tutorials are available from Github Repo.
The Data
In this tutorial, we will be working on coffee sales data. You can download it from
my Github repo.
Let’s import some libraries and load data to get started.
import pandas as pd

def load_data():
    # Parse 'order_date' as a datetime column while reading the CSV
    return pd.read_csv('coffee_sales.csv', parse_dates=['order_date'])

df = load_data()
df.head()
We created a function load_data() to load the coffee_sales.csv file with the column 'order_date' as a date datatype.
All columns are pretty self-explanatory.
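If you do not have coffee_sales.csv at hand, a small stand-in DataFrame is enough to follow along. The sketch below is an assumption, not the real file: it uses only the column names that appear in this article (order_date, region, product_category, sales, cost), and every value in it is invented for illustration. The name sample is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for coffee_sales.csv; all values are invented.
sample = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2010-01-05", "2010-01-20", "2010-02-11", "2010-02-15", "2010-03-02",
    ]),
    "region": ["East", "East", "South", "West", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Tea", "Coffee"],
    "sales": [100, 80, 120, 60, 90],
    "cost": [40, 30, 50, 25, 35],
})

# Check that order_date is a datetime column, as parse_dates would give us
print(sample.dtypes)
print(sample.head())
```

With a frame like this, each df.pivot_table(...) call in the article can be tried by replacing df with sample.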
1. Simplest Pivot table
The simplest pivot table must have an index. In our example, let’s use the region as our index. By default, pivot_table() performs the 'mean' aggregation function on all available numerical columns.
df.pivot_table(index='region')
To display multiple indexes, we can pass a list to index:
df.pivot_table(index=['region', 'product_category'])
The values passed to index are the keys to group by in the pivot table. You can change the order of the values to get a different visual representation. For example, to look at average values for each product_category, broken down by region:
df.pivot_table(index=['product_category', 'region'])
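To see that the order of the index keys changes only the layout and not the numbers, here is a minimal sketch on invented data. The frame sample and its values are hypothetical, and values is passed explicitly so the example does not depend on how a given pandas version treats non-numeric columns.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
sample = pd.DataFrame({
    "region": ["East", "East", "South", "West", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Tea", "Coffee"],
    "sales": [100, 80, 120, 60, 90],
})

by_region = sample.pivot_table(index=["region", "product_category"], values="sales")
by_category = sample.pivot_table(index=["product_category", "region"], values="sales")

# Swapping the two index levels of one table and re-sorting
# gives the same rows as the other table.
reordered = by_region.swaplevel().sort_index()
print(reordered)
print(by_category)
```

Both printed tables should list the same averages; only the nesting of the row labels differs.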
2. Specifying values and performing aggregation
By default, pivot_table performs the mean aggregation function on all numerical columns and returns the result. To explicitly specify the columns we care about, use the values argument.
df.pivot_table(index=['region'], values=['sales'])
To perform an aggregation other than mean, we can pass a valid string function to aggfunc. For example, let’s do a sum:
df.pivot_table(index=['region'], values=['sales'], aggfunc='sum')
aggfunc can be a dict, and below is the dict equivalent.
df.pivot_table(
    index=['region'],
    values=['sales'],
    aggfunc={ 'sales': 'sum' }
)
aggfunc can be a list of functions, and below is an example to display the sum and count:
df.pivot_table(
    index=['region'],
    values=['sales'],
    aggfunc=['sum', 'count']
)
# The dict equivalent
# aggfunc={ 'sales': ['sum', 'count']}
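When aggfunc is a list, the result has two levels of column labels: the function name on top and the value column underneath. The sketch below, on invented data (the frame sample is hypothetical), shows how a single column might be picked out of such a result.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
sample = pd.DataFrame({
    "region": ["East", "East", "South", "West", "South"],
    "sales": [100, 80, 120, 60, 90],
})

result = sample.pivot_table(
    index=["region"],
    values=["sales"],
    aggfunc=["sum", "count"],
)

# Column labels are (function, value column) pairs
print(result.columns.tolist())

# Select one column with a tuple key
total_by_region = result[("sum", "sales")]
print(total_by_region)
```

Selecting with a tuple such as ("sum", "sales") returns an ordinary Series indexed by region.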
3. Seeing break down using columns
If we would like to see sales broken down by product_category, the columns argument allows us to do that:
df.pivot_table(
    index=['region'],
    values=['sales'],
    aggfunc='sum',
    columns=['product_category']
)
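One way to read the columns argument is as a groupby on both keys followed by moving one key from the rows to the columns. The sketch below compares the two on invented data; the frame sample is hypothetical, and the groupby/unstack form is offered only as a mental model, not as a statement about how pivot_table() is written internally.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
sample = pd.DataFrame({
    "region": ["East", "East", "South", "West", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Tea", "Coffee"],
    "sales": [100, 80, 120, 60, 90],
})

pivoted = sample.pivot_table(
    index="region",
    columns="product_category",
    values="sales",
    aggfunc="sum",
)

# Group by both keys, sum, then move product_category into the columns
grouped = (
    sample.groupby(["region", "product_category"])["sales"]
    .sum()
    .unstack()
)

print(pivoted)
print(grouped)
```

Region and category pairs with no rows in the data show up as NaN in both tables.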
4. Replacing missing values
You probably noticed a NaN value in the previous output. We got that because there aren't any Tea sales in the South. If we want to replace it, we can use the fill_value argument, for example, to set NaN to 0.
df.pivot_table(
    index=['region'],
    values=['sales'],
    aggfunc='sum',
    columns=['product_category'],
    fill_value=0
)
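A small before-and-after check makes the effect of fill_value visible. The data below is invented (the frame sample is hypothetical) and is arranged so that, as in the article, there are no Tea sales in the South.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
sample = pd.DataFrame({
    "region": ["East", "East", "South", "West", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Tea", "Coffee"],
    "sales": [100, 80, 120, 60, 90],
})

common = dict(index="region", columns="product_category",
              values="sales", aggfunc="sum")

raw = sample.pivot_table(**common)
filled = sample.pivot_table(**common, fill_value=0)

# Count the empty cells before and after filling
print(raw.isna().sum().sum())
print(filled.isna().sum().sum())
print(filled)
```

Keep in mind that after filling, a 0 can mean either "no rows for this combination" or "rows that sum to zero"; if that distinction matters, it may be better to leave the NaN in place.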
5. Displaying multiple values and adjusting view
If we would like to see the sum of cost as well, we can add the cost column to the values list.
df.pivot_table(
    index=['region'],
    values=['sales', 'cost'],
    aggfunc='sum',
    columns=['product_category'],
    fill_value=0
)
This does work, but the visual representation doesn’t seem to be useful when we would like to compare the cost and sales side by side. To get a better visual representation, we can move product_category from columns to index.
df.pivot_table(
    index=['region', 'product_category'],
    values=['sales', 'cost'],
    aggfunc='sum',
    fill_value=0
)
Now, the pivot table makes it much more convenient to see the difference between sales and cost side by side.
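Because cost and sales now sit in adjacent columns, the difference between them can also be computed directly on the pivot table. This is a minimal sketch on invented data; the frame sample and the column name margin are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
sample = pd.DataFrame({
    "region": ["East", "East", "South", "West", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Tea", "Coffee"],
    "sales": [100, 80, 120, 60, 90],
    "cost": [40, 30, 50, 25, 35],
})

table = sample.pivot_table(
    index=["region", "product_category"],
    values=["sales", "cost"],
    aggfunc="sum",
    fill_value=0,
)

# 'margin' is a hypothetical derived column: summed sales minus summed cost
table["margin"] = table["sales"] - table["cost"]
print(table)
```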
6. Showing total
Another common display in a pivot table is to show the total. In Pandas pivot_table(), we can simply pass margins=True:
df.pivot_table(
    index=['region', 'product_category'],
    values=['sales', 'cost'],
    aggfunc='sum',
    fill_value=0,
    margins=True
)
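The total row is labelled All by default. pivot_table() also accepts a margins_name argument to change that label, which the sketch below uses on invented data (the frame sample is hypothetical).

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
sample = pd.DataFrame({
    "region": ["East", "East", "South", "West", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Tea", "Coffee"],
    "sales": [100, 80, 120, 60, 90],
    "cost": [40, 30, 50, 25, 35],
})

table = sample.pivot_table(
    index=["region", "product_category"],
    values=["sales", "cost"],
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",  # label for the totals row instead of the default 'All'
)

# The totals row is appended as the last row of the table
print(table.tail(1))
```

Since aggfunc is 'sum', the last row holds the sum of each value column over all rows of the input.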
7. Generating a monthly report
Raw sales data is rarely aggregated by month for us. This type of data is often captured by the day. However, managers often want reports by month instead of detail by day. To generate a monthly sales report with Pandas pivot_table(), here are the steps:
(1) Define a groupby instruction using Grouper() with key='order_date' and freq='M'
(2) Define a condition to filter the data by year, for example 2010
(3) Use Pandas method chaining to chain the filtering and pivot_table().
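# Group order_date into calendar months
# (in pandas 2.2 and later, the 'M' alias is deprecated in favor of 'ME')
month_gp = pd.Grouper(key='order_date', freq='M')
# Keep only the orders placed in 2010
cond = df["order_date"].dt.year == 2010
(
    df[cond]
    .pivot_table(index=['region', 'product_category'],
                 columns=[month_gp],
                 values=['sales'],
                 aggfunc=['sum'])
)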
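The same three steps can be tried on a small invented data set. Note that this sketch swaps Grouper() for a derived monthly Period column, which is a different technique from the one above and is used here only to sidestep the frequency alias; the frame sample and the column name month are hypothetical.

```python
import pandas as pd

# Hypothetical data; values are invented for illustration.
sample = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2010-01-05", "2010-01-20", "2010-02-11", "2010-02-15", "2010-03-02",
    ]),
    "region": ["East", "East", "South", "West", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Tea", "Coffee"],
    "sales": [100, 80, 120, 60, 90],
})

# Filter to one year
cond = sample["order_date"].dt.year == 2010

monthly = (
    sample[cond]
    # Derive a monthly Period column to pivot on
    .assign(month=lambda d: d["order_date"].dt.to_period("M"))
    .pivot_table(
        index=["region", "product_category"],
        columns="month",
        values="sales",
        aggfunc="sum",
        fill_value=0,
    )
)

print(monthly)
```

Each column of the result is one month of 2010, and each row is one region and product_category pair.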
That’s it.
Thanks for reading.
Please check out the notebook on my Github for the source code.