A Practical Introduction to Pandas pivot_table() function, by BChen, Towards Data Science

By BChen · Published in Towards Data Science · Oct 6, 2020 · 5 min read

7 hands-on tricks to effectively use Pandas pivot_table() function to summarize data

https://towardsdatascience.com/a-practical-introduction-to-pandas-pivot-table-function-3e1002dcd4eb

Pivot tables are one of Excel's most powerful features. A pivot table allows us to draw insights from data. Pandas provides a similar function called pivot_table(). Pandas pivot_table() is a simple function, but it can produce very powerful analysis very quickly.

In this article, we'll explore how to use Pandas pivot_table() with the help of examples. Examples cover the following tasks:

1. Simplest Pivot table
2. Specifying values and performing aggregation
3. Seeing break down using columns
4. Replacing missing values
5. Displaying multiple values and adjusting view
6. Showing total
7. Generating a monthly report

Please check out the Notebook for the source code. More tutorials are available from the Github Repo.

The Data

In this tutorial, we will be working on coffee sales data. You can download it from my Github repo. Let's import some libraries and load the data to get started.
    import pandas as pd

    def load_data():
        return pd.read_csv('coffee_sales.csv', parse_dates=['order_date'])

    df = load_data()
    df.head()

We created a function load_data() to load the coffee_sales.csv file with the 'order_date' column as a date datatype. All columns are pretty self-explanatory.

1. Simplest Pivot table

The simplest pivot table must have an index. In our example, let's use the region as our index. By default, it performs the 'mean' aggregation function on all available numerical columns.

    df.pivot_table(index='region')

To display multiple indexes, we can pass a list to index:

    df.pivot_table(index=['region', 'product_category'])

The values passed to index are the keys to group by on the pivot table. You can change the order of the values to get a different visual representation. For example, to look at the average values grouped first by product_category and then by region:

    df.pivot_table(index=['product_category', 'region'])

2. Specifying values and performing aggregation

By default, pivot_table performs the mean aggregation function on all numerical columns and returns the result. To explicitly specify the columns we care about, use the values argument.
    df.pivot_table(index=['region'], values=['sales'])

To perform an aggregation other than mean, we can pass a valid function name as a string to aggfunc. For example, let's do a sum:

    df.pivot_table(index=['region'], values=['sales'], aggfunc='sum')

aggfunc can be a dict, and below is the dict equivalent.

    df.pivot_table(
        index=['region'],
        values=['sales'],
        aggfunc={ 'sales': 'sum' }
    )

aggfunc can also be a list of functions. Below is an example that displays the sum and the count:

    df.pivot_table(
        index=['region'],
        values=['sales'],
        aggfunc=['sum', 'count']
    )
    # The dict equivalent
    # aggfunc={ 'sales': ['sum', 'count']}

3. Seeing break down using columns

If we would like to see sales broken down by product_category, the columns argument allows us to do that:

    df.pivot_table(
        index=['region'],
        values=['sales'],
        aggfunc='sum',
        columns=['product_category']
    )

4. Replacing missing values

You have probably noticed a NaN value in the previous output. We get it because there aren't any Tea sales in the South. If we want to replace it, we can use the fill_value argument, for example, to set NaN to 0.
    df.pivot_table(
        index=['region'],
        values=['sales'],
        aggfunc='sum',
        columns=['product_category'],
        fill_value=0
    )

5. Displaying multiple values and adjusting view

If we would like to see the sum of cost as well, we can add the cost column to the values list.

    df.pivot_table(
        index=['region'],
        values=['sales', 'cost'],
        aggfunc='sum',
        columns=['product_category'],
        fill_value=0
    )

This works, but the layout isn't very useful when we want to compare cost and sales side by side. To get a better visual representation, we can move product_category out of columns and add it to index.

    df.pivot_table(
        index=['region', 'product_category'],
        values=['sales', 'cost'],
        aggfunc='sum',
        fill_value=0
    )

Now the pivot table makes it much more convenient to see the difference between sales and cost side by side.

6. Showing total

Another common display in a pivot table is to show the total.
In Pandas pivot_table(), we can simply pass margins=True:

    df.pivot_table(
        index=['region', 'product_category'],
        values=['sales', 'cost'],
        aggfunc='sum',
        fill_value=0,
        margins=True
    )

7. Generating a monthly report

Raw sales data is rarely aggregated by month for us. This type of data is often captured by the day. However, managers often want reports by month instead of detail by day. To generate a monthly sales report with Pandas pivot_table(), here are the steps:

(1) Define a groupby instruction using Grouper() with key='order_date' and freq='M'.
(2) Define a condition to filter the data by year, for example 2010.
(3) Use Pandas method chaining to chain the filtering and pivot_table().

    month_gp = pd.Grouper(key='order_date', freq='M')
    cond = df["order_date"].dt.year == 2010

    (
        df[cond]
        .pivot_table(index=['region', 'product_category'],
                     columns=[month_gp],
                     values=['sales'],
                     aggfunc=['sum'])
    )

That's it

Thanks for reading. Please check out the notebook on my Github for the source code.
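For readers without the coffee_sales.csv file at hand, the sketch below replays three of the steps above (breaking sales down by category with fill_value, showing totals with margins, and building a monthly report) on a tiny hand-made DataFrame. The column names follow the article, but every value is invented for illustration, and the variable names by_category, with_totals and monthly are hypothetical. For the monthly step, the sketch swaps pd.Grouper for a Period column built with dt.to_period("M"). That is a different technique from the one in section 7, chosen here only so the example does not depend on a frequency alias, since recent pandas releases have been moving the month-end alias from 'M' to 'ME'.

```python
import pandas as pd

# Hypothetical stand-in for coffee_sales.csv: column names follow the article,
# but every value below is made up for illustration.
df = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2010-01-05", "2010-01-20", "2010-02-11",
        "2010-02-25", "2010-03-03", "2011-01-09",
    ]),
    "region": ["East", "East", "South", "South", "East", "South"],
    "product_category": ["Coffee", "Tea", "Coffee", "Coffee", "Tea", "Coffee"],
    "sales": [100, 50, 80, 20, 70, 999],
    "cost": [60, 30, 50, 10, 40, 500],
})

# Sections 3-4: sales by region, broken down by category, with NaN replaced by 0.
by_category = df.pivot_table(
    index="region",
    columns="product_category",
    values="sales",
    aggfunc="sum",
    fill_value=0,
)

# Section 6: margins=True appends a final "All" row holding the totals.
with_totals = df.pivot_table(
    index=["region", "product_category"],
    values=["sales", "cost"],
    aggfunc="sum",
    fill_value=0,
    margins=True,
)

# Section 7: monthly report for 2010. A Period column stands in for pd.Grouper
# (an assumption of this sketch, not the article's method).
sales_2010 = df[df["order_date"].dt.year == 2010]
monthly = (
    sales_2010
    .assign(month=sales_2010["order_date"].dt.to_period("M"))
    .pivot_table(
        index="region",
        columns="month",
        values="sales",
        aggfunc="sum",
        fill_value=0,
    )
)

print(by_category)
print(with_totals)
print(monthly)
```

In this toy data the South has no Tea sales, so by_category shows a 0 where a NaN would otherwise appear, and the 2011 row is filtered out before the monthly table is built.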
