Pivoting Data with Pandas in Python – Data Analytics in Python – Learning By Doing

Big Data Analytics Machine Learning Deep Learning Ali Raza Bhayani

 

In this section of the series Data Analytics in Python, we will go through Pivot Tables in Pandas which is a handy technique for exploring data from different dimensions and extracting insights from Data. Pivoting is one of the main techniques used in Business Intelligence Solutions and Data Science for slicing and Dicing the data.

Lets consider initial sales data of the 5 Regions for the years 2016, 2017 and 2018.

Year Central East North South West
2016 300 150 500 325 200
2017 200 300 450 300 200
2018 250 225 150 375 150

Table 1: Initial Sales Data Table

Now consider that a report is needed to be generated where the Data Values of Years is to be shown as columns and generate a summary of each year accordingly something as follows:

Year 2016 2017 2018
Central 300 200 250
East 150 300 225
North 500 450 150
South 325 300 375
West 200 200 150

Table 2: Preview of the Desired ResultĀ 

For generating the required output illustrated above, we will need to Pivot our data. Let’s dive in Python and first make a Pandas’ DataFrame out of the initial data presented in Table 1:

import pandas as pd

df = pd.DataFrame({'Year': ['2016','2017','2018'],
'North' : [500, 450, 150],
'East' : [150, 300, 225],
'South': [325, 300, 375],
'West': [200, 200, 150],
'Central':[300, 200, 250],})

>>> df
   Central  East  North  South  West  Year
0      300   150    500    325   200  2016
1      200   300    450    300   200  2017
2      250   225    150    375   150  2018

To generate the required output illustrated in Table 2, we will use Pandas DataFrame’s pivot_table:

>>> df.pivot_table(columns='Year')
Year     2016  2017  2018
Central   300   200   250
East      150   300   225
North     500   450   150
South     325   300   375
West      200   200   150

A more readable output of the above function is as follows:

Year 2016 2017 2018
Central 300 200 250
East 150 300 225
North 500 450 150
South 325 300 375
West 200 200 150

Table 3: Desired Result Generated with Pandas Pivot TableĀ 

In this way, we can change the dimensions of our data and generate different insights out of it.

Electronics Engineer by book, Software Architect and Technopreneur by passion, Open Source Enthusiast, Problem Hacker, Enabler, Do-Tank, Blogger, Autodidact, Yogi and an avid Reader. Involved in Building Products. Having loads of experience and technical expertise in areas ranging from Full Stack Web Application Development to Big Data Analysis, Modeling, Processing and Visualization, he is currently involved in working on Python, Django, Javascript, SQL, Bootstrap, PostgreSQL, RRD (Round Robin Database), MySQL, MonetDB, LevelDB, BerkeleyDB, Redis, Apache Spark, Pandas, SciPy, NumPy etc.

Ali Raza received his Masters Degree in Electronics Engineering which involved Research focused on Machine Learning. He is currently working as a Chief Technical Officer at BitWits (Pvt) Limited, CEO & Founder at DataLysis.io and CEO & Founder at LearningByDoing.io.

Please follow and like us:

Leave a Reply

Your email address will not be published. Required fields are marked *