In this tutorial, we will explore how to apply arbitrary functions to groups in Pandas. It's a handy technique when analyzing data with Python and Pandas. GroupBy's apply method accepts a function as an argument, which makes it a higher-order function, so we can use it to run custom aggregations on each group.

>>> import pandas as pd
>>> df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
...                    'Age': [18, 19, 18, 19],
...                    'Math': [75, 82, 89, 85],
...                    'Science': [65, 75, 86, 90],
...                    'Teacher': ['William', 'William', 'Robert', 'Robert']})

To get an idea of how our data looks, we can print the records as a table:

>>> df.head()
   Age  Math  Science Student  Teacher
0   18    75       65    Beth  William
1   19    82       75    Alex  William
2   18    89       86   Diana   Robert
3   19    85       90  Adrian   Robert

Consider the max function applied to the DataFrame grouped by Teacher:

>>> df.groupby('Teacher').max()
         Age  Math  Science Student
Teacher
Robert    19    89       90   Diana
William   19    82       75    Beth
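Note that max is computed separately for each column within each group, so a "max row" can mix values from different students. The short check below rebuilds the sample DataFrame so it runs on its own; the names `maxima`, `robert`, `math_top` and `science_top` are only illustrative. It shows that Robert's top Math score belongs to Diana while his top Science score belongs to Adrian, and that the Student column simply holds the alphabetically largest name in the group.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

# max() reduces each column independently within each Teacher group
maxima = df.groupby('Teacher').max()
robert = maxima.loc['Robert']

# Which student actually holds each of Robert's maxima?
math_top = df.loc[df.Math == robert.Math, 'Student'].item()
science_top = df.loc[df.Science == robert.Science, 'Student'].item()

print(robert.Math, math_top)        # 89 Diana
print(robert.Science, science_top)  # 90 Adrian
```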

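Python's built-in max function can also be passed to apply. Pandas recognizes the built-in and applies a column-wise maximum to each group instead, so the values match the ones above. The output also repeats the Teacher column, because apply receives each group's full sub-DataFrame, grouping column included: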

>>> df.groupby('Teacher').apply(max)
         Age  Math  Science Student  Teacher
Teacher
Robert    19    89       90   Diana   Robert
William   19    82       75    Beth  William
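Because the handling of built-ins passed to apply has varied between Pandas versions (newer releases may warn about it), a more explicit way to express the same idea is to pass a lambda that calls DataFrame.max() on each group. The sketch below also selects the non-grouping columns first, so the duplicated Teacher column does not appear; the column list `cols` is an illustrative choice, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

# Select only the non-grouping columns, then reduce each group explicitly.
# Each call returns a Series, and apply stacks them into one row per Teacher.
cols = ['Age', 'Math', 'Science', 'Student']
result = df.groupby('Teacher')[cols].apply(lambda g: g.max())
print(result)
```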

In the code above, we passed a function as an argument to apply. In the same way, we can pass our own custom functions to get the results we want. Let's define a function that finds the best teacher for each subject:

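def best_teacher(group_dframe):
    # For each subject, locate the row with the top score in this group
    # and return that row's Teacher in a one-row DataFrame.
    return pd.DataFrame({
        'Math': [group_dframe.loc[group_dframe.Math.idxmax()].Teacher],
        'Science': [group_dframe.loc[group_dframe.Science.idxmax()].Teacher],
    })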

The function above receives one group's sub-DataFrame at a time and returns a one-row DataFrame that holds, for each subject, the name of the Teacher whose student scored highest in that group.
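To see exactly what apply hands to the function, we can fetch a single group with GroupBy.get_group and call best_teacher on it directly. This sketch rebuilds the sample data so it is self-contained; `age_18` and `out` are illustrative names. The group keeps the original index labels (0 for Beth, 2 for Diana), and the function returns one row with the default index 0.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

def best_teacher(group_dframe):
    # One-row DataFrame: the Teacher of the top scorer in each subject
    return pd.DataFrame({
        'Math': [group_dframe.loc[group_dframe.Math.idxmax()].Teacher],
        'Science': [group_dframe.loc[group_dframe.Science.idxmax()].Teacher],
    })

# get_group returns the sub-DataFrame for Age 18 (Beth and Diana)
age_18 = df.groupby('Age').get_group(18)
out = best_teacher(age_18)
print(age_18)
print(out)
```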

Let's examine the function more closely. Consider the list passed as the value for the key ‘Math’ in the dictionary defined in the function above:

[group_dframe.loc[group_dframe.Math.idxmax()].Teacher]

Let's dissect this expression step by step to understand what's going on.

group_dframe.Math.idxmax()

The line above returns the index label of the row holding the maximum Math score.
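It matters that idxmax returns a label rather than a position: inside a group, rows keep their labels from the original DataFrame. In the Age 19 group below, Adrian's row has label 3 but sits at position 1, which is why the function pairs idxmax with the label-based .loc (using .iloc[3] on this two-row group would raise an IndexError). The names `age_19`, `label` and `position` are illustrative; `to_numpy().argmax()` is used only to show the positional offset for comparison.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

age_19 = df.groupby('Age').get_group(19)    # Alex and Adrian
print(age_19.index.tolist())                # [1, 3]: labels from the original df

label = age_19.Math.idxmax()                # index label of the top Math score
position = age_19.Math.to_numpy().argmax()  # positional offset within the group

print(label, position)                      # 3 1
print(age_19.loc[label, 'Student'])         # Adrian (label-based lookup)
```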

group_dframe.loc[group_dframe.Math.idxmax()]

Next, using the .loc indexer, we fetch the row at the index label returned in the previous step. For more on .loc, you can see my post How to use .loc, .iloc, .ix in Pandas.

Now finally:

group_dframe.loc[group_dframe.Math.idxmax()].Teacher

The line above fetches the Teacher from the row extracted in the previous step. Since that row holds the maximum Math score, the Teacher returned here is the one whose student scored highest in Math.
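The three steps can be traced one at a time on a single group. This sketch uses the Science column of the Age 18 group (Beth and Diana) so each intermediate value can be printed; `group`, `row_label`, `top_row` and `teacher` are illustrative names, and the data is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

group = df.groupby('Age').get_group(18)   # Beth (label 0) and Diana (label 2)

row_label = group.Science.idxmax()        # step 1: label of the top Science score
top_row = group.loc[row_label]            # step 2: that row, as a Series
teacher = top_row.Teacher                 # step 3: the Teacher field of that row

print(row_label, top_row.Student, teacher)  # 2 Diana Robert
```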

Now let's group the DataFrame by Age and apply our function:

>>> group_dframe = df.groupby('Age')
>>> group_dframe.apply(best_teacher)
         Math Science
Age
18  0  Robert  Robert
19  0  Robert  Robert
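For comparison, here is a sketch of a different technique that avoids apply altogether: GroupBy idxmax returns, for each Age group, the index label of the top-scoring row, and .loc then looks up the Teacher at those labels. This is an alternative, not the approach used in this tutorial; it yields a flat Age index without the extra 0 level. It assumes both idxmax results come back in the same (sorted) Age order, which is the groupby default; `math_rows`, `science_rows` and `best` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Student': ['Beth', 'Alex', 'Diana', 'Adrian'],
                   'Age': [18, 19, 18, 19],
                   'Math': [75, 82, 89, 85],
                   'Science': [65, 75, 86, 90],
                   'Teacher': ['William', 'William', 'Robert', 'Robert']})

# Per Age group, the index label of the top-scoring row in each subject
math_rows = df.groupby('Age')['Math'].idxmax()        # Age 18 -> 2, Age 19 -> 3
science_rows = df.groupby('Age')['Science'].idxmax()  # Age 18 -> 2, Age 19 -> 3

# Look up the Teacher at those labels; to_numpy() drops the row labels
# so the values line up with the Age index instead
best = pd.DataFrame({
    'Math': df.loc[math_rows, 'Teacher'].to_numpy(),
    'Science': df.loc[science_rows, 'Teacher'].to_numpy(),
}, index=math_rows.index)
print(best)
```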

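In this way, we found the best teacher in each subject for each age group, based on the maximum scores. The inner index level 0 comes from the default index of the one-row DataFrame that best_teacher returns for each group.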