Home and Learn: Data Analysis

Lesson Created May 2023

The Pandas Apply Function

You can use the Pandas apply function to apply a Python function to your data series or Dataframe. To clear that up, we're going to use our pets CSV file again. If you haven't yet downloaded this dataset, you can do so here:

Download the Pets Data CSV File (Right-click, Save As)

The OwnerGender column in the dataset has an F or an M in it (again, apologies for the binary nature of this example):

Pet data displayed as a table.

Now, what if we wanted to convert the F to Female and the M to Male? How would we go about this? Well, we can use apply. Let's see how.

Load the CSV pets data again with these lines, changing PATH_TO_FILE to point to a location on your own computer:

import pandas as pd
df_pets = pd.read_csv('PATH_TO_FILE\\pets-data.csv')
df_pets.head()

In a new Jupyter Notebook cell, add this Python function:

def OwnerGenderFormat(gen):
    if gen == "F":
        return "Female"
    elif gen == "M":
        return "Male"
    else:
        return "None"

It should look like this in your Notebook (make sure the indents match the image below; the if should be one press of the tab key on your keyboard and the return should be two tabs):

A Python function in a Jupyter Notebook cell.

Make sure to run this code. It's just a simple if statement, though. Depending on the value of gen, it returns Female for F, Male for M, and the text None for anything else.

Test out this function in a new Notebook cell. Add this to it:

OwnerGenderFormat("F")

Run the code and you should see this:

The result from a Python function in a Jupyter Notebook cell.
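It can be worth testing the else branch as well. The sketch below repeats the function so that it runs on its own, and the lowercase f and the empty text are made-up test values. Note that the function hands back the text "None", which is an ordinary string and not Python's special None value.

```python
def OwnerGenderFormat(gen):
    if gen == "F":
        return "Female"
    elif gen == "M":
        return "Male"
    else:
        return "None"

# Made-up test values: two valid codes and two that match neither branch
test_values = ["F", "M", "f", ""]
results = [OwnerGenderFormat(value) for value in test_values]
print(results)

# The fallback is a string, not Python's None
print(type(OwnerGenderFormat("f")))
```

The comparison is case sensitive, so a lowercase f ends up in the else branch.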

OK, the function works well when tested. Now to apply it to the OwnerGender column.

In a new Notebook cell, enter these lines:

df_pets['OwnerGender'] = df_pets['OwnerGender'].apply(OwnerGenderFormat)
df_pets.head()

Run the code and the new values for the OwnerGender column should display:

A converted column in Pandas when a Python function is applied.

To understand the code, first look at what we have to the left of the equal sign (=). We have this:

df_pets['OwnerGender']

This is just referencing the OwnerGender column (a series) of our df_pets Dataframe.

To the right of the equal sign, we have this:

df_pets['OwnerGender'].apply(OwnerGenderFormat)

Again, we have the reference to the OwnerGender column. After a dot, we then have this:

apply(OwnerGenderFormat)

In between the round brackets of the apply function, we have the name of our Python function. Pandas will apply this function to every cell in the OwnerGender column. Behind the scenes, it passes each cell value over to the function, storing the value in the gen variable. When it's done, it writes the results back to the OwnerGender column on the left of the equal sign.
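To make the "behind the scenes" part concrete, here is a minimal sketch that does by hand what apply does for us. The three-row Dataframe (df_demo) is made up for illustration and is not the real pets data.

```python
import pandas as pd

# A tiny made-up stand-in for the pets data (not the real CSV)
df_demo = pd.DataFrame({'OwnerGender': ['F', 'M', 'F']})

def OwnerGenderFormat(gen):
    if gen == "F":
        return "Female"
    elif gen == "M":
        return "Male"
    else:
        return "None"

# By hand: call the function once for each cell value in the column
by_hand = [OwnerGenderFormat(value) for value in df_demo['OwnerGender']]

# With apply: Pandas makes those calls for us and returns a new series
with_apply = df_demo['OwnerGender'].apply(OwnerGenderFormat)

print(by_hand)
print(with_apply.tolist())
```

Both lines print the same converted values. The apply version is shorter and gives you back a series that can be assigned straight to a column.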

Now let's do something a little more sophisticated. We'll switch back to our Student Data. If you haven't yet downloaded this dataset, you can do so here:

Student Scores Data Set (right click, Save As)

Load up the dataset with these lines:

import pandas as pd
df_students = pd.read_csv('PATH_TO_FILE\\StudentScores.csv')
df_students.head(10)

As before, change PATH_TO_FILE to point to a location on your own computer.

Looking at the data, we can see we have exam scores, but there are no grades. What we'd like to do is to add a new column with Exam Grades as the values. Let's start with the Math grades.

The first thing to do is to add a new column. There are lots of ways to add columns in Pandas. For us, though, it would make sense just to copy the Math column. We can then convert scores into grades with this new column.

In a new cell in your Jupyter Notebook, add and run this code:

df_students['MathGrades'] = df_students['Math']
df_students.head()

We're creating a new column to the left of the equal sign (=). First, we have our Dataframe name, df_students. In between the square brackets of the Dataframe, we have given the new column the name MathGrades. This goes between quote marks.

To the right of the equal sign, we have this:

= df_students['Math']

This just copies the Math column to whatever is on the left of the equal sign, which is a new column for us.
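Here's a small sketch of what that copy gives you. The scores are made up and df_demo is only a stand-in for df_students. It shows that the new column lands on the end of the Dataframe, and that changing a value in the copy leaves the original Math column alone.

```python
import pandas as pd

# Made-up scores standing in for the student data
df_demo = pd.DataFrame({'Math': [45, 92, 67], 'Physics': [51, 88, 73]})

# Copy the Math column into a brand new column
df_demo['MathGrades'] = df_demo['Math']
print(list(df_demo.columns))

# Change one value in the copy only
df_demo.loc[0, 'MathGrades'] = 0
print(df_demo['Math'].tolist())
print(df_demo['MathGrades'].tolist())
```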

When you run the code, you should see this:

A new column added in Pandas.

And there's our new column on the end.

Let's convert the scores to grades. Add this function in a new cell in your Notebook:

def getGrade(val):
    if val >= 90 and val <= 100:
        return "A+"
    elif val >= 80 and val <= 89:
        return "A"
    elif val >= 70 and val <= 79:
        return "B+"
    elif val >= 60 and val <= 69:
        return "B"
    elif val >= 50 and val <= 59:
        return "C+"
    elif val >= 40 and val <= 49:
        return "C"
    elif val >= 30 and val <= 39:
        return "D"
    else:
        return "F"

It should look like this in your Notebook:

A Python function in a Jupyter Notebook cell.

Make sure your cursor is flashing inside the cell and run the code. Running the code when you've finished typing your function will ensure that your Notebook knows about it. Plus, if you've made any mistakes in your Python code, you'll see an error message you can use to correct your code.

In a new cell, test the code out with this line:

'Grade is ' + getGrade(34)

When you run the code, you should see a grade of D print out.
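It can be worth testing the edges of each grade band too. The sketch below repeats getGrade so that the cell runs on its own. The values 89.5 and 101 are made up, and are there to show that anything falling between or outside the bands ends up in the else branch.

```python
def getGrade(val):
    if val >= 90 and val <= 100:
        return "A+"
    elif val >= 80 and val <= 89:
        return "A"
    elif val >= 70 and val <= 79:
        return "B+"
    elif val >= 60 and val <= 69:
        return "B"
    elif val >= 50 and val <= 59:
        return "C+"
    elif val >= 40 and val <= 49:
        return "C"
    elif val >= 30 and val <= 39:
        return "D"
    else:
        return "F"

# Whole-number scores on the band edges
edge_checks = {score: getGrade(score) for score in [100, 90, 89, 30, 29]}
print(edge_checks)

# Made-up awkward values: a decimal between two bands, and a score over 100
between_bands = getGrade(89.5)
over_the_top = getGrade(101)
print(between_bands, over_the_top)
```

The whole numbers come out as you'd expect, but 89.5 and 101 both get an F. That's fine if your scores are always whole numbers from 0 to 100, but it's something to bear in mind if your data might contain decimals or typing errors.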

OK, our new function works. Now let's apply it to our new column. Add this code in a new cell:

df_students['MathGrades'] = df_students['Math'].apply(getGrade)
df_students.head()

Run the code and you should see this:

A column of numbers converted to grades in Pandas.

We now have a column of Math grades.

The code is pretty much the same as before. Last time, we had this:

df_students['MathGrades'] = df_students['Math']

This copied the values in the Math column over to a new column called MathGrades. This time, we have this:

df_students['MathGrades'] = df_students['Math'].apply(getGrade)

The only difference is the apply function on the end:

apply(getGrade)

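Pandas gets a grade for each value in the Math column. It then sets that grade in the MathGrades column (to the left of the equal sign). The grades line up with the scores because apply returns one result for every row. Strictly speaking, this line would have created the MathGrades column even if we hadn't copied the Math column first.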

See if you can do the other two columns by yourself. Add two new columns. Convert the Physics and Computers scores to grades.

Apply and Lambda Functions

As well as writing your own custom Python functions to use in Pandas, you can also apply something called a lambda function. Lambda functions are seen in a lot of programming languages and are not exclusive to Pandas. They are said to be anonymous functions, in that they don't need a name. Our Python functions, for example, were called OwnerGenderFormat and getGrade. With a lambda function, we don't need to come up with a name. We just create a variable and use that. If that's not too clear, let's try an example.
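Before going back to the pets data, here is a stand-alone sketch of the idea in plain Python, with no Pandas involved. The names double_named and double_lambda are made up for this illustration.

```python
# An ordinary Python function, with a name
def double_named(x):
    return x * 2

# The same thing as a lambda, stored in a variable
double_lambda = lambda x: x * 2

print(double_named(21))
print(double_lambda(21))

# Python itself records no real name for the lambda
print(double_lambda.__name__)
```

Both calls return 42. The last line prints <lambda>, which is why lambdas are called anonymous.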

Previously, we did this to convert F into Female and M into Male:

df_pets['OwnerGender'] = df_pets['OwnerGender'].apply(OwnerGenderFormat)

The Pandas apply function called our custom-made OwnerGenderFormat function into action. OwnerGenderFormat was a Python function. But, if we were to use a lambda, we wouldn't need a separate Python function (though you can use Python functions in lambda expressions).

The structure of a lambda is this:

apply(lambda x: CODE_GOES_HERE)

The x after lambda is just a variable name and you can call it anything you want. After a colon, you add the code you want to apply. This can be just about anything. For example, we can divide one column of values by another:

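df['New_COL_NAME'] = df.apply( lambda x: x['COL_1'] / x['COL_2'], axis = 1 )

For each row in the dataframe (df), Pandas will look up the value in the column called COL_1 then divide it by the value in COL_2. The results are returned to df['New_COL_NAME']. (The axis = 1 part tells Pandas to work row by row. We'll come back to it shortly.)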
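Here is a runnable sketch of that division. The Dataframe and its numbers are made up, and the column names simply follow the placeholders used in this lesson. Note the axis = 1, which asks Pandas to hand the lambda one row at a time.

```python
import pandas as pd

# Made-up numbers for illustration
df = pd.DataFrame({'COL_1': [10, 20, 30], 'COL_2': [2, 5, 10]})

# Row by row: x is one row, so x['COL_1'] is a single value
df['New_COL_NAME'] = df.apply(lambda x: x['COL_1'] / x['COL_2'], axis=1)
print(df['New_COL_NAME'].tolist())

# For simple arithmetic, dividing the two columns directly gives the same result
direct = df['COL_1'] / df['COL_2']
print(direct.tolist())
```

For plain arithmetic like this, the direct version is shorter. Apply with a lambda comes into its own when the code for each row is more involved.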

But think of apply with a lambda as a loop that goes through all your cell data, running the lambda code on each item.

We can rewrite our OwnerGender code to use a lambda instead. That way, we won't need a separate Python function. We can rewrite it like this:

df_pets['OwnerGender'] = df_pets['OwnerGender'].apply( lambda rowVal: 'Male' if rowVal == 'M' else 'Female' )

It looks a bit complicated, so let's break it down.

The code for the lambda, the part in between the round brackets of apply, is this:

lambda rowVal: 'Male' if rowVal == 'M' else 'Female'

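This is a one-line if ... else statement and reads, "enter Male if rowVal equals M, else enter Female". (Notice that this goes after a colon).

The result is the same as before. Bear in mind, though, that anything other than M becomes Female here. If you've already converted the column, reload the CSV file first, otherwise every row will end up as Female.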

A lambda function in Pandas.
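The sketch below runs the same lambda on a made-up series, then runs it a second time on its own output. The variable names (genders, to_words, first_pass, second_pass) are only for this illustration.

```python
import pandas as pd

# Made-up values standing in for the OwnerGender column
genders = pd.Series(['M', 'F', 'M'])

# The lambda from the lesson, stored in a variable so we can reuse it
to_words = lambda rowVal: 'Male' if rowVal == 'M' else 'Female'

first_pass = genders.apply(to_words)
print(first_pass.tolist())

# Run it again on the converted values: 'Male' is not 'M', so the else part wins
second_pass = first_pass.apply(to_words)
print(second_pass.tolist())
```

The first pass gives Male, Female, Male. The second pass gives Female three times, which is why it's safest to run this kind of conversion once, on freshly loaded data.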

Now let's use a lambda to add up values in columns, returning the result in a new column.

Suppose we want to add up the scores in our student data columns (Math + Physics + Computers). The total would go in a new column called ExamTotal. We can construct the lambda part like this:

apply(lambda colName: colName['Math'] + colName['Physics'] + colName['Computers'] )

We also need to add the name of our Dataframe:

df_students.apply(lambda colName: colName['Math'] + colName['Physics'] + colName['Computers'] )

The aim is to add up the three scores (Math, Physics, Computers) for each student. The line isn't quite ready to run yet, though.

(Notice that we're not using a column name after df_students. That's because we want to apply our lambda to the entire Dataframe, and not just a single column.)

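In Pandas, you can apply your lambda code to either rows or columns. This is done with the axis setting, which can be either 0 or 1. The default is 0, which passes each column to your lambda in turn. Setting axis to 1 passes each row instead, which is what we need here, because we're adding up values across a row.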

apply(lambda x: CODE_GOES_HERE, axis = 1)

apply(lambda x: CODE_GOES_HERE, axis = 0)
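To see the difference between the two settings, here is a small sketch that adds up the same made-up scores both ways. The Dataframe df_demo and its numbers are invented for this illustration.

```python
import pandas as pd

# Made-up scores for two students
df_demo = pd.DataFrame({
    'Math': [50, 60],
    'Physics': [70, 80],
    'Computers': [90, 100]
})

# axis = 0: the lambda receives one whole column at a time
column_totals = df_demo.apply(lambda col: col.sum(), axis=0)
print(column_totals.tolist())

# axis = 1: the lambda receives one whole row at a time
row_totals = df_demo.apply(lambda row: row.sum(), axis=1)
print(row_totals.tolist())
```

With axis = 0 you get three results, one for each column. With axis = 1 you get two results, one for each student.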

Adding the axis setting, and storing the result in the new ExamTotal column, the code would be this:

df_students['ExamTotal'] = df_students.apply(lambda colName: colName['Math'] + colName['Physics'] + colName['Computers'], axis = 1 )
df_students.head()

When you run the code, the result is as follows:

A Pandas lambda function that adds up column values.

Notice how we've spread the code over a few lines. You can do this in your own code if the lines are getting too long or are becoming hard to read.
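For reference, here is a sketch of the whole thing in one cell, with the result stored in the ExamTotal column and the code spread over several lines. The two students and their scores are made up, and df_demo is only a stand-in for df_students.

```python
import pandas as pd

# Made-up student data for illustration
df_demo = pd.DataFrame({
    'Name': ['Student A', 'Student B'],
    'Math': [55, 72],
    'Physics': [61, 80],
    'Computers': [70, 91]
})

# Line breaks are allowed inside the round brackets of apply
df_demo['ExamTotal'] = df_demo.apply(
    lambda row: row['Math'] + row['Physics'] + row['Computers'],
    axis=1
)
print(df_demo['ExamTotal'].tolist())

# The same totals without a lambda: pick the three columns and sum across each row
direct_totals = df_demo[['Math', 'Physics', 'Computers']].sum(axis=1)
print(direct_totals.tolist())
```

Both approaches give 186 and 243. The lambda version is handy once the sum turns into something more complicated, such as a weighted total.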

But let's move on - lambdas can improve your Pandas skills, but they can be a bit tricky to get the hang of. In the next lesson, we'll cover Pandas and plots. You get to create some little charts!
