Home and Learn: Data Analysis


Pandas loc and iloc

Sometimes, you'll want to extract just a few results from your data set. In which case, you can use loc and iloc. The difference between them is that loc refers to rows and columns by their labels (such as column names), while iloc refers to them by their integer position. (loc is short for location.) Think of iloc as an integer location and loc as a label location. Let's see some examples to clear things up.
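
Here's a minimal sketch of the difference. It uses a small made-up DataFrame rather than the student file, so the names, scores and row labels are invented purely for illustration. Notice that both loc and iloc take the row selector first and the column selector second:

```python
import pandas as pd

# Hypothetical data, invented for this example only
df_demo = pd.DataFrame(
    {"First": ["Ann", "Bob", "Cat"], "Math": [71, 58, 90]},
    index=["a", "b", "c"],
)

# loc selects by label: the row labelled 'b', the column named 'Math'
by_label = df_demo.loc["b", "Math"]

# iloc selects by integer position: second row (1), second column (1)
by_position = df_demo.iloc[1, 1]

print(by_label, by_position)
```

Both lines pick out the same cell. The only thing that changes is how you point to it.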

Here's our student data again, in case you haven't yet downloaded it:

Student Scores Data Set (right click, Save As)

In a new cell in a Jupyter Notebook, access the CSV file with this code:

import pandas as pd
df_students = pd.read_csv('PATH_TO_FILE\\StudentScores.csv')
df_students.head()

Replace PATH_TO_FILE with a location on your own computer, wherever you downloaded the file to.

You should see the following output when you run the code:

A CSV file being read in Pandas.

Now, suppose you wanted to examine just the Math scores and the student names. The column names you need are First, Last, and Math. You can use loc to extract just these columns. The syntax is this:

df_name.loc[rows, columns]

So, in between square brackets after loc, you need to specify which rows you want and which columns. The two are separated by a comma.

 

Get all the Rows but only specified Columns

In Python, you can use the colon to slice values. If you want all the rows and all the columns, just use a colon by itself:

df_name.loc[:, :]

The above code will return all the rows in your data set and all the columns. If you want all the rows but just specific columns, try this in a new Jupyter Notebook cell:

student_subset = df_students.loc[:, ['First', 'Last', 'Math']]
student_subset.head()

The result should be this:

Using loc in Pandas with column names.

Before the comma in the square brackets, we have a single colon (:). This means we want all the rows. After the comma, we have a Python list:

['First', 'Last', 'Math']

Python lists go between square brackets. Each column name goes between quote marks, and the names are separated by commas.
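
If you don't have the student file to hand, the same idea can be tried on a small stand-in DataFrame. This is only a sketch: df_demo and its values are invented, and the English column is an assumption added so there is something to leave out.

```python
import pandas as pd

# Hypothetical stand-in for the student file (invented values)
df_demo = pd.DataFrame({
    "First": ["Ann", "Bob", "Cat"],
    "Last": ["Lee", "Ray", "Fox"],
    "Math": [71, 58, 90],
    "English": [64, 77, 82],
})

# A list of names returns those columns, in the order listed
names_and_math = df_demo.loc[:, ["First", "Last", "Math"]]

# A single name without a list returns a Series rather than a DataFrame
math_only = df_demo.loc[:, "Math"]

print(names_and_math)
print(type(math_only).__name__)
```

The list form is usually the one you want when you plan to carry on working with a table of results.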

If you don't know the names of columns, you can use the column number instead. But you need to use iloc, rather than loc, otherwise you'll get an error. Try this:

student_subset = df_students.iloc[:, [0, 1, 2]]
student_subset.head()

Run the code to see the following output:

Getting columns by number in Pandas.

The first column in your data set is 0 rather than 1 (the index numbers in the left column are ignored, as they come from Pandas).
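
If you're not sure which number belongs to which column, you can print them out. The sketch below uses an invented df_demo rather than the student file. It also shows the error mentioned above: loc looks for columns that are actually named 0 and 1, finds none, and raises a KeyError.

```python
import pandas as pd

# Hypothetical data, invented for this example only
df_demo = pd.DataFrame({
    "First": ["Ann", "Bob", "Cat"],
    "Last": ["Lee", "Ray", "Fox"],
    "Math": [71, 58, 90],
})

# List each column's position next to its name
for position, name in enumerate(df_demo.columns):
    print(position, name)

# iloc accepts positions
first_two = df_demo.iloc[:, [0, 1]]

# loc looks for columns *named* 0 and 1, which don't exist here
try:
    df_demo.loc[:, [0, 1]]
    loc_failed = False
except KeyError:
    loc_failed = True

print(loc_failed)
```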

 

Slicing with loc and iloc in Pandas

Now try this in a new cell:

student_subset = df_students.loc[0:3, ['First', 'Last', 'Math']]
student_subset

The result is this:

Python slice used with Pandas loc.

This time, we have numbers before and after the rows colon:

0:3

We're saying we want the rows from row 0 to row 3, which is four rows. (With loc, the row at the end of the slice is included.)

The colon is really a 'from : to' statement:

from : to
0:3
1:5
17:22

If you want all the rows from a certain number, you can miss out the second number:

5:
10:
17:

If you want all the rows up to a certain number, miss out the first number:

:3
:5
:17

Play around with the numbers before and after the colon to get a feel for how slicing works.
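
One thing to watch for while you experiment: loc and iloc treat the end of a slice differently. A loc slice includes the end label, while an iloc slice stops just before the end position, the way ordinary Python slices do. Here's a small sketch using invented scores and the default 0, 1, 2... index:

```python
import pandas as pd

# Hypothetical scores, invented for this example only
df_demo = pd.DataFrame({"Math": [71, 58, 90, 66, 84, 49]})

loc_rows = df_demo.loc[0:3]    # labels 0, 1, 2, 3 (end included)
iloc_rows = df_demo.iloc[0:3]  # positions 0, 1, 2 (end excluded)
from_four = df_demo.iloc[4:]   # position 4 to the end
up_to_two = df_demo.iloc[:2]   # positions 0 and 1

print(len(loc_rows), len(iloc_rows), len(from_four), len(up_to_two))
```

So the same 0:3 gives you four rows with loc but three rows with iloc.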

By the way, you can use slicing on the columns as well. With iloc, the slice stops just before the end number, so 0:3 means columns 0, 1 and 2:

student_subset = df_students.iloc[:, 0:3]
student_subset.head()

This is the same as we did before:

student_subset = df_students.iloc[:, [0, 1, 2]]

You can also grab every other row by adding a second colon:

subset_iloc = df_students.iloc[::2, 0:2]

The code above will get you every other row and just the student name columns (columns 0 and 1).

If you wanted, say, every other row from the first 10 rows, you can write this:

subset_iloc = df_students.iloc[0:10:2, 0:2]

This grabs every other row from row 0 up to, but not including, row 10. That's rows 0, 2, 4, 6 and 8.
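
The full form of a slice is from : to : step. The sketch below tries a few steps on an invented 12-row DataFrame (df_demo is a stand-in, not the student data), so you can check which rows come back:

```python
import pandas as pd

# Hypothetical data: 12 rows with a default 0-11 index
df_demo = pd.DataFrame({"First": list("ABCDEFGHIJKL"), "Math": range(12)})

every_other = df_demo.iloc[::2]       # rows 0, 2, 4, 6, 8, 10
first_ten_alt = df_demo.iloc[0:10:2]  # rows 0, 2, 4, 6, 8 (stops before 10)
every_third = df_demo.iloc[::3]       # rows 0, 3, 6, 9

print(list(first_ten_alt.index))
```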

OK, that's enough of loc and iloc. We'll move on and take a look at the important topic of Group By.
