Home and Learn: Data Analysis


Importing Data in Pandas

In the previous lesson, you learned how to create a DataFrame in Pandas. Once you have a DataFrame, you can start to manipulate your data. In this lesson, you'll learn how to import some data from a csv file and use that as a DataFrame.

 

File Formats

You can import a wide variety of file formats using Pandas, from data in the popular CSV format to Excel files. You start with the word read, then an underscore, then the format you want to read. For example, to read in a CSV file, you'd do this:

import pandas as pd
file = pd.read_csv('path_to_file.csv')

For an Excel file, it would be this:

import pandas as pd
file = pd.read_excel('path_to_file.xlsx')

For a JSON file, it would be this:

import pandas as pd
file = pd.read_json('path_to_file.json')

For a full list of the formats, see here on the Pandas site:

Pandas File Formats
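
Whichever read function you use, you get a DataFrame back. Here's a small sketch that reads the same data once as CSV and once as JSON. The data and the variable names are made up for illustration, and io.StringIO stands in for a real file on disk so that you can run the code without downloading anything:

```python
import io
import pandas as pd

# Made-up data, held in strings rather than in real files
csv_text = "name,species,age\nRex,Dog,4\nTom,Cat,7\n"
json_text = '[{"name": "Rex", "species": "Dog", "age": 4}, {"name": "Tom", "species": "Cat", "age": 7}]'

# StringIO makes a string behave like an open file
df_from_csv = pd.read_csv(io.StringIO(csv_text))
df_from_json = pd.read_json(io.StringIO(json_text))

print(df_from_csv)
print(df_from_json)
```

Both print statements should show the same two rows and three columns. With a real file, you'd pass the file path instead of the StringIO object.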

So, here's a CSV file for you to download. (CSV stands for comma separated values.)

Download the Pets Data CSV File (Right-click, Save As)

Once you've downloaded the file, you can double-click it to open it up, if you like. There's not much to it:

A csv file displayed in Notepad.

Now let's import this file and see what it looks like in Pandas.

Start a new Jupyter Notebook. Add the following in the first cell:

import pandas as pd
df_pets = pd.read_csv('PATH_TO_FILE.csv')
df_pets

Replace PATH_TO_FILE.csv with the full path to wherever you saved your downloaded file. In Windows, you can get the full file path by opening an Explorer window. (A shortcut is the WINDOWS Key + E on your keyboard.)

Navigate to the folder where the file is. Select the file and, on the Home tab, click the Copy path button, as in the image below:

Explorer Window showing the Copy Path item.

Paste between the round brackets of read_csv and you'll have something like this:

pd.read_csv('C:\Users\Ken\Documents\DataScience\csv\pets-data.csv')

You now need to change all the backslashes to forward slashes. Otherwise you'll get a Unicode Escape error, because Python reads the \U at the start of \Users as the beginning of a Unicode escape sequence. (Or you can double up on the backslashes in place of the single backslashes above.)

pd.read_csv('C:/Users/Ken/Documents/DataScience/csv/pets-data.csv')

Or this:

pd.read_csv('C:\\Users\\Ken\\Documents\\DataScience\\csv\\pets-data.csv')
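
A third option, which this lesson doesn't otherwise use, is a Python raw string. You type the letter r before the opening quote mark and Python then leaves the backslashes alone. The sketch below writes the same path all three ways and checks that they match. It only compares the text of the paths, so it doesn't need the file to exist. PureWindowsPath comes from Python's built-in pathlib module and is used here just to do the comparison:

```python
from pathlib import PureWindowsPath

# The path from the lesson, written three ways
forward = 'C:/Users/Ken/Documents/DataScience/csv/pets-data.csv'
doubled = 'C:\\Users\\Ken\\Documents\\DataScience\\csv\\pets-data.csv'
raw = r'C:\Users\Ken\Documents\DataScience\csv\pets-data.csv'

# The raw string and the doubled-up string are exactly the same text
print(raw == doubled)  # True

# Windows treats forward slashes and backslashes as the same separator
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```

Any of the three strings can go between the round brackets of read_csv.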

Click the Run button in your Jupyter Notebook and you should see this:

A CSV File opened in a Jupyter Notebook and placed in a Pandas Dataframe.

What we've done here is import the pandas library, giving it the short name pd. On line 2, we've read the CSV file in and stored it all in a variable called df_pets. This variable, df_pets, now contains a DataFrame. We display the DataFrame on the third line. Notice that not all the results are displayed, just the top five rows and the bottom five. We are told that there are 102 rows and 6 columns.
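
You can also get the row and column counts in code, using a DataFrame's shape attribute. Here's a minimal sketch. It uses a tiny made-up dataset (the names and columns are invented, not taken from the pets file) and io.StringIO in place of a real file, so it runs anywhere:

```python
import io
import pandas as pd

# Made-up CSV data with a header row and three records
csv_text = "name,species,age\nRex,Dog,4\nTom,Cat,7\nPip,Bird,2\n"
df_demo = pd.read_csv(io.StringIO(csv_text))

# shape holds the number of rows, then the number of columns
rows, columns = df_demo.shape
print(rows, columns)

# columns holds the headings from the first line of the CSV
print(list(df_demo.columns))
```

Try df_pets.shape on your own DataFrame and the numbers should match the row and column counts Jupyter displays under the table.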

If you want to display some records from the top of your results, you can use the head method, like this:

df_pets.head()

If you don't type a number between the round brackets of head, you'll get the first 5 rows of data. Type a number between the round brackets of head and it will display that number of rows:

df_pets.head(10)

If you want to display rows from the bottom instead of rows from the top, use tail instead of head:

df_pets.tail()
df_pets.tail(10)
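
To see exactly which rows head and tail hand back, here's a small sketch you can run without the pets file. The DataFrame is made up for illustration: a single column called number holding 1 to 20.

```python
import pandas as pd

# A made-up DataFrame with 20 rows, numbered 1 to 20
df_demo = pd.DataFrame({'number': range(1, 21)})

first_five = df_demo.head()    # no number given, so 5 rows
last_three = df_demo.tail(3)   # the final 3 rows

print(first_five['number'].tolist())  # [1, 2, 3, 4, 5]
print(last_three['number'].tolist())  # [18, 19, 20]
```

Notice that head and tail give you a new, smaller DataFrame. The original df_demo still has all 20 rows.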

 

In the next lesson, you'll learn some basic operations you can apply to your DataFrames.
