Home and Learn: Power BI 2023


Cleaning Data with Power Query

Quite often, you'll find that the dataset you're trying to import is a mess and needs cleaning up. You can clean your dataset with the help of Power Query, which is built into Power BI (it's also available in Excel). In this lesson, we'll take a messy CSV file, transform it, and then import it into Power BI.

Download this CSV file (Right-click > Save As):

July Widget Sales

It looks like this, in Excel:

A CSV displayed in Excel.

As you can see, there are blank rows and columns, totals in the wrong place, a duplicate row, and a heading that we don't need. The data needs transforming before we can work with it in Power BI.

Open Power BI. Click Get Data from the File menu. You want the Text/CSV item, just like you did in a previous lesson. Select the file you downloaded above (sales-july). You'll see this dialog box:

Power BI data screen showing a CSV file.

Instead of clicking the Load button, click the Transform button instead. This will open up your data in Power Query. It should look like this (it's a very wide image):

Power Query screen

It's quite a big screen, so we'll be cropping images a lot in this section! We'll get to the ribbons at the top, soon. But notice the areas on the left, and on the right. These two:

Power Query - Query panel

Power Query - Query Settings panel

The first one is the query pane. Any queries you open and amend will show up here. The area on the right is called Query Settings. The main thing to note here is the Applied Steps area. There is no undo in Power Query, in that pressing CTRL + Z on your keyboard doesn't do anything. You undo things in the Applied Steps area. Click the X to get rid of that step.

But let's start cleaning the data. The data itself looks like this in Power Query:

A CSV file in Power Query ready to be cleaned.

The first thing we can do is get rid of the first two rows. Although row 1, column 3 has a heading in it, it's going to be of no use in Power BI, so there's going to be no harm in getting rid of it.

What we're trying to do, though, is create a neat table, with headings on the top, and no blank columns. The first two rows are mainly blank spaces, so they can go.

To delete rows, locate the Reduce Rows panel, which is on the Home ribbon at the top of Power Query. The item we want is Remove Rows:

The Remove Rows icon

Click the dropdown to see the options:

The Remove Rows menu in Power Query.

We want to remove the top rows, so click that option. You'll see a dialog box. This one:

Remove Top Rows dialog box in Power Query.

Enter the number 2 in the Number of rows textbox and then click OK. The top 2 rows of your data will be removed.

We can also get rid of the bottom row, as we don't need it.

Click the Remove rows dropdown again. This time, select Remove Bottom Rows. From the dialog box, enter a 1. Click OK to remove the bottom row, the one with the Totals line.

There is also a blank row in the data. We can get rid of that, as well.

Again, click the Remove rows dropdown. Select the option to Remove Blank Rows. Click OK to get rid of the blank row.

 

Remove Duplicate Rows in Power Query

We also have a duplicate row. Rows 4 and 8 are the same. This could just be a repeat order, so you need to take care when removing duplicates. In this case, we can get rid of the duplicate row.

Before removing the duplicate row, you need to select all the data. Press CTRL + A on your keyboard to select all your columns. (Or select the first column. Hold down the SHIFT key on your keyboard and then click the last row.)

Once all the columns are selected, click the Remove rows dropdown one more time. Select the option Remove Duplicates. The duplicated row will vanish.

Once you've done all that, the data screen should look like this:

Power Query screen after removing rows

If you've made a mistake, remember that you can undo via the Applied Steps panel on the right.

 

Remove Columns in Power Query

We can get rid of unwanted columns in the same way. We don't need the blank columns 1 and 7. So, select just these two columns. (Click where it says Column1 to select the first column. Hold down the CTRL key on your keyboard. Click where it says Column7. )

With the two columns selected, locate the Manage Columns panel at the top of Power Query. It's just to the left of the Reduce Rows panel:

The Remove columns icon in Power Query.

From the dropdown menu, select Remove Columns:

The Remove Columns menu in Power Query.

The two blank columns should be gone, and your dataset should look like this:

Power Query screen after removing columns.

 

Power Query Row Headers

One thing you will have noticed is that the column headers say, Column2, Column3, Column4, etc. The actual column headers that we want to use are in row 1 of the data. With Power Query, you can easily promote the first row to headers.

Right next to the Reduce Rows panel at the top of Power Query, there is Transform panel. Click the button on the Transform panel that says Use First Row as Headers:

Arrow pointing to the Use First Row as Headers item on the Transform panel of Power Query

Once you click the button, Power Query gets rid of those default Column names and uses the text in the first row of the data as your headers:

Power Query showing the result of promoting the first row to headers.

 

Power Query Data Types

If you look closely, Power Query has done something else for us. As well as using the first row as the headers, it has changed the data types going into the columns.

Before we clicked the Use First Row as Headers button, the Applied Steps on the right looked something like this:

Applied Steps panel

After we clicked the button, it looks like this:

Applied Steps panel with two more steps added.

Two steps have been added, Promoted Headers and Changed Type1.

The Type is the type of data that a column can hold. When we loaded the dataset, it was from a CSV file. The data types were all text. You can tell the type by looking closely at the icon to the left of the header name. In the image below, the red arrow is point to the icon for text, which is ABC:

Arrow indicating the data types menu in Power Query

The icon for integers (whole numbers) is 123. There's a calendar icon for dates. For floating values, the icon is 1.2.

Power Query has correctly guessed the type of data in our columns. Which is great as we don't have to change them ourselves.

If you want to change a data type, click an icon to the left of the header name. You'll see a list of types that you can choose from:

A list of Data Types in Power Query.

Select a type from the list. (Power Query will give you a warning, after you select an option.)

 

Our data is now nice and clean, ready for Power BI. One last thing before we close. You can change the name of your Query. Just above the Applied Steps section in the Query Settings, there is a Properties area with the Name of your Query. Change it to anything you like: (It's called sales-july because that was the name of the CSV file.)

In the next lesson, we'll take a closer look at dates in Power BI.

<--Back to the Power BI Contents Page

 


Email us: enquiry at homeandlearn.co.uk