Quantcast
Channel: PowerBI Archives - The Excelguru Blog
Viewing all articles
Browse latest Browse all 178

Remove All Rows Up To A Specific Value

$
0
0

A couple of weeks ago, Rudi asked how you would go about setting up a query to remove all rows up to a specific value.  Specifically, his question was this:

The other day I was asked if Power Query could delete all top rows up to a found value. I could not find a solution and its been a burning question till now.
For example: If I import a csv file containing columnar info, but the headings for the list are in different rows for each import. I know that the first heading in column A is called "ID Number", but each import has this heading in a different row.
How do I determine an applied step to delete all rows above "ID Number". I cannot use the delete top rows as its not always 5 rows, some import the headings start in row 10, others in row 3...but the label I am looking for is always "ID Number".

While the question was answered in the initial post, I still though it would be interesting to do a full post on this for others who might need to create similar functionality.

The Data Set

I didn’t have Rudi’s exact data, so I knocked up a little sample with an ID Number and Amount column starting in row 3, which you can download here.

The components I was after here was the ID Number header and an extra column of some kind.  In addition, I wanted to have some garbage data above, as I didn’t want to give the impression we can just filter out blank rows.

Now, I assume this data is loaded from an external file, but it doesn’t really matter, I’ll just load this from a range, as it’s just a data source.  The key is that the header row is not the first row.  So I defined a new range to cover the data

  • Select A1:B7 –> Name Box –> Data

I can then select the name from the Name box to select my data:

SNAGHTML1549a4a4

I then loaded it into Power Query by creating a new query –> From Table

Determine the Header Row

This is the first job.  In order to remove any rows above the header row, we need to know which row the header resides.  To do that, we need to add an index column, and filter to the specific header that we’re after.

  • Add Column –> Index Column –> From 0
  • Filter Column1 –> only select ID Number

This results in a single row of data, and conveniently has the Index number of the ID Number row listed.  In this case it’s 2.

Call up the Original Table

We’ll park that value for a moment, and call up the original table.  To do that, we click on the fx button beside the formula bar to create a new step, then replace the formula with =Source.

image

Remove All Rows Up To A Specific Value

Now comes the tricky part.

We cant’ remove all rows up to a specific value immediately.  We need to insert a step that removes the top 2 rows first, then modify it.  So let’s do that:

Home –> Remove Rows –> Remove Top Rows –> 2

This gives us the following table and formula:

image

The key is to understand this formula first.

  • The Table.Skip() function removes (actually skips importing) the first x rows
  • Custom1 is the name of our previous step
  • 2 is the number of rows

So what we really need to get is the number of rows.  We can extract that from the Filtered Rows step like this:

#”Filtered Rows”[Index]{0}

Where:

  • #”Filtered Rows” is the name of the step in the Applied Steps window
  • [Index] is the column we want to look at in that step
  • {0} indicates the first row of that step (since Power Query starts counting at 0

So let’s modify the function to:

= Table.Skip(Custom1,#"Filtered Rows"[Index]{0})

As you can see, it works nicely:

SNAGHTML1559cc91

In fact, we can go even better than this.  Why are we referring to Custom1 at all?  Doesn’t that just refer to Source anyway?  Let’s also replace Custom1 with Source, and remove the Custom1 step from our query:

SNAGHTML155cd378

Cleanup

We can now promote our header rows, remove the unnecessary Changed Type step and set our data types correctly:

SNAGHTML155f21ae

Testing

If you try inserting new rows at the top of the data range, then refreshing the completed query… it just works!  The output table will never end up with extra rows at the top, as we’re filtering for to start at the ID Number row.

SNAGHTML156196d4

Using M

If you learn to write your own M code from scratch, you can start combining lines as well, skipping the next to explicitly filter the primary table.  Bill S provided code in answer to Rudi’s question (adapted to this workbook), which is shown below:

let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
Skip = Table.Skip(Source, Table.SelectRows(AddIndex, each ([Column1] = "ID Number")){0}[Index]),
PromoteHeaders = Table.PromoteHeaders(Skip)
in
PromoteHeaders

It essentially replicates the same steps as the query I built via the user interface, only he combined the Filtered Rows step into the Row Removal step.  To bonus here is that the code is a bit shorter.  The drawback is that it might not be quite so “de-buggable” to someone who isn’t as experienced with M.

The post Remove All Rows Up To A Specific Value appeared first on The Ken Puls (Excelguru) Blog.


Viewing all articles
Browse latest Browse all 178

Trending Articles