Quantcast
Channel: PowerBI Archives - The Excelguru Blog
Viewing all articles
Browse latest Browse all 178

Number rows by group using Power Query

$
0
0

After one of my previous sorting posts, Devin asked if we can number rows by group.  Actually, that's a paraphrase… what he really asked was:

Any thoughts on how to produce something like a ROW_NUMBER function with PARTITION from T-SQL? Similar to this: https://docs.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017#d-using-rownumber-with-partition

I've never used the PARTITION function in SQL, so I checked in with him just to confirm what he was after.  And here it is: take the first three columns of the table below, and add a final column with index numbers sorted by and restarting with each group:

Number rows by group example

And of course there is.  For this one, though, we actually do need to write a little code.

Preparing to Number rows by Group

Now here's the interesting part.  The source data looks exactly the same as what you see above, the only difference being that in the output we also added a number rows by group.  So how?

Well, we start by grouping the data.  Assuming we start by pulling in a table like the above with only the first 3 columns:

  • Sort the data based on the Sales column (in descending order)
  • Group the data by Group
  • Add a aggregation column for "All Rows"

Like this:

Group By dialog

Which yields this:

Table after aggregation

We are now ready to add the numbering.

Now to Number rows by Group

In order to number rows by group, what we really want to do is add an Index column, but we need to add it for each of the individual tables in our grouped segments.  If we try to add it here, we'll get 2 new values, showing 1 for Alcohol and 2 for Food.  That's not exactly what we need.  But if we expand the column, then the index will not reset when it hits Food, and the numbering won't be right either.

The secret here is to add an Index column to each of the individual tables in the Data column, then expand the columns.  The numbering will then be correct.

To do this, I added a custom column using the following code:

=Table.AddIndexColumn([Data], "Index", 1, 1)

Where did I get this code?  I actually added an Index column to the whole table.  That added the following code in the formula bar:

=Table.AddIndexColumn(#"Grouped Rows", "Index", 1, 1)

I copied that code, and deleted the step. Then I added my custom column, pasted the code, and replaced #"Grouped Rows" (the name of the previous step) with the column I wanted to use.

The result was this:

Table after Index column added

The final steps to clean this up:

  • Remove the Data column
  • Expand all columns from the Partitioned table except the Group (since we already have it)

Which leaves us with our table in place including the new column which does number rows by Group as originally planned.

If you want to play with this one, the example file can be found here.

I would also be remiss if I didn't mention that we have a great video in our Power Query Academy that covers this kind of operation (among others).  It's called "Advanced Row Context" (in our M deep dive section) where Miguel shows all kinds of cool stuff that you can do by adding new columns to Grouped Rows.

The post Number rows by group using Power Query appeared first on The Ken Puls (Excelguru) Blog.


Viewing all articles
Browse latest Browse all 178

Trending Articles