Channel: PowerBI Archives - The Excelguru Blog

Power Query – The IF function

In my last post I talked about useful text functions, and how they differed between Excel and Power Query.  Today we’re going to look at another compare/contrast scenario, but this time it’s going to be the IF function.

Critical background

The only important thing we need to remember here is that all functions in Power Query, whether text, logic or anything else, are case sensitive.  That may strike you as weird in this one, but we need to remember that “if” is not the same as “IF”, and that Power Query will gag on the latter.

The base scenario

For this example I’m going to work with a table of data that holds a customer number, a boat type and a billing code schema.  While the data has been scrambled, this represents a real structure that we use in my day job.

There’s no real mystery to the first two items, but the billing code schema we designed holds a ton of info.  It’s always 10 characters long, and breaks down like this:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, S = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric – Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Will factor in to a future post.)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)
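
To make the positions concrete, here is a minimal sketch in Power Query's M language that pulls each piece out of a single code. The sample value MP010450UP and the field names in the record are illustrative assumptions only. Note that the text functions count from zero, so the 2nd character sits at offset 1.

```powerquery
let
    // Hypothetical sample code that follows the schema above
    BillingCode = "MP010450UP",
    Parts = [
        Division      = Text.Start(BillingCode, 1),                         // "M"
        BillingType   = Text.Range(BillingCode, 1, 1),                      // "P"
        Months        = Number.FromText(Text.Range(BillingCode, 2, 2)),     // 1
        StartMonth    = Number.FromText(Text.Range(BillingCode, 4, 2)),     // 4
        SlipOrClass   = Text.Range(BillingCode, 6, 2),                      // "50"
        Status        = Text.Range(BillingCode, 8, 1),                      // "U"
        PaymentMethod = Text.End(BillingCode, 1)                            // "P"
    ]
in
    Parts
```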

So a table of customer data could look like this:

[Image: sample table of customer data]

Turning data into more useful data

So great, we’ve got this awesome billing code schema, but it doesn’t really tell me anything when I look at it, as it’s too complicated to read.  I really need to break this into separate pieces, and make useful and readable columns out of it.  So that’s what I’m going to start doing now.

The first step is, of course, to click in the table and go to Power Query –> From Table.

My goal here is to make a column that says “Annual” if the second character is an “A”, or “Periodic” if the second character is a “P”.  To start, I’m just going to reach back to last week’s article and make sure I can identify which character I’m looking at.  So first I’ll click “Add Custom Column”.

I’ll call my new column “Seasonality”, and use a formula to extract just the 2nd character:

=Text.Range([BillingCode],1,1)

And with that in place we can now focus in on the important data here:

[Image: the new Seasonality column showing the extracted 2nd character]

Writing IF functions in Power Query

In Excel, assuming the data was in a table that started in row 2 of the worksheet, either of the following formulas would work to convert “A” to “Annual” or “P” to “Periodic”:

=IF([@Seasonality]="A","Annual","Periodic")
=IF(D2="A","Annual","Periodic")

Easy enough, right?  But look at how the signatures differ from Excel to Power Query:

Excel =IF(test, value_if_true, value_if_false)
Power Query =if test then value_if_true else value_if_false

Notice that there are no parentheses or commas in the Power Query version, but you need to actually type out the “then” and “else” portions.  So to create the same thing in Power Query, we’d need a new column that uses the formula:

=if [Seasonality]="A" then "Annual" else "Periodic"

Or, as is my preference, we modify the Seasonality column we already built, wrapping the text extraction with the IF function as follows:

=if Text.Range([BillingCode],1,1)="A" then "Annual" else "Periodic"
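
For reference, here is a rough sketch of how that formula might look as a complete query step. The inline #table and its two sample codes are assumptions made so the snippet can be pasted into a blank query; in the real workbook the Source step would point at the customer table instead.

```powerquery
let
    // Hypothetical stand-in for the customer table
    Source = #table(
        {"Customer", "BillingCode"},
        {{1, "MA120150UP"}, {2, "MP010450UP"}}
    ),
    // Same test as the custom column formula: 2nd character is at offset 1
    Seasonality = Table.AddColumn(
        Source,
        "Seasonality",
        each if Text.Range([BillingCode], 1, 1) = "A" then "Annual" else "Periodic"
    )
in
    Seasonality   // customer 1 -> "Annual", customer 2 -> "Periodic"
```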

Once we modify the original formula, our table now correctly shows the different values all the way down:

[Image: the Seasonality column showing Annual or Periodic]

Observations

Once again, I find this a bit of a departure from regular Excel formulas.  Although it’s not hard to make the transition once you understand it, it would still be nice if the language could leverage the skill set we’ve worked so hard to master.  You could argue that the verboseness of the Power Query IF function is easier to read, but it’s still inconsistent with the formulas we know and love.

I still feel it would be nice if we could have an alternate pointer into the same function so that I could type this in Power Query too:

=IF([Seasonality]="A","Annual","Periodic")

I think that would just make it so much easier to get off the ground running for Excel pros.

I’ll also point out that the error message Power Query gives you when you type an upper case IF in a formula is not exactly helpful:

image

Most Excel pros aren’t going to understand what “Token Eof expected.” means, and I really have to question how it is telling me anything that I need to do to fix the formula.  Hopefully, in future versions of Power Query we get a more helpful message that says something like “It looks like you typed an upper case formula name.  Can I fix that for you?” (Maybe that will come with Intellisense and auto-complete…)

Taking this further

Next blog post we’ll look at how to take this a bit further… extending our conditional logic to look up a corresponding value in a list, avoiding having to nest several IF functions within each other.

The post Power Query – The IF function appeared first on The Ken Puls (Excelguru) Blog.


Power Query – Multi Condition Logic

In my last post, we looked at creating an IF statement using Power Query.  This time we’re going to go a bit deeper and look at a scenario where we need to choose between several options.

The base scenario

In the last post I talked about my billing code setup.  As a reminder, it’s a 10 character code that is constructed as follows:

  • Char 1 – Alpha – Indicates the division (G = Golf, F = Fitness, M = Marina)
  • Char 2 – Alpha – Indicates the billing type (D = Dues, S = Pass, A = Annual Moorage, P = Periodic Moorage)
  • Char 3-4 – Numeric – Indicates the number of months of coverage for the product (1-12)
  • Char 5-6 – Numeric – Indicates the start month (and subsequent anniversary) for the customer’s product
  • Char 7-8 – Variable – Slip length (in feet) for a boat in the case of marina customers, or SG, CP or CS for golf (indicating single, couple primary or couple spouse)
  • Char 9 – Text – A variety of single letter codes indicating specific things we want to know. (Outlined below)
  • Char 10 – Text – Indicates the payment method (F = Financed, P = Paid up front, C = Comp/Honorary)

(Note that the sample data only includes records for Marina data)

Sample file

If you’d like to follow along, download the sample file here.

Multi condition logic – Using IF

So, building on my previous two posts (using text functions and creating IF statements), we could easily break the first character into pieces by nesting 2 IF tests together:

=if Text.Start([BillingCode],1)="G" then "Golf" else "two options left"

=if Text.Start([BillingCode],1)="F" then "Fitness" else "Marina"

into

=if Text.Start([BillingCode],1)="G" then "Golf" else if Text.Start([BillingCode],1)="F" then "Fitness" else "Marina"

Not too hard really.  In fact, we can even build each IF statement separately, then just copy the second to replace the “two options left” part without making any other changes at all.  No parentheses or anything needed.
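
As a small sketch of the same nested test laid out over several lines, the snippet below wraps it in a helper function and tries it against three made-up codes. The name fnDivision and the sample codes are assumptions for illustration only.

```powerquery
let
    // Hypothetical helper name; the logic is the nested test from above
    fnDivision = (code as text) as text =>
        if Text.Start(code, 1) = "G" then "Golf"
        else if Text.Start(code, 1) = "F" then "Fitness"
        else "Marina",
    // Made-up codes, one per division
    Check = List.Transform({"GD120150UP", "FS010450UP", "MP010450UP"}, fnDivision)
in
    Check   // {"Golf", "Fitness", "Marina"}
```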

More than 3 options

But what if you have a whole bunch of options that you need to work with?  Let’s look at the 9th character in our billing code.  I haven’t given the details yet for that one, but here are the options:

E = Employee, S = Yacht Club, N = Non-Taxable, R = Restricted, I = Inactive, L = Social, M = Medical, U = Regular

Wow.  That’s a whole lot of possibilities, and would make for one monster nested IF statement.  That wouldn’t be a lot of fun to write, nor maintain.  So how would we deal with it?

In Excel proper, we would probably separate these options into a separate table, then use VLOOKUP to pull the appropriate value into the table.  So we just need a Power Query VLOOKUP function… except there isn’t one.

We actually have a couple of different methods to deal with this.  We could either:

  1. Split the 9th character into its own column, create an Excel table with the code letter in column 1 and the appropriate match in column 2, then merge the two using Power Query’s merge function. (Maybe I’ll write a future post on it.)
  2. Build our own CHOOSE function inside Power Query (or SWITCH if you prefer Power Pivot’s DAX version.)  This is way more fun, so let’s do that.  :)
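
For the curious, the first option might look something like the sketch below. Both tables are built inline with made-up rows so the snippet is self-contained, and the column names Code, Status and Match are assumptions; in practice the lookup table would come from the workbook.

```powerquery
let
    // Hypothetical customer rows and a shortened lookup table
    Customers = #table(
        {"Customer", "BillingCode"},
        {{1, "MA120150UP"}, {2, "MP010450EP"}}
    ),
    Lookup = #table(
        {"Code", "Status"},
        {{"E", "Employee"}, {"U", "Regular"}}
    ),
    // Pull the 9th character (offset 8) into its own column
    WithCode = Table.AddColumn(Customers, "Code", each Text.Range([BillingCode], 8, 1)),
    // Left outer join keeps every customer, matched or not
    Joined = Table.NestedJoin(WithCode, {"Code"}, Lookup, {"Code"}, "Match", JoinKind.LeftOuter),
    // Unmatched codes come back with a null Status
    Expanded = Table.ExpandTableColumn(Joined, "Match", {"Status"})
in
    Expanded
```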

Building a CHOOSE function

This actually isn’t too hard once you know the basic structure.  It basically goes like this:

function_name = (input) => let
   values = {
         {result_1, return_value_1},
         {input, "Undefined"}
     },
   Result = List.First(List.Select(values, each _{0}=input)){1}
in
   Result,

The key parts to recognize here are:

  • We can change the “function_name” part to be whatever we want/need,
  • result_1 is the first of the possibilities that we may pass TO the function
  • return_value_1 is the value that we’d like to return if the first value is result_1
  • if we need more values, we just insert another comma after the value_1 section and put in a value_2 section
  • we can keep adding as many values as we need.
  • the “Undefined” value will return the text “Undefined” if the value you pass isn’t in your provided list of options (it’s the Else portion of the CHOOSE statement)
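
If the Result line looks cryptic, here is one way to read it, broken into separate steps with a shortened list of values. The step names Matches and FirstMatch are my own labels for illustration, not part of the pattern.

```powerquery
let
    input = "N",
    values = {
        {"E", "Employee"},
        {"N", "Non-Taxable"},
        {input, "Undefined"}      // always matches, so it acts as the fallback
    },
    // Keep only the pairs whose first item equals the input:
    // {{"N", "Non-Taxable"}, {"N", "Undefined"}}
    Matches = List.Select(values, each _{0} = input),
    // The real match is listed before the fallback, so it wins
    FirstMatch = List.First(Matches),     // {"N", "Non-Taxable"}
    // Item 1 of the pair is the return value
    Result = FirstMatch{1}                // "Non-Taxable"
in
    Result
```

When the input is not in the list, only the fallback pair survives the List.Select step, which is why "Undefined" comes back.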

Using this structure, we could write a CHOOSE function for our scenario as follows:

fnChoose_CustCode = (input) => let
   values = {
         {"E", "Employee"},
         {"S", "SCYC"},
         {"N", "Non-Taxable"},
         {"R", "Restricted"},
         {"I", "Inactive"},
         {"L", "Social"},
         {"M", "Medical"},
         {"U", "Regular"},
         {input, "Undefined"}
     },
   Result = List.First(List.Select(values, each _{0}=input)){1}
in
   Result,

Notice that I changed a couple of things:

  1. I gave the function a name so that I can recognize it, and also so that I can create more than one function with different names.  This one is fnChoose_CustCode.
  2. I created a list of all the options I needed.
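
As an aside, a different technique that seems to give the same result is to keep the options in a record and look the code up with Record.FieldOrDefault. This is a swapped-in alternative, not the list-based pattern above, and the step names are assumptions; treat it as a sketch.

```powerquery
let
    fnChoose_CustCode = (input as text) as text =>
        let
            // Each field name is a code, each field value its description
            Codes = [
                E = "Employee", S = "SCYC", N = "Non-Taxable", R = "Restricted",
                I = "Inactive", L = "Social", M = "Medical", U = "Regular"
            ],
            // The third argument is returned when the field does not exist
            Result = Record.FieldOrDefault(Codes, input, "Undefined")
        in
            Result,
    Check = {fnChoose_CustCode("E"), fnChoose_CustCode("Z")}
in
    Check   // {"Employee", "Undefined"}
```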

Implementing the CHOOSE function

Okay, so now we have this, how do we use it?  Again, we’ve got two options.  I’ll focus on the other option at some other time, but for this scenario I want to build this right into an existing query.  So here’s how I do it.

First I created a brand new query that just pulls my table into Power Query, resulting in the following:

[Image: the Customers table loaded into Power Query]

Let’s go and inspect the code that pulls this in.  We need to click View –> Advanced Editor.  That will bring up the following code:

let
    Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
    Source

Not too complicated (yet).  Let’s paste in our code just before the Source = line:

let

fnChoose_CustCode = (input) => let
   values = {
         {"E", "Employee"},
         {"S", "SCYC"},
         {"N", "Non-Taxable"},
         {"R", "Restricted"},
         {"I", "Inactive"},
         {"L", "Social"},
         {"M", "Medical"},
         {"U", "Regular"},
         {input, "Undefined"}
     },
   Result = List.First(List.Select(values, each _{0}=input)){1}
in
   Result,

   Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content]
in
    Source

Perfect.  And yet it doesn’t exactly look like much.  In fact, beyond adding a new line in the Steps section of the Editor, we don’t see any changes:

image

So what good did that do then?

As it turns out, we’ve only pasted in our function to make it available to the Power Query engine.  We haven’t actually told Power Query to do anything with it.  So why don’t we do that now?

Using our new CHOOSE function

You’re going to be amazed how easy this is…

First we’re going to add a new column (Add Column –> Add Custom Column).  When the dialog pops up, we’ll create a formula to return the letter we want to pass to the function:

=Text.Range([BillingCode],8,1)

And that gives us the following result (assuming we provided the column name of Status):

[Image: the new Status column showing the extracted 9th character]

Cool stuff.  Now, let’s amp this up and use our function.  We’ll click the gear next to the Status step and wrap the existing formula with our function call.  (Don’t forget the extra  parenthesis needed at the end):

=fnChoose_CustCode(Text.Range([BillingCode],8,1))

Which gives us the following:

[Image: the Status column showing the descriptions returned by the function]
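
Putting the pieces together, the whole query in the Advanced Editor would look roughly like the sketch below. The step name Status is an assumption based on the column name used above; the editor may well generate a different step name for the custom column.

```powerquery
let
    // The lookup function, defined first so later steps can call it
    fnChoose_CustCode = (input) => let
        values = {
            {"E", "Employee"},
            {"S", "SCYC"},
            {"N", "Non-Taxable"},
            {"R", "Restricted"},
            {"I", "Inactive"},
            {"L", "Social"},
            {"M", "Medical"},
            {"U", "Regular"},
            {input, "Undefined"}
        },
        Result = List.First(List.Select(values, each _{0} = input)){1}
    in
        Result,

    Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content],
    // Pass the 9th character (offset 8) of each billing code to the function
    Status = Table.AddColumn(
        Source,
        "Status",
        each fnChoose_CustCode(Text.Range([BillingCode], 8, 1))
    )
in
    Status
```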

The end effect

You’ll find that all the sample codes in the data work just fine, and that nothing comes back as undefined.  If you’d like to see how the query reacts to different items, go back to the Customers table and try changing the second to last letter to something else.  When you refresh the table, you’ll find that it will evaluate the new character and return the appropriate result.

Caveat

It should be noted that the function as written above is case sensitive, meaning that a code of MP010450uP would return “Undefined”.  This is expected in my case, as valid codes are made up of upper case letters.

If I wanted to accept either case I would need to modify my Text.Range function to force it to upper case.  This would result in a function call that reads as follows:

=fnChoose_CustCode(Text.Upper(Text.Range([BillingCode],8,1)))

Which would work, as you can see here:

[Image: the lower case code returning the correct status]

The post Power Query – Multi Condition Logic appeared first on The Ken Puls (Excelguru) Blog.

Merging Columns with Power Query

The August update for Power Query was finally made available on Sept 1, and it has some pretty cool stuff in it.  In this week’s segment I thought we’d cover off one of the features that I’m most excited about as an Excel Pro: Merging columns with Power Query.

The old way

It’s been possible to merge two or more columns together in the past, but you had to write a formula to do it.  Honestly, it wasn’t a huge deal, but it still took a bit of know-how and work.  Assume, for example, we had this:

[Image: sample table with Account and Dept columns]

And our goal is to concatenate the Account and Dept columns together with a hyphen between them.  Here’s what you had to do:

  • Insert a New Column (the steps for this varied depending on the version of Power Query you are running.  Currently it is Add Column –> Add Custom Column)
  • When the prompt pops up you had to provide a formula like shown below:

image

Okay, so not a huge deal.  Just =[Column1] & "-" & [Column2]

But you still had to write it.  I’ve lost count of how many people to whom I’ve taught the simple & shortcut for Excel formulas, but it’s enough to say that it’s probably not intuitive.

So it worked, but could it become easier?  We now know the answer is Hell Yeah!

The new way

This time we’ll do it differently…

  • Select the Account column
  • Hold down CTRL (or SHIFT) and select the Dept column
  • On the Add Column tab, click Merge Columns

image

  • Choose your separator.  The default is –None- (meaning it will just mash them together), but other pre-defined options include Comma, Colon, Equals Sign, Semicolon, Space, Tab
  • What I want (a minus sign) isn’t there, so I’m going to choose –Custom–

image

  • Now I’ll enter a – (minus) sign and click OK

image

And that’s it!  My output comes together nicely:

[Image: the merged column with values joined by a hyphen]

Now, to be fair, I still have to rename the column.  I do wish this interface had a way to name the column in advance (like exists when you create a custom column.)  Hopefully the PQ team will retrofit us with that ability at some point in the future.

In the meantime, we can either right click the column header and rename it there, or we can edit the column directly in the formula bar.  Just change the highlighted part shown below:

image

Like this:

image

So honestly, if it’s not that much more efficient, why do I think this is cool?  Well, it’s not that much more efficient with 2 columns.  But try 4.  Or when you just need to put 4 columns back together with no spaces in between.  Then it starts to make life much easier.
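
For anyone who prefers to see the two approaches as code, here is a rough side-by-side sketch. The sample rows and the column name Account-Dept are made up, the values are stored as text so the & operator works, and the Table.CombineColumns step is one possible equivalent of a merge rather than a claim about exactly what the Merge Columns button writes.

```powerquery
let
    // Hypothetical sample data; numeric columns would need Text.From first
    Source = #table({"Account", "Dept"}, {{"1000", "10"}, {"2000", "20"}}),
    // The old way: a custom column built with the & operator
    OldWay = Table.AddColumn(Source, "Account-Dept", each [Account] & "-" & [Dept]),
    // A merge expressed in code: the two source columns are replaced by one
    NewWay = Table.CombineColumns(
        Source,
        {"Account", "Dept"},
        Combiner.CombineTextByDelimiter("-", QuoteStyle.None),
        "Account-Dept"
    )
in
    NewWay   // one column holding "1000-10" and "2000-20"
```

With four columns only the list of column names grows, where the & formula gets longer with every column added.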

The post Merging Columns with Power Query appeared first on The Ken Puls (Excelguru) Blog.

Power Query – The Round Function

The other day I asked one of my co-workers how many ways he knew of to round a number.  His answer was one… if it ends in .4 it rounds down and if it ends in .5 it rounds up.  My guess is that most people would answer along similar lines.

Interestingly though, there are a bunch of different ways to round, depending on your needs, and Excel has a bunch of functions to support them: ROUND, ROUNDUP, ROUNDDOWN, FLOOR, CEILING, EVEN, ODD, TRUNC, INT and MROUND.

Power Query has a bunch of rounding formulas as well but, given that the function names can be somewhat different in Power Query (as we first saw here), I thought it might be interesting to see how to convert each of those functions from Excel to Power Query’s structure.

Background Setup

To start with, I created a very simple structure: a two column table with some random values in the “Value” column, which I then rounded to 2 decimals using the formula =ROUND([@Value],2).  The output, after feeding it through Power Query, looks like this:

[Image: Excel table of values beside the matching Power Query output]

The blue table on the left is the Excel table, and the green table on the right is the Power Query output.  (There is a completed example file available here.)

Creating the Round function

I love the ROUND function in Excel.  I use it constantly – rounding everything that uses multiplication or division – and pretty much have it burned into muscle memory.  So to me this was a logical place to start with Power Query.  Naturally, the syntax is just a bit different from Excel though:

Syntax
Excel =ROUND(number,digits)
Power Query =Number.Round(value, digits, roundingMode)

Hmm… we know that the Power Query function will be case sensitive.  In addition, it has an extra parameter.  The valid options are:

  • RoundingMode.Up
  • RoundingMode.Down
  • RoundingMode.AwayFromZero
  • RoundingMode.TowardZero
  • RoundingMode.ToEven

Let’s see what we can do with this.

I open up my query, select Add Column, and put in the formula as shown below:

image

Pretty easy, just Number.Round([Value],2).  In fact, it’s so similar to Excel it’s awesome!

So I click OK, save the query, and have a look at my results.  And that’s when I notice something isn’t quite right.  I’ve added some conditional formatting to the table below so you can see it easily:

image

This is the default?

Notice all the numbers that don’t match?  Can you spot the pattern?  It’s the oddest damn thing I’ve ever seen, to be honest, and I’ve never heard of anyone rounding in this way.

The default “RoundingMode” for Power Query is “Round to Even” (often called banker’s rounding).  What that means is that if there is a tie, where the value sits exactly halfway between the two candidates, it will round to whichever candidate ends in an even digit.  So in the case of 1.645, the candidates are 1.64 and 1.65, each 0.005 away, and it will round down to 1.64 because 4 is even, where Excel’s ROUND would give 1.65.
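
A quick way to see the default in action is to round a few exact halves to zero decimals. I have used 2.5 and 3.5 here rather than values like 1.645 on the assumption that halves are stored exactly, which keeps floating point noise out of the picture.

```powerquery
let
    DefaultMode = Number.Round(2.5),                                // 2 (tie goes to the even neighbour)
    AnotherTie  = Number.Round(3.5),                                // 4 (even neighbour again)
    ExcelStyle  = Number.Round(2.5, 0, RoundingMode.AwayFromZero)   // 3 (what Excel's ROUND would give)
in
    [Default = DefaultMode, AnotherTie = AnotherTie, ExcelStyle = ExcelStyle]
```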

I find this deeply disturbing.  I personally think that every user would expect Excel and Power Query’s default rounding methods to line up exactly, and this doesn’t.  How serious is this?  I’m not sure.  I think I’ll let someone from the scientific community ponder that.

Using RoundingMode.Up

Since the default plainly doesn’t work for us, it looks like it’s time to figure out which of the additional parameters we need.  Let’s try adding RoundingMode.Up to see if that will fix it.

I opened Power Query again, and added a new custom column with the following formula:

=Number.Round([Value],2,RoundingMode.Up)

And the results are as follows:

image

Um… uh oh.  It seems to work above 0, but below is another matter.  That –5.245 is rounding down, not up! (Yes, from a technical perspective I am aware you can argue the words I used, but you get the idea.)

Using RoundingMode.Down

Now I’d be surprised if this came up with numbers consistent with the Excel formula, but let’s just check it for good measure.  The formula is:

=Number.Round([Value],2,RoundingMode.Down)

And the results:

image

So now ties greater than 0 get rounded down (toward zero), where ties less than 0 are rounded to the more negative number (away from zero).

Let’s try another:

Using RoundingMode.AwayFromZero

Here’s our next option:

=Number.Round([Value],2,RoundingMode.AwayFromZero)

And these results are pleasing!

image

Look at that… we finally found the one that works!

Using RoundingMode.TowardZero

We’ve only got one other option we haven’t explored, so we might as well use it too, just for the sake of completeness:

=Number.Round([Value],2,RoundingMode.TowardZero)

For some reason, I’m incapable of typing TowardZero the first time I type this.  I always type TowardsZero and end up with an error!  At any rate, the results:

image
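
To compare all five modes side by side, a sketch like the one below adds one column per mode to a small table of ties. The sample values and the column names are assumptions; rounding is done to zero decimals so the ties are exact.

```powerquery
let
    // Exact ties, two positive and two negative
    Source = #table({"Value"}, {{1.5}, {2.5}, {-1.5}, {-2.5}}),
    // Pairs of {column name, rounding mode}
    Modes = {
        {"ToEven",       RoundingMode.ToEven},
        {"Up",           RoundingMode.Up},
        {"Down",         RoundingMode.Down},
        {"AwayFromZero", RoundingMode.AwayFromZero},
        {"TowardZero",   RoundingMode.TowardZero}
    },
    // Add one rounded column per mode
    Result = List.Accumulate(
        Modes,
        Source,
        (tbl, m) => Table.AddColumn(tbl, m{0}, each Number.Round([Value], 0, m{1}))
    )
in
    Result
    // Value | ToEven | Up | Down | AwayFromZero | TowardZero
    //  1.5  |   2    |  2 |   1  |      2       |     1
    //  2.5  |   2    |  3 |   2  |      3       |     2
    // -1.5  |  -2    | -1 |  -2  |     -2       |    -1
    // -2.5  |  -2    | -2 |  -3  |     -3       |    -2
```

Only the AwayFromZero column agrees with Excel’s ROUND on every row, which matches what the screenshots showed.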

Thoughts

As a tool that is built for the Excel audience, I am having some real difficulty accepting the default parameter for this function in Power Query.  I HOPE that this is a bug, and not a design choice, although the documentation would suggest it is the latter.  If that’s the case, I think it’s a HUGE mistake.

Excel’s ROUND formula defaults to round away from zero.  Power Pivot’s DAX ROUND formula defaults to round away from zero.  VBA’s Application.Round function defaults to round away from zero. (As pointed out by Rory Archibald on Twitter, VBA’s Round function – without the application. prefix – does use banker’s rounding though.)

In my impression, if the Power Query formula holds the same name (at least after the Number. portion) it should return the same results as the Excel function.  In fact, I would venture to say that virtually every Excel pro would expect this.

My bigger concern would be that, with one of Power Query’s big selling features being its ability to re-shape and process large volumes of data, how quickly will a user realize that the Rounding function they thought they had is NOT working the way they expected?  Not good news at all.

I’m curious to hear your impressions.  Please leave a comment!

Want to see it for yourself?

Download the example file with all the formulas already in place.

The post Power Query – The Round Function appeared first on The Ken Puls (Excelguru) Blog.

Un-pivoting With Subcategories in Power Query

My last few posts have been relatively technical, so this time I figured I’d look at something practical.  I can’t believe it’s been almost a year since I blogged about Un-Pivoting data in Power Query, so it’s about time we looked at that again… but this time with a twist.  This time we’ll add sub categories to the data.

Background

The data we’re going to start with looks like a typical financial report.  Whether a restaurant or a shoe store, in the manager’s office you’re liable to come up with a report that looks something along the lines of this:

[Image: sample financial report]

Now for the challenge… someone decides they need an alternate view of this data.  So how do you quickly un-pivot this into a format that you can use for other things?

If you want to follow along, you can download the file from my OneDrive.

Issue 1 – Getting the Data into Power Query

The first issue we come across is that, while Power Query can consume data from inside an Excel file, it MUST be formatted as a table.  But this hardly looks like it’s conducive to a table format with all those blank rows and such.  But what the heck… let’s apply a table to it anyway, and see what happens.

  • Click anywhere
  • Choose Power Query –> From Table
  • Adjust the range to cover all of the data (A4:H17)
  • Uncheck the box that indicates your table has headers

When you’re done, the box should look as follows:

image

And when you say OK, you should be taken into Power Query.

If you take a quick peek back at Excel, you can see that the table has indeed been applied, and that there are generic column headers above each column:

[Image: the Excel table with generic column headers]

This is also reflected in Power Query:

image

Data Cleanup

Before we get into the trick of how to deal with subcategory columns, let’s clean up some of the garbage here.  Ideally what we’d like to get here is a nice pure table that we can easily un-pivot, just like we did in the prior article.

Cleanup Step 1:

Looking at the first column, we’ve got a bunch of null values in there, as well as some section headers.  What we really need is those section headers repeated on the lines below them.  So let’s make that happen.

  • Select Column1
  • Go to Transform –> Fill –> Down

You’ll see that the section headers are filled into any of the null areas. As soon as they encounter data however, they stop. (I’ve drawn a box around the Revenue lines below – notice how they fill until they reach Total Revenues, which then fill until they reach Expenses, and so on.)

image

Cleanup Step 2:

Now, we don’t really need any of the rows that are showing null values in Column3 through Column8.  Let’s filter Column3 to remove those.  Click the drop down on Column3 and uncheck (null).  The result looks like this:

image 

So… why didn’t I filter the null values out of Column2?  After all, there are blank data rows, and with a PivotTable we can recreate the subtotals…  The answer is that I’m not ready to lose the first two rows yet.  I need those in order to un-pivot my data. ;)

Issue 2 – Un-Pivoting the Data

If things were lined up perfectly, we could just select Column3 through Column8 and un-pivot it now.  Unfortunately, if we do we’ll get some really wonky results.  (Go ahead and try it if you like. Remember to click the x to the left of the “Unpivoted Columns” step once you confirm you’ve made a disaster of it!)

image

Preparing to Un-Pivot

Okay, so what do we need to do… well, the first thing we need to do is fill the first row (containing April and May) across the columns.  Here’s the rub though… there is no Fill—>Across feature.  So how do we do it?

Transposing the Table

To an accountant, transposition is an evil word that means you made a mistake and flipped two digits around.  It’s nasty and something we never look forward to.  But to Power Query it’s simply awesome.  Check this out…

  • Go to Transform—>Transpose

This instantly flips the columns to rows and rows to columns!

[Image: the transposed table]

And would you look at that… April and May are in Column1 and below them… null values!  We know what to do with those now!

  • Select Column1
  • Go to Transform –> Fill –> Down

Is this enough though?  Nope, sorry.  You might be tempted to “un-transpose” and then un-pivot it, but you’d still end up with garbage.  We still need to do a bit more.

Concatenating the Category and Subcategory

This is the trick to un-pivoting with subcategories: you need to concatenate them first, then un-pivot, then split them up.  So let’s get to it.  Using the tip from Merging Columns with Power Query, let’s join up Column1 and Column2.

  • Select Column1
  • Hold down CTRL or SHIFT and select Column2
  • Click Transform –> Merge Columns
  • Choose the Colon for the separator (or whichever you prefer)

Note:  If you don’t have the Merge Columns feature, you’re using an old version of Power Query. Either download the latest version, or you’ll need to manually join the columns by:

  • Adding a new column
  • Using the formula =[Column1]&":"&[Column2]
  • Delete Column1 and Column2
  • Move the new column to the first position

Once you’ve got it done, the output should look as follows:

[Image: the table with Column1 and Column2 merged]
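
In code, the fill and merge steps on the transposed table might look something like this sketch. The inline table is a made-up, cut-down version of the transposed header rows (the rows that hold the row labels are left out for brevity), and the name Merged for the new column is an assumption.

```powerquery
let
    // Hypothetical header rows as they appear after Transform -> Transpose
    Transposed = #table(
        {"Column1", "Column2", "Column3"},
        {
            {"April", "Actual", 100},
            {null,    "Budget", 90},
            {"May",   "Actual", 120},
            {null,    "Budget", 110}
        }
    ),
    // Fill the month down into the null cells beneath it
    FilledDown = Table.FillDown(Transposed, {"Column1"}),
    // Join month and measure with a colon, e.g. "April:Budget"
    Merged = Table.CombineColumns(
        FilledDown,
        {"Column1", "Column2"},
        Combiner.CombineTextByDelimiter(":", QuoteStyle.None),
        "Merged"
    )
in
    Merged
```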

Un-transposing the Table

Awesome… we’ve got concatenated headers now.  We just need to flip the table back right side up and we’re almost ready to un-pivot it:

  • Go to Transform –> Transpose

Final Preparations

The very last thing we need to do before we un-pivot our table is provide some decent headers.  This will ensure that the data will make sense when it is un-pivoted.  To that end:

  • Go to Transform –> Use First Row as Headers
  • Rename the first column to “Class”
  • Rename the second column to “Category”
  • Filter “Category” to remove the null values

Our table now looks nice and clean:

image

And we’re ready!

Un-Pivot It!

We now follow the steps of a regular un-pivot operation, with only a minor extra step:

  • Select the Class column
  • Hold down CTRL or SHIFT
  • Select the Category column
  • Go to Transform –> Unpivot Columns –> Unpivot Other Columns
  • Rename the “Value” column to “Amount”

And now the extra step:

  • Select the “Attribute” column
  • Go to Transform –> Split Column –> By Delimiter –> Colon
  • Rename the Attribute.1 column to “Month”
  • Rename the Attribute.2 column to “Measure”

The results:

[Image: the un-pivoted table with Month and Measure columns]
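
The last two steps can also be sketched in code. The table below is a made-up example of what the data looks like after the final preparations, with the concatenated Month:Measure headers already in place; the amounts and category names are assumptions.

```powerquery
let
    // Hypothetical prepared table with concatenated headers
    Prepared = #table(
        {"Class", "Category", "April:Actual", "April:Budget", "May:Actual", "May:Budget"},
        {
            {"Revenue",  "Food",  100, 90, 120, 110},
            {"Expenses", "Wages", 40,  45, 50,  45}
        }
    ),
    // Keep Class and Category, un-pivot everything else
    Unpivoted = Table.UnpivotOtherColumns(Prepared, {"Class", "Category"}, "Attribute", "Amount"),
    // Split "April:Actual" back into its two pieces
    Split = Table.SplitColumn(
        Unpivoted,
        "Attribute",
        Splitter.SplitTextByDelimiter(":"),
        {"Month", "Measure"}
    )
in
    Split   // one row per Class, Category, Month and Measure combination
```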

That’s pretty much it.  The last thing I’d do is change the Query name from Table1 to something more intelligible… maybe Data or something… then load it to the worksheet.

From a Static Report to a Data Source

Now that we’ve got our report reformatted into a data source, we can click anywhere in the table and pivot it to our heart’s content!

A Quick Recap

To be fair, this post has been pretty long, but only because I included a LOT of pictures and detailed instructions.  Once you’ve got the process nailed down, it doesn’t take all that long at all.  Remember, the big key to this is:

  • Suck your report into a table (without headers)
  • Fill any rows you can
  • Transpose the data
  • Concatenate your category and subcategory fields together
  • Un-transpose it
  • Un-pivot it
  • Split the category and subcategory back into their pieces

Enjoy!  :)

The post Un-pivoting With Subcategories in Power Query appeared first on The Ken Puls (Excelguru) Blog.

Pulling Excel Named Ranges into Power Query

The comments on my last post collected a tip that I thought was worth exploring.

I made the claim that Power Query MUST have Excel data in an official Excel table.  As LoganEatsWorld pointed out, that’s actually not true any more.  If you’d like to give this a go, you can download this workbook to follow along.

What’s in the file?

The file is very basic.  It simply contains one table, and one named range of data:

image

The table is highlighted in the blue table style, and bears the name “Stats”.  The named range is surrounded by the black outline, and is called “Breeds”.

Connecting to Excel Data:

The reason I never found this is that my method was always to go to the Power Query tab and click –> From Table.  That will work great to get the data out of a table, but it won’t work for the named range.  So let’s try this a different way…

  • Go to Power Query –> From Other Sources –> Blank Query
  • Click in the formula bar and type the following:

= Excel.CurrentWorkbook()

(Yes, it’s case sensitive… I’m starting to reconcile myself to the fact that it’s Power Query so I’m just going to have to get over it.)

What ends up happening is a bit of magic:

image

Interesting… we have two tables listed!  The first is our official table, the second is our named range.  Cool!

Let’s click in the blank space to the right of the green “Table” text in the Breeds row:

image

The preview pops up and, sure enough, that’s our named range data:

image

Working With The Data

All right, let’s click the green Table text and break open that named range:

image

One notable difference here (in fact really the only one), is that Power Query doesn’t automatically recognize the header row.  This is due to the fact that an Excel table actually has a named header row to promote, where a named range does not.  No big deal though, as we can easily deal with that:

  • Go to Transform –> Use First Row As Headers
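
Written out as a query, the clicks above amount to something like the following sketch. The step names are assumptions, and the query only works inside a workbook that actually contains a named range called Breeds.

```powerquery
let
    // Lists every table and named range in the current workbook
    Source = Excel.CurrentWorkbook(),
    // Drill into the row whose Name is "Breeds" and take its Content
    Breeds = Source{[Name="Breeds"]}[Content],
    // A named range has no header row of its own, so promote the first row
    Promoted = Table.PromoteHeaders(Breeds)
in
    Promoted
```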

At this point, we could save the table to the worksheet or data model, as we need.

Observations

So this is cool.  It’s awesome that we can get to named ranges, as I have a LOT of workbooks that use these, and there are occasions where I don’t want to convert them to official Excel tables.  Despite the fact that we can, however, you pretty much need a secret decoder ring to find it, and that’s not so good.

It would sure be nice if there was a more discoverable way to pull in a named range… but where?

Suggested Accessibility Option 1

When I look at Power Query’s “Get External Data” function, it seems logical to me that it should end up somewhere in that area.  Looking at the group:

image

I kind of like the ability that comes with the “From Table” feature which works from the table you’re in (if you are), and lets you create a table if you’re not inside a table when you click that button.  But I wonder if it would be better served as a SplitButton/menu/submenu structure that offered the following options:

  • From Table
    • Current Table
    • Other Table
      • List of other tables in the workbook
    • Named Range
      • List of named ranges in the workbook
    • Create New Table
    • Create New Named Range

Actually, there is another change I would make to that group, and that’s to move the “From Blank Query” out of the “From Other Sources”, and give it its own button.  (I create a LOT of queries from scratch now, and it’s just extra clicks in my way to do so.)

Suggested Accessibility Option 2

I’m not sure this is so much of an alternate as something additional I’d like to see, actually.  An “additional sources” button on the Home tab would be awesome, particularly if it had the ability to pull up all the existing tables or named ranges in the workbook and add them to the Power Query script as a “Source2=…” step.

I think the implications of this would be twofold:

  1. It would allow you to add a data source after creating a blank query, and/or
  2. It would allow you to add additional data sources into the same query.

The latter is certainly something I do semi-frequently, as I don’t want to have multiple Power Queries created that are then merged together.  I’ll add both sources manually in the same query, then merge them.

At any rate, just some thoughts.  If you have any on the subject please feel free to leave them in the comments.  :)

The post Pulling Excel Named Ranges into Power Query appeared first on The Ken Puls (Excelguru) Blog.

Create Dynamic Table Headers With Power Query

In my last post, I looked at how to pull a named range into Power Query and turn it into a table of data.  Today we’re going to look at how to create dynamic table headers with Power Query, but using a slightly different, slightly more complicated way to do the same task.  Why?  Well, the reality is that sometimes the simple method won’t work for us.

Background

In this scenario, we’re going to use the same file I used in my last post, we’re just going to build the output differently.  The key that I’m after here is that I want to pull the data range into Power Query, but I want to use a table.  The issue, however, as I mentioned in my last post, is:

If I set up the table with headers in row 3, it would convert the dates to hard numbers, thereby blowing apart the ability to easily update the dynamic column headers next year.  That would defeat the purpose of making the input sheet dynamic in the first place.

I’ve really struggled with this feature in Tables, where it converts your headers to hard values.  So many of the tabular setups I create use dynamic headers that it’s actually rarer for them not to.  So how do we work around this?  How do we create dynamic table headers in Excel?

Setting Up The Table For Success

It’s actually a LOT easier than you might think.  I use this very simple trick with both PivotTables and regular Tables, allowing me to take advantage of their power, but still control my headers and make them dynamic.  Here’s how:

Step 1: Add Static Headers Manually

  • Insert 2 rows below the current dynamic date headers in Row 3
  • Put some static headers in Row 5 that are generic but descriptive.  (In this case, CYM# means “Current Year, Month #”)

SNAGHTML1a520e81

The important part here is to make sure these static headers are each unique so that you can “unwind” them later with Power Query.

Step 2: Create The Table

Next, we need to create the table.  It’s going to cover A5:N24, and will therefore inherit the CYM column headers.  Since they are static values, it won’t make any changes to them, and my dynamic dates are still showing up top.

Step 3: Build a Translation Table

Huh?  A what?  Bear with me, as this will come clear a bit later.  Here’s how we do it:

  • Enter the following in B4:  =B3
  • Copy this across to cover B4:N4
  • Make sure B4:N4 is selected and go to Home –> Editing –> Find & Select –> Replace
  • Using the dialog, make the following two replacements:
    • Replace = with =$
    • Replace 3 with $3

This has the effect of making the formulas all absolute, which is important for our next step.

  • Select B4:N5 –> Right Click –> Copy
  • Select a cell down below your data range somewhere (I used A34)
  • Right click the cell and choose Paste Special
  • In the Paste Special options, choose the following:
    • Paste:  All
    • Check the Transpose box

Very cool, we’ve now got the beginnings of a table that is linked to the formulas in row 3.  We just need to finish it.

  • Add the column header of “Date” in A33
  • Add the column header of “Period” in B33
  • Format the data range from A33:B46 as a table
  • Rename the table to “DateTranslation”

SNAGHTML1a6575f9

Step 4: Final Header Cleanup

Row 4 has now served its purpose for us, so you can delete it, and then hide (the new) row 4.  The end result is that we have a header on our table that looks like it’s part of the table, but isn’t really.  The benefits here are that we can preserve the dynamic nature of it, and we have a wider variety of formatting options.

SNAGHTML1a67304b

We also have a completely separate date translation table too…

Setting Up The Power Query Scripts

So now we need to get the data into Power Query.  Let’s do that.

Step 1: Import and Reformat the Rounds Table

To do this we will:

  • Click somewhere in the main table
  • Power Query –> From Table
  • Remove the “TOTAL” column
  • Filter the first column to remove text that begins with “Total” and values that equal null
  • Filter the second column to remove null values
  • Right click the first column and Un-Pivot Other Columns
  • Rename the “Month” column to “Round Type”
  • Rename the query to “Budget”

At this point, you should have the following:

SNAGHTML1afa967c

This is great, but what about the Attribute column?  This is the whole big pain here about using a table, in that we don’t have dates.  Yes, we could have hard coded them, but then it would be very painful to update our query when our year changes.  So while we have something flexible (in a way) here, it isn’t really all that readable.  How can we change that?

Save and close the query, and let’s deal with this.
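
If you’d rather read the Budget steps as M than as clicks, here is a rough sketch of what they amount to.  A few things in it are assumptions on my part: “BudgetInput” is a hypothetical name for the main table, “CYM1” is assumed to be the first of the static headers, and the step names are made up.

```powerquery
let
    // "BudgetInput" is a hypothetical table name; use whatever your table is called
    Source = Excel.CurrentWorkbook(){[Name="BudgetInput"]}[Content],
    // Drop the TOTAL column, since it can be recalculated later
    RemovedTotal = Table.RemoveColumns(Source, {"TOTAL"}),
    // Remove blank rows and subtotal rows from the first column
    FilteredFirst = Table.SelectRows(RemovedTotal,
        each [Month] <> null and not Text.StartsWith([Month], "Total")),
    // Remove rows with no value in the second column ("CYM1" is an assumed header)
    FilteredSecond = Table.SelectRows(FilteredFirst, each [CYM1] <> null),
    // Turn the CYM columns into Attribute/Value pairs
    Unpivoted = Table.UnpivotOtherColumns(FilteredSecond, {"Month"}, "Attribute", "Value"),
    Renamed = Table.RenameColumns(Unpivoted, {{"Month", "Round Type"}})
in
    Renamed
```

The Attribute column that comes out of the unpivot holds the static header text, which is why it needs translating back into dates.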

Step 2: Create Another Power Query

Let’s add the other table we built:

  • Click inside the DateTranslation table we created
  • Go to Power Query –> From Table
  • Click Close & Load –> Load To…
  • Click Only Create Connection –> Load

This will create a new Power Query that basically just reads your original table on demand.  It won’t refresh unless it’s called from another source.  Now we’re set to do something interesting.

Step 3: Re-Open The Budget Power Query

If the Workbook Queries pane isn’t open on the right, then go to Power Query –> Workbook Queries to show it.

  • Right click the Budget query and click Edit
  • On the Power Query Home tab, click Merge Queries (in the Combine group)
  • Select the Attribute column
  • From the drop down box, choose DateTranslation
  • Select the Period column
  • Make sure “Only Include Matching Rows” is checked and click OK

image

At this point you’ll get a new column of data in your Query.  Click the icon in the top right of the column to expand it:

image

Excellent… now we only need to pick the column we need, which is Date.  So uncheck the Period column and click OK.

Finally we can remove the Attribute column and rename the “NewColumn.Date” column to Date and we’ve got a pretty clean query:

SNAGHTML1b05acce

At this point we could call it a day, as we’ve pretty much accomplished the original goal.  I can update the Year cell in B1 and my Table’s “Headers” will update.  In addition, my Power Query will show the correct values for the dates as well.  Pretty cool, as I could now link this into Power Pivot and set a relationship against a calendar table without having to worry about how it would be updated.
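
To see the mechanics of the merge on their own, here is a small self-contained sketch.  The two #table steps are stand-ins with made-up sample values, not the real data, and the step names are my own; the join kind is Inner because we ticked “Only Include Matching Rows”.

```powerquery
let
    // Stand-in for the unpivoted Budget rows (sample values are made up)
    Budget = #table(
        {"Round Type", "Attribute", "Value"},
        {{"Round Type A", "CYM1", 100}, {"Round Type A", "CYM2", 120}}),
    // Stand-in for the DateTranslation table (sample values are made up)
    DateTranslation = #table(
        {"Date", "Period"},
        {{#date(2014, 1, 31), "CYM1"}, {#date(2014, 2, 28), "CYM2"}}),
    // Match Attribute against Period, keeping only matching rows
    Merge = Table.NestedJoin(Budget, {"Attribute"}, DateTranslation, {"Period"},
        "NewColumn", JoinKind.Inner),
    // Expand only the Date column out of the nested tables
    Expanded = Table.ExpandTableColumn(Merge, "NewColumn", {"Date"}, {"NewColumn.Date"}),
    RemovedAttribute = Table.RemoveColumns(Expanded, {"Attribute"}),
    RenamedDate = Table.RenameColumns(RemovedAttribute, {{"NewColumn.Date", "Date"}})
in
    RenamedDate
```

You can paste this into a blank query to try it, as it doesn’t depend on anything in the workbook.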

Going One Level Deeper

One thing I don’t like about this setup is the need for the extra query.  That just seems messy, and I’d prefer to see just one query in my workbook.  The issue though, is that I’m pulling data from two tables.  What’s cool though, is that with a little editing of the M code, I can fix that.

Here’s the M for the query I’ve built, with a key line highlighted (twice):

image

As you can see, the “Merge” line is coloured yellow, but the name of the Query being merged is in orange.  Well guess what, we don’t need to reach to an external query here, we can reach to another named step in M.  Try this:

Immediately after the “let” line (remember, M is case sensitive), enter the following:

TranslationTable = Excel.CurrentWorkbook(){[Name="DateTranslation"]}[Content],

Now, modify the “Merge” line to update the name of the table from “DateTranslation” to “TranslationTable”.  (The reason we’re doing this is that the original query still exists, so we can’t just name the first step “DateTranslation”, as it will conflict.)

Once we’ve made our modifications, the script will look as follows:

image

When you click “Done”, the query will reload and you’ll see an extra step in the “Applied Steps” box on the right.  What you won’t see though, are any changes, as the data comes out the same.  Very cool, as we are now referencing both tables in a single query.  To prove this out, save this query, drop back to Excel and delete the “DateTranslation” query.  It will still work!

(The completed file can be downloaded here.)
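
In outline, the single-query version ends up shaped something like this.  Treat it as a sketch rather than the exact script: “BudgetInput” is a hypothetical name for the main table, the cleanup steps are left out for brevity, and the step names are my own.

```powerquery
let
    // New first step: read the translation table straight from the workbook
    TranslationTable = Excel.CurrentWorkbook(){[Name="DateTranslation"]}[Content],
    // "BudgetInput" is a hypothetical name for the main table
    Source = Excel.CurrentWorkbook(){[Name="BudgetInput"]}[Content],
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Month"}, "Attribute", "Value"),
    // The merge now points at the TranslationTable step, not at a separate query
    Merge = Table.NestedJoin(Unpivoted, {"Attribute"}, TranslationTable, {"Period"},
        "NewColumn", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Merge, "NewColumn", {"Date"}, {"NewColumn.Date"})
in
    Expanded
```

The point is simply that the second table in the merge can be any earlier step in the same let expression.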

Ending Thoughts

I really like this technique.  It lets me dynamically change the column names, yet still use those to link them into my data model tables.  But even more I like the ability that, with a minor edit to the M code, I can keep my workbook from being littered with extra queries.  :)

The post Create Dynamic Table Headers With Power Query appeared first on The Ken Puls (Excelguru) Blog.

Refresh Power Query With VBA

When I’ve finished building a solution in Excel, I like to give it a little polish, and really make it easy for my users to update it.  The last thing I want to do is to send them right clicking and refreshing everywhere, so I program a button to refresh Power Query with VBA.

The interesting part about the above statement is that Power Query doesn’t have any VBA object model, so what kind of black magic trick do we need to leverage to pull that off?  As it turns out, it’s very simple… almost too simple in fact.

A Simple Query

Let’s just grab the sample data file from my post on pulling Excel named ranges into Power Query.  Once we’ve done that:

  • Click in the blue table
  • Go to Power Query –> From Table
  • Let’s sort Animal ascending (just so we know something happened)
  • Next save and Exit the query

At this point, we should get a new “Sheet2” worksheet, with our table on it:

SNAGHTML34cf68

The Required VBA Code

Next, we need to build our VBA for refreshing the table.  Rather than record and tweak a macro, I’m just going to give you the code that will update all Query Tables in the entire workbook in one shot.  But to use it, you need to know the secret handshake:

  • Press Alt + F11

This will open the Visual Basic editor for you.  If you don’t see a folder tree at the left, then press CTRL+R to make it show up.

  • Find your project in the list (it should be called “VBA Project (Selecting Data.xlsx)”)
  • Right click that name and choose “Insert Module”
  • In the window that pops up, paste in the following code:

Public Sub UpdatePowerQueries()
    ' Macro to update my Power Query script(s)

    Dim cn As WorkbookConnection

    ' Refresh every connection whose name starts with "Power Query -"
    For Each cn In ThisWorkbook.Connections
        If Left(cn.Name, 13) = "Power Query -" Then cn.Refresh
    Next cn
End Sub

Now, I’ll admit that I find this a little looser than I generally like.  By default, all Power Query scripts create a new connection named “Power Query - ” followed by the name of your query, which is why the code checks the first 13 characters of each connection name.  I’d prefer to check the type of query, but this will work.

Speaking of working, let’s prove it…  But first, close the Visual Basic Editor.

Proving The Refresh Works

The easiest way to do this is to go back to the table on Sheet 1 and add a new row to the table.  I’m going to do that first, then I’m going to:

  • Press Alt + F8
  • Choose “UpdatePowerQueries”
  • Click Run
  • Go back to Sheet2 to verify it’s updated

If all goes well,  you should now have another row of data in your table, as I do:

image

Adding Polish

Let’s face it, that’s probably harder than going to Data –> Refresh All.  The goal here was to make it easier for my users.  So let’s do that now.

  • Return to Sheet 1
  • Go to the Developer Tab (if you don’t see it, right click the ribbon, choose “Customize Ribbon” and check the box next to the Developer tab to expose it)
  • Click Insert and select the button in the top left

image

  • Left click and drag a button onto your worksheet

When you let go, you’ll be prompted to assign a macro.

  • Choose “UpdatePowerQueries” and click OK
  • While the dots are still on the corners, click in the text
  • Backspace it out and replace it with something helpful like “Update Queries” (if you click elsewhere, you’ll need to right click the button to get the selection handles back.)
  • Click in the worksheet to de-select the button

SNAGHTML57873a

That’s it.  Test it again by adding some more data to the table then clicking the button.

Ramifications

I write a lot of VBA to make my users’ lives easier, and generally use this kind of technique as part of a bigger goal.  But regardless, it can still be useful as a standalone routine if you want to avoid having to train users on how to do things through the ribbon.

The post Refresh Power Query With VBA appeared first on The Ken Puls (Excelguru) Blog.


Tame Power Query Workbook Privacy Settings

I recently built a cool Excel solution at work that uses Power Query to reach out and grab some weather data, then upload it into a database.  We use that data in our dashboards, as weather is pretty important to the golf industry in which I work.  But then I went to deploy the file, and needed to find a way to tame the Power Query Workbook Privacy settings.

The Issue

What happens is, every time a user runs a Power Query that was last saved by another user, they are prompted to set the Workbook’s Privacy level.  This is maddening, as we have two staff that use this workbook, where one covers for the other while they’re away.  Naturally, long lapses of time can occur in between… just long enough to forget what to do when you’re prompted by this frustrating message:

image

So while I can (and have) set the privacy level for the web data they are going to retrieve (see my tool here), I have no way to permanently set the Workbook’s Privacy level.  Worse, if the user clicks Cancel or chooses the wrong privacy level, the refresh fails, even when I try to protect the output table structure using Chris Webb’s technique here.  The table generates an error, and all business logic in the workbook is blown apart.  The only recourse is to exit the file without saving and try again.

Naturally, this concerns them, and they call to make sure they do the right thing.  That’s awesome, and I wouldn’t change that at all.  But the deeper underlying problem is that Power Query’s Workbook security is engineered to require developer interaction.  And that is bad news!

How to Tame Power Query Workbook Privacy Settings

Unfortunately I can’t use VBA to set this in the workbook (or at least, I haven’t tried, anyway), but I can work a little trick to at least warn my users when they are about to be prompted, and remind them which privacy level they need to select.  Here’s what I need to do in order to make that work:

Step 1: Create a range to hold the last user

  • Create a named range somewhere in the workbook called “rngLastUser”  (I put mine on the “Control Panel” worksheet and hid it.)
  • Drop the following code in a Standard VBA Module:

Public Sub UpdateLastUser()
    ' Record the current user in the hidden "rngLastUser" cell
    With ThisWorkbook.Worksheets("Control Panel")
        .Range("rngLastUser") = Application.UserName
    End With
End Sub

Step 2: Create a macro to update your Power Queries

Public Sub GetWeather()
    With ThisWorkbook
        ' Check if privacy level will need to be set
        If .Worksheets("Control Panel").Range("rngLastUser") <> Application.UserName Then
            MsgBox "You are about to be prompted about Privacy Levels" & vbNewLine & _
                   "for the Current Workbook. When the message pops up, " & vbNewLine & _
                   "you'll see an option to 'Select' to the right side of the Current Workbook." & vbNewLine & _
                   vbNewLine & _
                   "Please ensure you choose PUBLIC from the list.", vbOKOnly + vbInformation, _
                   "Hold on a second..."
        End If

        ' Refresh the Power Query table
        .Worksheets("Weather").ListObjects("WeatherHistory").QueryTable.Refresh BackgroundQuery:=True
    End With
    Call UpdateLastUser
End Sub

Step 3: Link the macro to a button for your users

  • Link my GetWeather() routine to a button
  • And we’re good!

What Did We Do?

So basically, what I did here was this:

  • Every time the user clicks the button…
  • Excel checks the contents of rngLastUser to see if the username is the same as the current user
    • If it is, it just goes on to refresh the table
    • If it’s not, it kicks up the following message:

image

    • After the user clicks OK (if necessary), then it prompts the user to set the security level. Yes, they can still get it wrong, but at least they have a chance now!
    • Once the security level is set, the macro goes on to refresh the table
  • After the table is refreshed, Excel updates the rngLastUser cell to the name of the current user.

And that’s it.  We now have a system that will prompt our users with the correct answer, so that they don’t have to come back and ask us every time.

Thoughts On The Security Model

Ideally it would be nice to not have to do this, and there is – in fact – a way.  Microsoft’s answer is “Yes, just enable Fast Combine.”  That’s great and all, but then it ignores all privacy levels and runs your query.  What if, as a developer, I need a way to ensure that what I’ve built stays Public, Private or Organizational?  What if it actually matters?

To me, Fast Combine is akin to setting Macro Security to Low in Excel 2003 and earlier.  Basically, we’re saying “I don’t like nagging messages, so let’s run around in a war zone with no bullet proof vest.”  Sure, you might come out alive, but why should you take that risk?

In my opinion, the Power Query security model needs work.  Even if we could assign a digital certificate to the query to prove it (and the privacy levels) had not been modified, that would be great.  I’m not sure exactly what it is we need, but we need something better than what we have today.

The post Tame Power Query Workbook Privacy Settings appeared first on The Ken Puls (Excelguru) Blog.

Name Columns During a Merge

The timing for the release of the October Power Query update couldn’t really have been much better for me, especially since it’s got a cool new feature in that you can now name columns during a merge.  Why is this timing so cool?

The reason is that I’m at the MVP Summit in Redmond this week, an event where the MVPs and Microsoft product teams get together to discuss the things that are/aren’t working in the products, and discuss potential new features.  The tie-in here is that I already blogged about the new method for Merging Columns With Power Query that came into effect with the August update and, in that blog post, I said:

I do wish this interface had a way to name the column in advance (like exists when you create a custom column.)  Hopefully the PQ team will retrofit us with that ability at some point in the future.

The October Power Query update (released Oct 27, 2014), includes the ability to name columns during a merge, rather than making you do this in two steps.  How cool is that?

image

While I’m under no illusions that this change was made based on my feedback alone, I do know that the Power Query team reads my blog, so it played a part.  This is one of the big reasons I go to the summit every year, to share ideas that make the product more intuitive/usable, so it’s cool to see one of those changes make it into the product a few days before I arrive.

By the time this post is published on Wednesday morning, we’ll already be into our final day of the 2014 Summit.  I’m going in pretty jazzed though, as I know that the product teams are listening and reacting to our feedback.  :)

The post Name Columns During a Merge appeared first on The Ken Puls (Excelguru) Blog.

Merge Multiple Files With Properties

This post illustrates a cool technique that I learned at the MVP summit last week, allowing us to use Power Query to merge multiple files with properties from the file in the output.  A specific example of where this is useful is where you have several files with transactional data, saved with the month as the file name, but no date records in the file itself.  What we’d want to do is merge all the contents, but also inject the filename into the records as well.

The funny thing about this technique is that it’s eluded me for a long time, mainly because I way overthought the methods needed to pull it off.  Once you know how, it’s actually ridiculously simple, and gives us some huge flexibility to do other things.  Let’s take a look at how it works.

If you’d like to download the files I’m using for this example, you can get them here.  You’ll find that there are 3 files in total: Jan 2008.csv, Feb 2008.csv and Mar 2008.csv.

Step 1: Create your query for one file

The first thing we need to do is connect to the Jan 2008.csv file and pull in its contents.  So let’s do that:

  • Power Query –> From File –> From CSV
  • Browse to the Jan 2008.csv file and import it
  • Rename the “Sum of Amount” column to “Amount”

Perfect, we now have a basic query:

SNAGHTML63ab03

Notice here how the January file has no dates in it?  That’s not good, so that’s what we’re trying to fix.

Step 2: Turn the query into a function

At this point we need to do a little code modification.  Let’s go into the Advanced editor:

  • View –> Advanced Editor

We need to do two things to the M code here:

  • Insert a line at the top that reads:  (filepath) =>
  • Replace the file path in the Source step (including the quotes) with:  filepath

At that point our M will read as follows:

image
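
In case the screenshot doesn’t come through, the function looks roughly like this.  It’s a sketch under assumptions: the step names are mine, and the Source line Power Query generated for you may carry extra arguments (delimiter, encoding and so on) that you should leave as they are.

```powerquery
(filepath) =>
let
    // filepath replaces the hard coded path to Jan 2008.csv
    Source = Csv.Document(File.Contents(filepath)),
    // Use the first row of the file as column headers
    PromotedHeaders = Table.PromoteHeaders(Source),
    Renamed = Table.RenameColumns(PromotedHeaders, {{"Sum of Amount", "Amount"}})
in
    Renamed
```

Everything after the first line is just the original query; the only real change is that the path now arrives as a parameter.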

We can now:

  • Click Done
  • Rename our Query to something like fnGetFileContents
  • Save and close the query

Power Query doesn’t do a lot for us yet, just giving us this in the Workbook Queries pane:

image

Step 3: List all files in the folder

Now we’re ready to make something happen.  Let’s create a new query…

  • Power Query –> From File –> From Folder
  • Browse to the folder that holds the CSV files
  • Remove all columns except the Name and Folder Path

Wait, what?  I removed the column with the binary content?  The column that holds the details of the files?  You bet I did!  You should now have a nice, concise list of files like this:

image

Next, we need to add a column to pull in our content, via our function.  So go to:

  • Add Column –> Add Custom Column
  • Enter the following formula:  =fnGetFileContents([Folder Path]&[Name])

Remember it is case sensitive, but when you get it right, some magic happens.  We get a bunch of “Table” objects in our new column… and those Table objects hold the contents of the files!

I know you’re eager to expand them, but let’s finish prepping the rest of the data first.

Step 4: Prep the rest of the data you want on the table rows

Ultimately, what I want to do here is convert the file name into the last day of the month.  In addition, I don’t need the Folder Path any more. So let’s take care of business here:

  • Remove the “Folder Path” column
  • Select the “Name” column –> Transform –> Replace Values –> “.csv” with nothing
  • Select the “Name” column –> Transform –> Data Type –> Date
  • Select the “Name” column –> Transform –> Date –> Month –> End of Month

And we’ve now got a pretty table with our dates all ready to go:

image

Step 5: Expand the table

The cool thing here is that, when we expand the table, each row of the table will inherit the appropriate value in the first column.  (So all rows of the table in row 1 will inherit 2/29/2008 as their date.)

  • Click the little icon to the top right of the Custom column
  • Click OK (leaving the default of expanding all columns)
  • Rename each of the resulting columns to remove the Custom. prefix

And that’s it!  You can save it and close it, and it’s good to go.
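
Pulling Steps 3 through 5 together, the finished query might look something like the sketch below.  The folder path is a hypothetical example, the step names are my own, and it relies on the fnGetFileContents function from Step 2 existing in the workbook.  It also assumes the file names convert cleanly to dates under your regional settings.

```powerquery
let
    // Hypothetical folder path; point this at the folder holding your CSV files
    Source = Folder.Files("D:\Temp\CSV Files"),
    // Keep only what we need to build each file's full path
    KeptColumns = Table.SelectColumns(Source, {"Name", "Folder Path"}),
    // Call the function once per file; each cell now holds a Table
    AddedCustom = Table.AddColumn(KeptColumns, "Custom",
        each fnGetFileContents([Folder Path] & [Name])),
    RemovedPath = Table.RemoveColumns(AddedCustom, {"Folder Path"}),
    // "Jan 2008.csv" -> "Jan 2008" -> date -> last day of that month
    ReplacedExt = Table.ReplaceValue(RemovedPath, ".csv", "", Replacer.ReplaceText, {"Name"}),
    ToDate = Table.TransformColumnTypes(ReplacedExt, {{"Name", type date}}),
    EndOfMonth = Table.TransformColumns(ToDate, {{"Name", Date.EndOfMonth, type date}}),
    // Expand every column of the nested tables, using the first file's column names
    Expanded = Table.ExpandTableColumn(EndOfMonth, "Custom",
        Table.ColumnNames(EndOfMonth{0}[Custom]))
in
    Expanded
```

Expanding this way gives the columns their plain names, so there is no Custom. prefix to rename afterwards.  That’s a small departure from the UI steps above, not something the UI does for you.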

A little bit of thanks

I want to throw a shout out to Miguel Llopis and Faisal Mohamood from the Power Query team for demonstrating this technique for the MVPs at the summit.  I’ve been chasing this for months, and for some reason tried to make it way more complicated than it needs to be.

What’s the next step?

The next logical step is to extend this to working with Excel files, consolidating multiple Excel files together; something we can’t do through the UI right now.  Watch this space, as that technique is coming soon!

The post Merge Multiple Files With Properties appeared first on The Ken Puls (Excelguru) Blog.

Combine Multiple Worksheets Using Power Query

In last week’s post we looked at how to combine multiple files together using Power Query.  This week we’re going to stay within the same workbook, and combine multiple worksheets using Power Query.

Background Scenario

Let’s consider a case where the user has been creating a transactional history in an Excel file.  It is all structured as per the image below, but resides across multiple worksheets; one for each month:

image

As you can see, they’ve carefully named each sheet with the month and year.  But unfortunately, they haven’t formatted any of the data using Excel tables.

Now the file lands in our hands (you can download a copy here if you’d like to follow along), and we’d like to turn this into one consolidated table so that we can do some analysis on it.

Accessing Worksheets in Power Query

Naturally we’re going to reach to Power Query to do this, but how do we get started?  We could just go and format the data on each worksheet as a table, but what if there were hundreds?  That would take way too much work!

But so far we’ve only seen how to pull Tables, Named Ranges or files into Power Query.  How do we get at the worksheets?

Basically, we’re going to start with two lines of code:

  • Go to Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

You’ll now see the following blank query:

let
    Source = ""
in
    Source

What we need to do is replace the second line (Source = "") with the following two lines of code:

    FullFilePath = "D:\Temp\Combine Worksheets.xlsx",
    Source = Excel.Workbook(File.Contents(FullFilePath))

Of course, you’ll want to update the path to the full file path for where the file is saved on your system.

Once you click Done, you should see the following:

image

Cool!  We’ve got a list of all the worksheets in the file!

Consolidating the Worksheets

The next step is to prep the fields we want to preserve as we combine the worksheets.  Obviously the Name and Item columns are redundant, so let’s do a bit of cleanup here.

  • Remove the Kind column
  • Select the Name column –> Transform –> Data Type –> Date
  • Select the Name column –> Transform –> Date –> Month –> End of Month
  • Rename the Name column to “Date”

At this point, the query should look like so:

image

Next we’ll click the little double headed arrow to the top right of the data column to expand our records, and commit to expanding all the columns offered:

SNAGHTML52dcfe5

Hmm… well that’s a bit irritating.  It looks like we’re going to need to promote the top row to headers, but that means we’re going to overwrite the Date column header in column 1.  Oh well, nothing to be done about it now, so:

  • Transform –> Use First Row As Headers –> Use First Row As Headers
  • Rename Column1 (the header won’t accept 1/31/2008 as a column name) to “Date” again
  • Rename the Jan 2008 column (far right) to “Original Worksheet”

Final Cleanup

We’re almost done, but let’s just do a bit of final cleanup here.  As we set the data types correctly, let’s also make sure that we remove any errors that might come up from invalid data types.

  • Select the Date column
  • Home –> Remove Errors
  • Set Account and Dept to Text
  • Set Amount to Decimal Number
  • Select the Amount column
  • Home –> Remove Errors
  • Set Original Worksheet to Text

Rename the query to “Consolidated”, and load it to a worksheet.

Something Odd

Before you do anything else, Save the File.

To be fair, our query has enough safeguards in it that we don’t actually have to do this, but I always like to play it safe.  Let’s review the completed query…

Edit the Consolidated query, and step into the Source line step.  Check out that preview pane:

image

Interesting… two more objects!  This makes sense, as we created a new table and worksheet when we retrieved this into a worksheet.  We need to filter those out.

Getting rid of the table is easy:

  • Select the drop down arrow on the Kind column
  • Uncheck “Table”, then confirm when asked if you’d like to insert a step

Select the next couple of steps as well, and take a look at the output as you do.

Aha!  When you hit the “ChangedType” step, something useful happens… we generate an error:

image

Let’s remove that error from the Name column.

  • Select the Name column –> Home –> Remove Errors

And we’re done.  We’ve managed to successfully combine all the data worksheets in our file into one big table!
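
For the curious, the two safeguards we just added can be sketched in M like this.  It’s a simplified illustration, not the full query: the step names are mine, and I’ve kept the Kind column around so the filter can be written as “keep worksheets” instead of “exclude tables”.

```powerquery
let
    FullFilePath = "D:\Temp\Combine Worksheets.xlsx",
    Source = Excel.Workbook(File.Contents(FullFilePath)),
    // Keep worksheets only; tables (such as the query's own output table) drop out
    SheetsOnly = Table.SelectRows(Source, each [Kind] = "Sheet"),
    // Sheet names like "Jan 2008" convert to dates; any other name becomes an error
    ChangedType = Table.TransformColumnTypes(SheetsOnly, {{"Name", type date}}),
    // Drop the rows that errored, i.e. sheets that aren't named after a month
    RemovedErrors = Table.RemoveRowsWithErrors(ChangedType, {"Name"}),
    EndOfMonth = Table.TransformColumns(RemovedErrors, {{"Name", Date.EndOfMonth, type date}})
in
    EndOfMonth
```

The design choice here is to let the date conversion do the filtering for us: anything that isn’t a month sheet simply fails to convert and gets removed.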

Some Thoughts

This method creates a bit of a loop in that I’m essentially having to reach outside Excel to open a copy of the workbook to pull the sheet listing in.  And it causes issues for us, since Power Query only reads from the last save point of the external file we’re connecting to (in this case this very workbook.)  I’d way rather have an Excel.CurrentWorkbook() style method to read from inside the file, but unfortunately that method won’t let you read your worksheets.

It would also be super handy to have an Excel.CurrentWorkbookPath() method.  Hard coding the path here is a real challenge if you move the file.  I’ve asked Microsoft for this, but if you think it is a good idea as well, please leave a comment on the post.  (They’ll only count one vote from me, but they’ll count yours if you leave it here!)

The post Combine Multiple Worksheets Using Power Query appeared first on The Ken Puls (Excelguru) Blog.

Building a Parameter Table for Power Query

One of the things that I’ve complained about in the past is that there is no built in way for Power Query to be able to pull up the path for the current workbook.  Today I’m going to show you how I solved that problem by building a parameter table for Power Query in Excel, then linking it into my queries.

Quick Note!  WordPress is determined to drive me crazy and is replacing all of my straight quotes (SHIFT + ‘) with curly quotes.  Since curly quotes are different characters, they will cause issues if you copy/paste the code directly.  I’ll get this sorted out, but in the mean time, just make sure you replace all “curly quote” characters with straight quotes.

To do this successfully, we need two pieces: an Excel table and a Power Query function.  So let’s get to it.

Building a Parameter Table

The table is quite simple, really.  It’s a proper Excel Table, and it has a header row with the following two columns:

  • Parameter
  • Value

Once created, I also make sure that I go to the Table Tools tab, and rename the table to “Parameters”.

image

Pretty bare bones, but as you’ll see, this becomes very useful.

Now we add something to the table that we might need.  Since I’ve mentioned it, let’s work out the file path:

  • A8:     File Path
  • B8:     =LEFT(CELL("filename",B6),FIND("[",CELL("filename",B6),1)-1)

Now, as you can see, column A essentially gives us a “friendly name” for our parameter, while the value ends up in the second column:

image

While we’re here, let’s add another parameter that we might have use for.  Maybe I want to base my Power Query reports off the data for the current day.  Let’s inject that as well:

  • A9:     Start Date
  • B9:     =TODAY()

image

Good stuff.  We’ve now got a table with a couple of useful parameters that we might want when we’re building a query.

Adding the Parameter Function

Next, we’re going to add the function that we can reference later.  To do that:

  • Go to Power Query –> From Other Sources –> Blank Query
  • Go to View –> Advanced Editor
  • Replace all the code with the following:

(ParameterName as text) =>
let
    ParamSource = Excel.CurrentWorkbook(){[Name="Parameters"]}[Content],
    ParamRow = Table.SelectRows(ParamSource, each ([Parameter] = ParameterName)),
    Value =
        if Table.IsEmpty(ParamRow) = true
        then null
        else Record.Field(ParamRow{0}, "Value")
in
    Value

  • Click Done
  • Rename the function to “fnGetParameter”
  • Go to Home –> Close & Load To…
  • Choose to create the connection only, avoiding loading to a table or the Data Model

And in case the queries window makes you think this didn’t work: we expect it to show that the function didn’t load:

image

Making Use of the Parameter Table

Okay, so now that we have this in place, how do we go about using it?

Let’s create a ridiculously simple table:

image

Now, click in the table and go to Power Query –> From Table.

We’ll be taken into the Power Query window and will be looking at our very simple data.  Let’s pull the contents of my parameters into columns:

  • Go to Add Column –> Add Custom Column
  • Change the title to “File Path”
  • Enter the following formula: =fnGetParameter("File Path")

Check it out!

image

Do you see that?  This is the path to the folder that holds the workbook on my system.  The formula we used in the table retrieves that, and I can pass it in to Power Query, then reference it as needed!

How about we do the same for our date?

  • Go to Add Column –> Add Custom Column
  • Change the title to “First date”
  • Enter the following formula: =fnGetParameter("Start Date")

image

The key here is to make sure that the value passed to the parameter function is spelled (and cased) the same as the entry in the first column of the parameter table.  In other words, you could use “FilePath”, “File Path”, “Folder”, “My File Path” or whatever, so long as that’s the name you gave it in the first column of the Parameters Excel table.

And what happens if you pass an invalid value?  Say you ask for fnGetParameter("Moldy Cheese") and it’s not in the table?  Simple, you’ll get null returned instead.  :)
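If you’d rather fall back to a default than carry a null around, a minimal sketch might look like the following.  To keep it self-contained I’ve swapped the Excel table for a small inline #table, and the "C:\Temp\" default is purely a made-up placeholder:

```powerquery
let
    // Hypothetical stand-in for Excel.CurrentWorkbook(){[Name="Parameters"]}[Content]
    ParamSource = #table(
        {"Parameter", "Value"},
        {{"File Path", "H:\My Solution\"}, {"Start Date", #date(2015, 1, 15)}}
    ),

    // Same logic as fnGetParameter, defined inline so the sketch runs on its own
    fnGetParameter = (ParameterName as text) =>
        let
            ParamRow = Table.SelectRows(ParamSource, each [Parameter] = ParameterName),
            Value = if Table.IsEmpty(ParamRow) then null else ParamRow{0}[Value]
        in
            Value,

    // "Moldy Cheese" is not in the table, so the function returns null...
    Requested = fnGetParameter("Moldy Cheese"),

    // ...and we substitute an assumed default instead
    DataFolder = if Requested = null then "C:\Temp\" else Requested
in
    DataFolder
```

Whether a silent default or a loud error is the better choice depends on the solution; the point is only that the null is easy to test for.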

Implications of Building a Parameter Table for Power Query

The implications for this technique are huge.  Consider this scenario… you create your workbook, and store it in a folder.  But within that folder you have a subfolder called “Data”.  Your intention is to store all of your csv files in that folder.  And, for argument’s sake, let’s say that it’s a mapped drive, with the path to your solution being “H:\My Solution\”

No problem, you build it all up, and it’s working great.  You keep dropping your text files in the data folder, and you can consolidate them with some M code like this:

let
    Source = Folder.Files("H:\My Solution\Data"),
    #"Combined Binaries" = Binary.Combine(Source[Content]),
    #"Imported CSV" = Csv.Document(#"Combined Binaries",null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV")
in
    #"First Row as Header"

Things run along for ages, and that’s awesome, but then you need to go on vacation.  No worries, it’s Power Query and easy to use, you can just get your co-worker to update it… except… on your co-worker’s machine that same drive is mapped not to the H:\ drive, but the F:\ drive.  Doh!

We could recode the path, but what a pain.  So how about we use the parameter table to make this more robust so that we don’t have to?  All we need to do is modify the first couple of lines of the query.  We’ll pull in a variable to retrieve the file path from our parameter table, then stuff that into the file path, like this:

let
    SolutionPath = fnGetParameter("File Path"),
    Source = Folder.Files(SolutionPath & "Data"),
    #"Combined Binaries" = Binary.Combine(Source[Content]),
    #"Imported CSV" = Csv.Document(#"Combined Binaries",null,",",null,1252),
    #"First Row as Header" = Table.PromoteHeaders(#"Imported CSV")
in
    #"First Row as Header"

How awesome is that?  Even better, the SolutionPath shows as a step in the Applied Steps section.  That means you can select it and make sure the value is showing up as you’d expect!

Practical Use Case

Several months back I built a solution for a client where we stored his Power Query solution in a folder, and we had data folders that were created on a bi-weekly basis.  Each of those folders was named based on the pay period end date (in ISO’s yyyy-mm-dd format), and was stored in a path relative to the master solution.

Naturally, I needed a way to make the queries dynamic to the master folder path, as I did the development in my home office, then emailed the updated solution files to him in his New York office.  He had different drive mappings than I do, and his team had different drive mappings than he did.  With half a dozen Power Queries in the solution, having them manually update the code each time a new user wanted to work with the file just wasn’t an option.

This technique became invaluable for making the solution portable.  In addition, by having a formula to generate the correct end date, we could also pull the data files from the right place as well.
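To give a flavour of how that fits together, here’s a rough sketch (not the client’s actual code).  The values are hard coded stand-ins for what fnGetParameter would return, and the “End Date” parameter name is an assumption on my part:

```powerquery
let
    // Stand-in for fnGetParameter("File Path")
    SolutionPath = "H:\My Solution\",

    // Stand-in for a hypothetical fnGetParameter("End Date"); a date coming from
    // an Excel table may arrive as a datetime, so Date.From converts it to a date
    EndDate = Date.From(#datetime(2015, 1, 15, 0, 0, 0)),

    // Build the folder name using the yyyy-mm-dd pattern described above
    FolderPath = SolutionPath & Date.ToText(EndDate, "yyyy-MM-dd") & "\"
in
    FolderPath
```

From there, FolderPath could be handed to Folder.Files in the same way that SolutionPath & "Data" was earlier.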

I still want a Power Query CurrentWorkbook.FilePath method in M, but even if I get it, this technique is still super useful, as there will always be some dynamic parameter I need to send up to Power Query.

I hope you find this as useful as I have.

The post Building a Parameter Table for Power Query appeared first on The Ken Puls (Excelguru) Blog.

Force Power Query to Import as a Text File

I’ve run into this issue in the past, and got an email about it this past week as well, so I figured it’s worth taking a look.  Power Query takes certain liberties when importing a file, assuming it knows what type of file it is.  The problem is that sometimes this doesn’t work as expected, and you need to be able to force Power Query to import as a text file, not the file format that Power Query assumes you have.

IT standards are generally a beautiful thing, especially in programming, as you can rely on them, knowing that certain rules will always be followed.  CSV files are a prime example of this, and we should be able to assume that any CSV file will contain a list of Comma Separated Values, one record per line, followed by a new line character.  Awesome… until some bright spark decides to inject a line or two of information above the CSV contents which doesn’t contain any commas.  (If you do that, please stop.  It is NOT A GOOD IDEA.)

The Issue in the Real World

If you’d like to follow along, you can click here to download MalformedCSV.csv (the sample file).

If you open the sample file in Notepad, you’ll see that it contains the following rows:

image

Notice the first row… that’s where our issue is.  There are no commas.  Yet when you look at the data in the rows below, they are plainly separated by commas.  Well yay, so what, right?  Who cares about the guts of a CSV file?  The answer is “you” if you ever get one that is built like this…

Let’s try importing the sample file into Power Query:

  • Power Query –> From File –> From CSV
  • Browse to MalformedCSV.csv

And the result is as follows:

image

One header, and lots of errors.  Not so good!

The Source Of The Error

If I click the white space beside one of those errors, I get this:

image

What the heck does that even mean?

Remember that CSV is a standard.  Every comma indicates a column break, every carriage return a new line.  And we also know that every CSV file has a consistent number of columns (and therefore commas) on every row.  (That’s why you’ll see records in some CSVs that read ,, – because there isn’t anything for that record, but we still need the same number of commas to denote the columns.)

And now some joker builds us a file masquerading as a CSV that really isn’t.  In this case:

  • Our first row has no commas before the line feed.  We therefore must have a one column table.  Power Query sets us up for a one column table.
  • But our second row has three commas, which means three columns… That’s not the same number of columns as our header row, so Power Query freaks out and throws an error for every subsequent record.

So Now What?

If we can’t rely on the file format being correct, why don’t we just import it as a text file?  That would allow us to bring all the rows in, remove the first one, then split by commas.  That sounds like a good plan.  Let’s do it.

  • Power Query –> From File –> From Text
  • Browse to the folder that holds MalformedCSV.csv

Uh oh… our file is not showing.  Ah… we’re filtered to only show *.txt and *.prn files…

  • Click the file filter list in the bottom right and change “Text File (*.txt;*.prn)” to “All Files (*.*)”
  • Open MalformedCSV.csv

And the result…

image

Damn.  See, Power Query is too smart.  It looks at the file and says “Hey!  That’s not a text file, it’s a CSV file!” and then imports it as a CSV file… which we already know has issues.  Grr…

Force Power Query to Import as a Text File

Let’s try this again, this time from scratch.  We need two things here…

  1. The full file path to the file you want to import (including the file extension).  On my system it is “D:\Test\MalformedCSV.csv”
  2. A little bit of a code template, which is conveniently included below.

What we’re going to do is this:

  • Go to Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor
  • Paste in the following code

let
    /* Get the raw line by line contents of the file, preventing PQ from interpreting it */
    fnRawFileContents = (fullpath as text) as table =>
        let
            Value = Table.FromList(Lines.FromBinary(File.Contents(fullpath)),Splitter.SplitByNothing())
        in Value,

    /* Use function to load file contents */
    Source = fnRawFileContents("D:\Test\MalformedCSV.csv")

in
    Source

  • Update the file path in the “Source” step to the path to your file.
  • Click Done

And the output is remarkably different… in fact, it’s the entire contents of the file!

image

This is awesome, as we now have the ability to clean up our data and use it as we were hoping to do from the beginning!  So let’s do just that…starting with killing off that first line that’s been screwing us up:

  • Home –> Remove Rows –> Remove Top Rows –> 1
  • Transform –> Split Column By Delimiter –> Comma –> At each occurrence of the delimiter
  • Transform –> Use First Row As Headers

And now we’ve got a pretty nice table of data that actually looks like our CSV file:

image
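If you’d prefer to see that clean-up as M rather than button clicks, here’s a rough equivalent.  It’s only a sketch: I’m using List and Text functions rather than the exact code the buttons generate, and the sample text is made up so that the snippet runs without the file:

```powerquery
let
    // Made-up stand-in for File.Contents("D:\Test\MalformedCSV.csv"):
    // one comma-free line, followed by a header row and two records
    RawBinary = Text.ToBinary(
        "Some title line with no commas#(lf)Date,Vendor,Amount#(lf)2015-01-05,Acme,100#(lf)2015-01-06,Globex,250"
    ),

    // Read the file line by line, without any CSV interpretation
    AllLines = Lines.FromBinary(RawBinary),

    // Drop the offending first line
    DataLines = List.Skip(AllLines, 1),

    // Split every remaining line at each comma
    Fields = List.Transform(DataLines, each Text.Split(_, ",")),

    // Use the first split line as the headers and the rest as the rows
    Result = Table.FromRows(List.Skip(Fields, 1), Fields{0})
in
    Result
```

Be aware that a plain Text.Split knows nothing about quoted fields, so a value with a comma inside quotes would be broken apart.  For files like that, the Split Column button is the safer route.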

The final steps to wrap this up would essentially be

  • set our data types,
  • name the query, and
  • load the query to the final destination.

Final Thoughts

This seems too hard.  I know that Power Query is designed for ease of use, and to that end it’s great that it can and does make assumptions about your file.  Most of the time it gets it exactly right and there is no issue.  But when things do go wrong it’s really hard to get back to a raw import format so that we can take control.

I really think there should be an easy and discoverable way to do a raw import of data without making import/formatting assumptions.  A button for “Raw Text Import” would be a useful addition for those scenarios where stuff goes wrong and needs a careful hand.

I should also mention that this function will work on txt files or prn files as well.  In fact, it also works on Excel files to some degree, although the results aren’t exactly useful!

image

The key here is, whether caused by one or more extra header lines in a csv file, tab delimited, colon delimited or any other kind of delimited file, the small function template above will help you get past that dreaded message that reads “DataFormat.Error:  There were more columns in the result than expected.”

Addendum To This Post

Miguel Escobar has recorded a nice video of the way to do this without using any code.  You can find that here:

 

 

The post Force Power Query to Import as a Text File appeared first on The Ken Puls (Excelguru) Blog.

Treat Consecutive Delimiters As One

A couple of weeks back, I received a comment on my “Force Power Query to Import as a Text File” blog post.  Rudi mentioned that he’d really like to see a method that would let us Treat Consecutive Delimiters as One; something that is actually pretty easy to find in Excel’s old Import as Text file feature.

…the option for “Treat consecutive delimiters as one” should be made available too. The text import eventually imported but the info was all out of sync due to the tabbed nature of the data – names longer and shorter cause other data to not line up.

So let’s take a look at this issue today.  You can access and download the sample file I’ll be working with from this link.

The issue at hand

The contents of the text file, when viewed in Notepad, are fairly simple:

image

But the challenge shows up when we go to import it into Power Query.  Let’s do that now:

  • Power Query –> From File –> From Text
  • Browse to where you stored the file and double click it to Open it up

The first indication that we have an issue is that we’ve got a single column, the second is that it’s a bit scattered:

image

Well no problem, it’s a Tab delimited file, so we’ll just split it by Tabs, and all should be good, right?

  • Home –> Split Column –> By Delimiter
  • Choose Tab and click OK

image

Uh oh…  What happened here?

The issue is that we’ve got multiple delimiters acting between fields here.  They look like one, but they’re not.  Between Date and Vendor, for example, there are two Tabs.  But between the Date values and the Vendor values only one Tab is needed to line it up in the text document.  The result is this crazy mess.

Approaches to the Issue

I can see three potential routes to deal with this problem:

  1. We could replace all instances of 2 Tabs with a single Tab.  We’d need to do that a few times, though, so that a run of 4 Tabs doesn’t just become 2 and get left that way.
  2. We could write a function to try and handle this.  I’m sure that this can be done, although I decided not to go that route this time.
  3. We could split the columns using the method that I’m going to outline below.
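For the curious, the function route (option 2) could look something like the sketch below.  It isn’t the method I use in this post, and the sample lines are made up: it simply splits each line at every Tab, then throws away the empty pieces that the extra Tabs leave behind:

```powerquery
let
    // Made-up sample lines; #(tab) is M's escape sequence for a Tab character
    SampleLines = {
        "Date#(tab)#(tab)Vendor#(tab)Amount",
        "2015-01-05#(tab)Acme Ltd#(tab)#(tab)100"
    },

    // Split at every Tab, trim each piece, and discard the empty ones
    SplitLine = (line as text) as list =>
        List.Select(
            List.Transform(Text.Split(line, "#(tab)"), Text.Trim),
            each _ <> ""
        ),

    Rows = List.Transform(SampleLines, SplitLine),

    // First line supplies the headers, the rest become the rows
    Result = Table.FromRows(List.Skip(Rows, 1), Rows{0})
in
    Result
```

The catch is the same one that “treat consecutive delimiters as one” always has: a genuinely empty field disappears, which shifts the remaining values to the left.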

Before we start, let’s kill off the last two steps in the “Applied Steps” section, and get back to the raw import that we had at the very beginning.  (Delete all steps except the “Source” step in the Applied Steps window.)

How to Treat Consecutive Delimiters As One

The method to do this is a three step process:

  1. Split by the Left Most delimiter (only)
  2. Trim the new column
  3. Repeat until all columns are separated

So let’s do it:

  • Select Column1
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

image

Good, but see that blank space before Vendor?  That’s one of those delimiters that we don’t want.  But check what happens when you trim it…

  • Select Column1.2
  • Transform –> Format –> Trim

And look at that… it’s gone!

image

Let’s try this again:

  • Select Column1.2
  • Split Column –> By Delimiter –> Tab –> At the left-most delimiter

image

And trim the resulting column again:

  • Select Column1.2.2
  • Transform –> Format –> Trim

And we’re good.  Now to just do a final bit of cleanup:

  • Transform –> Use First Row as Headers
  • Name the Query

And we’re finished:

image

Easy as Pie?

In this case, yes it was.  Easy, if a bit painful.  But what about spaces?  If this file was space delimited I couldn’t have done this, as my Vendors all have spaces in them too.  So then what?

I’d modify my procedure just a bit:

  1. Replace all instances of double spaces with the | character
  2. Split by the left most | character
  3. Replace all | characters with spaces
  4. Trim the new column
  5. Repeat until finished
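The same idea can be sketched in code too.  This is an assumption-laden example rather than the procedure above: it treats any run of two or more spaces as a field break, which only works if no value contains a double space, and the sample lines are invented:

```powerquery
let
    // Made-up sample lines: fields separated by two or more spaces,
    // while the vendor name itself contains single spaces
    SampleLines = {
        "Date        Vendor          Amount",
        "2015-01-05  Joe's Boat Shop   100"
    },

    // Split at every double space, trim away leftover single spaces,
    // then discard the empty pieces produced by longer runs of spaces
    SplitLine = (line as text) as list =>
        List.Select(
            List.Transform(Text.Split(line, "  "), Text.Trim),
            each _ <> ""
        ),

    Rows = List.Transform(SampleLines, SplitLine),

    // First line supplies the headers, the rest become the rows
    Result = Table.FromRows(List.Skip(Rows, 1), Rows{0})
in
    Result
```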

Final Thought

I haven’t received too much in the way of data like this, as most of my systems dump data into properly delimited text or csv files, but I can certainly see where this is an issue for people.  So I totally agree that it would be nice if there were an easier way to treat consecutive delimiters as one in Power Query.

I do like the versatility of the choices we have currently, but adding this as another option that works in combination with the existing ones would be fantastic.

Power Query team: here’s my suggestion…

image

:)

The post Treat Consecutive Delimiters As One appeared first on The Ken Puls (Excelguru) Blog.


Transpose Stacked Tables

For the first post of the new year, I thought I’d tackle an interesting problem: how to Transpose Stacked Tables in Power Query.  What do I mean by Stacked Tables?  It’s when your data looks like this:

image

Notice that we’ve got 3 tables stacked on top of each other with gaps.  The question is, how do we deal with this?

There’s actually a variety of different ways we could accomplish this, but I want to show a neat trick that allows us to refer to data on the next row(s) in Power Query this time.  We may revisit this in future with some other techniques as well, but for now… I think you’ll find this interesting.

Sample File

If you’d like to play along, click here to download the sample file, with a mock-up of a fictional Visa statement.

Getting Started

The first thing we need to do is pull the data into Power Query, so let’s go to Power Query –> From Table, and set the range to pull in all the data from A1:A17:

image

We end up with the table in the Power Query window, and we’re now going to add an Index column to it; something you’re going to see can be very useful!  To do this, go to Add Column –> Add Index Column (you can start from 0 or 1, your preference.  I’m just going to go with 0):

image

Now, for simplicity, I am going to make an unnecessary change to start with.  What I’m going to do is – in the “Applied Steps” section, I’m going to right click the “Added Index” line, and choose Rename, then rename this step to “AddedIndex” with no space:

image

Transpose Stacked Tables – The Tricky Part

Go to Add Column –> Add Custom Column.  In the window that pops up:

  • Name the column “Location”
  • In the formula area, enter:  AddedIndex{[Index]+1}[Transactions]

And the result:

image

Wow… how cool is that?  We’ve referred to the value on the next row!  But how?  The secret is in the syntax.  It basically works like this:

Name of previous step{[Index] + 1}[Name of Column to Return]

Watch all those brackets carefully too.  The curly ones go around the index row you want to return (the index number of the current row plus 1), and the square brackets around the name of the column you want.
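Here’s a small self-contained sketch of the same pattern, using a made-up four row table in place of the sample file.  The try … otherwise null wrapper is my own addition: without it, the last row would return an error, since there is no “next row” for it to point to:

```powerquery
let
    // Made-up stand-in for the Transactions column of the sample file
    Source = #table(
        {"Transactions"},
        {{"2015-01-02"}, {"Coffee Shop"}, {"1001"}, {"4.50"}}
    ),

    // Index column starting at 0, so that it lines up with M's row numbering
    AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1),

    // StepName{row number}[ColumnName] pulls a single value from another row;
    // asking for a row past the end raises an error, which we turn into null
    AddedNext = Table.AddColumn(
        AddedIndex,
        "Location",
        each try AddedIndex{[Index] + 1}[Transactions] otherwise null
    )
in
    AddedNext
```

On a big table, looking up rows one at a time like this can get slow, so treat it as a trick for modest data sets.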

Now, let’s do the next row.  Add a new column again:

  • Name the column “TransactionID”
  • In the formula area, enter:  #"Added Custom"{[Index]+2}[Transactions]

Okay, so what’s with that?  Why the # and quotes around the previous step this time?  The answer is that, in order to read the step name with the space, we need to wrap the step’s name in quotes and preface it with the # mark.  This tells Power Query to interpret everything between the quotes as a literal (or literally the same as what we wrote.)  As you can see, it works nicely:

image

Just to circle back on the unnecessary step I mentioned before, it was renaming the “Added Index” step.  Doing that saved me having to type #"Added Index".  Personally I can’t stand all the #"" in my code, so I tend to modify my steps to drop the spaces.  It looks cleaner to me when I’m reading the M code.

At any rate, let’s do the last piece we need too.  Add another column:

  • Name the column “Value”
  • In the formula area, enter:  #"Added Custom1"{[Index]+3}[Transactions]

image

Beautiful… I’ve got each row of data transposed into the table the way I need it, but I’ve still got a bunch of garbage rows…

Taking Out The Trash

As it stands, we really only need the rows that start with the dates.  Again, we have multiple options for this, but I see a pattern I can exploit.  I need to:

  • Keep 1 row
  • Remove 5 rows
  • Repeat

How do we do that easily?  Go to the Home Tab –> Remove Rows and choose Remove Alternate Rows!

image
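As far as I’m aware, that button relies on the Table.AlternateRows function.  Here’s a minimal sketch of the “keep 1, remove 5” pattern, run against a made-up table of 18 numbered rows rather than the sample data:

```powerquery
let
    // Made-up stand-in: a single column holding the numbers 0 through 17
    Source = Table.FromList(List.Numbers(0, 18), Splitter.SplitByNothing(), {"Index"}),

    // Arguments are offset, skip, take:
    // keep the first 1 row, then skip 5 rows, keep 1 row, and repeat
    KeptRows = Table.AlternateRows(Source, 1, 5, 1)
in
    KeptRows
```

With those arguments the rows that survive should be 0, 6 and 12, i.e. the first row of each block of six.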

And finally we can get rid of the Index column, set our Data Types, and we’re all set:

image

And there you have it.  Just one of a few ways to Transpose Stacked Tables using Power Query.

The post Transpose Stacked Tables appeared first on The Ken Puls (Excelguru) Blog.

Slicers For Value Fields

Earlier this week I received an email asking for help with a Power Pivot model.  The issue was that the individual had built a model, and wanted to add slicers for value fields.  In other words, they’d built the DAX required to generate their output, and wanted to use those values in their slicers.  Which you can’t do.  Except maybe you can…  :)

My approach to solve this issue is to use Power Query to load my tables.  This gives me the ability to re-shape my data and load it into the data model the way I need it.  I’m not saying this is the only way, by any means, but it’s an approach that I find works for me.  Here’s how I looked at it in Excel 2013.  (For Excel 2010 users, you have to run your queries through the worksheet and into Power Pivot as a linked table.)

Background

The scenario we’re looking at is a door manufacturer.  They have a few different models of doors, each of which uses different materials in their production.  The question that we want to solve is “how many unique materials are used in the construction of each door?”  And secondarily, we then want to be able to filter the list by the number of materials used.

The first question is a classic Power Pivot question.  And the setup is basically as follows:

image

  • Create a PivotTable with models on rows and material on columns
  • Create a DAX measure to return the distinct count of materials:
    • DistinctMaterials:=  DISTINCTCOUNT(MaterialsList[material])
  • Add a little conditional formatting to the PivotTable if you want it to look like this:

image

The secret to the formatting is to select the values and set up an icon set.  Modify it to ensure that it is set up as follows:

image

Great stuff, we’ve got a nice looking Pivot, and you can see that our grand total on the right side is showing the correct count of materials used in fabricating each door.

Creating Slicers For Value Fields

Now, click in the middle of your Pivot, and choose to insert a slicer.  We want to slice by the DistinctMaterials measure that we created… except… it’s not available.  Grr…

image

Okay, it’s not surprising, but it is frustrating.  I’ve wanted this ability a lot, but it’s just not there.  Let’s see if we can use Power Query to help us with this issue.

Creating Queries via the Editor

We already have a great query that has all of our data, so it would be great if we could just build a query off of that.  We obviously need the original still, as the model needs that data to feed our pivot, but can we base a query off a query?  Sure we can!

  • In the Workbook Queries pane, right click the existing “MaterialsList” query and choose Edit.
  • You’ll be taken into the Power Query editor, and on the left side you’ll see this little collapsed “Queries” window trying to hide from you:

image

  • When you expand that arrow, you’ll see your existing query there!
  • Right click your MaterialsList query and choose “Reference”.

You’ve now got a new query that is referring to your original.  Awesome.  This will let us preserve our existing table in the Power Pivot data model, but reshape this table into the format that we need.

Building the Query we need

Let’s modify this puppy and get it into the format that will serve us.  First thing, we need to make sure it’s got a decent name…

  • On the right side, rename it to MaterialsCount

Now we need to narrow this down to a list of unique material/model combinations, then count them:

  • Go to Add Column –> Add Custom Column
  • Leave the default name of “Custom” and use the following formula:  [model]&[material]
  • Sort the model column in ascending order
  • Sort the material column in ascending order

We’ve now got a nicely ordered list, but there are a few duplicates in it.

image

Those won’t help, so let’s get rid of them:

  • Select the “Custom” column
  • Go to Home –> Remove Duplicates

Now, let’s get that Distinct Count we’re looking for:

  • Select the “model” column
  • Go to Transform –> Group By
  • Set up the Group By window to count distinct rows as follows:

image

Very cool!  We’ve now got a nice count of the number of distinct materials that are used in the production of each door.
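For those who like to read the M, here’s a sketch of the same shaping, with one deliberate change: instead of building the concatenated “Custom” column, it removes duplicates across the model and material columns directly.  (That also sidesteps the odd case where two different model/material pairs happen to concatenate to the same text.)  The sample rows and the “DistinctMaterials” column name are made up:

```powerquery
let
    // Made-up stand-in for the MaterialsList query
    MaterialsList = #table(
        {"model", "material"},
        {{"Door A", "Oak"}, {"Door A", "Oak"}, {"Door A", "Steel"}, {"Door B", "Pine"}}
    ),

    // Keep one row per model/material combination
    DistinctPairs = Table.Distinct(MaterialsList, {"model", "material"}),

    // Count the remaining rows for each model
    MaterialsCount = Table.Group(
        DistinctPairs,
        {"model"},
        {{"DistinctMaterials", each Table.RowCount(_), Int64.Type}}
    )
in
    MaterialsCount
```

With this sample data, Door A ends up with a count of 2 and Door B with a count of 1.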

The final step we need to do in Power Query is load this to the data model, so let’s do that now:

  • File –> Close & Load To…
  • Only create the connection and load it to the Data Model

Linking Things in Power Pivot

We now need to head into Power Pivot to link this new table into the Data Model structure.  Jump into the Manage window, and set up the relationships between the model fields of both tables:

image

And that’s really all we need to do here.  Let’s jump back out of Power Pivot.

Add Slicers for Value Fields

Let’s try this again now. Click in the middle of your Pivot and choose to insert a slicer.  We’ve certainly got more options than last time!  Choose both fields from the “MaterialsCount” table:

image

And look at that… we can now slice by the total number of materials in each product!

image

The post Slicers For Value Fields appeared first on The Ken Puls (Excelguru) Blog.

How to Reference other Power Query queries

One of the things I really like to do with Power Query is shape data into optimized tables. In order to accomplish that goal, I’ve begun using Power Query to source data over Power Pivot’s built-in methods. But in order to build things the way I want, I need an easy way to reference other Power Query queries.

Why would I go to the effort of feeding through Power Query first? I’m no SQL ninja, and I find Power Query allows me to easily re-shape data in ways that would be hard with my SQL knowledge. I can leverage this new tool to optimize my tables and build Power Pivot solutions that require less tricky and funky DAX measures to compensate for less than ideal data structure. (I’d rather have easy to understand relationships and simple DAX measures!)

Methodology

My methodology generally goes something like this:

  • Load a base table into a Power Query. I then set it to only create a connection. Let’s call this my Base Connection.
  • Next I’ll create as many queries as I need to re-shape the data in the Base Connection into the forms I need, then load those into the data model.

It’s that second part that is key. I need to be able to reference other Power Query queries (namely my Base Connection) so that I can prune/trim/re-shape the data.

Reference other Power Query queries – The Old Way

Until recently, I would create my Base Connection, then I’d do the following to create the new query to reference that one.

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and choose Reference

The problem was this… my intention was to reference and customize my query. Instead, it immediately loaded the new query into a worksheet, and I had to wait for that to finish before I could edit the new query and customize it the way I wanted.

Reference other Power Query queries – The New Way

I learned a new method last week from one of the Power Query team members which is much better (thanks Miguel!). I included it in my last post, but I thought this was worth calling out on its own.

Instead of following the method above, this time we will:

  • Go to the Power Query tab
  • Show the Workbook Queries pane
  • Right click the Base Connection query and Edit

Now we’re taken into the Power Query window. On the left side we can see a collapsed “Queries” pane. When you expand that, you get a list of all Power Queries in the workbook.

image

  • Right click the Base Connection query and choose “Reference”

We now have a new query in the editor that we can edit, without loading it into a worksheet first. Faster, and more in line with my goals.
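If you peek at the M behind a referenced query, there isn’t much to it: the Source step simply points at the other query by name (something like Source = #"Base Connection").  The sketch below imitates that inside a single query so it can run on its own; the table contents and the filter step are made up for illustration:

```powerquery
let
    // Made-up stand-in for the separate "Base Connection" query
    BaseConnection = #table(
        {"Vendor", "Amount"},
        {{"Acme", 50}, {"Globex", 250}}
    ),

    // A referenced query starts from the other query's output...
    Source = BaseConnection,

    // ...and then adds its own shaping steps on top
    LargeOnly = Table.SelectRows(Source, each [Amount] > 100)
in
    LargeOnly
```

Since the new query only points at the original, changes made to the Base Connection should flow through to every query that references it.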

The other thing I like about this method is that it immediately gives me access to that queries pane. Why is that important? Because I can drill through the other queries and get at their M code without having to close the window and go back to Excel first. So if I have some funky M code I need to re-use, it makes it way easier to review it and copy it.

The post How to Reference other Power Query queries appeared first on The Ken Puls (Excelguru) Blog.

Creating a VLOOKUP Function in Power Query

Tonight I decided to actually follow through on something I’d been musing about for a while:  building a full-fledged VLOOKUP function in Power Query.  Why?  Yeah… that’s probably a good question!

Replicating VLOOKUP’s exact match is REALLY easy in Power Query.  You simply take two tables and merge them together.  But the approximate match is a little harder to do, since you don’t have matching records on each side to merge together.
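As a quick illustration of that exact match, here’s a minimal sketch of the merge written in M.  The two tables and their column names are made up for the example; in the user interface this is the Merge command followed by expanding the new column:

```powerquery
let
    // Made-up data table and lookup table
    Data = #table({"Values"}, {{1}, {3}, {5}}),
    Lookup = #table({"Key", "Result"}, {{1, "One"}, {5, "Five"}}),

    // A left outer join keeps every row of Data, matched or not
    Merged = Table.NestedJoin(Data, {"Values"}, Lookup, {"Key"}, "Match", JoinKind.LeftOuter),

    // Pull the return column out of the nested table; rows without a match show null
    Expanded = Table.ExpandTableColumn(Merged, "Match", {"Result"})
in
    Expanded
```

Where VLOOKUP with FALSE would give #N/A for the value 3, the merge gives null, which is generally easier to deal with downstream.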

Now, to be fair, you could go the route of building a kind of case statement, as Chris Webb has done here.  In actual fact, you probably should do that if you want something that is lean and mean, and the logic won’t change.  But what if you wanted to maintain a table in Excel that holds your lookup values, making it easy to update? Shouldn’t we be able to take that and use it just like a VLOOKUP with an approximate match?  I don’t see why not.  So here’s my take on it.

Practical Use Cases

I see this as having some helpful use cases.  They’ll mostly come from Excel users who are experienced with VLOOKUP and maintain lookup tables, and reach back to that familiarity.  And they would probably be tempted to do something like this:

image

The concern, of course, is that landing data in the worksheet during this cycle contributes to file size, memory usage and ultimately re-calc speed, so if you can avoid this step on the way to getting it into Power Pivot, you plainly want to do that.

The cool thing is that by building this the way I’ve done it, you’re not restricted to landing your data in the worksheet to use VLOOKUP with it.  You can pull data into Power Query from any source (csv, text file, database, web page) and perform your VLOOKUP against your Excel table without that worksheet round trip.

Let’s Take a Look…

Now, I AM going to use Excel based data for this, only because I have a specific scenario to demonstrate.  You can download a sample file – containing just the data – from this link.  (The completed file is also available at the end of the post.)

So, we have a series of numbers, and want to look them up in this table:

image

I really used my imagination for this one and called it “LookupTable”.  Remember that, as we need that name later.  Note also that the first record is 1, not 0.  This was done to demonstrate that an approximate match can return a #N/A value, as you’ll see in a minute.

Now here’s what things would look like using standard Excel VLOOKUP formulas against that table:

image

Hopefully this makes sense.  The formulas in columns 2, 3 and 4 are:

  • =VLOOKUP([@Values],LookupTable,2,TRUE)
  • =VLOOKUP([@Values],LookupTable,3)
  • =VLOOKUP([@Values],LookupTable,2,FALSE)

Just to recap the high points here… column 2 declares the final parameter as ,TRUE which will give us an approximate match.  Column 3 doesn’t declare the final parameter, which will default to ,TRUE and give us an approximate match.  Column 4 declares the final parameter as ,FALSE which means we want an exact match.  The end result is that only one value matches, which is why we get all those #N/A results.

Standard VLOOKUP stuff so far, right?

Creating the VLOOKUP function in Power Query

Before we get to using the function, we need to create it.  To do that we’re going to go to:

  • Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

Highlight all the code in that window and replace it with this… (yes, it’s not short)

let pqVLOOKUP = (lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical) as any =>
    let
        /*Provide optional match if user didn't */
        matchtype =
            if approximate_match = null
            then true
            else approximate_match,

        /*Get names of the lookup (first) column and the return column */
        Cols = Table.ColumnNames(table_array),
        ColName_match = Cols{0},
        ColName_return = Cols{col_index_number - 1},

        /*Find closest match */
        SortData = Table.Sort(table_array, {{ColName_match, Order.Descending}}),
        /*Filter on the original column name rather than renaming it, so col_index_number = 1 still resolves */
        RemoveExcess = Table.SelectRows(SortData, each Record.Field(_, ColName_match) <= lookup_value),
        ClosestMatch =
            if Table.IsEmpty(RemoveExcess)
            then "#N/A"
            else Record.Field(RemoveExcess{0}, ColName_match),

        /*What should be returned in case of approximate match? */
        ClosestReturn =
            if Table.IsEmpty(RemoveExcess)
            then "#N/A"
            else Record.Field(RemoveExcess{0}, ColName_return),

        /*Modify result if we need an exact match */
        Return =
            if matchtype = true
            then ClosestReturn
            else
                if lookup_value = ClosestMatch
                then ClosestReturn
                else "#N/A"
    in
        Return
in
    pqVLOOKUP

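Now click Done, and make sure the query is named pqVLOOKUP, since that is the name we’ll call it by later.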

All right… the function is there.  Now let’s go make use of it… (we’ll come back to how it works in a bit.)
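If you’d like a quick sanity check first, something like the following in another blank query should do it.  This is only a sketch: it assumes the function query is named pqVLOOKUP, and the little test table is invented for the purpose.

```powerquery
let
    // Hypothetical three-row lookup table, just for testing
    TestTable = #table({"Key", "Result"}, {{1, "A"}, {3, "C"}, {5, "E"}}),
    // Final argument omitted, so this is an approximate match
    Approx = pqVLOOKUP(4, TestTable, 2),
    // Same lookup, but insisting on an exact match
    Exact = pqVLOOKUP(4, TestTable, 2, false)
in
    [Approx = Approx, Exact = Exact]
```

Since 4 falls between 3 and 5, the approximate lookup should come back with "C" and the exact one with "#N/A".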

Using the VLOOKUP function in Power Query

Now, before we go any further, I want to ask you a favour.  I need you to pretend for a second.  Pretend that the data we are connecting to next is a database, not an Excel table.  You’ll see how this can be useful if you’ll play along here.  (The only reason I’m using an Excel table for my source data is that it’s easier to share than a database.)

Let’s go click in the DataTable table.  (This one:)

image

Now, let’s upload this “database” into Power Query…

  • Go to Power Query –> From Table

You should have something like this now:

image

Funny how Power Query reads the #N/A values as errors, but whatever.  Let’s get rid of those columns so that we’re left with just the Values column.

  • Right click Values –> Remove Other Columns

Now, we’re going to make a really small M code edit.

  • Go to View –> Advanced Editor
  • Copy the second line (starts with Source =…)
  • Paste it immediately above the line you just copied
  • Modify it to read as follows:
    • Source –> LookupSource
    • DataTable –> LookupTable

Your M code should now look as follows:

image

  • Click Done

Nothing really appears to look different right now, but you’ll notice that you have an extra step called “LookupSource” on the right.  If you switch back and forth between that and Source, you’ll see we are looking at the original DataTable and the LookupTable.  The reason we do this is to make the next step really easy.
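For reference, the edited query should read roughly like this.  It is a sketch rather than a copy of the screenshot: the last step name is an assumption, since Power Query generates its own.

```powerquery
let
    // Added line: points at the Excel table named LookupTable
    LookupSource = Excel.CurrentWorkbook(){[Name="LookupTable"]}[Content],
    // Original line generated by From Table
    Source = Excel.CurrentWorkbook(){[Name="DataTable"]}[Content],
    // Step name assumed; this is the Remove Other Columns step
    RemovedOtherColumns = Table.SelectColumns(Source, {"Values"})
in
    RemovedOtherColumns
```

Because LookupSource is defined inside the same query, later steps can refer to it by name.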

  • Go to Add Column –> Add Custom Column
  • Call the column 2 True
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,true)

Okay, so what’s what?

  • pqVLOOKUP is the name of our function we added above
  • [Values] is the value we want to look up
  • LookupSource is the table we want to look in to find our result
  • 2 is the column we want to return
  • true is defining that we want an approximate match

And, as you can see when you click OK, it works!

image

Let’s do the next two columns:

  • Go to Add Column –> Add Custom Column
  • Call the column 3 default
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,3)

So this time we asked for a return from the 3rd column, and we omitted the final parameter.  Notice that it defaulted to true for us:

image

Last one…

  • Go to Add Column –> Add Custom Column
  • Call the column 2 false
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,false)

And how about that, all but one comes back with #N/A:

image
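Putting the three custom columns together, the finished query would look something like this in the Advanced Editor.  Again, the step names are assumptions (Power Query picks its own), and it relies on the function query being named pqVLOOKUP.

```powerquery
let
    LookupSource = Excel.CurrentWorkbook(){[Name="LookupTable"]}[Content],
    Source = Excel.CurrentWorkbook(){[Name="DataTable"]}[Content],
    // Step names from here down are assumptions
    RemovedOtherColumns = Table.SelectColumns(Source, {"Values"}),
    // Approximate match, returning column 2
    Added2True = Table.AddColumn(RemovedOtherColumns, "2 True", each pqVLOOKUP([Values], LookupSource, 2, true)),
    // Final argument omitted, so it defaults to an approximate match
    Added3Default = Table.AddColumn(Added2True, "3 default", each pqVLOOKUP([Values], LookupSource, 3)),
    // Exact match, returning column 2
    Added2False = Table.AddColumn(Added3Default, "2 false", each pqVLOOKUP([Values], LookupSource, 2, false))
in
    Added2False
```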

And with that you can load this into a table in the worksheet:

image

Notice that the results are identical to those in the original Excel table, with one exception… the #N/A I have provided is text, not an equivalent to the =NA() function.

The completed file is available here.

How Does the VLOOKUP Function in Power Query Actually Work?

This VLOOKUP actually has some advantages over the VLOOKUP we all know and love.  The most important is that we don’t need to worry if the list is sorted or not, as the function takes care of it for you.  It essentially works like this:

  • Pull in the data table
  • Sort it descending by the first column
  • Remove all records greater than the value being searched for
  • Return the value in the requested column for the first remaining record UNLESS we asked for an Exact match
  • If we asked for an Exact match then it tests to see if the return is a match and returns #N/A if it’s not
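Those steps can be traced by hand on a tiny example.  The sketch below doesn’t call the function at all; it just walks the same logic against a made-up, deliberately unsorted table.

```powerquery
let
    // Hypothetical lookup table, deliberately out of order
    Lookup = #table({"Key", "Result"}, {{30, "High"}, {10, "Low"}, {20, "Medium"}}),
    LookupValue = 25,

    // Sort descending by the first column: 30, 20, 10
    Sorted = Table.Sort(Lookup, {{"Key", Order.Descending}}),
    // Remove rows greater than the lookup value: leaves 20, 10
    Kept = Table.SelectRows(Sorted, each [Key] <= LookupValue),
    // The first remaining row is the closest match at or below the lookup value
    Approximate = if Table.IsEmpty(Kept) then "#N/A" else Kept{0}[Result],
    // An exact match only succeeds when that row's key equals the lookup value
    Exact = if Table.IsEmpty(Kept) or Kept{0}[Key] <> LookupValue then "#N/A" else Approximate
in
    [Approximate = Approximate, Exact = Exact]
```

With a lookup value of 25, the approximate result is "Medium" (the row for 20), while the exact result is "#N/A" because no key equals 25.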

Some key design principles I used here:

  • The parameters are all in EXACTLY the same order as Excel’s VLOOKUP
  • The required, optional and default parameters match what you already know and use in Excel
  • The function is dynamic in that it will work no matter what your lookup table column names are, or how many rows or columns it has
  • It returns results that are in parallel with Excel’s output
  • The function is pretty much a drag’n’drop for your project.  The only thing you need to remember is to define the lookup table in the first part of your query

So how cool is that?  You love VLOOKUP, and you can now use it in Power Query to perform VLOOKUPs from your Power Query sourced database queries against tables of Excel data without hitting the worksheet first!  (In fact, if your database holds a suitable lookup table, you could VLOOKUP from one database table against another!)

The post Creating a VLOOKUP Function in Power Query appeared first on The Ken Puls (Excelguru) Blog.

Date Formats in Power Query

Date formats in Power Query are one of those little issues that drive me nuts… you build a query that pulls in various columns, at least one of which is a date, but when you complete the query it doesn’t show up as a date.  Why is this?

Demonstrating the Issue

Have a look at the following table from Excel, and how it loads in to Power Query:

image

That looks good… plainly it’s a date/time type in Power Query, correct?  But now let’s try an experiment.  Load this to the worksheet:

image

Why, when we have something that plainly renders as a date/time FROM a date format, are we getting the date serial number?  Yes, I’m aware that this is the true value in the original cell, but it’s pretty misleading, I think.

It gets even better

I’m going to modify this query to load to BOTH the worksheet and the Excel data model.  As soon as I do, the format of the Excel table changes:

image

Huh?  So what’s in Power Pivot then?

image

Curious… they match, but Power Pivot is formatted as Text, not a date?

(I’ve missed this in the past and spent HOURS trying to figure out why my time intelligence functions in Power Pivot weren’t working.  They LOOK so much like datetimes it’s hard to notice at first!)

Setting Date Formats in Power Query

When we go back and look at our Power Query, we can discover the source of the issue by looking at the Data Type on the Transform tab:

image

By default the date gets formatted as an “Any”.  What this means to you – as an Excel user – is that you could get anything out the other end.  No… that’s not quite true.  It means that it will be formatted as Text if Power Pivot is involved anywhere, or a Number if it isn’t.  I guess at least it’s consistent… sort of.

Fixing this issue is simple; it’s just annoying that we have to.  To take care of it, select the column in Power Query, then change the data type to Date.

Unfortunately it’s not good enough to just say that you’ve set it somewhere in the query.  I have seen scenarios where – even though a column was declared as a date – a later step gets it set back to Any.

Recommendations

I’ve been irritated by this enough that I now advise people to make it a habit to set the data types for all of their columns in the very last step of the query.  This ensures that you always know EXACTLY what is coming out after all of your hard work and eliminates any surprises.
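In M terms, that habit boils down to finishing the query with a Table.TransformColumnTypes step.  Here is a minimal sketch; the inline data and the column names are invented for illustration.

```powerquery
let
    // Hypothetical data; columns created this way start out as type any
    Source = #table({"Date", "Amount"}, {{#datetime(2014, 6, 1, 8, 30, 0), 10}, {#datetime(2014, 6, 2, 9, 0, 0), 20}}),
    // ... other transformation steps would go here ...
    // Final step: declare every column's data type explicitly
    SetTypes = Table.TransformColumnTypes(Source, {{"Date", type datetime}, {"Amount", type number}})
in
    SetTypes
```

Swap type datetime for type date if you only want the date portion.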

The post Date Formats in Power Query appeared first on The Ken Puls (Excelguru) Blog.
