I was teaching a course on Power Query yesterday in which we imported a text file. Almost immediately, some of my users pointed out that their dates weren’t importing correctly, and we had to cover how to fix date errors right away.

And to underscore the importance of this… this morning I woke up to comments on one of my previous blog posts with similar issues. I figured it’s time to cover the easy way to fix date errors.

The Symptom

My data came from a text file, and was shown in the following format:

As you can see, the data is set up in the Month/Day/Year format. The issue for the user is that their system is set to a different format… in the case of the users in my class, their default Windows settings were set to Day/Month/Year, which is the Canadian date format. When they tried to convert the data from the “any” data type to a date, it messed it up.

The problem comes into play because many systems export in an MDY format, as they were programmed using US date standards. But with our operating system set to a different format, Power Query tries to interpret dates using the standard set there. So when importing a text file, it looks at 1/12/2000 and interprets it as Dec 1, 2000, not Jan 12, 2000.

Then it hits a date like 1/13/2000. Because there is no 13th month, it returns an error.

One of the class attendees brought up an interesting point as I was explaining this… “I thought that dates were just a format on top of a serial number, so how can it get them wrong?” He was absolutely correct. But in this case we are importing data from another source into Excel (via Power Query), and Excel (Power Query) is trying to determine what that date serial number is based on the system settings. That’s where the issue hits us.

How to fix date errors

At first, you might be tempted to flip the date format in your Windows settings, but that won’t actually help you in the long run. In fact, in the worst case it may fix the issue for the current import, and blow apart other solutions that you’ve built. So that’s really not a practical solution. What we need is a way to tell Power Query what the date settings are for THIS data source. Fortunately, there is a way to override the date format. In truth, this doesn’t so much fix date errors, but rather prevents them from occurring in the first place.

What we do is select the column with our dates in it then:

Right click the column
Choose Change Type –> Using Locale

(Yeah, I know… this is hardly a term that Excel users are familiar with, but it allows you to force a different regional setting on the data source.)

You’ll then be prompted with a new dialog where you’ll choose the Date data type, then the Locale you want to use to read it:

The key here is to recognize WHICH locale your data format is emulating. There are hundreds of countries in this listing. My guess is that you’re probably going to pick either your own or English (United States) most of the time. In truth, when working with dates, the country is actually not the important part. The important part is that you pick a country where the MDY or DMY format is consistent with your data source.
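In M terms, Change Type –> Using Locale should record a step that passes a culture code as the third argument of Table.TransformColumnTypes. Below is a minimal sketch. A hypothetical two-row #table stands in for the imported text file, and the step name is illustrative:

```m
let
    // Hypothetical sample: text dates exported in Month/Day/Year order
    Source = #table({"Date"}, {{"1/12/2000"}, {"1/13/2000"}}),
    // The third argument is the culture: "en-US" reads the text as M/D/Y,
    // regardless of the regional settings on the PC running the query
    #"Changed Type with Locale" = Table.TransformColumnTypes(Source, {{"Date", type date}}, "en-US")
in
    #"Changed Type with Locale"
```

With "en-US", both rows should come back as January 12 and January 13, 2000. Swapping in a Day/Month/Year culture such as "en-GB" should reproduce the original symptom: 1/12/2000 is read as Dec 1, 2000, and 1/13/2000 becomes an error.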

Future Proofing

I’ve been fortunate enough to have dealt with this issue very little in my career. I generally leave my system in a US English configuration and most of my imports follow the US date standard, so no issue. But I recognize this as a huge issue for Europeans, as well as any company that conducts business in multiple countries. In both cases this issue comes up over and over again.

There are two great things about using the “Change Type With Locale” feature:

The first is that it avoids relying on an implicit shortcut, explicitly declaring the source data format. This is REALLY handy for future proofing. In Canada, we typically end up with some users in the organization using Canadian standards and some who use US standards (we are a very confused country sometimes.) By specifically declaring the data type, I know that this solution will continue to work even when I send it to someone who uses a different date standard on their PC. Why? Because it’s now defined for the DATA, not the SYSTEM.

The second great thing is that this is a DATA SOURCE SPECIFIC feature. I can set a different format for each data source used in my solution, allowing me to combine several data sets from different countries and still get it consistent.
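Here is a rough sketch of that idea. It assumes two hypothetical sources, one exported as Month/Day/Year and one as Day/Month/Year. Each table gets its own culture before the two are appended:

```m
let
    // Hypothetical sources: the same date written in two regional formats
    UsSource = #table({"Date"}, {{"1/13/2000"}}),
    UkSource = #table({"Date"}, {{"13/1/2000"}}),
    // Declare the format per data source, not per PC
    UsTyped = Table.TransformColumnTypes(UsSource, {{"Date", type date}}, "en-US"),
    UkTyped = Table.TransformColumnTypes(UkSource, {{"Date", type date}}, "en-GB"),
    // Both rows should now hold January 13, 2000
    Combined = Table.Combine({UsTyped, UkTyped})
in
    Combined
```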

Drawbacks/Improvements

One thing that I struggled with is this. Back in the old text import tool for Excel, we had a nice feature that looked like this when we were setting data types:

This was fantastic, as I didn’t need to try and figure out which region the data came from, I simply chose the date format that I could see. While it’s great that we can now exhibit some national pride by choosing our country, that doesn’t always help. In the case of Canada, I’d bet that if I asked 5 different people what our official date format is, I’d get 5 different answers.

It would be SO handy if the Power Query team would add some indicator at the end of these options to show what the date format is. That would be such an easy change to make here, and SOOOOO useful. I honestly don’t think I need to care whether my setting is English Australia/UK/Ireland/Canada/Belize, as long as it interprets my date in the correct order (MDY or DMY).

(I actually did email this thought to one of the program managers a few days ago. So hopefully one day we’ll see that change take place.)

The post Fix Date Errors appeared first on The Ken Puls (Excelguru) Blog.

At last, the Power Query update I’ve been waiting for has landed on the download site. This is version 2.24, and it sets my parameter tables technique back to a working state!

You can download version 2.24 directly from Microsoft’s site by clicking here.

Why is this Power Query Update important?

This Power Query update is pretty important for a few reasons:

It fixes the issues with the MultipleUnclassified/Trusted error on refreshing parameter tables (as I blogged about here)
If you’re running version 2.22 it also fixes issues with loading to the data model

Does this Power Query Update have any NEW features?

Of course it does!

Improvements to ODBC Connector.
Navigator dialog improvements.
Option to enable Fast Data Load vs. Background Data Load.
Support for Salesforce Custom environments in Recent Sources list.
Easier parsing of Date or Time values out of a Text column with Date/Time information.
Unpivot Other Columns entry in column context menu.

The big thing for me, though, is that this update fixes the critical bug(s) listed above. If you’re running 2.22 or 2.23 I highly recommend updating. (And if you’re running an older version I’d update too, as there is new functionality released every month.)

The post Important Power Query Update Available appeared first on The Ken Puls (Excelguru) Blog.

Last week, in a blog comment, a reader asked how to filter their data to only show the most recent rolling 12 month period. This post looks at how I made that work in Power Query.

Background

Assume we have a table set up as follows:

As you can see, we have sales categories down the left, months across the top, and values in the middle. A classic setup when users are tracking information. And now we need to pull the most recent 12 months only from that table.

You can download the completed sample file if you’d like to follow along as well.

Filter the most recent rolling 12 months from a table in Power Query

Step 1: Grab the data

The first thing we need to do is grab the data. To do this, I clicked in the table and:

Power Query –> From Table –> Confirm the range (if required)
Changed the query name to Rolling12

Helpfully, Power Query identified the data types in all the columns for me, so I’m pretty much ready to go.

Step 2: Show the most recent record first

To do this, we really need to get the data into an unpivoted list. Easy enough to do:

Select the “Sales” column
Right click –> UnPivot Other Columns

Note: The “Unpivot Other Columns” command was added to the right-click menu in version 2.24. If you don’t see it there, you are definitely running an older version. You can most likely still access the command from the ribbon, but for a multitude of important reasons, you should really update to the latest version of Power Query.

If you don’t want to (or can’t) do this for some reason, then go to Transform –> Unpivot –> Unpivot Other columns to accomplish the same thing.

We now get a nice unpivoted list:

Next I cleaned up that Attribute column:

Right click the Attribute column –> Rename –> Date
Right click the Date column –> Change Type –> Date

And finally I sorted the records to show the most recent ones at the top:

Click the Filter icon on the Date column –> Sort Descending
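For reference, steps 1 and 2 should produce M code along these lines. This is a sketch, not the exact generated code. A small hypothetical #table stands in for the worksheet table (the real query would start from Excel.CurrentWorkbook), and the month headers are assumed to be M/D/Y text:

```m
let
    // Hypothetical stand-in for the worksheet table: categories down the left, months across the top
    Source = #table(
        {"Sales", "1/1/2015", "2/1/2015"},
        {{"Hardware", 100, 120}, {"Software", 80, 95}}),
    // Keep "Sales" as is and turn every other column into Attribute/Value pairs
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Sales"}, "Attribute", "Value"),
    #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns", {{"Attribute", "Date"}}),
    // The old headers are text, so convert them (assumed to be M/D/Y)
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"Date", type date}}, "en-US"),
    // Most recent dates first
    #"Sorted Rows" = Table.Sort(#"Changed Type", {{"Date", Order.Descending}})
in
    #"Sorted Rows"
```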

Leaving us with this:

Step 3: Create variables to hold the required data range

Next we need to work out the dates that we want to use for the top and bottom range of dates in the query. This is a bit tricky, but uber powerful once you realize you can do it.

To start, click the fx icon in the formula bar:

NOTE: If you don’t see the formula bar, go to the View tab, and check the box next to Formula Bar

What this does is add a new step in the formula bar called “Custom1”. And if you check the formula bar you’ll see that it just refers to the previous step:

The cool thing here is that we can modify this. Why don’t we add some data to the end of that statement to pull the first value from the Date column of that table? To do that, change the text in the formula bar to read as follows:

=#"Sorted Rows"[Date]{0}

Recognize here that:

#"Sorted Rows" refers to the table in the previous step
[Date] tells Power Query that you only want the Date column
{0} tells Power Query that you want the first value from that column (remember that Power Query starts counting from 0)

The result is a single cell with the most recent date:

Let’s keep things clean in our Steps window… right click the Custom1 step and rename it to “MaxDate”.

What’s cool here is that we’ve essentially created a variable to work out and hold the most recent date.

So now that we have the top of the range, why don’t we create another step to modify it to the date for the lower end of the range?

Click the fx icon in the formula bar
Modify the formula to read as follows

=Date.AddYears(MaxDate,-1)

And the result is a new Custom1 step that shows in the formula bar as follows:

Note: The formulas you can use are documented at Microsoft’s site here: http://office.microsoft.com/en-us/excel-help/power-query-formula-categories-HA104122363.aspx

Pretty cool, don’t you think? We’ve now got a step that holds the lower end of the results too. Let’s do our cleanup again.

Right click the Custom1 step and rename it to “AfterDate”
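Written out as M, the two “variables” are just ordinary named steps. Here is a minimal sketch, assuming a hypothetical two-row table in place of the real Sorted Rows step:

```m
let
    // Hypothetical stand-in for the "Sorted Rows" step (newest date first)
    #"Sorted Rows" = #table(
        type table [Sales = text, Date = date, Value = number],
        {{"Hardware", #date(2015, 2, 1), 120},
         {"Hardware", #date(2015, 1, 1), 100}}),
    // First value in the Date column = the most recent date
    MaxDate = #"Sorted Rows"[Date]{0},
    // Same day one year earlier; used as the exclusive lower bound
    AfterDate = Date.AddYears(MaxDate, -1)
in
    AfterDate
```

For this sample, MaxDate should evaluate to Feb 1, 2015 and AfterDate to Feb 1, 2014. If you’d rather not depend on the sort order, List.Max(#"Sorted Rows"[Date]) should return the same MaxDate.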

Step 4: Implement the variables into a filter

Our last step is to implement the variables into the filter, cutting the data down to the most recent rolling 12 months of data. Before we can do that, however, we really need to get back to the table we had in the “Sorted Rows” step.

The challenge is that we can’t select and work from that step, as it’s earlier in the process than the creation of our variables. So how do we get back to that step AFTER we’ve created our variables?

Click the fx button in the formula bar

Once again, we get a new step in the formula bar:

The problem is that it’s referring to the previous step. So what if we changed it to point to the #”Sorted Rows” step?

Now how cool is that? Not only can we refer to the previous step of our query, we can change that to point to ANY previous step, or type in our own formulas against any previous step!

NOTE: If your step names have spaces in them, don’t forget to wrap the step name in quotes and then preface it with the # symbol. If your steps don’t have spaces, then you don’t need to do this.

Okay, so now let’s filter our data. We’ll start by doing it manually:

Click the filter icon on the Date column
Select Date Filters –> Custom Filter
Set up your filter as follows:

Notice that we have to pick the values from the list here. (Wouldn’t it be cool if we could type in our variable names here? That would be awesome!) Regardless, we can set up the filter as we’d expect to use it. This will filter our list, and leave us with the following formula in the formula bar:

NOTE: You can expand the formula bar to show as I have by clicking the little down arrow icon.

Good stuff… now we need to do a little surgery. Let’s replace the manual dates with our variables:

= Table.SelectRows(Custom1, each [Date] <= MaxDate and [Date] > AfterDate)

If you check the table now, you’ll see that it is filtered down to only contain records between Mar 1, 2014 and Feb 28, 2015. And better yet, because the variables are created dynamically when the query is run, it will ALWAYS return the most recent rolling 12 months!
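Putting steps 3 and 4 together, the relevant part of the query should look roughly like this. A hypothetical three-row table replaces the real Sorted Rows step:

```m
let
    // Hypothetical stand-in for the "Sorted Rows" step
    #"Sorted Rows" = #table(
        type table [Sales = text, Date = date, Value = number],
        {{"Hardware", #date(2015, 2, 1), 120},
         {"Hardware", #date(2014, 3, 1), 90},
         {"Hardware", #date(2014, 2, 1), 85}}),
    MaxDate = #"Sorted Rows"[Date]{0},
    AfterDate = Date.AddYears(MaxDate, -1),
    // Jump back to the table as it was before the variable steps
    Custom1 = #"Sorted Rows",
    // Keep dates after AfterDate, up to and including MaxDate
    #"Filtered Rows" = Table.SelectRows(Custom1, each [Date] <= MaxDate and [Date] > AfterDate)
in
    #"Filtered Rows"
```

Here the Feb 1, 2014 row should be dropped because it is not strictly after AfterDate, leaving March 2014 through February 2015.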

Step 5: Pivoting the data back into the original format.

Now we need to put the data back into the format the user wanted. To do this, we need to pivot it back.

The trick to pivoting in Power Query is to select the column you want to use as the new column headers. This time it is the Date column. So…

Select the Date column –> Transform –> Pivot Column
Change the “Values” column to the one that holds your values (in this case it’s actually called Value)

And the result:

Bingo! The most recent rolling 12 months of data from our table.
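New column names have to be text, so the dates need converting to text before they can become headers. The sketch below does that explicitly before calling Table.Pivot. It assumes a hypothetical table in place of the filtered step:

```m
let
    // Hypothetical stand-in for the filtered, unpivoted data (newest first)
    #"Filtered Rows" = #table(
        type table [Sales = text, Date = date, Value = number],
        {{"Hardware", #date(2015, 2, 1), 120},
         {"Hardware", #date(2015, 1, 1), 100},
         {"Software", #date(2015, 2, 1), 95},
         {"Software", #date(2015, 1, 1), 80}}),
    // Column names must be text, so turn the dates into text first
    DatesAsText = Table.TransformColumnTypes(#"Filtered Rows", {{"Date", type text}}, "en-US"),
    // One new column per distinct date, filled from Value (List.Sum handles any duplicates)
    #"Pivoted Column" = Table.Pivot(DatesAsText, List.Distinct(DatesAsText[Date]), "Date", "Value", List.Sum)
in
    #"Pivoted Column"
```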

At this point you can click File –> Close & Load, and load it to a table.

Proof Positive

Go and add a new column of data. You can insert it into the existing table, put it on the end, it really doesn’t matter since Power Query will sort it anyway. Once you’re done, right click the new table and refresh it, and you’ll find it works nicely.

One minor point of note… in the version I did, we’ve actually reversed the column order (the most recent date has moved from the right to the left). If we wanted to change that, it’s fairly easy too: just before we pivot the data back into its original form, sort it in ascending order.
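As a sketch of restoring the oldest-to-newest column order, the rows are sorted oldest first before the pivot. Table.Pivot should create the new columns in the order of the list it is given. The input table is hypothetical:

```m
let
    // Hypothetical stand-in for the filtered data, still sorted newest first
    #"Filtered Rows" = #table(
        type table [Sales = text, Date = date, Value = number],
        {{"Hardware", #date(2015, 2, 1), 120},
         {"Hardware", #date(2015, 1, 1), 100}}),
    // Oldest first, so the new columns run oldest (left) to newest (right)
    #"Sorted Ascending" = Table.Sort(#"Filtered Rows", {{"Date", Order.Ascending}}),
    DatesAsText = Table.TransformColumnTypes(#"Sorted Ascending", {{"Date", type text}}, "en-US"),
    // List.Distinct keeps first-seen order, which now follows the ascending sort
    #"Pivoted Column" = Table.Pivot(DatesAsText, List.Distinct(DatesAsText[Date]), "Date", "Value", List.Sum)
in
    #"Pivoted Column"
```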

The post Rolling 12 Months in Power Query appeared first on The Ken Puls (Excelguru) Blog.

Last week I posted a technique to show how to calculate a rolling 12 months in Power Query. One of the techniques used was to refer to other steps during the construction of that query. Shortly after publishing that, a user asked a question on an unrelated post that can make use of the same technique. Because of this I thought I should focus on that specific technique this week, and where it can add more value.

The Question

I have a data sheet where the generated date shows up in a single cell up on the top and then the data table itself follows.

I created a query to just pick up the generated data but now I want to use that date within a formula of a new column in the 2nd query (the one that pulls/transforms the data table itself). How can I do that?

Now, the asker is working between two queries. I’m actually not going to do that at all, rather focussing on getting things working in a single query.

The Mockup

I haven’t seen the asker’s original data, but I mocked up a sample which I believe should be somewhat representative of what was described:

As you can see, we’ve got a single cell with the date in A3, and a table below it. While I’ve done this in Excel, this could easily be pulled in from a text file, web page, or some other medium. The key that I want to focus on here is how to get that date lined up with the rest of the rows in the table.

Options, Options, and more Options

There’s actually a ton of ways to do this. Just some include:

Naming the date range, using the fnGetParameter function to pull it in, and pass it into the query that way.
Pull the data into Power Query, duplicate the first column, format it as a date, replace errors with null, fill down, and cull out the rest of the garbage rows
Add a custom column that refers directly to the 3rd field of the first column
And many more

But in order to pull this off today, I’m going to refer to other steps in the Applied Steps section of the query. This is a method you can use to determine a variable through the user interface without resorting directly to M code.

Building the Output

Loading the data

To pull the data in, I’ll set up a named range, as this doesn’t exactly look like a table. To do that I:

Selected A1:C8
Replaced the A1 in the Name box (just to the left of the formula bar) with the name “Data”

Which landed me the following in Power Query:

Filter down to just the date cell

This part is relatively easy, we just need to:

Right click Column1 –> Remove other columns
Right click Column1 –> Change Type –> Date
Go to Home –> Remove Errors
Filter Column1 –> Uncheck the null values

And we’re now down to just our date:

You’ll also notice, on the right side, the Applied Steps window shows this step as “Filtered Rows”. I’m going to right click that and rename it to “ReportDate” (with no space).
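The four steps above can be sketched in M as follows. The Source table is a hypothetical stand-in for the “Data” range (title in row 1, report date in row 3, headers in row 5); in the real workbook it would come from Excel.CurrentWorkbook:

```m
let
    // Hypothetical stand-in for the "Data" range
    Source = #table({"Column1", "Column2", "Column3"}, {
        {"Sales Report", null, null},
        {null, null, null},
        {#datetime(2015, 3, 31, 0, 0, 0), null, null},
        {null, null, null},
        {"Region", "Rep", "Amount"},
        {"East", "Fred", 100},
        {"West", "Mary", 250},
        {"North", "Sue", 175}}),
    #"Removed Other Columns" = Table.SelectColumns(Source, {"Column1"}),
    // Text values can't be converted to a date, so they become errors
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns", {{"Column1", type date}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type", {"Column1"}),
    // Only the report date is left once the nulls are filtered out
    ReportDate = Table.SelectRows(#"Removed Errors", each [Column1] <> null)
in
    ReportDate
```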

Refer to Prior Steps

With this in place, we can now essentially revert to our original query. To do that, we:

Go to the Formula Bar and click the fx logo to get a new query line:

Notice that it refers to the previous step. No big deal, change that back to “=Source” (the original step of our query). When you do, your “Custom1” step will look like this:

Perfect. Let’s add a custom column.

Go to Add Column –> Add Custom Column
Set up the custom column as follows:
- Name: Date
- Formula: =ReportDate

Your “ReportDate” step gets added as a table:

Click the expand arrow to the top right of the date column header and expand it (without keeping the column prefix)

And now it’s just basic cleanup to get things in the right place:

Go to Home –> Remove Rows –> Remove Top Rows –> 4
Go to Transform –> Use First Row as Headers
Right click Column4 –> Rename –> Date

And you’re done:

So… could you build one query to get the date, then try to pass it to a query with your data table in it? Sure, but why? Much better to do it all in one place.

It’s Faster with M

Before Bill S jumps in and shows us that it’s faster when we manipulate the M code directly, I figure I’ll cover that too. Bill is absolutely correct when he comments on my posts showing why we should learn M. Here’s the (summarized) route to do the same thing using M code:

Load the initial table into Power Query
Go to Home –> Remove Rows –> Remove Top Rows –> 4
Go to Transform –> Use First Row as Headers
Add a custom column
- Name: Date
- Formula: =Date.From(Source[Column1]{2})

You’re done.

Why? The trick is all in the formula. Let’s build it up gradually.

We start by referring to the Source step.

=Source

This would return a table in the column (as you saw earlier). We then modify the formula and append [Column1] to this so that we have:

=Source[Column1]

This returns the list of all of the values in Column1 from the Source step. (Never mind that we moved past that step – it will still refer to it as if it was still around.) Next we append the index of the data point we want. Remembering that Power Query is base 0, that means that we need #2 to get to the 3rd data point:

=Source[Column1]{2}

Now, if you went with this as your formula you’d find that it actually returns a DateTime value. So the last step is to wrap it in a formula to extract just the date:

=Date.From(Source[Column1]{2})
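As a whole query, the M route might look like the sketch below. It uses the same hypothetical stand-in for the “Data” range as before, and the step names are illustrative:

```m
let
    // Hypothetical stand-in for the "Data" range (report date in the 3rd row of Column1)
    Source = #table({"Column1", "Column2", "Column3"}, {
        {"Sales Report", null, null},
        {null, null, null},
        {#datetime(2015, 3, 31, 0, 0, 0), null, null},
        {null, null, null},
        {"Region", "Rep", "Amount"},
        {"East", "Fred", 100},
        {"West", "Mary", 250},
        {"North", "Sue", 175}}),
    #"Removed Top Rows" = Table.Skip(Source, 4),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows"),
    // Source[Column1]{2} is the 3rd value (zero-based index 2) of Column1 in the Source step;
    // Date.From drops the time portion of the datetime
    #"Added Custom" = Table.AddColumn(#"Promoted Headers", "Date", each Date.From(Source[Column1]{2}), type date)
in
    #"Added Custom"
```

Each of the three data rows should get Mar 31, 2015 in the new Date column.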

Final Thoughts

So now you’ve seen two ways to pull this off… one via the user interface, one more efficient by writing a small bit of M code. Different options for different comfort levels.

What I love about Power Query is that you don’t NEED to master M to use it effectively. But if you DO master M, then it sure can make your queries more efficient and elegant.

Also, I should mention this… if the user really DID want to merge these two queries together, it is as easy as adding a new step (by clicking that fx button), then putting in the name of the other query. That will bring the data over, and then it’s simply a matter of following the rest of this post to stitch them together.

The post Refer to other steps in Power Query appeared first on The Ken Puls (Excelguru) Blog.

I was working through a scenario today and came up against something unexpected when multiplying NULL values in Power Query.

Background

I have a fairly simple table of transactions that looks like this:

And wanted to turn it into this:

Seems simple enough, but I ran into an odd problem.

Steps

Getting started was pretty easy:

Pull the data into Power Query using From Table
Remove the final column
Select the Bank Fee column –> Transform –> Standard –> Multiply –> -1

So far everything is good:

Then I tried to do the same to the Discount column.

Multiplying NULL values

At this point, something odd happened. I did the same thing:

Select the Discount column –> Transform –> Standard –> Multiply –> -1

But instead of getting a NULL or 0 for John’s record, it gave me -1. Huh?

This is honestly the last result I expected. How can a NULL (or empty) cell be equivalent to 1? I think I’d rather have an error than this.

Regardless, easy enough to fix, I just inserted a step before the multiplication step to replace null with 0:
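Here is a minimal sketch of that workaround. A hypothetical two-customer table stands in for the real transactions, with the final column already removed:

```m
let
    // Hypothetical stand-in for the transactions table; John has no discount
    Source = #table({"Customer", "Bank Fee", "Discount"},
        {{"John", 2.5, null}, {"Mary", 3, 10}}),
    // Replace the empty discount with 0 before any arithmetic touches it
    #"Replaced Value" = Table.ReplaceValue(Source, null, 0, Replacer.ReplaceValue, {"Discount"}),
    // Flip the sign of both columns
    #"Multiplied Column" = Table.TransformColumns(#"Replaced Value",
        {{"Bank Fee", each _ * -1, type number}, {"Discount", each _ * -1, type number}})
in
    #"Multiplied Column"
```

Note that John’s discount is now 0 multiplied by -1. That is likely where the negative zero further down this post comes from.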

Good stuff, so now just finish it off:

Right click the Customer column –> UnPivot Other columns

And all looks good…

… until I load it into the Excel table:

Seriously? Negative zero?

To be honest, if I’m feeding a PivotTable with this anyway, I really don’t need the discount record for John. To fix this, I just went back to the Power Query and inserted another step, right before the Unpivot step, that replaced 0 with null. The result is now really what I was originally after:
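A sketch of that final fix, assuming a hypothetical table in place of the multiplied step. Unpivoting skips null values, so once the zero is back to null, John simply gets no Discount row:

```m
let
    // Hypothetical stand-in for the table after the multiply step
    #"Multiplied Column" = #table({"Customer", "Bank Fee", "Discount"},
        {{"John", -2.5, 0}, {"Mary", -3, -10}}),
    // Turn the zero discount back into null...
    #"Replaced Zero" = Table.ReplaceValue(#"Multiplied Column", 0, null, Replacer.ReplaceValue, {"Discount"}),
    // ...so the unpivot produces no Discount row for John
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Replaced Zero", {"Customer"}, "Attribute", "Value")
in
    #"Unpivoted Other Columns"
```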

End Thoughts

I can’t help but think that this behaviour has changed, as I actually tripped upon it refreshing a previous solution. The killer is that the data has changed, but I’m pretty sure the old data set had these issues in it too, and it wasn’t a problem.

Regardless, I’m a little curious as to your opinions. Is this expected behaviour? Bug or feature?

The post Multiplying NULL values in Power Query appeared first on The Ken Puls (Excelguru) Blog.

A couple of weeks ago, Rudi asked how you would go about setting up a query to remove all rows up to a specific value. Specifically, his question was this:

The other day I was asked if Power Query could delete all top rows up to a found value. I could not find a solution and its been a burning question till now.
For example: If I import a csv file containing columnar info, but the headings for the list are in different rows for each import. I know that the first heading in column A is called "ID Number", but each import has this heading in a different row.
How do I determine an applied step to delete all rows above "ID Number". I cannot use the delete top rows as its not always 5 rows, some import the headings start in row 10, others in row 3...but the label I am looking for is always "ID Number".

While the question was answered in the initial post, I still thought it would be interesting to do a full post on this for others who might need to create similar functionality.

The Data Set

I didn’t have Rudi’s exact data, so I knocked up a little sample with an ID Number and Amount column starting in row 3, which you can download here.

The components I was after here were the ID Number header and an extra column of some kind. In addition, I wanted to have some garbage data above, as I didn’t want to give the impression that we can just filter out blank rows.

Now, I assume this data would normally be loaded from an external file, but it doesn’t really matter; I’ll just load it from a range, as it’s just a data source. The key is that the header row is not the first row. So I defined a new range to cover the data:

Select A1:B7 –> Name Box –> Data

I can then select the name from the Name box to select my data:

I then loaded it into Power Query by creating a new query –> From Table

Determine the Header Row

This is the first job. In order to remove any rows above the header row, we need to know which row the header resides in. To do that, we need to add an index column, and filter to the specific header that we’re after.

Add Column –> Index Column –> From 0
Filter Column1 –> only select ID Number

This results in a single row of data, and conveniently has the Index number of the ID Number row listed. In this case it’s 2.
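In M, those two steps might look like the sketch below. The Source is a hypothetical stand-in for the “Data” range, with two junk rows above the real header:

```m
let
    // Hypothetical stand-in for the "Data" range: junk rows above the real header row
    Source = #table({"Column1", "Column2"}, {
        {"Exported by system", null},
        {null, null},
        {"ID Number", "Amount"},
        {"1001", "25"},
        {"1002", "40"}}),
    // Number every row, starting at 0
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Keep only the header row; its Index equals the number of rows above it
    #"Filtered Rows" = Table.SelectRows(#"Added Index", each [Column1] = "ID Number")
in
    #"Filtered Rows"
```

For this sample, the single remaining row should show an Index of 2.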

Call up the Original Table

We’ll park that value for a moment, and call up the original table. To do that, we click on the fx button beside the formula bar to create a new step, then replace the formula with =Source.

Remove All Rows Up To A Specific Value

Now comes the tricky part.

We can’t remove all rows up to a specific value immediately. We need to insert a step that removes the top 2 rows first, then modify it. So let’s do that:

Home –> Remove Rows –> Remove Top Rows –> 2

This gives us the following table and formula:

The key is to understand this formula first.

The Table.Skip() function removes (actually skips importing) the first x rows
Custom1 is the name of our previous step
2 is the number of rows

So what we really need to get is the number of rows. We can extract that from the Filtered Rows step like this:

#"Filtered Rows"[Index]{0}

Where:

#"Filtered Rows" is the name of the step in the Applied Steps window
[Index] is the column we want to look at in that step
{0} indicates the first row of that step (since Power Query starts counting at 0)

So let’s modify the function to:

= Table.Skip(Custom1,#"Filtered Rows"[Index]{0})

As you can see, it works nicely:

In fact, we can go even better than this. Why are we referring to Custom1 at all? Doesn’t that just refer to Source anyway? Let’s also replace Custom1 with Source, and remove the Custom1 step from our query:

Cleanup

We can now promote our header rows, remove the unnecessary Changed Type step and set our data types correctly:
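The finished user-interface query might look roughly like this. The Source is hypothetical, and the data types (whole-number IDs, decimal amounts) are assumptions for the sample:

```m
let
    // Hypothetical stand-in for the "Data" range
    Source = #table({"Column1", "Column2"}, {
        {"Exported by system", null},
        {null, null},
        {"ID Number", "Amount"},
        {"1001", "25"},
        {"1002", "40"}}),
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1),
    #"Filtered Rows" = Table.SelectRows(#"Added Index", each [Column1] = "ID Number"),
    // Skip as many rows as sit above the header, starting from the untouched Source
    #"Removed Top Rows" = Table.Skip(Source, #"Filtered Rows"[Index]{0}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows"),
    // Assumed types for this sample: whole-number IDs, decimal amounts
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",
        {{"ID Number", Int64.Type}, {"Amount", type number}})
in
    #"Changed Type"
```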

Testing

If you try inserting new rows at the top of the data range, then refreshing the completed query… it just works! The output table will never end up with extra rows at the top, as we’re filtering to start at the ID Number row.

Using M

If you learn to write your own M code from scratch, you can start combining lines as well, skipping the need to filter the primary table in a separate step. Bill S provided code in answer to Rudi’s question (adapted to this workbook), which is shown below:

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1),
    Skip = Table.Skip(Source, Table.SelectRows(AddIndex, each ([Column1] = "ID Number")){0}[Index]),
    PromoteHeaders = Table.PromoteHeaders(Skip)
in
    PromoteHeaders

It essentially replicates the same steps as the query I built via the user interface, only he combined the Filtered Rows step into the Row Removal step. The bonus here is that the code is a bit shorter. The drawback is that it might not be quite so “de-buggable” to someone who isn’t as experienced with M.

The post Remove All Rows Up To A Specific Value appeared first on The Ken Puls (Excelguru) Blog.

Some time back I posted a routine to refresh Power Queries with VBA, allowing you to refresh all Power Queries in a workbook.

Things Changing…

The challenge with that specific macro is that it refers to the connection name, which may not always start with Power Query. (As we move closer to Excel 2016, we are now aware that these queries will NOT have names starting with Power Query.)

I have seen an approach where someone modified the code I provided earlier to include a variable, allowing you to easily change the prefixed text from Power Query to something else. While that works, it’s still not ideal, as names can be volatile. It makes more sense to find a method that cannot be broken by a simple name change.

Refresh Power Queries With VBA

The macro below is a more targeted approach that checks the source of the query. If it is a connection that is built against Power Query engine (Microsoft.Mashup), then it will refresh the connection, otherwise it will ignore it.

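Public Sub UpdatePowerQueriesOnly()
    'Refresh only connections built on the Power Query (Microsoft.Mashup) engine
    Dim lTest As Long, cn As WorkbookConnection
    On Error Resume Next
    For Each cn In ThisWorkbook.Connections
        Err.Clear
        lTest = 0
        'Connections that aren't OLE DB raise an error here; skip them and keep looping
        lTest = InStr(1, cn.OLEDBConnection.Connection, "Provider=Microsoft.Mashup.OleDb.1")
        If Err.Number = 0 And lTest > 0 Then cn.Refresh
    Next cn
    On Error GoTo 0
End Sub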

This macro works with Excel 2010, 2013 and 2016, and should survive longer than the original version I posted.

The post Update: Refresh Power Queries With VBA appeared first on The Ken Puls (Excelguru) Blog.

This will be a short post, as today we are leading our second sold out Power Query workshop at http://powerquery.training/course. I wanted to make sure I still got something out for my readers today though. This time I’m looking at a feature that was added in the August Power Query Update: Extract Text.

You can find these commands on both the Transform and the Add Column tabs, with the former just converting your selected column, and the latter creating a new column of results while preserving the original column.

Here’s what’s interesting to me about these functions:

They replicate the LEN(), LEFT() and RIGHT() functions, saving you having to build them manually as I discuss here. This is handy, and pretty seamless.

Here, you’ve got a table of words, and the second column is generated using Excel’s LEFT() function. The final column was generated by:

Selecting the Word column –> Add Column –> Extract –> First Characters –> 4

Nice and consistent with Excel’s LEFT() function.

Likewise, Last Characters replicates the RIGHT() function by going to:

Selecting the Word column –> Add Column –> Extract –> Last Characters –> 4

And Length replicates the LEN() function:

Selecting the Word column –> Add Column –> Extract –> Length
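These three commands appear to correspond to the M functions Text.Start, Text.End and Text.Length. Here is a sketch of all three as added columns, assuming a hypothetical Word table; the step names the user interface generates may differ:

```m
let
    // Hypothetical list of words
    Source = #table({"Word"}, {{"Bookkeeper"}, {"Parrot"}}),
    // Like LEFT([Word], 4)
    #"Inserted First Characters" = Table.AddColumn(Source, "First Characters", each Text.Start([Word], 4), type text),
    // Like RIGHT([Word], 4)
    #"Inserted Last Characters" = Table.AddColumn(#"Inserted First Characters", "Last Characters", each Text.End([Word], 4), type text),
    // Like LEN([Word])
    #"Inserted Length" = Table.AddColumn(#"Inserted Last Characters", "Length", each Text.Length([Word]), Int64.Type)
in
    #"Inserted Length"
```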

The Range function is a user interface implementation of what should be equivalent to the MID() function. In this case, however, it still has the following issues for Excel pros:

It is Base 0, meaning that if you want to start at the 3rd character of the text string, you need to specify that you want to start at character 2 (Power Query starts counting at 0, not 1)
If you provide a value for the “number of characters to return” that is larger than the number of characters remaining after the starting position, you’ll get an error (unlike the MID function)

So when you try to use Range in place of MID as follows:

Selecting the Word column –> Add Column –> Extract –> Range
Starting Number: 5
Number of Characters: 4

You get this:

Ugh. And correcting to subtract one from the starting index, you get this:

Better, but still errors.

Honestly, I was hoping the user interface implementation would solve those issues, saving us from building the more complicated code shown in my blog post on the subject.

So, at the end of the day, it’s awesome, but still doesn’t offer full “Excel parity”. And if you want that, you’ll need to learn to work with formulas in Power Query.
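As one illustration of the formula route, here is a sketch of a MID-style helper built on Text.Range. fnMid is a hypothetical custom function, not a built-in. It converts the 1-based start to Power Query’s base 0 and caps the count at the characters that remain:

```m
let
    // Hypothetical helper that mimics Excel's MID(): 1-based start,
    // and no error when the request runs past the end of the text
    fnMid = (txt as text, start as number, count as number) as text =>
        let
            offset = start - 1,                      // convert to Power Query's base 0
            remaining = Text.Length(txt) - offset    // characters available from offset
        in
            if remaining <= 0 then ""
            else Text.Range(txt, offset, List.Min({count, remaining})),
    Source = #table({"Word"}, {{"Bookkeeper"}, {"Parrot"}}),
    // Same request as the Range example above: start at character 5, return 4 characters
    #"Inserted Mid" = Table.AddColumn(Source, "Mid", each fnMid([Word], 5, 4), type text)
in
    #"Inserted Mid"
```

For these samples, the Mid column should return “keep” and “ot”, matching what MID(word, 5, 4) gives in Excel.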

The good news? We teach how to do that in our Power Query workshop. In addition, we’ve just announced a new registration intake. If you’re interested in learning how to master Power Query, check it out at http://powerquery.training/course.

The post Power Query’s Extract Text Feature appeared first on The Ken Puls (Excelguru) Blog.

I’m pleased to say that our new book “M is for Data Monkey” is now complete, through the technical edit, and into the copy edit stage. While there is still more work to go, the book is on schedule for release in November (with Amazon shipping in December, I believe.)

At this point we’re just finalizing the “Thank You” section, and wanted to give everyone who has supported and pre-ordered the book the chance to get their name in M is for Data Monkey:

How to get your name in M is for Data Monkey

If you’d like to get your name inside, we need you to submit the following information via email to info@powerquery.training

A copy of your receipt for the pre-order of the book - if you haven’t pre-ordered it then you can do so here
The name that you want to be displayed in the book – for example, it can be Miguel ‘Mike’ Escobar, Ken ‘Power Query’ Puls, your company name, or whatever you want to see there.

We only have space for a limited number of names (around 80), so please submit your information as soon as possible!

The deadline to submit this information is September 16th, 2015 at 23h59m US Eastern time, so don’t miss out!

The post Get Your Name in M is for Data Monkey appeared first on The Ken Puls (Excelguru) Blog.

When working with Power Query, it’s actually pretty simple to restrict records in a table using an AND criteria. As filters on different columns are applied in a cumulative fashion, we just keep applying more to filter down the data. But what about performing OR logic tests?

The issue

Assume we’ve got a list that looks like this:

And we want to preserve any record where the inventory item is Talkative Parrot OR the item was sold by Fred. So how would you do that?

We’d be tempted to start by filtering the Inventory Item column to just the Talkative Parrot item. But that would remove our very first row, which shows an item sold by Fred. And if we started by filtering the Sold By column to Fred, we’d lose the Talkative Parrot in row 2.

Performing OR logic tests – via merging columns

We could merge the Inventory Item column together with the Sold By column, then filter for any item that contains Talkative Parrot or Fred. While this might work in this situation, it could also lead to cross contamination if there were names like Manfred or Wilfred, or if the name we were searching for appeared in one of our Inventory items (like “Fred the dog”.)
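Here is a small Python sketch of that cross-contamination risk, using the “Fred the dog” case. It is only an illustration: the space used to glue the columns together and the “Sam” salesperson are hypothetical, and the helper names are made up for this example.

```python
def merged_filter(item: str, sold_by: str, terms: list[str]) -> bool:
    """Merge-then-contains approach: glue the columns together and look for any term."""
    merged = f"{item} {sold_by}"  # assumed space delimiter for the merged column
    return any(term in merged for term in terms)


def exact_or(item: str, sold_by: str) -> bool:
    """Compare each column to its own value, then OR the results."""
    return item == "Talkative Parrot" or sold_by == "Fred"


terms = ["Talkative Parrot", "Fred"]
# Hypothetical row: an inventory item that happens to contain "Fred"
print(merged_filter("Fred the dog", "Sam", terms))  # True (false positive)
print(exact_or("Fred the dog", "Sam"))             # False
```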

Performing OR logic tests – via list functions

Using list functions is a far better way to make this kind of comparison. In fact, there is a function that is specifically built for this purpose, which works exactly like Excel’s OR() function: List.AnyTrue().

To use this, we’ll set up a new custom column with the following formula:

=List.AnyTrue(
{[Inventory Item]="Talkative Parrot",
[Sold By]="Fred"}
)

The formula breaks down as follows:

The function name is List.AnyTrue()
The function requires a list, which means that you need the {} inside the parentheses.
Each list item (comparison) goes inside the curly braces, separated by commas

Knowing this, we then just need to include the comparisons we want to make, with the field name surrounded in square brackets, and the value we’re comparing against surrounded in quotes (as we are making a text comparison here.)

And the result, as with Excel’s OR() function, is a TRUE/FALSE value for each row:

Of course, we can do better than this, by wrapping the original function in an if statement:

=if List.AnyTrue(
{[Inventory Item]="Talkative Parrot",[Sold By]="Fred"}
)
then "Meets Criteria!"
else "No Match"

Which returns useful messages instead of just TRUE/FALSE values:

Naturally, you could filter these records or keep them, as your need dictates.
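If it helps to see the logic outside of Power Query, here is a rough Python analogue of that custom column. Python’s built-in `any()` plays the role List.AnyTrue() plays in M; the sample rows are hypothetical and this is a sketch, not Power Query code.

```python
def list_any_true(conditions: list[bool]) -> bool:
    """Rough Python stand-in for List.AnyTrue: True if any item in the list is True."""
    return any(conditions)


def tag_row(row: dict) -> str:
    # Same shape as the M formula: build a list of comparisons, then wrap it in an if
    if list_any_true([row["Inventory Item"] == "Talkative Parrot",
                      row["Sold By"] == "Fred"]):
        return "Meets Criteria!"
    return "No Match"


rows = [  # hypothetical sample rows
    {"Inventory Item": "Cat Food", "Sold By": "Fred"},
    {"Inventory Item": "Talkative Parrot", "Sold By": "Jane"},
    {"Inventory Item": "Cat Food", "Sold By": "Jane"},
]
print([tag_row(r) for r in rows])
# ['Meets Criteria!', 'Meets Criteria!', 'No Match']
```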

Performing AND logic tests

If you’re simply looking to filter out records based on an AND criteria, then applying successive filters will work. But what if you only want to tag the records, so that you can use this field in a slicer? In that case you need to preserve both the matching and non-matching items.

Fortunately, Power Query has a list function for this as well: List.AllTrue().

This function is used in the same manner as List.AnyTrue(); the following would only show a “Meets Criteria!” message where the Inventory Item is Talkative Parrot AND the Sold By field is “Fred”:

=if List.AllTrue(
{[Inventory Item]="Talkative Parrot",[Sold By]="Fred"}
)
then "Meets Criteria!"
else "No Match"
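The AND version follows the same pattern. In this rough Python analogue, `all()` stands in for List.AllTrue(), and every row is kept and tagged rather than filtered out (which is what makes the column usable in a slicer). The rows are hypothetical.

```python
def list_all_true(conditions: list[bool]) -> bool:
    """Rough Python stand-in for List.AllTrue: True only if every item is True."""
    return all(conditions)


def tag_row_and(row: dict) -> str:
    if list_all_true([row["Inventory Item"] == "Talkative Parrot",
                      row["Sold By"] == "Fred"]):
        return "Meets Criteria!"
    return "No Match"  # non-matching rows are kept and tagged, not filtered out


rows = [  # hypothetical sample rows
    {"Inventory Item": "Talkative Parrot", "Sold By": "Fred"},
    {"Inventory Item": "Talkative Parrot", "Sold By": "Jane"},
]
print([tag_row_and(r) for r in rows])  # ['Meets Criteria!', 'No Match']
```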

A Quick Note

This topic is contained in the forthcoming M is for Data Monkey book. And it’s not too late to get YOUR name inside!

The post Performing OR logic tests appeared first on The Ken Puls (Excelguru) Blog.

There was a really cool new feature added in the latest Power Query update: The ability to split Power Queries. This is something that has always been possible by editing the M code manually, but this makes it super simple.

Where this is useful

Where this can be super helpful is when you’ve built a nice query to reshape your data. It runs fine for a bit, then you realize that you need to create a larger data model out of the data. For example, assume we have this data:

And we run it through these steps:

To come up with this:

All of this is fairly easy as Power Query goes, but now the model needs to grow. In order to expand it, we also want to create a table of unique Inventory Items and a table of unique Sales people. Basically we want to de-aggregate the data that should have come in from separate tables in the first place.

Methods to Split Power Queries

Like always, there are a variety of ways to do this. You could create new queries to load the data from the original table, then cut it down to the columns needed in each case. But that causes an extra load requirement.

You could manually cut the code up to the step required, create a new blank query, then reference the new query from the previous one. But that takes some know-how and tinkering that many people won’t be comfortable with.

Starting in Power Query version 2.26 (released today), we have a MUCH easier way. Let’s assume that we want to split this query right after the Changed Type step, so that we can create an Items table and a Salesperson table in addition to the Transactions query that we already have.

How to Split Power Queries – the easy way

To start, we need to understand the steps and what they give us. We can step through each step of the query, and find the step that gives us the jumping off point we need. In my case, this is the Changed Type step. We then right click the step AFTER Changed Type, and choose Extract Previous:

You’ll be prompted to enter a name (I’ll use “Base Load”), and upon clicking OK, you’ll see a couple of things happen:

A Base Load query is created
The Queries Navigator opens on the left, showing you’ve now got multiple queries
The Transactions query (the one I’ve been working on) gets shorter
The Transactions query’s Source step gets updated to #"Base Load"

You can see these changes here:

So the Transactions query still contains all the subsequent steps, but the Source step changed, and the Changed Type step is now in the Base Load query:

The biggest piece of this whole thing, however, is that the Base Load query is still pointing to the raw source table, but the Transactions query now loads from Base Load, NOT the original data source. So it’s following the staging/loading approach I talk about in this blog post.

Now, how can we use this…?

Making Use of the Split Power Queries

So far, very little is different to the overall goal, except that we load in two stages. Let’s change that by creating a new query that references the Base Load query:

Right click the Base Load query in the Query Navigator (at left)
Choose Reference
Change the query name to Salespeople
Right click the Sold By column –> Remove Other Columns
Select the Sold By column –> Home –> Remove Duplicates
Right click the Sold By column –> Rename –> SalesPerson

And we’ve now got a nice query that shows our unique list of sales people:

Now let’s build the SalesItems table:

Right click the Base Load query in the Query Navigator
Choose Reference
Change the query name to SalesItems
Right click the Inventory Item column –> Remove Other Columns
Select the Inventory Item column –> Home –> Remove Duplicates
Right click the Inventory Item column –> Rename –> SalesItem
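For anyone who thinks better in code, here is a rough Python sketch of the staging/loading shape we’ve just built: one base function reads the source, and the two lookup tables are derived from its output rather than from the source. The function names and sample rows are hypothetical, and `dict.fromkeys` is simply a convenient way to mimic Remove Duplicates while keeping first-seen order.

```python
def base_load(source_rows: list[dict]) -> list[dict]:
    """Stand-in for the Base Load query (source plus Changed Type); here it just copies the rows."""
    return [dict(row) for row in source_rows]


def unique_column(rows: list[dict], column: str, new_name: str) -> list[dict]:
    """Remove Other Columns -> Remove Duplicates -> Rename, keeping first-seen order."""
    distinct = dict.fromkeys(row[column] for row in rows)
    return [{new_name: value} for value in distinct]


source = [  # hypothetical transactions
    {"Inventory Item": "Talkative Parrot", "Sold By": "Fred"},
    {"Inventory Item": "Cat Food", "Sold By": "Jane"},
    {"Inventory Item": "Talkative Parrot", "Sold By": "Jane"},
]
base = base_load(source)  # both lookups below are built from this, not from source
salespeople = unique_column(base, "Sold By", "SalesPerson")
sales_items = unique_column(base, "Inventory Item", "SalesItem")
print(salespeople)  # [{'SalesPerson': 'Fred'}, {'SalesPerson': 'Jane'}]
print(sales_items)  # [{'SalesItem': 'Talkative Parrot'}, {'SalesItem': 'Cat Food'}]
```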

And this table is done now as well:

Loading the Split Power Queries to the Data Model

The final step is to load these to the Data Model. We’ve actually created three new queries in this session, but we don’t get the liberty of choosing a separate destination for each of them. Instead, we get to choose a single loading style that will be applied to ALL of them. (If in doubt, I’d suggest that you load queries as Connection Only first, then change them after if you need to pick different destinations. This will save you waiting while Power Query litters your workbook with extra worksheets and loads the data to them.)

For our purposes here, I’ll load them directly to the Data Model:

Home –> Close & Load To…
- Select Only Create Connection
- Select Add to the Data Model
- Click Load

The only side effect here is that the Base Load query was also loaded to the data model, and I don’t need that. So I’ll now quickly change that.

Go to the Workbook Queries pane –> right click Base Load –> Load To…
Uncheck “Add this data to the Data Model” –> Load

And I’ve now got my tables where I need them so that I can relate them and build my solution.

Final Thoughts

This is a fantastic feature, and I was actually going to blog on how to do this the old way, until they surprised me with this update. I’m a huge fan of the staging/loading approach, and this will certainly make it easier to retrofit a query after it’s already been built.

The post Split Power Queries appeared first on The Ken Puls (Excelguru) Blog.

I ran into an interesting wrinkle in a model I’m building, where I need to allocate units based on dates. The idea here is to allow a user to enter the number of units to allocate, the start date and the end date. From there, I wanted to use Power Query to work out how many months have elapsed, and then tell me how many units should be allocated to each year in the period.

Background:

Here’s a look at my data (which you can download here):

So the idea here is that I need to come up with a table that shows that data should be allocated as follows:

So, if we look at the Traditional Single Family, the sales cycle is the 6 months from Aug 2015 through Jan 2016. With the first 5 months being in 2015 and the final month being in 2016, that means we need to allocate 5/6 of the total units to 2015 and 1/6 to 2016.

Allocate Units Based on Date: Method

My initial thought was to try and find a date difference or duration type function to return a count of months between two dates. Unfortunately, such a function doesn’t seem to exist. For that reason, I decided I’d just go ahead and build my own function to do the job.

Step 1: Create a function to return a list of months

To start with, I needed a list of month end dates. I started a blank query, jumped into the Advanced Editor and built a simple query to provide a hard coded startdate and enddate, then create a list from one to the other:

let
startdate=#date(2015,8,1),
enddate=#date(2016,1,31),
Source = {Number.From(startdate)..Number.From(enddate)}
in
Source

That query yielded a list of date serial numbers, so I then:

Went to Transform –> To Table
Changed the column’s data type to Date
Renamed the column to Date
Converted the column to month end dates (Transform –> Date –> Month End)
Removed Duplicates (Home –> Remove Duplicates)
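Here is a rough Python sketch of Step 1, included only to show the logic: build one entry per day between the two dates, convert each to its month end, then remove duplicates. The helper names are hypothetical, and `calendar.monthrange` is used to find the last day of each month.

```python
import calendar
from datetime import date, timedelta


def month_end(d: date) -> date:
    """Last day of d's month (the role Transform -> Date -> Month End plays)."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return date(d.year, d.month, last_day)


def month_ends_between(startdate: date, enddate: date) -> list[date]:
    # One entry per day from startdate to enddate (the {start..end} list)
    days = [startdate + timedelta(days=i) for i in range((enddate - startdate).days + 1)]
    # Convert each day to its month end, then remove duplicates (order preserved)
    return list(dict.fromkeys(month_end(d) for d in days))


# Six month-end dates, Aug 2015 through Jan 2016
print(month_ends_between(date(2015, 8, 1), date(2016, 1, 31)))
```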

The end result is a short table that shows only the month end dates:

Step 2: Add a Year End date column

I then needed to find a way to count the number of months in each year. To do that I:

Added a year end column (Select the Date column –> Add Column –> Date –> Year –> End of Year)
Went to Transform –> Group By and set up the grouping as follows:
- Group by: EndOfYear
- New column name: Months_x_Year
- Operation: CountRows

Step 3: Modify to list Months in Period

At this point I realized that I had a pretty serious miss in my logic. If I wanted to apply this as a proportion, I needed to also track the total amount of months in the period (so that I could allocate 5/6 to 2015 and 1/6 to 2016.)

To fix this, I added another level of grouping, but with a twist…

I removed the “Group By” column
I created an “Original” column, and set the operation to All Rows
I created a “Months_Total” column, set to SUM the Months_x_Year column
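The two levels of grouping in Steps 2 and 3 boil down to “count the months per year, then total them and repeat that total on each row”. Here is a rough Python sketch of that idea; the function name is hypothetical and the month-end list is hard coded to the example period.

```python
from collections import Counter
from datetime import date


def allocation_base(month_ends: list[date]) -> list[dict]:
    # First Group By: count month-end dates per year end (Months_x_Year)
    per_year = Counter(date(d.year, 12, 31) for d in month_ends)
    # Second grouping: total months in the whole period (Months_Total)
    months_total = sum(per_year.values())
    # Expanding the "All Rows" column repeats Months_Total on every row
    return [
        {"EndOfYear": year_end, "Months_x_Year": count, "Months_Total": months_total}
        for year_end, count in sorted(per_year.items())
    ]


month_ends = [date(2015, 8, 31), date(2015, 9, 30), date(2015, 10, 31),
              date(2015, 11, 30), date(2015, 12, 31), date(2016, 1, 31)]
for row in allocation_base(month_ends):
    print(row)
```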

Here’s the configuration:

And the result:

This is pretty slick, as the grouping returned the total count of months, but also returned the original table. Of course, when you expand the table using the double headed arrow to the top right of the Original column, the Months_Total value is repeated on each row that gets added:

Step 4: Turn the routine into a function

The next step was to go back into the Advanced Editor, and turn this into a function. That’s actually not hard at all, requiring only three lines to be modified. The first 4 lines of the function are shown here:

(startdate as date, enddate as date) as table =>
let
//startdate=#date(2015,8,1),
//enddate=#date(2016,1,31),

As you can see, I basically added the parameter line at the beginning (using the same variable names for startdate and enddate), then commented out the lines I initially used in order to populate the data I used to build my test case.

Finally, I renamed the function to fnGetAllocationBase, and saved it.

Step 5: Using the function

To use the function, we basically now just load the original table, then feed the start/end dates in to it. Here’s how I went through that process:

Select the table –> Power Query –> From Table
Select the First Month and Last Month columns –> Change Type –> Date
Add Column –> Add Custom Column
- Formula: =fnGetAllocationBase([First Month],[Last Month])

I now had a new column containing the tables I needed with my allocation basis:

As I didn’t need month granularity for my model, (we’re budgeting on an annual basis,) I’m now able to:

Remove the First Month and Last Month columns
Expand the columns from the Custom column
Add a new custom column with the following details:
- Name: Units
- Formula: =[Units To Allocate]*[Months_x_Year]/[Months_Total]
Remove the Units to Allocate, Months_x_Year and Months_Total columns
Set my data types
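To sanity check the Units formula, here is a rough Python sketch of the whole allocation. It is not the Power Query function: as a shortcut, it walks month by month rather than building a day-by-day list (the counts come out the same), `fn_get_allocation_base` and `allocate_units` are hypothetical Python counterparts, and the 60 units are made up for the example.

```python
from collections import Counter
from datetime import date


def fn_get_allocation_base(startdate: date, enddate: date) -> list[tuple[int, int, int]]:
    """Return (year, Months_x_Year, Months_Total) for each year the period touches."""
    per_year = Counter()
    year, month = startdate.year, startdate.month
    # Walk month by month instead of day by day (a shortcut, same counts)
    while (year, month) <= (enddate.year, enddate.month):
        per_year[year] += 1
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    months_total = sum(per_year.values())
    return [(y, n, months_total) for y, n in sorted(per_year.items())]


def allocate_units(units_to_allocate: float, startdate: date, enddate: date) -> dict[int, float]:
    # Units = [Units To Allocate] * [Months_x_Year] / [Months_Total]
    return {
        y: units_to_allocate * months_x_year / months_total
        for y, months_x_year, months_total in fn_get_allocation_base(startdate, enddate)
    }


# Hypothetical: 60 units spread over Aug 2015 through Jan 2016
print(allocate_units(60, date(2015, 8, 1), date(2016, 1, 31)))  # {2015: 50.0, 2016: 10.0}
```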

And the end result is a nice table that will serve my sales model nicely:

The post Allocate Units Based on Dates Using Power Query appeared first on The Ken Puls (Excelguru) Blog.

The other day as I was working through a model, I once again tripped upon the fact that Power Query’s Text.Trim function doesn’t clean whitespace inside the text string, only at the ends. For those who are used to Excel’s TRIM function, this is a frustrating inconsistency.

Just to circle on it, here’s the difference:

Source	Function	Result
Excel	=TRIM("  trim   me  ")	"trim me"
Power Query	=Text.Trim("  trim   me  ")	"trim   me"

Typically, I’ve just gone through the cycle of replacing a double space with a single space a few times on the same column to deal with this issue. The problem, of course, is that you need to do this twice if there are 4 spaces, and the more spaces there are, the more times you have to repeat it. Doesn’t seem like a really robust solution.
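Here is a quick Python sketch of why that approach doesn’t scale: each pass only halves a run of spaces, so the number of passes grows with the longest run. The helper name is hypothetical, and Python’s `str.replace` is used only to illustrate the repeated Replace Values steps.

```python
def collapse_by_repeated_replace(text: str) -> tuple[str, int]:
    """Replace double spaces with single spaces until none remain; count the passes."""
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")
        passes += 1
    return text, passes


print(collapse_by_repeated_replace("trim" + " " * 4 + "me"))  # ('trim me', 2)
print(collapse_by_repeated_replace("trim" + " " * 8 + "me"))  # ('trim me', 3)
```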

At any rate, this time I emailed one of my friends on the Power Query team and suggested that they should implement a function to make this a bit easier.

My Suggestion for a Clean Whitespace Function

The gist of my suggestion was to create a new function that would not only trim the whitespace internally, but would also allow you to specify which character you want to clear out. This way it would work nicely to clean whitespace in the shape of spaces (the usual culprit in my world), but would also allow you to substitute in other characters if needed. (Maybe you need to replace all instances of repeating 0’s with a single 0.)

It got referred to another friend on the team, (who wishes to remain nameless,) and he responded with some pretty cool code. I’ve taken that code, broken it down and modified it a bit, and the end result is a slightly different version that can work the same as Excel’s TRIM() function, but adds an optional parameter to make it even more robust. For lack of a better name, I’m going to call it “PowerTrim”. (Just trying to do my part to keep the Power in Power Query!)

Here’s the function:

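(text as text, optional char_to_trim as text) =>
let
// Use a space when no character to trim is supplied
char = if char_to_trim = null then " " else char_to_trim,
// Break the text apart at every occurrence of that character
split = Text.Split(text, char),
// Runs of the character (and leading/trailing copies) leave empty items; drop them
removeblanks = List.Select(split, each _ <> ""),
// Stitch the remaining pieces back together with a single copy of the character
result=Text.Combine(removeblanks, char)
in
result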
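If you want to experiment with the split/filter/combine idea outside of Power Query, here is a rough Python translation. It is a sketch for illustration only: `power_trim` mirrors the M function above step by step, but Python’s `str.split` raises an error on an empty separator, so don’t pass `""`.

```python
from typing import Optional


def power_trim(text: str, char_to_trim: Optional[str] = None) -> str:
    # Default to a space, like Excel's TRIM
    char = " " if char_to_trim is None else char_to_trim
    # Split on the character: runs of it (and leading/trailing copies) produce empty items
    split = text.split(char)
    # Drop the empty items (List.Select(split, each _ <> ""))
    removeblanks = [piece for piece in split if piece != ""]
    # Rejoin with a single copy of the character (Text.Combine)
    return char.join(removeblanks)


print(repr(power_trim("  trim   me  ")))    # 'trim me'
print(repr(power_trim("bookkeeper", "e")))  # 'bookkeper'
print(repr(power_trim("100200", "0")))      # '102'
```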

And to implement it, you’d take the following steps:

Copy the code above
Create a new query –> From Other Sources –> Blank Query
Change the query name to PowerTrim
Go into the Advanced Editor
Select all the text and replace it with the code above –> Done

Like this:

How it Works

We’d call this from a custom column, feeding in a column of text, and specifying the character (or even string of characters) we’d like to trim. The function then works through the following process:

It checks to see if the char_to_trim was provided, and uses a space if not
It splits the text by that character, resulting in a list:

(This list shows the word “bookkeeper” split by “e”)

It then:

Filters out any blank rows
Combines the remaining items using the original character to split by

(The original version was actually all rolled up in one line, but I find it easier to debug, step through, examine and play with when it’s separated.)

Demo

Here are some examples of the function in action. I started with a raw table from Excel. (Create a new query –> From Table)

And added a Custom column by going to Add Column –> Add Custom Column

Name: Trim_null
Formula: =PowerTrim([Text])

Notice that in the first row it trimmed the leading, trailing and internal spaces. Just like Excel! (Remember that Power Query’s default Text.Trim() function would only have removed the leading and trailing spaces, leaving the internal ones in place.)

Now, let’s add another and try with an alternate character… like 0. Again, we go to Add Column –> Add Custom Column:

Name: Trim_0
Formula: =PowerTrim([Text],"0")

In this case the extraneous zeroes are trimmed out of row 3, leaving only a single one. Cool stuff. Now what about the “e”? Let’s see how that one goes.

Once more to Add Column –> Add Custom Column:

Name: Trim_e
Formula: =PowerTrim([Text],"e")

The first time I looked at this, I thought there was an issue with the function. But then I remembered in this case we are removing all leading and trailing e’s, as well as replacing any duplicate e’s with a single e. You can see that this is indeed what happened in both rows 2 and 4.

Final Thoughts

I wish there was a way to get this to easily roll into the Text functions category, so that I could call it something like Text.PowerTrim() or even replace the Text.Trim() function with my own. Unfortunately a query name can’t contain the period character, which kind of sucks. I guess it’s to protect you from accidentally overwriting a function, but I’d like the ability to do it intentionally.

The post Clean WhiteSpace in PowerQuery appeared first on The Ken Puls (Excelguru) Blog.

Some more savvy Excel users know that you can break text onto multiple lines in a cell by pressing Alt+Enter mid entry. Today’s post explores how we can split by line breaks in order to break these types of cell contents into multiple columns.

Set up the data

To start with, let’s set up some simple data:

In cell A2, type “Text” and press Enter
In cell A3 type “This” –> Alt + Enter –> “is” –> Alt + Enter –> “text” –> Enter

The result should look like this:

And now we’ll go and pull it in to Power Query:

Select the data –> create new query –> From Table

Split by Line Breaks

At this point, you’d certainly be forgiven for thinking that only the first line was pulled in. But if you select the cell, you’ll see in the preview window that all the data is there:

So let’s try and split it up.

Right click the Text column –> Split Column –> By Delimiter

Unfortunately, there is no line break or carriage return option in the dialog, which means that you’ll need to pick “Custom”, and enter the special character for a Line Feed:

Even worse, once you enter this, Power Query is overly aggressive when you click OK. It assumes that you meant the literal characters, so it escapes them, and appends some commands that actually mess you up:

Notice how we have two columns with nothing in the second. What gives there?

To correct this code, we need to modify the formula in the formula bar to do two things:

Undo the escaping that Power Query did on our #(lf) entry, and
Remove the code that tells it which columns to create

So first, we need to replace:

"#(#)(lf)"

with

"#(lf)"

And second, we need to remove this completely:

,{"Text.1", "Text.2"}

And the results are much better:

The Good/Bad News

The bad news is that currently it’s a bit painful to do this. The good news is that it can be done, and the better news is that Power Query is constantly being updated. I’m sure it won’t be long before they give us an easier to use/more discoverable mechanism to make this work.

Other Special Characters

Should you need them, here are three special characters that you can refer to in Power Query:

Line feed: #(lf)
Carriage return: #(cr)
Tab: #(tab)
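To see why the escaped version left the second column empty, here is a rough Python model of the split. It rests on two assumptions: that "#(#)(lf)" is M’s way of writing the literal text #(lf) (so the delimiter never appears in the cell), and that a fixed list of column names pads missing pieces with null and drops extras. `split_column` is a hypothetical helper, not a Power Query function.

```python
from typing import Optional

LINE_FEED = "\n"   # what Alt+Enter puts in the cell; #(lf) in M
ESCAPED = "#(lf)"  # the literal text that "#(#)(lf)" stands for in M


def split_column(value: str, delimiter: str,
                 column_count: Optional[int] = None) -> list[Optional[str]]:
    parts = value.split(delimiter)
    if column_count is None:
        return parts  # as many columns as there are pieces
    # Assumed behaviour for a fixed column list: pad with None, drop extras
    return (parts + [None] * column_count)[:column_count]


cell = "This\nis\ntext"
print(split_column(cell, ESCAPED, 2))  # ['This\nis\ntext', None]
print(split_column(cell, LINE_FEED))   # ['This', 'is', 'text']
```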

The post Split by line breaks in Power Query appeared first on The Ken Puls (Excelguru) Blog.

I’m pleased to let people know that breaking Power Query via Power Pivot is a thing of the past … at least for users of Excel 2013 or higher. (Sorry, if you’re on 2010, you still need to be careful.)

The information has been around for a bit, and it’s one of the topics we cover in our http://powerquery.training/course as well: how to break your Power Query by doing one of the following actions in Power Pivot:

Renaming a table
Renaming a column sourced from Power Query
Deleting a column sourced from Power Query

Any of these three actions would set your query into an un-editable state, but worse, nothing would appear to happen. The query would refresh as normal, until you eventually tried to change it. At that point all hell would break loose and your only option was to rebuild your query (and related data model table) from scratch.

This has been covered in detail in the following sources:

Matt Allington's Blog
M is for (Data) Monkey e-Book (and yes, you can download a copy NOW!)

But now, breaking Power Query via Power Pivot is a thing of the past…

This issue was fixed in Excel 2016, but it left many of us hanging with an older version that still exhibited the problems. If you’re on 2013, however, that problem has now been fixed. I share the links at the bottom of the post to make sure you’re updated, but first I’ll demonstrate that the fix is really working.

To set the stage, I created a simple Calendar table in Power Query, and loaded it to the Data Model.

Corruption Method #1: Deleting Columns

My first test was to attempt to delete the Year column in Power Pivot. At first it looks like nothing has really changed:

But when I click Yes, Power Pivot comes back with a message to let me know that I can’t do it after all:

Hooray! This is fantastic news, as it means that I can’t actually destroy my entire data model. Beautiful!

Corruption Method #2: Renaming Columns

Next I tried to rename the Year column to myYear.

Nope. Can’t break the model that way either.

Corruption Method #3: Renaming the Table

Finally, I tried to rename the table from Calendar to myCalendar:

And it looks like we’re protected from shooting our model in the foot too.

My thoughts on the fix

I’m 99% happy with this fix. It protects us from accidentally blowing up our data models, which is super important. Especially because it was possible to break the model and still run for months without ever realizing it. That just shouldn’t be allowed to happen. So why am I not 100% happy?

Well, the first part is that Excel 2010 users are still susceptible to the issue. That’s a challenge, although to be fair Microsoft has been pretty forthcoming that the Load to Data Model hack is not truly a supported method anyway. So really, there’s not much of a surprise there. I’m not holding any points back on this one.

The last part (the remaining 1% for me) is that the fix, as implemented, means that you cannot ever rename a table in Power Pivot that was sourced from Power Query. In fact, even if you go back to Power Query and rename the table there, it still shows under the original name in Power Pivot. Granted it’s not a total show stopper, but you do want to give some thought to your query naming before you push it into the data model that very first time.

How can you ensure you have the fix?

If you’re running automatic updates for Office 2013, you should already have the fix in place. But if you want to check (or you don’t), then here’s the deal:

The full support KB article on the subject can be found here.

It will direct you to install the following updates:

KB3039800: update for Office 2013 – October 13, 2015
KB3039739: update for Office 2013 – September 8, 2015
KB3085502: MS15-099 security update for Excel 2013 – September 8, 2015

(There are 32-bit and 64-bit versions of each, so make sure you pick up the right version.)

For reference, I just tried to install them, without checking if they’d been installed first. Fortunately it does a check first, so for me each of them came back with a message like this:

So there you go. Great news for users of Power Query and Power Pivot 2013 and higher. You can now model with the confidence that you won’t accidentally blow up your solution!

The post Breaking Power Query via Power Pivot is a thing of the past appeared first on The Ken Puls (Excelguru) Blog.

This past weekend I attended SQL Saturday in Portland, OR. While I was there, I attended Reza Rad’s session on Advanced Data Transformations with Power Query. During that session, Reza showed a cool trick to merge data based on two columns through the user interface… without concatenating the columns first.

The Issue

Assume for a second that we have data that looks like this:

There are two tables, and we want to join the account name to the transaction. The problem is that the unique key to join these two tables (which isn’t super obvious here) is a combination of the Acct and Dept fields. (Elsewhere in the data the same account exists in multiple departments.)

To get started, I created two connection only queries, one to each table.

Select a cell in the left table (Transactions) –> create a new query –> From Table –> Close & Load To… Connection only
Select a cell in the right table (COA) –> create a new query –> From Table –> Close & Load To… Connection only

My Original Approach

Now, with both of those created, I want to merge the data so I get the account name on each row of the Transactions table. So how…?

Originally I would have edited each query, selected the Acct and Dept columns, and merged the two columns together, probably separating them with a custom delimiter. (This can be done via the Merge command on the Transform or the Add Column tab.)

Essentially, by concatenating the columns, I end up with a single column that I can use to dictate the matches.
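The custom delimiter matters more than it might seem. Here is a tiny Python sketch showing how two different Acct/Dept pairs can produce the same key once glued together without one. The key values are hypothetical and `concat_key` is a made-up helper.

```python
def concat_key(acct: str, dept: str, delimiter: str = "") -> str:
    """Merge two key columns into one text key, optionally with a delimiter between them."""
    return f"{acct}{delimiter}{dept}"


# Hypothetical key values: different pairs, same text once glued together
print(concat_key("1", "23") == concat_key("12", "3"))            # True  (collision)
print(concat_key("1", "23", "|") == concat_key("12", "3", "|"))  # False
```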

Reza’s presentation showed that this isn’t actually necessary, and I don’t need to merge those columns at all…

Merge Data Based on Two Columns

So here’s how we can get those records from the COA Table into the Transactions table:

Right click the Transactions query in the Workbook Queries pane
Choose Merge
Select the COA query

The data now looks like this, asking for us to select the column(s) we wish to use for the merge:

So here’s the secret:

Under Transactions, click the Acct column
Hold down the CTRL key
Click the Dept column

And Power Query indicates the order of the columns you selected. It will essentially use this as a temporary concatenated value!

So now do the same to the COA table:

And then complete the merge. As you can see, you get a new column of data in your query:

Of course, we can expand NewColumn to get just the Name field, and everything is working perfectly!
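Conceptually, selecting Acct then Dept on both sides matches rows on the pair of values, in the order you clicked them, with no helper column required. Here is a rough Python sketch of that idea using tuple keys for a left outer merge. It is not how Power Query implements merges: the data is hypothetical, and the dictionary lookup assumes each Acct/Dept combination appears only once in the COA table.

```python
def merge_on_columns(left: list[dict], right: list[dict],
                     keys: list[str], take: str) -> list[dict]:
    """Left outer merge on several columns at once, returning one field from the match."""
    # Tuples keep the columns separate, so no concatenation or delimiter is needed.
    # Assumes the key combination is unique in the right table.
    lookup = {tuple(r[k] for k in keys): r for r in right}
    merged = []
    for row in left:
        match = lookup.get(tuple(row[k] for k in keys))
        merged.append({**row, take: match[take] if match else None})
    return merged


transactions = [  # hypothetical data
    {"Acct": "10045", "Dept": "150", "Amount": 100.0},
    {"Acct": "10045", "Dept": "250", "Amount": 75.0},
]
coa = [
    {"Acct": "10045", "Dept": "150", "Name": "Sales - Retail"},
    {"Acct": "10045", "Dept": "250", "Name": "Sales - Wholesale"},
]
for row in merge_on_columns(transactions, coa, ["Acct", "Dept"], "Name"):
    print(row)
```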

End Thoughts

This is pretty cool, although not super discoverable. The really nice piece here is that it can save you the work of creating extra columns if you only need them to merge your data.

I should also mention that Reza showed this trick in Power BI Desktop, not Excel. But because it’s Power Query dealing with the data in both, it works in both. How cool is that?

The post Merge Data Based on Two Columns appeared first on The Ken Puls (Excelguru) Blog.

I recently received a comment on one of my blog posts asking how to separate values and text, especially when there is no common delimiter such as a space separating them. This is a bit of an interesting one, as there is no obvious function to make this happen.

Background

The scenario the user has here is a list of values with their unit of measure, similar to this:

The issue here is that we don’t really have anything to easily split this up, as there isn’t really a good pattern. Sometimes there are spaces after the values, sometimes not. The letters change, and there is no consistency to how many characters the values represent. So how would you approach this?

You can download the sample workbook here.

My Route

I think that a solution for this type of problem is going to be specific to the data in use. Looking at the sample data, I figured that I can probably bank on all the numbers being at the beginning of the string, and that I probably won’t see something like square meters expressed as m2. Of course, if that assumption wasn’t correct, I’d have to come up with another method.

At any rate, the angle I took was to build a custom function to remove the leading numeric values. That should leave me with the text values, which I could then replace in the original string. Let’s take a look.

Removing Numbers

As we recommend in M is for Data Monkey, the way to build a custom function is to start with a regular query that lets you step through each piece you need to do.

So focussing on doing this through the user interface, here’s how I started this solution.

Create new Power Query –> From Other Sources –> Blank Query
In the formula bar, I typed in 1.07Kg (no quotes, just that text) and pressed Enter
I then right clicked the text in the Power Query window, and chose to convert it to a list

Of course, you can’t do a ton with Lists in the user interface, so I converted it to a table:

List Tools –> Transform –> To Table –> OK

To be fair, I could have started by creating a record or a list from scratch (as we show you how to do in M is for Data Monkey,) but I didn’t really need to here in order to get up and running quickly. Regardless, I’m now sitting in a nice place where I have the entire UI exposed to do what I need (which was my original goal.)

At this point, things become pretty easy:

Right click Column1 –> Replace Values –> Replace 0 with nothing
Repeat for 1 through 9 and the decimal character

This removed all numbers and decimals, leaving me with just text. But because I know some of the values had spaces in them as well, I should deal with that:

Right click Column1 –> Transform –> Trim
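The logic of those steps is simple enough to sketch in a few lines of Python: strip every digit and the decimal point, then trim. This is an illustration, not the query itself; `fx_remove_numbers` mirrors the function name used below, "12 lbs" is a made-up value, and like the query it removes digits anywhere in the string (so a unit such as m2 would lose its 2).

```python
def fx_remove_numbers(source: str) -> str:
    # Replace each digit and the decimal point with nothing (one Replace Values step each)
    for ch in "0123456789.":
        source = source.replace(ch, "")
    # Then trim leftover whitespace (Transform -> Trim)
    return source.strip()


print(fx_remove_numbers("1.07Kg"))  # Kg
print(fx_remove_numbers("12 lbs"))  # lbs
```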

The final thing I did was to drill into the data point there, as I don’t really want to return a table when I convert this into a function. To do that I needed to:

Click the fx on the left of the formula bar
Append the following to the text in the formula bar: [Column1]{0}

Notice that we now have just the data point, not the Column1 header.

Converting the Query to a Function

Now, we’ve got a neat little query that will let me take a data point, sanitize it, and turn it into a data point with no leading values. But how can I repurpose that to use it for every record? The answer is to turn this query into a custom function, as we describe in Chapter 22 of M is for Data Monkey. Here’s how we do it:

Go to View –> Advanced Editor
Right before the “let” line, add the following:

(Source) =>

Go and place two / characters in front of the current Source line in order to comment it out (otherwise it would overwrite the function input)

//Source = "1.07Kg",

Click Done
Rename the query to fxRemoveNumbers

That’s it. We’ve converted it to a function. Now you can go to Home –> Close & Load to save it and it’s ready for use. The interesting part here is that creating the logic is the hard part; converting it to a function is deadly easy.

Separate Values and Text

So now let’s use our new function to separate values and text. Here’s how I did this:

Select any cell in the table –> create a new query –> From Table
Go to Add Column –> Add Custom column
- New column name: Measure
- Column formula: fxRemoveNumbers([Quantity])

And we’ve got a nice new column with just the textual values.

Not bad. Now we just need to figure out a way to replace the matching text in the Quantity column with nothing… After checking MSDN’s Power Query formula guide, I found a formula called Text.Replace() that should do just that:

Go to Add Column –> Add Custom column
- New column name: Value
- Column formula: =Text.Replace([Quantity],[Measure],"")

To summarize here, we’re going to look at what is in the Quantity column and replace any instance of the text in the Measure column with the value between the two quote marks (i.e. nothing.) The results are shown below:

Now it’s just a simple matter of doing some cleanup:

Right click the Value column –> Change Type –> Decimal Number
Right click the Quantity column –> Remove
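Put together, the query built in this section should look roughly like the sketch below. The table name RawData and the step names are assumptions, since Power Query generates its own step names as you click through the UI. The query also relies on fxRemoveNumbers existing in the same workbook and on the Quantity column holding text values.

```m
let
    // Assumed source table name; adjust to match your workbook
    Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
    // Measure: the text left after fxRemoveNumbers strips digits and decimals and trims the result
    AddedMeasure = Table.AddColumn(Source, "Measure", each fxRemoveNumbers([Quantity])),
    // Value: the Quantity text with the Measure text replaced by nothing
    AddedValue = Table.AddColumn(AddedMeasure, "Value", each Text.Replace([Quantity], [Measure], "")),
    // Cleanup: convert Value to a decimal number and drop the original column
    ChangedType = Table.TransformColumnTypes(AddedValue, {{"Value", type number}}),
    RemovedQuantity = Table.RemoveColumns(ChangedType, {"Quantity"})
in
    RemovedQuantity
```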

And there you go. It’s finished. We simply need to go to Home –> Close & Load to commit it, and then refresh it any time we need it.

M is for Data Monkey

The book is now available and is packed with good information that will help you solve this issue as well as many others. Learn more about the book here.

The post Separate Values and Text in Power Query appeared first on The Ken Puls (Excelguru) Blog.

There will be a regular blog post coming later this week, but we wanted to just throw out a quick heads up that we are currently accepting registrations for the last PowerQuery.Training class of 2015.

Registrations are open now for the class which begins on November 24, 2015. This will be your last chance of 2015 to get an in-depth training class on the best damn tool to hit Excel in 20 years. (Sorry Power Pivot, but Power Query is going to reach more people overall.)

To find out why you need to take this amazing live online workshop, check out the details here.

To register, you can follow this link (and click the Register button on the bottom right of the page.)

And don’t forget that when you register you get a free digital copy of our amazing new M is for Data Monkey book too.

Hope to see you there!

The post Last PowerQuery.Training Class of 2015! appeared first on The Ken Puls (Excelguru) Blog.

My last blog post was interesting in that I got a few emails about it. Both Imke Feldman and Bill Szysz sent me better methods, and a blog commenter asked for a slightly different version. For this post, I’m going to adapt Imke’s technique to show how we can Keep Only Numbers from a string of text (removing all other characters.)

In this Post:

In this post, we are going to keep only numbers in our data set. Actually, we’ll also keep spaces and decimals the first time around, but we could easily modify the function to clear those too. So for our first go, we’ll convert the data in the left column below into the form shown in the right column:

Of course, I started by just pulling the data into Power Query via the From Table command.

How to Keep Only Numbers

Looking at this from a logic point of view, what we want to accomplish is to remove any character that is not a number. Ideally, we would like to use a function like this in a custom column in order to do so:

=Text.Remove(text as nullable text, removeChars as any)

The first parameter should be pretty easy: we could just feed in the [Quantity] column. But how would we provide all the characters to the last parameter?

Here’s the cool part… removeChars is an “any” data type… that means we’re not restricted to a single character; we can actually provide a list. So all we need to do is find a way to create a list of the characters to remove.
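A quick way to see this in action is to pass a hand-typed list of characters. The sample value below is illustrative only:

```m
let
    // removeChars accepts a list, so several characters can be stripped in one call
    Result = Text.Remove("1.07Kg", {"K", "g"})   // returns "1.07"
in
    Result
```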

This is where Imke’s email to me was really helpful. She had a step similar to the following in her code:

CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_))

So what does this do? It actually creates a list of non-contiguous numbers (33-45, 47, 58-126), then transforms each value in the list into its character equivalent. A partial set of the results is shown here:

For reference, character 32 is a space, 46 is a period, and 48-57 are the digits 0 through 9, which is why those codes are left out of the list above. These are facts that you can discover by changing the values inside the lists.
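If you'd rather not guess at the codes, a small helper query like the sketch below lists each code next to the character it produces. The range 32 to 126 covers the printable ASCII characters. The query only uses Table.FromList, Table.AddColumn and Character.FromNumber, and it is an illustration, not part of the original solution.

```m
let
    // Printable ASCII range: 32 (space) through 126 (~)
    Codes = {32..126},
    // Turn the list of numbers into a one-column table
    AsTable = Table.FromList(Codes, Splitter.SplitByNothing(), {"Code"}),
    // Show the character each code converts to
    WithCharacter = Table.AddColumn(AsTable, "Character", each Character.FromNumber([Code]), type text)
in
    WithCharacter
```

Scanning the Character column should confirm, for example, that codes 48 to 57 are the digits 0 through 9.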

In order to use this, I just popped into the Advanced Editor and pasted the line above right between the “let” and “Source=…” lines. (Don’t forget to add a comma at the end.) And with a nice list of values contained in the CharsToRemove step, we can now create the custom column:

Add Column –> Add Custom Column
- Name: Result
- Formula: =Text.Remove([Quantity],CharsToRemove)

And it loads up nicely:

Now, keep in mind here that the purpose of this is to strip all characters except the numbers. In the case of things like m2 and m3 in this data set, the 2 or 3 from the unit is left behind in the result, but that is exactly what the query is designed to do.

The final M code for this solution is:

let
    CharsToRemove = List.Transform({33..45,47,58..126}, each Character.FromNumber(_)),
    Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Quantity", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Remove([Quantity],CharsToRemove))
in
    #"Added Custom"
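For comparison, newer versions of Power Query also include Text.Select, which works the other way around: you list the characters to keep rather than the ones to remove. This is a different function from the Text.Remove approach used in this post, and it may not be available in older builds, so treat the sketch below as an alternative under that assumption. It keeps digits, spaces and periods, matching the first version above.

```m
let
    Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Quantity", type text}}),
    // Characters to keep: space (32), period (46) and the digits 0-9 (48-57)
    CharsToKeep = List.Transform({32, 46, 48..57}, each Character.FromNumber(_)),
    // Text.Select returns only the characters found in CharsToKeep
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Result", each Text.Select([Quantity], CharsToKeep))
in
    #"Added Custom"
```

One design difference is worth knowing. The Text.Remove version only strips codes 33 to 126, so a character outside that range (an accented letter, say) would survive it. Text.Select, by contrast, drops anything not explicitly listed.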

Keeping Only Numbers

What if we wanted to also remove any spaces and decimals? Easy enough, just add those values to the original list in the CharsToRemove step as follows:

CharsToRemove = List.Transform({32,46,33..45,47,58..126}, each Character.FromNumber(_))

And the result:

Removing Numbers Only

Now let’s keep the text and remove the numeric characters from 0-9 only. To do this we modify the original list values again:

CharsToRemove = List.Transform({48..57}, each Character.FromNumber(_))

And the result:

End Result

This is pretty neat. Once we recognize which character code represents each character, we can build a list of those characters to remove, and take care of them all in one shot. To put it all together, here is a look at the different views all shown in one table:

You can also download the completed file here.
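The combined table can be sketched as one query that adds a column for each variation. This is an illustration rather than the query in the downloadable file. The list and column names are hypothetical, and the RawData table name is carried over from the code above.

```m
let
    // The three lists of characters to remove used in this post
    RemoveAllButNumbersSpacesDecimals = List.Transform({33..45,47,58..126}, each Character.FromNumber(_)),
    RemoveAllButNumbers = List.Transform({32,46,33..45,47,58..126}, each Character.FromNumber(_)),
    RemoveDigits = List.Transform({48..57}, each Character.FromNumber(_)),
    Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Quantity", type text}}),
    // One column per variation, each removing a different set of characters
    AddedFirst = Table.AddColumn(#"Changed Type", "Numbers, Spaces & Decimals", each Text.Remove([Quantity], RemoveAllButNumbersSpacesDecimals)),
    AddedSecond = Table.AddColumn(AddedFirst, "Numbers Only", each Text.Remove([Quantity], RemoveAllButNumbers)),
    AddedThird = Table.AddColumn(AddedSecond, "Numbers Removed", each Text.Remove([Quantity], RemoveDigits))
in
    AddedThird
```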

The post Keep Only Numbers in Power Query appeared first on The Ken Puls (Excelguru) Blog.

In this week’s post we’re going to circle back to the original post on how to Separate Values and Text in Power Query from November 4, 2015. That post attracted a couple of suggestions from Imke Feldman and Bill Szysz with better methods.

The Issue at Hand

So why do we need to examine this again? Well, the reality is that the solution I built worked perfectly for the data set that I used. Bill, however, mocked up some different data which looked like this:

Now, my friend Scott would tell you that the user (I’ll paraphrase this) “should get a stern lesson on proper data entry practices”, but if the data is already in the system… it’s too late and we have to deal with it.

If you tried my method, you’ll find that it fails, as shown below:

Basically, any measure that has a number in it, or commas or spaces mid-number… they’re all killers to my routine. So Bill emailed me to show me how he would approach the situation.

Bill’s Method

I’ve broken Bill’s original submission back into a few more steps, and built it the way I think most users would approach it, as you’ll see. (Bill’s original submission was a bit more perfect, but I show how I would have arrived there trying to build what he ended up with.)

If you’d like to follow along, the source workbook can be downloaded here.

Step 1: Pull in the Data

Of course, to start with, we need the data…

Create a new query –> From Table
Right click the Quantity column –> Transform –> lowercase

This last step is actually quite important. The reason is that we now want to split the data apart at the first instance of a character between a and z. Since Power Query is case sensitive, forcing the text to lowercase means that we won’t miss splitting based on a character in the A to Z set. It also means that we give Power Query less processing to do, since it only has to look for 26 characters, not 52 (both lower and upper case.)

Step 2: Separate Values and Text

Now that we know what we want to do, let’s do it. Let’s split the text by the first alpha character:

Go to Add Column –> Add Custom Column
- New Column Name –> Value
- Custom Column Formula:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyz")

This formula is quite interesting, as it will split by any of the characters between the quotes. Since we forced the text to lowercase, it will catch any letter that was originally in either the a-z or A-Z range. But there is one small issue… it returns a list, not the text:

Since we’re only interested in the first item in this list at the moment (everything that precedes the first letter), we can modify the formula to drill in to just the first element. To do that:

Click the gear icon beside the Added Custom step in the Applied Steps window
Modify the formula to read as follows:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyz"){0}

Remembering that Power Query counts from a base of zero, and that the number between the curly braces allows us to drill into a specific item in the list, we now get back just the first element of each list (as text), as follows:
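To make the {0} behaviour concrete, here is what the formula does to a single value. The sample "1,200 ml" is hypothetical and is not taken from Bill's data set:

```m
let
    // Split at every letter; "m" and "l" are adjacent, so an empty string sits between them
    Parts = Text.SplitAny("1,200 ml", "abcdefghijklmnopqrstuvwxyz"),   // {"1,200 ", "", ""}
    // {0} drills into the first item: everything before the first letter
    FirstPart = Parts{0}                                                // "1,200 "
in
    FirstPart
```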

With this done, we can extract the remaining values from the right using some text functions. (You can learn more about these in my post on 5 Very Useful Text Formulas – Power Query edition, or by reading Chapter 17 of M is for Data Monkey.)

Go to Add Column –> Add Custom Column
- New Column Name –> Measure
- Custom Column Formula:

=Text.End(Text.From([Quantity]), Text.Length([Quantity])-Text.Length([Value]))
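The arithmetic is easier to follow with numbers. Take the same hypothetical "1,200 ml" value. The whole string is 8 characters long, and the Value column holds "1,200 " (6 characters, including the trailing space). Text.End therefore returns the last 8 - 6 = 2 characters.

```m
let
    Quantity = "1,200 ml",
    Value = "1,200 ",
    // 8 - 6 = 2, so keep the last two characters
    Measure = Text.End(Quantity, Text.Length(Quantity) - Text.Length(Value))   // "ml"
in
    Measure
```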

At this point, we can identify an issue in the way we stepped through the process. Can you see it?

In the original data set, the L (for litres) was capitalized. In our output, it’s not. If you don’t care about this, then skip step 3, but if you think this is important… we need to modify our steps a bit.

Step 3: Fix the Lower Case Steps

We caused the issue shown above by converting the Quantity column to lower case. Because that column sticks around, we really need it to retain its original format so that we can split the measure and retain the correct case for the characters. But ideally, we’d like to do this without having to expand the original formula to include both cases, like this:

=Text.SplitAny([Quantity],"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"){0}

So how? If we need it converted in order to split with a smaller list, what do we need to do?

The answer is to nest the lowercase step into our first Added Custom step. Let’s go modify the first Added Custom step:

Click the gear icon beside the Added Custom step
Modify the formula to read as follows:

=Text.SplitAny(Text.Lower([Quantity]),"abcdefghijklmnopqrstuvwxyz"){0}

Now, let’s remove the Lowercased Text step and see if it still works (don’t forget to select the Added Custom1 step after you remove the Lowercased Text step):

Note: If you’re expecting this from the beginning, there is obviously no need to convert to lowercase, cause the error, and then fix it. You could skip the pain and just wrap the column in a Text.Lower() function to begin with. The reason I showed this route is that I find I frequently iterate this way when building my own data cleanup scenarios.

Step 4: Final Cleanup

The only thing left to do is convert the Value column to numbers. You’ll get an error if you try it though, as there are still spaces in the middle of some numbers (the commas are okay, but the spaces aren’t.)

Right click the Value column –> Replace Values
- Value to Find –> a single space
- Replace With –> leave this area blank
Right click the Value column –> Change Type –> Decimal Number

And finally:

Rename your query
Go to Home –> Close & Load
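Pulling Steps 1 through 4 together, the finished query should look something like the sketch below. The table name RawData and the step names are assumptions. This is a reconstruction from the steps above, not Bill's original submission.

```m
let
    // Assumed source table name
    Source = Excel.CurrentWorkbook(){[Name="RawData"]}[Content],
    // Value: everything before the first letter (lowercased only inside the formula, so Quantity keeps its case)
    AddedValue = Table.AddColumn(Source, "Value", each Text.SplitAny(Text.Lower([Quantity]), "abcdefghijklmnopqrstuvwxyz"){0}),
    // Measure: the remaining characters on the right, in their original case
    AddedMeasure = Table.AddColumn(AddedValue, "Measure", each Text.End(Text.From([Quantity]), Text.Length([Quantity]) - Text.Length([Value]))),
    // Remove the spaces inside numbers so the type change succeeds
    ReplacedSpaces = Table.ReplaceValue(AddedMeasure, " ", "", Replacer.ReplaceText, {"Value"}),
    // Convert Value to a decimal number
    ChangedType = Table.TransformColumnTypes(ReplacedSpaces, {{"Value", type number}})
in
    ChangedType
```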

With the results working nicely:

The post Separate Values and Text – Part 2 appeared first on The Ken Puls (Excelguru) Blog.
