Quantcast
Channel: PowerBI Archives - The Excelguru Blog
Viewing all articles
Browse latest Browse all 178

Clean WhiteSpace in PowerQuery

$
0
0

The other day as I was working through a model, I once again tripped upon the fact that Power Query’s Text.Trim function doesn’t clean whitespace inside the text string, only at the ends.  For those who are used to Excel’s TRIM function, this is a frustrating inconsistency.

Just to circle on it, here’s the difference:

Source Function Result
Excel =TRIM(“  trim   me  “) “trim me”
Power Query =Text.Trim(“  trim   me  “) “trim   me“

Typically, I’ve just gone through the cycle of replacing a double space with a single space a few times on the same column to deal with this issue.  The issue, of course, is that you need to do this twice if there are 4 spaces, but add more spaces, and you have to do this more times.  Doesn’t seem like a really robust solution.

At any rate, this time I emailed one of my friends on the Power Query team and suggested that they should implement a function to make this a bit easier.

My Suggestion for a Clean Whitespace Function

The gist of my suggestion was to create a new function that would not only trim the whitespace internally, but would also allow you to specify which character you want to clear out.  This way it would work nicely to clean whitespace in the shape of spaces (the usual culprit in my world), but would also allow you to substitute in other characters if needed.  (Maybe you need to replace all instances of repeating 0’s with a single 0.)

It got referred to another friend on the team, (who wishes to remain nameless,) and he responded with some pretty cool code.  I’ve taken that code, broken it down and modified it a bit, and the end result is a slightly different version that can work the same as Excel’s TRIM() function, but adds an optional parameter to make it even more robust.  For lack of a better name, I’m going to call it “PowerTrim”.  (Just trying to do my part to keep the Power in Power Query!) 😉

Here’s the function:

(text as text, optional char_to_trim as text) =>
let
char = if char_to_trim = null then " " else char_to_trim,
split = Text.Split(text, char),
removeblanks = List.Select(split, each _ <> ""),
result=Text.Combine(removeblanks, char)
in
result

And to implement it, you’d take the following steps:

  • Copy the code above
  • Create a new query –> From Other Sources –> Blank Query
  • Change the query name to PowerTrim
  • Go into the Advanced Editor
  • Select all the text and replace it with the code above –> Done

Like this:

image

How it Works

We’d call this from a custom column, feeding in a column of text, and specifying the character (or even string of characters) we’d like to trim.  The function then works through the following process:

  • It checks to see if the char_to_trim was provided, and uses a space if not
  • It splits the text by that character, resulting in a list:

image

(This list shows the word “bookkeeper” split by “e”)

It then:

  • Filters out any blank rows
  • Combines the remaining items using the original character to split by

(The original version was actually all rolled up in one line, but I find it easier to debug, step through, examine and play with when it’s separated.)

Demo

Here’s some examples of the function in action. I started with a raw table from Excel.  (Create a new query –> From Table)

image

And added a Custom column by going to Add Column –> Add Custom Column

  • Name:  Trim_null
  • Formula:  =PowerTrim([Text])

image

Notice that in the first row it trimmed the leading, trailing and internal spaces.  Just like Excel!  (Remember that if you used Power Query’s default Text.Trim() function, you would return “trim   me”, not “trim me”.)

Now, let’s add another and try with an alternate character… like 0.  Again, we go to Add Column –> Add Custom Column:

  • Name:  Trim_0
  • Formula:  =PowerTrim([Text],”0”)

image

In this case the extraneous zeroes are trimmed out of row 3, leaving only a single one.  Cool stuff.  Now what about the “e”. Let’s see how that one goes.

Once more to Add Column –> Add Custom Column:

  • Name:  Trim_0
  • Formula:  =PowerTrim([Text],”e”)

image

The first time I looked at this, I thought there was an issue with the function.  But then I remembered in this case we are removing all leading and trailing e’s, as well as replacing any duplicate e’s with a single e.  You can see that this is indeed what happened in both rows 2 and 4.

Final Thoughts

I wish there was a way to get this to easily role into the Text functions category, so that I could call it something like Text.PowerTrim() or even replace the Text.Trim() function with my own.  Unfortunately a query name can’t contain the period character, which kind of sucks.  I guess it’s to to protect you from accidentally overwriting a function, but I’d like the ability to do it intentionally.

The post Clean WhiteSpace in PowerQuery appeared first on The Ken Puls (Excelguru) Blog.


Viewing all articles
Browse latest Browse all 178

Trending Articles