Channel: PowerBI Archives - The Excelguru Blog

Creating a VLOOKUP Function in Power Query

Tonight I decided to actually follow through on something I’d been musing about for a while:  building a full-fledged VLOOKUP function in Power Query.  Why?  Yeah… that’s probably a good question!

Replicating VLOOKUP’s exact match is REALLY easy in Power Query.  You simply take two tables and merge them together.  But the approximate match is a little harder to do, since you don’t have matching records on each side to merge together.
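To make the exact match case concrete, here is a minimal sketch of that merge in M.  The two inline tables and their column names (Values, Key, Result) are made up for illustration and are not the sample file's data.

```powerquery
let
    // Hypothetical sample tables; names and values are illustrative only
    Data = #table({"Values"}, {{1}, {3}, {5}}),
    Lookup = #table({"Key", "Result"}, {{1, "A"}, {5, "E"}}),
    // A left outer join keeps every Data row and matches only on equal keys
    Merged = Table.NestedJoin(Data, {"Values"}, Lookup, {"Key"}, "Match", JoinKind.LeftOuter),
    // Expand the nested table to pull back the column we want returned
    Expanded = Table.ExpandTableColumn(Merged, "Match", {"Result"})
in
    Expanded
```

Rows with no equal key (the 3 here) come back with null in Result, which is the merge's stand-in for #N/A.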

Now, to be fair, you could go the route of building a kind of case statement, as Chris Webb has done here.  In actual fact, you probably should do that if you want something that is lean and mean, and the logic won’t change.  But what if you wanted to maintain a table in Excel that holds your lookup values, making it easy to update? Shouldn’t we be able to take that and use it just like a VLOOKUP with an approximate match?  I don’t see why not.  So here’s my take on it.

Practical Use Cases

I see this as having some helpful use cases.  They’ll mostly come from Excel users who are experienced with VLOOKUP, maintain lookup tables, and reach back to that familiarity.  And they would probably be tempted to do something like this:

image

The concern, of course, is that landing data in the worksheet during this cycle contributes to file size, memory usage and ultimately re-calc speed, so if you can avoid this step on the way to getting it into Power Pivot, you plainly want to do that.

The cool thing is that by building this the way I’ve done it, you’re not restricted to landing your data in the worksheet to use VLOOKUP with it.  You can pull data into Power Query from any source (csv, text file, database, web page) and perform your VLOOKUP against your Excel table without that worksheet round trip.

Let’s Take a Look…

Now, I AM going to use Excel based data for this, only because I have a specific scenario to demonstrate.  You can download a sample file – containing just the data – from this link.  (The completed file is also available at the end of the post.)

So, we have a series of numbers, and want to look them up in this table:

image

I really used my imagination for this one and called it “LookupTable”.  Remember that, as we need that name later.  Note also that the first record is 1, not 0.  This was done to demonstrate that an approximate match can return a #N/A value, as you’ll see in a minute.

Now here’s what things would look like using standard Excel VLOOKUP formulas against that table:

image

Hopefully this makes sense.  The formulas in columns 2, 3 and 4 are:

  • =VLOOKUP([@Values],LookupTable,2,TRUE)
  • =VLOOKUP([@Values],LookupTable,3)
  • =VLOOKUP([@Values],LookupTable,2,FALSE)

Just to recap the high points here… column 2 declares the final parameter as ,TRUE which will give us an approximate match.  Column 3 doesn’t declare the final parameter, which will default to ,TRUE and give us an approximate match.  Column 4 declares the final parameter as ,FALSE which means we’ll want an exact match.  The end result in that last column is that only one value matches exactly, which is why we get all those #N/A results.

Standard VLOOKUP stuff so far, right?

Creating the VLOOKUP function in Power Query

Before we get to using the function, we need to create it.  To do that we’re going to go to:

  • Power Query –> From Other Sources –> Blank Query
  • View –> Advanced Editor

Highlight all the code in that window and replace it with this… (yes, it’s not short)

let pqVLOOKUP = (lookup_value as any, table_array as table, col_index_number as number, optional approximate_match as logical) as any =>
    let
        /* Provide optional match if user didn't */
        matchtype =
            if approximate_match = null
            then true
            else approximate_match,

        /* Get names of the lookup (first) column and the return column */
        Cols = Table.ColumnNames(table_array),
        ColName_match = Cols{0},
        ColName_return = Cols{col_index_number - 1},

        /* Find closest match: sort descending, keep rows <= lookup_value.
           The lookup column is referenced by name instead of being renamed,
           so col_index_number = 1 and tables that already contain a column
           called "Lookup" both work */
        SortData = Table.Sort(table_array, {{ColName_match, Order.Descending}}),
        RemoveExcess = Table.SelectRows(SortData, each Record.Field(_, ColName_match) <= lookup_value),
        ClosestMatch =
            if Table.IsEmpty(RemoveExcess)
            then "#N/A"
            else Record.Field(RemoveExcess{0}, ColName_match),

        /* What should be returned in case of approximate match? */
        ClosestReturn =
            if Table.IsEmpty(RemoveExcess)
            then "#N/A"
            else Record.Field(RemoveExcess{0}, ColName_return),

        /* Modify result if we need an exact match */
        Return =
            if matchtype
            then ClosestReturn
            else
                if lookup_value = ClosestMatch
                then ClosestReturn
                else "#N/A"
    in Return
in pqVLOOKUP

Now:

All right… the function is there.  Now let’s go make use of it… (we’ll come back to how it works in a bit.)
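If you’d like to sanity-check the function before wiring it up to real data, a quick test in a separate blank query might look like the sketch below.  It assumes the query holding the function has been named pqVLOOKUP (the name used in the formulas later on), and TestTable is a made-up inline table, not the sample file’s LookupTable.

```powerquery
let
    // Hypothetical lookup table: the first column holds the keys
    TestTable = #table(
        {"Key", "Letter", "Word"},
        {{1, "A", "One"}, {3, "C", "Three"}, {5, "E", "Five"}}
    ),
    Approx = pqVLOOKUP(4, TestTable, 2, true),    // closest key <= 4 is 3, so "C"
    Default = pqVLOOKUP(4, TestTable, 3),         // match type omitted, so "Three"
    Exact = pqVLOOKUP(4, TestTable, 2, false),    // no key equals 4, so "#N/A"
    BelowRange = pqVLOOKUP(0, TestTable, 2)       // every key is above 0, so "#N/A"
in
    [Approx = Approx, Default = Default, Exact = Exact, BelowRange = BelowRange]
```

Returning a record just makes all four results visible in one preview.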

Using the VLOOKUP function in Power Query

Now, before we go any further, I want to ask you a favour.  I need you to pretend for a second.  Pretend that the data we are connecting to next is a database, not an Excel table.  You’ll see how this can be useful if you’ll play along here.  (The only reason I’m using an Excel table for my source data is that it’s easier to share than a database.)

Let’s go click in the DataTable table.  (This one:)

image

Now, let’s upload this “database” into Power Query…

  • Go to Power Query –> From Table

You should have something like this now:

image

Funny how Power Query reads the #N/A values as errors, but whatever.  Let’s get rid of those columns so that we’re left with just the Values column.

  • Right click Values –> Remove Other Columns

Now, we’re going to make a really small M code edit.

  • Go to View –> Advanced Editor
  • Copy the second line (starts with Source =…)
  • Paste it immediately above the line you just copied
  • Modify it to read as follows:
    • Source –> LookupSource
    • DataTable –> LookupTable

Your M code should now look as follows:

image
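In case the screenshot doesn’t come through, the edited query would read roughly like this.  Treat it as a sketch: the name of the last step is an assumption, since Power Query generates its own step names and they can differ between versions.

```powerquery
let
    // Added line: a second source step that points at the lookup table
    LookupSource = Excel.CurrentWorkbook(){[Name="LookupTable"]}[Content],
    // Original line generated by From Table
    Source = Excel.CurrentWorkbook(){[Name="DataTable"]}[Content],
    // Step name is assumed; yours will be whatever Power Query generated
    RemovedOtherColumns = Table.SelectColumns(Source, {"Values"})
in
    RemovedOtherColumns
```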

  • Click Done

Nothing really appears to look different right now, but you’ll notice that you have an extra step called “LookupSource” on the right.  If you switch back and forth between that and Source, you’ll see we are looking at the original DataTable and the LookupTable.  The reason we do this is to make the next step really easy.

  • Go to Add Column –> Add Custom Column
  • Call the column 2 True
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,true)

Okay, so what’s what?

  • pqVLOOKUP is the name of our function we added above
  • [Values] is the value we want to look up
  • LookupSource is the table we want to look in to find our result
  • 2 is the column we want to return
  • true is defining that we want an approximate match
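Behind the scenes, the Add Custom Column dialog writes a Table.AddColumn step for you.  A hedged sketch of what the full query might look like at this point is below; the step names RemovedOtherColumns and Added2True are assumptions, and pqVLOOKUP is assumed to be saved as a separate query in the same workbook.

```powerquery
let
    LookupSource = Excel.CurrentWorkbook(){[Name="LookupTable"]}[Content],
    Source = Excel.CurrentWorkbook(){[Name="DataTable"]}[Content],
    RemovedOtherColumns = Table.SelectColumns(Source, {"Values"}),
    // "each" runs the function once per row; [Values] is that row's value
    Added2True = Table.AddColumn(
        RemovedOtherColumns,
        "2 True",
        each pqVLOOKUP([Values], LookupSource, 2, true)
    )
in
    Added2True
```

Because LookupSource is just another step in the same query, the custom column formula can refer to it by name.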

And, as you can see when you click OK, it works!

image

Let’s do the next two columns:

  • Go to Add Column –> Add Custom Column
  • Call the column 3 default
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,3)

So this time we asked for a return from the 3rd column, and we omitted the final parameter.  Notice that it defaulted to true for us:

image

Last one…

  • Go to Add Column –> Add Custom Column
  • Call the column 2 false
  • Enter the following formula:
    • pqVLOOKUP([Values],LookupSource,2,false)

And how about that, all but one comes back with #N/A:

image

And with that you can load this into a table in the worksheet:

image

Notice that the results are identical to those of the original Excel table, with one exception… the #N/A I have provided is text, not an equivalent to the =NA() function.
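If the text placeholder gets in your way downstream, one option is to swap it for null once the lookup columns exist.  This is only a sketch under assumptions: Output is a made-up inline table standing in for the finished query, and its values are illustrative.

```powerquery
let
    // Hypothetical stand-in for the finished query's output
    Output = #table({"Values", "2 false"}, {{1, "A"}, {2, "#N/A"}}),
    // Replace the "#N/A" text with null in the lookup column only
    Cleaned = Table.ReplaceValue(Output, "#N/A", null, Replacer.ReplaceValue, {"2 false"})
in
    Cleaned
```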

The completed file is available here.

How Does the VLOOKUP Function in Power Query Actually Work?

This VLOOKUP actually has some advantages over the VLOOKUP we all know and love.  The most important is that we don’t need to worry if the list is sorted or not, as the function takes care of it for you.  It essentially works like this:

  • Pull in the data table
  • Sort it descending by the first column
  • Remove all records greater than the value being searched for
  • Return the value in the requested column for the first remaining record UNLESS we asked for an Exact match
  • If we asked for an Exact match then it tests to see if the return is a match and returns #N/A if it’s not
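Those steps can be traced on a tiny example.  The sketch below is a stripped-down walk-through rather than the function itself, and its table, column names and search value are all made up for illustration.

```powerquery
let
    // Hypothetical unsorted lookup table and search value
    LookupTable = #table({"Key", "Letter"}, {{5, "E"}, {1, "A"}, {3, "C"}}),
    SearchFor = 4,
    // 1. Sort descending by the first column: keys 5, 3, 1
    Sorted = Table.Sort(LookupTable, {{"Key", Order.Descending}}),
    // 2. Remove keys greater than the search value: keys 3, 1 remain
    Kept = Table.SelectRows(Sorted, each [Key] <= SearchFor),
    // 3. The first remaining row is the closest match (nothing left means #N/A)
    Approximate = if Table.IsEmpty(Kept) then "#N/A" else Kept{0}[Letter],
    // 4. An exact match also requires that row's key to equal the search value
    Exact = if Table.IsEmpty(Kept) or Kept{0}[Key] <> SearchFor then "#N/A" else Kept{0}[Letter]
in
    [Approximate = Approximate, Exact = Exact]
```

Here Approximate comes back as "C" and Exact as "#N/A", because the closest key at or below 4 is 3.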

Some key design principles I used here:

  • The parameters are all in EXACTLY the same order as Excel’s VLOOKUP
  • The required, optional and default parameters match what you already know and use in Excel
  • The function is dynamic in that it will work no matter what your lookup table column names are, or how many rows or columns it has
  • It returns results that are in parallel with Excel’s output
  • The function is pretty much a drag’n’drop for your project.  The only thing you need to remember is to define the lookup table in the first part of your query

So how cool is that?  You love VLOOKUP, and you can now use it in Power Query to perform VLOOKUPs from your Power Query sourced database queries against tables of Excel data without hitting the worksheet first!  (In fact, if your database has a table suitable for approximate lookups, you could VLOOKUP from database table against database table!)

The post Creating a VLOOKUP Function in Power Query appeared first on The Ken Puls (Excelguru) Blog.

