This post solves a tricky issue of removing offset duplicates or, in other words, removing items from a list that exist not only in a different column, but also on different rows.

Problem History

This data format is based on a real life example that my brother in law sent me. He is a partner in a public practice accounting firm*, and has software to track all his clients. As he’s getting prepared for tax season he wants to get in contact with all of his clients, but his tax software dumps out lists in a format like this:

As you can see, the clients are matched to their spouses, but each client (spouse or not) has their own row in the data too. While this is great to build a list of unique clients, we only want to send one letter to each household.

The challenges we have to deal with here is to create a list of unique client households by removing the spouse (whomever shows up second) from the list. The things we need to be careful of:

Not accidentally removing too many people based on last name
Getting the duplicate removal correct even if the spouse has a different last name

You can download a file with the data and solution here if you'd like to follow along.

The solution

Alright, so how do we deal with this then? Well, the first thing, naturally, is to pull the data into Power Query:

Click in the table –> create a new query –> From Table

This will launch us in to the Power Query editor where we can start to make some magic happen.

The first thing we need to do is to give each line in our client file a unique client ID number. To do that:

Go to Add Column –> Add Index Column
Right click the Index column –> Rename –> ClientID

Which creates a nice numbered list for us:

So basically, what we have here now is a client ID for each "Client" (not spouse) in our list.

Figuring out the Spouse's ClientID

The next step to this problem is to work out the Spouse's client ID for each row as well. To do that we're going to employ a little trick I've actually been dying to need to use. Winking smile

See, ever since I've started teaching Power Query to people, I've mentioned that when you go to append or merge tables, you have to option to use merge the table you're working on against itself. As I've said for ages "I don't know when I'll need to use this, but one day I will, and it's comforting to know that I can." Well… that day is finally here!

Go to Home –> Merge Queries
From the drop down list, pick to merge the query to itself

Now comes the tricky part… we want to merge the Client with the Spouse, so that we can get the ClientID number that is applicable to the entries in the Spouse columns. So:

In the top table, select Client FirstName –> hold down CTRL –> select Client LastName
In the bottom table, select Spouse FirstName –> hold down CTRL –> select Spouse LastName

The result should look like this:

Once you have that set up correctly, follow these steps to merge and extract the necessary data:

Click OK

The results look like this:

Before you go further, have a look at the order of the ClientID records. Nothing special, they are in numerical order… remember that…

Now, let's extract the key components from that column of tables (i.e. the ClientID for the Spouse):

Click the Expand arrow to the top right of the newly created NewColumn
Uncheck all the items in the filter except the ClientID column
Uncheck the default prefix option at the bottom
Click OK
Right click the new ClientID.1 column –> Rename –> SpouseID

And the results look like this:

Looks good, and if you check the numbers, you'll see that our new column has essentially looked up the spouse's name and pulled the correct value from the ClientID column. (Zoe Ng has a client ID of 2. Zoe is also Tony Fredrickson's spouse – as we can see on row 4 – and the Spouse ID points back to Zoe's value of 2.

Remember how I mentioend to pay attention to the order of the records in the previous step? Have a look at the ClientID column now. I have NO IDEA why this changed, but it happend as soon as we expanded the merged column. I'm sure there must be some logic to it, but it escapes me. If you know, please share in the comments. It doesn't affect anything – we could sort it back into ClientID order easily - it's just odd.

At any rate, we can now fully solve the issue!

Removing Offset Duplicates

So we have finally arrived at the magic moment where we can finish this off. How? With the use of a custom column:

Go to Add Column –> Add Custom Column
Provide a name of "Keep?"
Enter the following formula:
- if [ClientID]<[SpouseID] then "Keep" else "Remove"
Click OK

And here is what you'll end up with:

That's right! A nice column you can filter on.

The trick here is that we are using the first person in the list as the primary client, and the spouse as the secondary, since the list is numbered from top to bottom. Since we've looked up the spouses ID number, we can then use some very simple math to check if the ClientID number is less than the Spouse's ClientID. If it is we have the primary client, if not, we have the spouse.

So let's filter this down now:

Filter the Keep? column and uncheck the Remove item in the filter
Select the ClientID, SpouseID and Keep? columns –> right click –> remove

And finally we can go to Home –> Close & Load

And there you are… a nice list created by removing offset duplicates to leave us with a list of unqiue households:

*Speaking of accountants

Just a quick note to say that even though I'm an accountant, my brother in law Jason is so good at tax that I use him to do mine. If you need a good accountant in BC, Canada, look him up here.

The post Removing offset duplicates appeared first on The Ken Puls (Excelguru) Blog.

Removing offset duplicates

Problem History

The solution

Figuring out the Spouse's ClientID

Removing Offset Duplicates

*Speaking of accountants

Trending Articles

Bath man appears in court charged with attempted murder of a man...

MACLEAN, Allan

Black Angus Grilled Artichokes

Practice Sheet of Right form of verbs for HSC Students

Police blotter for Jan. 12

99 God Status for Whatsapp, Facebook

Rajasthan Board 12th Science Result 2018 name wise- RBSE 12th commerce result...

Notorious Naushad of Ippa gang nabbed

Child Kidnapping: Amy McNeil was kidnapped on her way to school by 5 adults;...

Sonible Smartlimit v1.1.5-R2R

NCERT Solutions for Class 9th Sanskrit Chapter 3 पाथेयम्

मतलबी दोस्त स्टेट्स | Matlabi Dost Status in Hindi – Selfish Friends Status

Arrow Flash 2 – Sinhala Dubbed – Episode 23 – 20th March 2016

[GET] AI Traffic Goldmine

[E² Plugin] HDF-Radio

Universal Multi-Patch v1.3 By RADIXX11

IWAN – Thanks and Praise ( Throw Back Thursday )

RONALD P SONDERGAARD Arrested by Miami-Dade County Corrections on Mar 03, 2017

मुख मैथुन से उठाएं सेक्स का भरपूर मज़ा, जानें क्या है इसका सही तरीकामुख मैथुन...

HSSC Excise & Taxation Inspector Result 2017 Scorecard/ Category Wise Merit List