Count Duplicates Based on Text Field
I'm running into a problem that I'm struggling to solve and was hoping that I could get some help here. I have a numbered list that represents employees. Sometimes, employees are converted from part-time to full-time and when this happens, they are assigned a new employee ID. I need to be able to identify cases where there are multiple list items that represent the same individual (based on a text field that shows their name). In excel, this would be accomplished by using a countif formula. I'd like to know if there is a simple solution that I'm just not seeing. I think I could create a new list based on the concatenation of employee ID and name, and then have the name be a list property and then do a count based on that, but I'd like to know if there is a solution that requires less steps (preferably no need for a new list or import action). Screenshot below; The "Count of items" field is what I'd like to make formulaic.
You can achieve the same result in your example using the RANK function. Create a helper line item to rank (in example below, Count = 1), and then the Rank formula = RANK(Count, ASCENDING, MAXIMUM, TRUE, Name).
The combination of ASCENDING for the direction parameter, MAXIMUM for the equal behaviour parameter and Name for the grouping parameter will give you the highest index of matches. Note that the "Name" value will have to match exactly.
See example below:
PS: My screenshot doesn't show it, but the "Count" line item that is used to help do the rank a) doesn't need to be against the dimensionality of the module, and b) doesn't even need to be a line item itself! You can hard-code "1" into the RANK formula --> RANK(1, ASCENDING, MAXIMUM, TRUE, Name)23
Playing Devil's Advocate, what if you have two different individuals with the same name?0
I can actually use email (which is a text field) instead of name to make it a tighter match. Understand that this may have holes, but this is just being used to flag potential issues so I'm not too concerned.
In that case, see if this. works:
- Can you make a flat list out of the email addresses?
- if so, then create a constant line item with a 1 as the value
- then create a module based on the flat email list and do a sum off the constant line item. If it is >1, then you have duplicates.
Does that work for you?
Thanks Rob. That makes sense. I was trying to see if there was a way to do it without having to run an import process (to create that list), but it sounds like this is the only way.
Thanks Rob. That’s the approach I’ll take. Was trying to see if I could skip the import process (that we’d use to populate that email list), but it sounds like that’s not possible.
Appreciate your help on this!
@kevin.cho Brilliant. I think this can be turned into Best Practice Article.3
Sure, go for it. You can extend the capabilities of Rank function instead of just focusing on this particular use case. Looking at the hidden capabilities of Rank Function I started liking it even more.
Some of the limitations of Isfirstoccurrence (IFO) can easily be tackled with Rank function. For Example unlike IFO you can easily get the Rank 1s (Unique or in other words first occurred values) on any line item.2
Thank you @kevin.cho , this works perfectly.
Thank you @rob_marshall and @Misbah for your help and commentary as well.0
thanks a lot.
Very simple, you save my life 😀0