Bad data doesn’t just lead to bounce-backs and dead ends; it costs you money. According to SiriusDecisions research, cleaning and deduping a single record costs $10, and leaving a bad record in your database could cost about $100. With data decaying at a rate of at least 30% annually, losses can add up fast. Gartner reports that poor data quality costs U.S. businesses up to $14.2 million per year, and at the macro level (taking into consideration missed opportunities, operational overhead and other related costs) up to $3 trillion per year.
If you haven’t found the time to tackle your duplicates head-on, you’re not alone. Experian found that up to 94% of businesses report issues with data quality, and organizations without any formal data governance initiatives tend to see duplication rates between 10% and 30%.
Fortunately, data duplication is an addressable problem. Here are eight easy steps to a successful deduplication project, with some helpful tips for getting it right.
Step 1: Find the Root Cause
Duplicates don’t appear out of thin air — they’re created. If you figure out what’s causing the duplicates, you can prevent them. There may be multiple sources, and here are the most common:
- Manual list imports
- Incorrect platform settings (unique record or dedupe rules)
- Inaccurate or overly complex integration processes
- Rogue processes utilizing the platform API
Looking for clues can help you identify the root cause. Check the lead source first, and the answer may be obvious. Other helpful data points include:
- Creation Date: Many duplicate records may share a creation date that aligns with an event, such as a conference or webinar.
- Owner/Created By User: If a significant number of dupes are created by a single individual or small set of users, ask them how and why they are creating these dupes.
- Region/Address Data: If most, or all, duplicate leads are coming from a specific region, ask the team responsible for loading leads for that region.
- Email Domain: If you notice several duplicates with the same email suffix, the source is usually at the account level or a very targeted campaign/event.
- Activity Date: Look for activities that share the same date as the duplicate record creation date.
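As a rough illustration of the clue hunting described above, the sketch below counts how often each value appears among known duplicate records. The record layout and field names (`created`, `created_by`, `region`, `email`) are assumptions for the example, not fields from any particular platform; in practice you would feed in an export from your own system.

```python
from collections import Counter

# Hypothetical export of records already flagged as duplicates.
duplicates = [
    {"email": "a@acme.example", "created": "2023-05-02", "created_by": "jsmith", "region": "EMEA"},
    {"email": "b@acme.example", "created": "2023-05-02", "created_by": "jsmith", "region": "EMEA"},
    {"email": "c@globex.example", "created": "2023-05-02", "created_by": "jsmith", "region": "APAC"},
    {"email": "d@acme.example", "created": "2023-06-10", "created_by": "mlee", "region": "EMEA"},
]


def email_domain(email):
    """Return the lowercased part of an email address after the last '@'."""
    return email.rsplit("@", 1)[-1].lower()


def top_clues(records, n=3):
    """For each clue field, return the n most common values and their counts."""
    clues = {
        "created": Counter(r["created"] for r in records),
        "created_by": Counter(r["created_by"] for r in records),
        "region": Counter(r["region"] for r in records),
        "email_domain": Counter(email_domain(r["email"]) for r in records),
    }
    return {field: counts.most_common(n) for field, counts in clues.items()}


for field, values in top_clues(duplicates).items():
    print(field, values)
```

A value that dominates one of these counts (a single creation date, user, or domain) is a good place to start asking questions.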
Step 2: Stop the Madness!
Immediately halt all the sources you identified as producing duplicates. You may be hesitant to do this for fear of causing disruption to the sales process, but it’s critical to keeping your database functional. You can then implement new training or processes to minimize duplicates. For example, if you find that users are manually entering duplicate records in Salesforce, you can implement a dedupe checker and train users on the proper way to create new records.
Pro Tip: Many marketing organizations import lists, and these can be a main source of duplicates. Find a dedupe solution that checks list uploads against your existing database, or modify the deduplication rules in your platform.
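To make the idea of checking a list upload against the existing database concrete, here is a minimal sketch. It assumes email address is the only matching field and that both the upload and the existing emails are already available in memory; the function names are hypothetical, and a real dedupe tool would likely use richer matching rules.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivially different emails match."""
    return email.strip().lower()


def split_upload(upload_rows, existing_emails):
    """Split an upload into rows safe to import and rows that would be duplicates.

    A row counts as a duplicate if its email already exists in the database
    or appeared earlier in the same upload.
    """
    existing = {normalize_email(e) for e in existing_emails}
    seen_in_upload = set()
    new_rows, duplicate_rows = [], []
    for row in upload_rows:
        key = normalize_email(row["email"])
        if key in existing or key in seen_in_upload:
            duplicate_rows.append(row)
        else:
            seen_in_upload.add(key)
            new_rows.append(row)
    return new_rows, duplicate_rows


upload = [
    {"email": "pat@example.org", "name": "Pat"},
    {"email": " PAT@example.org", "name": "Pat (again)"},
    {"email": "sam@example.org", "name": "Sam"},
]
new_rows, duplicate_rows = split_upload(upload, existing_emails=["sam@example.org"])
```

Only `new_rows` would go on to the import; `duplicate_rows` can be reviewed or used to update the existing records instead.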
Step 3: Make a Plan
Before you execute on your deduplication strategy, it’s critical that you plan it out carefully and vet the processes, logic and timing with stakeholders ahead of time. Some things to consider include:
- Which systems will be impacted by a record merge and how?
- How does the merge process work in your platform? Does it automatically reassign child records from the duplicate to the master record? Will it overwrite values on the master record with values from the duplicate record?
- What impact, if any, will the merge process have on fields such as Lead Score? Can modifying field values trigger changes to lead scoring, lead routing, and/or ownership?
- Will merging involve multiple steps in some cases? For example, in Salesforce, you may need to capture the Account ID of the “winning” SFDC Contact before merging a duplicate SFDC Lead into the winning record.
Pro Tip: Make sure to involve the guardians of the data and any subsystems that might be impacted.
Step 4: Identify Duplicate Records
Now it’s time to analyze your data and determine the appropriate actions to take on specific records.
- First, determine how you will define a unique lead record. Most of the time, you can use email addresses. However, there are times when an email address alone is not sufficient to identify a unique lead, and you may need to combine email address with one or more fields, such as Company Name or Account ID, to define uniqueness.
- Next, use the definition to group all records with matching values for these fields. For example, if Email Address is the only field in your uniqueness definition, then all records that share firstname.lastname@example.org are part of the same duplicate group.
- Finally, define a general rule, or set of rules, to determine which record in each duplicate group is the winner (master record). Duplicate records within the same group will be merged into this master record. Typically, the oldest record is chosen as the master record, but you need to decide what set of criteria makes the most sense for your business based on the data points available to you.
By the end of this process you should have each record categorized by one of the following action descriptors:
- Winner: This is the master record of a duplicate group.
- Merge: This record is a duplicate and will be merged into the appropriate master record.
- No Action: This record is to be excluded and shall remain untouched by the merge process.
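The grouping and categorization described above can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed method: records are plain dictionaries with hypothetical `id`, `email` and `created` fields, the oldest record in each group wins, and records with no duplicates are labeled No Action. Your own winner rules and exclusions may differ.

```python
from collections import defaultdict


def categorize(records, key_fields=("email",)):
    """Label every record as Winner, Merge, or No Action.

    Returns a dict mapping record id -> (action, master_id), where master_id
    is set only for Merge records.
    """
    # Group records whose uniqueness fields match (case-insensitive).
    groups = defaultdict(list)
    for rec in records:
        key = tuple(str(rec[f]).strip().lower() for f in key_fields)
        groups[key].append(rec)

    actions = {}
    for members in groups.values():
        if len(members) == 1:
            # Assumption: a record with no duplicates is left untouched.
            actions[members[0]["id"]] = ("No Action", None)
            continue
        # Assumption: the oldest record wins. ISO dates sort correctly as text.
        winner = min(members, key=lambda r: r["created"])
        for rec in members:
            if rec["id"] == winner["id"]:
                actions[rec["id"]] = ("Winner", None)
            else:
                actions[rec["id"]] = ("Merge", winner["id"])
    return actions


records = [
    {"id": "001", "email": "pat@example.org", "created": "2021-03-01"},
    {"id": "002", "email": "Pat@Example.org", "created": "2022-07-15"},
    {"id": "003", "email": "sam@example.org", "created": "2020-01-20"},
]
actions = categorize(records)
```

Passing additional fields, for example `key_fields=("email", "account_id")`, narrows the definition of uniqueness so that records sharing an email address but belonging to different accounts fall into separate groups.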
Pro Tip: In some cases, there is a justifiable business need to create duplicate records, such as having multiple records with the same email address. Be sure to note if there are any cases when a duplicate record should not be merged, such as SFDC Contact records that belong to different Accounts.
Step 5: Validate Your Results
Once you’ve categorized your records, validate the results to ensure that you’ve implemented your logic for identifying unique and winning records correctly, and that your results are what you expected. You may need to tweak the logic slightly so that every case is handled as you intended. If everything looks right, you can proceed to the next step.
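A few automated sanity checks can complement a manual review. The sketch below assumes the categorized results are held as a dictionary mapping each record ID to an `(action, master_id)` pair; that structure and the specific checks are illustrative choices, not a complete validation suite.

```python
def validate(actions):
    """Return a list of human-readable problems found in the categorization."""
    problems = []
    merged_into = set()
    for rec_id, (action, master_id) in actions.items():
        if action == "Merge":
            if master_id not in actions:
                problems.append(f"{rec_id}: master {master_id} does not exist")
            elif actions[master_id][0] != "Winner":
                problems.append(f"{rec_id}: master {master_id} is not a Winner")
            else:
                merged_into.add(master_id)
        elif action not in ("Winner", "No Action"):
            problems.append(f"{rec_id}: unknown action {action!r}")
    # Every Winner should have at least one duplicate merging into it.
    for rec_id, (action, _) in actions.items():
        if action == "Winner" and rec_id not in merged_into:
            problems.append(f"{rec_id}: Winner with no duplicates to merge")
    return problems


good = {
    "001": ("Winner", None),
    "002": ("Merge", "001"),
    "003": ("No Action", None),
}
bad = {
    "001": ("No Action", None),
    "002": ("Merge", "001"),
    "004": ("Winner", None),
}
print(validate(good))
print(validate(bad))
```

An empty list means the checks passed; anything else points at logic that needs tweaking before you move on.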
Step 6: Prepare the Data
The process of preparing your data depends on how you will perform the merge process.
If you plan to merge your records manually, then a simple file containing Duplicate ID, Email Address + Unique Field (if any) and Master Record ID is sufficient. However, leveraging the platform API to perform the merge step may be preferable. If you do, you’ll need to refer to your platform’s API documentation for instructions on how to execute the merge process. (You may only need the Duplicate ID and Master Record ID pair to feed into the API.)
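For the manual route, the merge file can be produced directly from the categorized results. The sketch below assumes the same hypothetical `(action, master_id)` structure keyed by record ID plus a lookup of email addresses, and uses the column names mentioned above; adjust both to match your own data.

```python
import csv
import io


def write_merge_file(actions, emails, out):
    """Write one CSV row per duplicate: its ID, its email, and its master ID."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Duplicate ID", "Email Address", "Master Record ID"])
    for rec_id, (action, master_id) in sorted(actions.items()):
        if action == "Merge":
            writer.writerow([rec_id, emails[rec_id], master_id])


actions = {
    "001": ("Winner", None),
    "002": ("Merge", "001"),
    "003": ("No Action", None),
}
emails = {"001": "pat@example.org", "002": "pat@example.org", "003": "sam@example.org"}

buffer = io.StringIO()  # swap for open("merge_file.csv", "w", newline="") to write to disk
write_merge_file(actions, emails, buffer)
print(buffer.getvalue())
```

If you merge through an API instead, the same loop can yield just the Duplicate ID and Master Record ID pairs.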
Pro Tip: If you’re performing the merge actions programmatically, test out your process in a sandbox environment first to confirm that the merge process is working as expected with no negative impact to your data or the platform’s performance. You’ll also want to note the average processing time of a merge call to gauge how long it will take to merge your entire set of duplicate records.
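Once you have an average processing time from the sandbox, a back-of-the-envelope estimate is simple arithmetic. The numbers below are made up for illustration, and the sketch assumes merge calls run one after another with no rate limiting or retries.

```python
def estimate_merge_hours(pair_count, avg_seconds_per_call, calls_per_pair=1):
    """Estimate total run time in hours for sequential merge calls."""
    total_seconds = pair_count * calls_per_pair * avg_seconds_per_call
    return total_seconds / 3600


# Hypothetical figures: 36,000 duplicate pairs at 0.5 seconds per merge call.
hours = estimate_merge_hours(pair_count=36_000, avg_seconds_per_call=0.5)
print(f"Estimated run time: {hours:.1f} hours")
```

If a merge needs more than one call per pair, as in the multi-step cases mentioned in Step 3, raise `calls_per_pair` accordingly.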
Step 7: Communicate Your Plan
No data deduplication project should be executed without organizational buy-in. Make sure you share your plan with key stakeholders in advance, and include the following information:
- What you’re planning to do and why it’s important
- When the process is expected to begin and end
- What actions they need to take, if any
- What they can expect when the deduplication is finished
Pro Tip: It’s best to perform the merge over a weekend to minimize potential disruption to the organization. In some cases, a merge can slow platform performance or impact users’ ability to access data. Also, if you have lead routing or lead scoring in place, be sure to disable the related triggers for the duration of the merge.
Step 8: Ready, Set, Merge!
It’s showtime! And while planning can help minimize any potential impacts, it can be difficult to account for all possible scenarios in a sea of data. Make sure a team member monitors the process, so they can stop it if something goes wrong.
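If the merge runs programmatically, part of that monitoring can be built into the script itself. The sketch below stops the run once a failure threshold is reached so a person can investigate. Here `merge_fn` is a hypothetical stand-in for whatever merge call your platform’s API provides, and the threshold of five failures is an arbitrary example value.

```python
def run_merges(pairs, merge_fn, max_failures=5):
    """Merge (duplicate_id, master_id) pairs, halting after too many failures.

    Returns the pairs that merged successfully and the failures with reasons.
    """
    done, failed = [], []
    for duplicate_id, master_id in pairs:
        try:
            merge_fn(duplicate_id, master_id)
            done.append((duplicate_id, master_id))
        except Exception as exc:  # log the failure and keep a record of it
            failed.append((duplicate_id, master_id, str(exc)))
            if len(failed) >= max_failures:
                break  # stop so someone can look before more data is touched
    return done, failed


def fake_merge(duplicate_id, master_id):
    """Stand-in for a real API call; rejects IDs that start with 'bad'."""
    if duplicate_id.startswith("bad"):
        raise RuntimeError("merge rejected")


pairs = [("002", "001"), ("bad1", "001"), ("bad2", "001"), ("005", "004")]
done, failed = run_merges(pairs, fake_merge, max_failures=2)
```

Keeping the list of completed and failed pairs also gives you a record to reconcile against once the run is finished.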
Make It a Group Effort
DemandGen offers an array of Data Services, from normalization to enrichment, progressive profiling, segmentation and more. Contact us today for a consultation about how we can help you optimize your data to provide more value to your organization.