Duplicate data records in your marketing automation (MA) platform and CRM can be costly in significant ways. Bad data quality hurts U.S. businesses to the tune of more than $15 million each year, and contacting prospects and customers multiple times with the same information negatively impacts potential revenue by 25 percent.
If you’re not able to find the time to tackle your duplicates head on, you’re not alone. Experian found that up to 94% of businesses report issues with data quality. And organizations without any formal data governance initiatives tend to see duplication rates between 10 and 30 percent. That’s a lot of bad data!
The good news is that, more often than not, duplication issues can be stopped and corrected. If you’re ready to get a handle on your organization’s duplicates, follow these 8 steps:
- Identify and mitigate the root cause(s) of duplication.
The first step is to identify what’s causing your duplication issue (and there may be multiple sources). In my experience, dupes are typically generated by any combination of the following:
- Manual list imports
- Incorrect platform settings (unique record or dedupe rules)
- Inaccurate or overly complex integration processes
- Rogue processes utilizing the platform API
Identifying the root cause(s) is not always easy and hardly ever fun, but you can use record data points to help identify how these records are being created. When it comes to forensics, Lead Source is usually your strongest clue.
Other helpful data points include:
- Creation Date: Many duplicate records may share a creation date that aligns with an event, such as a conference or webinar.
- Owner/Created By User: If a significant number of dupes are created by a single individual or small set of users, ask them how and why they are creating these dupes.
- Region/Address Data: If most, or all, duplicate leads are coming from a specific region, the team responsible for loading leads for that region would be a great place to start.
- Email Domain: If you notice several duplicates with the same email suffix, the source is usually at the account level or a very targeted campaign/event.
- Activity Date: Look for activities that share the same date as the duplicate record creation date.
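To make these clues concrete, here is a minimal sketch that tallies the most common values of a few fields across records you have already flagged as duplicates. The field names (`lead_source`, `created`, `created_by`) and the sample rows are hypothetical; substitute whatever your platform exports.

```python
from collections import Counter

# Hypothetical export of records already flagged as duplicates.
# Field names are illustrative, not any platform's actual schema.
dupes = [
    {"email": "a@example.com", "lead_source": "List Import",
     "created": "2023-05-02", "created_by": "jdoe"},
    {"email": "b@example.com", "lead_source": "List Import",
     "created": "2023-05-02", "created_by": "jdoe"},
    {"email": "c@example.org", "lead_source": "Web Form",
     "created": "2023-06-11", "created_by": "api_user"},
]


def top_values(records, field, n=3):
    """Return the n most common values of `field` with their counts."""
    return Counter(r.get(field) or "(blank)" for r in records).most_common(n)


def email_domain(record):
    """Return the lowercased domain part of the record's email address."""
    return record["email"].rsplit("@", 1)[-1].lower()


# A value that dominates one of these tallies is a lead worth chasing.
for field in ("lead_source", "created", "created_by"):
    print(field, top_values(dupes, field))

domain_counts = Counter(email_domain(r) for r in dupes).most_common(3)
print("email_domain", domain_counts)
```

If one lead source, date, or user accounts for most of the duplicates, that is usually where to start asking questions.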
- Stop all identified sources of duplication.
This step is often easier said than done, but if you continue to let duplication occur, you’ll be doomed to repeat these steps over and over again.
If users are manually entering dupes in Salesforce, for example, you can implement a dedupe checker and train users on the proper way to create new records.
List importing processes are a bit more difficult to get a handle on, but solutions exist that check whether the records in your list already exist before the list is loaded. In other cases, you may just need to modify the dedupe rules in your platform.
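As a rough illustration of a pre-load check, the sketch below splits an import list into rows that are safe to load and rows that would create duplicates. It assumes email address is your uniqueness key and that you can export the existing email addresses from your platform; `split_import_list` and the row layout are hypothetical names, not part of any vendor tool.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivial variations still match."""
    return email.strip().lower()


def split_import_list(import_rows, existing_emails):
    """Separate rows that are new from rows that would create duplicates.

    A row counts as a duplicate if its email already exists in the
    platform or appeared earlier in the same import list.
    """
    existing = {normalize_email(e) for e in existing_emails}
    seen_in_list = set()
    new_rows, duplicate_rows = [], []
    for row in import_rows:
        key = normalize_email(row["email"])
        if key in existing or key in seen_in_list:
            duplicate_rows.append(row)
        else:
            seen_in_list.add(key)
            new_rows.append(row)
    return new_rows, duplicate_rows


existing_emails = ["ann@example.com", "bob@example.com"]
import_rows = [
    {"email": "Ann@Example.com ", "name": "Ann"},   # already in platform
    {"email": "cat@example.com", "name": "Cat"},    # new
    {"email": "CAT@example.com", "name": "Cat B"},  # repeated within list
]

new_rows, duplicate_rows = split_import_list(import_rows, existing_emails)
print(len(new_rows), "to load;", len(duplicate_rows), "held back for review")
```

Rows that are held back can be routed to an update process instead of being loaded as new records.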
If you need assistance with this step, DemandGen can help!
- Plan, plan, plan.
Before moving forward with any merging (deduplication) efforts, it is imperative that you plan it out carefully and have the solution (process, logic, and timing) vetted by stakeholders and guardians of all impacted subsystems.
Some things to consider when preparing a merging exercise:
- Which systems will be impacted by a record merge and how?
  - Merging typically reassigns child records (activities, custom object records) from a duplicate record to the original/master record, effectively deleting the duplicate record(s).
- How does the merge process work in your platform?
  - Does it automatically reassign child records from the duplicate to master record?
  - Will it overwrite values from the master record with values from the duplicate record?
  - What impact, if any, will the merge process have on fields like Lead Score?
  - Can modifying field values trigger changes to lead scoring, lead routing, and/or ownership?
- Will merging involve multiple steps in some cases?
  - In Salesforce, for example, you may need to capture the Account ID of the “winning” SFDC Contact before merging a duplicate SFDC Lead into the winner record.
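One way to answer the overwrite question during planning is to preview the field changes before anything is merged. The sketch below assumes a "master wins, fill blanks from the duplicate" policy and treats `lead_score` as a protected field. Both are assumptions made for illustration; your platform's actual merge behavior may differ, so check its documentation.

```python
def preview_merge(master, duplicate, protected_fields=("lead_score",)):
    """Return {field: (old_value, new_value)} for master fields that would change.

    Assumed policy: a master value is only replaced when it is blank and
    the duplicate has a value. Protected fields are never touched.
    """
    changes = {}
    for field, dup_value in duplicate.items():
        if field in protected_fields:
            continue
        master_value = master.get(field)
        if master_value in (None, "") and dup_value not in (None, ""):
            changes[field] = (master_value, dup_value)
    return changes


master = {"id": "001", "email": "a@example.com", "phone": "", "lead_score": 40}
duplicate = {"id": "002", "email": "a@example.com", "phone": "555-0100",
             "lead_score": 75}

changes = preview_merge(master, duplicate)
print(changes)
```

Sharing a preview like this with stakeholders makes it easier to spot fields that would trigger scoring or routing changes.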
- Identify duplicate records.
Now it’s time to analyze your data and determine the appropriate actions to take on specific records.
First, define what a unique lead record should be for your business. In most cases, a unique lead record is defined by a unique email address. Thus, any subsequent record with the same email address is a duplicate.
However, there are times when an email address alone is not sufficient to identify a unique lead. You may need to combine email address with one or more fields, such as Company Name or Account ID, to define uniqueness.
Once you have defined what a unique lead is, use this definition to group all records with matching values for these fields. For example, if Email Address is the only field in your uniqueness definition, then all records that share email@example.com are part of the same duplicate group.
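The grouping step can be sketched as follows. The record layout and function names are hypothetical, and skipping records with a blank key field is a design choice made for this example (blank emails should not all be treated as duplicates of each other).

```python
from collections import defaultdict


def uniqueness_key(record, fields=("email",)):
    """Build a comparable key from the fields that define a unique lead."""
    return tuple((record.get(f) or "").strip().lower() for f in fields)


def duplicate_groups(records, fields=("email",)):
    """Group records by uniqueness key; keep only groups with 2+ records."""
    groups = defaultdict(list)
    for record in records:
        key = uniqueness_key(record, fields)
        if all(key):  # skip records missing any key field
            groups[key].append(record)
    return {k: v for k, v in groups.items() if len(v) > 1}


records = [
    {"id": "001", "email": "email@example.com", "company": "Acme"},
    {"id": "002", "email": "Email@Example.com", "company": "Acme"},
    {"id": "003", "email": "email@example.com", "company": "Globex"},
    {"id": "004", "email": "other@example.com", "company": "Acme"},
    {"id": "005", "email": "", "company": "Acme"},
]

by_email = duplicate_groups(records)
by_email_and_company = duplicate_groups(records, fields=("email", "company"))
print(len(by_email), "group(s) by email;",
      len(by_email_and_company), "group(s) by email + company")
```

Note how adding Company Name to the definition shrinks the group: record 003 is no longer considered a duplicate of 001 and 002.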
Next, define a general rule, or set of rules, to determine which record in each duplicate group is the winner (master record). Duplicate records within the same group will be merged into this master record. Typically, the oldest record is chosen as the master record, but you need to decide what set of criteria makes the most sense for your business based on the data points available to you.
By the end of this process you should have each record categorized by one of the following action descriptors:
- Winner: This is the master record of a duplicate group.
- Merge: This record is a duplicate and will be merged into the appropriate master record.
- No Action: This record is to be excluded and shall remain untouched by the merge process.
NOTE: In some cases, there is a justifiable business need to create duplicate records, such as having multiple records with the same email address. Be sure to note if there are any cases when a duplicate record should not be merged, such as SFDC Contact records that belong to different Accounts.
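Here is a minimal sketch of the categorization, using the "oldest record wins" rule mentioned above. It assumes each record has an `id` and a `created` date stored as an ISO string (so string comparison matches date order), and it uses a hypothetical `do_not_merge` flag to stand in for whatever exclusion rule your business needs.

```python
def categorize(group, is_excluded=lambda record: False):
    """Map each record id in a duplicate group to (action, master_id).

    Excluded records get "No Action". Of the rest, the oldest record
    (ties broken by id) is the "Winner" and the others are "Merge".
    """
    actions = {}
    candidates = []
    for record in group:
        if is_excluded(record):
            actions[record["id"]] = ("No Action", None)
        else:
            candidates.append(record)

    if len(candidates) < 2:
        # Nothing left to merge once exclusions are removed.
        for record in candidates:
            actions[record["id"]] = ("No Action", None)
        return actions

    winner = min(candidates, key=lambda r: (r["created"], r["id"]))
    for record in candidates:
        if record is winner:
            actions[record["id"]] = ("Winner", None)
        else:
            actions[record["id"]] = ("Merge", winner["id"])
    return actions


group = [
    {"id": "003", "created": "2022-03-01"},
    {"id": "001", "created": "2021-01-15"},
    {"id": "002", "created": "2021-06-30", "do_not_merge": True},
]

actions = categorize(group, is_excluded=lambda r: r.get("do_not_merge", False))
for record_id in sorted(actions):
    print(record_id, actions[record_id])
```

Swapping in a different winner rule (most recently active, most complete, highest score) only requires changing the `key` passed to `min`.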
- Validate your results.
After you have categorized your data by Winner, Merge, and No Action, validate the results to ensure two things:
- You implemented your uniqueness and winning logic correctly.
- Your results are what you expected (you may find that your original logic may not have handled every case as intended and some logic tweaking is in order).
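Some of these checks can be automated before you ever open a spreadsheet. The sketch below assumes each categorized row carries a `group`, `id`, `action`, and `master_id`; these column names are hypothetical. It catches structural mistakes only and does not replace a visual review of the data.

```python
from collections import defaultdict


def validate(rows):
    """Return a list of human-readable problems found in categorized rows."""
    problems = []
    by_group = defaultdict(list)
    seen_ids = set()

    for row in rows:
        if row["id"] in seen_ids:
            problems.append(f"record {row['id']} appears more than once")
        seen_ids.add(row["id"])
        by_group[row["group"]].append(row)

    for group, members in by_group.items():
        winners = [m["id"] for m in members if m["action"] == "Winner"]
        merges = [m for m in members if m["action"] == "Merge"]
        if merges and len(winners) != 1:
            problems.append(f"group {group} has {len(winners)} winners")
        if winners and not merges:
            problems.append(f"group {group} has a winner but nothing to merge")
        for m in merges:
            if m["master_id"] not in winners:
                problems.append(
                    f"record {m['id']} points at {m['master_id']}, "
                    f"which is not the winner of group {group}"
                )
    return problems


good_rows = [
    {"group": "g1", "id": "001", "action": "Winner", "master_id": None},
    {"group": "g1", "id": "003", "action": "Merge", "master_id": "001"},
    {"group": "g1", "id": "002", "action": "No Action", "master_id": None},
]
bad_rows = [
    {"group": "g2", "id": "010", "action": "Merge", "master_id": "999"},
]

print(validate(good_rows))
print(validate(bad_rows))
```

An empty list means the structure is sound; it does not tell you whether the winner rule itself picked the record the business would have chosen.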
I like to export the data, sorted by duplicate groups, into Microsoft Excel. I use a macro that emphasizes each duplicate group with a bold border and colorizes each record by its action category. I find that this helps out tremendously with the validation process.
If all looks great, then you can proceed to the next step. Otherwise, go back to the "Identify duplicate records" step, revise your logic as needed, and try again.
- Prepare to merge your duplicates.
The process of preparing your data depends on how you will perform the merge process.
If you plan to merge your records manually, then a simple file containing Duplicate ID, Email Address + Unique Field (if any), and Master Record ID is sufficient for someone to act on.
However, it is best to leverage the platform API to perform the merge step, in which case you will need to refer to your platform’s API documentation for instructions on how to execute the merge process. Quite often, you only need the Duplicate ID and Master Record ID pair to feed into the API.
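As an illustration, the sketch below writes the Duplicate ID and Master Record ID pairs to a CSV file that either a person or a script can work from. The column names and row layout are assumptions carried over from the categorization step, not a format required by any platform.

```python
import csv
import io


def write_merge_pairs(rows, out):
    """Write one (duplicate_id, master_id) line per "Merge" row.

    `out` is any writable text file object. Returns the number of pairs.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["duplicate_id", "master_id"])
    count = 0
    for row in rows:
        if row["action"] == "Merge":
            writer.writerow([row["id"], row["master_id"]])
            count += 1
    return count


rows = [
    {"id": "001", "action": "Winner", "master_id": None},
    {"id": "003", "action": "Merge", "master_id": "001"},
    {"id": "002", "action": "No Action", "master_id": None},
]

# An in-memory buffer keeps the example self-contained; in practice you
# would pass a file opened with open("merge_pairs.csv", "w", newline="").
buffer = io.StringIO()
pair_count = write_merge_pairs(rows, buffer)
print(buffer.getvalue())
```

For a manual merge you would add Email Address and any other uniqueness fields as extra columns so the person working the file can double-check each pair.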
If you are performing the merge actions programmatically, it is always best to test out your process in a sandbox environment first and confirm the following:
- The merge process is working as expected (with no negative impact to your data or performance of your platform instance)
- The average processing time of a merge call (to gauge how long it will take to merge your entire set of duplicate records)
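The timing estimate is simple arithmetic: average seconds per merge call multiplied by the number of pairs. In the sketch below, `sandbox_merge` is a stand-in for your platform's real merge call, and the 0.8 seconds and 45,000 pairs are made-up numbers for illustration.

```python
import time


def average_call_seconds(merge_call, sample_pairs):
    """Time `merge_call` over a sample of pairs; return seconds per call."""
    start = time.perf_counter()
    for duplicate_id, master_id in sample_pairs:
        merge_call(duplicate_id, master_id)
    return (time.perf_counter() - start) / len(sample_pairs)


def estimated_hours(avg_seconds, total_pairs):
    """Project the total run time, in hours, for a sequential merge run."""
    return avg_seconds * total_pairs / 3600


def sandbox_merge(duplicate_id, master_id):
    """Placeholder for a real sandbox API call (hypothetical)."""
    time.sleep(0.001)


measured = average_call_seconds(sandbox_merge, [("002", "001"), ("004", "003")])
print(f"measured average: {measured:.4f} s per call")

# Illustrative projection: 0.8 s per call across 45,000 pairs.
print(round(estimated_hours(0.8, 45_000), 1), "hours")
```

If the projection does not fit your maintenance window, that is a signal to split the run into batches across several windows.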
- Prepare the business for what to expect.
Now that you’ve prepared your data, it’s time to prepare the business for what to expect so there are no surprises and no confusion. At a minimum, communicate the following details to your organization:
- That you are performing a merge process in your platform
- When the process is expected to start and finish
- What actions they need to take, if any
- What to expect when the process is finished
I find that it is best to perform the merge process over a weekend to minimize potential impact and disruption (platform performance and data collisions). Asking users not to access the system during the process is not always necessary (and is very rarely an option).
Also, be sure to disable triggers where it makes sense. You may have lead routing or lead scoring processes, for example, that you do not want affected during the merge process.
- Merge your records.
Now that you have properly planned, vetted, validated, tested, and communicated your process, you can begin merging!
It is very important to have someone monitor the process, so they can put an immediate halt to it should something go awry. While planning can help minimize any potential impacts, it can be difficult to account for all possible scenarios in a sea of data.
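If you are merging programmatically, you can build the "immediate halt" into the script itself. The sketch below is one possible shape for that loop, assuming a `merge_call` function that wraps your platform's API and raises an exception on failure. The function names, the failure threshold, and the `should_stop` hook are all hypothetical.

```python
def run_merges(pairs, merge_call, max_failures=5, should_stop=lambda: False):
    """Merge pairs one at a time, halting early if things go wrong.

    Stops when `should_stop()` returns True (for example, an operator
    created a stop file) or when `max_failures` errors have occurred.
    Returns (completed_pairs, failed_pairs_with_reason).
    """
    completed, failed = [], []
    for duplicate_id, master_id in pairs:
        if should_stop():
            break
        try:
            merge_call(duplicate_id, master_id)
            completed.append((duplicate_id, master_id))
        except Exception as exc:  # log and keep going until the limit
            failed.append((duplicate_id, master_id, str(exc)))
            if len(failed) >= max_failures:
                break
    return completed, failed


def fake_merge(duplicate_id, master_id):
    """Stand-in for a real API call; fails for ids starting with 'bad'."""
    if duplicate_id.startswith("bad"):
        raise RuntimeError(f"could not merge {duplicate_id}")


pairs = [("002", "001"), ("bad1", "001"), ("bad2", "001"), ("005", "004")]
completed, failed = run_merges(pairs, fake_merge, max_failures=2)
print(len(completed), "merged;", len(failed), "failed; run halted early")
```

Keeping the completed and failed lists makes it possible to resume from where the run stopped rather than starting over.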
Get a handle on your data
If you’re ready to get the most value from your data, DemandGen offers a number of Data Services:
- Tackle duplicate records
- Apply ongoing data normalization practices
- Enrich the data you already have
- Gather more information with forms and progressive profiling
- Segment your data for more targeted campaigns
Let us know how we can help!
Rick Segura, Data & Insights Technical Specialist at DemandGen, is a vastly experienced data guru. Having 20+ years of database experience, Rick is a master of ETL (data processing), data merging & aggregation, report & dashboard development, data analysis, and automation. He is very passionate about delivering intuitive, purposeful data-based solutions.