The Dedupe and Match add-on enables you to
- identify and remove duplicate records in any single object in your Salesforce org, and
- find records in any single object that match the contents of a given CSV file.
Both these functions include powerful matching criteria such as fuzzy matching, company name or address reductions, and first name matching.
To install the add-on into your Salesforce org:
- Ensure that you have Apsona for Salesforce already installed. If not, please install it from
our AppExchange listing.
- Visit our AppExchange
listing for the Dedupe and Match add-on,
click the Get it Now button, and install it into your org.
- Click Setup - Installed Packages - Manage Licenses - Apsona Dedupe and Match, and assign licenses to your users.
- Click the Apsona tab in your org, and within Apsona, click Settings - Clear cache so that the newly-installed add-on becomes available.
If you are seeing cases where a user with an assigned license is unable to see the Dedupe/Match menu item, please make sure that the user's profile has
access to all of the elements in the Dedupe package — specifically the
custom object, and the
checkbox on the Apsona Item object.
The deduplication process
The general process followed when deduplicating records in a particular object (e.g., the Contact object) is an iterative one. In each iteration, you
would set up some dedupe parameters, find the duplicates based on those parameters, and merge them. You would then revise the dedupe parameters to catch
more duplicates, and merge the result. For example, as a first iteration, you might
identify all Contact records that contain the exact same name and email fields, and merge those together. In the next iteration, you might loosen the First Name match function to match on loose First Names (e.g., treat Bob and
Robert as the same), and then find duplicates and merge them. The process is iterated with progressively looser match criteria until very few duplicates
Each iteration produces a list of duplicate groups, based on the criteria you set up. And within each such duplicate group, you would need to
identify the so-called master record
- the one that will be retained. After the merge is carried out:
- The master record will be retained, and any data from the non-master records will be merged into the master.
- The non-master records will be deleted.
- Any child records of the non-master records will be re-parented to become children of the master record.
Accessing the Dedupe tool
The single-table dedupe function is available via the Tools menu in every object within Apsona. For instance, to deduplicate Contact records, click
the Contacts menu in Apsona, then click Tools - Dedupe/Match - Dedupe.
The list of available "Dedupe Actions" is then displayed. Each dedupe action contains all the parameters needed to specify how the action works, such as
the names and match criteria for the fields being matched, the additional fields to show, and the filter conditions to use for the records being
deduplicated. At this point, you can click the New button to create a new Dedupe Action, or click one of the actions in the list to work with that action.
Editing a Dedupe Action
When creating or editing a Dedupe Action, you specify four groups of parameters:
- a filter condition that limits the range of records to be deduplicated;
- the match fields that should be used to match duplicate records;
- the additional fields that you wish to see alongside the match fields; and
- the master rules that the tool should use to identify the master record in each duplicate group.
Each of these groups appears in its own tab in the Dedupe Action Editor.
The Deduped Records tab
In this tab, you set the "record source" - the means of determining the records which are subject to deduplication. You have two choices, either a
filter or a CSV file, and you can choose via the two radio buttons in this tab (see screen shot below).
Filter as record source
If you select the "Filtered list of records" button, you are asserting that you want to deduplicate only the records meeting the filter you set up, and
other records will not even be considered for deduplication. Such filtering is useful to improve performance. For example, suppose that a list
of Leads were added recently, all of whose addresses are in California, and you therefore don't wish to consider non-California Lead records. In this case, you might set a filter that
says "Mailing State is California." The dedupe tool will restrict its scope to just the records in
this filter, i.e., records that don't meet the filter conditions will not be considered for deduplication.
You can set filter criteria in much the same way filters are set everywhere else in Apsona. See
for details about the filtering feature.
Using this filter enables you to speed up the
deduplication process. But if you really want to include every record in your object, then simply set the filter criterion to "Created Date is not empty".
CSV file as record source
There are some situations where you have identified, using other methods external to this tool, the list of records to be deduplicated. Specifically,
you would have available a list of record IDs of master and non-master records, in the form of a CSV file. In such a situation, you do not need the
dedupe tool to identify the duplicate records, you only need it to merge the records. In such a situation, you would select the "CSV file" button for
the record source. When you do so, you click the file selector button that appears, and upload a CSV file containing the record IDs.
The CSV file you provide must contain two columns: The first column contains the master record ID and the second the non-master record ID. If you
have a group of duplicates with more than one non-master, simply repeat the row with the same master record ID, and the other non-master ID. In the
example screen shot below, notice that row 2 contains a master and non-master record ID, depicting a group of two duplicate records, one of which is master. Rows 3, 4 and 5
have the same master record ID but different non-master IDs, so they represent a group of four duplicate records, one of which (the one in the leftmost
column) is master.
It's worth pointing out that if you select the CSV file input, you have already identified the master records, so there is no need for the Match Fields
and Group Selection tabs. Those tabs will therefore only appear if you selected the Filter as record source; they will not appear if you select the CSV
record source. Also, in the Master & Field Rules tab (see below), the Master Rules section will only appear with the Filter as record source, but
not with the CSV file as record source.
Changing the record source
You can set the record source only when creating a new dedupe action. Once you have saved the action, you can no longer change its record source: only
the button corresponding to the action's record source will be enabled, and the other button will be disabled.
When using a CSV record source, if you find that no records (or fewer than expected) appear in the deduplication results grid below, your first step
would be to ascertain that the record IDs in your CSV data are in fact available and valid record IDs in your Salesforce object. To do this, you can use
Apsona's search feature that matches record IDs, described in this article
The Match Fields tab
This tab lets you select the fields in your object based on which you wish to determine duplicates. For example, when deduplicating Contact records, you might select
the First Name, Last Name and Email Address fields to indicate that two contact records that match on those three fields should be considered duplicates.
To select a match field, click the check box next to it in the Available fields panel on the left of the tab. Doing so will produce a row in the
Selected fields area on the right. You can then select the match criteria for the field, including the match function, the contraction, and whether word
order should be ignored. Match criteria are described later in this article.
The Additional Fields tab
Once you have selected some match fields, you would also want to inspect some additional fields from each identified duplicate record so that you can
decide which of the duplicates you wish to keep. To this end, you select one or more fields in the Additional Fields tab, and the values of those fields
will appear in the deduplication results, as described below.
The additional fields tab lets you select fields from the current object as well as parent objects. For example, the screen animation below shows the
Contact object, which (in this example org) includes two lookups to the Account object, one named Account and the other named Primary Affiliation. So
both lookups are available in the selector. To select an additional field, click the lookup and object you wish to use, and then click the field within
that object. The field now appears in the list on the right. (Click the image to replay the animation.)
The Group Selection tab
This tab lets you control the groups of records selected for deduplication, based on at least one record in each group being filtered in some way. For
instance, as in the screen shot below, you can require that at least one record in each group being considered must have a Lead Source of Partner
Notice that this setting is different from the filter setting in the Deduped Records tab, in that the filter setting limits the entire range of
records being deduped. For example, if you set a filter term requiring Lead Source to be Partner Referral, then any record that doesn't have such a Lead
Source will be omitted altogether. In other words, a filter setting of "Lead Source equals Partner Referral" in the Deduped Records tab means that none
of the records with other Lead Source values will even be considered for deduplication; but the same setting in the Group Selection tab means only that
at least one record in each group must have that Lead Source value.
This distinction is useful in cases where, for example, you have some old Leads that have a Lead Source of "Partner Referral", which you regard as your
original source of truth; and you want to deduplicate any new Leads against those existing old Leads. In such a case, you would use the Group Selection
setting rather than the filter in the Deduped Records tab.
The Master Rules tab
Within each group of duplicates that is found, you would need to identify one record as the master record
– the one record from
the group that will be retained after merging duplicates. The data from the other "non-master" records in that group will be merged into the master, and the non-master records will be
deleted. (They will still be available in the recycle bin, until you empty the recycle bin.) The Master Rules tab lets you set up a few rules
via which the dedupe tool automatically identifies a default master record in each duplicate group it finds. This way, you do not need to inspect every single
duplicate group to identify a master record in it.
The Master Rules tab, when empty, appears as in the screen shot below. It contains sections for master rule identification, tie breaking, field
overrides, and field combining.
(Note that the master rules section will not appear if you have selected a CSV file as record source, as noted above.)
Managing master rules
To specify a new master rule, click the "Add rule" button in the master record section, as shown in the animation below. This displays the Add rule
popup. Here, you can select a particular field, associate an operator and optional value with it, and specify a score. And the master record in each
group is the one with the highest score. Ties are broken according to what you specify in the "Break ties in favor of.." choice. Here are some examples,
as in the animation below:
- Select the Contact's Birthdate field, select "is newest" as operator, and give it a score of 1. The rule then stipulates that in each group of
duplicates, the record with the highest value of the Birthdate field receives a score of 1.
- Select the Contact's Lead Source field, select "equals" as operator and "Purchased List", and give it a score of 1. This rule gives additional weight to records
that have "Purchased List" as Lead Source value in them.
You can remove a rule by clicking the red x button at the left of the rule. (Click the image below to replay the animation.)
In some cases, you might find that your scoring rules produce an ambiguity, in that there is more than one master record with the same score. For
example, your rules might yield a duplicate group containing 5 records, three of which have a score of 10, and two have a score of 15. The latter two
would both qualify to be master, and the dedupe tool would have to use the tie-breaking rule to determine which one is master. And in such situations,
it is sometimes better to simply not merge the group with such an ambiguity, and then make another pass with a more refined set of master rules, perhaps
in a separate merge action.
To cater to situations like this, there is the "Master record score must be unique" checkbox.
If you check this box, the tool will avoid identifying master records for any groups that have the above ambiguity. So you won't see the master row
checked in such groups. You can proceed with clicking the Merge All button, which will only merge the groups for which master records have been
identified. (Note, though, that if you can still manually identify a master in such a group by manually clicking a row in it.)
As noted above, the master record is the one that is retained after deduplication, and the non-masters are deleted. In some situations, you would want
to retain some of the data in the non-master records, by carrying that data into the retained master record. For example:
- You might want the Lead Source field to always contain the value from the oldest record in the group (so that you can retain the very first source
where you got the Contact).
- You might want the Email field to always be from the most recently modified record, so that you can retain the Contact's most current email
To delete a field override, you simply click the red x button at the left of the rule.
The animation below shows the above two override rules being added. (Click the image to replay the animation.)
In addition, you can check the "Overwrite nulls in master" to ensure that, if there is a field that is empty in the master but filled in a
non-master, the non-master value is copied over to the master during deduplication. This ensures that you don't lose data.
When you merge the data from a non-master into a master record, the non-master is deleted. But you might want to retain some of the data from the
deleted non-masters by combining them into corresponding master record fields. The "Combine fields" panel enables this feature.
The "Combine fields" panel shows a list of all the fields of type text, rich text (HTML), multi-picklist or checkbox. If you select a field in this list, the data
for that field found in the non-master records will be combined into the master record's field as follows:
- Text fields: The contents of each selected non-master text field will be concatenated into the corresponding master text field, separated by line breaks.
- Rich text fields: The contents of each selected non-master rich text field will be concatenated into the corresponding master text field, separated by HTML
<br/> tags (HTML line breaks).
- Multi-picklist fields: If a multi-picklist value is checked in one of the non-masters, it becomes checked in the master. Thus the master
ends up containing the "union" of all the picklist choices in the master and the non-masters.
- Check boxes: If the field is checked in one of the non-masters, it becomes checked in the master. Thus the master field ends up with the OR
value of that field across the master and non-master records.
After setting up a Dedupe Action as described above, click the Find button. This extracts the data records from Salesforce, applies the parameters in
the Dedupe Action, and displays a grid below, showing the duplicates. You can also click the Save and Find button to save the Dedupe Action
so that it appears in the actions list.
The Deduplication Results list
The deduplication results are shown in a list of groups, with each group captioned with the "signature" of the group (more about signatures below) and the number of records in the
group. At the top left of the list is shown the total number of duplicates and the number of groups. At the left of each row, there is an icon to open
the record in Salesforce, and a score indicator showing the records score under the current master rules in the merge action.
To select a master record in the group, simply click the row for the master, and the row will appear in bold, with a check mark at left. If you
want to un-select it, simply click the row again. You can select at most one row in a group as master.
When you hover your mouse over the caption row for the group, two buttons appear in the row, for removing the group and for custom merge. Clicking the
remove button simply removes the entire group from consideration for merging, but has no effect on the data records. By clicking that button, you're
telling the dedupe tool to ignore that entire group.
Exporting deduplication results
The Export button at the top of the results list includes two options: "Grid data" and "All data".
Both options produce CSV files as results. The Grid data option produces a CSV file containing the duplicate records with signature information, which
can be used by a Domain Expert in your organization to vet the correctness of the duplicate information. The "All data" option produces a full backup of
all the records in the result list, including all the fields of each record. This can be useful as a safeguard in case of errors during the merge, so
that you can restore your data to the state before the merge.
A "custom merge" is a one-off merge of a single group of duplicates. To perform a custom merge, you first select a master click the custom merge button within the duplicate
group caption. This produces a popup such as the one below.
The leftmost column in the popup shows the modifiable fields in the object. (Non-modifiable fields aren't shown, since you can't merge data into them
anyway.) The subsequent columns show the data in the master and non-master records of
the group, in that order. Each row in the popup corresponds to a single field of the object. Within each row, if there is a discrepancy between the
master's data and the data in any of the non-master records, the master cell is highlighted in green, indicating the value which will be used for the
merge. If you want one of the non-master values to be used for a particular field, simply click its value, and it will be highlighted.
After having selected the values you want to merge, click the Merge button. This will merge the data in the non-master records into the master,
and delete the non-master records. Also, the group you merged will be removed from the results list.
You can also merge, in one click, all the groups for which master records have been identified. To do this, click the Merge All button at the top of the
list. This produces the popup shown below.
Notice that this popup will merge only the groups for which master records have been selected. If you have set up masters rule for the Dedupe Action,
the results list will automatically have a master in each group. But you can still de-select masters on a per-group basis before performing a mass
Using "small" batches
There is a Salesforce API limitation that requires merges to be done in batches of at most 200 groups. So the Apsona Dedupe add-on breaks down your record groups
into such batches, so that you as an end-user are not affected by such a limit. Normally, the Dedupe add-on code tries to use 200 as the batch
size. But there are cases where, for example, a back-end trigger has not been properly bulkified and is unable
to tolerate large batches, so you might see error messages with merges failing because of trigger errors. For such situations, you might try checking
the "Use small batches" box. Doing so will reduce Apsona's batch size from 200 to 5, and that is usually adequate to circumvent such trigger
issues. Using smaller batch size can, however, result in more API calls being used.
When merging duplicates in the Account, Contact or Lead object, the add-on uses
Salesforce's native API for merging records
. But for merging other objects, it uses a custom
merging algorithm, because there is no Salesforce native API available for other objects. Some consequences of this approach should be pointed out:
During the merge process, child records of non-master records must be automatically re-parented to the new master record. For instance, when merging
Campaign records, any Campaign Member child records of non-master Campaigns must become children of the new master record. But if the child is part
of a master-detail relationship, we cannot simply set the lookup field in the child record to point to the new master, since the lookup field in a
master-detail relationship is not modifiable. So the algorithm copies the child record, makes the copy become a child of the new master, and deletes the
old child record. This means that the Created and Last Modified Date fields of the child records cannot be carried over from the old child
Additionally, some errors can occur when reparenting child records, usually from business logic implemented via triggers or validation rules. For
instance, when merging Campaign records, you might see errors saying something like "the Contact or Lead is already a member of the Campaign." This
can happen because, before the merge, the same Contact was a member of both the master and a non-master Campaign, so the merge causes more than one
Campaign Member record for the master with the same Contact. Such errors are reported in CSV file available for download when the merge is complete.
When finding duplicates, match functions let you decide what it means for two field values to "match." For example, the "Exact" match function requires
its two fields to match exactly, while the "Case-insensitive" match function treats two values as the same if the differ only in upper or lower
case. There is a range of match functions available for use in different situations, and we describe them here.
are available and can be applied in addition to match functions. Contractions are of two types:
- Alphanumerics only, which removes all characters other than alphabets, numerals, the & (ampersand) sign and spaces. This contraction is
useful where, for example, you wish to catch duplicates which differ only in punctuation characters, e.g., you want to treat
100 Main St.,
Apt. 100 as a duplicate of
100 Main St Apt 100.
- Fuzzy matching, which essentially treats two values as the same if they sound the same when pronounced. This contraction uses the
well-known metaphone algorithm
to decide whether two values match.
Additionally, you can check the "Word Order Ignored" box if you want two values to be treated the same if they differ only in word order, e.g., you
wish to treat
County of Santa Clara
as a duplicate of
Santa Clara County
||Allow only exact matches.
||Allow matches only if the two values differ only in case (i.e., in upper- or lower-case).
||Treat the values as street addresses, in which commonly-used address abbreviations are treated as equivalent. This function also ignores
case (upper or lower case) of the values. For example, these two values are treated as the same:
100 Main Street, Apartment 100
100 Main St Apt 100
Below is the list of words treated
as equivalent to their corresponding abbreviations.
|Street address - relaxed
||Similar to Street address, except that two values that differ by the presence or absence of one of the aabbreviations are also treated as equivalent. For example, unlike the plain "Street address" function, the relaxed version treats these values as matching:
1002 El Camino Avenue
1002 El Camino Ave.
1002 El Camino
|Zip code - 5 or 9 digit
||Match the two values if they differ only in that one is a 5-digit number and the other is a 9-digit number, ignoring spaces and dashes.
||Treat the values as company names, in which commonly-used company-related words are treated as equivalent. This function also ignores
case (upper or lower case) of the values. For example, these values are treated as the same:
Below is the list of words treated as equivalent to their corresponding abbreviations.
|Company name - relaxed
||Similar to Company name, except that two values that differ by the presence or absence of one of the above words are treated as the same. For example, these values are treated as matching:
Acme Financial Corporation
|First 2 words
||Treat the values as matching if their first two words match, ignoring case.
|First 3 words
||Treat the values as matching if their first three words match, ignoring case.
|First 4 words
||Treat the values as matching if their first four words match, ignoring case.
||Treat the values as matching if their first letter matches, ignoring case. This can be useful when, for example, you want to use
the first letter of a Contact's first name along with the last name as the match key, thereby treating
John Smith and
J. Smith as the same person.
|First 2 letters
||Treat the values as matching if their first two letters match, ignoring case.
|First 3 letters
||Treat the values as matching if their first three letters match, ignoring case.
||Treat the values as matching if they refer to the same North-American nickname, e.g., Bob is treated as equivalent to Robert.
|North American Phone
||Allow matches only on 10-digit numbers, ignoring punctuation characters.
|Salesforce Record ID
||Allow matches if both the values are 15-character or 18-character alphanumeric values.
||Match if the values have the same top-level email domain. For example,
email@example.com will all match because they all share the same top-level email domain
||Allow matches if both the values look like the same website URL, ignoring the
Matching CSV data
This function can match an existing input CSV data file against the records in any Salesforce object, where "matching" involves using the match functions
and features available in the above Single Table Dedupe functionality. The result of such a match is a new CSV file that contains all the rows and columns
of the input file, along with data from the records that were matched.
As a simple example, suppose you have a CSV file that looks like this:
|First name||Last name|
Suppose further that you have two records similar to Bob Smith in your Contact object, and none matching Dave Jones.
By matching this data against your Contact object's records using the Match CSV function, you would be able to produce a result like this:
|First name||Last name||Contact.Contact ID||Contact.Full Name||Contact.Account Name|
|Bob||Smith||003000000000012345||Robert A. Smith||Acme Drilling Corporation|
|Bob||Smith||00300000000001a345||Rob Smith||Alpha Pneumatics Inc.|
In the above list that shows the result of the match, the two records matching Bob Smith are both available. The original row for Dave Jones is also
available but has no associated data, thus indicating no match.
The matching process
To begin the process, navigate to either the console view ("All records") or the Tabular View for your object in Apsona, and access the Tools menu.
- Click the "Match CSV..." menu item under the Dedupe/Match menu under Tools. (Once you have a license available for the Dedupe/Match add-on, this menu item is
available for all objects, both native and custom.)
The Match CSV popup appears. The example below illustrates the process for the Contact object, but the process is similar for other objects as well.
- Click the "Choose File" button and select a CSV file (from your computer) that you wish to use for matching.
The popup then shows a preview of the CSV file, with just the first four records, so that you can decide whether that is the correct CSV file you want to work with.
- Click the Next button. This produces Step 2 of the popup.
In this step, select the object's fields you want to match against each CSV column. (You do not need to match all the CSV columns, just the ones that you
want to use. Any unmatched column data will be retained in the output.) Also select any necessary match functions, contractions or "Word Order Ignored"
options you might need. These aspects work in the same manner as described above with the dedupe function.
- Click Next to produce Step 3 of the popup. Select any fields you want to include in the output file.
- You can include fields from the current object as well as parent objects. In the example of the above screen shot, fields from the Account and
the Primary Affiliation Account (available from a second lookup on the Contact record) are available, among others.
- You can include a field even if it is used as a match field in step 2. This is sometimes needed, e.g., when you are doing a fuzzy match on
First Name, you want the actual matched First Name to be available in the output.
- Click Next. This produces the last step of the popup, in which you set filter criteria to restrict the records against which to match your CSV
data. If you don't restrict the records in any way (e.g., you set a filter that says Created Date exists, which would be true for all records), you
would match against every record in your object, and that can cause slow-down if the object contains lots of records. So the filtering mechanism is
just a check to limit the range against which to match.
- Click Finish. This runs the Match CSV action and produces a "Download results" link. Clicking that link produces a CSV file like the one below.
- The first column of the file contains the "signature" that was used to match the CSV data against Salesforce records. As with deduplication, two
records with the same signature are deemed matching records. In the above example, the first two records matched "Katherine Smith" and "Kathy
Smith" which both turned out to be the same contact, so the Contact ID in column E is the same
- If the same CSV row matches more than one data row based on the match criteria you selected, then the field values from all the matching data
rows will be included in the CSV output. In the above example, notice that rows 8 and 9 have the same first and last name, but match two different
contacts. So the CSV rows are repeated.
- The second column indicates the number of matches found. This rows 8 and 9 show a count of 2, as do rows 6 and 7.