Contact us Blog

The Dedupe and Match Add-on

The Dedupe and Match add-on enables you to Both these functions include powerful matching criteria such as fuzzy matching, company name or address reductions, and first name matching.

Installation

To install the add-on into your Salesforce org:

  1. Ensure that you have Apsona for Salesforce already installed. If not, please install it from our AppExchange listing.
  2. Visit our AppExchange listing for the Dedupe and Match add-on, click the Get it Now button, and install it into your org.
  3. Click Setup - Installed Packages - Manage Licenses - Apsona Dedupe and Match, and assign licenses to your users.
  4. Click the Apsona tab in your org, and within Apsona, click Settings - Clear cache so that the newly-installed add-on becomes available.

Single-table deduplication

The deduplication process

The general process followed when deduplicating records in a particular object (e.g., the Contact object) is an iterative one. In each iteration, you would set up some dedupe parameters, find the duplicates based on those parameters, and merge them. You would then revise the dedupe parameters to catch more duplicates, and merge the result. For example, as a first iteration, you might identify all Contact records that contain the exact same name and email fields, and merge those together. In the next iteration, you might loosen the First Name match function to match on loose First Names (e.g., treat Bob and Robert as the same), and then find duplicates and merge them. The process is iterated with progressively looser match criteria until very few duplicates remain.

Each iteration produces a list of duplicate groups, based on the criteria you set up. And within each such duplicate group, you would need to identify the so-called master record - the one that will be retained. After the merge is carried out:

Accessing the Dedupe tool

The single-table dedupe function is available via the Tools menu in every object within Apsona. For instance, to deduplicate Contact records, click the Contacts menu in Apsona, then click Tools - Dedupe/Match - Dedupe.

The list of available "Dedupe Actions" is then displayed. Each dedupe action contains all the parameters needed to specify how the action works, such as the names and match criteria for the fields being matched, the additional fields to show, and the filter conditions to use for the records being deduplicated. At this point, you can click the New button to create a new Dedupe Action, or click one of the actions in the list to work with that action.

Editing a Dedupe Action

When creating or editing a Dedupe Action, you specify four groups of parameters: Each of these groups appears in its own tab in the Dedupe Action Editor.

The Deduped Records tab

In this tab, you set the "record source" - the means of determining the records which are subject to deduplication. You have two choices, either a filter or a CSV file, and you can choose via the two radio buttons in this tab (see screen shot below).

Filter as record source

If you select the "Filtered list of records" button, you are asserting that you want to deduplicate only the records meeting the filter you set up, and other records will not even be considered for deduplication. Such filtering is useful to improve performance. For example, suppose that a list of Leads were added recently, all of whose addresses are in California, and you therefore don't wish to consider non-California Lead records. In this case, you might set a filter that says "Mailing State is California." The dedupe tool will restrict its scope to just the records in this filter, i.e., records that don't meet the filter conditions will not be considered for deduplication.

You can set filter criteria in much the same way filters are set everywhere else in Apsona. See this article for details about the filtering feature.

Using this filter enables you to speed up the deduplication process. But if you really want to include every record in your object, then simply set the filter criterion to "Created Date is not empty".

CSV file as record source

There are some situations where you have identified, using other methods external to this tool, the list of records to be deduplicated. Specifically, you would have available a list of record IDs of master and non-master records, in the form of a CSV file. In such a situation, you do not need the dedupe tool to identify the duplicate records, you only need it to merge the records. In such a situation, you would select the "CSV file" button for the record source. When you do so, you click the file selector button that appears, and upload a CSV file containing the record IDs.

The CSV file you provide must contain two columns: The first column contains the master record ID and the second the non-master record ID. If you have a group of duplicates with more than one non-master, simply repeat the row with the same master record ID, and the other non-master ID. In the example screen shot below, notice that row 2 contains a master and non-master record ID, depicting a group of two duplicate records, one of which is master. Rows 3, 4 and 5 have the same master record ID but different non-master IDs, so they represent a group of four duplicate records, one of which (the one in the leftmost column) is master.

Available tabs

It's worth pointing out that if you select the CSV file input, you have already identified the master records, so there is no need for the Match Fields and Group Selection tabs. Those tabs will therefore only appear if you selected the Filter as record source; they will not appear if you select the CSV record source. Also, in the Master & Field Rules tab (see below), the Master Rules section will only appear with the Filter as record source, but not with the CSV file as record source.

Changing the record source

You can set the record source only when creating a new dedupe action. Once you have saved the action, you can no longer change its record source: only the button corresponding to the action's record source will be enabled, and the other button will be disabled.

The Match Fields tab

This tab lets you select the fields in your object based on which you wish to determine duplicates. For example, when deduplicating Contact records, you might select the First Name, Last Name and Email Address fields to indicate that two contact records that match on those three fields should be considered duplicates. To select a match field, click the check box next to it in the Available fields panel on the left of the tab. Doing so will produce a row in the Selected fields area on the right. You can then select the match criteria for the field, including the match function, the contraction, and whether word order should be ignored. Match criteria are described later in this article.

The Additional Fields tab

Once you have selected some match fields, you would also want to inspect some additional fields from each identified duplicate record so that you can decide which of the duplicates you wish to keep. To this end, you select one or more fields in the Additional Fields tab, and the values of those fields will appear in the deduplication results, as described below.

The additional fields tab lets you select fields from the current object as well as parent objects. For example, the screen animation below shows the Contact object, which (in this example org) includes two lookups to the Account object, one named Account and the other named Primary Affiliation. So both lookups are available in the selector. To select an additional field, click the lookup and object you wish to use, and then click the field within that object. The field now appears in the list on the right. (Click the image to replay the animation.)

The Group Selection tab

This tab lets you control the groups of records selected for deduplication, based on at least one record in each group being filtered in some way. For instance, as in the screen shot below, you can require that at least one record in each group being considered must have a Lead Source of Partner Referral.

Notice that this setting is different from the filter setting in the Deduped Records tab, in that the filter setting limits the entire range of records being deduped. For example, if you set a filter term requiring Lead Source to be Partner Referral, then any record that doesn't have such a Lead Source will be omitted altogether. In other words, a filter setting of "Lead Source equals Partner Referral" in the Deduped Records tab means that none of the records with other Lead Source values will even be considered for deduplication; but the same setting in the Group Selection tab means only that at least one record in each group must have that Lead Source value.

This distinction is useful in cases where, for example, you have some old Leads that have a Lead Source of "Partner Referral", which you regard as your original source of truth; and you want to deduplicate any new Leads against those existing old Leads. In such a case, you would use the Group Selection setting rather than the filter in the Deduped Records tab.

The Master Rules tab

Within each group of duplicates that is found, you would need to identify one record as the master record – the one record from the group that will be retained after merging duplicates. The data from the other "non-master" records in that group will be merged into the master, and the non-master records will be deleted. (They will still be available in the recycle bin, until you empty the recycle bin.) The Master Rules tab lets you set up a few rules via which the dedupe tool automatically identifies a default master record in each duplicate group it finds. This way, you do not need to inspect every single duplicate group to identify a master record in it.

The Master Rules tab, when empty, appears as in the screen shot below. It contains two sections, one for rules to identify the master record, and the second for specifying overrides for particular field values. (Note that the master rules section will not appear if you have selected a CSV file as record source, as noted above.)

Managing master rules

To specify a new master rule, click the "Add rule" button in the master record section, as shown in the animation below. This displays the Add rule popup. Here, you can select a particular field, associate an operator and optional value with it, and specify a score. And the master record in each group is the one with the highest score. Ties are broken according to what you specify in the "Breka ties in favor of.." choice. Here are some examples, as in the animation below: You can remove a rule by clicking the red x button at the left of the rule. (Click the image below to replay the animation.)

Field-level overrides

As noted above, the master record is the one that is retained after deduplication, and the non-masters are deleted. In some situations, you would want to retain some of the data in the non-master records, by carrying that data into the retained master record. For example:

To delete a field override, you simply click the red x button at the left of the rule.

The animation below shows the above two override rules being added. (Click the image to replay the animation.)

In addition, you can check the "Overwrite nulls in master" to ensure that, if there is a field that is empty in the master but filled in a non-master, the non-master value is copied over to the master during deduplication. This ensures that you don't lose data.

Finding duplicates

After setting up a Dedupe Action as described above, click the Find button. This extracts the data records from Salesforce, applies the parameters in the Dedupe Action, and displays a grid below, showing the duplicates. You can also click the Save and Find button to save the Dedupe Action so that it appears in the actions list.

The Deduplication Results list

The deduplication results are shown in a list of groups, with each group captioned with the "signature" of the group (more about signatures below) and the number of records in the group. At the top left of the list is shown the total number of duplicates and the number of groups. At the left of each row, there is an icon to open the record in Salesforce, and a score indicator showing the records score under the current master rules in the merge action.

To select a master record in the group, simply click the row for the master, and the row will appear in bold, with a check mark at left. If you want to un-select it, simply click the row again. You can select at most one row in a group as master.

When you hover your mouse over the caption row for the group, two buttons appear in the row, for removing the group and for custom merge. Clicking the remove button simply removes the entire group from consideration for merging, but has no effect on the data records. By clicking that button, you're telling the dedupe tool to ignore that entire group.

Exporting deduplication results

The Export button at the top of the results list includes two options: "Grid data" and "All data". Both options produce CSV files as results. The Grid data option produces a CSV file containing the duplicate records with signature information, which can be used by a Domain Expert in your organization to vet the correctness of the duplicate information. The "All data" option produces a full backup of all the records in the result list, including all the fields of each record. This can be useful as a safeguard in case of errors during the merge, so that you can restore your data to the state before the merge.

Custom merge

A "custom merge" is a one-off merge of a single group of duplicates. To perform a custom merge, you first select a master click the custom merge button within the duplicate group caption. This produces a popup such as the one below. The leftmost column in the popup shows the modifiable fields in the object. (Non-modifiable fields aren't shown, since you can't merge data into them anyway.) The subsequent columns show the data in the master and non-master records of the group, in that order. Each row in the popup corresponds to a single field of the object. Within each row, if there is a discrepancy between the master's data and the data in any of the non-master records, the master cell is highlighted in green, indicating the value which will be used for the merge. If you want one of the non-master values to be used for a particular field, simply click its value, and it will be highlighted.

After having selected the values you want to merge, click the Merge button. This will merge the data in the non-master records into the master, and delete the non-master records. Also, the group you merged will be removed from the results list.

Mass merge

You can also merge, in one click, all the groups for which master records have been identified. To do this, click the Merge All button at the top of the list. This produces the popup shown below. Notice that this popup will merge only the groups for which master records have been selected. If you have set up masters rule for the Dedupe Action, the results list will automatically have a master in each group. But you can still de-select masters on a per-group basis before performing a mass merge.

Match functions

When finding duplicates, match functions let you decide what it means for two field values to "match." For example, the "Exact" match function requires its two fields to match exactly, while the "Case-insensitive" match function treats two values as the same if the differ only in upper or lower case. There is a range of match functions available for use in different situations, and we describe them here.

Contractions are available and can be applied in addition to match functions. Contractions are of two types: Additionally, you can check the "Word Order Ignored" box if you want two values to be treated the same if they differ only in word order, e.g., you wish to treat County of Santa Clara as a duplicate of Santa Clara County.
Function Description
Exact Allow only exact matches.
Case-insensitive Allow matches only if the two values differ only in case (i.e., in upper- or lower-case).
Street address Treat the values as street addresses, in which commonly-used address abbreviations are treated as equivalent. This function also ignores case (upper or lower case) of the values. For example, these two values are treated as the same:
                100 Main Street, Apartment 100
                100 Main St Apt 100
              

Below is the list of words treated as equivalent to their corresponding abbreviations.
apartmentapt
avenueav
boulevardblvd
circlecir
courtct
highwayhwy
laneln
placepl
streetst
northn
souths
easte
westw
northeastne
southeastse
northwestnw
southwestsw
suiteste
Street address - relaxed Similar to Street address, except that two values that differ by the presence or absence of one of the aabbreviations are also treated as equivalent. For example, unlike the plain "Street address" function, the relaxed version treats these values as matching:
                1002 El Camino Avenue
                1002 El Camino Ave.
                1002 El Camino
              
Zip code - 5 or 9 digit Match the two values if they differ only in that one is a 5-digit number and the other is a 9-digit number, ignoring spaces and dashes.
Company name Treat the values as company names, in which commonly-used company-related words are treated as equivalent. This function also ignores case (upper or lower case) of the values. For example, these values are treated as the same:
                Acme Corporation
                acme corp.
                Acme corp
              

Below is the list of words treated as equivalent to their corresponding abbreviations.
&and
incorporatedinc
hospitalhosp
corporationcorp
llcllc
limitedltd
financialfin
companyco
servicesvc
servicessvc
Company name - relaxed Similar to Company name, except that two values that differ by the presence or absence of one of the above words are treated as the same. For example, these values are treated as matching:
                Acme Financial Corporation
                acme fin
                acme corp
              
First 2 words Treat the values as matching if their first two words match, ignoring case.
First 3 words Treat the values as matching if their first three words match, ignoring case.
First 4 words Treat the values as matching if their first four words match, ignoring case.
First letter Treat the values as matching if their first letter matches, ignoring case. This can be useful when, for example, you want to use the first letter of a Contact's first name along with the last name as the match key, thereby treating John Smith and J. Smith as the same person.
First 2 letters Treat the values as matching if their first two letters match, ignoring case.
First 3 letters Treat the values as matching if their first three letters match, ignoring case.
First name Treat the values as matching if they refer to the same North-American nickname, e.g., Bob is treated as equivalent to Robert.
North American Phone Allow matches only on 10-digit numbers, ignoring punctuation characters.
Salesforce Record ID Allow matches if both the values are 15-character or 18-character alphanumeric values.

Matching CSV data

This function can match an existing input CSV data file against the records in any Salesforce object, where "matching" involves using the match functions and features available in the above Single Table Dedupe functionality. The result of such a match is a new CSV file that contains all the rows and columns of the input file, along with data from the records that were matched.

As a simple example, suppose you have a CSV file that looks like this:
First nameLast name
BobSmith
DaveJones
Suppose further that you have two records similar to Bob Smith in your Contact object, and none matching Dave Jones. By matching this data against your Contact object's records using the Match CSV function, you would be able to produce a result like this:
First nameLast nameContact.Contact IDContact.Full NameContact.Account Name
BobSmith003000000000012345Robert A. SmithAcme Drilling Corporation
BobSmith00300000000001a345Rob SmithAlpha Pneumatics Inc.
DaveJones
In the above list that shows the result of the match, the two records matching Bob Smith are both available. The original row for Dave Jones is also available but has no associated data, thus indicating no match.

The matching process

To begin the process, navigate to either the console view ("All records") or the Tabular View for your object in Apsona, and access the Tools menu.
  1. Click the "Match CSV..." menu item under the Dedupe/Match menu under Tools. (Once you have a license available for the Dedupe/Match add-on, this menu item is available for all objects, both native and custom.) The Match CSV popup appears. The example below illustrates the process for the Contact object, but the process is similar for other objects as well.
  2. Click the "Choose File" button and select a CSV file (from your computer) that you wish to use for matching. The popup then shows a preview of the CSV file, with just the first four records, so that you can decide whether that is the correct CSV file you want to work with.
  3. Click the Next button. This produces Step 2 of the popup. In this step, select the object's fields you want to match against each CSV column. (You do not need to match all the CSV columns, just the ones that you want to use. Any unmatched column data will be retained in the output.) Also select any necessary match functions, contractions or "Word Order Ignored" options you might need. These aspects work in the same manner as described above with the dedupe function.
  4. Click Next to produce Step 3 of the popup. Select any fields you want to include in the output file. Notice that:
    • You can include fields from the current object as well as parent objects. In the example of the above screen shot, fields from the Account and the Primary Affiliation Account (available from a second lookup on the Contact record) are available, among others.
    • You can include a field even if it is used as a match field in step 2. This is sometimes needed, e.g., when you are doing a fuzzy match on First Name, you want the actual matched First Name to be available in the output.
  5. Click Next. This produces the last step of the popup, in which you set filter criteria to restrict the records against which to match your CSV data. If you don't restrict the records in any way (e.g., you set a filter that says Created Date exists, which would be true for all records), you would match against every record in your object, and that can cause slow-down if the object contains lots of records. So the filtering mechanism is just a check to limit the range against which to match.
  6. Click Finish. This runs the Match CSV action and produces a "Download results" link. Clicking that link produces a CSV file like the one below.
    • The first column of the file contains the "signature" that was used to match the CSV data against Salesforce records. As with deduplication, two records with the same signature are deemed matching records. In the above example, the first two records matched "Katherine Smith" and "Kathy Smith" which both turned out to be the same contact, so the Contact ID in column E is the same
    • If the same CSV row matches more than one data row based on the match criteria you selected, then the field values from all the matching data rows will be included in the CSV output. In the above example, notice that rows 8 and 9 have the same first and last name, but match two different contacts. So the CSV rows are repeated.
    • The second column indicates the number of matches found. This rows 8 and 9 show a count of 2, as do rows 6 and 7.