Matching similar string values
The Fuzzy Matching action allows you to compare string values from one table with string values from a lookup table to match similar values. An output table is returned containing likely matches, along with match scores from 0 to 1. You can:
- Define the matching logic to be used by selecting one of the three Fuzzy Matching Algorithms -
- Jaro-Winkler measures similarity between two strings, and is best suited for shorter strings such as city or country names
- Smith-Waterman compares segments of strings; for example, bc vs. abcde is a perfect match score of 1, but cb vs. abcde is a match score of 0.5
- Levenshtein is best to use for longer strings, as the formula for match score is 1 - #Single Character Edits/Length of longer string
- Overlap is effective across a variety of data characteristics; it breaks strings into all possible 3-character segments and then produces a score between 0-1 based on the formula -
#matching segments / minimum number of segments of the two strings being compared
- Define pass-through fields to be returned as additional columns in the output table
- Adjust the fuzzy matching options to change the number of matches written to the output table, or alter the logic for case sensitivity and special characters
On the Configuration tab:
-
Use Input Fields to identify fields from the input table to use for pass-through and matching.
-
Drag fields from Input Fields to define pass-through fields.
-
Drag fields from Input Fields to define the field to match.
-
Use Lookup Fields to identify fields from the lookup table to use for pass-through and matching.
-
Drag fields from Lookup Fields to define pass-through fields.
-
Drag fields from Lookup Fields to define the field to match.
-
Select matching options.
-
Select the algorithm to be used for matching.
Possible applications include:
- Find a match for missing/invalid string values in a “master” table
- Compare two lists of string values to find those values without a match in the other list
- Find matches for values that may be a result of user error or some other type of error
Use fuzzy matching to compare similar strings
- Drag the Fuzzy Matching icon onto the design surface.
- On the Connections tab
, enter a Name and a Description to identify the action, then choose input tables and output options for the record matching.
Input -
- For Database Connection, select the database from which the tables originate, or choose New Database Connection to establish a new connection to another database.
- For Input Table Name, select the table containing the candidate records.
- For Lookup Table Name, select the table containing the master records.
Output -
- For Database Connection, select the database to which the output is written, or choose New Database Connection to establish a new connection to another database.
- Enter or select the Output Table Name depending on the Output Mode you choose.
- For Create/overwrite a table, enter the name of the table being created or overwritten, and indicate whether the table should be deleted after the macro or workflow has executed.
- For Use an existing table, select the table to be used from the drop-down, and indicate whether the rows in the table should be deleted before the new data is written to the table.
- On the Configuration tab
:
Fields to Match -
- Drag from Input Fields to the Input placeholder icon. This field must be a string data type.
- Drag from Lookup Fields to the Lookup placeholder icon. This field must be a string data type.
- Optionally, drag any input pass-through fields from Input Fields to the Input Pass Through Fields area.
- Optionally, drag any lookup pass-through fields from Lookup Fields to the Lookup Pass Through Fields area.
Options -
- For Threshold, indicate the minimum percentage of a matching data characteristics required. The default is 85 (for 85%).
- For Max Matches, enter the maximum number of matches within the threshold to be written to the output table.
- Indicate whether to Ignore Case and/or Ignore Special Characters. If selected, these characters do not affect the threshold of the match.
Fuzzy Matching Algorithm -
Choose the matching algorithm to be used. Select each one for a description of the logic it employs:
- Jaro-Winkler
- Smith-Waterman
- Levenshtein
- Overlap
Fuzzy Matching data example
In the following example, fuzzy matching is used to find city names that might include misspellings:
Last modified: Thursday December 19, 2024