The Feature Extraction and Scores Rules are used by Data Source Collectors to locate low-scoring data and are an integral part of ActiveNav Cloud. These rules are updated from time to time. When an update is available, a new Rule Pack message appears on the Data Sources page, as shown below:
WARNING: Updating the Rule Pack overwrites all existing rules, including any custom rules that you have added to the Feature Extraction or Scores rules. Make copies of your Feature Extraction and Scores rules before you update the Rule Pack.
Copying your Feature Extraction / Scores Rules
It is important to back up your Feature Extraction and Scores Rules before applying a Rule Pack update. Any changes you have made to your Feature Extraction / Scores Rules will be overwritten when the update is applied.
Copy the Rules, taking care to retain the exact formatting (spacing and syntax are critical in YAML), then paste them into a text editor, e.g., MS Word or Notepad, and save the file. Once the Rule Pack has been updated, you can add back any custom rules that you created.
WARNING: When using a text editor to save the YAML, it is very important that the encoding is set to UTF-8 to ensure that the format/syntax/spacing is retained (Word and Notepad are set to UTF-8 by default, but other text editors may not be).
| Command | Windows | Mac |
| Select All | CTRL + A | CMD + A |
| Copy | CTRL + C | CMD + C |
| Paste | CTRL + V | CMD + V |
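If you prefer to script the backup rather than rely on a text editor's default encoding, the following is a minimal Python sketch that writes the copied rule text to a file explicitly encoded as UTF-8. The function name, folder, and file label are illustrative assumptions and are not part of ActiveNav Cloud.

```python
from datetime import datetime
from pathlib import Path


def save_rules_backup(rules_text: str, folder: str, label: str) -> Path:
    """Write copied rule text to a timestamped .yaml file, explicitly as UTF-8.

    `folder` and `label` are hypothetical names chosen for this sketch.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(folder) / f"{label}-{stamp}.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    # newline="" stops Python from translating line endings,
    # so the text is stored exactly as it was pasted.
    with open(target, "w", encoding="utf-8", newline="") as handle:
        handle.write(rules_text)
    return target
```

Calling `save_rules_backup(copied_text, "rule-backups", "feature-extraction")` before each Rule Pack update keeps a dated copy of each rule set, which makes it easier to find the custom rules you need to add back afterwards.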
Understanding Rules
Rules are written in YAML, a human-readable data serialization language commonly used for configuration files.
NOTE: When writing rules in YAML, spacing is critical. For example, if the filter property in the screenshot below were not aligned with the definitions property, it would cause an error. You can validate your YAML with this tool: https://onlineyamltools.com/validate-yaml
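The alignment requirement in the note can also be checked locally before pasting rules back in. The sketch below is not a full YAML validator. It only flags tab characters used for indentation (which YAML does not allow) and reports when two named properties sit at different indentation levels. The function name and the sample layout are assumptions made for illustration and may not match the exact structure of your rules.

```python
def check_alignment(yaml_text: str, first_key: str, second_key: str) -> list[str]:
    """Return a list of indentation problems found in yaml_text."""
    problems = []
    indents = {}
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        leading = line[: len(line) - len(stripped)]
        if "\t" in leading:
            problems.append(f"line {number}: tab used for indentation")
        for key in (first_key, second_key):
            if stripped.startswith(key + ":"):
                # Record the indentation of the first occurrence only.
                indents.setdefault(key, len(leading))
    if first_key in indents and second_key in indents:
        if indents[first_key] != indents[second_key]:
            problems.append(
                f"{first_key} is indented {indents[first_key]} spaces "
                f"but {second_key} is indented {indents[second_key]}"
            )
    return problems


# Hypothetical rule layout: filter is one space too deep.
sample = (
    "Address NA Keywords:\n"
    "  definitions:\n"
    "    - name: example\n"
    "   filter: Any\n"
)
for problem in check_alignment(sample, "definitions", "filter"):
    print(problem)
```

A check like this catches the most common copy-and-paste damage, but the online validator above or a full YAML parser is still the better test of whether the file is valid.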
- Address NA Keywords - Element name; this must be a unique name.
- definitions - Contains a mapping for the following values:
- baseConfidence - A number between 0 and 100; the higher the number, the higher the confidence is that the data found matches the feature extraction rule.
- isAnchor - If the element is an anchor, it can be associated with other objects to form a higher confidence; values are true or false. This is used for Proximity filters.
- name - The name of the element.
- type - Either Keyword, e.g., "Passport", or Pattern, e.g., "0000-0000-0".
- required - Determines if this object is required for a match, true/false.
- validationLabels - e.g., "CreditCard", "UsItin", "UsSsn".
- values - These are keywords/patterns that can be associated with the element, e.g., "Credit Card", "3[47][0-9]{13}".
- targetContent - This is where the search is targeted, e.g., FileName, FilePath, Content. There can be one or more values.
- filter - Determines whether the rule searches everything or only the proximity of another element; values are either Any or Proximity.
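The value constraints listed above can be expressed as a small sanity check. The following Python sketch assumes a definition has already been loaded into a plain dictionary. The function names are hypothetical, and each property is checked only when present, because the list above does not say which properties are mandatory.

```python
ALLOWED_TYPES = {"Keyword", "Pattern"}
ALLOWED_FILTERS = {"Any", "Proximity"}


def check_definition(definition: dict) -> list[str]:
    """Return problems found in one definition (hypothetical dict shape)."""
    problems = []
    if "baseConfidence" in definition:
        value = definition["baseConfidence"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 100:
            problems.append("baseConfidence must be a number between 0 and 100")
    for flag in ("isAnchor", "required"):
        if flag in definition and not isinstance(definition[flag], bool):
            problems.append(f"{flag} must be true or false")
    if "type" in definition and definition["type"] not in ALLOWED_TYPES:
        problems.append("type must be Keyword or Pattern")
    return problems


def check_filter(value: str) -> bool:
    """True when the filter value is one of the two documented options."""
    return value in ALLOWED_FILTERS
```

For example, `check_definition({"baseConfidence": 150, "type": "Keyword"})` reports the out-of-range confidence, and `check_filter("Proximity")` returns `True`.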
Confidence Calculation Explanation
The calculated confidence approaches 100 as more elements of the feature rule are found.
If, for example, the baseConfidence for an element is 50 and the baseConfidence for an element in proximity is 25, then when the two are found together, ActiveNav Cloud's algorithm calculates a combined confidence that is higher because of the proximity of the elements.
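ActiveNav Cloud's actual formula is not given here. Purely as an illustration of a calculation that rises as elements are found together and approaches, but never exceeds, 100, the Python sketch below combines confidences by multiplying the remaining uncertainty of each element. This is an assumed stand-in chosen for the example, not the product's algorithm, and the function name is hypothetical.

```python
def combine_confidences(confidences: list[float]) -> float:
    """Illustrative combination only; NOT ActiveNav Cloud's actual algorithm.

    Each confidence (0-100) removes its share of the remaining uncertainty,
    so the result grows with every element found and never exceeds 100.
    """
    remaining = 1.0
    for confidence in confidences:
        remaining *= 1.0 - confidence / 100.0
    return 100.0 * (1.0 - remaining)


# The two elements from the example above: 50 and 25.
print(combine_confidences([50, 25]))  # 62.5
```

Under this assumed formula the combined value (62.5) is higher than either element alone, which matches the behaviour described above. The real figures produced by ActiveNav Cloud may differ.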