Rules

Understanding Feature Extraction Rules

Feature Extraction Rules and Score Definitions are used by data source collectors to locate low-scoring data and are an integral part of ActiveNav Cloud.

These rules are updated from time to time. Updates expand the existing ruleset and do not overwrite baseline or custom rules.

Search by Name

Copying your Feature Extraction / Scores Rules

Although updates do not overwrite your rules, it is good practice to back up your Feature Extraction Rules and Score Definitions. Please see the KBAs Configuring Score Definitions and Feature Extraction Rule View for further information.

Understanding Rules

Rules are written in JSON (JavaScript Object Notation), a human-readable text format commonly used for configuration files.

Scoring Definition

NOTE: When writing feature extraction rules, the file must be valid JSON. If it is not, the upload of custom rules will fail. Many JSON validators are available online to check your file.
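
As an alternative to an online validator, a file can be checked locally before upload. The following is a minimal sketch using Python's standard json module; the function name check_json and the sample strings are illustrative only and are not part of ActiveNav Cloud. It checks JSON syntax only, not whether the rule itself is acceptable to ActiveNav Cloud.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the position where parsing failed
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, None


# Illustrative samples: the second has a trailing comma, which JSON forbids
good = '{"dataElementName": "Example", "searchable": true}'
bad = '{"dataElementName": "Example", "searchable": true,}'

print(check_json(good))
print(check_json(bad))
```

To check a file rather than a string, read its contents first, for example with pathlib.Path("rules.json").read_text(encoding="utf-8"), where rules.json is a placeholder file name.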

You can find out more about JSON at w3schools.com: What is JSON?

A feature extraction rule is made up of the following fields:

  • dataElementName - Element name; this must be unique.
  • definitions - Contains a mapping for the following values: 
    • baseConfidence - A number between 0 and 100; the higher the number, the higher the confidence is that the data found matches the feature extraction rule.
    • isAnchor - If the element is an anchor, it can be associated with other objects to form a higher confidence; values are true or false. This is used for Proximity filters.
    • name - The name of the element.
    • type - Either Keyword, e.g., "Passport", or Pattern, e.g., "0000-0000-0".
    • required - Determines whether this object is required for a match; values are true or false.
    • validationLabels - e.g., "CreditCard", "UsItin", "UsSsn".
    • values - The keywords or patterns that can be associated with the element, e.g., "Credit Card", "3[47][0-9]{13}".
    • targetContent - This is where the search is targeted, e.g., FileName, FilePath, Content. There can be one or more values.
  • filter - Determines whether the rule searches everything or only the proximity of another element; values are either Any or Proximity.
  • searchable - Accepts true or false and determines whether the rule can be made available for target searches.
  • tags - Each extractor can accept one or more tags, such as "Country/Great Britain" or "Direct Identifiers/National IDs".
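
To show how the fields above might fit together, here is a hedged sketch that builds a rule as a Python dictionary and prints it as JSON. The field names are taken from the list above, but the exact nesting (for example, definitions as a list of objects), the element names, and all values are assumptions made for illustration. Compare against a rule exported from ActiveNav Cloud before relying on this layout.

```python
import json

# Hypothetical rule. Field names come from the list above; the nesting
# and every value are assumptions, not a confirmed ActiveNav Cloud schema.
rule = {
    "dataElementName": "Example Credit Card",  # must be unique
    "definitions": [
        {
            "name": "Card number",
            "type": "Pattern",
            "baseConfidence": 50,
            "isAnchor": True,
            "required": True,
            "validationLabels": ["CreditCard"],
            "values": ["3[47][0-9]{13}"],
            "targetContent": ["Content"],
        },
        {
            "name": "Card keyword",
            "type": "Keyword",
            "baseConfidence": 25,
            "isAnchor": False,
            "required": False,
            "validationLabels": [],
            "values": ["Credit Card"],
            "targetContent": ["Content", "FileName"],
        },
    ],
    "filter": "Proximity",
    "searchable": True,
    "tags": ["Country/Great Britain"],  # placeholder tag reused from the list above
}

# json.dumps converts Python True/False to JSON true/false
rule_json = json.dumps(rule, indent=2)
print(rule_json)
```

Building the rule in code and serializing it avoids hand-typed syntax slips such as trailing commas or unquoted keys.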

Confidence Calculation Explanation 

The calculated confidence moves closer to 100 as more elements of the feature extraction rule are found.

If, for example, the baseConfidence for an element is 50 and the baseConfidence for an element in proximity is 25, then when the two are found together, ActiveNav Cloud's algorithm calculates a confidence that is increased because the elements are in proximity to each other.
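
ActiveNav Cloud's actual formula is not given here. Purely to illustrate the idea of a score that approaches 100 as more elements are found, the sketch below uses one common way of combining confidences (sometimes called a "noisy-OR"). This is an assumption chosen for illustration, not ActiveNav Cloud's algorithm, and the function name combined_confidence is hypothetical.

```python
def combined_confidence(base_confidences):
    """Combine 0-100 confidences so the result approaches, but never exceeds, 100.

    Illustrative model only (noisy-OR); NOT ActiveNav Cloud's actual formula.
    """
    remaining_doubt = 1.0
    for confidence in base_confidences:
        # Each element found removes a share of the remaining doubt
        remaining_doubt *= 1 - confidence / 100
    return 100 * (1 - remaining_doubt)


print(combined_confidence([50]))      # 50.0
print(combined_confidence([50, 25]))  # 62.5
```

Under this illustrative model, finding the second element raises the score from 50 to 62.5, and each further element moves it closer to 100 without passing it.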