
Metadata Classification

A guide on metadata classification and its usage

Metadata Classification groups objects into categories so you can rapidly make decisions based on a Discovery Only pass of the inventory. It does not require Feature Extraction hits or Scoring to generate meaningful insights into the object inventory.

ActiveNav Cloud includes a library of predefined classification rules, all of which can be edited, removed, or extended with additional rules.

Uploading and Exporting Classification Rules

The options for uploading and exporting classification rules can be found under the Business Rules menu option.

Classification rule files must be in JSON format and conform to the required schema. The following section provides a technical guide to creating custom rule sets, including rule examples.

Available Metadata

The following object properties (referred to as fields in the rules) are currently available for use within Metadata Classification.

| Metadata | Value Type | Comments |
| --- | --- | --- |
| FileName | String | |
| Extension | String | |
| Owner | String | Retrieval of the owner is optional; the tenant must have configured it via System Settings for this field to be populated. Even when it is configured, a repository may not return an owner. |
| ObjectType | String | |
| Path | String | This is the human-readable path and only evaluates to the parent container, i.e. the file name and extension are their own fields, as shown above. |
| Size | Number | The file size in bytes. Note that some repositories may not provide a size. |
| ModifiedDate | Date | |
| CreatedDate | Date | |
| AccessedDate | Date | Of the currently supported repository types, only Windows File Share can reliably populate this, assuming the target share is configured accordingly. (Windows does not have to track last accessed.) |

As an example, assume we have an object on a Windows File Share with the UNC path \\an.share\TestData\Classification testing\exampleFile.docx. The metadata for that file may look like:

Metadata Value
File Name exampleFile
Extension docx
Owner azuread\josephbloggs
Object Type Word Processing
Path Windows File Share|an.share|TestData|Classification testing
Size 57344  
Modified Date 2025-03-05 07:45:09
Created Date 2024-11-24 13:32:27
Accessed Date 2025-03-05 07:45:09
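
To make the relationship between the UNC path and the FileName, Extension, and Path fields concrete, the following is a minimal Python sketch that reproduces the values in the example above. The function name split_unc_path and the repository_label parameter are hypothetical, and the splitting logic is inferred from this one example only; it is not the product's ingestion code.

```python
def split_unc_path(unc_path, repository_label="Windows File Share"):
    """Derive FileName, Extension, and Path from a UNC path (illustrative only)."""
    # Drop the empty segments produced by the leading double backslash.
    segments = [s for s in unc_path.split("\\") if s]
    *containers, leaf = segments

    # Split the leaf on its last dot; a leaf without a dot has no extension.
    name, dot, ext = leaf.rpartition(".")
    if not dot:
        name, ext = leaf, ""

    return {
        "FileName": name,
        "Extension": ext,
        # The Path field stops at the parent container and is pipe-delimited.
        "Path": "|".join([repository_label, *containers]),
    }


fields = split_unc_path(r"\\an.share\TestData\Classification testing\exampleFile.docx")
print(fields["Path"])  # Windows File Share|an.share|TestData|Classification testing
```

This is why a rule that targets a folder should use the Path field with pipe separators, while a rule that targets a file name or extension should use FileName or Extension.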

Available Operators for Metadata Types

Generic operators that apply to all data types:

  • Equals
  • NotEquals

Operators that apply to String data type only:

  • Contains
  • DoesNotContain
  • StartsWith
  • EndsWith
  • LongerThan
  • ShorterThan
  • Regex

Operators that apply to Date data type only:

  • BeforeThisDate
  • AfterThisDate
  • OlderThanDays
  • YoungerThanDays

Operators that apply to Number data type only:

  • BiggerThan
  • BiggerThanOrEquals
  • SmallerThan
  • SmallerThanOrEquals
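
The operator lists above can be expressed as a small lookup that checks whether an operator is permitted for a given field before a rule file is uploaded. This is a sketch only: the field types and operator names come from this article, while the names FIELD_TYPES, TYPE_OPERATORS, and operator_allowed are hypothetical.

```python
# Value type of each field, as listed under Available Metadata.
FIELD_TYPES = {
    "FileName": "String", "Extension": "String", "Owner": "String",
    "ObjectType": "String", "Path": "String", "Size": "Number",
    "ModifiedDate": "Date", "CreatedDate": "Date", "AccessedDate": "Date",
}

# Operators that apply to every data type.
GENERIC_OPERATORS = {"Equals", "NotEquals"}

# Operators that apply to one data type only.
TYPE_OPERATORS = {
    "String": {"Contains", "DoesNotContain", "StartsWith", "EndsWith",
               "LongerThan", "ShorterThan", "Regex"},
    "Date": {"BeforeThisDate", "AfterThisDate", "OlderThanDays", "YoungerThanDays"},
    "Number": {"BiggerThan", "BiggerThanOrEquals", "SmallerThan", "SmallerThanOrEquals"},
}


def operator_allowed(field, operator):
    """Return True when the operator is valid for the field's value type."""
    value_type = FIELD_TYPES.get(field)
    if value_type is None:
        return False  # unknown field name
    return operator in GENERIC_OPERATORS or operator in TYPE_OPERATORS[value_type]


print(operator_allowed("Size", "BiggerThan"))  # True
print(operator_allowed("Size", "Contains"))    # False
```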

Normalization

To keep metadata classification in line with how data is ingested into ActiveNav Cloud, some normalization rules are applied to data types when they are 'null' or meet certain thresholds.

| Data Type | Condition | Normalization |
| --- | --- | --- |
| String | Is null | Is treated as an empty string "" |
| Number | Is null | Is treated as 0 |
| Date | Is less than 1753-01-01 | Is treated as 1753-01-01 |
| Date | Is not null | Only the 'date' part is used, time will be treated as 00:00:00 |
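
The normalization table can be read as the following Python sketch. The function name normalize and the constant MIN_DATE are hypothetical, and the handling of a null date is left open because the table does not describe it.

```python
from datetime import date, datetime

MIN_DATE = date(1753, 1, 1)  # lowest date kept, per the table above


def normalize(value, value_type):
    """Apply the documented normalization to a single metadata value."""
    if value_type == "String":
        return "" if value is None else value
    if value_type == "Number":
        return 0 if value is None else value
    if value_type == "Date":
        if value is None:
            return None  # assumption: the table does not cover null dates
        # Keep only the date part; the time is treated as 00:00:00.
        day = value.date() if isinstance(value, datetime) else value
        # Dates earlier than 1753-01-01 are raised to 1753-01-01.
        return max(day, MIN_DATE)
    raise ValueError(f"Unknown value type: {value_type}")


print(normalize(datetime(2025, 3, 5, 7, 45, 9), "Date"))  # 2025-03-05
```

One practical consequence is that a rule such as Size Equals 0 also matches objects whose repository did not return a size at all.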

JSON Schema Details

The classification schema is made up of three node types:

  • Classification
  • Rule
  • RuleGroup

There is no root node: the root element is an array, and this array can only contain objects of type Classification.

A schema file is available that can be used with various applications to validate that a JSON file conforms to the expected format. Please contact your customer success representative to obtain this schema.

The basic schema would be formatted similarly to this:

[
  {
    "name": "My Classification",
    "nodeType": "Classification",
    "comment": "An optional field to describe my classification",
    "rules": [
      {
        "nodeType": "Rule",
        "field": "Path",
        "operator": "Contains",
        "value": "invoices"
      },
      {
        "nodeType": "RuleGroup",
        "combineWithPrevious": "OR",
        "rules": [
          ...
        ]
      }
    ]
  }
]

Classification Nodes

Classification nodes are grouping mechanisms for child entities. A Classification node can have child Rule and RuleGroup nodes or child Classification nodes, but not both. If the Classification node is at level 4 in the hierarchy of Classification nodes, then it can no longer have child classifications. Please note that this depth limitation is not validated by the schema file mentioned above but will be validated on upload to ActiveNav Cloud.

| Property | Type | Is Mandatory | Notes |
| --- | --- | --- | --- |
| name | string | Yes | |
| comment | string | No | This is not displayed in the UI or classification results; it is purely for helpful direction when viewing the raw JSON config. Cannot be more than 200 characters. |
| childClassifications | array | One of childClassifications or rules must exist | Cannot exist if rules are specified. Can only contain entities that represent a Classification node. |
| rules | array | One of childClassifications or rules must exist | Cannot exist if childClassifications are specified. Can only contain entities that represent a Rule or RuleGroup node. |
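
Because the depth limit is only checked on upload, a local pre-check can save a round trip. The following Python sketch checks the Classification node constraints described above. The function name check_classifications is hypothetical, and it assumes that top-level Classification nodes count as level 1, which this article does not state explicitly.

```python
MAX_DEPTH = 4      # a level 4 node can no longer have child classifications
MAX_COMMENT = 200  # maximum comment length in characters


def check_classifications(nodes, depth=1, trail=""):
    """Return a list of problems found in a list of Classification nodes."""
    problems = []
    for node in nodes:
        name = node.get("name")
        where = f"{trail}/{name or '<unnamed>'}"
        if not name:
            problems.append(f"{where}: name is mandatory")
        if len(node.get("comment", "")) > MAX_COMMENT:
            problems.append(f"{where}: comment is longer than {MAX_COMMENT} characters")

        has_children = "childClassifications" in node
        has_rules = "rules" in node
        if has_children == has_rules:
            # Either both are present or neither is present.
            problems.append(f"{where}: exactly one of childClassifications or rules is required")

        if has_children:
            if depth >= MAX_DEPTH:
                problems.append(f"{where}: level {depth} nodes cannot have child classifications")
            else:
                problems.extend(
                    check_classifications(node["childClassifications"], depth + 1, where)
                )
    return problems


config = [{"name": "Invoices", "rules": [], "childClassifications": []}]
print(check_classifications(config))
```

An empty list means none of these checks failed; it does not replace validation against the official schema file.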

Rule Nodes

A rule node contains a single operation that will be actioned against the specified object metadata field.

| Property | Type | Is Mandatory | Notes |
| --- | --- | --- | --- |
| nodeType | string | Yes | Will always be Rule. |
| combineWithPrevious | string | Yes, when not the first entity in the rules array | Accepted values are AND or OR. |
| negate | boolean | No | Will default to false. When true, the result of the operation will be inverted. |
| comment | string | No | This is not displayed in the UI or classification results; it is purely for helpful direction when viewing the raw JSON config. Cannot be more than 200 characters. |
| field | string | Yes | Must be one of the following values: Path, FileName, Extension, ObjectType, CreatedDate, ModifiedDate, AccessedDate, Size, Owner. |
| operator | string | Yes | Must follow the rules described earlier in this article regarding which operators apply to which metadata type. Must be one of the following values: Equals, NotEquals, Contains, DoesNotContain, StartsWith, EndsWith, LongerThan, ShorterThan, Regex, BeforeThisDate, AfterThisDate, OlderThanDays, YoungerThanDays, BiggerThan, BiggerThanOrEquals, SmallerThan, SmallerThanOrEquals. |
| value | string | Yes | The value that will be used with the operator when performing the action against the object metadata field. |

Rule Groups

A Rule Group is a mechanism to provide parentheses to the logical operation being carried out against an Object during classification. This provides the ability to enforce logical order when evaluation occurs.

For example, the following operation would be three Rule nodes in the JSON file:

Extension Equals "docx" OR Extension Equals "txt" AND Size > 11

Because the AND is applied before the OR, this always evaluates to true if the extension is docx, and only evaluates to true for a txt file if the Size is also greater than 11. While that may be the intention, it can be difficult to understand or visualize when writing more complex operations.

By placing parentheses as in the example below:

( Extension Equals "docx" OR Extension Equals "txt" ) AND Size > 11

the meaning of the operation changes. It now only evaluates to true for a docx or txt object if the object size is also greater than 11. The above example would now be a single Rule Group node (with two child Rule nodes representing the extension operations) and a single Rule node (representing the Size operation).

| Property | Type | Is Mandatory | Notes |
| --- | --- | --- | --- |
| nodeType | string | Yes | Will always be RuleGroup. |
| combineWithPrevious | string | Yes, when not the first entity in the rules array | Accepted values are AND or OR. |
| negate | boolean | No | Will default to false. When true, the result of the operation will be inverted. |
| comment | string | No | This is not displayed in the UI or classification results; it is purely for helpful direction when viewing the raw JSON config. Cannot be more than 200 characters. |
| rules | array | Yes | Can only contain entities that represent a Rule or RuleGroup node. |
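
The difference between the flat and the grouped form can be checked with a small evaluator. This Python sketch is not the product's evaluation engine: it assumes AND binds more tightly than OR (which matches the outcome described above), implements only the Equals and BiggerThan operators, and uses hypothetical helper names.

```python
def matches(rule, obj):
    """Evaluate one Rule node; only the operators used in the example are covered."""
    actual = obj[rule["field"]]
    if rule["operator"] == "Equals":
        result = actual == rule["value"]
    elif rule["operator"] == "BiggerThan":
        result = actual > float(rule["value"])
    else:
        raise NotImplementedError(rule["operator"])
    return result != rule.get("negate", False)  # invert when negate is true


def evaluate(nodes, obj):
    """Combine sibling nodes, letting AND bind more tightly than OR."""
    or_terms = []  # each entry holds the result of one run of AND-ed nodes
    for node in nodes:
        if node["nodeType"] == "RuleGroup":
            value = evaluate(node["rules"], obj) != node.get("negate", False)
        else:
            value = matches(node, obj)
        if or_terms and node.get("combineWithPrevious") == "AND":
            or_terms[-1] = or_terms[-1] and value
        else:
            or_terms.append(value)
    return any(or_terms)


def rule(field, operator, value, combine=None):
    """Build a Rule node (helper for this example only)."""
    node = {"nodeType": "Rule", "field": field, "operator": operator, "value": value}
    if combine:
        node["combineWithPrevious"] = combine
    return node


flat = [
    rule("Extension", "Equals", "docx"),
    rule("Extension", "Equals", "txt", "OR"),
    rule("Size", "BiggerThan", "11", "AND"),
]
grouped = [
    {"nodeType": "RuleGroup", "rules": flat[:2]},
    rule("Size", "BiggerThan", "11", "AND"),
]

small_docx = {"Extension": "docx", "Size": 5}
print(evaluate(flat, small_docx))     # True: docx matches regardless of size
print(evaluate(grouped, small_docx))  # False: the size check now applies to docx too
```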

Simple Example Configuration Files

No Classification Rules

If you do not want to use Classification, upload an empty configuration file. This is simply an empty array, as shown below:

[]

Single Classification Without Rule Group

The following shows a single Classification node containing three rules representing the expression below:

Extension Equals "docx" OR Extension Equals "txt" AND Size > 11
[
  {
    "name": "RulesOnly",
    "comment": "An example classification with rules only",
    "rules": [
      {
        "nodeType": "Rule",
        "field": "Extension",
        "operator": "Equals",
        "value": "docx"
      },
      {
        "nodeType": "Rule",
        "combineWithPrevious": "OR",
        "field": "Extension",
        "operator": "Equals",
        "value": "txt"
      },
      {
        "nodeType": "Rule",
        "combineWithPrevious": "AND",
        "field": "Size",
        "operator": "BiggerThan",
        "value": "11"
      }
    ]
  }
]

Single Classification Using Rule Group

The following shows a single Classification node containing a RuleGroup (with two child rules) and a single rule, representing the expression below:

( Extension Equals "docx" OR Extension Equals "txt" ) AND Size > 11
[
  {
    "name": "RulesGroupExample",
    "comment": "An example classification using a RuleGroup",
    "rules": [
      {
        "nodeType": "RuleGroup",
        "rules": [
          {
            "nodeType": "Rule",
            "field": "Extension",
            "operator": "Equals",
            "value": "docx"
          },
          {
            "nodeType": "Rule",
            "combineWithPrevious": "OR",
            "field": "Extension",
            "operator": "Equals",
            "value": "txt"
          }
        ]
      },
      {
        "nodeType": "Rule",
        "combineWithPrevious": "AND",
        "field": "Size",
        "operator": "BiggerThan",
        "value": "11"
      }
    ]
  }
]

Multiple Classification Nodes

The following config produces the hierarchy below (with simple rules).

  • M365 Large Files
    • Large SharePoint
    • Large OneDrive
  • Old Files
    • Ancient Files
    • Older Files
[
  {
    "name": "M365 Large Files",
    "comment": "An example using more classification nodes",
    "childClassifications": [
      {
        "name": "Large SharePoint",
        "rules": [
          {
            "nodeType": "Rule",
            "field": "Path",
            "operator": "StartsWith",
            "value": "SharePoint Online|",
            "comment": "is sharepoint"
          },
          {
            "nodeType": "Rule",
            "combineWithPrevious": "AND",
            "field": "Size",
            "operator": "BiggerThan",
            "value": "2147483648",
            "comment": "is > 2gb"
          }
        ]
      },
      {
        "name": "Large OneDrive",
        "rules": [
          {
            "nodeType": "Rule",
            "field": "Path",
            "operator": "StartsWith",
            "value": "OneDrive|",
            "comment": "is OneDrive"
          },
          {
            "nodeType": "Rule",
            "combineWithPrevious": "AND",
            "field": "Size",
            "operator": "BiggerThan",
            "value": "524288000",
            "comment": "is > 500 mb"
          }
        ]
      }
    ]
  },
  {
    "name": "Old Files",
    "childClassifications": [
      {
        "name": "Ancient Files",
        "rules": [
          {
            "nodeType": "Rule",
            "field": "ModifiedDate",
            "operator": "BeforeThisDate",
            "value": "1980-01-01"
          }
        ]
      },
      {
        "name": "Older Files",
        "rules": [
          {
            "nodeType": "Rule",
            "field": "ModifiedDate",
            "operator": "BeforeThisDate",
            "value": "2000-01-01"
          },
          {
            "nodeType": "Rule",
            "combineWithPrevious": "AND",
            "field": "ModifiedDate",
            "operator": "AfterThisDate",
            "value": "1980-01-01"
          }
        ]
      }
    ]
  }
]
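
The Size thresholds in this example are byte counts based on binary multiples (1 KB = 1024 bytes). The following Python sketch shows the arithmetic; the constant names and the size_value helper are hypothetical and exist only to avoid hand-typing large numbers.

```python
KIB = 1024        # bytes in one kilobyte, as used in the examples
MIB = 1024 * KIB  # bytes in one megabyte
GIB = 1024 * MIB  # bytes in one gigabyte


def size_value(amount, unit):
    """Return a byte count as a string, since rule values are strings."""
    return str(amount * unit)


print(size_value(2, GIB))    # 2147483648, the "is > 2gb" threshold
print(size_value(500, MIB))  # 524288000, the "is > 500 mb" threshold
```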

Between Values

There is no explicit between operator. To achieve that functionality, use two rules: one with a greater than (or equal) operator and one with a less than (or equal) operator.

The config below represents the following expression:

Size >= 10 KB AND Size <= 1 MB
[
  {
    "name": "Between Example",
    "rules": [
      {
        "nodeType": "Rule",
        "field": "Size",
        "operator": "BiggerThanOrEquals",
        "value": "10240"
      },
      {
        "nodeType": "Rule",
        "combineWithPrevious": "AND",
        "field": "Size",
        "operator": "SmallerThanOrEquals",
        "value": "1048576"
      }
    ]
  }
]
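
If many range checks are needed, the pair of rules can be generated rather than written by hand. This Python sketch builds the same two Rule nodes as the config above; the function name between_rules is hypothetical and it is intended for the Size field, since the operators used apply to the Number data type only.

```python
import json


def between_rules(field, low, high):
    """Build an inclusive range check as two Rule nodes joined with AND."""
    return [
        {
            "nodeType": "Rule",
            "field": field,
            "operator": "BiggerThanOrEquals",
            "value": str(low),
        },
        {
            "nodeType": "Rule",
            "combineWithPrevious": "AND",
            "field": field,
            "operator": "SmallerThanOrEquals",
            "value": str(high),
        },
    ]


config = [
    {"name": "Between Example", "rules": between_rules("Size", 10 * 1024, 1024 * 1024)}
]
print(json.dumps(config, indent=2))
```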

Negation

Rule and RuleGroup nodes allow the logical operation to be negated. However, because a Rule node is a single operation and (currently) all operators have an opposite operation type, negation is probably most valuable on a RuleGroup. It lets a potentially complex operation, or one that is easier to read a certain way, be negated as a whole without working out the inverse of each individual element.

Negation is simply a flag on the Rule or RuleGroup node as shown in the example below:

[
  {
    "name": "Negation Example",
    "comment": "An example classification using a negated RuleGroup",
    "rules": [
      {
        "nodeType": "RuleGroup",
        "comment": "This rulegroup is negated",
        "negate": true,
        "rules": [
          {
            "nodeType": "Rule",
            "field": "Extension",
            "operator": "Equals",
            "value": "docx"
          },
          {
            "nodeType": "Rule",
            "combineWithPrevious": "OR",
            "field": "Extension",
            "operator": "Equals",
            "value": "txt"
          }
        ]
      },
      {
        "nodeType": "Rule",
        "combineWithPrevious": "AND",
        "field": "Size",
        "operator": "BiggerThan",
        "value": "11"
      }
    ]
  }
]

This takes the previous RuleGroup example and negates the group. Whereas the previous example required the extension to be docx or txt and the size to be greater than 11, it now requires the extension to be neither docx nor txt and the size to be greater than 11.
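
Negating a RuleGroup is equivalent to inverting each rule inside it and swapping OR for AND (De Morgan's law). The following Python sketch confirms this for the extension check; both function names are hypothetical and the comparison is plain string equality, which may differ from how the product compares values.

```python
def negated_group(extension):
    """NOT ( Extension Equals "docx" OR Extension Equals "txt" )"""
    return not (extension == "docx" or extension == "txt")


def expanded(extension):
    """Extension NotEquals "docx" AND Extension NotEquals "txt" """
    return extension != "docx" and extension != "txt"


# Both forms agree for every sample value.
for extension in ("docx", "txt", "pdf", ""):
    assert negated_group(extension) == expanded(extension)

print(negated_group("pdf"))   # True
print(negated_group("docx"))  # False
```

With only two rules either form is readable; the negate flag becomes more useful as the group grows, because the group itself stays unchanged.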

Metadata Classification Appendix

Some repositories have nuances and oddities in the way they return certain data fields. A list of these anomalies can be found in the Metadata Classification Appendix article.