Metadata Classification
A guide on metadata classification and its usage
Metadata Classification groups objects into categories so you can rapidly make decisions based on a Discovery Only pass of the inventory. It does not require Feature Extraction hits or Scoring to generate meaningful insights into the object inventory.
ActiveNav Cloud includes a library of predefined classification rules, all of which can be edited, removed, or extended with additional rules.
Uploading and Exporting Classification Rules
The options for uploading and exporting classification rules can be found under the Business Rules menu option. 
Classification rule files must be in JSON format and conform to the required schema. The following section provides a technical guide to creating custom rule sets, including rule examples.
Available Metadata
The following object properties (referred to as fields in the rules) are currently available for use within Metadata Classification.
| Metadata | Value Type | Comments |
|---|---|---|
| FileName | String | |
| Extension | String | |
| Owner | String | Retrieval of the owner is optional; the tenant must have configured this retrieval via System Settings for the field to be populated. Regardless of the tenant configuration, a repository may not return an owner. |
| ObjectType | String | |
| Path | String | This is the human-readable path and only evaluates to the parent container, i.e. the file name and extension are their own fields, as shown above. |
| Size | Number | The file size in bytes. Note that some repositories may not provide a size. |
| ModifiedDate | Date | |
| CreatedDate | Date | |
| AccessedDate | Date | Of the currently supported repository types, only Windows File Share can reliably populate this, and only if the target share is configured accordingly. (Windows is not required to track last-accessed times.) |
As an example, assume we have an object on a Windows File Share with the UNC path \\an.share\TestData\Classification testing\exampleFile.docx. The metadata for that file may look like:
| Metadata | Value |
|---|---|
| File Name | exampleFile |
| Extension | docx |
| Owner | azuread\josephbloggs |
| Object Type | Word Processing |
| Path | Windows File Share\|an.share\|TestData\|Classification testing |
| Size | 57344 |
| Modified Date | 2025-03-05 07:45:09 |
| Created Date | 2024-11-24 13:32:27 |
| Accessed Date | 2025-03-05 07:45:09 |
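To make the mapping above concrete, the following sketch splits the example UNC path into the Path, FileName and Extension values shown in the table. The `split_unc` helper is hypothetical, and the idea that Path is prefixed with the repository label "Windows File Share" is inferred from this single example rather than from a documented algorithm.

```python
def split_unc(unc_path: str, repository_label: str = "Windows File Share") -> dict:
    """Split a UNC path into Path, FileName and Extension (illustrative only)."""
    # Drop the leading backslashes, then split on the Windows separator.
    segments = unc_path.lstrip("\\").split("\\")
    *containers, leaf = segments
    # Everything after the last dot is treated as the extension.
    stem, dot, extension = leaf.rpartition(".")
    if not dot:  # no dot at all: the whole leaf is the file name
        stem, extension = leaf, ""
    return {
        # Assumption: Path is the repository label plus the parent containers,
        # joined with the pipe character seen in the example table.
        "Path": "|".join([repository_label, *containers]),
        "FileName": stem,
        "Extension": extension,
    }


fields = split_unc(r"\\an.share\TestData\Classification testing\exampleFile.docx")
print(fields)
```

This is useful mainly when writing Path rules: a `StartsWith` or `Contains` value has to use the pipe-separated form, not the backslash-separated UNC form.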
Available Operators for Metadata Types
Generic operators that apply to all data types:
- Equals
- NotEquals
Operators that apply to String data type only:
- Contains
- DoesNotContain
- StartsWith
- EndsWith
- LongerThan
- ShorterThan
- Regex
Operators that apply to Date data type only:
- BeforeThisDate
- AfterThisDate
- OlderThanDays
- YoungerThanDays
Operators that apply to Number data type only:
- BiggerThan
- BiggerThanOrEquals
- SmallerThan
- SmallerThanOrEquals
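The operator lists above, combined with the value types from the Available Metadata table, can be expressed as a small lookup. This is a minimal sketch for checking a field/operator pair before upload; the `operator_allowed` function and the table names are hypothetical and are not part of ActiveNav Cloud.

```python
# Value type of each field, taken from the Available Metadata table.
FIELD_TYPES = {
    "FileName": "String", "Extension": "String", "Owner": "String",
    "ObjectType": "String", "Path": "String", "Size": "Number",
    "ModifiedDate": "Date", "CreatedDate": "Date", "AccessedDate": "Date",
}

# Operators that apply to every data type.
GENERIC_OPERATORS = {"Equals", "NotEquals"}

# Operators that apply to one data type only.
TYPE_OPERATORS = {
    "String": {"Contains", "DoesNotContain", "StartsWith", "EndsWith",
               "LongerThan", "ShorterThan", "Regex"},
    "Date": {"BeforeThisDate", "AfterThisDate", "OlderThanDays", "YoungerThanDays"},
    "Number": {"BiggerThan", "BiggerThanOrEquals", "SmallerThan", "SmallerThanOrEquals"},
}


def operator_allowed(field: str, operator: str) -> bool:
    """Return True if the operator is valid for the field's value type."""
    field_type = FIELD_TYPES[field]  # raises KeyError for an unknown field
    return operator in GENERIC_OPERATORS or operator in TYPE_OPERATORS[field_type]


print(operator_allowed("Size", "BiggerThan"))  # True
print(operator_allowed("Size", "Contains"))    # False
```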
Normalization
To keep metadata classification in line with how data is ingested into ActiveNav Cloud, some normalization rules are applied to values that are null or that meet certain thresholds.
| Data Type | Condition | Normalization |
|---|---|---|
| String | Is null | Is treated as an empty string "" |
| Number | Is null | Is treated as 0 |
| Date | Is less than 1753-01-01 | Is treated as 1753-01-01 |
| Date | Is not null | Only the 'date' part is used, time will be treated as 00:00:00 |
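The table above can be read as a small function. The sketch below is one possible reading, useful for predicting what value a rule will see; `normalize` is a hypothetical name, and because the table does not say how a null date is treated, the sketch simply leaves it as `None`.

```python
from datetime import date, datetime

MIN_DATE = date(1753, 1, 1)  # lower bound from the normalization table


def normalize(value, value_type: str):
    """Apply the normalization rules for one metadata value (illustrative only)."""
    if value_type == "String":
        return "" if value is None else value
    if value_type == "Number":
        return 0 if value is None else value
    if value_type == "Date":
        if value is None:
            return None  # assumption: the table does not cover null dates
        # datetime is a subclass of date, so test for it first and drop the time.
        day = value.date() if isinstance(value, datetime) else value
        return max(day, MIN_DATE)
    raise ValueError(f"unknown value type: {value_type}")


print(normalize(None, "String") == "")                   # True
print(normalize(datetime(2025, 3, 5, 7, 45, 9), "Date"))  # 2025-03-05
print(normalize(date(1601, 1, 1), "Date"))                # 1753-01-01
```

One practical consequence: a rule such as `Size Equals 0` cannot distinguish an empty file from a file whose repository did not report a size.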
JSON Schema Details
The classification schema is made up of three node types:
- Classification
- Rule
- RuleGroup
There is no root node: the root element is an array, and this array can only contain objects of type Classification.
A schema file is available that can be used with various applications to validate that a JSON file conforms to the expected format. Please contact your customer success representative to obtain this schema.
The basic schema would be formatted similarly to this:
[
{
"name": "My Classification",
"nodeType": "Classification",
"comment": "An optional field to describe my classification",
"rules": [
{
"nodeType": "Rule",
"field": "Path",
"operator": "Contains",
"value": "invoices"
},
{
"nodeType": "RuleGroup",
"combineWithPrevious": "OR",
"rules": [
...
]
}
]
}
]
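Before validating against the full schema file, a quick structural check can confirm that the root element is an array of objects, as described above. This is a minimal sketch using only the Python standard library; `load_classifications` is a hypothetical helper and does not replace validation against the official schema.

```python
import json


def load_classifications(text: str) -> list:
    """Parse a rule file and check that the root is an array of objects."""
    root = json.loads(text)
    if not isinstance(root, list):
        raise ValueError("the root element must be an array")
    for index, node in enumerate(root):
        if not isinstance(node, dict):
            raise ValueError(f"entry {index} must be a Classification object")
    return root


print(load_classifications('[{"name": "My Classification", "rules": []}]'))
```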
Classification Nodes
Classification nodes are grouping mechanisms for child entities. A Classification node can have child Rule/RuleGroup nodes or child Classification nodes, but not both. A Classification node at level 4 in the hierarchy of Classification nodes cannot have child classifications. Please note that this depth limitation is not validated by the schema file mentioned above, but it is validated on upload to ActiveNav Cloud.
| Property | Type | Is Mandatory | Notes |
|---|---|---|---|
| name | string | Yes | |
| comment | string | No | This is not displayed in the UI or classification results; it is purely for helpful direction when viewing the raw JSON config. Cannot be more than 200 characters. |
| childClassifications | array | One of childClassifications or rules must exist | Cannot exist if rules are specified. Can only contain entities that represent a Classification node. |
| rules | array | One of childClassifications or rules must exist | Cannot exist if childClassifications are specified. Can only contain entities that represent a Rule or RuleGroup node. |
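Because the depth limit is only checked on upload, it can be helpful to check it locally first. The sketch below walks a Classification node and reports violations of the constraints listed above. The `check_classification` function is hypothetical, and it assumes that top-level Classification nodes count as level 1.

```python
MAX_DEPTH = 4      # a level-4 node cannot have child classifications
MAX_COMMENT = 200  # maximum comment length in characters


def check_classification(node: dict, depth: int = 1) -> list:
    """Return a list of problems found in a Classification node and its children."""
    problems = []
    name = node.get("name")
    if not name:
        problems.append(f"level {depth}: 'name' is mandatory")
    label = name or "<unnamed>"
    if len(node.get("comment", "")) > MAX_COMMENT:
        problems.append(f"{label}: comment is longer than {MAX_COMMENT} characters")
    has_children = "childClassifications" in node
    has_rules = "rules" in node
    if has_children == has_rules:  # both present, or both missing
        problems.append(f"{label}: exactly one of childClassifications or rules must exist")
    if has_children and depth >= MAX_DEPTH:
        problems.append(f"{label}: level {depth} cannot have child classifications")
    for child in node.get("childClassifications", []):
        problems.extend(check_classification(child, depth + 1))
    return problems


# Build a chain that is one level too deep: L1 > L2 > L3 > L4 > L5.
node = {"name": "L5", "rules": []}
for level in (4, 3, 2, 1):
    node = {"name": f"L{level}", "childClassifications": [node]}
print(check_classification(node))
```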
Rule Nodes
A rule node contains a single operation that will be actioned against the specified object metadata field.
| Property | Type | Is Mandatory | Notes |
|---|---|---|---|
| nodeType | string | Yes | Will always be Rule. |
| combineWithPrevious | string | Yes when not the first 'entity' in the rules array | Accepted values are AND or OR. |
| negate | boolean | No | Will default to false. When true the result of the operation will be inverted. |
| comment | string | No | This is not displayed in the UI or classification results; it is purely for helpful direction when viewing the raw JSON config. Cannot be more than 200 characters. |
| field | string | Yes | Must be one of the following values: Path, FileName, Extension, ObjectType, CreatedDate, ModifiedDate, AccessedDate, Size, Owner. |
| operator | string | Yes | Must follow the rules described earlier in this article regarding which operators apply to which metadata type. Must be one of the following values: Equals, NotEquals, Contains, DoesNotContain, StartsWith, EndsWith, LongerThan, ShorterThan, Regex, BeforeThisDate, AfterThisDate, OlderThanDays, YoungerThanDays, BiggerThan, BiggerThanOrEquals, SmallerThan, SmallerThanOrEquals. |
| value | string | Yes | The value that will be used with the operator when performing the action against the object metadata field. |
Rule Groups
A Rule Group is a mechanism to provide parentheses to the logical operation being carried out against an Object during classification. This provides the ability to enforce logical order when evaluation occurs.
For example, the following operation would be three Rule nodes in the JSON file:
Extension Equals "docx" OR Extension Equals "txt" AND Size > 11
Because the AND is applied before the OR, this would always equate to true if the extension is docx, and would only equate to true for a txt if the Size is also greater than 11. While that may be the intention, it can be difficult to understand or visualize when writing more complex operations.
Parentheses can be placed as in the example below:
( Extension Equals "docx" OR Extension Equals "txt" ) AND Size > 11
The meaning of the operation has now changed: it will only equate to true for a docx or txt object if the object size is also greater than 11. The above example would now be a single RuleGroup node (with two child Rule nodes representing the extension operations) followed by a single Rule node (representing the Size operation).
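The difference between the two expressions can be checked with a small evaluator. This is a sketch under stated assumptions, not the ActiveNav Cloud implementation: it assumes AND is applied before OR (which the behaviour described above implies), handles only the two operators used in the example, and uses the hypothetical names `matches` and `evaluate`.

```python
def matches(rule: dict, obj: dict) -> bool:
    """Evaluate one Rule node; only the operators used in this example are handled."""
    actual, expected = obj[rule["field"]], rule["value"]
    if rule["operator"] == "Equals":
        return str(actual) == expected
    if rule["operator"] == "BiggerThan":
        return float(actual) > float(expected)
    raise NotImplementedError(rule["operator"])


def evaluate(nodes: list, obj: dict) -> bool:
    """Combine Rule/RuleGroup nodes so that AND binds more tightly than OR."""
    or_terms = []  # each entry holds the result of one chain of ANDed nodes
    for node in nodes:
        if node["nodeType"] == "RuleGroup":
            value = evaluate(node["rules"], obj)  # a group acts like parentheses
        else:
            value = matches(node, obj)
        if node.get("negate", False):
            value = not value
        if or_terms and node.get("combineWithPrevious") == "AND":
            or_terms[-1] = or_terms[-1] and value
        else:
            or_terms.append(value)  # first node, or the start of a new OR term
    return any(or_terms)


docx = {"nodeType": "Rule", "field": "Extension", "operator": "Equals", "value": "docx"}
txt = {"nodeType": "Rule", "combineWithPrevious": "OR",
       "field": "Extension", "operator": "Equals", "value": "txt"}
size = {"nodeType": "Rule", "combineWithPrevious": "AND",
        "field": "Size", "operator": "BiggerThan", "value": "11"}

flat = [docx, txt, size]                                      # docx OR txt AND Size > 11
grouped = [{"nodeType": "RuleGroup", "rules": [docx, txt]}, size]  # (docx OR txt) AND Size > 11

small_docx = {"Extension": "docx", "Size": 5}
print(evaluate(flat, small_docx))     # True: docx matches regardless of size
print(evaluate(grouped, small_docx))  # False: the size check now applies to docx too
```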
| Property | Type | Is Mandatory | Notes |
|---|---|---|---|
| nodeType | string | Yes | Will always be RuleGroup. |
| combineWithPrevious | string | Yes when not the first 'entity' in the rules array | Accepted values are AND or OR. |
| negate | boolean | No | Will default to false. When true the result of the operation will be inverted. |
| comment | string | No | This is not displayed in the UI or classification results; it is purely for helpful direction when viewing the raw JSON config. Cannot be more than 200 characters. |
| rules | array | Yes | Can only contain entities that represent a Rule or RuleGroup node. |
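The mandatory-property rules for Rule and RuleGroup nodes can be checked recursively before upload. The following is a minimal sketch: `check_rules` is a hypothetical helper, and it uses the property spelling found in the example configuration files in this article (`combineWithPrevious`).

```python
def check_rules(rules: list, where: str = "rules") -> list:
    """Return a list of problems found in a rules array, including nested groups."""
    problems = []
    for index, node in enumerate(rules):
        label = f"{where}[{index}]"
        node_type = node.get("nodeType")
        if node_type not in ("Rule", "RuleGroup"):
            problems.append(f"{label}: nodeType must be Rule or RuleGroup")
        # Every entity except the first must say how it combines with the previous one.
        if index > 0 and node.get("combineWithPrevious") not in ("AND", "OR"):
            problems.append(f"{label}: combineWithPrevious must be AND or OR")
        if node_type == "Rule":
            for prop in ("field", "operator", "value"):
                if prop not in node:
                    problems.append(f"{label}: '{prop}' is mandatory")
        elif node_type == "RuleGroup":
            if "rules" not in node:
                problems.append(f"{label}: 'rules' is mandatory")
            else:
                problems.extend(check_rules(node["rules"], f"{label}.rules"))
    return problems


size_rule = {"nodeType": "Rule", "field": "Size", "operator": "BiggerThan", "value": "11"}
good = [
    {"nodeType": "Rule", "field": "Extension", "operator": "Equals", "value": "docx"},
    {"nodeType": "RuleGroup", "combineWithPrevious": "AND", "rules": [size_rule]},
]
bad = [good[0], size_rule]  # second entity is missing combineWithPrevious
print(check_rules(good))
print(check_rules(bad))
```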
Simple Example Configuration Files
No Classification Rules
If you do not want to use Classification, upload an empty configuration file. This is simply an empty array, as shown below:
[]
Single Classification Without Rule Group
The following shows a single Classification node containing three rules representing the expression below:
Extension Equals "docx" OR Extension Equals "txt" AND Size > 11
[
{
"name": "RulesOnly",
"comment": "An example classification with rules only",
"rules": [
{
"nodeType": "Rule",
"field": "Extension",
"operator": "Equals",
"value": "docx"
},
{
"nodeType": "Rule",
"combineWithPrevious": "OR",
"field": "Extension",
"operator": "Equals",
"value": "txt"
},
{
"nodeType": "Rule",
"combineWithPrevious": "AND",
"field": "Size",
"operator": "BiggerThan",
"value": "11"
}
]
}
]
Single Classification Using Rule Group
The following shows a single Classification node containing a RuleGroup (with two child rules) and one further rule, representing the expression below:
( Extension Equals "docx" OR Extension Equals "txt" ) AND Size > 11
[
{
"name": "RulesGroupExample",
"comment": "An example classification using a RuleGroup",
"rules": [
{
"nodeType": "RuleGroup",
"rules": [
{
"nodeType": "Rule",
"field": "Extension",
"operator": "Equals",
"value": "docx"
},
{
"nodeType": "Rule",
"combineWithPrevious": "OR",
"field": "Extension",
"operator": "Equals",
"value": "txt"
}
]
},
{
"nodeType": "Rule",
"combineWithPrevious": "AND",
"field": "Size",
"operator": "BiggerThan",
"value": "11"
}
]
}
]
Multiple Classification Nodes
The following config produces the hierarchy below (with simple rules).
- M365 Large Files
  - Large SharePoint
  - Large OneDrive
- Old Files
  - Ancient Files
  - Older Files
[
{
"name": "M365 Large Files",
"comment": "An example using more classification nodes",
"childClassifications": [
{
"name": "Large SharePoint",
"rules": [
{
"nodeType": "Rule",
"field": "Path",
"operator": "StartsWith",
"value": "SharePoint Online|",
"comment": "is sharepoint"
},
{
"nodeType": "Rule",
"combineWithPrevious": "AND",
"field": "Size",
"operator": "BiggerThan",
"value": "2147483648",
"comment": "is > 2gb"
}
]
},
{
"name": "Large OneDrive",
"rules": [
{
"nodeType": "Rule",
"field": "Path",
"operator": "StartsWith",
"value": "OneDrive|",
"comment": "is OneDrive"
},
{
"nodeType": "Rule",
"combineWithPrevious": "AND",
"field": "Size",
"operator": "BiggerThan",
"value": "524288000",
"comment": "is > 500 mb"
}
]
}
]
},
{
"name": "Old Files",
"childClassifications": [
{
"name": "Ancient Files",
"rules": [
{
"nodeType": "Rule",
"field": "ModifiedDate",
"operator": "BeforeThisDate",
"value": "1980-01-01"
}
]
},
{
"name": "Older Files",
"rules": [
{
"nodeType": "Rule",
"field": "ModifiedDate",
"operator": "BeforeThisDate",
"value": "2000-01-01"
},
{
"nodeType": "Rule",
"combineWithPrevious": "AND",
"field": "ModifiedDate",
"operator": "AfterThisDate",
"value": "1980-01-01"
}
]
}
]
}
]
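For larger files it can be hard to see the hierarchy in the raw JSON. The sketch below prints the classification names as an indented outline so they can be compared with the intended structure. `outline` is a hypothetical helper, and the rules are left empty here only to keep the example short.

```python
def outline(classifications: list, indent: int = 0) -> list:
    """Return the classification names as indented bullet lines."""
    lines = []
    for node in classifications:
        lines.append("  " * indent + "- " + node["name"])
        # Nodes that hold rules have no childClassifications, so recursion stops there.
        lines.extend(outline(node.get("childClassifications", []), indent + 1))
    return lines


# Same hierarchy as the config above, with the rule arrays emptied for brevity.
config = [
    {"name": "M365 Large Files", "childClassifications": [
        {"name": "Large SharePoint", "rules": []},
        {"name": "Large OneDrive", "rules": []},
    ]},
    {"name": "Old Files", "childClassifications": [
        {"name": "Ancient Files", "rules": []},
        {"name": "Older Files", "rules": []},
    ]},
]
print("\n".join(outline(config)))
```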
Between Values
There is no explicit between operator, so achieving that functionality requires two rules: one with a greater than (or equal) operator and one with a less than (or equal) operator.
The config below represents the following expression:
Size >= 10 KB AND Size <= 1 MB
[
{
"name": "Between Example",
"rules": [
{
"nodeType": "Rule",
"field": "Size",
"operator": "BiggerThanOrEquals",
"value": "10240"
},
{
"nodeType": "Rule",
"combineWithPrevious": "AND",
"field": "Size",
"operator": "SmallerThanOrEquals",
"value": "1048576"
}
]
}
]
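If several between-style ranges are needed, the pair of rules can be generated rather than typed by hand. This is a minimal sketch; `between` is a hypothetical helper that simply builds the two Rule nodes shown above and converts the byte counts to strings.

```python
import json


def between(field: str, low: int, high: int) -> list:
    """Build the two Rule nodes that express low <= field <= high."""
    return [
        {"nodeType": "Rule", "field": field,
         "operator": "BiggerThanOrEquals", "value": str(low)},
        {"nodeType": "Rule", "combineWithPrevious": "AND", "field": field,
         "operator": "SmallerThanOrEquals", "value": str(high)},
    ]


KB, MB = 1024, 1024 * 1024  # sizes are expressed in bytes
config = [{"name": "Between Example", "rules": between("Size", 10 * KB, 1 * MB)}]
print(json.dumps(config, indent=2))
```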
Negation
Rule and RuleGroup nodes allow the logical operation to be negated. However, given that a Rule node is a single operation and (currently) all operators have an opposite operator, negation probably has most value on a RuleGroup. It enables a potentially complex operation, or an operation that is easier to read when written a certain way, to be negated as a whole without needing to work out the inverse of each individual element.
Negation is simply a flag on the Rule or RuleGroup node as shown in the example below:
[
{
"name": "Negation Example",
"comment": "An example classification using a negated RuleGroup",
"rules": [
{
"nodeType": "RuleGroup",
"comment": "This rulegroup is negated",
"negate": true,
"rules": [
{
"nodeType": "Rule",
"field": "Extension",
"operator": "Equals",
"value": "docx"
},
{
"nodeType": "Rule",
"combineWithPrevious": "OR",
"field": "Extension",
"operator": "Equals",
"value": "txt"
}
]
},
{
"nodeType": "Rule",
"combineWithPrevious": "AND",
"field": "Size",
"operator": "BiggerThan",
"value": "11"
}
]
}
]
This takes the previous RuleGroup example and negates the group. Whereas the previous example required the size to be greater than 11 and the extension to be docx or txt, it now requires the size to be greater than 11 and the extension to be neither docx nor txt.
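To illustrate what negating the group saves, the sketch below compares the negated form with the hand-expanded equivalent (each Equals replaced by NotEquals and the OR replaced by AND). It uses plain Python booleans rather than the rule engine, and the function names are hypothetical.

```python
from itertools import product


def negated_group(extension: str, size: int) -> bool:
    """NOT (Extension Equals docx OR Extension Equals txt) AND Size > 11."""
    return (not (extension == "docx" or extension == "txt")) and size > 11


def expanded(extension: str, size: int) -> bool:
    """The same logic with every element inverted by hand."""
    return extension != "docx" and extension != "txt" and size > 11


# Compare both forms over a few extensions and sizes around the threshold.
samples = list(product(["docx", "txt", "pdf"], [5, 11, 12]))
agree = all(negated_group(e, s) == expanded(e, s) for e, s in samples)
print(agree)  # True
```

The two forms are equivalent, but the negated group keeps the original rules readable and avoids mistakes when the group is larger.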
Metadata Classification Appendix
Some repositories have nuances and oddities in the way they return certain data fields. A list of these anomalies can be found in the Metadata Classification Appendix article.