Choosing appropriate search syntax for Classification Rules

Types of Classification search term syntax

When designing a classification rule, the flexibility of the Lucene search syntax gives classification authors a choice: either target specific fields, or enter generic terms that will be matched against all available fields.

For instance, a field-specific query might be formatted like this:

company_name:"Active Navigation" OR filename:"Active Navigation"
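
If rules are generated by a script rather than typed by hand, a field-specific query of this shape can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the helper name build_field_query is hypothetical, and the field names are simply the ones used in the example above.

```python
def build_field_query(phrase, fields):
    """Build a Lucene query that matches a phrase in the named fields only.

    Hypothetical helper for illustration; not a product API.
    """
    if not fields:
        raise ValueError("at least one field name is required")
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    # One explicit clause per field, joined with OR.
    return " OR ".join(f'{field}:"{escaped}"' for field in fields)


print(build_field_query("Active Navigation", ["company_name", "filename"]))
# company_name:"Active Navigation" OR filename:"Active Navigation"
```

Listing the fields explicitly keeps the number of clauses in the query under the author's control.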

A generic query for Active Navigation would be:

"Active Navigation"

Considerations for choice of syntax

We recommend that classification authors plan their use of fields and metadata so that queries can be constructed using specific field names, for the following reasons:

  • Classification rules expressed in this way are explicit and predictable. When making decisions based on the result of a classification, you can be confident about how the results were arrived at.
  • Performance will be improved by making your search requirements explicit. When a generic search is specified, it is expanded into a complex search across all available fields, which may reduce the overall performance of classification.
  • Adding fields to the system may cause classification with rules that use generic terms to fail (see below).

Problems that may occur when rules use generic terms

Because a search that uses a generic term is expanded to use all available fields, you may exceed the maximum permitted number of search terms if your system has a large number of fields.

In this case you will encounter "too many bool clauses" errors when running search/coverage operations in the Classification Designer or when classifying an index.
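
As a rough way to see how this limit is reached, the Python sketch below estimates how many clauses a rule produces when each generic term is expanded once per field. It is a simplified model, not the product's actual expansion logic: the function names are hypothetical, and the limit of 1024 is Lucene's default maximum clause count, which your installation may have configured differently.

```python
# Lucene's default clause limit; assumed here, a deployment may use another value.
DEFAULT_MAX_CLAUSES = 1024


def expanded_clause_count(generic_terms, field_count):
    """Estimate clauses when every generic term is expanded across every field."""
    return generic_terms * field_count


def exceeds_limit(generic_terms, field_count, max_clauses=DEFAULT_MAX_CLAUSES):
    """Return True if the estimated expansion would pass the clause limit."""
    return expanded_clause_count(generic_terms, field_count) > max_clauses


print(expanded_clause_count(5, 300))  # 1500
print(exceeds_limit(5, 300))          # True
print(exceeds_limit(2, 300))          # False
```

Under this model, a rule with five generic terms on a system with 300 fields already exceeds the default limit, while the same rule written against two named fields per term needs only ten clauses.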