Using Analysis Filters for Indexes

Background

Discovery Center Release 4.3.9 introduced analysis filters. These use basic file metadata, such as file extension and file size, to prevent files from being unnecessarily copied into the Discovery Center file cache for analysis. The feature is intended to improve analysis performance for customers who can accept potential inaccuracies in basic file metadata (see the note below) in exchange for faster analysis.

Setting Analysis Filters

In general, analysis performance benefits from applying two filter settings. Both are changed by editing the relevant Discovery Center index configuration (Indexes > Index Configuration):

  • File Size: by default, the maximum file size for analysis is 20MB. A lower limit may be appropriate in some cases.
  • File Extensions: the text file attached to this article provides an example list of extensions that can be excluded so that analysis is not attempted on file types that are rarely suitable for textual analysis.

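The two settings above can be pictured as a simple per-file check applied before a file is copied into the cache. The sketch below is illustrative only and is not Discovery Center's implementation. It assumes that "20MB" means 20 x 1024 x 1024 bytes, that a file exactly at the limit is still analyzed, that extension matching is case-insensitive (as on Windows), and that the excluded extensions are a hypothetical sample rather than the contents of the attached list. FileMetadata and passes_analysis_filters are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Assumption: the 20MB default is interpreted as 20 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024

# Hypothetical sample of excluded extensions; the real list is the attached file.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".iso"}


@dataclass
class FileMetadata:
    """Hypothetical record of the basic metadata a filter can see."""
    path: str
    size_bytes: int


def passes_analysis_filters(meta: FileMetadata,
                            max_size: int = MAX_FILE_SIZE_BYTES,
                            excluded: set = EXCLUDED_EXTENSIONS) -> bool:
    """Return True if the file would be copied to the cache for analysis."""
    # Size filter: files larger than the limit are skipped.
    if meta.size_bytes > max_size:
        return False
    # Extension filter: compare the lower-cased suffix against the exclusion list.
    return PurePath(meta.path).suffix.lower() not in excluded


files = [
    FileMetadata("report.docx", 150_000),
    FileMetadata("setup.exe", 4_000_000),
    FileMetadata("archive.docx", 25 * 1024 * 1024),
]
to_analyze = [f.path for f in files if passes_analysis_filters(f)]
print(to_analyze)  # ['report.docx']
```

Note that both checks look only at metadata; neither opens the file, which is where the performance gain comes from and also why inaccurate metadata can lead to a file being skipped.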
In all cases, inspect and understand the analysis filter configuration to ensure that its impact meets the needs of the current project. If in doubt, use Discovery Center reporting to determine which types of files are present and how the analysis filters might affect them.

Note on Windows Metadata Accuracy

Any IT environment is likely to contain files whose basic metadata is inaccurate. For example, a Windows user can change a file's extension so that it no longer describes the file format ('myfile.xls' might become 'myfile.anyextension'). In such cases, analysis filters might exclude the file from analysis even though Discovery Center is able to analyze it.
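The mismatch described above can be demonstrated by comparing an extension-based decision with the file's actual content. The sketch below is illustrative only and makes no claim about how Discovery Center identifies file formats. It assumes, hypothetically, that '.anyextension' appears on the exclusion list, and it checks for the OLE2 Compound File signature used by legacy .xls files. The helper names are hypothetical, and the test file contains only header bytes, not a real spreadsheet.

```python
import tempfile
from pathlib import Path

# Signature of an OLE2 Compound File, the container format used by legacy .xls files.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical exclusion list: assume the renamed extension happens to be on it.
EXCLUDED_EXTENSIONS = {".anyextension"}


def excluded_by_extension(path: Path) -> bool:
    """Metadata-only decision, as an extension-based analysis filter would make."""
    return path.suffix.lower() in EXCLUDED_EXTENSIONS


def looks_like_ole2(path: Path) -> bool:
    """Content-based check: does the file start with the OLE2 signature?"""
    with path.open("rb") as fh:
        return fh.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    renamed = Path(tmp) / "myfile.anyextension"
    # Header bytes only: enough for this check, but not a valid spreadsheet.
    renamed.write_bytes(OLE2_SIGNATURE + b"\x00" * 504)
    ext_excluded = excluded_by_extension(renamed)
    content_is_ole2 = looks_like_ole2(renamed)

print(ext_excluded, content_is_ole2)  # True True
```

The extension check excludes the file even though its content still identifies it as an OLE2 document, which is exactly the trade-off this note describes.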

For the vast majority of use cases, these inaccuracies present no risk to project outcomes and are entirely acceptable. However, customers who need Discovery Center to consider every possible file for analysis, regardless of inaccurate basic metadata, should not use analysis filters.


Download Example File Extension