Disabling Analysis of Hidden Text Within PDF Documents

Discovery Center can extract hidden text from documents, such as PDFs that contain embedded OCR text. If you do not want this text included in analysis, you can disable hidden text extraction as described below.

You need access to the Discovery Center server and permission to modify files on it.

Resolution

The steps below describe how to disable the Document Conversion setting that controls this function:

  1. Locate the folder 'C:\Program Files (x86)\Active Navigation\Discovery Center\Analysis\Config'
  2. Inside this folder are two configuration files: HTMLEXP.CFG and HTMLEXP-U.CFG
  3. Open both files in a text editor such as Notepad and locate the line 'showhiddentext yes'
  4. Add a '#' at the start of the line so that it reads '#showhiddentext yes'
  5. Save both files. If you cannot save a file, check that your user account has rights to edit files in this location and adjust the file permissions if necessary.
  6. To re-enable hidden text extraction later, remove the '#' from the start of the line in both files and save them again.
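If you manage several servers, the edit in steps 3 to 6 can also be scripted. The following is a minimal Python sketch, not a tool supplied with Discovery Center. It assumes both files are plain single-byte text and that the option sits on its own line as 'showhiddentext yes'. The names set_hidden_text and toggle_line are hypothetical.

```python
from pathlib import Path

# Default location from this article; adjust if Discovery Center is installed elsewhere.
CONFIG_DIR = Path(r"C:\Program Files (x86)\Active Navigation\Discovery Center\Analysis\Config")
CONFIG_FILES = ("HTMLEXP.CFG", "HTMLEXP-U.CFG")
OPTION = "showhiddentext yes"


def toggle_line(line: str, enabled: bool) -> str:
    """Comment out the option (enabled=False) or restore it (enabled=True)."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending (CRLF or LF)
    stripped = body.strip()
    if enabled and stripped.startswith("#") and stripped.lstrip("#").strip() == OPTION:
        return OPTION + ending
    if not enabled and stripped == OPTION:
        return "#" + OPTION + ending
    return line  # any other line is left untouched


def set_hidden_text(config_dir: Path, enabled: bool) -> int:
    """Apply the change to both config files and return how many lines changed."""
    changed = 0
    for name in CONFIG_FILES:
        path = config_dir / name
        # Assumption: single-byte text. latin-1 maps every byte to one character,
        # and newline="" disables newline translation, so bytes round-trip unchanged.
        with open(path, encoding="latin-1", newline="") as f:
            lines = f.readlines()
        new_lines = [toggle_line(line, enabled) for line in lines]
        diff = sum(old != new for old, new in zip(lines, new_lines))
        if diff:
            # Only rewrite files that actually change.
            with open(path, "w", encoding="latin-1", newline="") as f:
                f.writelines(new_lines)
        changed += diff
    return changed
```

For example, call set_hidden_text(CONFIG_DIR, enabled=False) to disable hidden text extraction, or pass enabled=True to restore it. Run it from an account with write rights to the Config folder. A return value of 2 means one line changed in each file. A lower count suggests the line is formatted differently, or that a file uses another encoding such as UTF-16, in which case the sketch makes no change to it.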

To review the effect of this change on your documents, you can apply the same change to an installation of the Regular Expression Validator utility. After loading a test file, you can observe how the change affects document conversion.