Best Practices for Indexing Different Repositories

Discovery Center provides connectivity to a range of repositories, each with its own unique features. 

This article provides guidance on working with the different repositories in Discovery Center and managing their differences.

General Best Practices 

A few practices apply regardless of the repository:

  • Index in Smaller Segments. While it is possible to index an entire server at once, this can often lead to management challenges as your indexing matures. Dividing servers into segments helps with administration and ensures that long-running tasks do not clog up the activity queue.
  • Consider Thread Counts.
    • Discovery Center takes advantage of threading to improve performance during indexing.
    • Test different thread settings to find those that fit best within your environment. Too many threads may slow processes or create memory pressure, while too few leave performance below where it could be.
    • Start by doubling your thread setting in a test and noting the difference, then work from there.
  • Use filters. You can avoid analyzing files unnecessarily with filters.
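
The doubling approach to thread counts can be sketched as a small helper. This is only an illustration of the reasoning, not part of Discovery Center: the function name find_thread_count, the measure callback, and the 10% gain threshold are all assumptions. In practice, measure would stand for however you time a test index at a given thread setting.

```python
def find_thread_count(measure, start=2, max_threads=64, min_gain=0.10):
    """Double the thread count until the improvement drops below min_gain.

    measure(threads) is a hypothetical callback that returns the elapsed
    time of a test index run with that thread setting.
    """
    best_threads, best_time = start, measure(start)
    threads = start * 2
    while threads <= max_threads:
        elapsed = measure(threads)
        # Stop when the run is not at least min_gain faster than the best so far.
        if elapsed > best_time * (1 - min_gain):
            break
        best_threads, best_time = threads, elapsed
        threads *= 2
    return best_threads


# Sample timings (minutes) recorded by hand from test runs; made-up numbers.
recorded = {2: 100, 4: 55, 8: 40, 16: 39, 32: 45}
recommended = find_thread_count(lambda t: recorded[t], max_threads=32)
print(recommended)
```

With these sample numbers, the helper settles on 8 threads, because going from 8 to 16 improves the run by less than 10%.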

File Shares (CIFS)

Our File Share connector works with any repository that provides a CIFS (Common Internet File System) interface, including storage technologies that expose one. Consider the following suggestions to enhance your experience:

  • Index Shares. Set your indexes up at the share level (\\server\share); that way you can divide large sets into smaller segments and more readily handle issues and outages without losing progress.
  • Disable File Owner Checks. If file share indexes perform slowly, run a test with 'retrieve file owner' disabled. Some networks struggle to process this detail efficiently.
  • Monitor Overlaps and DFS.
    • If your environment has shares that overlap, or DFS links that redirect our indexers, Discovery Center may index the same content more than once. Work with your Windows or storage admins to understand where these might be in play.
    • You can use index settings to ignore patterns and paths where needed.
  • Ignore Snapshots. If your storage environment retains snapshots, indexing these will add unnecessary volume to your license. Add ~snapshot to the ignored locations setting for relevant indexes or index configurations.
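
Before creating indexes, it can help to review a planned list of share paths for nesting and snapshot folders. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, the comparison is purely textual, and it cannot detect DFS links, which need input from your Windows or storage admins.

```python
def split_unc(path):
    """Split a UNC path into lower-cased segments, e.g. ['fs01', 'finance']."""
    cleaned = path.replace("/", "\\").strip("\\")
    return [part.lower() for part in cleaned.split("\\") if part]


def find_overlaps(paths):
    """Return (parent, child) pairs where one planned path contains another."""
    keyed = [(p, split_unc(p)) for p in paths]
    overlaps = []
    for parent, parent_parts in keyed:
        for child, child_parts in keyed:
            nested = (len(parent_parts) < len(child_parts)
                      and child_parts[:len(parent_parts)] == parent_parts)
            if nested:
                overlaps.append((parent, child))
    return overlaps


def is_snapshot(path, ignored=("~snapshot",)):
    """True if any path segment matches an ignored location name."""
    return any(segment in ignored for segment in split_unc(path))


planned = [r"\\fs01\finance", r"\\fs01\finance\reports", r"\\fs01\hr"]
for parent, child in find_overlaps(planned):
    print(f"{child} is already covered by {parent}")
```

The server and share names above are examples only. Any pair the check reports is a candidate for an ignore pattern or for removal from the plan.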

SharePoint (Including O365)

SharePoint can be a complex repository to work with, and there are cases where its behavior can be difficult to follow. Because our connectors work through SharePoint's web services, indexes run far more slowly than an equivalent file share index. Consider the following suggestions to enhance your experience:

  • Check Permissions. Permissions are the most common cause of failures when connecting to SharePoint, and they can be hard to configure accurately. When connections to SharePoint fail persistently, we strongly urge you to check again with your SharePoint admins. More information on the SharePoint permissions required by Discovery Center can be found at the link here.
  • Monitor for Overlaps. SharePoint isn't always organized in a simple hierarchy, so sites that appear to be peers from a user's perspective can have URL paths that overlap. As a result, indexing 'higher level' sites can interfere with results from 'lower level' sites. When two URLs overlap, be sure to index the 'higher level' site first, followed by the 'lower level' sites.
    Example - Index https://acmesharepoint before https://acmesharepoint/sites/portal.
  • Using Credentials. The Discovery Center SharePoint connector requires a specific set of local credentials to connect to the web services. To set these up, navigate to System Settings > Credential Management and create a set of credentials that are valid within the targeted site. Once you have created these credentials, you can apply them to your site in the Network map tab. SharePoint credentials must be in email or UPN format, such as TestUser@Domain.com.
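
The ordering rule for overlapping SharePoint URLs can be sketched as a sort by path depth, so that every 'higher level' URL comes before anything nested beneath it. This is a minimal sketch with hypothetical helper names. It compares URLs as text only and knows nothing about how the sites are actually configured.

```python
from urllib.parse import urlsplit


def url_key(url):
    """Return (host, path segments) for a site URL, lower-cased."""
    parts = urlsplit(url.rstrip("/"))
    segments = [s.lower() for s in parts.path.split("/") if s]
    return parts.netloc.lower(), segments


def order_for_indexing(urls):
    """Sort so that shallower URLs precede the URLs nested beneath them."""
    def sort_key(url):
        host, segments = url_key(url)
        return host, len(segments), segments
    return sorted(urls, key=sort_key)


def overlapping_pairs(urls):
    """Return (higher, lower) pairs where one URL path contains another."""
    keyed = [(u, url_key(u)) for u in urls]
    return [
        (a, b)
        for a, (host_a, seg_a) in keyed
        for b, (host_b, seg_b) in keyed
        if host_a == host_b
        and len(seg_a) < len(seg_b)
        and seg_b[:len(seg_a)] == seg_a
    ]


sites = ["https://acmesharepoint/sites/portal", "https://acmesharepoint"]
for url in order_for_indexing(sites):
    print(url)
```

Run against the example URLs from this section, the sort lists https://acmesharepoint first and https://acmesharepoint/sites/portal second.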

OneDrive

Connecting to OneDrive requires using the SharePoint connector when configuring indexes. With indexes in place, OneDrive indexing and actions behave much like their SharePoint equivalents.

  • Plan Ahead. Because of the way OneDrive is implemented, each individual OneDrive site requires its own index.
  • Add Many. The most efficient way to set up indexes is to prepare a text file of OneDrive sites, then use the 'Add Many' option for index creation. Using this feature along with a consistent naming convention will help you manage those indexes over time.

Note: To access other users' OneDrive locations, configure credentials that belong to a site collection administrator of the OneDrive sites. Consider setting up a secondary admin for OneDrive, or using an existing administrator's credentials.
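
Preparing the text file of OneDrive sites can be scripted. The sketch below rests on two assumptions that you should verify first: that your tenant uses the common https://<tenant>-my.sharepoint.com/personal/<user> URL pattern, where "@" and "." in the UPN become underscores, and that 'Add Many' accepts one URL per line. The function names and the tenant name "acme" are hypothetical.

```python
import re


def onedrive_url(upn, tenant):
    """Build a OneDrive site URL from a UPN (assumed common URL pattern)."""
    # Assumption: '@' and '.' in the UPN are replaced with underscores.
    personal = re.sub(r"[@.]", "_", upn.lower())
    return f"https://{tenant}-my.sharepoint.com/personal/{personal}"


def add_many_text(upns, tenant):
    """Return sorted, de-duplicated site URLs, one per line (assumed format)."""
    urls = sorted({onedrive_url(upn, tenant) for upn in upns})
    return "\n".join(urls) + "\n"


users = ["TestUser@Domain.com", "a.smith@domain.com", "testuser@domain.com"]
print(add_many_text(users, "acme"), end="")
```

Sorting and de-duplicating the list keeps the resulting indexes in a predictable order, which supports a consistent naming convention.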

Exchange (Including O365)

Getting acceptable performance from Exchange can be challenging, usually because of the way Exchange Web Services is configured. Discovery Center provides a few ways to adjust your setup for success.

  • Things to Consider. We have created a comprehensive article that we recommend you read prior to Exchange deployment: Considerations for Analysis of Exchange Content.
  • Index by Mailbox Group. Our connector indexes across users' mailboxes and structures them into alphanumeric virtual nodes. To easily break the task into smaller segments, set up indexes for each node:
    • Mailboxes/A/ through Mailboxes/Z/
    • Mailboxes/#/ for mailboxes starting with a number
    • Mailboxes/_/ for mailboxes starting with a non-alphanumeric character
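
The node layout above can be generated rather than typed by hand. This is a minimal sketch with hypothetical function names. It assumes grouping is by the first character of the mailbox name. How the connector treats non-ASCII letters is not stated here, so sending them to the "_" node is an assumption.

```python
import string


def mailbox_nodes():
    """List the virtual node paths: A through Z, then '#' and '_'."""
    letters = [f"Mailboxes/{c}/" for c in string.ascii_uppercase]
    return letters + ["Mailboxes/#/", "Mailboxes/_/"]


def node_for(mailbox):
    """Guess which node a mailbox falls under from its first character."""
    first = mailbox[:1]
    if first.isascii() and first.isalpha():
        return f"Mailboxes/{first.upper()}/"
    if first.isascii() and first.isdigit():
        return "Mailboxes/#/"
    # Assumption: anything else is grouped under the '_' node.
    return "Mailboxes/_/"


for node in mailbox_nodes():
    print(node)
```

Grouping a list of known mailboxes with node_for shows which nodes are heavily populated and which may not need an index at all.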

Google Drive

Configuration for Google Drive Connector

OpenText Content Server

Configuration for OpenText Connector