1. Preparation | Sensitive Data Cleanup and Monitoring

1.1 Activity Summary

Activity Description	Preparing the key elements of your data discovery project will help ensure that the initial deployment of ActiveNav Cloud will run as smoothly as possible. Taking the time to identify locations, prepare credentials, and engage with key users will allow you to achieve results as quickly as possible.
Goals	A project plan is in place to allow the initial deployment of ActiveNav Cloud to run smoothly.
Participants	Project Sponsor, IT team, Project / Program Manager
Pre-requisites	Cloud Services agreement is in place. Project / Program manager identified.
Outputs	Top level plan outlines goals and timeline. Initial list of repositories to be discovered is ready. Credentials prepared for access to repositories. Deployment architecture for FileShare Collectors is understood.

1.2 Project Plan, Milestones, and Timelines

The first goal should be to establish the scope of your data discovery project so that you can determine:

A list of the data locations to be addressed, the repository types and their estimated volume.
An estimated timescale for discovery based on the data volume.
The types of sensitive data you expect to encounter.
The relevant users for assessment of findings.

A list should be prepared that itemizes the repositories that will be targeted using ActiveNav Cloud, with estimates for the volume of data held, noting whether key activities such as preparing credentials have been completed. This list can then provide a basis for the later activities of defining Data Sources and Business Units.

The volume of data to be discovered will be a fundamental factor in establishing the duration of your initial data discovery phase. If you prepare a list of the repositories that you plan to discover with ActiveNav Cloud, their locations and approximate sizes will help you set expectations around the time required for discovery (ActiveNav Support can provide details of expected performance for different data repository types). The size and geographic location of data repositories also provide key insights to guide deployment decisions.
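As an illustration only, the repository list and a first-pass duration estimate could be kept in a small script such as the sketch below. The repository names are borrowed from the Prest Team example later in this article; the location labels, volumes, and throughput figure are hypothetical placeholders, and real throughput figures should come from ActiveNav Support.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str                 # share path or site name
    repo_type: str            # e.g. "file share"
    location: str             # data center or region (hypothetical labels)
    volume_gb: float          # estimated volume, in GB
    credentials_ready: bool   # have access credentials been prepared?


# Hypothetical throughput per repository type, in GB per day.
# Replace with the figures provided by ActiveNav Support.
ASSUMED_GB_PER_DAY = {"file share": 500.0}


def estimated_days(repo: Repository) -> float:
    """Estimate discovery duration as volume divided by assumed throughput."""
    return repo.volume_gb / ASSUMED_GB_PER_DAY[repo.repo_type]


# Volumes and locations below are invented for illustration.
repositories = [
    Repository(r"\\RBT-543\Sales", "file share", "Data center A", 1500.0, True),
    Repository(r"\\KS-NAS-01\Finance", "file share", "Data center B", 250.0, False),
]

for repo in repositories:
    status = "ready" if repo.credentials_ready else "credentials outstanding"
    print(f"{repo.name}: ~{estimated_days(repo):.1f} days ({status})")
```

Keeping the credentials status alongside each repository makes it easy to see which locations are not yet ready for discovery.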

The default configuration of ActiveNav Cloud implements a Scoring configuration that targets common forms of sensitive data, but it is wise to consider the types of sensitive data you anticipate being present within your business. This will let you prioritize locations that you feel may be of most risk and determine whether additional data elements should be targeted beyond the default configuration.

Finally, you should begin to consider the users within your business that will help you achieve your goals in understanding the sensitive data that is found. You will need to engage with key individuals for different aspects of your business, ensure they understand the reasons for the data discovery process, and set expectations for their involvement in reviewing the results of your discovery project.

Key roles within a data discovery project include the following.

Project Sponsor: Executive ownership of the project; responsible for determining overall scope and goals.
Project Manager: Overall responsibility for guiding the team to achieve the goals of the project.
IT Team: Provides expertise in installing on-premises components, preparing credentials for data access, and configuring cloud data sources.
Application Administrator: Responsible for managing the ActiveNav Cloud application, facilitating access for end users, and configuring discovery tasks.
Data Analysts: A team of end users trained to review discovery findings. Normally different Business Units will be represented by dedicated Data Analysts who understand the working practices of the Business Unit to enable them to review the data found in context.

Depending on the scale of the project and the type of organization, an individual may be responsible for multiple roles.

Example

Prest Team are deploying ActiveNav Cloud to gain an understanding of all the data they hold, but with a particular emphasis on the new data that has arrived because of the acquisition of RBTSkills.

Alexander Knight will be their Project Manager and Application Administrator. He will delegate roles to other users to review the findings of the discovery project.

Alexander works with the IT team to build out the list of key locations that they will target using ActiveNav Cloud, noting their location and estimated volume. This list will then assist in the definition of the deployment architecture and configuration of data discoveries.

1.3 Define Deployment Architecture

ActiveNav Cloud uses software components called Collectors to perform data discovery from content repositories.

Discovery of cloud-based repositories (e.g., Office 365) is performed by a Cloud Collector hosted within ActiveNav Cloud, and the repository location and credentials must be provided to enable access.

Aside from cloud-hosted repositories, almost every project will involve some processing of on-premises content. Currently, on-premise discovery is supported for Windows file share and iManage Work repository types.

Discovery of on-premise content requires installation of at least one On-Premise Collector within the network for each repository type to be targeted. Each Collector should be installed as close as possible, in network terms, to the data it will be used to discover.

The precise number of collectors that is needed depends on the volume of data, its distribution, and the desired time scale for discovery. The following should be considered to decide on the most appropriate deployment plan:

Data Location. If your file shares are physically distributed, then at least one File Share Collector should be installed in each network location. Discovery throughput will be significantly reduced if a Collector has to discover content held in a different network location.
Data Volume. In any given location, the number of File Share Collectors deployed can be increased to achieve higher discovery throughput. This would normally be a good choice for locations with large data volumes.

Collector Groups, with affinity to geographical locations, are used to organize On-Premise Collectors. Each distinct physical location will require at least one Collector Group to be defined. This allows ActiveNav Cloud to request work to be carried out by the most appropriate available Collector.
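The trade-off between data volume, desired timescale, and the number of Collectors per location can be roughed out as in the sketch below. It assumes, purely for illustration, that throughput scales linearly with the number of Collectors in a location; the volumes, per-Collector throughput, and target duration are hypothetical values, not ActiveNav performance figures.

```python
import math


def collectors_needed(volume_gb: float,
                      gb_per_day_per_collector: float,
                      target_days: float) -> int:
    """Collectors required to finish within target_days, assuming linear scaling."""
    if gb_per_day_per_collector <= 0 or target_days <= 0:
        raise ValueError("throughput and target_days must be positive")
    required = volume_gb / (gb_per_day_per_collector * target_days)
    # Every location needs at least one Collector, whatever its volume.
    return max(1, math.ceil(required))


# Hypothetical volumes (GB) per physical location; one Collector Group each.
volume_by_location = {"Data center A": 6000.0, "Data center B": 1500.0}

plan = {
    location: collectors_needed(volume, gb_per_day_per_collector=500.0, target_days=5)
    for location, volume in volume_by_location.items()
}
print(plan)  # {'Data center A': 3, 'Data center B': 1}
```

Each key in the resulting plan corresponds to one Collector Group, and the value is a starting estimate for the number of Collectors to assign to it.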

For more details about the way that Collector Groups are used to manage the interaction of Collectors and repositories, see the overview at the link below:

Collector and Collector Groups Overview

Example

Their repository list shows that Prest Team has 2 file servers, which are in different data centers. Because the connection between the data centers depends on a high-latency WAN, they will configure one Collector Group for each data center, with Windows File Share Collectors installed in each data center and assigned to the relevant Collector Group.

This allows each Collector to maximize the speed of data access for best performance, and the independent Collector Groups enable ActiveNav Cloud to process content from each data center simultaneously. Because the overall data volume is moderately sized, only one Collector is deployed in each Collector Group.

Cloud Collectors are deployed automatically as part of their ActiveNav Cloud tenant and used to discover SharePoint Online, Teams, Exchange and Google Drive data repositories.

1.4 Provision Environment and Data Access

At this stage, your ActiveNav Cloud tenant will have been provisioned and your initial administrator user created. You should ensure that this user can access the cloud platform, and that MFA can be configured in line with any specific requirements that your business may have.

For each of the data repositories identified in your list of locations, you will need to ensure you have arranged the relevant access credentials.

File Share Access

For each file share you intend to discover, you should ensure you have credentials that have read access to the file share itself and the underlying file system. It may be possible for your IT department to provide details of an existing account, or they may prefer to configure an account specifically for your project. You can test the account's access by mapping a network drive using the provided credentials.
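As an alternative to mapping a drive by hand, a quick read check can be scripted. The sketch below is a generic illustration and not part of ActiveNav Cloud: it assumes the script is run under the account being tested (for example, from a session started with those credentials), and the share path in the usage example is taken from the Prest Team example.

```python
import os


def can_read_share(path: str, sample_limit: int = 5) -> bool:
    """Return True if the current account can list `path` and read a few files in it."""
    try:
        checked = 0
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_file():
                    # Reading a single byte is enough to prove read access.
                    with open(entry.path, "rb") as handle:
                        handle.read(1)
                    checked += 1
                    if checked >= sample_limit:
                        break
        return True
    except OSError:
        # Covers missing paths, permission errors, and network failures.
        return False


if __name__ == "__main__":
    share = r"\\RBT-543\Sales"
    print(share, "readable" if can_read_share(share) else "NOT readable")
```

This only samples files at the top level of the share, so a passing result does not guarantee read access to every subfolder.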

iManage Work

For on-premise iManage Work repositories, you must configure a Client ID and associated Secret within iManage to represent the iManage Work Collector, along with a user account and password that will be used by the Collector.

M365 Cloud Repositories

For Microsoft M365 data repositories, access for Cloud Collectors is achieved by registering an application in Azure AD. Each repository type requires a different set of API permissions to be granted, as outlined in the following Knowledge Base Articles (KBAs).

Configuring Azure AD for SharePoint Online (📺)

Configuring Azure AD for Exchange (📺)

Configuring Azure AD for Teams

Configuring Azure AD for OneDrive

When setting up your application, make a note of your Tenant ID, App ID and assigned Secret Value. These will be used when configuring credentials within the ActiveNav Cloud platform.
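Before entering these values into ActiveNav Cloud, a simple local sanity check can catch copy-and-paste mistakes. The sketch below is an illustration under stated assumptions: it assumes the Tenant ID and App ID were recorded in their GUID form, and it flags a secret that itself looks like a GUID, since that often means the Secret ID was copied instead of the Secret Value. The function names and sample values are hypothetical, and nothing is sent to Azure.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if `value` parses as a GUID/UUID."""
    try:
        uuid.UUID(value.strip())
        return True
    except ValueError:
        return False


def check_app_registration(tenant_id: str, app_id: str, secret_value: str) -> list[str]:
    """Return a list of likely problems; an empty list means the basic checks passed."""
    problems = []
    if not is_guid(tenant_id):
        problems.append("Tenant ID is not a GUID")
    if not is_guid(app_id):
        problems.append("App ID is not a GUID")
    if not secret_value.strip():
        problems.append("Secret Value is empty")
    elif is_guid(secret_value):
        problems.append("Secret looks like a GUID; it may be the Secret ID, not the Secret Value")
    return problems


# Made-up sample values for illustration only.
print(check_app_registration(
    "11111111-2222-3333-4444-555555555555",
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "example~secret.value",
))
```

A passing check only confirms the shape of the values; it does not confirm that the application has been granted the API permissions described in the KBAs above.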

Google Workspace Drive Cloud Repository

Access to Google Workspace Drive is facilitated by configuring a Service Account in the Google Cloud environment. You must also identify or create a Google user account that will be used to discover Google Workspace users and to access Shared Drive data.

The process for preparing the service account and associated user account is outlined in this KBA:

Creating a Google Workspace Service Account & User Account

The JSON file downloaded during the Service Account creation and the identified user account will be used later to configure credentials in the ActiveNav Cloud platform.
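It can be worth confirming that the downloaded JSON file is a complete service account key before it is needed. The sketch below is a minimal check, assuming the standard Google service account key layout (fields such as `type`, `project_id`, `private_key`, and `client_email`); the function name and the sample content are hypothetical, and the check does not contact Google.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key file content; empty means it looks complete."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems


# Made-up key content for illustration; a real file holds a full private key.
sample = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "collector@example-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))  # []
```

To check a real file, read its content with `open(path, encoding="utf-8").read()` and pass the result to the function; treat the file as a secret and avoid printing its content.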

iManage Cloud Repository

For access to an iManage Cloud repository, you must create a user account that will be entered in the ActiveNav Cloud credentials interface as a credential of type "iManage Cloud Ropc".

You must also request access to the ActiveNav Cloud application within the iManage Cloud instance.

Adding iManage Cloud Credentials

Additional Considerations for Cloud Repositories

Discovery of content within a Cloud repository will entail the transfer of data from the repository host to the Collector that is processing it. In some cases there may be cost implications for the transfer of data from a Cloud repository. You should review the terms and conditions for your repository to ensure you understand any such costs.

1.5 Align Key Staff Resources

ActiveNav Cloud provides a lens to observe sensitive data within your organization, but users must assess whether the data identified is appropriate to hold and whether it is properly secured. You should identify key users within your organization who can provide expertise for the Business Units you have defined, and provide them with information about your project, the ActiveNav Cloud product, and the role you will want them to perform.

Example

At Prest Team, the Project Manager Alexander briefed Maxine Steele, Anastasia Romano and Leonardo Rossi about the project. Their goal is to ensure that unsecured and inappropriate sensitive data is not found within their respective data repositories. Leonardo in particular is briefed on the need to ensure that the user areas in Google Workspace Drive from the RBTSkills acquisition are well organized and do not hold stale data or objects with sensitive content.

1.6 Identify Business Units

An organization will naturally hold information relating to a range of different business areas – e.g. Finance, IT, Sales, etc. When data has been discovered that crosses a range of business areas, you will need to assign specific Data Analysts to focus on the content that is relevant to them.

Within ActiveNav Cloud, this is achieved by configuring Business Units. Business Units can be used in a range of ways depending on how the data is distributed across repositories. Some examples include:

In the simplest case, a business may have a single file server with file shares for each department. In this case, the Application Administrator can create one Business Unit per department and assign a single location – the relevant file share – to each Business Unit.
In larger organizations, there will normally be multiple file servers, perhaps for different offices, where data relevant to a particular business area can be found in multiple locations. In this case, multiple data paths would be assigned to a Business Unit, associating relevant locations from different file servers into a logical collection.
In some organizations, related data may be split across different repository types. For instance, it may be appropriate to combine SharePoint Online sites, Microsoft Teams and file shares into a single Business Unit for a particular business area.

Careful creation of Business Units will enable you to direct appropriate users to attend to the data areas that they are responsible for, allowing them to assess the state of the data and the impact of their activities to clean up any issues identified. Note that a single data path cannot be assigned to multiple Business Units.

While the Business Units that are required for your project and the data paths that are assigned to them will naturally evolve over time, establishing an initial plan for Business Units and associated locations will ensure that you will be ready to work with results as soon as your discovery process is underway.

The initial collection of Business Units will normally be a logical reflection of the structure of your business. Using your list of repositories and consulting with IT staff, you can create an initial mapping of data paths to these Business Units. This mapping can then be used to configure Business Units in ActiveNav Cloud.

Example

Prest Team review their list of locations and identify 3 logical business areas to create in the first instance: “Sales”, “User Data”, and “Finance & Operational Data”. Data paths are mapped to these groups as their cleanup project begins.

Because the locations within the Prest Team repository list are already organized in a functional manner, they are able to use Business Units to simply collate primary repository locations together in logical groups. As noted above, larger organizations will normally have more complex distributions of data, and in such cases, it is expected that lower level locations within repositories will be identified to be mapped into appropriate Business Units.

The proposed Business Units for Prest Team are:

Sales

FileShare Content : \\RBT-543\Sales
Microsoft Teams : “Sales” Team
SharePoint Online : “Sales” site collection

User Data

Exchange Online : User Mailboxes
Google Workspace : Personal Drives

Finance & Operational Data

FileShare Content : \\RBT-543\Operations, \\KS-NAS-02\Operations, \\KS-NAS-01\Finance
SharePoint Online : “Finance” site collection

These are recorded in a spreadsheet to support straightforward import into ActiveNav Cloud as the data locations are discovered.
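A mapping like this can also be checked for the rule that a single data path cannot belong to more than one Business Unit before it is recorded. The sketch below uses the Prest Team mapping above; the column names and CSV layout are hypothetical and are not the ActiveNav Cloud import format.

```python
import csv
import io
from collections import defaultdict

# (Business Unit, repository type, location) rows from the Prest Team example.
ROWS = [
    ("Sales", "FileShare Content", r"\\RBT-543\Sales"),
    ("Sales", "Microsoft Teams", "Sales Team"),
    ("Sales", "SharePoint Online", "Sales site collection"),
    ("User Data", "Exchange Online", "User Mailboxes"),
    ("User Data", "Google Workspace", "Personal Drives"),
    ("Finance & Operational Data", "FileShare Content", r"\\RBT-543\Operations"),
    ("Finance & Operational Data", "FileShare Content", r"\\KS-NAS-02\Operations"),
    ("Finance & Operational Data", "FileShare Content", r"\\KS-NAS-01\Finance"),
    ("Finance & Operational Data", "SharePoint Online", "Finance site collection"),
]


def find_conflicts(rows):
    """Return locations that are assigned to more than one Business Unit."""
    owners = defaultdict(set)
    for unit, repo_type, location in rows:
        # Compare case-insensitively, since Windows share paths ignore case.
        owners[(repo_type, location.casefold())].add(unit)
    return {key: sorted(units) for key, units in owners.items() if len(units) > 1}


def to_csv(rows) -> str:
    """Render the mapping as CSV text with a (hypothetical) header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Business Unit", "Repository Type", "Location"])
    writer.writerows(rows)
    return buffer.getvalue()


conflicts = find_conflicts(ROWS)
if conflicts:
    print("Locations assigned to multiple Business Units:", conflicts)
else:
    print(to_csv(ROWS))
```

Running the check each time the mapping changes keeps the spreadsheet consistent as new data paths are added.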

Maxine Steele will review the findings in the Sales Business Unit, ensuring that no sensitive data has been retained, e.g., financial information about prospects. Leonardo Rossi will be tasked with reviewing the findings for the User Data Business Unit to ensure that users are not inadvertently storing objects containing PII/PHI in their personal storage areas. Anastasia Romano will review the findings in the Finance & Operational Data Business Unit, ensuring that supplier and customer data is appropriately organized and secured.

Data that is not assigned to a specific Business Unit is grouped together under the “No Business Unit” category in ActiveNav Cloud. Alexander will review the findings within these unassigned locations, and he may decide to add further data paths to the initial set of Business Units or identify new Business Units to create.

1.7 Document Policies

To implement and attain sensitive data compliance across your organization, it is essential to identify, define, and document the policies that will drive your organization to success. When looking for sensitive data within your content, you need to understand and document which specific data (e.g. credit card information) is relevant to your organization and how it should be managed from a security and compliance perspective, in line with your policies.

You should establish which data types are expected and which are not. For acceptable data types, you should establish the form in which they are expected to be found, the locations in which they may acceptably be held, and the security controls that should be in place.

ActiveNav Cloud is pre-configured with a feature extraction and scoring configuration that focuses on commonly found sensitive data such as national identifiers, financial identifiers, etc.

If your business has specific types of sensitive data and/or applicable regulations, you can customize the out-of-the-box configuration with additional Impact Areas, Data Element Types, or Data Elements.

ActiveNav staff can provide guidance on customizing the feature extraction and scoring configuration, but this is best carried out in parallel with the initial phases of discovery rather than allowing it to delay deployment.

Scores Overview

Next Step

Service Deployment and Configuration outlines the work that is required to progress your project plan by deploying On-Premise Collectors, configuring them, and validating their operation.