3. First Discovery | Data Profiling and Monitoring

3.1 Activity Summary

Activity Description

This stage may represent the first full-scale discovery of the project, or the first discovery for a specific repository type. It will enable you to understand:
  • The speed with which you can process this type of repository.
  • The size of your data source.
  • The responsiveness of your data to your chosen rule configuration.
  • How to review and interpret results.

Goals

  • Begin to understand the characteristics of your data
  • Act on initial findings

Participants

Application Administrator, Data Analysts

Pre-requisites

  • System has been deployed, configured, and validated.
  • The inventory of data sources is available with valid credentials.

Outputs

Data has been discovered and locations have been assigned to Business Units for review.

Activities

Service Deployment → Rule Configuration → Select Location → Configure Data Source → Monitor Data Source → Setup Business Units → Visualize Results → Next Step

3.2 Rule Configuration

The ActiveNav Cloud platform includes Feature Extraction rules and a Scoring model that are designed to focus on common types of sensitive personal and financial data that indicate areas of risk in your repositories.

For a Data Profiling use case, Feature Extraction is an optional element of the data discovery process. If you wish to target specific attributes of your data for profiling purposes, you may wish to extend the default Feature Extraction and Scoring configuration. The creation of new Feature Extraction rules is covered in separate documentation.

One common option for profiling is the age of data. Your business may have specific age ranges that are significant, and these may vary by Business Unit. There are built-in date ranges for profiling, but you may update these to align with any specific data retention requirements that apply to your data.
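As an illustration of age-based profiling, the following minimal sketch assigns an object to an age range based on its last-modified date. The range boundaries, labels, and function names are hypothetical and chosen only for the example; they are not the ActiveNav Cloud defaults, and the platform's own date range configuration should be used in practice.

```python
from datetime import date

# Hypothetical age ranges in whole years:
# (label, lower bound inclusive, upper bound exclusive; None means open-ended).
# Illustrative values only, not the ActiveNav Cloud defaults.
AGE_RANGES = [
    ("0-1 years", 0, 1),
    ("1-3 years", 1, 3),
    ("3-7 years", 3, 7),
    ("7+ years", 7, None),
]


def age_in_years(last_modified: date, today: date) -> int:
    """Return the number of completed years between last_modified and today."""
    years = today.year - last_modified.year
    # Subtract one year if this year's anniversary has not been reached yet.
    if (today.month, today.day) < (last_modified.month, last_modified.day):
        years -= 1
    return max(years, 0)


def age_range_label(last_modified: date, today: date) -> str:
    """Return the label of the age range that contains the object's age."""
    years = age_in_years(last_modified, today)
    for label, lower, upper in AGE_RANGES:
        if years >= lower and (upper is None or years < upper):
            return label
    raise ValueError("no age range matches; check that AGE_RANGES covers all ages")
```

If a retention policy requires, for example, review of anything older than seven years, aligning one range boundary with that threshold makes the affected data visible as a single group.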

Creating Date Ranges

If you plan to develop your own configuration, you can either do so before you begin the Discovery process or start Discovery in parallel with the customization of rules. If you choose the parallel path, you will be able to re-process data at a later date to apply updated configuration settings.

3.3 Select Location

When you are ready to proceed with Discovery, the first step is to choose the location you will configure. If this is the very first “real” Data Source, aim to select one of your smaller locations so that the step completes in a reasonable amount of time.

3.4 Configure Data Source

The creation of a data source for a complete location is the same as the process used when validating Collector operation during deployment.

When the focus is on data profiling, you may be able to choose the Discovery Only mode for data sources. This mode lets you perform profiling using metadata such as file age and type.

If you are targeting features within object names or object content, choose the “Discovery with Feature Extraction” mode so that object content is inspected and feature extraction rules are applied.

Create a Data Source

Your Data Source will be scheduled for Discovery as soon as the relevant Collector Group has a Collector available.

Example

Prest Team set up a Data Source for \\KS-NAS-02\IT as their first Discovery. As a member of the IT team, Alexander is familiar with the volume and type of data present in this data share, so he will be able to quickly assess the effectiveness of the Discovery process and learn how ActiveNav Cloud allows findings to be reviewed. He can then use this experience to outline standardized procedures for his Data Analyst team.

3.5 Monitor Data Source

A Discovery with Feature Extraction takes longer to process because document content must be accessed. The exact rate at which a Data Source is processed depends on a range of environmental factors:

  • The speed with which containers and objects can be retrieved.
  • The specification of the Collector host.
  • The number of Collectors available to process content.
  • The average object size and the mix of object formats.

The ActiveNav Cloud user interface shows a live indication of data throughput for the Data Source, which enables you to review progress. Once the process is running, you can use the achieved throughput to estimate the total duration of the Discovery activity.
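The estimate is simple arithmetic: divide the amount of data still to be processed by the observed throughput. The sketch below assumes you know (or can approximate) the total size of the Data Source and that throughput stays roughly constant; the function and parameter names are hypothetical, and the unit (objects or bytes) should simply match whatever the user interface reports.

```python
def estimate_remaining_hours(
    total_units: float,
    processed_units: float,
    units_per_second: float,
) -> float:
    """Estimate the hours remaining for a Discovery run.

    All three arguments must use the same unit (for example objects or bytes).
    The estimate assumes throughput stays close to the observed value.
    """
    if units_per_second <= 0:
        raise ValueError("throughput must be greater than zero")
    # Clamp at zero in case the processed count exceeds the estimated total.
    remaining = max(total_units - processed_units, 0)
    return remaining / units_per_second / 3600


# Example: 1,000,000 objects in total, 100,000 processed, 50 objects per second.
hours_left = estimate_remaining_hours(1_000_000, 100_000, 50)
print(f"Estimated time remaining: {hours_left:.1f} hours")
```

Because throughput varies with object size and format mix, it is worth repeating the calculation periodically rather than relying on a single early reading.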

The link below describes the possible values for the final status of the Data Source when the Discovery process finishes.

https://support.activenav.com/understanding-data-source-status

Normally you should expect to see the Data Source finish with “Completed” or “Completed with Warnings” status. If you see another status, you may need to restart the Discovery process using the “Refresh All” option on the Data Source menu.

https://support.activenav.com/data-source-overview

3.6 Setup Business Units

As data is discovered, you will be able to begin assigning data paths to your Business Units. Data paths must be discovered before they can be assigned to a Business Unit, so it is normal for the set of data paths assigned to a Business Unit to grow as the number of data sources in the ActiveNav Cloud platform grows.

You can use the Business Unit mapping that you began during the preparation phase to guide this process, but it is also normal for the desired mapping to evolve as you become more familiar with the discovered data.

An efficient way to configure Business Units is to import Data Source locations in bulk and assign the locations to Business Units at the same time. Alternatively, if Data Source locations do not map directly to your Business Units, you can use the Business Unit mapping to support the import of Business Unit paths.
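When preparing a mapping for import, it can help to check which Business Unit each discovered path would fall under. The sketch below is one possible way to do that outside the platform, using a longest-prefix match so that a more specific path overrides its parent. Everything in it is an assumption made for illustration: the function names, the mapping structure, and the Finance and Payroll paths are hypothetical, and it does not represent the ActiveNav Cloud import format.

```python
from typing import Optional

# Hypothetical mapping from path prefix to Business Unit name.
# Only \\KS-NAS-02\IT comes from the example in this guide; the rest is invented.
BUSINESS_UNIT_PREFIXES = {
    r"\\KS-NAS-02\IT": "IT",
    r"\\KS-NAS-02\Finance": "Finance",
    r"\\KS-NAS-02\Finance\Payroll": "Payroll",
}


def normalize(path: str) -> str:
    """Lower-case the path and drop trailing separators for comparison."""
    return path.rstrip("\\").lower()


def business_unit_for(path: str) -> Optional[str]:
    """Return the Business Unit whose prefix is the longest match for path."""
    candidate = normalize(path)
    best_prefix = None
    for prefix in BUSINESS_UNIT_PREFIXES:
        p = normalize(prefix)
        # Match whole path segments only, so "\IT" does not match "\ITALY".
        if candidate == p or candidate.startswith(p + "\\"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    if best_prefix is None:
        return None
    return BUSINESS_UNIT_PREFIXES[best_prefix]
```

Paths that return `None` are not yet covered by the mapping, which is a useful prompt to revisit it as more data sources are discovered.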

https://support.activenav.com/bulk-load-data-sources

https://support.activenav.com/how-to-create-business-units


3.7 Visualize Results

With Business Units configured, the Home page views within ActiveNav Cloud allow the scored results for discovered data to be reviewed.

Reviewing these primary reporting views will allow you to understand the impact that new Discovery results have had on the overall volume of data being managed and the extent of sensitive data identified. The following section describes the recommended steps for acting on these findings.

Overview of results

The Administrator home page provides a summary of the total data within the tenant and the distribution of data by file type, repository, and geo-location.

After a very first discovery, or the first discovery for a specific repository or geo-location, this view can provide a simple confirmation of the data found.

https://support.activenav.com/the-home-page

Data Profile

The primary view for understanding the details of your data profile is the Profile home page. In the Profile view you can select Business Units of interest and access charts that summarize the age, size, and type distribution of the data in each Business Unit.

https://support.activenav.com/profile-home-page-1

Specific Data

If you use a custom scoring and feature extraction configuration to identify specific data, you can assess these findings using the Analyst home page. As with the Profile home page, it is driven by a user’s chosen selection of Business Units.

For each Business Unit, the Scoring Configuration is presented, with the scores found for the Business Unit indicated by color and value. This allows the user to rapidly assess the data profile of the chosen Business Units against the configuration, and to see which Business Units hold the data that is least compliant with the desired profile and in which part of the Scoring Configuration hierarchy.

https://support.activenav.com/analyst-home-page

Next Step

Profile Review allows an Analyst to dig deeper into the details of each container and object. Explorer provides methods for creating lists of objects to target in remediation activities.