Overview | Data Profiling and Monitoring

Use Case Description

This Playbook describes how to perform Data Profiling using ActiveNav Cloud by discovering the objects within unstructured data repositories and applying Feature Extraction rules to identify items of interest.

A data profile provides insight into key characteristics of your repositories that can inform and drive information management activities. You can focus on intrinsic properties of objects such as their size, type and age to understand more about unstructured data use in your organization.

The visualization tools within ActiveNav Cloud allow users to assess the overall makeup of their data, focusing on elements such as data age, type and size.
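As an illustration of what a data profile summarizes, the following Python sketch aggregates a handful of file records by type and by age band. It is not ActiveNav Cloud code and does not use its API: the FileRecord class, the build_profile function, and the age band thresholds are hypothetical names and values chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePosixPath
from typing import Dict, Iterable


@dataclass
class FileRecord:
    """Hypothetical record of one discovered object (not an ActiveNav type)."""
    path: str
    size_bytes: int
    modified: datetime


def age_band(modified: datetime, now: datetime) -> str:
    """Place an object into a coarse age band (thresholds are assumptions)."""
    age_days = (now - modified).days
    if age_days < 365:
        return "under 1 year"
    if age_days < 3 * 365:
        return "1-3 years"
    return "over 3 years"


def build_profile(records: Iterable[FileRecord], now: datetime) -> Dict[str, object]:
    """Summarize stored bytes by file type and by age band."""
    bytes_by_type: Counter = Counter()
    bytes_by_age: Counter = Counter()
    count = 0
    for record in records:
        # Use the lower-cased extension as the type; "(none)" if there is none.
        file_type = PurePosixPath(record.path).suffix.lower() or "(none)"
        bytes_by_type[file_type] += record.size_bytes
        bytes_by_age[age_band(record.modified, now)] += record.size_bytes
        count += 1
    return {
        "object_count": count,
        "total_bytes": sum(bytes_by_type.values()),
        "bytes_by_type": dict(bytes_by_type),
        "bytes_by_age": dict(bytes_by_age),
    }
```

Grouping by bytes rather than by object count is a design choice for this sketch; either view can be useful when deciding where to focus review effort.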

Retaining dark data that no longer has value to the organization represents cost and risk, because its contents are unknown. The review capabilities of the platform can be used to involve users in identifying data that is no longer needed and can safely be removed. After the review process, validated findings can be exported and remediated according to the preferred approach of the business.

After performing the initial discovery and acting on the results, refreshing the discovered data on a periodic schedule allows easy monitoring of the status of the data catalog, and supports the maintenance of the desired data profile.

Use Case Flow

Our recommended project flow for the Data Profiling use case is built upon the core steps outlined in the following sections.

There is no one-size-fits-all approach. The size of your project, the user group that will be involved, and the type of repositories you target are all factors that will influence the best way to follow the overall process in your organization.

You may have already begun to use ActiveNav Cloud, in which case you can select specific elements of the Playbook to guide your use of the platform.

Regardless of the pattern you choose, we recommend paying close attention to the details outlined in the preparation phase to ensure that you have all necessary elements in place.

Linear approach for smaller projects

For smaller-scale projects that target a small number of repositories or areas of the business, you may be able to complete the majority of preparation, planning, and collector deployment activities before beginning your discovery process.

You can then gradually work through the Discover > Review > Extend Scope process to build your unstructured data catalog and assess results.

Iterative approach for larger projects

For larger projects, it may not be reasonable to try to prepare everything before you begin your discovery activities, because doing so would delay the point at which you begin to see results and learn about your data.

In this case, we recommend taking a pragmatic, iterative approach:

  1. Prepare sufficiently to discover your highest priority locations: deploy Collectors, acquire credentials, and plan Business Unit associations.

  2. Begin the Discover > Review process for these locations.

  3. Continue preparation for additional locations to be discovered.

  4. Continue to prepare further locations while discovery and cleanup are under way for earlier locations.

  5. Repeat these steps to steadily build out your unstructured data catalog in parallel to review activities.

1. Preparation

Activity Description

Preparing the key elements of your data discovery project will help ensure that the initial deployment of ActiveNav Cloud will run as smoothly as possible. Taking the time to identify locations, prepare credentials, and to engage with key users will allow you to achieve results as quickly as possible.

Goals

A project plan is in place to allow the initial deployment of ActiveNav Cloud to run smoothly.

Participants

Project Sponsor, IT team, Project / Program Manager

Pre-requisites

  • Cloud Services agreement is in place

  • Project / Program manager identified

Outputs

  • Top level plan outlines goals and timeline.

  • Initial inventory of repositories to be discovered is ready.

  • Credentials prepared for access to repositories.

  • Deployment architecture for File Share Collectors is understood.

2. Service Deployment and Configuration

Activity Description

This stage outlines the work that is required to progress your project plan by deploying on-premises Collectors, configuring them, and validating their operation.

Goals

  1. On-premises Collectors are operational.

  2. Discovery process has been validated.

Participants

Project Sponsor, IT team, Project / Application Administrator

Pre-requisites

  • Project Plan defined.

  • Repository inventory prepared.

  • Deployment architecture defined.

Outputs

  • Data discovery process validated.

  • Initial users have access.

3. First Discovery

Activity Description

This stage may represent the first full scale discovery of the project, or the first discovery for a specific repository type. It will enable you to understand:

  • The speed with which you can process this type of repository.
  • The size of your data source.
  • The responsiveness of your data to your chosen rule configuration.
  • How to review and interpret results.
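One way to turn the observed processing speed into a planning figure is a simple linear extrapolation from a timed sample, sketched below in Python. The function name and the example figures are hypothetical, and the assumption that throughput stays constant across the whole repository is a simplification; real rates vary with file sizes, network conditions, and rule configuration.

```python
def estimate_discovery_hours(sample_objects: int,
                             sample_seconds: float,
                             total_objects: int) -> float:
    """Extrapolate total discovery time from a timed sample.

    Assumes (as a simplification) that the rate seen in the sample
    holds for the whole repository.
    """
    if sample_objects <= 0 or sample_seconds <= 0:
        raise ValueError("sample_objects and sample_seconds must be positive")
    objects_per_second = sample_objects / sample_seconds
    return total_objects / objects_per_second / 3600


# Hypothetical figures: 50,000 objects processed in 10 minutes,
# with an estimated 3,000,000 objects in the repository.
hours = estimate_discovery_hours(50_000, 600, 3_000_000)
print(f"Estimated discovery time: {hours:.1f} hours")
```

An estimate like this is only a starting point; comparing it with the time actually taken by the first full discovery helps refine plans for later repositories of the same type.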

Goals

Begin to understand the characteristics of your data; act on initial findings.

Participants

Application Administrator, Data Analysts

Pre-requisites

  • System has been deployed, configured, and validated.

  • The inventory of data sources is available with valid credentials.

Outputs

Data has been discovered and locations assigned to Business Units for review.

4. Profile Review and Decision

Activity Description

This stage represents the review of the data profile characteristics and how to use the insights to make informed remediation decisions. It will enable you to understand:

  • The shape of your data from several perspectives (age, size, content composition and level of duplication).

  • How to review and interpret results to make informed decisions.

  • The speed with which you can identify responsive content that requires a remediation action (e.g., stale data or non-business data types).

  • How to export a list of responsive files and hand off for application of the appropriate action.
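The idea of identifying responsive files and exporting them as a manifest can be sketched in Python as follows. This is an illustration only and does not represent ActiveNav Cloud's export format or API: the record layout, the NON_BUSINESS_TYPES set, the staleness threshold passed by the caller, and the CSV column names are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime, timedelta
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional

# Assumed examples of file types an organization might treat as non-business.
NON_BUSINESS_TYPES = {".mp3", ".mp4", ".iso"}


def responsive_reason(path: str, modified: datetime, now: datetime,
                      stale_after: timedelta) -> Optional[str]:
    """Return why a file needs action, or None if it does not."""
    if PurePosixPath(path).suffix.lower() in NON_BUSINESS_TYPES:
        return "non-business type"
    if now - modified > stale_after:
        return "stale"
    return None


def find_responsive(records: Iterable[dict], now: datetime,
                    stale_after: timedelta) -> List[Dict[str, str]]:
    """Build manifest rows for every record that requires an action."""
    rows = []
    for record in records:
        reason = responsive_reason(record["path"], record["modified"],
                                   now, stale_after)
        if reason is not None:
            rows.append({
                "path": record["path"],
                "size_bytes": str(record["size_bytes"]),
                "modified": record["modified"].date().isoformat(),
                "reason": reason,
            })
    return rows


def manifest_csv(rows: List[Dict[str, str]]) -> str:
    """Render manifest rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["path", "size_bytes", "modified", "reason"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Recording a reason alongside each path is a design choice for this sketch: it lets the team that receives the manifest apply a different action to stale files than to non-business file types.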

Goals

Understand the shape of your data both globally and by ownership.

Identify areas where data can be remediated.

Participants

Data Analysts, Business Unit data owners, IT team

Pre-requisites

  • Either Discovery only, or Discovery and Feature Extraction, has been performed on content locations (Data Sources).

  • Content locations mapped to a Business Unit.

Outputs

A manifest of responsive files or containers that require an action has been exported from ActiveNav Cloud.

5. Remediation and Workflow Integration

Activity Description

Once Data Analysts have completed the review activities that are appropriate for the organization, or for specific Business Units, then the findings must be remediated.

There are a number of options that can be considered depending on the scale of the project and any existing business processes.

Goals

Reduce overheads by creating an efficient workflow and capture appropriate audit records.

Participants

Project / Program Manager, IT Team

Pre-requisites

  • Manifest of responsive files has been exported.

  • Required action(s) are understood.

Outputs

  • Objects identified as non-compliant with desired profile are remediated as required.

  • Business processes are refined to reflect findings and to reduce proliferation of data that need not be retained.

  • ActiveNav Cloud remediation is aligned with existing business practices.

6. Extend Discovery Scope

Activity Description

Once the discovery and review process has been established, it should be scaled out to extend the discovered catalog of unstructured data to address further locations and repositories.

Goals

Address the entire data scope and build a complete view of your information landscape.

Participants

Application Administrator, Data Analysts, IT team

Pre-requisites

  • Discovery and cleanup processes are established.

  • Configuration and preparation is complete for additional locations.

Outputs

The discovery and review process has been scaled out, and the discovered catalog of unstructured data has been extended to cover further locations and repositories.

7. Monitoring and Response

Activity Description

Once you have built a catalog of your unstructured data with ActiveNav Cloud, and addressed the findings of the initial discovery, you will enter a monitoring phase.

At this point you can use regular re-discovery of Data Sources to ensure that content that does not comply with the desired data profile does not build up again within your unstructured data.
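The monitoring idea can be sketched as a comparison between two snapshots of non-compliant volume taken at successive refreshes. The Python below is a minimal illustration, not an ActiveNav Cloud feature or API: the snapshot layout (category name mapped to bytes), the find_regrowth function, and the tolerance parameter are hypothetical.

```python
from typing import Dict, Mapping


def find_regrowth(previous: Mapping[str, int],
                  current: Mapping[str, int],
                  tolerance_bytes: int = 0) -> Dict[str, int]:
    """Return categories whose non-compliant volume grew beyond the tolerance.

    Each snapshot maps a category (for example "stale") to the number of
    bytes found in that category at one refresh.
    """
    growth = {}
    for category, current_bytes in current.items():
        # A category missing from the previous snapshot counts as zero.
        delta = current_bytes - previous.get(category, 0)
        if delta > tolerance_bytes:
            growth[category] = delta
    return growth


# Hypothetical snapshots from two successive refreshes.
last_refresh = {"stale": 1000, "non-business type": 500}
this_refresh = {"stale": 900, "non-business type": 800, "duplicate": 50}
print(find_regrowth(last_refresh, this_refresh))
```

A non-zero tolerance can be used to ignore small fluctuations so that only meaningful re-growth prompts a further review.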

Goals

  • Establish a periodic refresh for all Data Sources.

  • Maintain alignment with organizational policy.

  • Move to business as usual for management of data profile.

Participants

Project Manager, Application Administrator, Data Analysts, IT team

Pre-requisites

Initial discovery and review has been performed on the Data Sources to be monitored.

Outputs

Up-to-date visibility of the data profile.