Overview | Data Profiling and Monitoring

Use Case Description

This Playbook describes how to perform Data Profiling using ActiveNav Cloud by discovering the objects within unstructured data repositories and applying Feature Extraction rules to identify items of interest.

A data profile provides you insight into key characteristics of your repositories that can inform and drive information management activities. You can focus on intrinsic properties of objects such as their size, type and age to understand more about unstructured data use in your organization.
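As an illustration of what such a profile contains, the following minimal sketch summarizes a set of objects by type and age band. It is not ActiveNav Cloud code and does not call any ActiveNav API: the `ObjectRecord` type, the age bands, and the function names are assumptions made for this example only.

```python
import os
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ObjectRecord:
    """Hypothetical record for one discovered object (not an ActiveNav type)."""
    path: str
    size_bytes: int
    modified: datetime


def age_band(modified: datetime, now: datetime) -> str:
    """Place an object in a coarse age band (band boundaries are assumed)."""
    years = (now - modified).days / 365.25
    if years < 1:
        return "<1y"
    if years < 3:
        return "1-3y"
    if years < 7:
        return "3-7y"
    return "7y+"


def build_profile(records, now):
    """Total the size of all objects, grouped by file type and by age band."""
    bytes_by_type = Counter()
    bytes_by_age = Counter()
    for rec in records:
        ext = os.path.splitext(rec.path)[1].lower() or "(none)"
        bytes_by_type[ext] += rec.size_bytes
        bytes_by_age[age_band(rec.modified, now)] += rec.size_bytes
    return {
        "total_objects": len(records),
        "total_bytes": sum(bytes_by_type.values()),
        "bytes_by_type": dict(bytes_by_type),
        "bytes_by_age": dict(bytes_by_age),
    }


# Example input with made-up paths, sizes and dates.
NOW = datetime(2024, 1, 1, tzinfo=timezone.utc)
SAMPLE = [
    ObjectRecord("//share/hr/policy.DOCX", 2000, datetime(2023, 6, 1, tzinfo=timezone.utc)),
    ObjectRecord("//share/hr/archive.zip", 5000, datetime(2015, 1, 1, tzinfo=timezone.utc)),
    ObjectRecord("//share/hr/notes", 100, datetime(2021, 6, 1, tzinfo=timezone.utc)),
]
profile = build_profile(SAMPLE, NOW)
```

Grouping by bytes rather than by object count is a design choice for the sketch; either view can be useful when deciding where to focus review effort.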

    The visualization tools within ActiveNav Cloud allow users to assess the overall makeup of data, focusing on elements such as data age, type and size. Retaining dark data that no longer has value to the organization represents cost and risk because of the unknown elements it may contain.

    The review capabilities of the platform can be used to involve users in identifying data that is no longer needed and can safely be removed. After the review process, validated findings can be exported and remediated according to the preferred approach of the business.

    After you perform the initial discovery and act on the results, refreshing the discovered data on a periodic schedule allows easy monitoring of the status of the data catalog and supports the maintenance of the desired data profile.

    Use Case Flow

    [Figure: use case flow diagram. Stage labels: 1-Preparation, 2-ServiceDeployment, 3-FirstDiscovery, 5-WorkflowIntegration, 6-ExtendedDiscoveryScope, 7-MonitoringAndResponse]

    Our recommended project flow for the Data Profiling use case is built upon the core steps outlined in the following sections.

    There is no one-size-fits-all approach. The size of your project, the user group that will be involved, and the type of repositories you target are all factors that will influence the best way to follow the overall process in your organization.

    You may have already begun to use ActiveNav Cloud, in which case you can select specific elements of the Playbook to guide your use of the platform.

    Regardless of the pattern you choose we recommend paying close attention to the details outlined in the Preparation phase to ensure that you have all necessary elements in place.

    Linear approach for smaller projects

    For smaller-scale projects that target a small number of repositories or areas of the business, you may be able to complete the majority of preparation, planning, and Collector deployment activities before beginning your discovery process.

    You can then gradually work through the Discover > Review > Extend Scope process to build your unstructured data catalog and assess results.

    Iterative approach for larger projects

    For larger projects, it may not be reasonable to try to prepare everything before you begin your discovery activities, as doing so would delay the point at which you begin to see results and learn about your data.

    In this case we recommend taking a pragmatic, iterative approach:

    1. Prepare sufficiently to discover your highest priority locations: deploy Collectors, acquire credentials, and plan Business Unit associations.
    2. Begin the Discover > Review process for these locations.
    3. Continue preparation for additional locations to be discovered.
    4. Continue to prepare further locations while discovery and cleanup are under way for earlier locations.
    5. Repeat these steps to steadily build out your unstructured data catalog in parallel with review activities.

    1. Preparation

    Activity Description

    Preparing the key elements of your data discovery project will help ensure that the initial deployment of ActiveNav Cloud runs as smoothly as possible. Taking the time to identify locations, prepare credentials, and engage with key users will allow you to achieve results as quickly as possible.

    Goals

    A project plan is in place to allow the initial deployment of ActiveNav Cloud to run smoothly.

    Participants

    Project Sponsor, IT team, Project / Program Manager

    Pre-requisites

    • Cloud Services agreement is in place
    • Project / Program manager identified

    Outputs

    • Top level plan outlines goals and timeline.
    • Initial inventory of repositories to be discovered is ready.
    • Credentials prepared for access to repositories.
    • Deployment architecture for File Share Collectors is understood.

    2. Service Deployment and Configuration

    Activity Description

    This stage outlines the work that is required to progress your project plan by deploying on-premise Collectors, configuring them, and validating their operation.

    Goals

    1. On-premise Collectors are operational.
    2. Discovery process has been validated.

    Participants

    Project Manager, IT Team, Application Administrator

    Pre-requisites

    • Project plan defined.
    • Repository inventory prepared.
    • Deployment architecture defined.

    Outputs

    • Data discovery process validated.
    • Initial users have access.

    3. First Discovery

    Activity Description

    This stage may represent the first full-scale discovery of the project, or the first discovery for a specific repository type. It will enable you to understand:
    • The speed with which you can process this type of repository.
    • The actual size of your data source, validated against your initial estimate.
    • The responsiveness of your data to your chosen rule configuration.
    • How to review and interpret results.
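Once the first discovery has given you a measured processing speed, you can use it for rough planning of later discoveries. The sketch below is a simple linear extrapolation, assuming throughput stays constant. Real throughput depends on repository type, network conditions, and rule configuration, so treat the result as an estimate only. The function name and the figures are hypothetical.

```python
def estimate_discovery_hours(sample_objects, sample_seconds, total_objects):
    """Extrapolate total discovery time from a measured sample (assumes constant rate)."""
    if sample_objects <= 0 or sample_seconds <= 0:
        raise ValueError("sample_objects and sample_seconds must be positive")
    objects_per_second = sample_objects / sample_seconds
    return total_objects / objects_per_second / 3600


# Made-up figures: 100,000 objects took 30 minutes; the next repository
# is believed to hold about 2,000,000 objects.
hours = estimate_discovery_hours(100_000, 1_800, 2_000_000)
```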

    Goals

    Begin to understand the characteristics of your data; act on initial findings

    Participants

    Application Administrator, Data Analysts

    Pre-requisites

    • System has been deployed, configured, and validated.
    • The inventory of data sources is available with valid credentials.

    Outputs

    Data has been discovered and locations assigned to Business Units for review

    4. Profile Review

    Activity Description

    This stage represents the review of the data profile characteristics and how to use the insights to make informed remediation decisions. It will enable you to understand:

    • The shape of your data from several perspectives (age, size, content composition and level of duplication)
    • How to review and interpret results to make informed decisions
    • The speed with which you can identify responsive content that requires a remediation action (e.g. stale data or non-business data types).
    • How to export a list of responsive files and hand off for application of the appropriate action
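The selection logic behind a manifest of responsive files can be sketched as follows. ActiveNav Cloud performs the review and export itself; this example only illustrates the idea outside the platform. The seven-year threshold, the list of non-business file types, and the CSV column layout are all assumptions for illustration, not the platform's actual export format.

```python
import csv
import io
import os
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=7 * 365)           # assumed staleness threshold
NON_BUSINESS_TYPES = {".mp3", ".mp4", ".iso"}   # assumed example types


def responsive_reason(path, modified, now):
    """Return why an object needs action, or None if it does not."""
    if os.path.splitext(path)[1].lower() in NON_BUSINESS_TYPES:
        return "non-business type"
    if now - modified > STALE_AFTER:
        return "stale"
    return None


def write_manifest(objects, now, out):
    """Write one CSV row per responsive object; return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["path", "modified", "reason"])
    count = 0
    for path, modified in objects:
        reason = responsive_reason(path, modified, now)
        if reason:
            writer.writerow([path, modified.date().isoformat(), reason])
            count += 1
    return count


# Example input with made-up paths and dates.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
objects = [
    ("//share/finance/budget-2012.xlsx", datetime(2012, 3, 1, tzinfo=timezone.utc)),
    ("//share/finance/holiday.MP4", datetime(2023, 8, 1, tzinfo=timezone.utc)),
    ("//share/finance/forecast.xlsx", datetime(2023, 11, 1, tzinfo=timezone.utc)),
]
buffer = io.StringIO()
written = write_manifest(objects, now, buffer)
manifest_lines = buffer.getvalue().splitlines()
```

Recording the reason alongside each path keeps the manifest useful to whoever applies the remediation action, since stale and non-business content may be handled differently.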

    Goals

    Understand the shape of your data both globally and by ownership, and identify areas where data can be remediated 

    Participants

    Analyst, Business Unit data owners

    Pre-requisites

    • Either Discovery only or Discovery and Feature Extraction performed on content locations (data sources).
    • Content locations mapped to Business Units

    Outputs

    A manifest of responsive files or containers that require an action has been exported from ActiveNav Cloud

    5. Remediation Workflow

    Activity Description

    Once Data Analysts have completed the review activities that are appropriate for the organization, or for specific Business Units, then the findings must be remediated. There are a number of options that can be considered depending on the scale of the project and any existing business processes.

    Goals

    Reduce overheads by creating an efficient workflow, and capture appropriate audit records.

    Participants

    Project / Program Manager, IT Team

    Pre-requisites

    • Manifest of responsive files has been exported
    • Required action(s) are understood

    Outputs

    • Objects identified as sensitive are remediated as required

    6. Extend Discovery Scope

    Activity Description

    Once the discovery and review process has been established, it should be scaled out to extend the discovered catalog of unstructured data to address further locations and repositories.

    Goals

    Address entire contracted data scope and build a complete view of sensitive data

    Participants

    Application Administrator, Data analysts, IT team

    Pre-requisites

    • Discovery and cleanup processes established.
    • Configuration and preparation complete for additional locations.

    Outputs

    Prevalence and management of sensitive data understood.

    7. Monitoring and Response

    Activity Description

    Once you have built a catalog of your unstructured data with ActiveNav Cloud, and addressed the findings of the initial discovery, you will enter a monitoring phase.

    At this point you can use regular re-discovery of Data Sources to ensure that content that does not comply with the desired data profile does not build up again within your unstructured data.
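One way to reason about re-growth is to compare a profile snapshot taken after cleanup with the snapshot from the latest refresh. The sketch below assumes each snapshot is a plain mapping from a category (for example an age band or data type) to total bytes. The snapshot shape, the 10% tolerance, and the function name are assumptions for illustration, not ActiveNav Cloud features.

```python
def detect_regrowth(baseline, current, tolerance=0.10):
    """Return {category: growth_in_bytes} for categories that grew beyond tolerance.

    A category absent from (or zero in) the baseline is flagged as soon as
    it contains any data in the current snapshot.
    """
    flagged = {}
    for category, size in current.items():
        base = baseline.get(category, 0)
        if base == 0:
            if size > 0:
                flagged[category] = size
        elif (size - base) / base > tolerance:
            flagged[category] = size - base
    return flagged


# Made-up snapshots (bytes per category) from two discovery runs.
after_cleanup = {"7y+": 100, "non-business": 0, "<1y": 500}
latest_refresh = {"7y+": 130, "non-business": 20, "<1y": 520}
alerts = detect_regrowth(after_cleanup, latest_refresh)
```

A relative tolerance avoids flagging normal churn in large categories; the appropriate value depends on organizational policy.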

    Goals

    • Establish a periodic refresh for all Data Sources.
    • Maintain alignment with organizational policy
    • Move to business as usual for management of data profile

    Participants

    Project Manager, Application Administrator, Data Analysts, IT team

    Pre-requisites

    Initial discovery and review has been performed on the Data Sources to be monitored.

    Outputs

    Up to date visibility of data profile characteristics