Data governance is the cornerstone of an organisation's success in managing, using and protecting data. In a landscape that is increasingly reliant on data-driven insights, understanding the nuances of data governance is imperative. This article explores the fundamental principles and essential elements that are critical to building a robust data governance framework. From defining data governance to exploring its critical components, discover how establishing a solid foundation can guide organisations towards effective data management and decision making.
WHAT IS DATA GOVERNANCE?
Governance is the act of making and enforcing decisions, and data governance is the act of making and enforcing decisions about data within an organisation. Decisions are not made in a vacuum: successful data governance is an integral part of a data strategy that describes the organisation's approach to the acquisition, management, use and disposal of data. In practice, data governance is the set of policies and processes for the use of data in the organisation.
Establishing data governance provides assurance to the accountable executives, the board and other stakeholders of an organisation that data is managed appropriately without the need to be directly involved in every data decision. The critical attributes of data governance are:
Accountability: clearly defining data ownership.
Transparency: ensuring clarity of what data is collected, used and shared.
Proportionality: implementing governance measures proportional to associated data risk.
Limitation: collecting and using data solely for legitimate purposes.
Accuracy: knowing data completeness, currency and accuracy.
WHY IS DATA GOVERNANCE ESSENTIAL?
Data governance is important for every organisation for two key reasons. The first is the value of data. Organisations are increasingly moving towards data-driven decision making and increasing their use of AI, with its deep dependence on data. Organisations gain competitive advantage from data by better understanding potential customers, stakeholders, partners and, critically, their own operations.
The second shouldn’t be a surprise but often is. Data is a liability. Every byte of data held by an organisation is a liability with associated costs for storage and management. Data is subject to legal restrictions on what can be held, for how long and what may not be lost. To be valuable, data must be accessible to those who should have access and protected from those without rights. Data needs to be findable to avoid duplication and promote efficiency. It often requires additional information or metadata to record its origin, age, licence and quality. Data theft or data loss can expose an organisation to substantial regulatory and reputation risks.
Without appropriate governance, the opportunities and risks associated with data cannot be managed in a systematic way.
DATA GOVERNANCE FRAMEWORK
There are many data governance frameworks available including those by some of the larger consultancies. However, understanding the purpose and importance of each governance element is critical. Without this understanding, effective application of governance is unlikely.
Successful data governance needs to be part of an encompassing data strategy. The strategy will set out the value and risk of data, the organisation's appetite for that risk and its approach to liberating the value in the data. Without a strategy, there is nothing to govern against and decisions become matters of opinion.
Assuming we have a sound data strategy, what aspects should the data governance framework cover?
Purpose
Perhaps the single most important aspect of data and its governance is to define and record the purpose of the data. It is remarkable how many organisations are maintaining data stores for data they have no purpose for or don’t even know about. Data purpose provides essential context for governance decisions, ensuring alignment of data acquisition, storage and use to the purpose. It also informs non-functional data storage requirements, including resilience and accessibility.
The purpose of data justifies its cost. The purpose identifies the value, avoided cost or mitigated risk for the organisation. Purposes change or disappear, which we discuss below under “Roadmap”. The purpose is also key in selecting an appropriate owner for the data.
Ownership
The owner of the data is the person accountable for the day-to-day decisions regarding the data. This is not the owner of the data storage system but the business owner of the data asset. Their role is to ensure the purpose is up to date and that decisions about the use of the data are aligned to the provenance, rights, licences and quality of the data. They understand the security risk and implications of the data. They own the data, the roadmap for the data and the cost of managing it. In UK Government terminology, they are the Information Asset Owner. If there are changes to the sources or sinks of the data, it is the owner who approves the new source or accepts the new sink.
The owner understands the data and is accountable for its quality and ensuring it is fit for purpose. They should be familiar with its sources and the terms of use and familiar with the sinks, ensuring that the data’s use fits within its licence.
Without an identified owner, the value and liability of the data are not understood. This can lead to issues including ongoing costs of unused data, out-of-date access controls, breach of data rights or licences, inappropriate quality, duplication and nugatory work.
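As an illustration of how purpose and ownership might be recorded and checked, the following is a minimal sketch in Python. The DataAsset record, the ungoverned function and the example catalogue entries are all hypothetical names invented for this example; a real data catalogue would hold far more detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataAsset:
    """One entry in a hypothetical data catalogue."""
    name: str
    purpose: Optional[str] = None   # why the organisation holds the data
    owner: Optional[str] = None     # accountable business owner, not the system owner


def ungoverned(assets: list[DataAsset]) -> dict[str, list[str]]:
    """Return, per asset name, which of purpose and owner are missing."""
    gaps: dict[str, list[str]] = {}
    for asset in assets:
        missing = [
            field_name
            for field_name in ("purpose", "owner")
            if not (getattr(asset, field_name) or "").strip()
        ]
        if missing:
            gaps[asset.name] = missing
    return gaps


# Illustrative entries only; the names and owners are made up.
catalogue = [
    DataAsset("customer_orders", purpose="Monthly sales reporting", owner="Head of Sales"),
    DataAsset("legacy_web_logs"),  # no purpose and no owner recorded
    DataAsset("supplier_prices", purpose="Procurement benchmarking"),
]

print(ungoverned(catalogue))
# {'legacy_web_logs': ['purpose', 'owner'], 'supplier_prices': ['owner']}
```

A report like this gives the organisation a starting list of data stores whose cost and liability nobody is currently accountable for.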
Roadmap
All data in an organisation needs a roadmap. It should include major changes in purpose or scope and include plans for the retirement of the data. The roadmap is managed by the owner and helps to communicate the anticipated future state of the data store. This is particularly useful in sink and application planning.
The roadmap should anticipate changes in organisational strategy, legislation and supplier agreements. It provides useful input into negotiating supplier agreements and licences and allows proactive decision making to support future needs.
Sources
Understanding data sources offers valuable insights into data provenance. Some data is directly created from a measurement or human input but most data is in some sense derived. It may be the combination of multiple sets of data, the result of some processing or often both.
Data sources are the suppliers of data. They may be internal or external suppliers but require management regardless. Data sources can have planned and unplanned outages, changes in format, frequency, latency, licence or rights that could affect the use, value or risk of the data.
Different data sources have different attributes in terms of rights, licences, availability and data latency, the latter being particularly important for real time data feeds. Sourcing data typically has an ongoing cost, whether charged by external suppliers or incurred internally. Understanding these costs is critical for meaningful cost attribution. Managing and cataloguing data sources avoids duplication and improves the efficiency of data acquisition.
Sinks
Just as a good understanding of the sources of data is a critical element of data governance, so too is an understanding of data sinks. Occasionally data is stored without further use, for regulatory or risk mitigation purposes, or is used in situ. More often the data is passed to a destination or sink for further processing, analysis or retrieval.
The data sinks of a data store may constitute part or all of the purpose of the data store. The sinks will require specific data with specific age or currency, rights and accuracy. Management of the store and management of the sources must be aligned to the requirements of the sinks. Data may also need to be correlated with data from other sources.
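One way to picture the alignment between sources and sinks is to compare how quickly each source delivers data with how fresh each sink needs it to be. The sketch below is in Python and makes simplifying assumptions: Source, Sink, misaligned and the example feeds are hypothetical, and latency is the only attribute compared, whereas rights and accuracy would need similar checks.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    latency_minutes: int      # typical delay before data arrives in the store


@dataclass
class Sink:
    name: str
    max_age_minutes: int      # oldest data the sink can tolerate
    uses: tuple[str, ...]     # names of the sources it depends on


def misaligned(sources: list[Source], sinks: list[Sink]) -> list[tuple[str, str]]:
    """Return (sink, source) pairs where the source is too slow or not catalogued."""
    by_name = {s.name: s for s in sources}
    problems = []
    for sink in sinks:
        for source_name in sink.uses:
            source = by_name.get(source_name)
            if source is None or source.latency_minutes > sink.max_age_minutes:
                problems.append((sink.name, source_name))
    return problems


# Illustrative feeds only.
sources = [Source("till_feed", 5), Source("nightly_stock", 1440)]
sinks = [
    Sink("live_dashboard", 15, ("till_feed", "nightly_stock")),
    Sink("monthly_report", 10080, ("nightly_stock",)),
]

print(misaligned(sources, sinks))
# [('live_dashboard', 'nightly_stock')]
```

Here the nightly stock feed cannot meet the dashboard's 15-minute requirement, which is the kind of mismatch the owner would need to resolve by changing the source or renegotiating the sink's requirements.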
Rights and licences
All data has associated rights, either owned by the organisation or by a third party, with open or restricted usage rights and associated licences. A contract may have been entered into to acquire or make use of the data. In a large number of cases, the rights and licence are restrictive and legally binding. Data, particularly personal data, is also subject to specific legislation depending on the locality. Breach of the rights, licences or legislation can result in significant penalties for an organisation and its officers.
It is therefore essential that all data acquired and stored has its rights and licences recorded appropriately. These records should be kept up to date and reviewed regularly. Rights expire and licences often have end dates. It should not be assumed that data can be used just because it is accessible.
The rights, licences and legislation can also restrict how the data is used. Personal data collected for one purpose will not generally be usable for other purposes under GDPR [1]. Organisations have a responsibility to understand and comply with the regulatory and commercial restrictions on data. The data owner is responsible for this compliance.
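The record keeping described in this section, with its end dates and purpose restrictions, can be sketched as a simple licence register. The Python below is illustrative only: LicenceRecord, needs_review, use_permitted, the 90-day review window and the example datasets are assumptions made for the example, not a prescribed design and not legal advice.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class LicenceRecord:
    dataset: str
    licence: str
    expires: Optional[date]              # None means no recorded end date
    permitted_purposes: frozenset[str]   # purposes the rights or licence allow


def needs_review(records: list[LicenceRecord], today: date, window_days: int = 90) -> list[str]:
    """Datasets whose licence has expired or will expire within the window."""
    cutoff = today + timedelta(days=window_days)
    return [r.dataset for r in records if r.expires is not None and r.expires <= cutoff]


def use_permitted(record: LicenceRecord, purpose: str, today: date) -> bool:
    """A use is allowed only if the licence is current and covers the purpose."""
    current = record.expires is None or record.expires >= today
    return current and purpose in record.permitted_purposes


# Illustrative records only.
records = [
    LicenceRecord("postcode_lookup", "commercial", date(2025, 3, 31), frozenset({"address validation"})),
    LicenceRecord("open_weather", "open", None, frozenset({"forecasting", "research"})),
    LicenceRecord("customer_contacts", "internal", date(2030, 1, 1), frozenset({"order fulfilment"})),
]
today = date(2025, 2, 1)

print(needs_review(records, today))                   # ['postcode_lookup']
print(use_permitted(records[2], "marketing", today))  # False
```

The second check reflects the point that accessibility does not imply permission: the contact data is current and readable, but marketing is not among its recorded purposes.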
Management
Data management is necessary to support the owner in discharging all their responsibilities. The management of data encompasses a number of important activities, including technology selection and maintenance, storage, administration, and service and delivery continuity. Data management also includes the removal or archiving of old data from the store.
Appropriate technology is essential in meeting the access needs of sources, sinks and users and allowing the platform and data to be administered. The technology needs to support the required level of service including availability and redundancy ensuring that critical data assets are available and not lost in the event of a failure.
The data needs to be accessible to be useful and the technology needs to ensure users with appropriate permissions can find and obtain the data they need easily. The technology is also used in all aspects of data maintenance up to and including deletion.
Cost
Data has storage, management and governance costs that should be aligned with its value. There are direct costs associated with physical storage, backups, management and networking costs for data movement. In addition, there are costs associated with the oversight, management and governance of the data as an asset and a liability for the organisation.
There are also licence costs which can be substantial in addition to the cost of physically acquiring, cleansing and managing the data. Finally, there is often some relatively small cost in disposing of data. It should also be remembered that the organisation carries a risk of additional costs in the unlikely event of breach of licence or legislation.
It is important for an organisation, through the data owner, to understand the full cost of data from acquisition through to disposal, together with its risk exposure. All of this should be offset against the value of the data to the organisation in order to understand the return on investment.
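The comparison of full cost and risk exposure against value can be illustrated with some simple arithmetic. The Python sketch below is an assumption-laden example rather than an accounting method: DataCostModel and its fields are hypothetical, risk exposure is approximated as probability multiplied by impact, and all figures are invented.

```python
from dataclasses import dataclass


@dataclass
class DataCostModel:
    acquisition: float         # licence fees, collection and cleansing
    annual_running: float      # storage, backups, networking, management, governance
    disposal: float            # one-off cost when the data is retired
    years_held: int
    breach_probability: float  # assumed chance of a breach over the holding period
    breach_impact: float       # assumed cost to the organisation if it happens

    def lifecycle_cost(self) -> float:
        """Cost from acquisition through to disposal."""
        return self.acquisition + self.annual_running * self.years_held + self.disposal

    def risk_exposure(self) -> float:
        """Expected cost of a breach (an assumed, simplified measure)."""
        return self.breach_probability * self.breach_impact

    def return_on_investment(self, value: float) -> float:
        """Net value as a fraction of total cost including risk exposure."""
        total = self.lifecycle_cost() + self.risk_exposure()
        return (value - total) / total


# Invented figures for illustration.
model = DataCostModel(
    acquisition=20_000,
    annual_running=5_000,
    disposal=1_000,
    years_held=4,
    breach_probability=0.02,
    breach_impact=200_000,
)

print(f"Lifecycle cost: {model.lifecycle_cost():,.0f}")
print(f"Risk exposure:  {model.risk_exposure():,.0f}")
print(f"ROI at a value of 90,000: {model.return_on_investment(90_000):.0%}")
```

Even a rough model of this kind makes it easier to spot data whose running cost and exposure exceed any value its recorded purpose could justify.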
Quality
Data quality is critical to any organisation. Its quality has a direct impact on the decisions made based on it, the standard of the AI models built on it, and the overall value of the products using it. Data governance focuses on two key aspects of data quality: measurement and improvement.
Measurement is important as it is the basis for detecting changes in quality and identifying when data quality could cause issues in use. It may be more appropriate to take a data service offline than to deliver low quality data. Measuring quality is also useful in holding sources and suppliers to account for the quality of delivered data.
Data quality issues include poor currency, gaps and erroneous values. Sometimes these defects are obvious; sometimes they are not. They may be addressable through cleansing.
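To make the measurement of gaps, erroneous values and currency concrete, here is a minimal Python sketch for a single column of numeric data. The quality_report function, its metric names, the validity rule and the sample readings are all hypothetical choices for illustration; real quality rules depend on the data and its purpose.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def quality_report(
    values: list[Optional[float]],
    last_updated: datetime,
    now: datetime,
    is_valid: Callable[[float], bool],
    max_age: timedelta,
) -> dict:
    """Measure gaps, erroneous values and currency for one column of data."""
    total = len(values)
    present = [v for v in values if v is not None]
    invalid = [v for v in present if not is_valid(v)]
    return {
        # share of records that are not gaps
        "completeness": len(present) / total if total else 0.0,
        # share of present values that pass the validity rule
        "validity": (len(present) - len(invalid)) / len(present) if present else 0.0,
        # whether the data was refreshed recently enough
        "is_current": now - last_updated <= max_age,
    }


# Invented temperature readings: one gap (None) and one sentinel error (-999.0).
readings = [21.5, None, 19.0, -999.0, 22.1]
report = quality_report(
    readings,
    last_updated=datetime(2024, 1, 1, 12, 0),
    now=datetime(2024, 1, 1, 12, 45),
    is_valid=lambda t: -50.0 <= t <= 60.0,
    max_age=timedelta(hours=1),
)

print(report)
# {'completeness': 0.8, 'validity': 0.75, 'is_current': True}
```

Tracking figures like these over time gives a baseline for detecting changes in quality, and thresholds on them could inform a decision to take a data service offline.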
WHY IS DATA GOVERNANCE A CRITICAL ISSUE FOR ORGANISATIONS?
Data is a significant asset and liability for organisations. Data governance is critical for all organisations to ensure decisions about data are appropriately made and should be an integral part of a data strategy.
Frameworks provide a structure for data governance and should include the essential elements described above. The purpose and roadmap of the data and an accountable owner are central to any approach, which should also cover sources, sinks, rights and licences, management, cost and quality. Only by explicitly addressing each of these areas can assurance be given to accountable executives, the board and other stakeholders of an organisation.
[1] GDPR: the General Data Protection Regulation is a regulation for data protection and privacy in the European Union and European Economic Area.