Data quality: How do you quantify yours?
Being able to measure the quality of your data is vital to the success of any data management programme. Here, Peter Eales, Chairman of KOIOS Master Data, explores how you can define what data quality means to your organization, and how you can quantify the quality of your dataset.
In the business world today, it is important to provide evidence of what we do, so let me pose this question to you: how do you currently quantify the quality of your data?
If you have recently undertaken an outsourced data cleansing project, it is quite likely that you underestimated the internal resource that it takes to check this data when you are preparing to onboard it. Whether that data is presented to you in the form of a load file, or viewed in the data cleansing software the outsourced party used, you are faced with thousands of records whose quality you need to check. How did you do that? Did you start by using statistical sampling? Did you randomly check some records in each category? Either way, what were you checking for? Were you just scanning to see if it looked right?
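One way to make "randomly check some records in each category" repeatable is to draw a random sample of a fixed fraction from every category. The following is a minimal Python sketch, not a prescribed method: the function name `sample_per_category`, the record fields, and the choice of a 10% fraction with a minimum of two records per category are all illustrative assumptions.

```python
import random
from collections import defaultdict


def sample_per_category(records, category_key, fraction=0.1, minimum=1, seed=0):
    """Draw a random sample of records from each category.

    Every category contributes at least `minimum` records (or all of its
    records, if it has fewer), so small categories are not skipped.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible

    # Group the records by their category value.
    by_category = defaultdict(list)
    for record in records:
        by_category[record[category_key]].append(record)

    # Sample each group independently.
    sample = {}
    for category, items in by_category.items():
        size = min(len(items), max(minimum, round(len(items) * fraction)))
        sample[category] = rng.sample(items, size)
    return sample


# Hypothetical cleansed records: 50 bearings, 10 valves, 1 gasket.
records = (
    [{"id": i, "category": "bearing"} for i in range(50)]
    + [{"id": i, "category": "valve"} for i in range(50, 60)]
    + [{"id": 60, "category": "gasket"}]
)

picked = sample_per_category(records, "category", fraction=0.1, minimum=2)
for category, items in picked.items():
    print(category, len(items), [r["id"] for r in items])
```

With these inputs the sample holds five bearings, two valves, and the single gasket. Fixing the seed means a second reviewer can draw and check exactly the same records.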
The answer to these questions lies in understanding what, in your organization, constitutes good quality data, and then understanding what that means in ways that can be measured efficiently and effectively.
The Greek philosophers Aristotle and Plato captured and shaped many of the ideas we have adopted today for managing data quality. Plato’s Theory of Forms tells us that whilst we have never seen a perfectly straight line, we know what one would look like, whilst Aristotle’s Categories showed us the value of categorising the world around us. In the modern world of data quality management, we know what good data should look like, and we categorise our data in order to help us break down the larger datasets into manageable groups.
In order to quantify the quality of the data, you need to understand, and then define, the properties (attributes or characteristics) of the data you plan to measure. Data quality properties are frequently termed “dimensions”. Many organizations have set out what they regard as the key data quality dimensions, and there are plenty of scholarly and business articles on the subject. Two of the most commonly cited sources for lists of dimensions are DAMA International and ISO, in the international standard ISO/IEC 25012.
There are a number of published books on the subject of data quality. In her seminal work Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information™ (Morgan Kaufmann, 2008), Danette McGilvray emphasises the importance of understanding what these dimensions are and how to use them in the context of executing data quality projects. A key call-out in the book sets out the concept:
“A data quality dimension is a characteristic, aspect, or feature of data. Data quality dimensions provide a way to classify information and data quality needs. Dimensions are used to define, measure, improve, and manage the quality of data and information.
The data quality dimensions in The Ten Steps methodology are categorized roughly by the techniques or approach used to assess each dimension. This helps to better scope and plan a project by providing input when estimating the time, money, tools, and human resources needed to do the data quality work.
Differentiating the data quality dimensions in this way helps to:
1) match dimensions to business needs and data quality issues;
2) prioritize which dimensions to assess and in which order;
3) understand what you will (and will not) learn from assessing each data quality dimension; and
4) better define and manage the sequence of activities in your project plan within time and resource constraints”.
Laura Sebastian-Coleman, in her work Measuring Data Quality for Ongoing Improvement (2013), sums up the use of dimensions as follows:
“if a quality is a distinctive attribute or characteristic possessed by someone or something, then a data quality dimension is a general, measurable category for a distinctive characteristic (quality) possessed by data.
Data quality dimensions function in the way that length, width, and height function to express the size of a physical object. They allow us to understand quality in relation to a scale or different scales whose relation is defined. A set of data quality dimensions can be used to define expectations (the standard against which to measure) for the quality of a desired dataset, as well as to measure the condition of an existing dataset”.
Tim King and Julian Schwarzenbach, in their work Managing Data Quality – A practical guide (2020), include a short section on data characteristics. It reminds readers that the choice of a set of dimensions depends on the perspective of the user, which takes us back to Plato and his Theory of Forms, from where the phrase “beauty lies in the eye of the beholder” is derived. According to King and Schwarzenbach, quoting DAMA UK (2013), the six most common dimensions to consider are:
- Accuracy
- Completeness
- Consistency
- Validity
- Timeliness
- Uniqueness
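Some of these dimensions lend themselves to simple ratio measures. The Python sketch below scores completeness, uniqueness, and validity for a handful of made-up material records. The field names, the sample data, the pattern rules, and the simplified definitions themselves are assumptions for illustration only; they are not taken from DAMA UK or from the book, and your organization's own definitions should take their place.

```python
import re

# Hypothetical material master records with some deliberate defects.
records = [
    {"part_no": "A-1001", "description": "Ball bearing", "uom": "EA"},
    {"part_no": "A-1002", "description": "", "uom": "EA"},
    {"part_no": "A-1002", "description": "Gate valve", "uom": "each"},
    {"part_no": "B-2001", "description": "Gasket", "uom": None},
]


def _present(records, field):
    """Return the non-empty values held in `field`."""
    return [r[field] for r in records if r.get(field) not in (None, "")]


def completeness(records, field):
    """Share of records in which `field` is populated."""
    return len(_present(records, field)) / len(records)


def uniqueness(records, field):
    """Share of populated values in `field` that are distinct."""
    values = _present(records, field)
    return len(set(values)) / len(values) if values else 1.0


def validity(records, field, pattern):
    """Share of populated values in `field` that fully match `pattern`."""
    values = _present(records, field)
    valid = sum(1 for v in values if re.fullmatch(pattern, v))
    return valid / len(values) if values else 1.0


print(f"description completeness: {completeness(records, 'description'):.2f}")  # 0.75
print(f"part_no uniqueness:       {uniqueness(records, 'part_no'):.2f}")        # 0.75
print(f"uom validity:             {validity(records, 'uom', r'[A-Z]{2}'):.2f}")  # 0.67
```

Expressing each dimension as a ratio between 0 and 1 gives you a number you can track over time and compare against an agreed threshold. Accuracy, consistency, and timeliness generally need a reference source, a second dataset, or timestamps, so they are left out of this sketch.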
The book also offers a timely reminder that ISO 8000-8 is an important international standard to reference when looking at how to measure data quality. ISO 8000-8 describes fundamental concepts of information and data quality, and how these concepts apply to quality management processes and quality management systems. The standard specifies prerequisites for measuring information and data quality and identifies three types of data quality: syntactic, semantic, and pragmatic. Syntactic and semantic quality are measured through a verification process, while pragmatic quality is measured through a validation process.
In summary, there is plenty of resource out there that can help you with understanding how to measure the quality of your data, and at KOIOS Master Data, we are experts in this field. Give us a call and find out how we can help you.
Contact us
+44 (0)23 9387 7599
About the author
Peter Eales is a subject matter expert on MRO (maintenance, repair, and operations) material management and industrial data quality. Peter is an experienced consultant, trainer, writer, and speaker on these subjects. Peter is recognised by BSI and ISO as an expert in the subject of industrial data. Peter is a member of ISO/TC 184/SC 4/WG 13, the ISO standards development committee that develops standards for industrial data and industrial interfaces, ISO 8000, ISO 29002, and ISO 22745. Peter is the project leader for edition 2 of ISO 29002, due to be published in late 2020. Peter is also a committee member of ISO/TC 184/WG 6, which published the standard for asset-intensive industry interoperability, ISO 18101.
Peter has previously held positions as the global technical authority for materials management at a global EPC, and as the global subject matter expert for master data at a major oil and gas owner/operator. Peter is currently chief executive of MRO Insyte, and chairman of KOIOS Master Data.
KOIOS Master Data is a world-leading cloud MDM solution enabling ISO 8000-compliant data exchange.