6 Data Quality Issues Stopping You From Achieving Perfect Data (and how to fix them)
If you’re reading this, you’re probably experiencing data quality issues right now. That’s a fairly safe presumption because, in truth, many organisations have problems with their data that prevent them from reaching their goals.
Just look at some of the figures below from ZoomInfo that show how widespread the ‘dirty data’ issue is:
- 33% of businesses have over 100,000 records in their database
- 62% of organizations rely on prospect data that’s up to 40% inaccurate
- 34% of companies change their names annually
- 15% of leads contain duplicated data
- 7% of leads contain invalid email or physical addresses
- 40% of business objectives fail due to inaccurate data
- 50% of IT budgets are spent on data rehabilitation
- Bad data costs US businesses more than $611bn each year
So at least you’re not alone. The question is: what can you do about it? What steps can you take to improve your data quality, and to fix your data quality management processes so you don’t suffer the same problems further down the line?
The 6 most common data quality issues hampering your organisation
Below are the most common data quality issues that organisations experience, the ones that will stop you from getting the most value out of your information:
1) Incompleteness: where crucial pieces of information are missing
2) Inaccuracy: all of the information may be ‘present’ (the data fields are filled in), but it could be entered in the wrong field, spelled incorrectly, or filled with a junk value
3) Inconsistency: data that should be presented with the same value / format is inconsistent (e.g. using different currencies instead of the same one throughout)
4) Invalidity: the fields are complete, but with data that can’t possibly be correct in that context (e.g. “units available” displaying a minus value)
5) Redundancy: where the same data is entered multiple times but expressed in slightly different ways (e.g. entering the same company but with different names, entering a person’s name in different ways, etc)
6) Non-standard data: information that’s input using non-standard formats, or formats that can’t be processed by the system (e.g. spelling out ‘percent’ rather than using the % symbol)
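To make the taxonomy above concrete, here is a minimal sketch in Python of how some of these issues could be detected programmatically. The record fields (`email`, `country`, `units_available`) and the `find_issues` helper are hypothetical illustrations, not a reference to any particular tool:

```python
import re

# A hypothetical customer record with deliberate quality issues.
record = {
    "name": "ACME Ltd",
    "email": "sales@acme",   # invalidity: no domain suffix
    "country": "",           # incompleteness: missing value
    "units_available": -3,   # invalidity: stock can't be negative
    "revenue": "1.2M USD",   # non-standard: free-text amount
}

def find_issues(rec):
    """Return (field, issue) pairs for some common quality problems."""
    issues = []
    for field, value in rec.items():
        if value in ("", None):
            issues.append((field, "incomplete"))
    # A simple shape check for email addresses (not a full RFC validation).
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", rec.get("email", "")):
        issues.append(("email", "invalid"))
    units = rec.get("units_available")
    if isinstance(units, int) and units < 0:
        issues.append(("units_available", "invalid"))
    return issues

print(find_issues(record))
```

Redundancy and inconsistency checks would work across records rather than within one, but the same rule-based pattern applies.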
While these data quality problems are far from ideal, are they enough of a hindrance to justify going to the effort of making wholesale changes to the way your organisation manages its data? If the business can function ‘adequately’ without making such changes, should you just carry on as before and do the best with what you’ve got?
In other words, when should data quality processes be implemented (if at all)?
When should you implement data quality measures?
Generally speaking, data quality controls and measures should be put in place when there’s a business need and when you need to solve a specific problem. And as we all know, there’s always a business need in one form or another, and there’s always something to aim for. Otherwise, what are we all doing here?
Here are a few reasons why you’re likely to be interested in improving the quality of your data (or should be!):
1) Your data is a major strategic asset that will provide you with a competitive advantage if it is accurate and usable
2) You want to draw data from disparate sources into one central data warehouse or repository, which will be extremely difficult (if not impossible) to do if the information isn’t standardised
3) You want to manage your master data more effectively
4) You’re planning to implement a new system or carry out a system migration, for example from a legacy system or ERP to a cloud-based system
Once you’ve identified the business case for putting data quality measures in place – or convinced others in your organisation of the need to – then the data quality management process itself will need to be defined. But who’s responsible for that?
Who’s involved in the data quality management process?
Two types of role in particular are critical to the success of the data quality process, namely:
Data stewards – they’re involved in profiling the data and creating rules for data standardisation and cleansing
Developers – they collaborate with data stewards and play an important role in designing data quality rules and the development process
Both of these roles will need to work together closely throughout the implementation process, after which the data stewards will be responsible for monitoring the quality of the information.
What constitutes a data quality assurance / management process?
The process itself includes certain stages that data quality analysts and data stewards in particular will need to complete, including:
Data profiling – at this point, they will need to explore the data to gain an in-depth understanding of it and identify issues within it, such as the ones outlined earlier (incompleteness, inaccuracy, etc), before summarising those issues.
Defining metrics – to get an idea of just how widespread the data problems are, while also establishing data quality benchmarks, they will need to record metrics such as how much of the data is currently complete (% complete), how much is consistent (% consistent), valid (% valid) and so on.
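A metric like “% complete” is simple to express in code. Below is a minimal sketch assuming a small in-memory dataset of lead records; the field names and the `completeness` helper are illustrative rather than part of any standard toolkit:

```python
# Toy dataset: three lead records, some with known gaps.
leads = [
    {"email": "a@example.com", "country": "UK"},
    {"email": "",              "country": "UK"},
    {"email": "b@example.com", "country": ""},
]

def completeness(rows, field):
    """% of rows where the given field is non-empty (one decimal place)."""
    filled = sum(1 for r in rows if r.get(field))
    return round(100 * filled / len(rows), 1)

# Benchmark each field, then re-run after cleansing to track progress.
print(completeness(leads, "email"))
print(completeness(leads, "country"))
```

“% valid” and “% consistent” would follow the same shape, with a validation rule replacing the non-empty check.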
Fixing the data – at this point, after issues have been profiled and benchmarked, the process of cleansing the information and fixing the issues can begin.
However, making changes directly to the data obviously presents a risk if the suggested changes themselves are incorrect. This would lead to a very messy and confusing situation that’s even harder to fix! Therefore it’s best not to make any changes directly to the database straight away.
Instead, proposed changes should be listed and detailed before being passed to a data steward for review, after which they will either be approved or rejected.
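The propose-then-review flow described above could be modelled as a simple queue of pending changes. This is a hypothetical sketch (the `ChangeProposal` and `ReviewQueue` names are invented for illustration), not a depiction of any specific data quality product:

```python
from dataclasses import dataclass

@dataclass
class ChangeProposal:
    record_id: int
    field_name: str
    old_value: str
    new_value: str
    status: str = "pending"   # pending -> approved / rejected

class ReviewQueue:
    """Holds proposed fixes until a data steward approves or rejects them."""
    def __init__(self):
        self.proposals = []

    def propose(self, record_id, field_name, old, new):
        p = ChangeProposal(record_id, field_name, old, new)
        self.proposals.append(p)
        return p

    def review(self, proposal, approve):
        proposal.status = "approved" if approve else "rejected"

    def approved(self):
        return [p for p in self.proposals if p.status == "approved"]

queue = ReviewQueue()
p1 = queue.propose(101, "country", "U.K.", "United Kingdom")
p2 = queue.propose(102, "email", "sales@acme", "sales@acme.co")
queue.review(p1, approve=True)
queue.review(p2, approve=False)
# Only approved changes would then be applied to the live database.
print([p.record_id for p in queue.approved()])
```

The key design point is that nothing touches the database until the steward has signed off, which avoids the “messy and confusing situation” described above.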
Evolving data quality needs and questions to consider
One of the inescapable aspects of working with information – particularly when it comes to data quality management, stewardship and governance – is that it will never be a ‘one and done’ situation.
Instead, your organisation’s data quality needs will change over time, and your defined rules will need to be readjusted accordingly, especially as the data stewards gain a greater understanding of the data, common recurring issues and how to mitigate them.
What’s more, data itself does not stand still. The statistics at the start of this blog show how often information such as company names, addresses and email addresses changes and is updated, while new data sources will also be added as time goes by, meaning the need for stewardship and governance will continue.
Thanks to the changing needs of the organisation, and the changing nature of the data itself, you will also need to periodically ask yourself questions to make sure that complacency doesn’t creep in and that you’re being proactive, as opposed to reactive.
For example, these may include the following:
- Is the quality of your data actually improving over time, and therefore is the data management process working as intended?
- If the quality isn’t improving, do the rules need to be updated? Are they meeting your organisation’s current needs?
- If and when new data sources are added, do the existing data quality rules still apply, or will they need to be adapted accordingly?
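The first question above, whether quality is actually improving, can be answered by tracking the metrics defined earlier over time. A minimal sketch, assuming a hypothetical series of monthly completeness scores from a quality dashboard:

```python
# Monthly completeness scores (%) from a hypothetical quality dashboard.
history = [78.0, 81.5, 83.2, 82.9, 86.0]

def is_improving(scores, window=3):
    """True if the average of the last `window` scores beats the earlier average."""
    recent = scores[-window:]
    earlier = scores[:-window]
    return sum(recent) / len(recent) > sum(earlier) / len(earlier)

print(is_improving(history))
```

If the trend flattens or reverses, that is the signal to revisit the rules, as the second question suggests.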
If you found this blog useful, then you may want to check out our other detailed resources as well, covering different aspects of master data management, data cleansing, data governance and more. Click here to check them out!