Data Validation Using Tableau

Blog | July 5, 2021 | By Darian Pena, Damanpreet Kaur

Data analysis is widely used to uncover insights into areas where businesses can improve profitability. However, data analysis and the visualization tools we use can also help prevent revenue loss by surfacing obvious, and sometimes subtle, errors in data.

Data validation is a form of data analysis that helps firms maintain and enhance the integrity of their systems and the information those systems store. Tools like Tableau can support several validation tasks, such as identifying structural errors, handling missing data, and filtering out unwanted or irrelevant observations. In the examples below, we explore several tactics for maintaining data integrity.

One way to uncover anomalies in a dimension is to tally the number of records in each category. In the visual below, generated from the Superstore sample dataset, furniture accounts for a potentially disproportionate number of purchases.
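The same tally can be reproduced outside Tableau as a quick cross-check. The sketch below is a minimal illustration, assuming records exported as dictionaries with a hypothetical `category` field; the sample rows and the 50% share cutoff are invented for the example, not taken from the Superstore data.

```python
from collections import Counter

# Hypothetical rows standing in for an export of the Superstore data;
# the field names "order_id" and "category" are assumptions.
orders = [
    {"order_id": 1, "category": "Furniture"},
    {"order_id": 2, "category": "Furniture"},
    {"order_id": 3, "category": "Furniture"},
    {"order_id": 4, "category": "Furniture"},
    {"order_id": 5, "category": "Technology"},
    {"order_id": 6, "category": "Office Supplies"},
]


def records_by_category(rows, key="category"):
    """Tally the number of records for each value of `key`."""
    return Counter(row[key] for row in rows)


def disproportionate_categories(counts, max_share=0.5):
    """Return categories whose share of all records exceeds `max_share`.

    The 50% cutoff is an illustrative assumption, not a Tableau setting.
    """
    total = sum(counts.values())
    return [cat for cat, n in counts.items() if n / total > max_share]


counts = records_by_category(orders)
print(counts.most_common())
print(disproportionate_categories(counts))
```

In practice, the cutoff would come from a subject matter expert's sense of what a normal category mix looks like.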

Another way to find anomalies is a basic crosstab that counts the distinct records for various dimensions, with bold font emphasizing the larger numbers. A large divergence in counts across certain dimension values could raise questions for subject matter experts who expected different outcomes.
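A distinct-count crosstab can be sketched in a few lines, which may help when checking the numbers a Tableau view reports. This is a minimal sketch under stated assumptions: the field names `region`, `segment`, and `customer_id`, the sample rows, and the 75% "emphasize" cutoff are all hypothetical stand-ins for the bold-font highlighting described above.

```python
from collections import defaultdict

# Hypothetical order rows; "region", "segment", and "customer_id" are assumed field names.
rows = [
    {"region": "East", "segment": "Consumer", "customer_id": "C1"},
    {"region": "East", "segment": "Consumer", "customer_id": "C2"},
    {"region": "East", "segment": "Consumer", "customer_id": "C2"},  # repeat customer
    {"region": "East", "segment": "Corporate", "customer_id": "C3"},
    {"region": "West", "segment": "Consumer", "customer_id": "C4"},
    {"region": "West", "segment": "Corporate", "customer_id": "C5"},
]


def distinct_crosstab(rows, row_dim, col_dim, count_field):
    """Count distinct `count_field` values for each (row_dim, col_dim) cell,
    similar in spirit to a COUNTD crosstab in Tableau."""
    cells = defaultdict(set)
    for r in rows:
        cells[(r[row_dim], r[col_dim])].add(r[count_field])
    return {cell: len(values) for cell, values in cells.items()}


def emphasized_cells(table, fraction=0.75):
    """Return cells whose count is at least `fraction` of the largest count,
    i.e. the cells a crosstab might render in bold (cutoff is an assumption)."""
    top = max(table.values())
    return sorted(cell for cell, n in table.items() if n >= fraction * top)


table = distinct_crosstab(rows, "region", "segment", "customer_id")
for (region, segment), n in sorted(table.items()):
    print(f"{region:<6}{segment:<11}{n}")
print("emphasize:", emphasized_cells(table))
```

Counting distinct IDs rather than rows keeps a repeat customer from inflating a cell, which is why the East/Consumer cell counts two customers from three rows.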

Similarly, we can run descriptive analysis on measures to understand basic statistics such as the minimum, maximum, and average, along with other descriptive values. This high-level overview helps level-set expectations about the data or immediately reveals potential irregularities, such as an impossible minimum or an implausibly large maximum.
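As a rough illustration of the same idea, the sketch below summarizes one numeric measure with Python's standard `statistics` module and compares the extremes against expected bounds. The sales values (including the planted negative amount) and the 0 to 1000 range are hypothetical; real bounds would come from someone who knows the business.

```python
import statistics

# Hypothetical sales amounts for one measure; the negative value is planted
# to show how a summary can surface an irregularity.
sales = [120.0, 80.0, 240.0, 100.0, -15.0]


def describe(values):
    """Basic descriptive statistics for a numeric measure."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def range_issues(summary, lower, upper):
    """Compare min/max against expected bounds supplied by a subject matter expert."""
    issues = []
    if summary["min"] < lower:
        issues.append(f"min {summary['min']} is below expected floor {lower}")
    if summary["max"] > upper:
        issues.append(f"max {summary['max']} is above expected ceiling {upper}")
    return issues


summary = describe(sales)
print(summary)
print(range_issues(summary, lower=0, upper=1000))
```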

A correlation matrix chart lets you explore the relationships between pairs of variables. Some pairs follow an expected profile, and some relationships are stronger than others. This view lets users drill down into the correlations between variables and check whether the results are consistent with expectations.
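The numbers behind such a chart are typically Pearson correlation coefficients. Assuming that is the measure in use, the sketch below computes a small matrix by hand and ranks variable pairs by strength; the column names and values are invented for illustration and do not come from the visual.

```python
import math
from itertools import combinations

# Hypothetical measures; the names and values are invented for illustration.
columns = {
    "sales": [100, 200, 300, 400, 500],
    "profit": [10, 22, 29, 41, 50],
    "discount": [0.3, 0.1, 0.2, 0.0, 0.1],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a zero-variance column")
    return cov / (sx * sy)


def correlation_matrix(cols):
    """Full matrix as a dict keyed by (row_name, col_name)."""
    return {(a, b): pearson(cols[a], cols[b]) for a in cols for b in cols}


def strongest_pairs(cols):
    """Distinct variable pairs ordered by absolute correlation, strongest first."""
    pairs = [(a, b, pearson(cols[a], cols[b])) for a, b in combinations(cols, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


matrix = correlation_matrix(columns)
for a, b, r in strongest_pairs(columns):
    print(f"{a:>8} vs {b:<8} r = {r:+.2f}")
```

A pair whose sign or strength contradicts what the business expects (for example, discounts rising with profit) is a cue to inspect the underlying records.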

A box-and-whisker plot offers a more advanced but visually informative way to analyze outliers. By dividing the data into quartiles, box plots help identify data points that stray from the general population. The box spans the interquartile range (IQR), from the first quartile to the third, and the whiskers extend toward the lowest and highest values; by default, Tableau extends the whiskers only to data within 1.5 times the IQR, so points beyond the whiskers stand out as potential outliers. In the visual below, several data points in the “IT” and “Office Goods” categories fall outside the whiskers for sales. This could reflect a naturally wide variance, or it could signal a data issue that requires immediate attention.
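The rule behind the whiskers can be checked numerically. The sketch below applies the common 1.5 × IQR fence rule per category using `statistics.quantiles`; the category names echo the visual, but the sales values are invented, and the "inclusive" quartile method is an assumption that may not match Tableau's quartile calculation exactly at the boundaries.

```python
import statistics
from collections import defaultdict

# Hypothetical rows; the category names echo the visual, but the sales values are invented.
rows = (
    [{"category": "IT", "sales": s} for s in [100, 110, 120, 130, 140, 150, 900]]
    + [{"category": "Office Goods", "sales": s} for s in [50, 55, 60, 65, 70]]
)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences).

    Quartiles use the "inclusive" interpolation method; Tableau's own quartile
    calculation may differ slightly.
    """
    if len(values) < 2:
        return []  # quartiles are not meaningful for fewer than two points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def outliers_by_category(rows, category_field, measure_field, k=1.5):
    """Group a measure by category and flag IQR outliers within each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[category_field]].append(r[measure_field])
    return {cat: iqr_outliers(vals, k) for cat, vals in groups.items()}


print(outliers_by_category(rows, "category", "sales"))
```

Computing fences per category, rather than across all sales, mirrors a box plot with one box per category: a value that is ordinary for one category can still be extreme within another.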

With the Tableau views described above, firms can apply business intelligence strategies to validate and cleanse data and protect themselves from the adverse operational and regulatory consequences of data integrity issues. Furthermore, to facilitate deeper dives into the source data in Excel, USEReady offers Pixel Perfect, which extracts the specific information you drilled down into on your Tableau dashboard.

About the Author
Big Data, Cloud, and Business Intelligence professional with experience supporting large-scale Data & Analytics initiatives.
Darian Pena, Senior Analyst | USEReady
About the Author
Statistical, Analytics, and Business Intelligence professional with experience supporting descriptive, prescriptive, and predictive data analytics initiatives across enterprises.
Damanpreet Kaur, Senior Business Intelligence Analyst | USEReady