
Data Quality: Where Bad Data Comes From and How to Prevent It (2026)

Koray Çetintaş, 10 February 2026


What Is Data Quality? Four Core Dimensions

[Image: data quality dashboard and metrics. Data quality is a multidimensional assessment, not a single metric.]

Data quality is the degree to which data is usable for its intended purpose. High-quality data supports business decisions, while low-quality data leads to misdirection. Data quality is evaluated across four core dimensions:

1. Accuracy

The extent to which data correctly reflects real-world values. If a customer’s address is “Ataturk St. No:15,” the record in the system must match exactly. Accuracy errors typically manifest as:

  • Typographical errors: “Istnabul” instead of “Istanbul”
  • Value errors: Entering a unit price of 1000 instead of 100
  • Reference errors: An order linked to the wrong customer code
  • Measurement errors: Entering grams instead of kilograms

2. Completeness

Ensuring all necessary data fields are populated. Incomplete data misleads analysis and hinders business processes. Completeness issues include:

  • Empty mandatory fields: A customer record without an email address
  • Partial records: An address with a street name but no city/zip code
  • Relational gaps: A product card without an assigned category
  • Historical gaps: A product without a tracked price change history
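Completeness is the easiest dimension to quantify once the mandatory fields are defined. A minimal sketch, assuming an illustrative field list (the names below are examples, not a prescribed schema); note that placeholder values like "X" or "." are counted as empty, since they hide gaps rather than fill them:

```python
# Completeness rate: populated mandatory field slots / total mandatory slots.
# Field names are illustrative assumptions, not a prescribed schema.
MANDATORY_FIELDS = ["name", "email", "city", "category"]

def completeness_rate(records):
    """Share of mandatory field slots that are genuinely populated."""
    total = len(records) * len(MANDATORY_FIELDS)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for rec in records
        for field in MANDATORY_FIELDS
        # Treat placeholder entries ("X", ".") as empty, not as data.
        if str(rec.get(field) or "").strip() not in ("", "X", ".")
    )
    return filled / total

records = [
    {"name": "Acme", "email": "info@acme.com", "city": "Istanbul", "category": "A"},
    {"name": "Beta", "email": "", "city": "Ankara", "category": "."},  # two gaps
]
print(f"{completeness_rate(records):.0%}")  # → 75%
```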

3. Consistency

Ensuring data across different systems is compatible. When the same data appears differently in various locations, it leads to serious operational issues:

  • Format inconsistency: “TR-123456” in one system and “123456” in another
  • Value inconsistency: A customer marked as “Active” in the CRM but “Inactive” in the ERP
  • Temporal inconsistency: An order date occurring after the shipment date
  • Calculation inconsistency: Different total revenue figures across various reports

4. Timeliness

Ensuring data is current and accessible when needed. Outdated data means making decisions based on obsolete information:

  • Delayed updates: A price increase not reflected in the system
  • Delayed integration: Order information taking until the next day to sync to the ERP
  • Unarchived data: 10-year-old inactive records appearing as active
  • Delayed validation: Address changes discovered months after the fact

Tip

Data quality dimensions are not independent. For example, inconsistent data often stems from outdated information (a timeliness issue). Evaluate all dimensions together during root cause analysis.


Where Does Bad Data Come From? Root Cause Analysis

[Image: data analysis and error detection. Understanding the root causes of bad data is the prerequisite for permanent solutions.]

Data quality issues have three primary sources: People, Systems, and Processes. Each source requires different intervention strategies.

Human-Induced Errors

Wherever manual data entry occurs, there is potential for human error:

Data Entry Errors

  • Typing errors: Typing “1243” instead of “1234”
  • Copy-paste errors: Copying the wrong cell
  • Format errors: Entering the wrong date format
  • Unit errors: Entering quantity instead of weight

Lack of Knowledge

  • Lack of training: Not understanding the meaning of a field
  • Lack of procedures: Not knowing the correct entry method
  • Lack of reference: Not knowing valid values

Intentional Errors

  • Time pressure: Skipping mandatory fields for speed
  • Bypassing the system: Entering fake values to pass validation
  • Lazy entry: Using placeholder values like “X” or “.”

System-Induced Errors

Errors originating from technical infrastructure and software:

Integration Errors

  • Mapping errors: Incorrect field mapping
  • Transformation losses: Data loss during character set conversion
  • Synchronization errors: Timing-related incompatibilities
  • API errors: Partial data transfer after a timeout

Software Bugs

  • Bugs: Calculation or saving errors
  • Default values: Incorrectly assigned default values
  • Rounding errors: Inconsistencies caused by rounding

Process-Induced Errors

Issues stemming from business processes and management:

Design Flaws

  • Lack of validation: Fields without entry controls
  • Lack of standards: Undefined data formats
  • Lack of documentation: Absence of a data dictionary

Management Deficiencies

  • Unowned data: Fields without an assigned data steward
  • Unmonitored quality: Lack of metrics and reporting
  • Lack of audits: Failure to perform periodic checks

Attention

80% of data quality issues are process-related. Before investing in technology, review your data entry processes, standards, and governance model. Even the most expensive software cannot clean dirty data.


Validation Rules and Data Entry Controls

[Image: data validation processes. Proactive validation prevents bad data from entering the system.]

Validation rules are the first line of defense preventing bad data from entering the system. They are defined at three levels:

Level 1: Format Validation

Checking the technical format of the data:

  • Email format: Checking for the “xxx@domain.com” structure
  • Phone format: “+90 5XX XXX XX XX” structure
  • IBAN format: Country code + check digit + BBAN
  • Date format: DD.MM.YYYY or YYYY-MM-DD
  • Tax ID: 10 or 11-digit numeric
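Most format checks can be expressed as regular expressions. A minimal sketch of the formats listed above (the patterns are illustrative simplifications, not official specifications; a real IBAN check, for instance, also requires the mod-97 check-digit calculation, not just a pattern match):

```python
import re

# Format validators as regular expressions. Patterns are illustrative
# sketches of the formats described above, not official specifications.
FORMAT_RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone_tr": re.compile(r"^\+90 5\d{2} \d{3} \d{2} \d{2}$"),
    "date": re.compile(r"^\d{2}\.\d{2}\.\d{4}$|^\d{4}-\d{2}-\d{2}$"),
    "tax_id": re.compile(r"^\d{10}$|^\d{11}$"),
}

def check_format(rule_name, value):
    """Return True if the value matches the named format rule."""
    return bool(FORMAT_RULES[rule_name].fullmatch(value))

print(check_format("email", "info@example.com"))      # True
print(check_format("phone_tr", "+90 532 123 45 67"))  # True
print(check_format("tax_id", "12345"))                # False
```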

Level 2: Business Rule Validation

Checking compliance with business logic:

  • Range checks: Unit price > 0, stock >= 0
  • List checks: Is the country code in the valid list?
  • Reference checks: Does the customer code exist in the system?
  • Logic checks: Discount rate <= 100%

Level 3: Cross-Field Validation

Checking relationships between fields:

  • Date relationships: Order date <= Shipment date
  • Quantity relationships: Shipped quantity <= Ordered quantity
  • Amount relationships: Total = Quantity x Unit Price
  • Code relationships: Compatibility of city code + district code
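Levels 2 and 3 can be combined in a single record validator that returns every violation at once rather than stopping at the first. A sketch under assumed field names (the order schema and reference lists are illustrative, not from any particular system):

```python
from datetime import date

# Business-rule (level 2) and cross-field (level 3) checks for an order.
# Field names and reference lists are illustrative assumptions.
VALID_COUNTRIES = {"TR", "DE", "US"}

def validate_order(order, known_customers):
    """Return a list of human-readable rule violations (empty = valid)."""
    errors = []
    # Level 2: business rules
    if order["unit_price"] <= 0:
        errors.append("unit price must be > 0")
    if order["quantity"] < 0:
        errors.append("quantity must be >= 0")
    if order["country"] not in VALID_COUNTRIES:
        errors.append(f"unknown country code: {order['country']}")
    if order["customer_code"] not in known_customers:
        errors.append(f"unknown customer: {order['customer_code']}")
    # Level 3: cross-field rules
    if order["order_date"] > order["ship_date"]:
        errors.append("order date must be on or before ship date")
    expected_total = order["quantity"] * order["unit_price"]
    if abs(order["total"] - expected_total) > 0.01:
        errors.append(f"total {order['total']} != quantity x unit price")
    return errors

order = {
    "customer_code": "C-001", "country": "TR",
    "unit_price": 100.0, "quantity": 3, "total": 310.0,
    "order_date": date(2026, 2, 1), "ship_date": date(2026, 1, 28),
}
print(validate_order(order, known_customers={"C-001"}))
# → two violations: date ordering and total mismatch
```

Returning all violations together supports the on-save strategy described below: the user sees every problem in one pass instead of fixing them one at a time.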

Validation Strategies

Inline Validation

Immediate feedback while the user enters data. The most effective method to prevent errors at the source.

On-Save Validation

Checking all fields when the save button is clicked. Displaying multiple errors simultaneously.

Batch Validation

Used during bulk data uploads or integration. Reporting erroneous rows to give the user a chance to correct them.

Scheduled Validation

Regular checks of existing data to identify data that has degraded over time.
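The batch strategy in particular benefits from reporting errors by row number so the user can correct and re-upload. A minimal sketch, assuming an illustrative two-column CSV (the column names and rules are examples):

```python
import csv
import io

# Batch validation: check every row of a bulk upload, collect errors per
# row number, and report them instead of silently rejecting the file.
# The CSV columns and rules below are illustrative assumptions.

def validate_batch(csv_text):
    """Return {row_number: [errors]} covering all invalid rows."""
    report = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        errors = []
        if not row["product_code"].strip():
            errors.append("product_code is mandatory")
        try:
            if float(row["price"]) <= 0:
                errors.append("price must be > 0")
        except ValueError:
            errors.append(f"price is not a number: {row['price']!r}")
        if errors:
            report[row_no] = errors
    return report

upload = """product_code,price
P-100,49.90
,12.50
P-101,abc
"""
for row_no, errs in validate_batch(upload).items():
    print(f"row {row_no}: {'; '.join(errs)}")
# rows 3 and 4 are reported; row 2 passes
```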


Data Stewardship: Data Governance Model

Data quality is not a one-time task but a continuously managed process. The data stewardship model ensures this continuity.

What Is a Data Steward?

A data steward is a business unit representative responsible for the quality of a specific data field. It is a role led by the business unit, not IT. Responsibilities include:

  • Defining and documenting data standards
  • Setting quality rules
  • Investigating and resolving data issues
  • Approving requests for new records
  • Monitoring periodic quality reports

Data Steward Assignment Examples

  • Customer master: Sales Manager or CRM Manager
  • Supplier master: Purchasing Manager
  • Product master: Product Manager or R&D Manager
  • Financial data: Finance Manager or Chief Accountant
  • Employee data: HR Manager

Data Governance Process

1. Definition

  • Create a data dictionary (definition, format, and owner for every field)
  • Define quality rules
  • Determine measurement metrics

2. Measurement

  • Calculate automated quality scores
  • Visualize on dashboards
  • Perform trend analysis

3. Monitoring

  • Define threshold values (e.g., accuracy > 98%)
  • Set up alerts for deviations
  • Initiate corrective actions
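The monitoring step can be reduced to a comparison of measured scores against thresholds. A sketch using the accuracy threshold from the text (> 98%); the other thresholds and the score values are illustrative:

```python
# Scheduled monitoring sketch: compare measured quality scores against
# thresholds and emit alerts for deviations. The accuracy threshold
# mirrors the example in the text; the others are illustrative.
THRESHOLDS = {"accuracy": 0.98, "completeness": 0.95, "consistency": 0.99}

def quality_alerts(scores):
    """Return alert messages for every dimension below its threshold."""
    alerts = []
    for dimension, minimum in THRESHOLDS.items():
        score = scores.get(dimension)
        if score is not None and score < minimum:
            alerts.append(
                f"{dimension} at {score:.1%}, below threshold {minimum:.0%}"
            )
    return alerts

scores = {"accuracy": 0.991, "completeness": 0.93, "consistency": 0.985}
for alert in quality_alerts(scores):
    print("ALERT:", alert)
# completeness and consistency trigger alerts; accuracy does not
```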

4. Improvement

  • Perform root cause analysis
  • Implement process improvements
  • Update training materials

For more information on data governance, you can review our sector-based solutions.


Field Example: Manufacturing Firm Case Study

A real case (brand-neutral): data management at a manufacturing facility

Situation

An electronics manufacturer with 85 employees. A data quality analysis was conducted before migrating to a new ERP system. The results were concerning: 18% duplication in the customer master, 12% format inconsistency in product codes, and 8% missing components in BOMs. The firm had planned to migrate to the ERP without resolving these issues.

Steps Taken

  1. Weeks 1-2: Data profiling and quality measurement. Current state analysis was performed across four core dimensions. An inventory of issues was created for each data category.
  2. Weeks 3-4: Data steward assignments. The Sales Manager was assigned for customer data, the Production Planning Chief for product data, and the Purchasing Manager for supplier data.
  3. Weeks 5-8: Cleaning and standardization. Duplicate records were merged, format standards were defined, and missing BOM components were verified with production.
  4. Weeks 9-10: Integration of validation rules into the system. 47 format, 23 business rule, and 12 cross-field validations were defined in the new ERP.
  5. Weeks 11-12: A quality monitoring dashboard and periodic reporting mechanism were established. Weekly quality scorecard meetings were initiated.

Result (Representative)

  • Customer duplication rate: 18% -> 0.5% (post-migration)
  • Product code format consistency: 88% -> 99.2%
  • BOM completeness rate: 92% -> 99.8%
  • ERP migration duration: Completed in 4.5 months instead of the planned 6 months
  • Post-go-live data-related support requests: 60% below industry average

7 Most Common Data Quality Mistakes

1. Viewing Data Quality as an IT Problem

Data quality is a business problem, not a technical one. IT provides the tools, but data quality decisions and ownership must reside in business units. Data stewards should come from Sales, Finance, or Operations, not IT.

2. Performing One-Time Cleaning

The “we cleaned it once, we’re done” approach. Data becomes dirty continuously. Cleaning remains temporary without validation rules, monitoring mechanisms, and periodic audits.

3. Accepting Data Without Validation

Allowing users to enter anything. The “we’ll fix it later” mindset. Preventing bad data at the source is much cheaper and more effective than cleaning it later.

4. Not Documenting Data Standards

Everyone using different formats. Having no answer to the question “How should we enter this?” Data dictionaries and entry standards must be documented and accessible.

5. Not Measuring Quality Metrics

Saying “our data quality is good” without measuring it. You cannot manage what you cannot measure. Accuracy, completeness, consistency, and timeliness metrics must be defined for every data category.

6. Intervening Only for Critical Errors

Ignoring small quality issues. Accumulated small errors turn into major crises. Proactive monitoring and early intervention are essential.

7. Skipping User Training

The system is installed, validations are defined, but users don’t know why or how to enter correct data. Without training, there is no behavioral change.

[Image: data quality analysis. A systematic approach prevents errors.]


Data Quality Success Metrics

Key metrics you should measure to manage data quality (representative target values):

  • Accuracy rate: baseline 85-90%, target >98% (sample verification + automated checks)
  • Completeness rate: baseline 70-80%, target >95% (populated mandatory fields / total mandatory fields)
  • Consistency rate: baseline 80-85%, target >99% (cross-system comparison reports)
  • Timeliness rate: baseline 75-85%, target >95% (SLA compliance rate + update delay time)
  • Duplication rate: baseline 5-15%, target <1% (duplicate detection via fuzzy matching)
  • Validation rejection rate: baseline 10-20%, target <3% (rejected records / total entry attempts)
  • Data issue resolution time: baseline 5-10 days, target <2 days (average time from detection to resolution)
  • Data steward coverage: baseline 30-50%, target 100% (owned data fields / total data fields)

Track these metrics on a weekly or monthly basis and perform trend analysis.
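The duplication-rate metric above relies on fuzzy matching, since duplicates rarely match character for character. A minimal sketch using the standard library's `difflib`; the normalization rules and the 0.85 similarity threshold are illustrative choices, and production systems typically use dedicated matching libraries and blocking strategies to avoid comparing every pair:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Duplicate detection via fuzzy matching. The 0.85 threshold and the
# normalization rules are illustrative assumptions, not fixed values.

def normalize(name):
    """Lowercase and collapse whitespace so formatting noise is ignored."""
    return " ".join(name.lower().split())

def find_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, similarity) for pairs above the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs

customers = ["Acme Electronics Ltd.", "ACME  Electronics Ltd", "Beta Trade Inc."]
print(find_duplicates(customers))
# the two Acme variants are flagged as a likely duplicate pair
```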


Data Quality Checklist

The checklist below is a comprehensive guide for your data quality program:

A. Governance and Organization
  • Data steward assigned for each master data category
  • Data quality policy written and approved
  • Quality goals and KPIs determined
  • Escalation procedure defined
B. Data Standards
  • Data dictionary created
  • Format standards (date, phone, address) documented
  • Coding standards (customer code, product code) defined
  • Reference data lists (country, sector, category) kept centrally
C. Validation Rules
  • Format validations integrated into the system
  • Business rule validations defined
  • Cross-field validations active
  • Validation report available for bulk data uploads
D. Measurement and Monitoring
  • Automated quality score being calculated
  • Quality dashboard created
  • Periodic quality reports being produced
  • Alarm mechanism in place for threshold breaches
E. Corrective Actions
  • Data issue reporting mechanism in place
  • Root cause analysis procedure defined
  • Corrective action tracking in place
  • Process improvement performed for recurring issues
F. Training and Awareness
  • Data entry standards training provided
  • Role training provided for data stewards
  • Data quality awareness program in place
  • Data training included in onboarding for new employees

Frequently Asked Questions (FAQ)

Why does data quality matter?

Data quality directly affects the reliability of business decisions. Low-quality data leads to faulty reporting, incorrect forecasts, customer churn, and operational inefficiency. According to representative research, 15-25% of an organization’s annual revenue can be lost to costs resulting from data quality issues.

What are the core dimensions of data quality?

There are four core dimensions of data quality: Accuracy (the extent to which data reflects real-world values), Completeness (ensuring all necessary fields are populated), Consistency (ensuring data across different systems is compatible), and Timeliness (ensuring data is current and accessible).

Where does bad data come from?

Bad data stems from three main sources: Human factors (manual data entry errors, copy-paste errors, incorrect format usage), System factors (integration errors, transformation losses, synchronization problems), and Process factors (missing validation rules, unclear data ownership, insufficient data governance).

How is data quality measured?

Data quality is measured using dimension-based metrics: Accuracy rate (correct record count / total records), Completeness rate (populated mandatory fields / total mandatory fields), Consistency rate (consistent record count / total records), and Timeliness rate (up-to-date record count / total records). These metrics should be measured and reported regularly.

What is a data steward?

A data steward is a business unit representative responsible for the quality of a specific data field. They define data standards, set quality rules, resolve data issues, and manage procedures for creating new records. It is a role led by business units, not IT. Every master data category should have a data steward.

How are validation rules defined?

Validation rules are defined at three levels: Format validation (email format, phone format, date format), Business rule validation (unit price > 0, stock quantity >= order quantity), and Cross-field validation (country code + phone format compatibility, invoice date <= shipment date). Rules should be integrated into the system and checked at the moment of entry.


About the Author

Koray Çetintaş is an advisor specializing in digital transformation, ERP architecture, process engineering, and strategic technology leadership. He applies a "Strategy + People + Technology" approach shaped by hands-on experience in AI, IoT ecosystems, and industrial automation.

Get Support for Your Project

I can help guide your digital transformation initiative. Book a free preliminary call to discuss your priorities.