Data Cleansing: How to Organize Master Data Before Implementation (2026)

Koray Çetintaş 10 February 2026 9 min read


What is Master Data and Why Is It Important?

Master data is the foundational information used as a reference across all enterprise systems

Master data represents the core reference information of an enterprise. Unlike transactional data (orders, invoices, inventory movements), master data changes infrequently and serves as a common source of truth across all systems.

Master Data Categories

  • Customer Master: Accounts, contact details, payment terms, credit limits
  • Vendor Master: Vendor details, bank accounts, tax IDs, payment terms
  • Product Master: Stock cards, unit definitions, price lists, category hierarchy
  • BOM (Bill of Materials): Product structures, recipes, semi-finished goods relationships
  • Employee Master: Personnel information, competencies, organizational structure
  • Asset Master: Fixed assets, machinery, vehicles, maintenance records

Why Is It Critical?

Master data is the nervous system of the enterprise. An incorrect customer address means a shipment goes to the wrong location. An incomplete BOM means a production line stoppage. A duplicate vendor record means paying the same invoice twice.

As we observe in our industry-specific solutions, data quality issues look much the same in every sector. Fixing them, however, requires a sector-specific approach.

Tip

Master data cleansing must begin before the start of the ERP project. It should be planned as a separate work package in the project schedule, and data owners must be assigned. The “we will clean it after the system is installed” approach is one of the most expensive mistakes you can make.


Master Data Cleansing Methodology

A systematic methodology makes data cleansing scalable

Master data cleansing is not about making random corrections; it is a systematic process. The 5-step methodology below can be applied to projects of various scales.

Step 1: Discovery

First, understand the current state of your data:

  • How many different systems will provide data?
  • How many records are in each system?
  • How much of the data is active versus passive (inactive)?
  • Who are the data owners?

Data Profiling

Generate statistics for each data field:

  • Completeness rate: Which fields are empty?
  • Uniqueness: Where does duplication exist?
  • Format consistency: Are phone numbers and dates standardized?
  • Value distribution: Are there abnormal or extreme values?
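
The completeness and uniqueness statistics above can be sketched in a few lines of Python. This is a minimal illustration, not a profiling tool: the `profile_field` helper, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

def profile_field(records, field):
    """Return completeness and uniqueness statistics for one field."""
    # Treat None and whitespace-only values as empty.
    values = [(r.get(field) or "").strip() for r in records]
    filled = [v for v in values if v]
    counts = Counter(filled)
    duplicated = sorted(v for v, n in counts.items() if n > 1)
    return {
        "completeness": len(filled) / len(values) if values else 0.0,
        "distinct": len(counts),
        "duplicated_values": duplicated,
    }

# Hypothetical sample records
customers = [
    {"name": "ABC Textile", "tax_id": "1234567890", "phone": "+90 212 555 0101"},
    {"name": "A.B.C. Textile", "tax_id": "1234567890", "phone": ""},
    {"name": "Delta Logistics", "tax_id": "", "phone": None},
    {"name": "Omega Foods", "tax_id": "9876543210", "phone": "02125550202"},
]

tax_profile = profile_field(customers, "tax_id")
phone_profile = profile_field(customers, "phone")
print(tax_profile)
print(phone_profile)
```

Running the same function over every field of every source system gives a first, rough picture of where the gaps and duplicates are concentrated.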

Step 2: Standardize

Define data quality rules:

  • Naming conventions: A single format instead of “Ltd.”, “Limited”, or “LTD”
  • Address format: Standardized address structure
  • Phone format: Country code, area code, number
  • Coding standards: Rules for product codes and customer codes
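
As an illustration of what such rules look like once they are written down, here is a small Python sketch for the naming and phone conventions. The canonical suffix "Ltd.", the default country code "90", and both function names are assumptions made for the example; the real rules should come from the data owners.

```python
import re

# Hypothetical rule: every legal-form variant maps to one canonical suffix.
SUFFIX_VARIANTS = {"ltd": "Ltd.", "ltd.": "Ltd.", "limited": "Ltd."}

def standardize_name(name):
    """Collapse repeated whitespace and unify a trailing legal-form variant."""
    words = name.split()
    if words and words[-1].lower() in SUFFIX_VARIANTS:
        words[-1] = SUFFIX_VARIANTS[words[-1].lower()]
    return " ".join(words)

def standardize_phone(raw, default_country="90"):
    """Return '+<country code><area code><number>' using digits only.

    Assumes numbers without a '+' or '00' prefix belong to default_country
    and may carry a single leading trunk '0'.
    """
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("00"):
        return "+" + digits[2:]
    if digits.startswith("0"):
        digits = digits[1:]
    return "+" + default_country + digits

print(standardize_name("ABC  Textile LTD"))
print(standardize_phone("0212 555 01 01"))
```

Keeping the rules in one table such as `SUFFIX_VARIANTS` makes them easy to review with the business units before they are applied in bulk.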

Step 3: Match

Identify duplicate and similar records:

  • Exact matching: One-to-one match (e.g., same tax ID)
  • Fuzzy matching: Matching based on similarity scores (“ABC Textile” vs “A.B.C. Textile”)
  • Phonetic matching: Matching based on sound similarity
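
Fuzzy matching can be approximated with Python's standard library. The sketch below uses `difflib.SequenceMatcher` as the similarity measure, which is only one of several possible choices; the 0.85 threshold and the helper names are assumptions, and phonetic matching is not covered here.

```python
import re
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and keep only ASCII letters, digits, and single spaces.

    Note: non-ASCII letters are dropped, so real data with accented
    characters needs a transliteration step first.
    """
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())

def similarity(a, b):
    """Similarity score between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_candidate_pairs(names, threshold=0.85):
    """Return (name_a, name_b, score) for every pair at or above threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= threshold:
                pairs.append((names[i], names[j], round(score, 2)))
    return pairs

names = ["ABC Textile", "A.B.C. Textile", "Delta Logistics"]
candidates = find_candidate_pairs(names)
print(candidates)
```

Comparing every pair grows quadratically with the number of records, so on large volumes it is common to compare only within smaller groups (for example, records sharing a city or the first letters of the name). Matches found this way are candidates for review, not confirmed duplicates.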

Step 4: Merge & Cleanse

Resolve the identified issues:

  • Merge duplicate records (select the surviving record)
  • Complete or flag missing fields
  • Correct erroneous values
  • Archive or delete passive records
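
Selecting the surviving record needs an explicit rule. The following sketch assumes one possible rule, "the most complete record survives and its empty fields are filled from the others"; the `merge_duplicates` function and the record layout are hypothetical.

```python
def merge_duplicates(records, completeness_fields):
    """Pick the most complete record as survivor and fill its gaps.

    Ties keep the earliest record in the list.
    Returns (survivor, ids_of_merged_records).
    """
    def filled_count(rec):
        return sum(1 for f in completeness_fields if rec.get(f))

    survivor = dict(max(records, key=filled_count))  # copy, do not mutate input
    others = [r for r in records if r["id"] != survivor["id"]]
    for rec in others:
        for field in completeness_fields:
            if not survivor.get(field) and rec.get(field):
                survivor[field] = rec[field]
    return survivor, [r["id"] for r in others]

# Hypothetical duplicate group
group = [
    {"id": "C001", "name": "ABC Textile", "tax_id": "1234567890",
     "phone": "", "email": "info@abc.example"},
    {"id": "C047", "name": "A.B.C. Textile", "tax_id": "1234567890",
     "phone": "+902125550101", "email": ""},
]

survivor, merged_ids = merge_duplicates(group, ["tax_id", "phone", "email"])
print(survivor)
print(merged_ids)
```

The list of merged IDs is worth keeping as a cross-reference so that open documents pointing at the old records can be redirected to the survivor.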

Step 5: Validate & Sustain

Maintain quality after cleansing:

  • Integrate data entry rules into the system
  • Create regular data quality reports
  • Define approval processes for data owners
  • Establish procedures for creating new records

Customer Master Cleansing

Customer master forms the foundation of sales, finance, and logistics processes

Customer master data is usually the most voluminous and “dirtiest” data category. Records added over the years through different channels (field sales, web, dealers) create significant duplication and inconsistency.

Customer Master Cleansing Steps

1. Active/Passive Separation

Identify customers who have not transacted in the last 24-36 months. These records can be:

  • Archived (not included in the migration)
  • Marked as passive
  • Verified by the sales team before deletion
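
The separation itself is a simple date comparison. Here is a minimal sketch, assuming you can extract the last transaction date per customer; the function name, the 24-month default, and the sample data are hypothetical.

```python
from datetime import date

def split_active_passive(last_transaction, as_of, months=24):
    """Split customer IDs by whether they transacted within the window.

    last_transaction maps customer ID to a date, or None if the customer
    never transacted.
    """
    # Move the cutoff back by whole months; clamp the day to 28 so the
    # resulting date is valid in every month.
    total = as_of.year * 12 + (as_of.month - 1) - months
    cutoff = date(total // 12, total % 12 + 1, min(as_of.day, 28))
    active, passive = [], []
    for customer_id, last in last_transaction.items():
        (active if last and last >= cutoff else passive).append(customer_id)
    return active, passive

# Hypothetical last transaction dates
last_transaction = {
    "C001": date(2025, 11, 3),
    "C002": date(2022, 5, 17),
    "C003": None,
    "C004": date(2024, 2, 10),
}

active, passive = split_active_passive(last_transaction, as_of=date(2026, 2, 10))
print(active)
print(passive)
```

The passive list is a proposal for the sales team to review, in line with the options above, rather than a deletion list.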

2. Duplication Detection

Check these fields to identify duplicate customer records:

  • Tax ID: The most reliable matching field for legal entities
  • Phone number: Comparison in a normalized format
  • E-mail: Domain-based grouping
  • Address: Matching via address normalization
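
The field checks above amount to grouping records by a normalized key. The sketch below shows this for tax ID, phone, and e-mail domain; the key functions and the choice to compare phones on their last 10 digits are assumptions made for the example. Address normalization is left out because it depends heavily on the country.

```python
import re
from collections import defaultdict

def duplicate_groups(records, key_func):
    """Group record IDs that share the same non-empty key."""
    groups = defaultdict(list)
    for rec in records:
        key = key_func(rec)
        if key:
            groups[key].append(rec["id"])
    return {k: ids for k, ids in groups.items() if len(ids) > 1}

def tax_key(rec):
    """Tax ID reduced to digits only."""
    return re.sub(r"\D", "", rec.get("tax_id") or "")

def phone_key(rec):
    """Last 10 digits, so country and trunk prefixes do not matter.
    (10 digits is an assumption; adjust to the national number length.)"""
    return re.sub(r"\D", "", rec.get("phone") or "")[-10:]

def email_domain_key(rec):
    """Domain part of the e-mail address, lowercased."""
    email = (rec.get("email") or "").lower()
    return email.split("@")[-1] if "@" in email else ""

# Hypothetical sample records
customers = [
    {"id": "C001", "tax_id": "123 456 7890", "phone": "+90 212 555 01 01",
     "email": "sales@abc.example"},
    {"id": "C047", "tax_id": "1234567890", "phone": "",
     "email": "info@abc.example"},
    {"id": "C102", "tax_id": "", "phone": "0212 555 0101",
     "email": "orders@delta.example"},
]

by_tax = duplicate_groups(customers, tax_key)
by_phone = duplicate_groups(customers, phone_key)
by_domain = duplicate_groups(customers, email_domain_key)
print(by_tax, by_phone, by_domain)
```

A shared e-mail domain is a much weaker signal than a shared tax ID (public mail providers group unrelated customers together), so domain groups are best used to prioritize manual review.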

3. Mandatory Field Completion

Identify missing data for fields that will be mandatory in the new system:

  • Contact information (phone, e-mail)
  • Billing address
  • Shipping address
  • Payment terms

4. Customer Segmentation

Perform customer segmentation before migration:

  • Customer type (corporate, individual, dealer)
  • Industry code
  • Region/account manager assignment
  • Price group

Caution

Prefer “archiving” over “deleting” during customer master cleansing. Due to legal requirements (accounting records, data privacy regulations), access to historical data for some customers may be necessary. Instead of deleting them entirely, you can move them to a passive/archive category and exclude them from the migration.


Vendor Master Cleansing

Vendor master data forms the foundation of procurement, payment, and supply chain processes. Dirty vendor data leads to incorrect payments and audit issues.

Vendor Master Cleansing Steps

1. Vendor Verification

  • Tax ID verification: Matching with official records
  • Bank account verification: IBAN format and vendor matching
  • Contact information: Current phone and e-mail
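
The IBAN format check can be automated with the standard mod-97 checksum. The sketch below covers the checksum only: it does not verify the country-specific length, and it cannot confirm that the account actually belongs to the vendor, which still requires a check against bank or vendor documents. The function name is hypothetical.

```python
def iban_checksum_ok(iban):
    """Check the mod-97 checksum of an IBAN (spaces are ignored)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Letters become two-digit numbers: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1

print(iban_checksum_ok("GB82 WEST 1234 5698 7654 32"))  # True (widely used sample IBAN)
print(iban_checksum_ok("GB82 WEST 1234 5698 7654 33"))  # False (last digit changed)
```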

2. Active Vendor Analysis

Identify vendors that have not received orders in the last 12-24 months and classify them:

  • Those to be merged with alternative vendors
  • Those to be potentially marked as passive
  • Those to be archived entirely

3. Standardization of Payment Terms

  • Standardize payment term codes
  • Check currency definitions
  • Verify discount conditions

Product and BOM Cleansing

Product master and BOM (Bill of Materials) cleansing is critical for production and inventory management. An incorrect BOM leads to material shortages or surpluses on the production line.

Product Master Cleansing Steps

1. Product Code Standardization

  • Create a mapping table for old and new codes
  • Define the code structure standard (length, format)
  • Renumber meaningless or inconsistent codes
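
A mapping table can be generated rather than typed by hand. The sketch below assumes a hypothetical code standard of a three-letter category prefix, a hyphen, and a five-digit sequence; the standard, the function name, and the sample codes are illustrative only.

```python
import re

# Hypothetical code standard: 3 uppercase letters, hyphen, 5 digits.
CODE_PATTERN = re.compile(r"^[A-Z]{3}-\d{5}$")

def build_code_mapping(products):
    """Return {old_code: new_code}, numbering products within each category.

    products is a list of (old_code, category_prefix) pairs.
    """
    next_seq = {}
    mapping = {}
    for old_code, category_prefix in products:
        if old_code in mapping:
            raise ValueError(f"duplicate old code: {old_code}")
        seq = next_seq.get(category_prefix, 1)
        next_seq[category_prefix] = seq + 1
        mapping[old_code] = f"{category_prefix}-{seq:05d}"
    return mapping

# Hypothetical legacy codes
legacy = [("kumas-01", "FAB"), ("K 22", "FAB"), ("BTN#7", "ACC")]
mapping = build_code_mapping(legacy)
print(mapping)
```

Keeping the old code in the mapping table (and ideally as a searchable field in the new system) lets users find products by the code they already know.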

2. Product Hierarchy

  • Create or update the category tree
  • Assign products to the correct categories
  • Identify uncategorized products

3. Unit Conversion

  • Check the alignment of sales unit, stock unit, and purchasing unit
  • Verify unit conversion factors
  • Correct inconsistent unit definitions
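
These unit checks can be run over the whole product master. The sketch below assumes each product stores a conversion factor from every unit to its stock unit; that data layout and the `unit_issues` function are assumptions for illustration.

```python
def unit_issues(product):
    """Return a list of problems found in one product's unit definitions."""
    issues = []
    factors = product["to_stock_factor"]  # e.g. {"PCS": 1, "BOX": 12}
    if factors.get(product["stock_unit"]) != 1:
        issues.append("stock unit factor must be 1")
    for role in ("sales_unit", "purchase_unit"):
        unit = product[role]
        factor = factors.get(unit)
        if factor is None:
            issues.append(f"no conversion factor for {role} {unit}")
        elif factor <= 0:
            issues.append(f"non-positive factor for {role} {unit}")
    return issues

# Hypothetical products
good = {"stock_unit": "PCS", "sales_unit": "BOX", "purchase_unit": "PAL",
        "to_stock_factor": {"PCS": 1, "BOX": 12, "PAL": 480}}
bad = {"stock_unit": "PCS", "sales_unit": "BOX", "purchase_unit": "PAL",
       "to_stock_factor": {"PCS": 1, "BOX": 0}}

print(unit_issues(good))
print(unit_issues(bad))
```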

BOM Cleansing

1. BOM Structure Verification

  • Detect circular references
  • Identify missing sub-components
  • Review phantom BOMs
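
Circular references can be found with a depth-first search over the BOM structure. The sketch below assumes the BOM has been exported as a mapping from each parent item to its direct components; the function name and item codes are hypothetical.

```python
def find_cycle(bom):
    """Return one circular path as a list of item codes, or None.

    bom maps a parent item to the list of its direct components.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {}
    path = []

    def visit(item):
        state[item] = GREY
        path.append(item)
        for child in bom.get(item, []):
            if state.get(child, WHITE) == GREY:
                # The child is already on the current path: a cycle.
                return path[path.index(child):] + [child]
            if state.get(child, WHITE) == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        state[item] = BLACK
        return None

    for item in bom:
        if state.get(item, WHITE) == WHITE:
            found = visit(item)
            if found:
                return found
    return None

bom_ok = {"CHAIR": ["FRAME", "SEAT"], "FRAME": ["LEG", "SCREW"]}
bom_bad = {"CHAIR": ["FRAME"], "FRAME": ["LEG"], "LEG": ["CHAIR"]}

print(find_cycle(bom_ok))
print(find_cycle(bom_bad))
```

The function reports the first cycle it finds; after fixing it, run the check again until it returns None. Very deep BOMs may need an iterative version to stay within Python's recursion limit.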

2. Quantity and Scrap Rates

  • Verify component quantities with production
  • Update scrap rates
  • Define alternative components
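
When verifying quantities with production, it helps to agree on how the scrap rate enters the calculation, because systems differ. The sketch below assumes the convention that scrap is added on top of the net quantity (gross = net × (1 + scrap rate)); some systems instead divide by (1 − scrap rate), so confirm which one your target system uses. The function name is hypothetical.

```python
from decimal import Decimal

def gross_requirement(net_qty, scrap_rate, order_qty=1):
    """Component quantity to issue for an order.

    Assumes scrap is added on top of net usage:
    gross = net_qty * (1 + scrap_rate) * order_qty
    """
    return Decimal(str(net_qty)) * (1 + Decimal(str(scrap_rate))) * order_qty

# 2 units per product, 5% scrap, order of 100 products
print(gross_requirement(2, 0.05, 100))
```

With these inputs the requirement is 210 units instead of the 200 a BOM without scrap would plan, which is the kind of gap that shows up as a material shortage on the line.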

7 Most Common Mistakes in Master Data Cleansing

1. Leaving Cleansing to the End of the Project

The “we will clean it after the system is installed” approach is the most expensive mistake. Tests and training conducted with dirty data undermine user confidence and extend the project timeline.

2. Not Assigning a Data Owner

Every data field must have an owner. When data has no owner, no one takes responsibility for it, and its quality declines.

3. Trying to Migrate All Data

Migrating 10+ years of passive data on the grounds of “let’s not lose the past” clutters the new system. Use active/passive separation to migrate only the data you need.

4. Manual Cleansing

Trying to clean 50,000+ records one by one in Excel is both slow and prone to error. Use data cleansing tools and automation.

5. Underestimating Duplication

The thought that “a few duplicates won’t hurt” creates a problem that grows over time. Customer duplication risks incorrect reporting, while vendor duplication risks double payments.

6. Not Defining Standards

Cleansing data without defining rules to prevent the same contamination in the future is futile. Without data entry standards, cleansing is only temporary.

7. Leaving It to IT

Data cleansing is not purely a technical task; it must be led by the business units. IT provides the tools, but data quality decisions must come from the Sales, Procurement, and Finance departments.

A systematic approach prevents errors


Master Data Cleansing Checklist

The following checklist is a comprehensive guide for master data cleansing. Check each category in order:

A. Planning and Organization
  • Data owners assigned for each category
  • Cleansing schedule and milestones defined
  • Source systems and data volumes inventoried
  • Target system data structure and mandatory fields determined
B. Customer Master
  • Active/passive customer separation performed
  • Duplicate customer records identified
  • Tax ID verification completed
  • Contact information (phone, e-mail) format standardization performed
  • Address standardization completed
  • Customer segmentation (type, industry, region) updated
C. Vendor Master
  • Active/passive vendor separation performed
  • Tax ID and IBAN verification completed
  • Payment terms and due date codes standardized
  • Duplicate vendor records merged
D. Product and BOM
  • Product code standardization completed
  • Product category hierarchy created
  • Unit conversion factors verified
  • BOM structural verification (circular reference) performed
  • Scrap rates and quantities verified with production
E. Data Quality and Sustainability
  • Data entry standards documented
  • Validation rules defined in the target system
  • Data quality reports created
  • New record creation procedure established

This checklist can also be adapted for use in your industry-specific projects.


Frequently Asked Questions (FAQ)

Why should master data be cleansed before implementation?

Cleaning dirty data after it has been migrated to a new system is much more difficult and costly. Furthermore, training and testing conducted with incorrect data shake user confidence in the system. Projects that begin with clean data are typically completed 40-60% faster.

How long does master data cleansing take?

The duration depends on data volume and the rate of contamination. As a representative estimate, it takes 4-8 weeks for a medium-sized company (10,000-50,000 records). However, this time can double or triple if data ownership is unclear or if data is coming from multiple source systems.

How are duplicate records detected?

Fuzzy matching algorithms are used. For example, variations in customer names such as ‘ABC Ltd.’, ‘ABC Limited’, and ‘A.B.C. Ltd’ can be matched using a similarity score. The combination of EXACT and VLOOKUP in Excel is sufficient for simple duplicate detection; large volumes require specialized data cleansing tools.

What are the main master data categories?

Basic master data categories include: Customer master (accounts, contact details), Vendor master (vendor details, payment terms), Product master (stock cards, prices), BOM (product structures, recipes), Employee master (personnel information), and Asset master (fixed assets, machinery). In addition to these, there may be company-specific master data.

Which tools can be used for data cleansing?

For small-scale projects, Excel (Power Query, VLOOKUP, fuzzy matching add-ins) is sufficient. For medium and large-scale projects, tools such as OpenRefine (free), Talend Data Quality, Informatica Data Quality, or Microsoft DQS are used. Most ERP systems also have their own built-in data cleansing modules.

Why is data ownership important?

Every data field must have an owner. For example, the Sales Manager might be the owner of customer master data, while the Procurement Manager owns vendor master data. Data without an owner is data for which no one takes responsibility, leading to low quality and inconsistency. Data ownership is also critical for maintaining quality after cleansing.


About the Author

Koray Çetintaş is an advisor specializing in digital transformation, ERP architecture, process engineering, and strategic technology leadership. He applies a "Strategy + People + Technology" approach shaped by hands-on experience in AI, IoT ecosystems, and industrial automation.
