Data Cleansing: How to Organize Master Data Before Implementation (2026)
What is Master Data and Why Is It Important?
Master data represents the core reference information of an enterprise. Unlike transactional data (orders, invoices, inventory movements), master data changes infrequently and serves as a common source of truth across all systems.
Master Data Categories
- Customer Master: Accounts, contact details, payment terms, credit limits
- Vendor Master: Vendor details, bank accounts, tax IDs, payment terms
- Product Master: Stock cards, unit definitions, price lists, category hierarchy
- BOM (Bill of Materials): Product structures, recipes, semi-finished goods relationships
- Employee Master: Personnel information, competencies, organizational structure
- Asset Master: Fixed assets, machinery, vehicles, maintenance records
Why Is It Critical?
Master data is the nervous system of the enterprise. An incorrect customer address sends a shipment to the wrong location. An incomplete BOM stops a production line. A duplicate vendor record can mean paying the same invoice twice.
In our industry-specific solutions, we see the same data quality issues in every sector. The fixes, however, must be tailored to each sector.
Tip
Master data cleansing must begin before the start of the ERP project. It should be planned as a separate work package in the project schedule, and data owners must be assigned. The “we will clean it after the system is installed” approach is one of the most expensive mistakes you can make.
Master Data Cleansing Methodology
Master data cleansing is not about making random corrections; it is a systematic process. The 5-step methodology below can be applied to projects of various scales.
Step 1: Discovery
First, understand the current state of your data:
- How many different systems will provide data?
- How many records are in each system?
- How much of the data is active versus passive (inactive)?
- Who are the data owners?
Data Profiling
Generate statistics for each data field:
- Completeness rate: Which fields are empty?
- Uniqueness: Where does duplication exist?
- Format consistency: Are phone numbers and dates standardized?
- Value distribution: Are there abnormal or extreme values?
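The first two statistics above can be sketched in a few lines of Python. This is a minimal illustration, not a profiling tool: the function name `profile_field` and the sample records are hypothetical, and empty strings are assumed to count as missing values.

```python
from collections import Counter

def profile_field(records, field):
    """Return completeness and uniqueness statistics for one field."""
    values = [r.get(field) for r in records]
    # Assumption: None and whitespace-only strings both count as empty.
    filled = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(filled)
    # Number of records whose value also appears on another record.
    duplicated = sum(c for c in counts.values() if c > 1)
    return {
        "completeness": len(filled) / len(values) if values else 0.0,
        "distinct": len(counts),
        "duplicated_records": duplicated,
    }

# Hypothetical sample records, for illustration only.
customers = [
    {"name": "ABC Textile", "tax_id": "1234567890", "phone": "+90 212 555 0101"},
    {"name": "A.B.C. Textile", "tax_id": "1234567890", "phone": ""},
    {"name": "Delta Foods", "tax_id": None, "phone": "0216 555 0102"},
    {"name": "Omega Metal", "tax_id": "9876543210", "phone": None},
]

tax_profile = profile_field(customers, "tax_id")
phone_profile = profile_field(customers, "phone")
```

Running the same function over every field of every source system gives a quick table of where the gaps and duplicates are concentrated.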
Step 2: Standardize
Define data quality rules:
- Naming conventions: A single format instead of “Ltd.”, “Limited”, or “LTD”
- Address format: Standardized address structure
- Phone format: Country code, area code, number
- Coding standards: Rules for product codes and customer codes
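As a rough sketch of how such rules might be automated, the Python below normalizes company suffixes and phone numbers. The target formats are assumptions chosen for illustration: "Ltd." as the single suffix, `+<country code><number>` as the phone format, and "90" as the default country code. Your own standards would replace them.

```python
import re

# Assumed house standard: every limited-company suffix becomes "Ltd."
SUFFIX_PATTERN = re.compile(r"\b(ltd|limited)\b\.?", re.IGNORECASE)

def standardize_name(name):
    """Collapse whitespace and rewrite suffix variants to one format."""
    name = re.sub(r"\s+", " ", name).strip()
    return SUFFIX_PATTERN.sub("Ltd.", name)

def standardize_phone(raw, default_country="90"):
    """Return '+<country><number>' with digits only, or None if unusable."""
    raw = (raw or "").strip()
    digits = re.sub(r"\D", "", raw)
    if raw.startswith("+"):
        full = digits                         # country code already present
    elif digits.startswith("00"):
        full = digits[2:]                     # international prefix "00"
    elif digits.startswith("0"):
        full = default_country + digits[1:]   # national trunk prefix "0"
    else:
        full = default_country + digits
    # Assumption: anything shorter than 8 digits is not a usable number.
    return "+" + full if len(full) >= 8 else None

print(standardize_name("ABC  Textile LTD"))
print(standardize_phone("0212 555 0101"))
```

Values that return `None` are better flagged for the data owner than guessed at.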
Step 3: Match
Identify duplicate and similar records:
- Exact matching: One-to-one match (e.g., same tax ID)
- Fuzzy matching: Matching based on similarity scores (“ABC Textile” vs “A.B.C. Textile”)
- Phonetic matching: Matching based on sound similarity
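The fuzzy and phonetic techniques can be illustrated with the Python standard library. This is a sketch under stated assumptions: the similarity score uses `difflib.SequenceMatcher`, and the phonetic key is American Soundex, one of several phonetic algorithms and one designed for English names. The article does not prescribe either technique, and the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def similarity(a, b):
    """Similarity score between 0 and 1, computed on normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Soundex digit for each consonant; vowels, h, w, and y have no digit.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}

def soundex(word):
    """Four-character American Soundex key, e.g. 'Robert' gives 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":      # h and w do not separate equal codes
            prev = digit
    return (code + "000")[:4]

score = similarity("ABC Textile", "A.B.C. Textile")   # 1.0 after normalization
same_sound = soundex("Meyer") == soundex("Maier")     # True
```

A common pattern is to treat pairs above a chosen similarity threshold as candidates for human review rather than merging them automatically. The threshold itself has to be tuned on your own data.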
Step 4: Merge & Cleanse
Resolve the identified issues:
- Merge duplicate records (selection of the surviving record)
- Complete or flag missing fields
- Correct erroneous values
- Archive or delete passive records
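Selecting the surviving record needs an explicit rule. The sketch below assumes one possible rule: the most recently updated record survives, and its empty fields are filled from older duplicates. The field names (`id`, `updated`) and the sample data are hypothetical.

```python
from datetime import date

EMPTY = (None, "")

def merge_duplicates(records):
    """Merge one group of duplicate records into a single survivor.

    Assumed survivorship rule: newest record wins, older records only
    fill fields that the survivor leaves empty.
    """
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    survivor = dict(ordered[0])
    for older in ordered[1:]:
        for field, value in older.items():
            if survivor.get(field) in EMPTY and value not in EMPTY:
                survivor[field] = value
    # Keep the retired IDs so transactions can be re-pointed later.
    survivor["merged_ids"] = sorted(r["id"] for r in ordered[1:])
    return survivor

# Hypothetical duplicate group, for illustration only.
group = [
    {"id": 101, "name": "ABC Textile Ltd.", "phone": "",
     "email": "info@abc.example", "updated": date(2021, 3, 1)},
    {"id": 207, "name": "ABC Textile Ltd.", "phone": "+902125550101",
     "email": None, "updated": date(2024, 6, 15)},
]

golden = merge_duplicates(group)
```

Recording `merged_ids` preserves the mapping from retired records to the survivor, which is needed when open orders or invoices still reference the old IDs.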
Step 5: Validate & Sustain
Maintain quality after cleansing:
- Integrate data entry rules into the system
- Create regular data quality reports
- Define approval processes for data owners
- Establish procedures for creating new records
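To illustrate what data entry rules can look like once they are encoded, here is a minimal validator sketch. The rules themselves are hypothetical placeholders (a 10-digit tax ID, a `+` prefixed phone number, three payment term codes). Real rules would come from the standards defined in Step 2 and would normally be configured in the target system rather than in a script.

```python
import re

# Hypothetical entry rules: each maps a field name to a check function.
RULES = {
    "name": lambda v: bool(v and v.strip()),
    "tax_id": lambda v: bool(re.fullmatch(r"\d{10}", v or "")),
    "phone": lambda v: bool(re.fullmatch(r"\+\d{8,15}", v or "")),
    "payment_terms": lambda v: v in {"NET30", "NET60", "PREPAID"},
}

def validate_record(record):
    """Return the names of the fields that break an entry rule."""
    return [field for field, rule in RULES.items()
            if not rule(record.get(field))]

good = {"name": "ABC Textile Ltd.", "tax_id": "1234567890",
        "phone": "+902125550101", "payment_terms": "NET30"}
bad = {"name": " ", "tax_id": "12345",
       "phone": "+902125550101", "payment_terms": "NET45"}

good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

The same function can feed a periodic data quality report by counting failures per field across all records.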
Customer Master Cleansing
Customer master forms the foundation of sales, finance, and logistics processes
Customer master data is usually the most voluminous and “dirtiest” data category. Records added over the years through different channels (field sales, web, dealers) create significant duplication and inconsistency.
Customer Master Cleansing Steps
1. Active/Passive Separation
Identify customers who have not transacted in the last 24-36 months. These records can be:
- Archived (not included in the migration)
- Marked as passive
- Verified by the sales team before deletion
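The separation itself is a simple date comparison. The sketch below assumes each customer record carries a `last_transaction` date (a hypothetical field name) and uses a 24-month threshold, the lower end of the range above. The cutoff is computed at month precision to stay within the standard library.

```python
from datetime import date

def split_by_activity(customers, today, inactive_after_months=24):
    """Split customers into (active, passive) by last transaction date."""
    # First day of the month that lies `inactive_after_months` back.
    months = today.year * 12 + (today.month - 1) - inactive_after_months
    cutoff = date(months // 12, months % 12 + 1, 1)
    active, passive = [], []
    for customer in customers:
        last = customer.get("last_transaction")
        # Customers with no transaction on record are treated as passive.
        if last is not None and last >= cutoff:
            active.append(customer)
        else:
            passive.append(customer)
    return active, passive

# Hypothetical sample data, for illustration only.
customers = [
    {"id": 1, "last_transaction": date(2025, 11, 3)},
    {"id": 2, "last_transaction": date(2022, 1, 10)},
    {"id": 3, "last_transaction": None},
]

active, passive = split_by_activity(customers, today=date(2026, 5, 20))
```

The passive list is a proposal, not a decision: it is the input for the sales team's verification.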
2. Duplication Detection
Check these fields to identify duplicate customer records:
- Tax ID: The most reliable matching field for legal entities
- Phone number: Comparison in a normalized format
- E-mail: Domain-based grouping
- Address: Matching via address normalization
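A minimal sketch of key-based duplicate detection over the first three fields follows. Several choices here are assumptions made for illustration: phone numbers are compared on their last 10 digits, e-mail is grouped by domain with a small hypothetical exclusion list for public mail providers, and address matching is left out because it needs a proper address normalizer.

```python
import re
from collections import defaultdict

# Hypothetical exclusion list: public domains are shared by unrelated customers.
PUBLIC_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}

def match_keys(record):
    """Build normalized keys on which two records may be duplicates."""
    keys = set()
    tax_id = re.sub(r"\D", "", record.get("tax_id") or "")
    if tax_id:
        keys.add(("tax_id", tax_id))
    phone = re.sub(r"\D", "", record.get("phone") or "")
    if len(phone) >= 10:
        keys.add(("phone", phone[-10:]))   # assumed national number length
    email = (record.get("email") or "").strip().lower()
    domain = email.rpartition("@")[2] if "@" in email else ""
    if domain and domain not in PUBLIC_MAIL_DOMAINS:
        keys.add(("email_domain", domain))
    return keys

def find_duplicate_candidates(records):
    """Map each shared key to the IDs of the records that share it."""
    index = defaultdict(list)
    for record in records:
        for key in match_keys(record):
            index[key].append(record["id"])
    return {key: ids for key, ids in index.items() if len(ids) > 1}

# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "tax_id": "123 456 7890", "phone": "+90 212 555 0101",
     "email": "sales@abc.example"},
    {"id": 2, "tax_id": "1234567890", "phone": "0212 555 0101",
     "email": "finance@abc.example"},
    {"id": 3, "tax_id": "5550001112", "phone": "0216 555 0199",
     "email": "owner@gmail.com"},
    {"id": 4, "tax_id": "", "phone": "", "email": "someone@gmail.com"},
]

candidates = find_duplicate_candidates(records)
```

Records 1 and 2 are linked by all three keys, which makes them a strong candidate pair. A domain match alone is a much weaker signal than a tax ID match and should be weighted accordingly.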
3. Mandatory Field Completion
Identify missing data for fields that will be mandatory in the new system:
- Contact information (phone, e-mail)
- Billing address
- Shipping address
- Payment terms
4. Customer Segmentation
Perform customer segmentation before migration:
- Customer type (corporate, individual, dealer)
- Industry code
- Region/account manager assignment
- Price group
Caution
Prefer “archiving” over “deleting” during customer master cleansing. Due to legal requirements (accounting records, data privacy regulations), access to historical data for some customers may be necessary. Instead of deleting them entirely, you can move them to a passive/archive category and exclude them from the migration.
Vendor Master Cleansing
Vendor master data forms the foundation of procurement, payment, and supply chain processes. Dirty vendor data leads to incorrect payments and audit issues.
Vendor Master Cleansing Steps
1. Vendor Verification
- Tax ID verification: Matching with official records
- Bank account verification: IBAN format and vendor matching
- Contact information: Current phone and e-mail
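The IBAN format check can be automated with the standard mod-97 checksum. The sketch below validates only the structure of the IBAN; it does not check country-specific lengths, and it cannot confirm that the account actually belongs to the vendor, which still requires verification against bank or vendor documents. The function name is hypothetical.

```python
import re

def is_valid_iban(iban):
    """Structural IBAN check: allowed characters plus the mod-97 checksum."""
    compact = re.sub(r"\s+", "", iban).upper()
    # Two-letter country code, two check digits, 11 to 30 further characters.
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    # Move the first four characters to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10 ... Z=35 (digits stay as they are).
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1

print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))   # True
print(is_valid_iban("GB82 WEST 1234 5698 7654 33"))   # False
```

Records that fail the check are best routed back to procurement for correction instead of being migrated as they are.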
2. Active Vendor Analysis
Identify vendors that have not received orders in the last 12-24 months, and decide for each one whether to:
- Merge it with an alternative vendor
- Mark it as passive
- Archive it entirely
3. Standardization of Payment Terms
- Standardize payment term codes
- Check currency definitions
- Verify discount conditions
Product and BOM Cleansing
Product master and BOM (Bill of Materials) cleansing is critical for production and inventory management. An incorrect BOM leads to material shortages or surpluses on the production line.
Product Master Cleansing Steps
1. Product Code Standardization
- Create a mapping table for old and new codes
- Define the code structure standard (length, format)
- Renumber meaningless or inconsistent codes
2. Product Hierarchy
- Create or update the category tree
- Assign products to the correct categories
- Identify uncategorized products
3. Unit Conversion
- Check the alignment of sales unit, stock unit, and purchasing unit
- Verify unit conversion factors
- Correct inconsistent unit definitions
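One way to verify conversion factors is to check that each pair of units converts consistently in both directions. The sketch below assumes the factors are available as a mapping from `(from_unit, to_unit)` to a number, meaning 1 from_unit equals that many to_units. The structure and the sample values are hypothetical.

```python
import math

def check_unit_conversions(conversions, rel_tol=1e-6):
    """Flag missing or inconsistent unit conversion factors."""
    issues = []
    for (src, dst), factor in conversions.items():
        if factor <= 0:
            issues.append(f"{src}->{dst}: non-positive factor {factor}")
            continue
        reverse = conversions.get((dst, src))
        if reverse is None:
            issues.append(f"{dst}->{src}: reverse factor missing")
        elif src < dst and not math.isclose(factor * reverse, 1.0,
                                            rel_tol=rel_tol):
            # Report each inconsistent pair once, in alphabetical order.
            issues.append(f"{src}<->{dst}: factors do not invert "
                          f"({factor} and {reverse})")
    return issues

# Hypothetical sample data, for illustration only.
conversions = {
    ("BOX", "PCS"): 12,
    ("PCS", "BOX"): 1 / 12,
    ("PALLET", "BOX"): 40,
    ("BOX", "PALLET"): 0.02,     # should be 0.025
    ("KG", "G"): 1000,           # reverse direction not defined
}

issues = check_unit_conversions(conversions)
```

Here the pallet factors are flagged because 40 times 0.02 is 0.8, not 1, and the kilogram entry is flagged because the reverse conversion is missing.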
BOM Cleansing
1. BOM Structure Verification
- Detect circular references
- Identify missing sub-components
- Review phantom BOMs
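Circular references and missing sub-components can both be detected by treating the BOM as a directed graph. The sketch below uses a depth-first search and assumes the BOM is available as a mapping from each parent item to its direct components. The item codes are hypothetical, and the recursion assumes BOM depth stays well below Python's recursion limit.

```python
def find_bom_cycle(bom):
    """Return one circular reference as a list of item codes, or None."""
    IN_PROGRESS, DONE = 1, 2
    state = {}
    path = []

    def visit(item):
        state[item] = IN_PROGRESS
        path.append(item)
        for component in bom.get(item, []):
            if state.get(component) == IN_PROGRESS:
                # The component is its own ancestor: a circular reference.
                return path[path.index(component):] + [component]
            if component not in state:
                cycle = visit(component)
                if cycle:
                    return cycle
        path.pop()
        state[item] = DONE
        return None

    for item in bom:
        if item not in state:
            cycle = visit(item)
            if cycle:
                return cycle
    return None

def missing_components(bom, item_master):
    """Components used in a BOM that have no product master record."""
    used = {c for components in bom.values() for c in components}
    return sorted(used - set(item_master))

# Hypothetical sample data, for illustration only.
bom = {
    "CHAIR": ["FRAME", "SEAT"],
    "FRAME": ["LEG", "WELD-KIT"],
    "WELD-KIT": ["FRAME"],       # circular: FRAME -> WELD-KIT -> FRAME
    "SEAT": ["FOAM"],
}
item_master = {"CHAIR", "FRAME", "SEAT", "LEG", "WELD-KIT"}

cycle = find_bom_cycle(bom)                      # ['FRAME', 'WELD-KIT', 'FRAME']
missing = missing_components(bom, item_master)   # ['FOAM']
```

The function returns the first cycle it finds. After fixing it, run the check again until it returns `None`.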
2. Quantity and Scrap Rates
- Verify component quantities with production
- Update scrap rates
- Define alternative components
7 Most Common Mistakes in Master Data Cleansing
1. Leaving Cleansing to the End of the Project
The “we will clean it after the system is installed” approach is the most expensive mistake. Tests and training conducted with dirty data undermine user confidence and extend the project timeline.
2. Not Assigning a Data Owner
Every data field must have an owner. When data has no owner, no one is responsible for it, and its quality declines.
3. Trying to Migrate All Data
Migrating 10+ years of passive data in the name of “let’s not lose the past” clutters the new system. Use active/passive separation to migrate only the data you need.
4. Manual Cleansing
Trying to clean 50,000+ records one by one in Excel is both slow and prone to error. Use data cleansing tools and automation.
5. Underestimating Duplication
Assuming that “a few duplicates won’t hurt” lets the problem grow over time. Duplicate customers distort reporting, while duplicate vendors risk double payments.
6. Not Defining Standards
Cleansing is futile without rules that prevent the same contamination from recurring. Without data entry standards, the results are only temporary.
7. Leaving It to IT
Data cleansing is not a technical task; it must be led by business units. IT provides the tools, but data quality decisions must come from the Sales, Procurement, and Finance departments.
Master Data Cleansing Checklist
The following checklist is a comprehensive guide for master data cleansing. Check each category in order:
Planning and Discovery
- Data owners assigned for each category
- Cleansing schedule and milestones defined
- Source systems and data volumes inventoried
- Target system data structure and mandatory fields determined
Customer Master
- Active/passive customer separation performed
- Duplicate customer records identified
- Tax ID verification completed
- Contact information (phone, e-mail) format standardization performed
- Address standardization completed
- Customer segmentation (type, industry, region) updated
Vendor Master
- Active/passive vendor separation performed
- Tax ID and IBAN verification completed
- Payment terms and due date codes standardized
- Duplicate vendor records merged
Product and BOM
- Product code standardization completed
- Product category hierarchy created
- Unit conversion factors verified
- BOM structural verification (circular reference) performed
- Scrap rates and quantities verified with production
Validate & Sustain
- Data entry standards documented
- Validation rules defined in the target system
- Data quality reports created
- New record creation procedure established
This checklist can also be adapted for your industry-specific projects.