The statistics around data cleansing are overwhelming, and there are mountains of discussions, white papers and tweets pertaining to Data Quality, Data Profiling and Master Data Management. I think we need to take a step back and try to understand how and why data cleansing has become such a hot topic. You may have realized that business data typically isn’t as streamlined and efficiently maintained as you thought it was. Your organization may have shipped purchased items back because they were not what you thought you had ordered. In other cases, an item is on urgent delivery status from a supplier while another department turns out to have it in inventory. Because the item was set up under a different number or description, you couldn’t possibly have known it was already available from existing inventory.
The data quality issues that industries around the world are experiencing have a number of causes: many years of manual inventory and purchasing record maintenance, mergers and acquisitions of companies and business units, and data migrations from various legacy systems into newfangled ERP black holes.
A common data trap is assuming that just because you are implementing a new ERP system, your organization will now have quality data. Remember the old computer motto: “Garbage In, Garbage Out”. Let me tell you from firsthand experience that there is nothing “sexy” about bad data when the production line is down, or at any other time.
Data Cleansing and Data Profiling are very tedious, detail-oriented services. There are a number of key rules to follow, whether the profiling and cleansing work is done internally or outsourced to someone who specializes in data cleansing. Here are some rules to consider before a project is started:
1) Conduct a detailed and comprehensive data mapping across all internal systems, including engineering, purchasing, asset management, plant inventory management, etc. The goal is to standardize and document all data sources within the enterprise once, and to ensure that each department is accounted for and determines which data elements it needs to complete its required business tasks.
2) Build a central data cleansing database and make sure all locations using each item are referenced. This ensures that updated information will be passed back to the various legacy systems. You will need old information and updated information for this stage of the process.
3) The data cleansing database should include a balance of electronic scripting for data corrections and manual auditing. A solid process for answering questions needs to be set up. My preference is a web utility that tracks data change history and other data-related information such as contact information, issue resolution status, classification, questions and answers, etc.
4) The data needs to be referenced to a classification schema, and a standard needs to be implemented for descriptions and properties. The schema can be designed within your company, purchased as a proprietary schema from a vendor, or adopted from an open, publicly available classification dictionary such as the ECCMA eOTD.
5) Free text is not our friend in the data standardization world. If at all possible, use a system that has built-in data rules, and ensure that anyone entering data into the system understands the standards, the importance of quality data and the high cost to businesses of using bad data.
6) Data Cleansing and Profiling done the proper way is not “cheap”, but the cost of cleaning the bad data properly once is always less than the cost of cleansing your data multiple times, or of continuing to operate your organization on erroneous information generated from one or more dirty databases.
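To make rules 3 and 5 a little more concrete, here is a minimal sketch in Python of what "a balance of electronic scripting and manual auditing" might look like. Everything in it is an assumption for illustration only: the abbreviation table, the list of known item nouns, the ItemRecord fields and the cleanse function are hypothetical names, not part of any particular ERP or cleansing product. The script standardizes what it can, records every change, and flags anything it cannot classify for a person to review.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup tables; a real project would derive these from its
# chosen classification schema and description standard.
ABBREVIATIONS = {"BRG": "BEARING", "VLV": "VALVE", "SS": "STAINLESS STEEL"}
KNOWN_NOUNS = {"BEARING", "VALVE"}


@dataclass
class ItemRecord:
    item_id: str
    description: str
    history: list = field(default_factory=list)  # (old, new) description pairs
    needs_review: bool = False                   # True means send to manual audit


def cleanse(record):
    """Apply scripted corrections, keep change history, flag unknowns."""
    old = record.description
    # Uppercase the text and split on any run of punctuation or whitespace.
    tokens = re.sub(r"[^A-Z0-9]+", " ", old.upper()).split()
    # Expand known abbreviations; leave every other token untouched.
    new = " ".join(ABBREVIATIONS.get(token, token) for token in tokens)
    if new != old:
        record.history.append((old, new))  # old value is kept for the audit trail
        record.description = new
    # If no recognized item noun is present, a person needs to look at it.
    record.needs_review = not (set(new.split()) & KNOWN_NOUNS)
    return record


if __name__ == "__main__":
    for rec in (ItemRecord("A-100", "brg, ball  6204"), ItemRecord("B-7", "misc part")):
        cleanse(rec)
        print(rec.item_id, "|", rec.description, "| review:", rec.needs_review)
```

The point of the sketch is the split of work: the script handles the repetitive corrections, and the history list preserves the old value so the change can be traced and passed back to the legacy systems, while the needs_review flag routes the rest to the manual auditing process.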
Cleansed data permits the removal of duplicated inventory items, an internal purchasing philosophy that puts a priority on inventory sharing before issuing supplier purchase orders, the standardization of inventory with predefined stocking levels, the identification of critical pieces of inventory and of functionally equivalent items, and the use of engineering component standardization libraries. It also facilitates purchasing analytics and enhanced vendor management.
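As a rough illustration of how standardized descriptions expose duplicated inventory items, the Python sketch below groups item numbers whose descriptions contain the same words regardless of order, case or punctuation. The function names and the sample item numbers are made up for the example, and real matching would normally also compare classified properties such as manufacturer part number. Treat it as a starting point, not a finished method.

```python
import re
from collections import defaultdict


def match_key(description):
    """Build an order-insensitive key from the words in a description."""
    tokens = re.findall(r"[A-Z0-9]+", description.upper())
    return " ".join(sorted(tokens))


def find_duplicate_candidates(items):
    """Return groups of item numbers that share the same match key.

    `items` is an iterable of (item_number, description) pairs.
    """
    groups = defaultdict(list)
    for item_number, description in items:
        groups[match_key(description)].append(item_number)
    # Only groups with more than one item number are possible duplicates.
    return [numbers for numbers in groups.values() if len(numbers) > 1]


if __name__ == "__main__":
    # Hypothetical sample data: the same bearing set up under two numbers.
    sample = [
        ("100234", "Bearing, Ball 6204"),
        ("PLT2-0091", "BALL BEARING 6204"),
        ("100777", "Valve, Gate 2in"),
    ]
    print(find_duplicate_candidates(sample))
```

The groups it returns are only candidates. Someone who knows the items should confirm each match before item numbers are merged.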


