Those of us who work on or manage the day-to-day operations of MDM, data governance, or data cleansing projects understand the challenges and effort needed to transform “raw” data through multiple stages of analytics and processing to achieve the information quality required in our customers’ CRM, CMMS, PIM and ERP systems. An uncleansed product record can keep a production line offline because an inventory item wasn’t ordered due to incomplete information, add inventory cost when an incorrect item is ordered (we can be talking about a $10,000 motor), or produce multiple entries and setups in the material master due to data duplication.
Data vs. information: to simplify the concept, data is managed by a team of analysts working with software toward the goal of a cleansed record, or usable information. Data is imported, profiled, classified, structured, verified, enriched and translated, and reports are generated; we create usable information from low-quality data for use in decision making related to engineering, purchasing, maintenance, marketing, sales, etc. The data that is exported into client systems is information that meets a predetermined set of data governance rules and information quality requirements.
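To make the data vs. information distinction concrete, here is a minimal sketch of a gate that a record might have to pass before export. Everything in it is an assumption for illustration: the attribute names, the `ProductRecord` type, and the rule that a record must be verified at source are hypothetical, not taken from any particular MDM product.

```python
from dataclasses import dataclass

# Hypothetical governance rule: attributes every exported product record must carry.
REQUIRED_ATTRIBUTES = ("manufacturer", "part_number", "description", "classification")


@dataclass
class ProductRecord:
    attributes: dict
    verified_at_source: bool = False  # assumed flag set by the verification stage


def governance_gaps(record):
    """List the reasons a record is still 'data' rather than 'information'."""
    gaps = []
    for name in REQUIRED_ATTRIBUTES:
        value = record.attributes.get(name)
        # Treat absent, None and whitespace-only values as missing.
        if value is None or not str(value).strip():
            gaps.append(f"missing {name}")
    if not record.verified_at_source:
        gaps.append("not verified at source")
    return gaps


def ready_for_export(record):
    """A record is exportable only when no governance gap remains."""
    return not governance_gaps(record)
```

A record with an empty description and no verification would come back with two gaps and stay out of the client system until both are resolved.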
Data quality experts, let’s have a discussion on the definitions of data quality. Does an address or a product detail meet the requirement if only classified? Or should verification at source (contact for address or manufacturer / supplier for product) be required at initial setup of the data in the system or maintenance scheduled as part of the data governance program? Is the data incomplete? Does the MDM process include a question / answer scenario to complete the data?
MDM software designers and developers, can we also have a discussion on the software’s ease of use in managing the stages of data cleansing to support an MDM philosophy, and on using advanced techniques to automate management and add intelligence to data imports, workflows, and the data cleansing stages of classifying, profiling, matching, translation, data audit analytics, exception reporting and status reporting on a data record?
I believe these are great discussion points and will serve as great blog topics.

1. “does an address or a product detail meet the requirement if only classified?” – not in my opinion, because without verification the data cannot become information.
2. should verification at source (contact for address or manufacturer / supplier for product) be required at initial setup of the data in the system or maintenance scheduled as part of the data governance program? – yes; otherwise, what are we governing, and by what laws/rules?
3. Is the data incomplete? – yes, often address data is incomplete or invalid. Either is insufficient to become information.
4. Does the MDM process include a question / answer scenario to complete the data? – I believe so. This is the voting portion of the governance initiative, functioning as the business’ chance to manage its own information.
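The question / answer scenario in point 4 could be sketched roughly as follows: for each attribute that is missing, generate a question that can be routed to the business or to the source contact. The field names and question wording here are hypothetical, chosen only to illustrate the idea with address data.

```python
# Hypothetical question templates, keyed by the attribute they would complete.
QUESTION_TEMPLATES = {
    "postal_code": "What is the postal code for {name}?",
    "street": "What is the street address for {name}?",
    "city": "Which city is {name} located in?",
}


def open_questions(record):
    """Return one question per address attribute that is missing or blank."""
    name = record.get("name") or "this contact"
    return [
        template.format(name=name)
        for field, template in QUESTION_TEMPLATES.items()
        # Values are assumed to be strings or None.
        if not (record.get(field) or "").strip()
    ]
```

An empty list means the record has no open questions left, which is one possible signal that it is ready for the verification step.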
Great post Jackie!! Keep it up! I’ll be mulling these questions over and forming a blog post soon!
William, I believe that verification, sometimes referred to as “catalog @ source”, is a critical step of data quality, but it is usually overlooked during the traditional data migration phase of a software implementation. The result is bad data moved from a legacy system to a new system . . . providing the business with only limited data accuracy for reporting.
Hi Jackie,
I completely agree with your reply to William.
The attitude too often taken is: “Our job is to implement the new system, not clean up someone else’s mess”.
Ken
@Jackie & @Ken … Were you guys on the same projects I was a few years ago?
I was an ETL developer on several data warehousing projects when I developed this philosophy. I’ve even been asked to retro-ETL a broken warehouse where the original ETL developers forgot to add ... primary keys! I’ve learned to never rule anything out as far as data is concerned. I’ve seen every special character on the keyboard in a dataset, which leads me to believe someone loaded their delimiters.
I think people take data for granted, like we do hardware. Not to mention, how often and how long do people talk about the UI features they want? It’s more tangible to business users, somehow.
In the end, bad data is always going to be there, and we need to be slick in the ways we try to prevent it and clean it up when it does happen.
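As one small example of the prevention side, here is a rough sketch of an audit that looks for the two problems mentioned above: missing or duplicate keys, and field values that still contain a likely delimiter. The function name, the row format (a list of dicts with string values) and the list of suspect delimiters are all assumptions for illustration.

```python
from collections import Counter

# Assumed set of characters that suggest a delimiter was loaded as data.
SUSPECT_DELIMITERS = ("|", "\t", ";", "~")


def audit_rows(rows, key_field):
    """Report missing keys, duplicate keys and values containing a suspect delimiter."""
    key_counts = Counter()
    missing_keys = []
    delimiter_hits = []
    for row_no, row in enumerate(rows, start=1):
        key = (row.get(key_field) or "").strip()
        if key:
            key_counts[key] += 1
        else:
            missing_keys.append(row_no)
        for field, value in row.items():
            if any(d in (value or "") for d in SUSPECT_DELIMITERS):
                delimiter_hits.append((row_no, field))
    return {
        "missing_keys": missing_keys,
        "duplicate_keys": sorted(k for k, n in key_counts.items() if n > 1),
        "delimiter_hits": delimiter_hits,
    }
```

Running a check like this before the load, rather than after the warehouse is already broken, is the cheap end of the cleanup.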
[...] Data Cleansing to Achieve Information Quality – Jackie Roberts raises some interesting questions regarding the efforts needed to cleanse data though multiple stages of analytics and processes to achieve appropriate information quality. [...]