Archive for March, 2010

Data Cleansing to Achieve Information Quality

Wednesday, March 10th, 2010

Those of us who work on, or manage the day-to-day operations of, MDM, data governance, or data cleansing projects understand the challenges and effort needed to transform “raw” data through multiple stages of analysis and processing into information of sufficient quality for use in our customers’ CRM, CMMS, PIM and ERP systems. An uncleansed product record can keep a production line offline because an inventory item was not ordered due to incomplete information. It can add inventory cost when an incorrect item is ordered (we could be talking about a $10,000 motor). And duplicated data can lead to multiple entries and setups in the material master.

Data vs. information: to simplify the concept, data is managed by a team of analysts working with software to produce a cleansed record, or usable information. Data is imported, profiled, classified, structured, verified, enriched and translated, and reports are generated; in this way we create usable information from low-quality data for decision making in engineering, purchasing, maintenance, marketing, sales, etc. The data exported into client systems is information that meets a predetermined set of data governance rules and information quality requirements.
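To make the staged flow concrete, here is a minimal Python sketch, assuming each stage is a function that takes a record and returns it with a note appended to an audit trail. The Record type and the profile and standardize stages are hypothetical stand-ins; classification, verification, enrichment and translation would be further stages in the same list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Record:
    # Hypothetical record: raw attribute values plus an audit trail of the stages applied.
    values: Dict[str, str]
    history: List[str] = field(default_factory=list)

Stage = Callable[[Record], Record]

def profile(rec: Record) -> Record:
    # Profiling stand-in: note which attributes arrived empty.
    empty = sorted(k for k, v in rec.values.items() if not v.strip())
    rec.history.append(f"profile: empty={empty}")
    return rec

def standardize(rec: Record) -> Record:
    # Structuring stand-in: trim whitespace and upper-case every value.
    rec.values = {k: v.strip().upper() for k, v in rec.values.items()}
    rec.history.append("standardize")
    return rec

def run_pipeline(rec: Record, stages: List[Stage]) -> Record:
    # Apply each stage in order; classify, verify, enrich and translate would be added here.
    for stage in stages:
        rec = stage(rec)
    return rec

rec = run_pipeline(Record({"mfr": " acme ", "part_no": "1050", "desc": ""}), [profile, standardize])
print(rec.values)   # {'mfr': 'ACME', 'part_no': '1050', 'desc': ''}
print(rec.history)  # ["profile: empty=['desc']", 'standardize']
```

Keeping the history on the record is one way to support the status reporting and data audits mentioned later; a production tool would persist it rather than hold it in memory.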

Data quality experts, let’s have a discussion on the definition of data quality. Does an address or a product record meet the requirement if it is only classified? Or should verification at the source (the contact for an address, the manufacturer or supplier for a product) be required when the data is first set up in the system, or scheduled as maintenance within the data governance program? Is the data incomplete? Does the MDM process include a question-and-answer step to complete the data?
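One way to frame these questions is to treat “classified”, “complete” and “verified at source” as separate states. The Python sketch below assumes a hypothetical governance table of required attributes per class; the class names and attribute lists are illustrative, not a standard.

```python
from typing import Dict, List

# Hypothetical governance rule: attributes each class needs before a record counts as complete.
REQUIRED_BY_CLASS: Dict[str, List[str]] = {
    "MOTOR": ["manufacturer", "mfr_part_no", "voltage", "horsepower"],
    "BEARING": ["manufacturer", "mfr_part_no", "bore_diameter"],
}

def record_status(record: Dict[str, str], verified_at_source: bool = False) -> str:
    """Separate 'classified', 'complete' and 'verified at source' into distinct states."""
    cls = record.get("class", "")
    if cls not in REQUIRED_BY_CLASS:
        return "UNCLASSIFIED"
    missing = [a for a in REQUIRED_BY_CLASS[cls] if not record.get(a, "").strip()]
    if missing:
        # These gaps could drive a question-and-answer step with the supplier.
        return "INCOMPLETE: missing " + ", ".join(missing)
    return "VERIFIED" if verified_at_source else "COMPLETE, UNVERIFIED"

motor = {"class": "MOTOR", "manufacturer": "ACME", "mfr_part_no": "M-100", "voltage": "460"}
print(record_status(motor))  # INCOMPLETE: missing horsepower
motor["horsepower"] = "10"
print(record_status(motor))                           # COMPLETE, UNVERIFIED
print(record_status(motor, verified_at_source=True))  # VERIFIED
```

Under this framing, a record that is only classified never reaches a complete state until its required attributes are filled, and verification stays a distinct, deliberate step.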

MDM software designers and developers, can we also discuss how easy the software is to use when managing the stages of data cleansing in support of an MDM philosophy? Can advanced techniques automate that management and add intelligence to data imports, workflows, and the cleansing stages of classifying, profiling, matching, translation, data audit analytics, exception reports and status reporting for a data record?

I believe these are good discussion points and will make great blog topics.

View Jackie Roberts's profile on LinkedIn

It Is Not So Easy to Build Data Cleansing Logic

Tuesday, March 2nd, 2010

During my morning reading on data quality, MDM and data cleansing, I came across this on a help site, along with the million-dollar question:

I have a scenario to build a data flow task for Data Cleansing.

Logic 1 to be build:
Source data would be like 1050 and I should convert it to 1.050
Source data would be like 085 and I should convert it to 0.85

Profiling, structuring or normalizing data without any reference information risks errors in business use, especially if the data is used for purchasing or maintenance. If the goal is to automate data normalization, the data needs to be tied to metadata. Is 1050 a part number? A quantity? It could be an attribute representing a measurement such as length or diameter. If so, is it in inches, feet or meters? Converting a part number 1050 to 1.050 would silently corrupt the record.
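The contrast can be sketched in Python. The naive rule that reproduces both requested conversions is “put a decimal point after the first digit”; the metadata-aware version only applies it when the field is known to be a measurement with a recorded unit. FieldMeta and its attributes are hypothetical assumptions for illustration, not part of any particular MDM tool.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

@dataclass(frozen=True)
class FieldMeta:
    # Hypothetical metadata describing how a source field must be interpreted.
    kind: str                   # "identifier", "quantity" or "measurement"
    integer_digits: int = 1     # digits before the implied decimal point
    unit: Optional[str] = None  # unit of measure, e.g. "in" or "mm"

def naive_normalize(raw: str) -> str:
    # The rule implied by the two examples: a decimal point after the first digit.
    return raw[0] + "." + raw[1:]

def normalize(raw: str, meta: FieldMeta) -> str:
    if meta.kind != "measurement":
        # Part numbers, quantities and other non-measurements are left untouched.
        return raw
    if not raw.isdigit():
        raise ValueError(f"expected digits only, got {raw!r}")
    if meta.unit is None:
        # Refuse to guess: a measurement without a unit goes to an exception report.
        raise ValueError(f"no unit of measure recorded for {raw!r}")
    value = Decimal(raw[: meta.integer_digits] + "." + raw[meta.integer_digits :])
    return str(value) + " " + meta.unit

print(naive_normalize("1050"))                                # 1.050, even if 1050 is a part number
print(normalize("1050", FieldMeta("identifier")))             # 1050
print(normalize("085", FieldMeta("measurement", unit="in")))  # 0.85 in
```

Decimal is used instead of float so that trailing zeros such as 1.050 are preserved exactly as written.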


Data Quality Open Issues and Questions?

Tuesday, March 2nd, 2010

Now that we have established that MDM, Data Governance, Data Cleansing and Data Quality are important, and have joined the new trend of blogging, tweeting and discussion in general, I ask the most important question . . . HOW? When do we get to discussing the content?

I am a very detail-oriented person; I have to be, because one of my largest accounts requires me to participate in the day-to-day deployment of global MDM processes for one of the largest automotive manufacturers! I am very interested to learn how businesses in other industries manage their data. I hope that sharing information and best practices among industry partners will be a win-win situation. At a minimum the discussion will be refreshing; sharing innovative information will spawn the creative improvements needed to build truly efficient, knowledge-driven business processes, data classifications, metadata, definitions and translations. . . Is anyone interested in discussing the logistics of managing translation as part of Master Data Management?

Is anyone interested in discussing my struggles, and sharing yours, in finding standard global translations for ISO UOM (units of measure)?
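One way to keep UOM translations governable is a single table keyed by a canonical code, with every missing translation logged rather than guessed. The Python sketch below assumes that approach; the three-letter codes and the table contents are illustrative assumptions, not a published ISO list.

```python
from typing import Dict, List, Tuple

# Hypothetical translation table keyed by canonical UOM codes; entries are illustrative.
UOM_NAMES: Dict[str, Dict[str, str]] = {
    "MTR": {"en": "meter", "de": "Meter", "fr": "mètre"},
    "KGM": {"en": "kilogram", "de": "Kilogramm", "fr": "kilogramme"},
}

def translate_uom(code: str, lang: str, gaps: List[Tuple[str, str]]) -> str:
    """Return the UOM name in the requested language, or the code itself if none exists."""
    names = UOM_NAMES.get(code.upper(), {})
    if lang in names:
        return names[lang]
    # Record the gap so it appears on an exception report instead of being guessed.
    gaps.append((code.upper(), lang))
    return code.upper()

gaps: List[Tuple[str, str]] = []
print(translate_uom("kgm", "de", gaps))  # Kilogramm
print(translate_uom("MTR", "ja", gaps))  # MTR
print(gaps)                              # [('MTR', 'ja')]
```

Falling back to the canonical code keeps exported records unambiguous while the gap list shows translators exactly what is still missing.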

Is anyone interested in discussing which fields should be included in an MDM data governance program for MRO data: UNSPSC, warranty, warranty term, lead time, estimated price, ECCN, etc.?
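As a starting point for that discussion, the field list above can be written down as a record type with a simple gap check. This Python sketch is an assumption: the field names and types are illustrative, and the UNSPSC check assumes an 8-digit commodity code.

```python
from dataclasses import dataclass, fields
from decimal import Decimal
from typing import List, Optional

@dataclass
class MroItem:
    # Hypothetical field set based on the list in the post; names and types are assumptions.
    mfr_name: str
    mfr_part_no: str
    description: str
    unspsc: Optional[str] = None                # UNSPSC commodity code
    warranty: Optional[str] = None              # warranty description
    warranty_term_months: Optional[int] = None
    lead_time_days: Optional[int] = None
    estimated_price: Optional[Decimal] = None
    eccn: Optional[str] = None                  # Export Control Classification Number

    def governance_gaps(self) -> List[str]:
        # Any empty field is a gap; a UNSPSC code is also checked for 8 digits.
        gaps = [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]
        if self.unspsc and not (len(self.unspsc) == 8 and self.unspsc.isdigit()):
            gaps.append("unspsc (not an 8-digit code)")
        return gaps

item = MroItem("ACME", "M-100", "MOTOR, AC, 10 HP", unspsc="2610", lead_time_days=14)
print(item.governance_gaps())
# ['warranty', 'warranty_term_months', 'estimated_price', 'eccn', 'unspsc (not an 8-digit code)']
```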

What schemas or classification structures are you using for spare parts and maintenance items? What about a discussion of using a public vs. proprietary classification system?

What are some best practices for migrating, profiling, structuring, mismatching and re-verifying legacy system data?

We have a nifty data mismatch process for manufacturer contact information; could it be easily implemented for a CRM data project? What about patient contact information in the healthcare industry?

There are a few bloggers out there who continually add content, but it is starting to look like a small group. Is anyone out there who is interested in achieving data quality willing to discuss “real-life” best practices, lessons learned, or the HOW of MDM, data quality or data cleansing?
