Posts Tagged ‘Data Profiling’

We Had a Data Cleansing Project and It Did NOT Work

Thursday, December 16th, 2010

Lately I have had a number of meetings with material and purchasing managers, and I have come to two distinct conclusions from the feedback. First, businesses recognize the importance of data quality and have attempted to improve their information, either by implementing an internal program or by hiring a company to provide data cleansing services. Second, the activity of data cleansing has an incomplete and overly broad definition. I refer to the blog post by Koa Beck, Gartner Releases Its Magic Quadrant for Master Data Management: “while we continue to monitor the aggregate MDM market, we still believe that it is premature.”

A key component of Master Data Management (MDM) is data cleansing, which spans multiple disciplines such as address cleansing and PIM (product information management). My expertise is in PIM, so my meetings have focused on data in ERP and inventory systems.

My latest meeting was with an informed Material Manager who understood the concepts of master data management. After the introductory meeting, he stated: “We had a data cleansing project and it did not work; I ended up going back and correcting the data.” Through the discussion, I came to believe that the data cleansing company had extracted the data and attempted to auto classify half a million records. As a purchaser of these types of services, I asked: what was the process for mapping and quality checks?

The business issue is the buying team’s inability to use spend analytics, and the solution is to reference the data to the UNSPSC® (The United Nations Standard Products and Services Code®). The scope of the project is mapping the purchasing data to the UNSPSC®. In my experience, there are four general levels of PIM data cleansing: 1) auto mapping, 2) auto mapping with a manual review, 3) verification and 4) enrichment. The cold, hard fact is “buyer beware”.

The levels in detail are:

  1. Auto mapping: if you have a large collection of data, automation is a requirement; however, there are some issues. First, auto mapping incorrect, incomplete and inconsistent data will result in a system that still has incorrect, incomplete and inconsistent data. The quality of the auto mapping depends on the structure of the data. If the data is structured to a noun or class, the auto mapping process will have a high quality rate. If the data is set up as “free text”, the results will be dismal. This method will not address duplication or data quality in your system.
  2. Auto mapping with a manual review: this process takes the results of the auto mapping process and adds a manual review of the data. The question is the scope of the review: will all records be audited, or only the records that the auto mapping failed to map? How will the consistency of the audit be managed? The inherent issues described for auto mapping still apply.
  3. Verification: to improve data quality, the data cleansing process requires verification with the manufacturer (service or product). The verification process ensures that the purchasing record is set up with the correct manufacturer (referenced to the supplier via the contract), the part number for restock ordering, the UOM (Purchasing Unit of Measure), a correctly classified description, e.g. BEARING, TAPER, and the UNSPSC®. Our process is to request the UNSPSC® from the manufacturer. If the manufacturer cannot provide it, the item has still been correctly classified, so the auto map to the UNSPSC® will be successful. The verification process positions the data to identify duplication, manufacturer obsolescence and inaccurate data that requires additional information from the business to reconcile.
  4. Enrichment: at the fourth level of data cleansing quality, the data is enriched in addition to being verified. Enrichment can include obtaining a price, the warranty and its terms, additional description attributes, the ECCN (Export Control Classification Number), recommended repair spare part information, eCl@ss, the NSN (National Stock Number) or any other data element your business requires.
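
To make the difference between levels 1 and 2 concrete, here is a minimal sketch, assuming records have already been structured to a noun/modifier class. The lookup table, its placeholder codes and the names ItemRecord and auto_map are hypothetical, not real UNSPSC values or any vendor’s tool. Records that do not match a known class go to a review queue rather than being guessed, and the audit_all flag answers the “all records or only failures?” question.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from a structured noun/modifier class to a UNSPSC code.
# The code values are placeholders, not real UNSPSC codes.
CLASS_TO_UNSPSC = {
    ("BEARING", "TAPER"): "PLACEHOLDER-0001",
    ("MOTOR", "AC"): "PLACEHOLDER-0002",
}


@dataclass
class ItemRecord:
    item_id: str
    noun: Optional[str]      # structured class noun, e.g. "BEARING"; None for free text
    modifier: Optional[str]  # class modifier, e.g. "TAPER"
    description: str


def auto_map(records, audit_all=False):
    """Level 1: map records whose noun/modifier class is in the lookup table.
    Level 2: queue every failed record for manual review, and optionally
    queue the successful maps too (audit_all=True)."""
    mapped, review_queue = {}, []
    for rec in records:
        code = CLASS_TO_UNSPSC.get((rec.noun, rec.modifier))
        if code is None:
            # Free text or an unknown class: do not guess, send to a reviewer.
            review_queue.append(rec.item_id)
        else:
            mapped[rec.item_id] = code
            if audit_all:
                review_queue.append(rec.item_id)
    return mapped, review_queue


# Hypothetical sample: one structured record, one free-text record.
sample = [
    ItemRecord("1", "BEARING", "TAPER", "BEARING, TAPER, 1.5 IN BORE"),
    ItemRecord("2", None, None, "brg for pump 3 see old PO"),
]
mapped, review_queue = auto_map(sample)
```

Note what the sketch does not do. It never checks for duplicates or verifies anything with a manufacturer, which is exactly the gap that levels 3 and 4 address.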

In conclusion, asking the right questions about how a data cleansing project will be implemented and managed is essential to making it successful.

What do you say to . . . I get all the spend details from the supplier and quote this on occasion.

Thursday, July 29th, 2010

And he continued: “That’s the area where we would need the least amount of help, given that we outsourced these parts ten years ago and the low-hanging fruit is no longer around. What do you say to the outsourced scenario, where the management of use, cost and inventory is out of the buying teams’ control?”

My first question is: how would you get information when it’s not in your system? Does your supplier manage inventory for all of your plants and facilities, giving you a global view of spend? Does your supplier reference your data to the OEM, or to the suppliers, which leaves you with duplicate inventory costs?

Considering just the MRO items, the information could come from engineering or the integrated supplier. Logically, the integrated supplier would have been given the part information by your company in order to set up and purchase the items in the first place. They likely still hold the records as they received them, linked to the item setup in the purchasing system. The top-level source would have been engineering, which either had the equipment constructed or was responsible for purchasing the equipment and its parts. If, during or after the purchasing activity, the “key” item record is set up in the purchasing system using the part supplier information instead of the OEM information, this will lead to item duplication. Duplication in turn creates overstock, variant pricing, variant lead times and other inconsistencies that add unnecessary cost.

Based on what you are saying, it sounds like the items in your system are based either on the part supplier data or on identifiers assigned by the integrated supplier (their item numbers). The best scenario is when the OEM part is set up as the key item and purchased directly from the OEM (with the OEM set up as a supplier), removing the “middle man” cost. The next best is having the OEM part as the item, linked to the specific supplier(s) for purchase, with local purchase suppliers linked to the same item. Using the same item record across the enterprise is optimal.
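
The OEM-as-key-item idea can be sketched as a grouping step. This is a minimal sketch under a stated assumption: case, spaces and punctuation do not matter in part numbers. Supplier-keyed item records that resolve to the same manufacturer and normalized OEM part number are duplication candidates, and each such key becomes one item linked to all of its suppliers. The manufacturer name, part numbers and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


def normalize_part_number(part_number: str) -> str:
    # Assumption: case, spaces and punctuation are not significant in part numbers.
    return "".join(ch for ch in part_number.upper() if ch.isalnum())


@dataclass(frozen=True)
class ItemRecord:
    item_id: str   # item number in the purchasing system
    oem: str       # original equipment manufacturer
    oem_part: str  # OEM part number as entered
    supplier: str  # supplier the record was set up against


def oem_key(rec: ItemRecord) -> tuple:
    """The proposed key item: manufacturer plus normalized OEM part number."""
    return (rec.oem.strip().upper(), normalize_part_number(rec.oem_part))


def duplicate_groups(records):
    """Item records that resolve to the same OEM key are duplication candidates."""
    groups = defaultdict(list)
    for rec in records:
        groups[oem_key(rec)].append(rec.item_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def supplier_links(records):
    """One key item per OEM part, linked to every supplier that sells it."""
    links = defaultdict(set)
    for rec in records:
        links[oem_key(rec)].add(rec.supplier)
    return {key: sorted(suppliers) for key, suppliers in links.items()}


# Hypothetical item records: two supplier-keyed setups of the same OEM part.
records = [
    ItemRecord("INT-1001", "Acme Motors", "AM-500-12", "Integrated Supplier A"),
    ItemRecord("LOC-77", "ACME MOTORS", "am 500 12", "Local Supplier B"),
    ItemRecord("INT-1002", "Acme Motors", "AM-600", "Integrated Supplier A"),
]
```

In this sample, INT-1001 and LOC-77 collapse into one OEM key served by two suppliers. That single item record, linked to both suppliers, is the enterprise-wide record described above.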

I would also add that there should be a way to discover OEM part information when a reactive purchase need comes from maintenance. Parts are typically identified physically with OEM information; for example, an Allen Bradley/Rockwell module will have the Allen Bradley part number physically stenciled on it. If a part breaks and maintenance needs one, there must be a way to find out whether that part is in stock and a way to buy it if it is not. We believe that enterprise-wide viewable, verified and standardized OEM part information will reduce maintenance costs by eliminating the time-consuming search for part information in your systems and by ensuring the correct parts are stocked. This approach also enables part sharing between facilities, which is limited without common data. Part sharing in turn reduces overall cost by reducing inventory. With plants in the U.S. and worldwide, this type of advanced planning is where the bulk of the savings comes from.
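
Here is a minimal sketch of the maintenance lookup, assuming every plant’s stock records carry the OEM part number, however inconsistently it was typed. The plant names, part numbers and the find_stock function are hypothetical. Maintenance searches with the number read off the broken part and sees on-hand stock at every facility.

```python
from dataclasses import dataclass


def normalize_part_number(part_number: str) -> str:
    # Assumption: case, spaces and punctuation are not significant.
    return "".join(ch for ch in part_number.upper() if ch.isalnum())


@dataclass
class StockRecord:
    plant: str
    oem_part: str  # OEM part number as recorded at this plant
    on_hand: int


def find_stock(stenciled_part_number, stock):
    """Search every plant for the OEM part number read off the broken part.
    Returns (plant, on_hand) pairs with stock, largest quantity first."""
    target = normalize_part_number(stenciled_part_number)
    hits = [
        (rec.plant, rec.on_hand)
        for rec in stock
        if rec.on_hand > 0 and normalize_part_number(rec.oem_part) == target
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical enterprise stock, with the same part entered three different ways.
stock = [
    StockRecord("Plant US-1", "XY-1234-A", 0),
    StockRecord("Plant MX-2", "XY1234A", 3),
    StockRecord("Plant DE-3", "xy 1234 a", 1),
    StockRecord("Plant US-1", "ZZ-99", 5),
]
```

An empty result means the part has to be bought. A hit at another plant is a part-sharing opportunity that stays invisible without common OEM data.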

Obviously, much depends on the specific agreements with your integrated supplier, but consider the following questions. If the data stored in your system is not the OEM information, then it is logical to assume that it was created by the integrated supplier from the OEM data.

    1) How does your company know that the information is accurate? Are there any checks between the data given to the integrated supplier and what you have in your system?

    2) How does your company know if it has the correct parts set up in the system and stocked appropriately? There seems to be an opportunity for the integrated supplier to set up and stock items that aren’t necessary, which would only be discovered through data transparency.

    3) How does your company know that you are getting the best price on parts? Even with a cost savings agreement with the integrated supplier, duplicates hide the true usage of a part, and the opportunity for piece cost reduction is lost.
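
Question 3 can be illustrated with a small roll-up. This is a minimal sketch that assumes the duplicate records have already been grouped under an OEM key. The keys, item numbers, quantities and prices are hypothetical. Summing usage across the duplicates shows the true annual volume, and the price spread shows what is being paid for the same part.

```python
from collections import defaultdict


def true_usage(rows):
    """rows: (oem_key, item_id, annual_usage, unit_price) per item record.
    Rolls duplicate item records up to one OEM key so the combined usage
    and the spread of prices paid are visible for negotiation."""
    usage = defaultdict(int)
    prices = defaultdict(list)
    items = defaultdict(list)
    for key, item_id, qty, price in rows:
        usage[key] += qty
        prices[key].append(price)
        items[key].append(item_id)
    return {
        key: {
            "items": items[key],
            "annual_usage": usage[key],
            "min_price": min(prices[key]),
            "max_price": max(prices[key]),
        }
        for key in usage
    }


# Hypothetical usage and price history for two duplicate records of one part.
rows = [
    ("ACME:AM50012", "INT-1001", 40, 120.00),
    ("ACME:AM50012", "LOC-77", 25, 155.00),
    ("ACME:AM600", "INT-1002", 10, 80.00),
]
report = true_usage(rows)
```

Seen separately, neither record looks like a high-volume part. Combined, the buyer can negotiate on 65 units a year and ask why the same part is bought at two different prices.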

My second question is this: it seems from your response that everything is running quite smoothly, but is that true in Manufacturing? Do they ever experience loss of production because a vital part could not be found or was out of stock? How about Maintenance? Inventory management? Engineering? These departments should be surveyed, because there is a benefit for them too.

Hey baby, what is your material type and material status . . .

Tuesday, June 15th, 2010

You would never believe the discussions around the “ho-hum” or “don’t sweat the small details” elements of a data cleansing project. Believe it or not, understanding your material type and material status is critical to automating system updates. I firmly believe that data updates to legacy systems should be completed as a night job or direct feed based on a series of programmed templates. In one recent example, we created an Oracle system update process that sets up a new item from a material type template, plus a separate update process for an item that is already set up for another location of use but is new to the requesting location; this is sometimes referred to as a location setup or purchasing organization update. You can start to imagine the amount of pre-planning and data mapping that a data cleansing program requires.
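
The routing decision behind those two update processes can be sketched in a few lines. This is a minimal sketch and not the Oracle process itself; it models no Oracle API. The action names, item numbers and plant names are hypothetical. Each setup request is routed to a new-item setup, a location setup, or no update.

```python
def choose_update_process(item_id, location, item_master, item_locations):
    """Hypothetical routing for a nightly update job:
    NEW_ITEM        -> item not in the master; set up from its material type template
    LOCATION_SETUP  -> item exists, but not for the requesting location
    NO_UPDATE       -> item already set up for that location
    """
    if item_id not in item_master:
        return "NEW_ITEM"
    if location not in item_locations.get(item_id, set()):
        return "LOCATION_SETUP"
    return "NO_UPDATE"


def build_night_job(requests, item_master, item_locations):
    """Turn the day's (item_id, location) setup requests into update actions."""
    return [
        (item_id, location, choose_update_process(item_id, location, item_master, item_locations))
        for item_id, location in requests
    ]


# Hypothetical master data: item A100 (a spare part) exists only at PLANT1.
item_master = {"A100": "SPARE_PART"}
item_locations = {"A100": {"PLANT1"}}
requests = [("A100", "PLANT1"), ("A100", "PLANT2"), ("B200", "PLANT1")]
job = build_night_job(requests, item_master, item_locations)
```

The actual field mappings for each action would come from the material type templates. That mapping is where most of the pre-planning effort goes.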

The first fundamental rule is that the customer’s business doesn’t stop. For all you data purists out there who believe that the day someone flips a switch to turn on the cleansed database is in the near future: please include me, I would like to see it. Most master data management projects include years and years of legacy data, so it is accepted practice to draw a line in the database by last used date. When I design a data cleansing project, I include a new item setup process referenced to the legacy items. This way the client’s business continues, and as the new items are analyzed and set up, we can reference and update the legacy item information. Independently, we always run the legacy data cleansing in parallel with the new setup process.
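
Drawing the line by last used date is a simple partition. Here is a minimal sketch, assuming each legacy item carries a last used date or None when there is no usage history. The item numbers and the cutoff are hypothetical; the real cutoff is a business decision.

```python
from datetime import date


def split_by_last_used(items, cutoff):
    """items: (item_id, last_used) pairs, where last_used is a date or None.
    Items used on or after the cutoff are in scope for cleansing; the rest
    (including items with no usage history) fall below the line."""
    in_scope, below_line = [], []
    for item_id, last_used in items:
        if last_used is not None and last_used >= cutoff:
            in_scope.append(item_id)
        else:
            below_line.append(item_id)
    return in_scope, below_line


# Hypothetical legacy items and cutoff date.
legacy = [("A1", date(2009, 6, 1)), ("A2", date(2003, 1, 15)), ("A3", None)]
in_scope, below_line = split_by_last_used(legacy, cutoff=date(2007, 1, 1))
```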

As the data cleansing project is designed, let’s explore the data elements and classifications, starting with material type and material status. Every client will have their material types and material statuses set up, but during the data / systems assessment there should generally be a thorough review of industry standards vs. company processes. I find that our clients appreciate the opportunity to benchmark their processes and data structure elements such as material types and status.

Material Type

Material types can be as simple as goods and services or as complicated as service, critical spare, spare part, commodity, generic, blueprint, etc. The material type is the critical element that determines which template is used for setup in the downstream legacy systems and which inventory stocking strategy is applied.
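
One way to make the material type drive setup is a lookup table. This is a minimal sketch in which the template names and stocking strategies are hypothetical and not taken from any client system. An unknown material type raises an error instead of falling back to a default, so the data gets fixed rather than hidden.

```python
from enum import Enum


class MaterialType(Enum):
    SERVICE = "service"
    CRITICAL_SPARE = "critical spare"
    SPARE_PART = "spare part"
    COMMODITY = "commodity"


# Hypothetical setup rules: template name and stocking strategy per material type.
SETUP_RULES = {
    MaterialType.SERVICE: ("TPL_SERVICE", "not stocked"),
    MaterialType.CRITICAL_SPARE: ("TPL_CRITICAL_SPARE", "stocked, min/max"),
    MaterialType.SPARE_PART: ("TPL_SPARE_PART", "stocked per maintenance plan"),
    MaterialType.COMMODITY: ("TPL_COMMODITY", "purchased as needed"),
}


def setup_rule(material_type: str):
    """Look up the template and stocking strategy. Unknown types raise
    ValueError so they are fixed in the data rather than silently defaulted."""
    return SETUP_RULES[MaterialType(material_type.strip().lower())]
```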

A service can be standardized by a class that describes it, so that the cost of the service can also be standardized. The service is defined by its properties; for instance, a service class of CLEANING, OFFICE can be set up with descriptive elements such as 10,000 square feet, light cleaning (dusting / vacuuming), etc. From a purchasing perspective, the buyer can run reports globally to determine how much is spent on office cleaning, then evaluate the costs and use best practice sourcing strategies and other global supply chain processes to lower costs. The purpose of standard naming conventions for classes and properties is to provide enough standardized information to compare and cost services or products.
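
The CLEANING, OFFICE example shows why standardized classes and properties pay off. Here is a minimal sketch with hypothetical plants, costs and property names such as AREA_SQ_FT. Because every line uses the same class name, spend rolls up globally, and the shared area property allows a like-for-like cost comparison.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ServiceLine:
    plant: str
    service_class: str  # standardized class name, e.g. "CLEANING, OFFICE"
    properties: dict    # standardized properties, e.g. {"AREA_SQ_FT": 10000}
    annual_cost: float


def spend_by_class(lines):
    """Global spend per standardized service class."""
    totals = defaultdict(float)
    for line in lines:
        totals[line.service_class] += line.annual_cost
    return dict(totals)


def cost_per_sq_ft(lines, service_class):
    """Compare plants on a like-for-like basis using the AREA_SQ_FT property."""
    return {
        line.plant: line.annual_cost / line.properties["AREA_SQ_FT"]
        for line in lines
        if line.service_class == service_class
    }


# Hypothetical service lines from two plants.
lines = [
    ServiceLine("Plant A", "CLEANING, OFFICE", {"AREA_SQ_FT": 10000, "LEVEL": "LIGHT"}, 24000.0),
    ServiceLine("Plant B", "CLEANING, OFFICE", {"AREA_SQ_FT": 8000, "LEVEL": "LIGHT"}, 24000.0),
]
```

Both plants pay the same annual amount, yet per square foot Plant B pays noticeably more. That comparison only exists because the class and its properties were standardized.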

If a critical spare is being set up for sourcing and inventory, then maintenance or engineering has evaluated the part and determined that the spare is critical for production uptime. An inventory plan is developed for stocking the critical spare, including an initial buy quantity and a plan for the stores (inventory) setup: the item’s unit of measure (each, assembly, package, etc.), min / max, reorder quantity, stocking location, etc.
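
To show how the min / max values in that plan turn into orders, here is a minimal sketch of a common min/max replenishment rule. The rule is an assumption chosen for illustration and is not necessarily the policy any given client uses.

```python
def reorder_quantity(on_hand: int, on_order: int, minimum: int, maximum: int) -> int:
    """A common min/max policy (an assumption): when the inventory position
    (on hand plus on order) falls to or below the minimum, order enough to
    bring the position back up to the maximum; otherwise order nothing."""
    if minimum < 0 or maximum < minimum:
        raise ValueError("expected 0 <= minimum <= maximum")
    position = on_hand + on_order
    return maximum - position if position <= minimum else 0
```

For a hypothetical critical spare with min 2 and max 6, an empty stockroom gives an initial buy of 6. One unit on hand with nothing on order triggers an order of 5. Three units on hand trigger nothing.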

Material Status

In addition to applying a “material type” to the item records, a material status should be used as a long term data maintenance process, because of the longevity of materials used in manufacturing operations. In dealing with component manufacturers and suppliers, a component may be active from a plant use perspective even though the component manufacturer no longer manufactures the item. How is that possible? A piece of equipment can have a 10 year or a 50 year life span, and to maintain it, a list of recommended spare parts is identified and set up for equipment maintenance. If the manufacturer has made the spare part component obsolete but the equipment is still in use on the production line, the material status would be “obsolete active”. A different buy / stock strategy would then be implemented, such as purchasing all available stock from the manufacturer. Alternatives include sourcing through unconventional channels such as eBay, or contracting a local shop to build the item.

Typical material statuses that I have experienced are active, inactive (referenced to an active item), obsolete active, obsolete inactive (typically the status that starts the disposal process) and archive. The archive status is a classification that lets the analysts view the item information, while the item is not visible to the client or the item record is not exported to the client systems.
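
The statuses above can be modeled as an enumeration with a buy / stock strategy attached to each one. This is a minimal sketch. The strategy wording paraphrases the post, and the helper names are hypothetical. The export filter assumes the archive status means “not exported to the client systems”.

```python
from enum import Enum


class MaterialStatus(Enum):
    ACTIVE = "active"
    INACTIVE_REFERENCED = "inactive, referenced to an active item"
    OBSOLETE_ACTIVE = "obsolete active"
    OBSOLETE_INACTIVE = "obsolete inactive"
    ARCHIVE = "archive"


# Hypothetical buy / stock strategy per status, paraphrasing the post.
STRATEGY = {
    MaterialStatus.ACTIVE: "normal replenishment",
    MaterialStatus.INACTIVE_REFERENCED: "order the referenced active item",
    MaterialStatus.OBSOLETE_ACTIVE: "buy remaining stock or source alternatives",
    MaterialStatus.OBSOLETE_INACTIVE: "start the disposal process",
    MaterialStatus.ARCHIVE: "none; analyst view only",
}


def status_when_manufacturer_obsoletes(still_in_plant_use: bool) -> MaterialStatus:
    """The equipment may outlive the component: obsolete but still used is 'obsolete active'."""
    return MaterialStatus.OBSOLETE_ACTIVE if still_in_plant_use else MaterialStatus.OBSOLETE_INACTIVE


def records_to_export(records):
    """records: (item_id, status) pairs. Archived records stay with the analysts."""
    return [item_id for item_id, status in records if status is not MaterialStatus.ARCHIVE]
```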

I would appreciate any input, or better yet a discussion, on the different material types and material statuses used in Product Information Management (PIM) or Master Data Management (MDM). As an industry, we inherited the material types and material statuses used in purchasing or maintenance systems, which were designed to meet business functions rather than data quality or master data management needs. What are the proper data requirements for a material type or material status? MDM or PIM software companies and data quality consultants need to provide input from the data management perspective to support long term data management functionality.

View Jackie Roberts's profile on LinkedIn

Did we forget the old adage “Garbage In, Garbage Out” I mean Garbage Extracted, Garbage Migrated

Friday, April 23rd, 2010

When it comes to Master Data Management, the implied definition is an à la carte menu of detailing and normalizing activities, including data cleansing, data verification, data profiling, data governance, de-duplication, data enrichment and data provenance, among other tasks. If you are managing or participating in the activities of a Master Data Management program, you are progressing in the right direction toward data quality. If you are NOT participating in the activities of MDM, then you are part of a company-wide initiative of “Garbage In, Garbage Out” (GIGO). By the way, GIGO in this case is not environmentally responsible or “green” behavior.

Wikipedia defines “Garbage In, Garbage Out” as “a phrase in the field of computer science or information and communication technology. It is used primarily to call attention to the fact that computers will unquestioningly process the most nonsensical of input data (Garbage in) and produce nonsensical output (Garbage out).”

If you enter “garbage in” to a computer system, passing the data through some very expensive ERP or CMMS software isn’t going to change its quality. The business results are equivalent to “garbage out”, which will be apparent in day-to-day business activities and in the subsequent reporting used to determine the health of your business. Isn’t it obvious that data should not simply be moved from one system to a new system without an MDM program?

Let us now explore the concept of data migration. Wikipedia defines data migration as “the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge.”

If an MDM program is not in progress when implementing new software or upgrading existing software, the project should include an evaluation of the data and/or of the additional functionality of the new software’s “to be” model. That evaluation should identify the new data required for improved business processes and reporting, along with a plan for legacy data cleanup. A data migration project needs to be more than moving data from a legacy system to the new system.
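
A first, inexpensive step before any migration is to profile the extract. This is a minimal sketch assuming a hypothetical schema with item_id, description and status fields. It counts missing descriptions, obsolete records, and descriptions that are identical once case and spacing are ignored. Exact matching after normalization is a crude stand-in for real duplicate matching, which needs classification and verified part data.

```python
from collections import Counter


def _clean(text):
    # Collapse case and whitespace so trivially different descriptions compare equal.
    return " ".join((text or "").upper().split())


def profile_for_migration(records):
    """records: dicts with 'item_id', 'description' and 'status' keys (hypothetical schema).
    Returns simple counts to review before anything is migrated."""
    descriptions = [_clean(r.get("description")) for r in records]
    counts = Counter(d for d in descriptions if d)
    return {
        "total": len(records),
        "missing_description": sum(1 for d in descriptions if not d),
        "duplicate_descriptions": sum(n - 1 for n in counts.values() if n > 1),
        "obsolete": sum(1 for r in records if r.get("status") == "obsolete"),
    }


# Hypothetical legacy extract.
legacy_extract = [
    {"item_id": "1", "description": "Bearing, taper 1.5in", "status": "active"},
    {"item_id": "2", "description": "BEARING,  TAPER 1.5IN", "status": "active"},
    {"item_id": "3", "description": "", "status": "obsolete"},
    {"item_id": "4", "description": "Motor AC 10HP", "status": "active"},
]
profile = profile_for_migration(legacy_extract)
```

In this sample, three of the four records have an issue. That is the kind of finding that should stop a “lift and shift” migration before it starts.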

During a site visit at a plant, I had the opportunity to ask this question of a user of maintenance software that had been implemented a number of years earlier. The software had awesome abilities to create and manage the relationships between equipment, spare parts and supplier contacts. It also had the potential to improve processes and reporting and to streamline the information required by a maintenance organization. The company invested in the software / hardware and understood the ROI, but lacked an understanding of the data needs and data management. The software was implemented, but most of its functionality was not used, so the ROI was never achieved. When I asked why, I was told: “no data, and we don’t have time to add the data.”

In another scenario I came across, purchasing moved data from a legacy system to a new ERP system. The data wasn’t set up to a data governance or MDM procedure, and the legacy data was riddled with duplication, obsolete information, unstructured descriptions and so forth. Different system, same legacy data quality, and the ROI was never achieved.

I have one simple question: why invest in a software product if the data is not going to be treated as an asset? The result of a successful implementation, enabling both Master Data Management and software functionality, is that business processes are streamlined and simplified and reporting capabilities are enhanced.

Garbage In, Garbage Out, or Garbage Extracted, Garbage Migrated, as we move to the next generation of technology. Are we relying on skewed, nonsensical output based on low quality data to make our critical business decisions?


ECCMA’s 11th Annual Data Quality Conference Oct 12-14, 2010

Friday, April 23rd, 2010

Whether you are new to data quality or a seasoned professional, this conference will provide you with a unique opportunity to discuss the latest trends, technologies and software available to the data quality industry. You’ll hear top-level speakers discuss how to manage, catalog, clean and standardize your data. It will introduce you to the international standard for data quality, ISO 8000-110. An exhibition will showcase the latest data quality software from companies not only in the U.S. but all over the world.

PROGRAM OF EVENTS  

Tuesday October 12, 2010

  • Pre-conference ISO 8000-110:2009 Master Data Quality Certification Workshop 
  • Welcome Reception (includes open bar and hors d’oeuvres)

Wednesday October 13, 2010

  • Opening Address
    Overview: The critical need to maintain the quality of master data.
  • Panel Presentations
    Fundamental updates on the progress of the practical application of the eOTD (ECCMA Open Technical Dictionary), ISO 22745 and ISO 8000 to the collection, validation and distribution of master data in support of the procurement of goods and services, as well as inventory and asset management initiatives. The panels will address the importance of using the standards to define and manage data requirements, as well as the latest trends in spend analysis, cataloguing at source, and data cleansing and rendering.
  • Exhibition
    A unique opportunity to see the latest offerings from leading data service and software application providers.
  • Annual Awards Dinner
    Celebrate and share achievements with colleagues and friends.

Thursday October 14, 2010

  • Workshops
    Workshops will cover new technology and practical examples of vendor-specific data quality application software and data cleaning services.

*Content subject to change.

Data Cleansing to Achieve Information Quality

Wednesday, March 10th, 2010

Those of us who work around or manage the day-to-day operations of MDM, data governance or data cleansing projects understand the challenges and effort needed to transform “raw” data through multiple stages of analytics and processes to achieve information quality for use in our customers’ CRM, CMMS, PIM and ERP systems. An un-cleansed product record can keep a production line offline because an inventory item wasn’t ordered due to incomplete information. It can add inventory cost through ordering an incorrect item (we could be talking about a $10,000 motor), or create multiple entries and setups in the material master due to data duplication.

Data vs. information: to simplify the concept, data is managed by a combination of a team of analysts and software to achieve the goal of a cleansed record, or usable information. Data is imported, profiled, classified, structured, verified, enriched and translated, and reports are generated; we create usable information from low quality data for use in decision making related to engineering, purchasing, maintenance, marketing, sales, etc. The data exported into client systems is information that meets a predetermined set of data governance rules and information quality requirements.
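
The data-to-information flow can be sketched as a record moving through ordered stages and passing an export gate. This is a minimal sketch. The stage list follows the paragraph above, while the required fields and the rule that a record must reach “verified” before export are hypothetical governance rules, chosen only for illustration.

```python
STAGES = ["imported", "profiled", "classified", "structured",
          "verified", "enriched", "translated"]

# Hypothetical governance rules: attributes required before export.
REQUIRED_FIELDS = ("class_name", "manufacturer", "part_number", "uom")
MIN_EXPORT_STAGE = "verified"  # assumption, not a universal rule


def advance(record):
    """Move a record to its next cleansing stage (the last stage is terminal)."""
    i = STAGES.index(record["stage"])
    if i + 1 < len(STAGES):
        record["stage"] = STAGES[i + 1]
    return record


def exportable(record):
    """Only records that reached the minimum stage and have every required
    attribute are exported as information; everything else stays as data."""
    reached = STAGES.index(record["stage"]) >= STAGES.index(MIN_EXPORT_STAGE)
    return reached and all(record.get(f) for f in REQUIRED_FIELDS)


# Hypothetical record that has been classified but not yet verified.
record = {"stage": "classified", "class_name": "BEARING, TAPER",
          "manufacturer": "Acme", "part_number": "AM-1", "uom": "EA"}
```

Whether MIN_EXPORT_STAGE should be “classified” or “verified” is exactly the open question the next paragraph raises.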

Data quality experts, let’s have a discussion on the definitions of data quality. Does an address or a product detail meet the requirement if it is only classified? Or should verification at source (the contact for an address, the manufacturer / supplier for a product) be required at the initial setup of the data in the system, or scheduled as maintenance under the data governance program? Is the data incomplete? Does the MDM process include a question / answer scenario to complete the data?

MDM software designers and developers, can we also have a discussion on the software’s ease of use in managing the stages of data cleansing to support an MDM philosophy? That includes using advanced techniques to automate the management and to add intelligence to data imports, workflows and the data cleansing stages of classifying, profiling, matching, translation, data audit analytics, exception reports and status reporting of a data record.

I believe these are great discussion points and will serve as great blog topics.


It Is Not So Easy to Build a Data Cleansing Logic

Tuesday, March 2nd, 2010

During my morning data quality, MDM and data cleansing reading, I happened upon this million-dollar question on a help site:

I have a scenario to build a data flow task for Data Cleansing.

Logic 1 to be built:
Source data would be like 1050 and I should convert it to 1.050
Source data would be like 085 and I should convert it to 0.85

Profiling, structuring or normalizing data without any referential information risks errors in business use, especially if the data is used for purchasing or maintenance. If the goal is to automate the data normalization, the data needs to be referenced to metadata. Is 1050 a part number? A quantity? It could be an attribute representing a measurement such as length or diameter. Is it in inches, feet or meters?
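
To show why the help-site logic is risky, here is a minimal sketch that contrasts the naive rule with a metadata-driven one. The field names, units and implied-decimal counts in FIELD_METADATA are hypothetical. A real system would read them from its data dictionary.

```python
from decimal import Decimal


def naive_rule(value: str) -> str:
    """The help-site logic: put a decimal point after the first digit."""
    return value[0] + "." + value[1:]


# Hypothetical field metadata; a real system would take this from its data dictionary.
FIELD_METADATA = {
    "bore_diameter": {"kind": "measurement", "implied_decimals": 3, "unit": "in"},
    "part_number": {"kind": "identifier"},
    "order_qty": {"kind": "quantity"},
}


def normalize(field: str, value: str) -> str:
    """Only rescale values whose metadata says they carry implied decimals."""
    meta = FIELD_METADATA.get(field)
    if meta is None:
        raise KeyError(f"no metadata for field {field!r}; refusing to guess")
    if meta["kind"] == "measurement":
        scaled = Decimal(value).scaleb(-meta["implied_decimals"])
        return str(scaled) + " " + meta["unit"]
    return value  # identifiers and quantities are left untouched
```

The naive rule reproduces both requested outputs, but only because it silently assumes three implied decimals for 1050 and two for 085. With metadata, 1050 as a bore diameter becomes 1.050 in, the same 1050 as a part number stays 1050, and a field with no metadata is refused instead of guessed.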
