Posts Tagged ‘masterdata’

Data Quality: An essential tool to facilitate national security?

Monday, June 28th, 2010

When I think of how best to protect the nation’s infrastructure, two broad categories of national security come to mind: physical security and electronic security.

Physical protection of our nation’s infrastructure happens at many levels. At the national level, the United States armed forces protect our land, sea and airspace borders. At the state level, protection comes from state police forces, state highway patrols and National Guard forces. At the local level, protection is handled by every local police force, in addition to local civilian efforts to monitor suspicious activity in neighborhoods, more commonly known as neighborhood watches.

This amounts to hundreds of thousands of human resources that need to be analyzed, managed and deployed properly to ensure that the nation’s critical “brick and mortar” infrastructure is adequately protected and that those individuals are given the proper tools to carry out their assigned tasks. The question becomes: how do we effectively inventory all of the nation’s human and physical resources in order to facilitate real-time assessments of our ability to protect our critical infrastructure?

That question presents several seemingly insurmountable problems: potentially tens of thousands of local sources of data, thousands of sources of state data, thousands of sources of federal data, in addition to thousands of data sources held within the firewalls and control of private and publicly owned companies, as is the case with major power distribution and communication corporations. To make matters worse, there could be hundreds of different software vendors supporting the collection, storage, maintenance and reporting of this information. Each software package has a unique and often proprietary data model, and most likely a unique and proprietary metadata schema used to tag the meaning of each database field and the values within those fields. Without a consistent and easily discernible database schema from each data source, integrating this mass of data in a way that produces information trustworthy enough for what amount to life-or-death decisions becomes an impossible task.

One potential solution to this problem is a combined approach that would include a National Master Data Management initiative, with the ultimate goal of achieving a level of data quality sufficient to accurately derive infrastructure intelligence. That intelligence would be used to assess how vulnerable any given infrastructure asset is to attack, and the potential damage and casualties if an attack were to occur at a location such as any bridge that carries large amounts of cargo across the Mississippi. Not just any data will do when allocating resources to secure and protect such important national resources; the data needs to be quality data: data with a profiled and standardized vocabulary, a consistent syntax or format, a known provenance or source, and verified accuracy and completeness. Data that does not meet minimum requirements for these metrics cannot be used to make decisions that are supported by accurate data.
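As a rough illustration of what “minimum requirements” could mean in practice, here is a minimal Python sketch that checks one asset record for completeness, standardized vocabulary, syntax and provenance. The field names, the approved asset types and the ID format are hypothetical assumptions, not part of any real national schema; accuracy is deliberately left out because it can only be confirmed by verifying against the source.

```python
import re

# Hypothetical standards for an infrastructure asset record (assumptions, not a real schema).
APPROVED_ASSET_TYPES = {"BRIDGE", "SUBSTATION", "RAIL YARD", "WATER RESERVOIR"}
REQUIRED_FIELDS = ("asset_id", "asset_type", "latitude", "longitude", "source")
ASSET_ID_FORMAT = re.compile(r"^[A-Z]{2}-\d{6}$")  # assumed syntax, e.g. "MO-004211"


def quality_failures(record: dict) -> list:
    """Return the minimum-requirement failures for one record (empty list = usable)."""
    failures = []
    # Completeness and provenance: required fields, including the source, must be present.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        failures.append("incomplete: " + ", ".join(missing))
    # Vocabulary: the asset type must come from the controlled term list.
    asset_type = record.get("asset_type")
    if asset_type and asset_type not in APPROVED_ASSET_TYPES:
        failures.append("non-standard vocabulary: " + asset_type)
    # Syntax: the identifier must match the agreed format.
    asset_id = record.get("asset_id")
    if asset_id and not ASSET_ID_FORMAT.match(asset_id):
        failures.append("bad syntax: asset_id")
    # Accuracy is not checked here: it requires verification against the source.
    return failures
```

Under this sketch, a record would only be passed on for decision making when `quality_failures` returns an empty list; anything else goes back for cleansing.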

My proposal for a National Master Data Management Program would, in its most basic form, include the creation of a national mashup of these thousands of data sources into one Infrastructure Security Data Warehouse, used to govern, analyze and report the readiness of the nation’s infrastructure in the event of an attack.

A mashup is a relatively new term used to describe a database application that pulls information from tens to thousands of data sources and integrates the data so it can be analyzed over and over again. At the same time, it would not require replacing the thousands of legacy systems that local, state, federal and private entities use to run their day-to-day operations.

I have oversimplified the problem and solution in this explanation; the actual solution is very technical and requires professionals who have managed data integration, data cleansing, data governance and master data management programs in the past. An initiative like the one we are proposing is not a short-term project with defined beginning and end dates; in reality, the project began more than 30 years ago with the collection of electronic data. A master data management program for something as critical as the United States infrastructure is not simply a program; it is a complete change in the way we think about and interact with the data we spend so much time, money and resources archiving and cataloging. It is a cultural change.

More technically speaking, the idea is to create a classified open technical dictionary that will contain all the terms and definitions needed to describe, at their most atomic level, every data field we require to generate the information needed to assess and prioritize potential infrastructure targets. We will then tag, or associate, one of the classified numbers and terms with every piece of asset data (Master Data) determined necessary to implement the infrastructure protection strategy. We will use a combination of publicly available metadata and newly created metadata to tag all the data elements. This allows us to store and report on them quickly and accurately from a central location with a centralized team, resulting in a true Infrastructure Intelligence Master Data Program. Once all the target data is in a standard format, we can further the national security goals of increasing the quality of information and reporting, increasing the level of interaction between local, state and federal authorities, and providing threat advisories to citizens and the law enforcement community. The same data can also be used in a variety of training exercises and computer simulations of potential attacks.
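A minimal sketch of the tagging idea, assuming a tiny hypothetical dictionary with made-up concept IDs (`TD-0001` and so on) and two hypothetical source systems that name the same facts differently. Each raw record is re-keyed by concept ID and keeps its source for provenance; fields with no dictionary entry are set aside for an analyst rather than guessed.

```python
# Hypothetical technical dictionary: concept ID -> term (IDs are illustrative only).
DICTIONARY = {
    "TD-0001": "ASSET IDENTIFIER",
    "TD-0002": "DESIGN LOAD CAPACITY (TONNES)",
    "TD-0003": "RESPONSIBLE AGENCY",
}

# Each source system names the same facts differently; map its field names to concept IDs.
# Assumes both sources already report load capacity in tonnes.
FIELD_MAPS = {
    "county_gis": {"BRIDGE_NO": "TD-0001", "MAX_LOAD_T": "TD-0002", "OWNER": "TD-0003"},
    "state_dot": {"structure_id": "TD-0001", "load_rating": "TD-0002", "agency": "TD-0003"},
}


def tag_record(source: str, raw: dict) -> dict:
    """Re-key a raw source record by concept ID and keep its provenance.

    Fields with no dictionary mapping are returned under 'untagged' for analyst review.
    """
    mapping = FIELD_MAPS[source]
    tagged = {"_source": source, "untagged": {}}
    for field_name, value in raw.items():
        concept_id = mapping.get(field_name)
        if concept_id in DICTIONARY:
            tagged[concept_id] = value
        else:
            tagged["untagged"][field_name] = value
    return tagged
```

Once records from every source are keyed by the same concept IDs, they can be stored and reported on centrally without changing the source systems.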

Core to the proposed data quality/master data management program are the business processes used to carry out data cleansing and enrichment. The methods used NEED to be vetted and thoroughly tested with large datasets and with algorithms comparable in complexity to those that will need to be developed to answer questions such as:

  • How many people will be affected if a particular power substation is physically attacked or cyber-attacked?
  • Which railroad infrastructure is key to keeping military supplies flowing to ports for distribution to our forces worldwide?
  • Once a threat is identified, how long will it take to mobilize enough manpower to properly defend any given location?
  • Which power generation facilities are large enough that, if damaged, they would cause a blackout for 5% or more of the population?
  • Which open-air water sources, if poisoned, would cause the most deaths?

The processes and methods used to execute the project, in its initial phase and on an ongoing basis, are just as important as, if not MORE important than, the quality of the initial data sets; the two are mutually dependent.

 The implementation of an ongoing and permanent national Infrastructure Master Data Management initiative would be of great benefit and could be essential to the long term growth and protection of the United States of America.

Hey baby, what is your material type and material status . . .

Tuesday, June 15th, 2010

You would never believe the discussions around the “ho-hum” or “don’t sweat the small details” elements of a data cleansing project. Believe it or not, understanding your material type and material status is critical to being able to automate system updates. I firmly believe that data updates to legacy systems should be completed as a nightly job or direct feed based on a series of programmed templates. In one recent example, we created an Oracle system update process that sets up a new item by referencing a material type template, and a separate update process for an item that is already set up for another location but is new to the requesting location; the latter is sometimes referred to as a location setup or purchasing organization update. You can start to imagine the amount of pre-planning and data mapping work required for a data cleansing program.
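The routing decision described above can be sketched in Python. This is not the Oracle process itself; the function and enum names are hypothetical, and the lookup of existing item/location setups is assumed to be a simple in-memory dictionary rather than a query against the ERP.

```python
from enum import Enum


class UpdateProcess(Enum):
    NEW_ITEM = "new item setup from material type template"
    LOCATION_SETUP = "location / purchasing organization update"
    NO_ACTION = "already set up at the requesting location"


def choose_update_process(item_id, location, item_locations):
    """Decide which templated update a request triggers.

    item_locations maps each existing item ID to the set of locations where it is set up
    (a hypothetical stand-in for a lookup against the ERP).
    """
    if item_id not in item_locations:
        return UpdateProcess.NEW_ITEM
    if location not in item_locations[item_id]:
        return UpdateProcess.LOCATION_SETUP
    return UpdateProcess.NO_ACTION


def build_nightly_batch(requests, item_locations):
    """Group (item_id, location) requests by update process for the night job."""
    batch = {process: [] for process in UpdateProcess}
    for item_id, location in requests:
        process = choose_update_process(item_id, location, item_locations)
        batch[process].append((item_id, location))
    return batch
```

Grouping requests this way lets each group be fed to its own programmed template in a single nightly run.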

The first fundamental rule is that the customer’s business doesn’t stop. For all you data purists out there who believe that the day we flip a switch to turn on the cleansed database is in the near future: please include me, I would like to see it. Most master data management projects involve years and years of legacy data; therefore, it is accepted practice to draw a line in the database by last-used date. When I design a data cleansing project, I set up a new item setup process that references legacy items; this way the client’s business continues, and as the new items are analyzed and set up, we can reference and update the legacy item information. Independently, the legacy data cleansing always runs in parallel with the new item setup process.
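Drawing the line by last-used date can be as simple as the following sketch, assuming each legacy item carries a last-used date (or none at all). Items on the recent side of the cutoff are cleansed; the rest stay behind the line. The cutoff date itself is a business decision, not something the code can choose.

```python
from datetime import date


def split_legacy_items(items, cutoff: date):
    """Split legacy items into those in scope for cleansing and those behind the line.

    items is an iterable of (item_id, last_used) pairs; last_used is a date,
    or None for items with no recorded usage.
    """
    in_scope, out_of_scope = [], []
    for item_id, last_used in items:
        if last_used is not None and last_used >= cutoff:
            in_scope.append(item_id)
        else:
            # Never used, or not used since the cutoff.
            out_of_scope.append(item_id)
    return in_scope, out_of_scope
```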

As the data cleansing project is designed, let’s start to explore the data elements and classifications. Every client will already have material types and material statuses set up, but during the data / systems assessment there should be a thorough review of industry standards vs. company processes. I find that our clients appreciate the opportunity to benchmark their processes and data structure elements such as material types and statuses. We will start with material type and material status.

Material Type

Material types can be as simple as goods and services or as complicated as service, critical spare, spare part, commodity, generic, blueprint, etc. The material type is the critical element that determines which template is used for setup in the downstream legacy systems and which inventory stocking strategy is applied.
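A sketch of how material type might drive template selection, assuming a hypothetical lookup table; the template names and stocking strategies below are placeholders, not any particular ERP’s setup. Unknown material types raise an error so that an analyst, not the code, decides how to set them up.

```python
# Hypothetical mapping: material type -> (setup template, stocking strategy).
MATERIAL_TYPE_RULES = {
    "SERVICE": ("TPL_SERVICE", "non-stock"),
    "CRITICAL SPARE": ("TPL_STOCKED_ITEM", "stock, min/max, engineering-approved"),
    "SPARE PART": ("TPL_STOCKED_ITEM", "stock, min/max"),
    "COMMODITY": ("TPL_COMMODITY", "stock or direct buy"),
    "GENERIC": ("TPL_GENERIC", "direct buy"),
    "BLUEPRINT": ("TPL_MADE_TO_PRINT", "non-stock, made to print"),
}


def setup_rules(material_type: str) -> tuple:
    """Return (template, stocking strategy); unknown types are rejected, not guessed."""
    key = material_type.strip().upper()
    if key not in MATERIAL_TYPE_RULES:
        raise ValueError(f"unknown material type {material_type!r}: route to an analyst")
    return MATERIAL_TYPE_RULES[key]
```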

A service can be standardized by a class that describes it, so that the cost of the service can also be standardized. The service itself is defined by its properties; for instance, a service class of CLEANING, OFFICE can be set up with descriptive elements such as 10,000 square feet, light cleaning (dusting / vacuuming), etc. From a purchasing perspective, the buyer can run reports globally to determine how much is spent on office cleaning, then evaluate the costs and use best-practice sourcing strategies and other global supply chain processes to lower them. The purpose of standard naming conventions for classes and properties is to provide enough standardized information to compare and cost services or products.
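The global spend report described above amounts to grouping purchases by standardized class name. A minimal sketch, assuming purchase lines arrive as hypothetical (class name, site, amount) tuples with amounts as strings; `Decimal` is used to avoid floating-point rounding on currency.

```python
from collections import defaultdict
from decimal import Decimal


def spend_by_class(purchases):
    """Total spend and the sites buying it, per standardized class name.

    purchases: iterable of (class_name, site, amount) tuples, where class_name
    follows the "NOUN, MODIFIER" convention, e.g. "CLEANING, OFFICE".
    """
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    sites = defaultdict(set)
    for class_name, site, amount in purchases:
        totals[class_name] += Decimal(amount)
        sites[class_name].add(site)
    return {name: (totals[name], sorted(sites[name])) for name in totals}
```

This only works because every site uses the same class name; free-text descriptions such as “office cleaning” and “janitorial, offices” would land in separate buckets.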

If a critical spare is being set up for sourcing and inventory, then the part has been evaluated by maintenance or engineering and determined to be critical for production uptime. An inventory plan is developed for stocking the critical spare, including an initial buy quantity and a plan for the stores (inventory) setup: the item’s unit of measure (each, assembly, package, etc.), min / max, reorder quantity, stocking location, etc.
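The min / max and reorder quantity elements of the inventory plan are commonly tied together by the classic min/max rule sketched below. This is a textbook rule offered as an assumption about how such a plan might be applied, not a specific system’s replenishment logic.

```python
def reorder_quantity(on_hand: int, on_order: int, minimum: int, maximum: int) -> int:
    """Classic min/max rule: when available stock falls to or below the min,
    order enough to bring it back up to the max; otherwise order nothing."""
    if minimum < 0 or maximum < minimum:
        raise ValueError("require 0 <= minimum <= maximum")
    available = on_hand + on_order  # count stock already on order as available
    if available <= minimum:
        return maximum - available
    return 0
```

For a critical spare, the minimum would typically be at least 1, so a spare is always on the shelf when the equipment fails.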

Material Status

In addition to applying a “material type” to the item records, because of the longevity of materials used in manufacturing operations, a material status should be used as part of a long-term data maintenance process. In dealing with component manufacturers and suppliers, a component may be active from a plant-use perspective even though the component manufacturer no longer makes it. How is that possible? A piece of equipment can have a 10-year or a 50-year life span, and to maintain it, a list of recommended spare parts is identified and set up for equipment maintenance. If a spare part component has been made obsolete by the manufacturer but the piece of equipment is still in use on the production line, the material status would be “obsolete active”. A different buy / stock strategy would then be implemented, such as purchasing all available stock from the manufacturer or, alternatively, sourcing through unconventional channels such as eBay or contracting a local shop to build the item.

The typical material statuses I have encountered are active, inactive (referenced to an active item), obsolete active, obsolete inactive (typically the status that starts the disposal process) and archive. The archive status is a classification used by the analysts that allows them to view the item information, but the item is not visible to the client and the record is not exported to the client systems.
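One way to encode these statuses is sketched below, assuming the manufacturer-obsolescence part of the status can be derived from two flags while “inactive referenced to an active item” and “archive” are set by analysts. The export filter reflects the rule that archived records are never exported to client systems. All names are hypothetical.

```python
from enum import Enum


class MaterialStatus(Enum):
    ACTIVE = "active"
    INACTIVE_REFERENCED = "inactive, referenced to an active item"
    OBSOLETE_ACTIVE = "obsolete active"
    OBSOLETE_INACTIVE = "obsolete inactive"  # typically starts the disposal process
    ARCHIVE = "archive"                      # analysts only, never exported


def obsolescence_status(in_plant_use, still_manufactured):
    """Derive the manufacturer-obsolescence part of the status (assumed rule).

    INACTIVE_REFERENCED and ARCHIVE are assigned by analysts, not derived here.
    """
    if still_manufactured:
        return MaterialStatus.ACTIVE
    if in_plant_use:
        # Equipment still runs, so a last-buy or alternative sourcing strategy is needed.
        return MaterialStatus.OBSOLETE_ACTIVE
    return MaterialStatus.OBSOLETE_INACTIVE


def exportable(items):
    """Return the IDs of (item_id, status) pairs that may be exported to client systems."""
    return [item_id for item_id, status in items if status is not MaterialStatus.ARCHIVE]
```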

I would appreciate any input, or better yet a discussion, on the different material types and material statuses used in Product Information Management (PIM) or Master Data Management (MDM). As an industry, we inherited material types and material statuses from purchasing or maintenance systems designed to meet a business function, not from a data quality or master data management perspective. What are the proper data requirements for a material type or material status? MDM and PIM software companies and data quality consultants need to provide input from the data management perspective to deliver long-term data management functionality.

View Jackie Roberts's profile on LinkedIn

Enterprise Information Management 2010 via DAMA Management International

Friday, June 11th, 2010

Presentation proposals are now being accepted for the second Enterprise Information Management Conference scheduled for September 21-23, 2010 at the Hilton Toronto in Toronto, Canada.

Speaker submission guidelines can be found here: Online Proposal Form.

All questions regarding speaking may be directed to Wilshire Conferences at maya@wilshireconferences.com. The deadline for submitting your proposal is June 4, 2010, and we anticipate being able to notify accepted speakers by June 14, 2010.

Thanks and we look forward to hearing from you!

Did we forget the old adage “Garbage In, Garbage Out”? I mean, Garbage Extracted, Garbage Migrated

Friday, April 23rd, 2010

When it comes to Master Data Management, the implied definition is an à la carte menu of detailing and normalizing activities, including data cleansing, data verification, data profiling, data governance, de-duplication, data enrichment and data provenance, among other tasks. If you are managing or participating in the activities of a Master Data Management program, you are progressing in the right direction toward data quality. If you are NOT participating in the activities of MDM, then you are part of a company-wide initiative of “Garbage In, Garbage Out” (GIGO). By the way, GIGO in this case is not an environmentally responsible or “green” behavior.

Wikipedia defines “Garbage In, Garbage Out” as “a phrase in the field of computer science or information and communication technology. It is used primarily to call attention to the fact that computers will unquestioningly process the most nonsensical of input data (Garbage in) and produce nonsensical output (Garbage out).”

If you enter “garbage” into a computer system, passing the data through some very expensive ERP or CMMS software isn’t going to change its quality; the business result is still “garbage out”, which will be apparent in day-to-day business activities and in the subsequent reporting used to determine the health of your business. Isn’t it obvious that data should not simply be moved from one system to a new system without an MDM program?

Let us now explore the concept of data migration. Wikipedia defines data migration as the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge.

If an MDM program is not in progress when implementing new software or upgrading existing software, the project should include an evaluation of the data and/or an evaluation of the additional functionality of the “to be” model of the new software, identifying the new data required for improved business processes and reporting, along with a plan for legacy data cleanup. A data migration project needs to be more than moving data from a legacy system to the new system.
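A pre-migration evaluation can start with a simple profile of the legacy records. The sketch below, with hypothetical field names, counts records, flags missing required fields and obsolete items, and groups descriptions that normalize to the same text as possible duplicates. The normalization is deliberately crude; real duplicate detection needs classification and analyst review.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Crude normalization for duplicate detection: uppercase, collapse punctuation and spaces."""
    return re.sub(r"[^A-Z0-9]+", " ", text.upper()).strip()


def premigration_report(records, required=("item_id", "description", "uom")):
    """Profile legacy records before migration instead of moving them as-is."""
    counts = Counter(normalize(r.get("description") or "") for r in records)
    return {
        "total": len(records),
        "missing_required": [r.get("item_id") for r in records
                             if any(not r.get(f) for f in required)],
        "possible_duplicates": sorted(d for d, n in counts.items() if d and n > 1),
        "obsolete": [r.get("item_id") for r in records if r.get("status") == "obsolete"],
    }
```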

During a site visit at a plant, I had the opportunity to ask this question of a user of maintenance software that had been implemented a number of years earlier. The software had awesome abilities to create and manage the relationships between equipment and spare parts and supplier contacts, as well as the potential to improve processes and reporting and to streamline the information required by a maintenance organization. The company invested in the software / hardware and understood the ROI, but lacked an understanding of the data needs and data management. The software was implemented; however, the majority of its functionality was not used, so the ROI was never achieved. When I asked why, I was told “no data and we don’t have time to add the data.”

In another scenario I came across, purchasing moved data from a legacy system to a new ERP system. The data was not set up under a data governance or MDM procedure, so the legacy data, riddled with duplication, obsolete information, unstructured descriptions and so forth, was carried straight over. Different system, same legacy data quality, and the ROI was never achieved.

I have one simple question: why invest in a software product if the data is not going to be treated as an asset? The result of a successful implementation is that business processes are streamlined and simplified, and reporting capabilities are enhanced, by enabling both Master Data Management and the software’s functionality.

Garbage In, Garbage Out, or Garbage Extracted, Garbage Migrated, as we move to the next generation of technology. Are we relying on skewed, nonsensical output based on low-quality data to make our critical business decisions?


ECCMA’s 11th Annual Data Quality Conference Oct 12-14, 2010

Friday, April 23rd, 2010

Whether you are new to data quality or a seasoned professional, this conference will provide you with a unique opportunity to discuss the latest trends, technologies and software available to the data quality industry. You’ll experience top-level speakers discussing how to manage, catalog, clean and standardize your data. It will introduce you to the international standard for data quality, ISO 8000-110. An exhibition will showcase the latest data quality software from companies not only in the U.S. but from all over the world.

PROGRAM OF EVENTS  

Tuesday October 12, 2010

  • Pre-conference ISO 8000-110:2009 Master Data Quality Certification Workshop 
  • Welcome Reception (includes open bar and hors d’oeuvres)

Wednesday October 13, 2010

  • Opening Address
    Overview: The critical need to maintain the quality of master data.
  • Panel Presentations
    Fundamental updates on the progress of the practical application of the eOTD (ECCMA Open Technical Dictionary), ISO 22745 and ISO 8000 for the collection, validation and distribution of master data in support of the procurement of goods and services, as well as inventory and asset management initiatives. The panels will address the importance of using the standards to define and manage data requirements, as well as the latest trends in spend analysis, cataloguing at source, and data cleansing and rendering.
  • Exhibition
    A unique opportunity to see the latest offerings from leading data service and software application providers.
  • Annual Awards Dinner
    Celebrate and share achievements with colleagues and friends.

Thursday October 14, 2010

  • Workshops
    Workshops will cover new technology and practical examples of vendor-specific data quality application software and data cleaning services.

*Content subject to change.

Data Cleansing to Achieve Information Quality

Wednesday, March 10th, 2010

Those of us who work around or manage the day-to-day operations of MDM, data governance or data cleansing projects understand the challenges and effort needed to transform “raw” data through multiple stages of analytics and processes to achieve information quality for use in our customers’ CRM, CMMS, PIM and ERP systems. An un-cleansed product record can keep a production line offline because an inventory item wasn’t ordered due to incomplete information, add inventory cost when an incorrect item is ordered (we can be talking about a $10,000 motor), or create multiple entries and setups in the material master due to data duplication.

Data vs. information: to simplify the concept, data is managed by a combination of a team of analysts and software to achieve the goal of a cleansed record, or usable information. Data is imported, profiled, classified, structured, verified, enriched and translated, and reports are generated; we create usable information from low-quality data for use in decision making related to engineering, purchasing, maintenance, marketing, sales, etc. The data exported into client systems is information that meets a predetermined set of data governance rules and information quality requirements.
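The stage-by-stage flow can be sketched as a small pipeline in which each stage either returns the (possibly updated) record or raises an error, and failed records go to an exception list instead of being exported. The stages and rules shown (`profile`, `classify`, `verify`) are hypothetical stand-ins for a few of the stages named above, not any product’s workflow engine.

```python
def profile(rec):
    """Trim stray whitespace from every text value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}


def classify(rec):
    # Hypothetical rule: every record needs a class name before it moves on.
    if not rec.get("class_name"):
        raise ValueError("unclassified")
    return rec


def verify(rec):
    # Hypothetical rule: the record must have been verified at source.
    if not rec.get("verified_by"):
        raise ValueError("not verified at source")
    return rec


def run_pipeline(records, stages):
    """Run each record through the stages; failures become exceptions, not exports."""
    exported, exceptions = [], []
    for rec in records:
        try:
            for stage in stages:
                rec = stage(rec)
        except ValueError as err:
            # Record which item failed, at which stage, and why, for the analyst queue.
            exceptions.append((rec.get("item_id"), stage.__name__, str(err)))
        else:
            exported.append(rec)
    return exported, exceptions
```

Usage: `exported, exceptions = run_pipeline(records, [profile, classify, verify])`; only `exported` would be sent to client systems.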

Data quality experts, let’s have a discussion on the definitions of data quality. Does an address or a product detail meet the requirement if it is only classified? Or should verification at source (the contact for an address, or the manufacturer / supplier for a product) be required at initial setup of the data in the system, or scheduled as maintenance under the data governance program? Is the data incomplete? Does the MDM process include a question / answer scenario to complete the data?

MDM software designers and developers, can we also have a discussion on how easily the software manages the stages of data cleansing in support of an MDM philosophy, and on using advanced techniques to automate that management and add intelligence to data imports, workflows and the data cleansing stages of classifying, profiling, matching, translation, data audit analytics, exception reports and status reporting of a data record?

I believe these are great discussion points and will serve as great blog topics.


It Is Not So Easy to Build a Data Cleansing Logic

Tuesday, March 2nd, 2010

During my morning reading on data quality, MDM and data cleansing, I happened upon this on a help site, along with the million-dollar question:

I have a scenario to build a data flow task for Data Cleansing.

Logic 1 to be build:
Source data would be like 1050 and I should convert it to 1.050
Source data would be like 085 and I should convert it to 0.85

Profiling, structuring or normalizing data without any referential information risks errors in business use, especially if the data is used for purchasing or maintenance. If the goal is to automate data normalization, the data needs to be referenced to metadata. Is 1050 a part number? Or a quantity? It could be an attribute representing a measurement such as length or diameter. Is it in inches, feet or meters?
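To make the point concrete, here is a minimal sketch in which the same raw string is normalized differently depending on field metadata. The field names, the “implied decimals” convention and the units are assumptions for illustration; the key behavior is that a field without metadata is refused rather than guessed.

```python
from decimal import Decimal

# Hypothetical field metadata: what each source column actually means.
FIELD_METADATA = {
    "PART_NO": {"kind": "identifier"},
    "QTY": {"kind": "quantity"},
    "BORE_DIA": {"kind": "measurement", "implied_decimals": 3, "unit": "in"},
    "WALL_THK": {"kind": "measurement", "implied_decimals": 2, "unit": "mm"},
}


def normalize_value(field: str, raw: str):
    """Normalize a raw value using its field metadata; never guess without it."""
    meta = FIELD_METADATA.get(field)
    if meta is None:
        raise LookupError(f"no metadata for {field!r}: refusing to guess")
    if meta["kind"] == "identifier":
        return raw  # part numbers are kept verbatim, leading zeros included
    if meta["kind"] == "quantity":
        return int(raw)
    # Measurement stored with an implied decimal point, e.g. "1050" -> 1.050
    value = Decimal(raw).scaleb(-meta["implied_decimals"])
    return value, meta["unit"]
```

With this metadata, "1050" becomes 1.050 in only for the bore diameter field, while the same string in the part number field stays "1050".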


Open Letter to Gartner

Thursday, February 4th, 2010

Dear Andrew White,

Thank you for your comments in “Something beyond MDM is coming your way – would MDM 2.0 fly?” and for starting the discussion to expand the definition of MDM to include data integrity, data quality, entity resolution, matching, data integration, governance, metrics and analysis. The topics discussed should also include workflow (management of data and analysts), translation management, data structuring, data profiling, duplicate removal, data change management, verification contact management, etc.

The MDM and PIM software industry needs to take a step back to understand the actual day-to-day business requirements of data management to achieve Master Data Quality. Lesson one is that data is created and supplied by many sources in many different formats at various quality levels. Data is created by engineering and submitted by integrators, manufacturers and suppliers. To add to the complexity of the information flow, data is introduced into business systems by different departments (engineering, purchasing or perhaps plant maintenance), each with different data requirements to meet the needs of its job function. The next dynamic is merging new data with existing legacy data in a number of systems while ensuring no duplicates are created and managing obsolete / recommended-use items and functional equivalents. The old philosophy of PIM or MDM software as something to “hold data, provide search functionality and maybe a shopping cart” isn’t going to meet the true requirements of the new definitions of Master Data Management.

To meet the new definitions, MDM or PIM software needs to provide the horsepower to process data electronically and intelligently, identifying exceptions for manual intervention by an analyst. Data should be processed once, so that the record is enriched to meet the requirements of the enterprise, and then the record is moved into a maintenance program (also managed by the MDM or PIM software). The processing of data needs to be efficient and cost-effective; from my perspective, the cost of data management should be covered by the cost savings achieved through MDM.
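A sketch of the “process once, route exceptions” idea, assuming incoming records are matched to legacy items on a normalized manufacturer name plus part number. The matching key and required fields are hypothetical simplifications; real entity resolution would also consider functional equivalents and fuzzier matches.

```python
import re


def _clean(text):
    """Uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def match_key(manufacturer, part_number):
    """Hypothetical match key: normalized manufacturer name plus part number."""
    return _clean(manufacturer), _clean(part_number)


def process_incoming(incoming, legacy_index,
                     required=("manufacturer", "part_number", "class_name")):
    """Route records: link to an existing item, create a new one, or send to an analyst.

    legacy_index maps match_key(...) -> existing item ID.
    """
    linked, created, exceptions = [], [], []
    seen_in_batch = set()
    for rec in incoming:
        if any(not rec.get(f) for f in required):
            exceptions.append((rec, "incomplete"))
            continue
        key = match_key(rec["manufacturer"], rec["part_number"])
        if key in legacy_index:
            linked.append((rec, legacy_index[key]))
        elif key in seen_in_batch:
            exceptions.append((rec, "duplicate in batch"))
        else:
            seen_in_batch.add(key)
            created.append(rec)
    return linked, created, exceptions
```

Only the `exceptions` list needs an analyst; linked and created records flow through automatically.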

I look forward to the discussions as the definition of MDM is expanded to include data quality, data governance and data provenance, and as the software industry provides the intelligence, functionality and business processes to cleanse, enrich and manage data for my clients, ensuring their ability to make confident business decisions based on data integrity and accuracy.

Here is to the future of PIM and MDM!

Jackie Roberts


New Data Management System Implementation Common Sense

Friday, January 8th, 2010

With the ever-increasing emphasis on finding ways to reduce cost, one of the clear targets is IT, and more specifically data management systems. On the surface it can seem like there is real fat to trim, and many times this is true. But it is easy to become lost in the details and eliminate or negate some of the potential savings. Some of these ideas may seem obvious but are often forgotten; the evidence is clear in the missed deadlines and budget overruns we see.

If we’re talking about a large company, then this new system inevitably comes with a monolithic project involving whole organizations of people and processes, projects and documentation. The compulsion is to make sure that everyone, everywhere who has any relationship to the system has their input and needs accounted for. Along the way, the cost of implementation and other peripheral indirect costs are likely to negate a great deal of any short-term savings, not to mention the potential increase in ongoing maintenance costs and loss of performance. Here are a few things I’ve learned from experience, and I welcome yours.

When planning development or evaluating software to purchase, always have one specific objective that overrides all others. Start with something like a mission statement: “We need this new system for….”

Determine the Real Needs. Try to separate the “must haves” from the “nice to haves”. Bells and whistles are great but there needs to be a true benefit. Seek a balance between development time, software performance, hardware performance and user experience. I always try to put special emphasis on the user group which stands to benefit the most. Having many users who can do their job faster and more efficiently can add up to real savings versus the few users who have a special need which bogs down the project and performance.

Change is inevitable. If requests for additional features come along, evaluate them against the mission objective. There is nothing wrong with listening to and investigating ideas for project add-ons as long as the benefits outweigh the costs in time and money, but there needs to be a limit or you’ll never complete the project. Good ideas can always be implemented later if that makes sense, and then you’ll have the benefit of the research already done, but be quick with the research. Evaluate the impact of doing it now versus waiting. Here are some good questions to start with: 1) How much more money? 2) Would it be faster/cheaper for programming to do it now rather than wait and do a more complicated enhancement later? 3) Is the impact on the users great enough to warrant it?

Know the roles. Good ideas can come from anyone. Every project must have a project champion who makes the final decisions (and lives with them) and eliminates roadblocks. You need a user advocate who has done the job and knows what it takes. Have programmers who possess both talent and vision, not just code crunchers, and listen to them.

Have good documentation, and “good” is subject to interpretation. This is another area where the KISS principle is very often ignored. If you have to hire ten people to sit in meetings just to maintain your documentation, you’ve probably overcomplicated it and certainly increased your project cost. I try to start with these principles:

  1. Document the people on the project and their responsibilities. Let there be no question as to who does what.
  2. Everyone who has a job to do needs to understand what they need to do and have the documentation to reference.
  3. Keep the language simple. Focus on getting the point across. If it takes a rocket scientist to understand it you’ve failed.
  4. Of course, document the issues, decisions made, by whom etc. but be sensible. Document enough to cover for the “he said/she said” but content is most important. No bonus points for flash.
  5. Know who is supposed to have what done and when. Another obvious one, but too often I see target dates determined top down with little or no thought to cost or the tasks involved. Don’t let the tail wag the dog. Pushing hard to get the job done is fine, but be realistic. Listen to the people who know before making bold predictions.

Data Quality: Classifying and Describing

Wednesday, December 2nd, 2009

As the Master Data Management industry matures, its focus is not only on developing software to collect product records but also on software that implements data quality process solutions supporting data governance and provenance, including record history, structure, completeness and accuracy, so that our customers can make confident, informed business decisions based on accurate data. The first step in implementing a data governance program is implementing a naming and classification system.

I have experience working with single-business home-grown classification structures and with third-party structures available for purchase; currently I have chosen an open and public classification structure provided by ECCMA (www.eccma.org). This benefits the customers I support by ensuring that they will always have access to the classification structure, sometimes referred to as the schema, used to classify their data.

Implementing a classification requires setting up an Identification Guide (IG), which establishes the template definition to technically describe the product or service with enough information to support engineering, maintenance or purchasing, while recognizing the character-length limits of the software’s short and long descriptions. The IG template standardizes and simplifies the information requests our analysts send to manufacturers and suppliers to verify the data and standardize the description.

To create an IG, we search the ECCMA class list; fortunately many of the classes are established. As the IG is set up we will use the ECCMA established class name convention; this will ensure that every item will be setup with the same name and format, every ball bearing item submitted will be classified as a BEARING, BALL.

The next step is to set up the properties required to describe the BEARING, BALL and, for each property, designate the data type requirements such as numeric, text string or designated unit of measure. The properties for a BEARING, BALL might include TYPE, BORE DIAMETER, OUTSIDE DIAMETER, WIDTH, DYNAMIC LOAD CAPACITY, STATIC LOAD CAPACITY, MATERIAL and so forth. Our analysts verify the data with the original manufacturer, sometimes using XML to exchange the product information, which is referred to as “Cataloging at Source”. Because the information requests are standardized, they remove many of the quality issues commonly found in a non-standardized data verification or description process.
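An IG can be represented as a list of properties with their data types and units, and incoming values validated against it. The sketch below uses the BEARING, BALL properties listed above; the data types and units assigned to each property (millimetres, kilonewtons) are assumptions for illustration, not ECCMA’s definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    name: str
    data_type: str              # "numeric" or "text"
    unit: Optional[str] = None  # required unit of measure for numeric properties


# Property names from the text; data types and units are assumptions.
BEARING_BALL_IG = [
    Property("TYPE", "text"),
    Property("BORE DIAMETER", "numeric", "mm"),
    Property("OUTSIDE DIAMETER", "numeric", "mm"),
    Property("WIDTH", "numeric", "mm"),
    Property("DYNAMIC LOAD CAPACITY", "numeric", "kN"),
    Property("STATIC LOAD CAPACITY", "numeric", "kN"),
    Property("MATERIAL", "text"),
]


def validate(values, ig):
    """Check a dict of property values against an IG; numeric values are (number, unit)."""
    errors = []
    for prop in ig:
        value = values.get(prop.name)
        if value is None:
            errors.append(f"{prop.name}: missing")
        elif prop.data_type == "numeric":
            if not (isinstance(value, tuple) and len(value) == 2
                    and isinstance(value[0], (int, float))):
                errors.append(f"{prop.name}: expected (number, unit)")
            elif value[1] != prop.unit:
                errors.append(f"{prop.name}: unit must be {prop.unit}")
        elif not (isinstance(value, str) and value.strip()):
            errors.append(f"{prop.name}: expected text")
    return errors
```

Each error names the property, which maps directly onto the standardized information request sent back to the manufacturer or supplier.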

The description is built from the property values in the order set by each property’s sequence number. Because item data has to fit into a length-restricted description field, we place the most important information at the beginning of the auto-generated description.
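A minimal sketch of a sequence-driven description build, assuming a hypothetical 40-character limit and property values already formatted as short strings. Values are appended in sequence order, and once the next value would overflow the field, it and everything after it are dropped, so the most important information always survives.

```python
def build_description(class_name, values, sequence, max_len=40):
    """Build "CLASS, value, value, ..." in sequence order within max_len characters.

    sequence maps property name -> sequence number (1 = most important).
    Once the next value would overflow, it and all later values are dropped.
    """
    parts = [class_name]
    for prop in sorted(sequence, key=sequence.get):
        value = values.get(prop)
        if not value:
            continue  # skip properties with no value for this item
        if len(", ".join(parts + [value])) > max_len:
            break
        parts.append(value)
    return ", ".join(parts)[:max_len]
```

For example, with sequence TYPE=1, BORE DIAMETER=2, OUTSIDE DIAMETER=3, a 40-character limit yields “BEARING, BALL, DEEP GROOVE, 20MM ID”, while a longer field keeps the outside diameter and later values too.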

Setting up the Identification Guides requires upfront strategic planning and detailed work; as you can imagine, a classification schema can contain up to 10,000 classes depending on the industry. But it provides a multitude of benefits, including standardized requirements, a road map for our analysts to facilitate the process, improved data management reporting / metrics and enhanced language translation for the global organization.
