Archive for June, 2010

Data Quality: An essential tool to facilitate national security?

Monday, June 28th, 2010

When I think of how best to protect the nation’s infrastructure, two broad categories of national security come to mind: physical security and electronic security.

Physical protection of our nation’s infrastructure happens at many levels. At the national level, the United States armed forces protect our land, sea and airspace borders. At the state level, protection comes from state police forces, state highway patrols and the National Guard. At the local level, it is handled by each local police force, in addition to civilian community efforts to monitor suspicious activity around neighborhoods, more commonly known as neighborhood watches.

This amounts to hundreds of thousands of human resources that need to be analyzed, managed and deployed properly to ensure that the nation’s entire critical “brick and mortar” infrastructure is adequately protected and that those individuals are given the proper tools to execute their assigned tasks. The question becomes: how do we effectively inventory all of the nation’s human and physical resources in order to facilitate real-time assessments of our ability to protect our critical infrastructure?

That question presents several seemingly insurmountable problems. There are potentially tens of thousands of local-level sources of data, thousands of sources of state-level data and thousands of sources of federal data, in addition to thousands of data sources held within the firewalls and under the control of private and publicly owned companies, as is the case with major power distribution and communication corporations. To make matters worse, there could be hundreds of different software vendors supporting the collection, storage, maintenance and reporting of this information. Each software package has a unique and often proprietary data model and most likely a unique and proprietary metadata schema used to tag the meaning of each database field and the values within those fields. Without a consistent and easily discernible database schema from each data source, it becomes impossible to integrate the mass of data in a way that lets the resulting information be trusted for decisions in what amount to life or death situations.

One potential solution to this problem is a combination approach that would include a National Master Data Management initiative. Its ultimate goal would be a level of data quality sufficient to accurately derive infrastructure intelligence: information used to assess how vulnerable any given infrastructure asset is to attack, and the potential damage and casualties if an attack were to occur at a location such as one of the bridges that carry large amounts of cargo across the Mississippi. Not just any data will do when allocating resources to secure and protect such important national assets; the data needs to be quality data. Quality here covers a profiled and standardized vocabulary, the syntax or format of the data, the provenance or source of the data, and the accuracy and completeness of the data. Data that does not meet minimum requirements for the metrics listed above cannot be relied on to support those decisions.
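As a rough illustration, the quality dimensions named above (vocabulary, format, provenance, completeness) could be checked record by record along these lines. This is only a minimal sketch in Python; the field names, the asset-type vocabulary and the two-letter state rule are hypothetical stand-ins, not part of any existing program.

```python
from typing import Dict, List

# Hypothetical minimum fields; "source" stands in for provenance.
REQUIRED_FIELDS = ("asset_id", "asset_type", "state", "source")
# Hypothetical standardized vocabulary for asset types.
ALLOWED_ASSET_TYPES = {"BRIDGE", "SUBSTATION", "RAIL_LINE"}


def quality_issues(record: Dict[str, str]) -> List[str]:
    """Return a list of data quality problems found in one asset record."""
    issues = []

    # Completeness (and provenance): every required field must be populated.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            issues.append(f"missing {field}")

    # Vocabulary: the asset type must come from the standardized list.
    asset_type = record.get("asset_type")
    if asset_type and asset_type not in ALLOWED_ASSET_TYPES:
        issues.append(f"asset_type not in vocabulary: {asset_type}")

    # Syntax: the state must be a two-letter upper-case code.
    state = record.get("state")
    if state and not (len(state) == 2 and state.isalpha() and state.isupper()):
        issues.append(f"state not in two-letter format: {state}")

    return issues
```

A record that returns an empty list meets the minimum; anything else would be held back for cleansing. Accuracy is deliberately left out, since it can only be checked against a trusted reference.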

My proposal for a National Master Data Management Program would, in its most basic description, include the creation of a national mashup of these thousands of data sources into one Infrastructure Security Data Warehouse that would be used to govern, analyze and report on the readiness of the nation’s infrastructure if an attack were to occur.

A mashup is a relatively new term for a database application that pulls information from tens to thousands of data sources and integrates the data so it can be analyzed over and over again. At the same time, it would not require replacing the thousands of legacy systems that local, state, federal and private entities use to run their day-to-day operations.

I have oversimplified the problem and solution in this explanation. The actual solution is very technical and requires professionals who have managed data integration, data cleansing, data governance and master data management programs in the past. An initiative like the one proposed here is not a short-term project with a defined beginning and end date; in reality the project began more than 30 years ago with the collection of electronic data. A master data management program for something as critical as the United States infrastructure is not simply a program; it is a complete change in our thinking and in the way we interact with the data we spend so much time, money and resources archiving and cataloging. It is a cultural change.

More technically speaking, the idea is to create a classified open technical dictionary containing all the terms and definitions needed to describe, at their most atomic level, every single data field we require to generate the information needed to assess and prioritize potential infrastructure targets. We will then tag, or associate, one of the classified numbers and terms with every single piece of asset data (Master Data) determined necessary to implement the infrastructure protection strategy. We will use a combination of publicly available metadata and newly created metadata to tag all the data elements. This allows us to store and report on them quickly and accurately from a central location with a centralized team, resulting in a true Infrastructure Intelligence Master Data Program. Once all the target data is in a standard format, we can further the national security goals of raising the quality of information and reporting, increasing the level of interaction between local, state and federal authorities, and providing threat advisories to citizens and the law enforcement community. The same data can also be used in a variety of training exercises and computer simulations of potential attacks.
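The tagging step can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the concept numbers (`C-0001`, `C-0002`), the source system names and the field names are hypothetical, and a real dictionary would hold far more terms.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary entries: concept number -> agreed term.
DICTIONARY = {
    "C-0001": "asset name",
    "C-0002": "state code",
}

# Hypothetical tags: (source system, local field name) -> concept number.
FIELD_TAGS = {
    ("state_dot", "structure_name"): "C-0001",
    ("state_dot", "state_code"): "C-0002",
    ("county_gis", "BRIDGE_NM"): "C-0001",
    ("county_gis", "ST"): "C-0002",
}


def tag_row(source: str, row: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Re-key one source row by concept number; report fields with no tag."""
    tagged = {}
    untagged = []
    for field, value in row.items():
        concept = FIELD_TAGS.get((source, field))
        if concept is None:
            untagged.append(field)  # needs a new dictionary entry or a mapping
        else:
            tagged[concept] = value
    return tagged, untagged
```

Because rows from different systems end up keyed by the same concept numbers, they can be stored and reported on together, while the list of untagged fields shows where the dictionary still has gaps.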

Core to the proposed data quality/master data management program are the business processes used to carry out data cleansing and enrichment. The methods used NEED to be vetted and thoroughly tested with large datasets and with algorithms comparable in complexity to those that will need to be developed to answer questions such as: How many people will be affected if a particular power substation suffers a physical or cyber attack? Which railroad infrastructure is key to keeping military supplies flowing to ports for distribution to our forces worldwide? Once a threat is identified, how long will it take to mobilize enough manpower to properly defend any given location? Which power generation facilities are large enough that damage to them would result in a blackout for 5% or more of the population? Which open-air water sources, if poisoned, would cause the most deaths? The processes and methods used to execute the project in its initial phase and on an ongoing basis are just as important as, if not MORE important than, the quality of the initial data sets; the two are mutually dependent.

The implementation of an ongoing and permanent national Infrastructure Master Data Management initiative would be of great benefit and could be essential to the long-term growth and protection of the United States of America.

Hey baby, what is your material type and material status . . .

Tuesday, June 15th, 2010

You would never believe the discussions around the “ho-hum” or “don’t sweat the small details” elements of a data cleansing project. Believe it or not, understanding your material type and material status is critical to automating system updates. I firmly believe that data updates to legacy systems should be completed as a night job or direct feed based on a series of programmed templates. In one recent example we created an Oracle system update process for a new item that references a material type template, and a separate update process for an item that is already set up for another location of use but is new to the requesting location; the latter is sometimes referred to as a location setup or purchasing organization update. You can start to imagine the amount of pre-planning work and data mapping that is required for a data cleansing program.
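The choice between the two update processes can be sketched in a few lines of Python. This is an illustration only, not the Oracle process itself: the function name, the process labels and the shape of the lookup table are all hypothetical.

```python
from typing import Dict, Set


def choose_update_process(
    item_id: str,
    requesting_location: str,
    locations_by_item: Dict[str, Set[str]],
) -> str:
    """Pick which templated update applies to a setup request."""
    known_locations = locations_by_item.get(item_id)
    if known_locations is None:
        # The item does not exist anywhere yet: full setup from the
        # material type template.
        return "NEW_ITEM_SETUP"
    if requesting_location not in known_locations:
        # The item exists elsewhere but is new to this location.
        return "LOCATION_SETUP"
    # Already set up for the requesting location.
    return "NO_UPDATE"
```

A nightly job could run each pending request through a function like this and queue the matching template, which is where the pre-planning and data mapping pay off.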

The first fundamental rule is that the customer’s business doesn’t stop. For all you data purists out there who believe that one day soon a switch will turn on the cleansed database, please include me; I would like to see it. Most master data management projects include years and years of legacy data; therefore it is accepted practice to draw a line in the database by last-used date. When I design a data cleansing project, I have a new item setup process referenced to legacy items. This way the client’s business continues, and as the new items are analyzed and set up, we can reference and update the legacy item information. Independently, the legacy data cleansing always runs in parallel with the new setup process.

As the data cleansing project is designed, let’s start to explore the data elements and classifications. Every client will have their material types and material statuses set up, but during the data / systems assessment there should generally be a thorough review of industry standards vs. company processes. I find that our clients appreciate the opportunity to benchmark their processes and data structure elements such as material type and status. We will start with material type and material status.

Material Type

Material types can be as simple as goods and services or as complicated as service, critical spare, spare part, commodity, generic, blueprint, etc. The material type is the critical element that determines which template is used for setup in the downstream legacy systems, with an inventory stocking strategy applied.
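To make the link between material type and setup template concrete, here is a minimal Python sketch. The type names come from the list above, but the template contents (which fields each setup needs and whether the item is stocked) are hypothetical examples rather than an industry standard.

```python
from enum import Enum


class MaterialType(Enum):
    SERVICE = "service"
    CRITICAL_SPARE = "critical spare"
    SPARE_PART = "spare part"
    COMMODITY = "commodity"


# Hypothetical setup templates, one per material type.
SETUP_TEMPLATES = {
    MaterialType.SERVICE: {
        "stocked": False,
        "fields": ["class", "properties", "standard_cost"],
    },
    MaterialType.CRITICAL_SPARE: {
        "stocked": True,
        "fields": ["unit_of_measure", "min", "max", "stocking_location"],
    },
    MaterialType.SPARE_PART: {
        "stocked": True,
        "fields": ["unit_of_measure", "min", "max"],
    },
    MaterialType.COMMODITY: {
        "stocked": True,
        "fields": ["unit_of_measure"],
    },
}


def template_for(material_type_name: str) -> dict:
    """Look up the setup template; an unknown type raises ValueError."""
    material_type = MaterialType(material_type_name.strip().lower())
    return SETUP_TEMPLATES[material_type]
```

Rejecting unknown type names outright, rather than falling back to a default template, keeps unclassified items from slipping into the downstream systems.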

Obviously a service can be standardized by the class that describes it, which in turn allows its cost to be standardized. The service is defined by its properties; for instance, a service class of CLEANING, OFFICE can be set up with descriptive elements such as 10,000 square feet, light cleaning (dusting / vacuuming), etc. From a purchasing perspective, the buyer can run reports globally to determine how much is spent on office cleaning, then evaluate the costs and use best-practice sourcing strategies and other global supply chain processes to lower them. The purpose of standard naming conventions for classes and properties is to provide enough standardized information to compare and cost services or products.

If a critical spare is being set up for sourcing and inventory, then the part has been evaluated by maintenance or engineering and determined to be critical for production uptime. An inventory plan is developed for stocking the critical spare, including an initial buy quantity and a plan for stores (inventory) setup of the item’s unit of measure (each, assembly, package, etc.), min / max, reorder quantity, stocking location, etc.
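For readers unfamiliar with min / max stocking, the arithmetic can be sketched as follows. This assumes one common convention (reorder when stock falls to the minimum or below, and order back up to the maximum); actual plans vary by site and system, and the function name is hypothetical.

```python
def quantity_to_order(on_hand: int, minimum: int, maximum: int) -> int:
    """Return how many units to order under a simple min / max plan."""
    if minimum > maximum:
        raise ValueError("minimum cannot exceed maximum")
    if on_hand > minimum:
        return 0  # still above the reorder point
    return maximum - on_hand  # order back up to the maximum
```

For a critical spare with a min of 2 and a max of 6, dropping to 2 on hand would trigger an order of 4.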

Material Status

In addition to applying a “material type” to the item records, a material status should be used as a long-term data maintenance process, because materials used in a manufacturing operation live for a long time. In dealing with component manufacturers and suppliers, a component may be active from a plant-use perspective even though the component manufacturer no longer makes the item. How is that possible? A piece of equipment can have a 10-year or a 50-year life span. To maintain it, a list of recommended spare parts is identified and set up for equipment maintenance. If a spare part has been made obsolete by the manufacturer but the piece of equipment is still in use on the production line, the material status would be “obsolete active”. A different buy / stock strategy would then be implemented, such as purchasing all available stock from the manufacturer, sourcing through unconventional channels such as eBay, or contracting a local shop to build the item.

Typical material statuses that I have experienced are active, inactive item referenced to an active item, obsolete active, obsolete inactive (typically the status that starts the disposal process) and archive. The archive status is a classification used by the analysts: it lets them view the item information, but the item record is not visible to the client and is not exported to the client systems.
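The statuses listed above lend themselves to a small enumeration with the two rules the post mentions: archive records stay out of the client export, and obsolete inactive starts disposal. The Python below is a sketch under those assumptions; the record layout and function names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class MaterialStatus(Enum):
    ACTIVE = "active"
    INACTIVE_REFERENCED = "inactive, referenced to an active item"
    OBSOLETE_ACTIVE = "obsolete active"
    OBSOLETE_INACTIVE = "obsolete inactive"
    ARCHIVE = "archive"


def exportable(items: List[Dict]) -> List[Dict]:
    """Keep only the records that should be sent to the client systems."""
    return [item for item in items if item["status"] is not MaterialStatus.ARCHIVE]


def starts_disposal(status: MaterialStatus) -> bool:
    """Obsolete inactive is typically the status that starts disposal."""
    return status is MaterialStatus.OBSOLETE_INACTIVE
```

Holding the statuses in one enumeration, rather than as free text, is what lets rules like these be automated.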

I would appreciate any input, or better yet a discussion, on the different material types and material statuses used in Product Information Management (PIM) or Master Data Management (MDM). As an industry we inherited the material types and statuses used in purchasing or maintenance systems, which were designed to meet a business function rather than a data quality or master data management need. What are the proper data requirements for a material type or material status? The MDM and PIM software companies and data quality consultants need to provide input from the data management perspective so that long-term data management functionality can be built in.

View Jackie Roberts's profile on LinkedIn

Enterprise Information Management 2010 via DAMA Management International

Friday, June 11th, 2010

Presentation proposals are now being accepted for the second Enterprise Information Management Conference scheduled for September 21-23, 2010 at the Hilton Toronto in Toronto, Canada.

Speaker submission guidelines can be found here: Online Proposal Form.

All questions regarding speaking may be directed to Wilshire Conferences at maya@wilshireconferences.com. The deadline for submitting your proposal is June 4, 2010, and we anticipate being able to notify accepted speakers by June 14, 2010.

Thanks and we look forward to hearing from you!