When I think about how best to protect the nation’s infrastructure, two broad categories of national security come to mind: physical security and electronic security.
Physical protection of our nation’s infrastructure happens at many levels. At the national level, the United States armed forces protect our land, sea, and airspace borders. At the state level, protection comes from state police forces, state highway patrols, and National Guard forces. At the local level, it is handled by every local police force, along with civilian community efforts to monitor suspicious activity around neighborhoods, more commonly known as neighborhood watches.
These amount to hundreds of thousands of human resources that need to be analyzed, managed, and deployed properly to ensure that the nation’s entire critical “brick and mortar” infrastructure is adequately protected, and that those individuals are given the proper tools to execute their assigned tasks. The question becomes: how do we effectively inventory all of the nation’s human and physical resources in order to facilitate real-time assessments of our ability to protect our critical infrastructure?
That question presents several seemingly insurmountable problems. There are potentially tens of thousands of local sources of data, thousands of state sources, and thousands of federal sources, in addition to thousands of data sources held behind the firewalls and under the control of privately and publicly owned companies, as is the case with major power distribution and communication corporations. To make matters worse, there could be hundreds of different software vendors supporting the collection, storage, maintenance, and reporting of this information. Each software package has a unique and often proprietary data model, and most likely a unique and proprietary metadata schema used to tag the meaning of each database field and the values within those fields. Without a consistent and easily discernible database schema from each data source, integrating this mass of data in a way that yields information trustworthy enough for life-or-death decisions becomes an impossible task.
One potential solution to this problem is a combined approach that would include a National Master Data Management initiative. Its ultimate goal would be a level of data quality sufficient to derive accurate infrastructure intelligence: information used to assess how vulnerable any given infrastructure asset is to attack, and the potential damage and casualties if an attack were to occur at a location such as one of the bridges that carry large amounts of cargo across the Mississippi. Not just any data will do when allocating resources to secure and protect such important national assets; the data needs to be quality data. That means data with a profiled and standardized vocabulary, a consistent syntax or format, a known provenance or source, and demonstrated accuracy and completeness. Data that does not meet minimum requirements on these metrics cannot be trusted to support such decisions.
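As a rough illustration of what checking a record against such metrics could look like, here is a minimal Python sketch. The field names, the list of asset types, and the state-code rule are all assumptions invented for this example, not part of any real schema. It covers completeness, vocabulary, and syntax, and treats a missing source system as a provenance failure; accuracy is left out because it cannot be verified without a trusted reference to compare against.

```python
import re

# Hypothetical required fields for an asset record (assumed for illustration).
REQUIRED_FIELDS = ("asset_id", "asset_type", "state", "source_system")

# Hypothetical controlled vocabulary for the asset_type field.
ASSET_TYPES = {"bridge", "power_substation", "rail_line", "water_source"}

# Syntax rule assumed here: a state is a two-letter uppercase code.
STATE_PATTERN = re.compile(r"^[A-Z]{2}$")


def quality_issues(record: dict) -> list[str]:
    """Return a list of data quality problems found in one asset record."""
    issues = []

    # Completeness (and provenance, via source_system): every required
    # field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing {field}")

    # Vocabulary: asset_type must come from the agreed list of terms.
    asset_type = record.get("asset_type")
    if asset_type and asset_type not in ASSET_TYPES:
        issues.append(f"unknown asset_type {asset_type!r}")

    # Syntax: state must match the agreed format.
    state = record.get("state")
    if state and not STATE_PATTERN.match(state):
        issues.append(f"malformed state {state!r}")

    return issues
```

A record with an empty result list passes these minimum checks; anything else would be held back from analysis until it is cleansed.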
My proposal for a National Master Data Management Program would, at its most basic, include the creation of a national mashup of these thousands of data sources into one Infrastructure Security Data Warehouse, which would be used to govern, analyze, and report on the readiness of the nation’s infrastructure if an attack were to occur.
A mashup is a relatively new term for a database application that pulls information from tens to thousands of data sources and integrates it so that it can be analyzed over and over again. At the same time, it would not require replacing the thousands of legacy systems that local, state, federal, and private entities use to run their day-to-day operations.
I have oversimplified the problem and the solution in this explanation. The actual solution is very technical and requires professionals who have managed data integration, data cleansing, data governance, and master data management programs in the past. An initiative like the one proposed here is not a short-term project with a defined beginning and end date; in reality, the project began more than 30 years ago with the collection of electronic data. A master data management program for something as critical as the United States infrastructure is not simply a program; it is a complete change in our thinking and in the way we interact with the data we spend so much time, money, and resources archiving and cataloging. It is a cultural change.
More technically speaking, the idea is to create a classified open technical dictionary containing all the terms and definitions needed to describe, at the most atomic level, every data field we require to generate the information needed to assess and prioritize potential infrastructure targets. We will then tag, or associate, one of the classified numbers and terms with every piece of asset data (Master Data) deemed necessary to implement the infrastructure protection strategy. We will use a combination of publicly available metadata and newly created metadata to tag all the data elements. This allows us to store and report on them quickly and accurately from a central location with a centralized team, resulting in a true Infrastructure Intelligence Master Data Program. Once all the target data is in a standard format, we can further national security goals: raising the quality of information and reporting, increasing interaction among local, state, and federal authorities, and providing threat advisories to citizens and the law enforcement community. The program would also provide data that can be used in a variety of training exercises and computer simulations of potential attacks.
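The tagging idea can be sketched in a few lines of Python. Everything named here is hypothetical: the concept identifiers (C001 and so on), the dictionary terms, and the two source systems with their field names are made up to show the mechanism, under the assumption that each source keeps its own schema and only a mapping to the shared dictionary is maintained centrally.

```python
# Hypothetical dictionary: concept identifier -> agreed term.
DICTIONARY = {
    "C001": "asset name",
    "C002": "asset type",
    "C003": "population served",
}

# Hypothetical per-source tags: each system's own field names mapped to
# concept identifiers. The legacy systems themselves are left unchanged.
FIELD_TAGS = {
    "county_gis": {"NAME": "C001", "KIND": "C002", "POP": "C003"},
    "utility_scada": {"facility": "C001", "class": "C002", "customers": "C003"},
}


def tag_record(source: str, record: dict) -> dict:
    """Re-key one source record by concept identifier and keep its provenance."""
    tags = FIELD_TAGS[source]
    # Fields with no tag in the dictionary are dropped rather than guessed at.
    tagged = {tags[field]: value for field, value in record.items() if field in tags}
    tagged["source"] = source
    return tagged


def describe(tagged: dict) -> dict:
    """Swap concept identifiers for dictionary terms, e.g. for a report."""
    return {DICTIONARY.get(key, key): value for key, value in tagged.items()}
```

Once records from both systems are keyed by the same identifiers, they can be stored and compared side by side even though the original field names had nothing in common.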
Core to the proposed data quality/master data management program are the business processes used to carry out data cleansing and enrichment. The methods used NEED to be vetted and thoroughly tested with large datasets and with algorithms comparable in complexity to those that will need to be developed to answer questions such as: How many people will be affected if a particular power substation is attacked, physically or through a cyber attack? Which railroad infrastructure is key to maintaining the flow of military supplies to ports for distribution to our forces worldwide? Once a threat is identified, how long will it take to mobilize enough manpower to properly defend any given location? Which power generation facilities are large enough that damage to them would result in a blackout for 5% or more of the population? Which open-air water sources, if poisoned, would cause the most deaths? The processes and methods used to execute the project, in its initial phase and on an ongoing basis, are just as important as the quality of the initial data sets, if not MORE so; the two are mutually dependent.
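To make one of these questions concrete, the power generation question reduces to a simple filter once clean, standardized data exists. The Python sketch below is deliberately naive: the Facility type and its fields are hypothetical, and it assumes each facility serves its own separate population with no rerouting of power from elsewhere on the grid, which a real model would have to account for.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str               # hypothetical facility name
    population_served: int  # people assumed to lose power if it goes offline


def critical_facilities(facilities, total_population, threshold=0.05):
    """Return facilities whose loss would black out at least `threshold`
    (5% by default) of the total population."""
    return [
        f for f in facilities
        if f.population_served / total_population >= threshold
    ]
```

The filter itself is trivial; the answer is only as good as the population figures feeding it, which is why the cleansing and enrichment processes matter so much.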
Implementing an ongoing and permanent national Infrastructure Master Data Management initiative would be of great benefit, and could be essential, to the long-term growth and protection of the United States of America.
