The Business Data Foundation for the Digital Enterprise
Published 30 June 2020
Abstract
Organizations are investing heavily in moving many traditional systems and services to digital, becoming what we call the Digital Enterprise. One of the biggest challenges organizations face in this journey is having the right data foundation: consistent, accurate and business-relevant data that is readily accessible. This report looks at the data foundation, starting with business data needs and how to architect, design and build systems to consistently deliver on these accelerating business data requirements.
This report is designed to help organizations build the solid data foundation that is so critical in building and maintaining a Digital Enterprise. Getting the data right is an area that is often ignored or overlooked, but we believe it is one of the most important areas any large enterprise can focus on. There is nothing more foundational than data. Many “data” programs try to collect the horses after they have left the barn with AI/ML/data modeling, but TechVision’s approach puts business data needs first, then focuses on understanding the data, its meaning and business relevance. We then focus on systematically aligning data with business needs.
There are 6 aspects of building this solid and sustainable data foundation we’ll cover in this report. They are:
- Understanding Data
- The Business – Data Connection
- Data Usage
- Data Creation
- Data Within Technology
- Business Data Assets
We’ll conclude with a pragmatic set of short- and long-term enterprise recommendations.
Executive Summary
Data has been one of the most underrepresented areas in IT organizations over the past 30 years. Most enterprise IT executives and Line of Business (LOB) leaders give it lip service, but data ends up simply being a function of collecting and trying to understand the data that applications generate. The current theme is to take all of this aggregated data from multiple sources, dump it into data lakes or other repositories, and subsequently use Artificial Intelligence (AI), Machine Learning (ML) or various tools to extract meaningful, business-relevant information from this trough.
This current model we find in many large enterprises is a misguided and reactive approach that is actually backwards; we should start with the business data needs and build from there. What we’ll describe in this document and follow-on research is a proactive approach to start with business needs, analyze existing and required data and design a model for generating, aggregating and being able to access data that has been architected around the needs of the business.
TechVision Research describes a model in this report that starts with the business and how data supports business goals and builds from there. We describe a 6-step process that has successfully been implemented over the past 30 years in many large organizations in industries ranging from transportation to retail to financial services to government agencies.
This process starts with understanding the nature of data; we often conflate data with the technology that generates data, but data is its own category. Data simply captures and represents characteristics and relationships in the “real world”. To be useful, this data needs to be understood throughout its lifecycle and in its business context, forming what we refer to as the “business-to-data connection”. The right context requires that we first understand the business and then create the data that correctly represents that business, which naturally includes its proper context. We also need to maintain the data’s context and meaning through its computer processing, storage, and usage.
Next, organizations need to understand how this business data will be used; the same data may be repurposed in different ways by various business functions. A holistic business view (Business Blueprint) is used to document the business-data connection and to provide an understanding of how all the various usage views integrate into this holistic view. After understanding usage, organizations will need to define processes and structure for creating new data in ways that maximize business value.
The model we describe here has been successfully implemented in many large, complex organizations. As enterprises move to more and more digital engagement and as more and more data is generated, this is the time most organizations should take a hard look at their data foundation.
Introduction and Background
According to the dictionary, fundamentals are: “central or primary rules and principles on which something is based,” thus assumed to be true. Fundamentals do not change. For example, math fundamentals include the basics of addition, subtraction, multiplication, and division. A lack of knowledge or understanding of one or more of these building blocks makes it almost impossible to perform algebra or understand physics. In any field, the fundamentals must be mastered first and the same is true for data. We need to start this process by understanding that data is a field to be mastered.
The fundamentals of data are the underlying data principles, or beliefs that are often taken for granted and rarely questioned. These principles and beliefs form the foundation for how we view and approach data. As Leonardo da Vinci stated, “Practice should always be based on sound knowledge.” In much the same way, all of our data practices depend on solid data fundamentals. When the fundamentals are broken, everything that depends on them is compromised. This is particularly critical for business data in that the Digital Enterprise is heavily dependent on having a solid, secure and sustainable data foundation.
Based on TechVision’s consulting experience and primary research, we put the “stake in the ground” that many-to-most large organizations have broken data fundamentals or at least have major missing pieces. The six critical fundamentals that are often broken and should be directly addressed in most organizations are:
- Understanding Data
- The Business – Data Connection
- Data Usage
- Data Creation
- Data Within Technology
- Business Data Assets
We’ll now look at each of these areas, assess the current state and describe our recommended going-forward approach to getting these data fundamentals right. We’ll start with the basics of understanding data.
Understanding Data
Data is a field of study of its own, and much like any other field, requires a basic understanding providing a solid foundation that all the various data practices and data usage depend on. For the Field of Data, the basics start with the definition of data. In exploring the concepts and the definition of data as generally embraced in the data industry (and/or within the broader technology space), TechVision believes our collective understanding of data is incomplete and therefore misleading and misaligned with business goals.
Let’s start at the beginning; data existed long before computers. What computer technology brought to the table was the ability to facilitate and automate the collection, processing, and utilization of data. Both the scale and the importance of data continue to grow exponentially as technology evolves, and data needs accelerate as organizations move towards becoming Digital Enterprises.
Figure 1: Recording Data is not New
The “data industry” focuses on data from a technology perspective at its atomic level, and defines data as raw, unorganized, meaningless facts (letters, numbers, or symbols). Data at this level easily translates into a set of 0s and 1s (binary) that computers understand and process. This atomic-level definition focuses on the individual pieces of raw data as seen from a technologist’s viewpoint. This is the data that is used within applications, transmitted across networks, stored in devices, and processed by computers: simply ones and zeros. But this technology view isn’t enough.
While the technology-based definition of data is accurate from an atomic-level viewpoint, it is void of any context or meaning; just a string of numbers. But context is needed to achieve a real understanding of the data—this real-world perspective of what that data means. Data is and always has been a way to represent the real world—its things, events, and relationships. It is important that we start this journey with this concept in mind.
As we view data as representing the real world, we can now think of data as including the interrelationships of things and events. In the real world, almost everything is interconnected and interrelated, thus relevant to its surroundings and its relationship to its surroundings. A fundamental theory of physics actually supports this concept: nothing exists on its own in the real world; everything derives meaning from its relationships. Removing or changing relationships, in turn, changes the meaning. We are seeing this concept on display as organizations identify, secure, integrate and orchestrate IoT devices and sensors; the context and relationships are critical, and representing these relationships is critical to understanding the meaning. For example, in the IoT world, context such as who owns an automobile, who is driving the car, the manufacturer and the location may all be factors in understanding the generation and use of data.
Since the real world is highly complex and relational, capturing the context (including relationships) of the real-world things and events is essential. Along with relationships, time is also a key dimension of the real world and an aspect of the data that represents it. We’ve covered the value of contextual data before, but much of our previous research was more specific to Identity and Access Management and Security.
Data as a representation of the real world has always facilitated human understanding and knowledge. We do this by recording the past to analyze and understand history as input towards predicting the future. This is what history is all about. How we use data to represent the real world provides information and thus knowledge about the real world. The validity and usefulness of the data is dependent on the accurate alignment of the data to the real world it is representing. This point of data/real-world synergy is critical to the understanding of data.
Technologists often view data as random, unorganized facts throughout its lifespan. We believe this technology-based definition is misleading as it is missing context. Consider a simple scenario—a raw piece of data such as “03181956”, inside a computer, is a random set of digits. It could be anything: a date, a dollar amount, or a count. When we give this random set of digits the context of a birthdate for a specific customer, it becomes John Smith’s birthday: 03/18/1956. The raw data deep within technology is unusable to humans without context.
Data as a representation of the real world is naturally in context: the many related data facts about a real-world item or event. Using our previous example of a customer birthday, the random, raw data (03181956) must include the context of customer identification, name, birthdate, and other data facts when it is created. This scenario is true for all data created by an organization. So why would we view data that represents the real world as meaningless, random, unorganized, individual raw facts? That is how technologists often view data, and it needs to be combined with context to have real meaning.
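The birthdate example can be made concrete with a minimal Python sketch. The function names, the CustomerBirthdate record and the customer identifier below are hypothetical, introduced only for illustration. The point is that the same eight digits yield three different values until a business context selects the right interpretation.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal

RAW = "03181956"  # the raw digits from the example above


def as_date(raw: str) -> date:
    """Read the digits as a date in MMDDYYYY form."""
    return datetime.strptime(raw, "%m%d%Y").date()


def as_dollars(raw: str) -> Decimal:
    """Read the digits as an amount in cents and convert it to dollars."""
    return Decimal(raw) / 100


def as_count(raw: str) -> int:
    """Read the digits as a plain count."""
    return int(raw)


@dataclass(frozen=True)
class CustomerBirthdate:
    """Hypothetical record that keeps the value together with its context."""
    customer_id: str    # assumed identifier, not taken from the report
    customer_name: str
    birthdate: date


# Only the business context tells us that the date reading is the right one.
record = CustomerBirthdate("C-0001", "John Smith", as_date(RAW))

print(as_date(RAW), as_dollars(RAW), as_count(RAW))
print(record)
```

Nothing in the raw string itself says which reading is correct; that knowledge lives in the record that surrounds it, which is why the context has to be captured when the data is created.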
The current state in many organizations is that we store much of our structured data in “named” fields within structured data systems. The problem is that we often do not use these structured systems as they were intended. A deep analysis of this lack of context is a discussion for another article, but the point is that structured data often lacks any business meaning. This means that when a technologist attempts to define a field, the definition may not be a business-based definition, which in turn leads to meaningless and/or inaccurate labels.
The overuse and multiple uses of the term data likely contribute to our limited understanding of the definition of data. Data is one of those terms used for many things and at different levels. The term data can be a verb or noun in a singular or plural form, at many levels of reference from the atomic level to a world level. Add to this the confusion between the terms information and data, and it is clear we need to work on our basic terminology. Perhaps the ISO 860 standard can help.
We may be using the incorrect term for our real-world data, or we may need two different terms for data, one for the technology view and one for the real-world view. Regardless of the terminology and its correctness, the concept here is essential to understand: we must create and capture the data that represents the real world in context, understand data in context, manage data in context, govern data in context, and use data in context. In other words, throughout its life data needs to be in context.
Unfortunately, the avalanche of technology buried the real-world understanding of data throughout the history of computerization, where the focus has been on the automation of processes and the atomic-level view of data. The technology-based definition and atomic-level view of data influence the way we understand and treat real-world data; as disorganized, random facts, rather than as an important business asset. Treating real-world data as meaningless, unorganized, raw facts causes many of our data issues. The technology-based definition of data undermines the importance of the business context and the meaning of data. Data without its business meaning or context results in failed analytics and many of our data literacy challenges.
Data originates from the real world and then is used in the real world, but between its origination and usage, this real-world data is transformed into and out of the technology (atomic data) world. To summarize, data is not a technology component; it is a representation of the real world, and until it is viewed in this way data will continue to be “broken”. Understanding and treating data as this representation of the real world, outside the confines of computer technology, is core to fixing the fundamentals of data and “getting the data right.”
The Business to Data Connection
Before computers, the business and its data were inseparable: businesses created data, collected data, processed data, stored data, and used data, all on paper in filing cabinets. It was physical and real and directly represented business needs. The same person or department often created the data, used the data, and oversaw the data. Computerization initiated the separation of the data from the business and created an environment where the business no longer controls the data. The separation of the data from the real world grew slowly, unnoticed by most. Technologists, rather than Line of Business (LOB) staff, are now largely responsible for all the data functions, including data creation and management. The separation has grown so large that the connection between the business and its data is broken. This connection is foundational to data, so we must restore it to move forward.
The fact that data is a representation of the real-world business organization requires a clear understanding of the connection of the data to what the data represents (its meaning in context) in the real-world business organization. From a business perspective, data is an effective way to represent the real-world business organization—its things, events, and relationships, tangible or intangible. Data captures the characteristics considered important as it “stands in for” whatever it represents. In a way data is the business. Data serves to capture the essence and account for the known and inferred properties of the real-world business organization.
The human mind does not have the capacity to capture, hold and process all aspects of reality. This is especially true in organizations encompassing a large number of people, places, things, and events. A representation presents a much simpler view; one that is much easier to grasp and use for operations, management, analysis, reporting, and planning. Representations do not include every aspect of the real world; that would be impossible. If they did, it would no longer be a representation, but the real world itself.
Business organizations are holistic systems much like living organisms, made up of things, events, behaviors, functions (marketing, finance, etc.), people, and ideas. There is often no design or logic to the arrangement of the parts, except for the relationships they have. These relationships are what make an organization a system and establish its boundaries. Organizations operate from their dynamic relationships and interactions. These relationships are critical to an organization’s effective operations, performance, and survival, since a change in one thing can affect many other things. Relationships are equally essential in analytics, where we ask questions about the result of a change, such as adding a new product or market. People use the answers to decide the next steps and optimal direction. Useful analytics requires an accurate business understanding of the data based on a holistic view of the business.
The definition of data as raw, unorganized, meaningless facts is so well ingrained within the data industry today that changing this perception is a challenge, especially when the technologist is responsible for the data and is “programmed” to look at it from this analytical perspective. We often use Artificial Intelligence (AI) and Machine Learning (ML) as a “crutch”: collecting raw data, looking for patterns and then sorting it out later.
Establishing a business-to-data connection requires a change in the way we understand and treat data. We first need to understand data as a representation of the real world, and then gain a comprehensive business understanding of the data that represents our real-world business. This documented understanding we call a “Business Blueprint” as it bridges the gap between the business and its data, which is the key to turning data into a real business asset.
This process starts with the understanding that data is a representation of the business organization—the business then can use this data to better understand itself. However, the data must accurately represent the business to facilitate an accurate understanding of the business. Data consumable by people must be in context to be accurate, and not just any context, but the right context. The right context requires that we first understand the business and then create the data that correctly represents that business, which naturally includes its proper context. We also need to maintain the data’s context and meaning through its computer processing, storage, and usage.
The validity and usefulness of our data depend on its accurate alignment to the real-world business it represents. Consider: how does one know if the data is “right” if there is little or no understanding of what the data represents in the real-world organization? The accurate alignment of the data to the real world is the foundation of the business-data connection, and the business-data connection is the foundation for everything data. It is critical that we maintain the business-data connection throughout the data’s lifespan. An accurate connection between the real-world business and its data is a fundamental principle of data.
Data Usage
Organizations use their data for many things, from operations and management to recording history, analyzing the past, reporting, decision-making, and predicting the future. Software applications automate most of these functions. Data is often closely associated with (collected and defined for) a specific software application: that is a usage view of that data. Understanding what data represents in the real world and understanding its subsequent usage are not the same. This is an important concept; there may be many use cases (or usage views) for information about a data subject, for example age/birthdate to determine access. Consider that the business organization can use any of its physical things in multiple ways and within many business functions or applications. Therefore, a single usage of data only gives a partial view of what that data represents in the real-world business organization.
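A small Python sketch can illustrate the difference between what data represents and how one function uses it. The Customer record, its fields, and the two view functions are hypothetical names chosen for illustration, and the minimum age of 18 is an assumed threshold. Each view derives only what its function needs from the same underlying record.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Hypothetical holistic record: what the data represents."""
    customer_id: str
    name: str
    birthdate: date
    postal_code: str


def age_on(birthdate: date, on_date: date) -> int:
    """Whole years elapsed between birthdate and on_date."""
    years = on_date.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def access_view(c: Customer, on_date: date, minimum_age: int = 18) -> dict:
    """Usage view for access control: only a yes/no on the age threshold."""
    return {
        "customer_id": c.customer_id,
        "meets_minimum_age": age_on(c.birthdate, on_date) >= minimum_age,
    }


def marketing_view(c: Customer, on_date: date) -> dict:
    """Usage view for marketing: a ten-year age band and a postal code."""
    decade = age_on(c.birthdate, on_date) // 10 * 10
    return {
        "customer_id": c.customer_id,
        "age_band": f"{decade}-{decade + 9}",
        "postal_code": c.postal_code,
    }


customer = Customer("C-0001", "John Smith", date(1956, 3, 18), "02110")
today = date(2020, 6, 30)
print(access_view(customer, today))
print(marketing_view(customer, today))
```

Neither view can be used to reconstruct the customer record, which is the sense in which a single usage gives only a partial picture of what the data represents.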
Each function in an organization has its unique view of data. This is a fundamental concept within Object Oriented design, for example. One usage view of data may be very different from other usage views of the exact same data. Different views of the same data are why many believe there are multiple versions of the truth when it comes to data, and from that standpoint, there are. However, when it comes to analytics and decision-making requiring an accurate understanding of the ramifications across an organization, it is essential to have the holistic, interconnected view of what the data represents in the real-world business. The holistic, interconnected view of data is missing in most analytics systems today, which handicaps these systems and, in turn, causes many of the issues enterprises face with analytics. AI/ML won’t replace baking in this holistic view from the onset.
Data models and their resulting data stores are usage perspectives of data for the specific applications they support. Although an application needs a usage view of the data, a data model, and a data store, it is critical to understand that a usage view of data is not a complete view of that data or the real-world thing it represents. Multiple versions of data, often justified for different usages, result in redundant and often disparate data. This is especially true in the analytic space where it is difficult to put Humpty Dumpty back together again (even with AI/ML) if there are missing or overlapping pieces.
Attempting to understand how the data represents the real-world business organization using one or more usage views is like trying to understand a person by the roles they play. A person is much more than the sum of everything that they do. There is a difference between what something is and what something does. Fundamental to data is to understand the difference between what the data represents in the real world versus how the data is used in the real world.
A holistic business view (Business Blueprint) that documents the business-data connection facilitates the understanding of how all the necessary usage views integrate into the holistic view. TechVision provides workshops, consulting and other services to help organizations figure out this Business Blueprint. It provides a map for the consistent integration of data. This holistic business map that documents the business-data connection is critical for proper analytic functionality, consistent KPIs, accurate reporting, information security, confidence in accounting numbers, and accurate projections/forecasts. Along with these improvements, a holistic business map of the business-data connection can also support strategic data initiatives: artificial intelligence, data monetization, and the emerging Digital Enterprise.
With a solid understanding of the business data view in place, we’ll next examine the data creation process in the context of better understanding the data we want to collect.
Data Creation
Thus far we’ve taken a hard look at what data is, how it supports the business and how it represents the real world. We’ve also separated data from how the data is used. This was centered on understanding and representing the data we have; now we’ll consider how to create or acquire new data in a way that supports our business data goals.
We’ll again start with the fundamentals; data existed long before computers. Prior to computers, business stakeholders manually created data using paper-based files and records. With computerization we automated data capture and the generation of this captured data into electronic formats. Regardless of the media used, proper data creation/acquisition is foundational to ensuring the meaning, accuracy, and usability of our data, throughout its lifespan.
Data creation is a fundamental process that brings data into existence. Since data creation is where data begins, it lays the foundation for everything data. Again, remember that data is a representation of the real world, and by understanding the fundamental principles of the real world, we can better identify the data representing it. This concept is core to assuring an accurate and meaningful data representation. It makes perfect sense that the same things that give context and meaning in the real world give context and meaning to the data that represents the real world. Location, identity, time, and relationships are fundamental principles of the real world that can be used to provide context and meaning for data.
The real world or reality encompasses everything that exists. Matter, space, and time make up the basic framework of reality. It doesn’t get much broader than that. The physical things (matter) that exist in space (location) constitute most of what we can observe. Real-world things have a location defined by the three spatial dimensions of reality; height, depth, and width. A location gives context or meaning to things in the real world. Thus, location is a core principle to data. As long as we continue to keep in mind that data is simply being used to represent the real world, these factors fall into alignment.
Real world things, including concepts and events, have a unique identity that gives them meaning or context. A verbose form of identity is a description. Closely tied to identity are properties that help define and describe the things and events in the real world, thus enhance their identity or uniqueness. We generally call these descriptions attributes. However, these properties can change over time or space (location). Real-world identity gives context to data; therefore, identity is a core principle to data.
The things and events in the real world exist within the dimension of time, the fourth dimension of reality. Due to the nature of time, nothing stays the same; everything in the real world changes as time advances. Real-world events happen at a moment in time. Time differentiates the various states of the real world. Therefore, time is also a critical point of reference for understanding changes in an organization.
Data representing the real world is a “snapshot” of facts at a point in time. Data is relevant to time, regardless of whether we record time with the data or not. However, when we do not include a relevant time, we can compromise the data because time affects the data’s meaning and, therefore, its validity. This scenario is especially relevant for analytics. Time is fundamental to reality and is therefore a fundamental principle of data.
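The “snapshot” idea can be sketched in Python by storing each value together with the date from which it applies and asking what was true as of a given day. The attribute (a customer’s city), the sample values and the value_as_of helper are hypothetical, and the sketch assumes the history is kept sorted by date.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical history of one attribute: (effective date, value), sorted by date.
city_history: List[Tuple[date, str]] = [
    (date(2018, 1, 1), "Boston"),
    (date(2019, 6, 1), "Denver"),
]


def value_as_of(history: List[Tuple[date, str]], when: date) -> Optional[str]:
    """Return the value in effect on `when`, or None if nothing was recorded yet."""
    effective_dates = [effective for effective, _ in history]
    # bisect_right gives the number of entries effective on or before `when`.
    position = bisect_right(effective_dates, when)
    return history[position - 1][1] if position else None


print(value_as_of(city_history, date(2018, 12, 31)))
print(value_as_of(city_history, date(2020, 1, 1)))
print(value_as_of(city_history, date(2017, 1, 1)))
```

If only the latest value were stored without its date, an analysis of 2018 activity would silently attribute it to the wrong city, which is the kind of compromise the paragraph above describes.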
Relationships are fundamental to reality and to the physics that defines reality. The real world is highly complex and relational; the meaning of things in the real world depends on their relationships. There are often multiple relationships between things in the real world along with different types of relationships. As David Bohm, a quantum physicist, states: “Everything is connected to everything else.” Everything that exists in the real world derives meaning from its relationships, where things have meaning relative to other things. Thus, a change in anything affects many other things.
Data derives much of its meaning from its relationships to other data. Relationships are fundamental to data and information. The elimination of or change in a relationship changes the meaning of the data. All relationships give data meaning and context, some more than others. This concept is fundamental for data and analytics. Disjointing data by treating it as a laundry list of individual data items is one of the biggest mistakes we make with data. This practice causes significant harm to our data assets because when data is disjointed, it loses context and, therefore loses fidelity.
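As a rough illustration of how removing a relationship removes meaning, the following Python sketch stores relationships as simple (subject, relation, target) facts about a connected car. The identifiers, relation names and the related helper are all hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple, Set


class Relationship(NamedTuple):
    """One hypothetical relationship between two real-world things."""
    subject: str
    relation: str
    target: str


facts: Set[Relationship] = {
    Relationship("car-17", "owned_by", "Alice"),
    Relationship("car-17", "driven_by", "Bob"),
    Relationship("car-17", "made_by", "ExampleMotors"),
}


def related(all_facts: Set[Relationship], subject: str, relation: str) -> List[str]:
    """Return the targets linked to `subject` through `relation`, sorted by name."""
    return sorted(
        f.target for f in all_facts
        if f.subject == subject and f.relation == relation
    )


# With the relationship present, a sensor reading from car-17 can be tied to its driver.
print(related(facts, "car-17", "driven_by"))

# Drop one relationship and the same question can no longer be answered.
disjointed = facts - {Relationship("car-17", "driven_by", "Bob")}
print(related(disjointed, "car-17", "driven_by"))
```

The individual values ("car-17", "Bob") are still present after the relationship is dropped, but the fact that connected them is gone, so the data has lost part of its context.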
Ignoring any one of these fundamental principles of the real world can negatively impact the quality and usefulness of the data. Unfortunately, many organizations are not aware of the impact that these principles have on the data because they fail to understand the concept of data as this representation of the real world. Instead, organizations all-too-often view data inside the technology box as individual raw facts and lose the meaning or context of the data.
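One way to make these four principles checkable is to record them alongside each value and report which ones are absent. The Python sketch below is an assumption-laden illustration: the Fact class, its field names and the missing_context helper are hypothetical and are not part of any TechVision methodology.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Fact:
    """Hypothetical data fact carrying the four kinds of real-world context."""
    value: str
    identity: Optional[str] = None          # what the value is about
    location: Optional[str] = None          # where it applies
    observed_at: Optional[datetime] = None  # when it was true
    relationships: Dict[str, str] = field(default_factory=dict)


def missing_context(fact: Fact) -> List[str]:
    """List the real-world principles for which the fact carries no context."""
    missing = []
    if not fact.identity:
        missing.append("identity")
    if not fact.location:
        missing.append("location")
    if fact.observed_at is None:
        missing.append("time")
    if not fact.relationships:
        missing.append("relationships")
    return missing


raw = Fact("72")
contextual = Fact(
    "72",
    identity="engine temperature, car-17",
    location="Denver",
    observed_at=datetime(2020, 6, 30, 9, 0),
    relationships={"driven_by": "Bob"},
)
print(missing_context(raw))
print(missing_context(contextual))
```

A check like this does not make the context correct, only present; judging whether it is the right context still requires the business understanding described earlier.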
When we understand data as a representation of the real world and understand the importance of the fundamental principles of the real world, data creation takes on a whole new meaning. Data creation is critical as everything is built upon this initial process. Thus, it requires the rigor of a formal strategy with policies, standards, processes, and a data representation methodology. This methodology addresses the fundamental real-world principles so that the data effectively and consistently represents the real world and retains its fidelity.
TechVision’s formal Data Creation Strategy guide and Data Creation Processes provide structure and consistency to data across the organization. The Data Creation Process includes three steps, Conception, Formation, and Capture, which we’ll describe briefly in this report. Conception focuses on the what and the why of data creation, while Formation and Capture focus on the how. We’ll now further describe each step.
Data Creation Step 1: Conception
Conception is the first step in the process, and it starts with gaining a thorough understanding of the real-world business we intend to represent using data. This knowledge is not only foundational to the creation of the data, but also to that data throughout its lifespan. Based on our knowledge of real-world people, places, things, and events as well as our knowledge of the fundamental principles of the real world, we can effectively identify the observable or measurable facts. Conception is a business-based step that defines the optimal facts necessary to represent the business objects of interest. It is important to note that Conception does not address the implementation concerns such as technology limitations.
Human perception is an important consideration in the Data Creation Process. Humans create data; even the data automatically generated by computers had a person at some level design the software that created that data. Consistently capturing reality is never an easy task due to the nature of human perception. Every person’s perception is unique. Perception is an individual’s sensory experience or interpretation of reality. Many say: “perception is everything.” There is an external reality and there are many internal realities. They are rarely, if ever, the same. A data creation strategy does not eliminate all the effects of perception, but it seeks to mitigate its adverse effects.
Data Creation Step 2: Formation
Formation is the second step in the data creation process and defines how the facts identified during Conception are transformed into data. In this transformative step, we analyze the feasibility of creating the optimal data facts. Often the cost of transformation may not justify the creation of some of the desired data facts. We also may not be able to create some of the desired data facts due to ownership, logistical, privacy or capability issues. That said, our goal in this phase is to achieve “optimal data”. Any deviation from optimal data requires a mitigation plan for the loss of data fidelity due to any compromises.
Note that data formation is the second step in the process by which we take the data we want to create and optimally create it. In many cases, there is a large inventory of existing and new data (big data) that isn’t being created from scratch but needs to be properly captured. We are seeing a lot of this, for example, with data generated by IoT devices. Data capture is the next step in the Data Creation process and is covered next.
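The Formation decision described above can be sketched as a simple review of the facts identified during Conception. This is one possible reading, not TechVision’s actual process: the DesiredFact class, its flags, the sample facts and the formation_plan function are hypothetical. The sketch refuses any deviation from optimal data that comes without a mitigation plan.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DesiredFact:
    """Hypothetical output of Conception, annotated during Formation."""
    name: str
    feasible: bool                    # no ownership, logistical, privacy or capability blocker
    cost_justified: bool              # the cost of creating it is justified
    mitigation: Optional[str] = None  # plan for the lost fidelity if it is not created


def formation_plan(desired: List[DesiredFact]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split desired facts into those to create and the documented deviations."""
    create: List[str] = []
    deviations: List[Tuple[str, str]] = []
    for fact in desired:
        if fact.feasible and fact.cost_justified:
            create.append(fact.name)
        elif fact.mitigation:
            deviations.append((fact.name, fact.mitigation))
        else:
            # A deviation from optimal data with no mitigation plan is rejected.
            raise ValueError(f"{fact.name}: deviation has no mitigation plan")
    return create, deviations


desired_facts = [
    DesiredFact("customer birthdate", feasible=True, cost_justified=True),
    DesiredFact("driver identity", feasible=False, cost_justified=True,
                mitigation="record the vehicle owner and flag the driver as unknown"),
]
to_create, documented = formation_plan(desired_facts)
print(to_create)
print(documented)
```

Raising an error for an unmitigated deviation is a design choice made for the sketch; an organization could equally route such cases to a review queue.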
Data Creation Step 3: Data Capture
The third step of the data creation process is Data Capture. This is often the only step practiced today, but it is informally handled in many diverse ways. The Data Capture process involves physically obtaining the identified data or the adopted data (existing data sourced from legacy systems or purchased). Although we can capture and record data manually or electronically, most data capture is electronic using a variety of technology-based techniques.
Organizations with a technology view of data naturally begin with data capture. They are often unaware of the need for a formal data creation strategy and process. These organizations typically overlook the most critical step, the one that begins with the real-world business. Instead, they capture data on an ad hoc, as-needed basis, with each application (data usage) capturing data from its own limited usage view of that data. This approach results in a multitude of “methods,” which equates to no method.
Business data requirements are rare. Most organizations start with a laundry list of raw data fields from existing systems. Even when an organization uses a “data model,” it is typically a diagram of the physically implemented data structure obtained by reverse engineering a data store; it is not a data model that defines and describes what the data means. A data model that defines and describes the needed data is a major missing link that most enterprises should be addressing. The typical ad hoc approach to data capture from multiple application sources results in data redundancy, disparity, failed integration, and a multitude of data analytics frustrations.
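To make the distinction concrete, the following is a minimal sketch, not a TechVision tool or method, contrasting a list of reverse-engineered physical fields with business data definitions that record what the data means. Every name in it (BusinessDataDefinition, undefined_fields, the crm and billing field names, the sample definition) is a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessDataDefinition:
    """Hypothetical record describing what a data fact means to the business."""
    business_name: str            # name the business uses for the fact
    definition: str               # plain-language meaning
    business_object: str          # real-world thing or event the fact describes
    allowed_values: tuple = ()    # optional value domain (empty = unconstrained)
    physical_fields: tuple = ()   # where the fact is physically stored today


def undefined_fields(physical_fields, definitions):
    """Return the physical fields that no business definition accounts for."""
    defined = {f for d in definitions for f in d.physical_fields}
    return sorted(set(physical_fields) - defined)


definitions = [
    BusinessDataDefinition(
        business_name="Customer Status",
        definition="Whether the customer currently holds an active account.",
        business_object="Customer",
        allowed_values=("ACTIVE", "INACTIVE"),
        physical_fields=("crm.cust_stat_cd",),
    ),
]

# A "laundry list" of raw fields, as obtained by reverse engineering a data store.
reverse_engineered = ["crm.cust_stat_cd", "crm.flg_7", "billing.cs_ind"]

# Fields that exist physically but have no documented business meaning.
print(undefined_fields(reverse_engineered, definitions))
```

In this sketch, two of the three physical fields have no business definition at all, which is the gap a reverse-engineered diagram cannot reveal on its own.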
Errors in capturing reality have ramifications throughout data’s lifespan. The errors result in an erroneous picture of the past and present reality that leads to a distorted prediction of the future. Capturing an accurate picture of reality is critical to the usefulness of data. Data incorrectly created (inaccurate data) is more of a detriment to the information user than not having the data.
Clearly, data accuracy depends upon creating the right business data, at the right level, with the proper context. A holistic understanding of the real world enables us to identify and capture the optimal data facts, including the real-world context (the fundamental principles). Data created using a formal data creation strategy in a complete and flexible form supports a variety of usages and will be more sustainable for future use cases. The formal Data Creation Strategy and the Data Creation Process not only solve many of our data challenges but are also much more cost-effective because they assure accurate data and minimize redundant data and its cost.
Data creation is fundamental to everything data and is much too important to leave to chance. Data created using a formal data creation strategy results in much richer data sets that expand an organization’s horizon for data usage—the benefits far outweigh the effort. With digital transformation on our doorstep, organizations need a formal data creation strategy and process more than ever.
Data in Technology
Before computers, the business and its data were inseparable. Businesspeople created, updated, and stored their data on paper in filing cabinets. The business oversaw their data; its storage, movement, processing, and usage. With computerization, the business no longer controls its data. The technologist is responsible for the data during its entire lifespan. This change took place slowly over decades; unnoticed.
Remember that the purpose of data is to serve as a representation of the real-world business organization that the business uses to operate, manage, analyze, and predict. Because technology facilitates the storage and processing of the data, the real-world data moves in and out of these technology-based silos. Therefore, it is critical that the data flows seamlessly and remains intact between the real-world business and the technology world and vice versa.
The challenge is that the real world is far different from how technology attempts to represent this reality. The real world is infinitely complex, while computer systems are finite. Humans and computers process data and information very differently. Humans process information through their senses, by induction (deriving general principles), and by deduction (the process of reasoning). A computer is an advanced machine created by humans as a tool to perform “thinking” work in a predetermined (programmed by humans) logical manner. As TechVision described in our AI level set report a few years ago, even third wave AI (cognitive/explainable) that can begin to mimic some aspects of human thinking is in its infancy. It is essential to understand the limitations of computer processing (thinking) and storage.
The discontinuity between technology and humans makes it very difficult to map the complexities of the real world when we transform real-world data into and out of technology platforms and applications. Getting this transition consistently right is fundamental to data. We need to recognize, understand, plan for, and manage these differences to maintain the fidelity of the real world in its data.
Unfortunately, we distort and disjoin our real-world data due to the limitations imposed by processing, I/O, and storage technology. Rather than plan for how the data can retain its real-world context and fidelity inside the technology world, we often limit the data to fit the technology limitations. Thus, data becomes technology-driven; the tail is wagging the dog. With technology-driven data, there is always a loss in context, and thus in fidelity, which results in a variety of our data challenges. We expect the business users and data scientists to magically make sense of this disjointed, out-of-context technology-driven data when we could have easily prevented it in the first place. Remember, Machine Learning and Artificial Intelligence need to be trained while discovering patterns. The more context and fidelity we start with, the more accurate and valuable AI and ML will be.
The transition flow begins in real-world business with the creation of the data. We translate the real-world data representation into the technology world. Eventually, the computer turns data into the machine level (0s and 1s) so that it can move, store, and process the data. The computer reassembles the zeroes and ones back into the data that we use to gain information and knowledge about the real world.
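The transition flow described above can be sketched in a few lines. This is an illustrative sketch only: the record contents, the function names to_machine_level and from_machine_level, and the 8-character ASCII-only field are all assumptions made for the example, and JSON with UTF-8 is just one of many possible encodings.

```python
import json


def to_machine_level(record: dict) -> str:
    """Serialize a record and render it as a string of 0s and 1s."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    return "".join(f"{byte:08b}" for byte in raw)


def from_machine_level(bits: str) -> dict:
    """Reassemble the 0s and 1s back into the original record."""
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return json.loads(raw.decode("utf-8"))


# Hypothetical real-world fact: a customer placed an order.
order = {
    "customer": "Zoë Müller",
    "amount": "125.50",
    "currency": "EUR",
    "event": "order placed",
}

bits = to_machine_level(order)
restored = from_machine_level(bits)
assert restored == order  # a planned transition keeps the data intact

# A technology-driven shortcut (assumed for illustration): forcing the name
# into an 8-character, ASCII-only field silently alters the real-world fact.
truncated = order["customer"].encode("ascii", errors="replace")[:8].decode("ascii")
print(truncated)
```

The round trip through 0s and 1s loses nothing when the translation is designed for the data. The loss appears only when the data is cut down to fit a technology limit, as with the truncated name.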
Human perception also plays an essential role in this transition flow and thus adds to the risk of loss of data fidelity and usability. No two people’s perception of reality is identical. We cannot change human perception, but we can influence it with definitions, standards, processes, and methodologies. Therefore, any lack of documentation, processes, and transition management guarantees a loss of data fidelity, causing many of our data challenges.
Retaining the context and fidelity of the data between the real world and technology requires a transition methodology: a systemic approach for how real-world data transforms into the technology world seamlessly. This methodology includes policies, standards, and processes. We need to understand, purposefully design, accurately document, and proactively manage the translation guided by a transition methodology. Our Methodology helps to maintain data fidelity throughout its storage, movement, and processing, as well as minimize the risk of data issues due to human perception. TechVision Consulting will help our clients work through this entire process or specific aspects as needed.
We need to both view and treat data independently of the technology, as its own entity rather than as a part of technology. In the same manner, we need to optimize the technology rather than restrain and limit it with data fidelity requirements. In short, we need to leverage both areas for what they do best.
Organizations that view data from a technology-first perspective fail to recognize the importance of the transition of real-world data into the technology world. Remember, data is a representation of the real-world business, and technology is used to facilitate the storage, movement, and processing of data so that the business can utilize it. The critical part of this process is the management of the transition into and out of the technology world (systems, storage, applications…), as this is necessary to maintain the fidelity and the business context required for optimal usage of that data. A well-managed transition provides accurate, reliable, identifiable, and usable data critical for all the data functions, especially analytics.
Data as a Business Asset
“Data is a business asset”—this statement has been repeated across the industry for many years…but actions speak louder than words and most organizations don’t treat data as a business asset. This is based on the investments we see, the lack of focus in developing a data strategy, data governance, and data management. So why do we continue to say, “data is a business asset,” yet fail to treat data as a business asset?
First, let’s define what we mean by a business asset. It is any item of economic value that an organization can identify, measure, manage and track. Assets can be physical such as cash, inventory, or property; or intangible such as patents, copyrights, or trademarks. Data has a substantial economic value to an organization because its protection and replacement (if destroyed, lost, or stolen) represents a significant cost to the organization.
Data and its resulting information are valuable business assets regardless of whether we recognize their value or not. Determining the value of data is essential to justify and manage its costs and to decide how we protect these assets. Not all assets have the same value. Knowing the value of data assures that it receives the same level of management we would give any other asset of comparable value. The higher the value, the higher the level of management. Thus, the valuation of data is fundamental to the investment in data.
The definition of an asset includes the ability to measure and track that asset. Tracking an asset is part of managing that asset. We have the capability to measure and track data. Unfortunately, most organizations do not measure or track their data. To measure something, you must identify it, know what it is, and where it is. After all, you cannot manage what you do not know. We do not (in most cases) know our data. And remember, as we move to become Digital Enterprises with greatly expanded data assets, this challenge will only get worse.
To properly manage data assets, we must first identify and define them. If we create data as an accurate representation of the business (as we recommend), it naturally has identity and business meaning. Our biggest challenge to identifying data is due to the missing or broken connection between the business and the data described earlier in this report.
Interestingly, data is a reusable asset: data is not consumed when used and can grow as it moves through an organization. The reusability of data makes data different from typical business assets, in a positive way. It also means that special protection and security need to be considered given the reusability of data. The value of data increases the more we use it. At the same time, the overall cost (creation, storage, maintenance) per use decreases each time we reuse data because most of the cost is expended at its first use. So, we want to reuse data where appropriate.
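The arithmetic behind this point can be illustrated with a small sketch. The figures below are invented for illustration and are not drawn from any client data; the function names and the optional marginal_cost parameter (a per-reuse cost for integration or access) are likewise assumptions added for the example.

```python
def cost_per_use(creation_cost: float, uses: int, marginal_cost: float = 0.0) -> float:
    """Average cost per use when data is created once and then reused.

    creation_cost: one-time cost expended at first use (hypothetical figure).
    marginal_cost: assumed extra cost for each reuse after the first.
    """
    if uses < 1:
        raise ValueError("uses must be at least 1")
    return (creation_cost + marginal_cost * (uses - 1)) / uses


def replicated_total(creation_cost: float, uses: int) -> float:
    """Total cost when the data is recreated from scratch for every usage."""
    return creation_cost * uses


creation = 100_000.0  # illustrative one-time creation cost
for n in (1, 2, 4):
    reused_total = cost_per_use(creation, n, marginal_cost=5_000.0) * n
    print(n, cost_per_use(creation, n, marginal_cost=5_000.0),
          reused_total, replicated_total(creation, n))
```

Under these assumed figures, four usages of reused data cost 115,000 in total (28,750 per use), while recreating the data for each usage costs 400,000.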
Although the decreasing cost of data with increasing usage is a significant benefit, most organizations rarely reuse their data. For data to be reusable, it must incorporate a holistic business view including all the necessary context so that the data supports multiple usages. With holistic data, its initial use requires additional analysis, architecture, and design.
Unfortunately, most organizations do not justify performing these strategic tasks. Instead, they take a shortsighted view and cut corners in the name of speed of delivery and immediate cost savings. Inevitably they replicate or recreate the data for each additional usage, leading to costly redundant silos of data. This is the same concept we’ve written about over the years in the Identity Management space: silos that make the services hard to manage, govern, or secure. This practice eliminates many of the benefits of the inherent reusability of the data assets. Ironically, the replication of the data has a much higher overall cost than creating holistic data for multiple purposes. We see this all the time in the IAM area as we try to determine which version of the replicated or synchronized data is the “truth”.
If we truly understood and managed data as an important business asset, the replication of data and its additional cost would never take place. We would proactively manage the business data assets with accurate tracking of their value and costs. Unfortunately, most organizations do not view data in a business context, thus do not manage data as a business asset—a broken fundamental of data in many enterprises as we’ve described throughout this report.
The widely accepted technology view of data has made it difficult, if not impossible, for most organizations to accept the concept of data as a business asset. The technology-based definition of data is correct from a technology viewpoint, so few question that belief. This is further reinforced when technologists, rather than business leaders, are responsible for the data. This must change.
As we’ve said throughout this report, Data is a representation of the real-world organization; thus, a valuable business asset. Data as a business asset is a fundamental data principle that needs to be front and center in organizations today. If we truly understood data in a business context, we would treat data as a business asset—actions always reveal belief.
Data recognized in a business context as an important business asset is naturally accounted for, valued, owned, governed, and managed. Attempting any of the data asset management functions without understanding and treating data as a representation of the real-world business is doomed to failure. Organizations must embrace the broader understanding of data, the real-world view of data, if they are ever going to treat data as the important business asset that it is. As this realization occurs, organizations will begin to take the necessary steps towards viewing and using data as the key business asset it should be.
Conclusions/Recommendations
The basic data fundamentals and the right mindset must be fixed before we can consistently and efficiently get the data right, manage it as a business asset, and utilize its full potential. I had a basketball coach who would say “if we cannot get the basics right (dribble, pass, and shoot) then we cannot possibly pull off the fancy plays it takes to win the games.” This same principle holds true for data. If we do not fix the fundamentals, we will continue to waste time, money, and resources chasing each new data industry trend, only to fall short. This report is a starting point for understanding these fundamentals; future reports will build on it, and TechVision consulting support is available to help you through this journey.
Clearly, the fundamentals of data are broken in most organizations. Fixing the broken fundamentals is the key to addressing the core data problems that cause our data challenges and to stopping the data deterioration. That said, organizations must be willing to do what it takes to fix the fundamentals. The effort required and costs are minimal relative to the opportunity cost of continuing on the existing technology-first path. The more significant challenge we face is changing our thinking and approach to data. As Albert Einstein put it so well: “We cannot solve our problems with the same thinking we used when we created them.” We have created our data mess, and we need to change our thinking and approach to fix it.
We can no longer view data using only a technology vantage point; we need to see data as the representation of the real-world things and events that make up our business organizations. It seems simple, but this change starts by treating data as a business asset. We need to holistically understand the real-world business with all its interconnections and dependencies that the data needs to represent from a strategic perspective. The validity and usefulness of data directly depends on its accurate alignment to the real world it represents.
Fixing the fundamentals of data prevents and minimizes the data issues. This approach offers a much richer data set to take advantage of the full potential of our data assets required for risk avoidance, predictive analytics, big data, artificial intelligence, digital transformation, and data monetization. Information is powerful, but only if the data is right. We must get the fundamentals right first.
We’ll close with TechVision’s four recommended actions an organization can take to address the broken data fundamentals described in this report. All four areas focus on data as a business asset. We will cover each of these areas in greater detail in future reports. Moving the needle towards a strong, sustainable data program starts with these four steps:
Establish a Business Data Architecture: A Business Data Architecture defines the connection between the data and the business—missing from most organizations. The connection between the data and the real-world organization is foundational to everything data. The Business Data Blueprint (aka Enterprise Data Model) documents the Business Data Architecture.
Create a Business Data Strategy Framework: A Business Data Strategy Framework establishes the “playbook” (the strategy, principles, policies, and rules) for the business data assets. The framework covers the data strategy components necessary to establish a solid foundation for all the data practices.
Establish a Business Data Practice: A Business Data Practice is a formal group within the business organization that oversees the business data assets from their creation and throughout their lifespan. The group supports the information needs of the business to assure high quality, accurate, consistent, reliable data. This practice is foundational for business data asset management.
Build a Business Data Asset Infrastructure: A Business Data Asset Infrastructure is like a technology infrastructure, but its focus is the business data assets rather than the data technology. The business data infrastructure supports everything needed to manage data as a business asset.
The topics we cover here may seem simple and obvious; of course, data is a key business asset and of course it should represent the business data needs of the enterprise—but, in our experience, most organizations fall into the traps we describe throughout this document. And as we continue to extend our digital relationships and our thirst for data, we want to make sure it is the right data, we understand it and can use it effectively, efficiently and securely.
About TechVision
World-class research requires world-class consulting analysts and our team is just that. Gaining value from research also means having access to research. All TechVision Research licenses are enterprise licenses; this means everyone that needs access to content can have access to content. We know major technology initiatives involve many different skillsets across an organization and limiting content to a few can compromise the effectiveness of the team and the success of the initiative. Our research leverages our team’s in-depth knowledge as well as their real-world consulting experience. We combine great analyst skills with real world client experiences to provide a deep and balanced perspective.
TechVision Consulting builds off our research with specific projects to help organizations better understand, architect, select, build, and deploy infrastructure technologies. Our well-rounded experience and strong analytical skills help us separate the “hype” from the reality. This provides organizations with a deeper understanding of the full scope of vendor capabilities, product life cycles, and a basis for making more informed decisions. We also support vendors in areas such as product and strategy reviews and assessments, requirement analysis, target market assessment, technology trend analysis, go-to-market plan assessment, and gap analysis.
TechVision Updates will provide regular updates on the latest developments with respect to the issues addressed in this report.
