CHAPTER ONE
The Wondrous and Perilous Properties of Data and Information in Organizations
Savvy managers recognize that data and information are strategic assets, possibly even the “ultimate proprietary technology,” in Nicholas Carr’s terms.1 After all, they are the only asset that is uniquely your own. No other organization has, or can have, the same data that your organization has. Your data reflect your strategies, customers, products, employees: everything that matters in your organization. Competitors can copy your processes, buy the same equipment you do, steal your customers, and entice your employees with better offers. But unless you let them, they can’t have your data and information.
Further, data and information have properties and play roles in organizations that have no good analogues in other assets. I cited sharing in the introduction, but data and information are not endowed with some sort of “sharing gene.” Rather, they can be digitized, copied, and transported at extremely low cost. It is these properties that make them shareable and in turn offer the potential for people and departments to work together for leverage across the organization and to help management get everyone on the same page. Not without a price, however: these same properties also make it possible, in a careless moment, for the bad guys to steal your data without you even knowing it.
Data and information possess many properties that simultaneously promise enormous potential and pose unprecedented challenges. Managers must become adept at finding courses of action that take advantage of the upside and avoid the downside.
Data Multiply
Ask any executive, in practically any industry or role, how much data his or her department has and you will likely get a short answer: “Tons. We have tons of data!” The reason is quite simple. All activities that use data create more in the process. Taking a customer order creates new data. So does making the next widget. So too financial reporting. Every operation, every decision, every strategic action: all create more data.
Estimates of the doubling time (the time it takes the quantity of an organization’s data to double) vary from twelve to eighteen months.2 Take a typical doubling time of one year. If the organization currently stores one terabyte, it will store two terabytes a year from now, and four and eight terabytes two and three years from now, respectively. Impressive, but the most compelling statement is from Lou Gerstner, who as chairman of IBM remarked, “Inside IBM, we talk about 10 times more connected people, 100 times more network speed, 1,000 times more devices, and a million times more data.”3
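The doubling arithmetic can be sketched in a few lines of Python. This is only an illustration under the assumption of steady exponential growth; the function name `projected_volume` and its parameters are hypothetical, introduced here for the example.

```python
def projected_volume(current_tb: float, years: float,
                     doubling_time_years: float = 1.0) -> float:
    """Return the projected data volume in terabytes.

    Assumes (for illustration) that the volume doubles once every
    `doubling_time_years`, with no change in the growth rate.
    """
    return current_tb * 2 ** (years / doubling_time_years)


# Start from one terabyte with a one-year doubling time, as in the text.
for year in range(4):
    print(f"Year {year}: {projected_volume(1.0, year)} TB")
```

Lengthening the doubling time to eighteen months (`doubling_time_years=1.5`) slows the growth, but the same starting terabyte still quadruples within three years.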
Contributing to the explosion, organizations also acquire vast quantities of data from outside their borders. Most come into the organization through everyday commerce with customers, suppliers, and regulators. Other data are purchased. For example, the financial services industry is heavily dependent on market data provided by Bloomberg, Interactive Data, Morningstar, and others. Marketing departments in many industries purchase demographic data from Acxiom, ChoicePoint, and Dun & Bradstreet.
It is not just the quantities of data that are increasing; so too are the types of data. Only a few years ago, few would have foreseen the penetration of global positioning systems and the data they spawn. Or the human genome. Or radio frequency identification.
The upside is potentially enormous. First and foremost, the data you create are uniquely your own. And you create more every day. More data, in greater variety and detail, means there are more data to mine, more ways to informationalize, and more angles from which to view a problem. More insights into your competitor’s intentions and novel twists can help you make better decisions.
The downside is that managers are already buried in data and the problem is growing worse. “More” does not imply “more really good stuff,” although it usually does imply that the task of sifting through the mass of data to find what you really need is larger and more complex. And “more” means that greater management attention is needed regarding both the internal and external sources of all these data. A bit more subtly, “more” also means a lot more time and effort figuring out how the data obtained from different sources relate to one another.
To get in front of this onslaught, managers and organizations must determine which data are most important and focus their efforts. I find that it is often easier, although somewhat less effective, to determine which data are never used for anything and to stop collecting them.
Data Are More Complex Than They Appear
Henry Petroski uses paper clips to illustrate the complexity of today’s world. Paper clips are so simple, basic, and inexpensive that they almost escape notice. But so many specialized disciplines must come together to make a paper clip that he doubts any human knows all there is to know about manufacturing one.4 Like a paper clip, a datum may seem simple, basic, and inexpensive, yet it is surprisingly complex. And also like a paper clip, many disciplines must come together if even the simplest datum is to prove useful. These include the following:
Data modeling, which is essentially the process of specifying what you want. Often surprisingly abstract and technical, the process includes defining the entities, attributes, and relationships of interest, assembling these into databases, optimizing performance, and creating needed metadata (see “Data and Information Defined”).
Obtaining the data values via the organization’s business processes or suppliers or both.
Data and Information Defined
There are many approaches to defining data. I find most effective the one that best reflects how data are created and used in organizations. In it, “data” consist of two components: a data model and data values. Data models are abstractions of the real world that define what the data are all about, including specifications of the things of interest (called entities in data lingo), important properties of those things (attributes or fields in the lingo), and relationships between them. As an example, an employee is an entity. His or her employer is interested in all its EMPLOYEES (an example of an entity class in the lingo), and attributes such as NAME, DEPARTMENT, SALARY, and MANAGER. REPORTS TO is an example of a relationship between two entities.
The Internal Revenue Service is also interested in the employee, as a TAXPAYER. It is interested in some of the attributes that interest the employer, but also in many others, such as INTEREST INCOME, that do not. The employee is, quite obviously, the same person, but each organization has distinct needs and interests, so their data models are different.
On its own, a data model is much like a blank meeting calendar: there is a structure, but no content. Data values complete the picture. They are assigned to attributes for specific entities. Thus, a single datum takes the form
<John Doe, DEPARTMENT = Research>
Here, John Doe is the entity, DEPARTMENT is the attribute, and Research is the department to which John Doe is assigned. Data are any collection of datum items of this form.
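The split between a data model and data values can be sketched in Python. This is a minimal illustration, not a real database design: the class names `DataModel` and `Datum`, the helper `fits_model`, and the particular attributes given to the TAXPAYERS model (beyond INTEREST INCOME) are assumptions made for the example, and relationships such as REPORTS TO are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataModel:
    """Structure only: an entity class and the attributes of interest."""
    entity_class: str
    attributes: frozenset


@dataclass(frozen=True)
class Datum:
    """One data value assigned to one attribute of one entity."""
    entity: str
    attribute: str
    value: object

    def __str__(self) -> str:
        return f"<{self.entity}, {self.attribute} = {self.value}>"


def fits_model(datum: Datum, model: DataModel) -> bool:
    """A datum fits a model only if the model defines its attribute."""
    return datum.attribute in model.attributes


# The employer's model, using the attributes named in the text.
EMPLOYEES = DataModel(
    "EMPLOYEES", frozenset({"NAME", "DEPARTMENT", "SALARY", "MANAGER"})
)
# A tax authority's model of the same person (attribute choice is assumed).
TAXPAYERS = DataModel(
    "TAXPAYERS", frozenset({"NAME", "SALARY", "INTEREST INCOME"})
)

datum = Datum("John Doe", "DEPARTMENT", "Research")
print(datum)
print(fits_model(datum, EMPLOYEES), fits_model(datum, TAXPAYERS))
```

The same person appears in both models, yet the DEPARTMENT datum has a place only in the employer's model, which is the sense in which different organizations hold different data about the same entity.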
One last point on data. Clearly data, defined this way, are abstract. We do not actually see or touch them. What we actually see when we work with data are data records, which come in an almost unlimited number of forms: paper, computer applications, tables, charts, and so on. The practical importance is that the same data can be presented in many different ways. Choosing the right way to present the data is often as important as selecting the right data.
There are also many approaches to defining information. I find it most powerful to define information not in terms of what it is, but in terms of what it does. To illustrate, suppose you are playing a game of chance with one die. You bet a dollar and select a number, 1 to 6. A dealer then rolls a die, and you either lose your bet or are paid six dollars. You do not get to see the roll. Your chances of winning are roughly one in six, assuming the game is fair. Now consider the following “information” about the next roll:
Scenario A: Someone tells you that the die is loaded and the next roll will come up odd. You will pick 1, 3, or 5, and your chances of winning increase to one in three. You have been informed.
Scenario B: Someone tells you that the die is loaded and will come up odd when it will really come up even. You have been misinformed.
Scenario C: Someone tells you that the dealer is spinning a roulette wheel, not rolling a die. Your chances of winning are greatly reduced, but your understanding of the game comes closer to reality. You will almost certainly try to withdraw your bet. You have been informed.
Scenario D: Someone tells you that the die is red. Nothing changes. You have been neither informed nor misinformed.
Information, then, teaches you about the world. Sometimes it does so by reducing your uncertainty about future events, and other times by enlarging your perspective.5
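The effect of each message on your chances can be worked through with a short Python sketch. It assumes, purely for illustration, that every face the die can actually show is equally likely (so a die loaded to come up odd shows 1, 3, or 5 with equal chance); the function name `win_probability` is hypothetical. Scenario C is omitted because it changes the game itself rather than the odds on a die.

```python
from fractions import Fraction


def win_probability(pick: int, possible_faces) -> Fraction:
    """Chance that `pick` comes up, assuming each possible face is equally likely."""
    faces = set(possible_faces)
    return Fraction(1, len(faces)) if pick in faces else Fraction(0)


ALL_FACES = range(1, 7)
ODD_FACES = (1, 3, 5)
EVEN_FACES = (2, 4, 6)

# No information: any pick wins one time in six.
no_information = win_probability(3, ALL_FACES)
# Scenario A: told "odd", and the die really is loaded to come up odd.
scenario_a = win_probability(3, ODD_FACES)
# Scenario B: told "odd", so you pick an odd number, but the roll will be even.
scenario_b = win_probability(3, EVEN_FACES)
# Scenario D: the color of the die leaves the possible faces unchanged.
scenario_d = win_probability(3, ALL_FACES)

print(no_information, scenario_a, scenario_b, scenario_d)
```

Under these assumptions, being informed doubles your chances, being misinformed drops them to zero, and the irrelevant fact leaves them where they started.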
Two subtleties are frequently important. First, although information can indeed be derived from data, it can arise in other ways as well. A train whistle that warns you of an approaching train is certainly informative, but it is hardly a datum. This book uses the catchall term signals to refer to data, train whistles, and anything else that may be informative. Second, information is intensely personal. For example, the person standing next to you, having seen the approaching train, might view the whistle as an annoying blast, not information.
Recording the data, which involves putting them in the desired place and form, such as in a company database, on a laptop computer, in a special paper file, or even on a scrap of paper. (Note: One could make the case that facts filed in a person’s head count as data, but memory is too ephemeral to qualify as a valid data record.)
Selecting, finding, and accessing the right data needed to complete the specific operation or answer the question at hand.
Presenting data in ways that make it easy for customers to understand and use them. Only in this last step do data and information contribute to internal operations and decisions or come to market.
Thus, like paper clips, data are more complex than they may first appear. This complexity means there are many different ways to make your data uniquely your own and, in so doing, make imitation more difficult. The flip side is that many activities have to go right. A foul-up in any one of them can leave the data completely useless. Further, different people, often in different departments, complete the five jobs just listed. Savvy managers realize that excellence in all five jobs is necessary but not sufficient. The work must be coordinated toward a common end.
Data and Information Are Subtle and Nuanced and Have Become the Organization’s Lingua Franca
One of the first things schoolchildren learn is that each subject has its own special language. They learn about themes in English class, square roots in math, and atoms in science. Some terms are common: a final is the same everywhere. It is only natural that the same thing occurs in business as well. Retailers use the terms SKU and UPC to refer to products. A SKU is a “stock-keeping unit” and refers to an internal numbering system, whereas UPC is the “Universal Product Code” and refers to an external numbering system administered by the standards organization GS1. SKUs and UPCs aim to make it easier for merchants and their suppliers to keep track of the items they sell. Thus, i...