CHAPTER 1
Introduction
1.1 What Is a Computer Model?
A simple question that deserves a simple answer. My answer is that a computer model is three essential things: a mathematical model and solution approach, an implementation of this model in software, and the data necessary to instantiate the model so as to make it of practical value.
A computer model starts out as a mathematical idea, a way of framing a real-world problem in terms of mathematical objects that we can manipulate to obtain results. A queueing model is a good example. The dynamics of how queues are formed and evolve in any real-world situation are highly complex. Take, for example, a traffic jam on a highway or the queue at an airport check-in counter. But if we abstract the notion of customers arriving to a system as a stochastic process and similarly abstract the service of these customers as another stochastic process, we can begin to model the dynamics of queues using the language of probability theory. By doing so and by making assumptions about the arrival and service processes, we can obtain useful formulas that predict statistical properties of queues such as the average waiting time and the average number in queue. Or when formulas are not available, we can utilize computer simulation to evaluate the performance of these systems.
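To make this concrete, here is a minimal sketch in Python of the best-known special case, the M/M/1 queue (a single server with Poisson arrivals and exponential service times). The specific rates are invented for illustration; the chapter does not assume this particular model.

def mm1_metrics(arrival_rate, service_rate):
    """Steady-state averages for an M/M/1 queue: Poisson arrivals at rate
    arrival_rate, exponential service at rate service_rate, one server."""
    if arrival_rate >= service_rate:
        raise ValueError("Unstable queue: arrival rate must be below service rate.")
    rho = arrival_rate / service_rate        # server utilization
    avg_in_queue = rho ** 2 / (1.0 - rho)    # average number waiting in queue (Lq)
    avg_wait = avg_in_queue / arrival_rate   # average waiting time (Wq), by Little's law
    return rho, avg_in_queue, avg_wait

# Example: 8 arrivals per hour against a service capacity of 10 per hour.
print(mm1_metrics(8.0, 10.0))   # utilization 0.8, about 3.2 in queue, 0.4 hours waiting

When no such formula is available, the same quantities can be estimated by simulating the arrival and service processes directly.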
Another example is a linear programming model of a production system. Modern manufacturing facilities are highly complex, and operating them effectively requires balancing competing objectives: for example, the desire to keep lead times and work-in-process (WIP) low and the desire to make sure that bottleneck work centers are fully utilized. Suppose we wish to provide a plant manager with a tool that will help him decide the right mix of products to produce given a forecast of product demand and constraints on material and capacity. If we make a series of simplifying assumptions about the production process, we can formulate the decision problem that the plant manager faces as a linear programming problem where the objective is, for example, to maximize profit subject to the constraints on material and capacity. The simplifying assumptions require stating the objective and the constraints as linear functions of the decision variables. But once these assumptions are made, the resulting linear programming model can be solved using well-established algorithms, such as the simplex algorithm. The solution to the linear program can then be interpreted as a production plan that the plant manager can understand and use.
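As an illustrative sketch only (the two products, their profit and resource coefficients, and the use of SciPy's linprog routine are my assumptions, not a model from the text), the plant manager's product-mix problem might be set up as follows:

from scipy.optimize import linprog

# Hypothetical two-product mix: maximize 40*x1 + 30*x2 (profit per unit),
# subject to material and machine-hour constraints.
c = [-40.0, -30.0]                 # linprog minimizes, so negate the profits
A_ub = [[2.0, 1.0],                # kg of material used per unit of each product
        [1.0, 3.0]]                # machine hours used per unit of each product
b_ub = [100.0, 90.0]               # material and machine hours available
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
print("production plan:", result.x, "profit:", -result.fun)

The solver handles the algorithmic mechanics; the real modeling work lies in the linearity assumptions encoded in the coefficients above.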
The mathematical formulation of the problem is one of the key first steps in creating a computer model. But it is not the computer model itself. The mathematical model exists separately from its realization in software. In rare cases, a mathematical model may provide practical insight in business contexts without ever being implemented in software. For example, the approximate formula for the average waiting time in a single-server queue can provide useful managerial insight into operations without resorting to a computer program. But in the vast majority of cases today, the mathematical model only becomes useful when it is transformed into software. That process, taking an abstract mathematical model and converting it into a useful software program, is the next major step in the construction of a computer model. The challenges of implementing the model in software are no less significant than those of the mathematical modeling itself. There are a host of issues that the software developer needs to pay attention to, not least the efficiency of the code.
But the computer model is not just the software implementation of a mathematical model. The key final ingredient in a computer model is the data needed to populate it. Without accurate data, the model is just an abstraction. The collection of data is an integral component in bringing a computer model to life and making it truly useful. In this view, data collection is not an ancillary activity to be undertaken after the mathematical model has been formulated and the computer code has been written. It is a core part of the model and needs to be undertaken in concert with the other model-building activities.
1.2 The Process of Computer Modeling
The process of creating a computer model is highly iterative. The basic steps are illustrated in Figure 1.1. The first two steps, formulate and design, involve understanding the business problem that needs to be solved, formulating a mathematical model and solution approach, and designing the software solution. Build/test/collect data involves coding, debugging, verification, and data collection. The validation/analyze step focuses on using the model to analyze the business problem, first to validate the model and then to examine alternative scenarios. The final phase, synthesize, requires interpretation of the model results in the broad business context and layering on additional considerations that the model may not have explicitly addressed.
Figure 1.1 The steps in creating and utilizing a computer model. The process is highly iterative
Formulation and Design
The first step is to be clear on what business questions the model is intended to address. It is not too strong a statement to say that all modeling decisions rest on this understanding, so it is worth spending a good deal of time thinking this through before starting the model building process. It often becomes clear in this process that more than one model may be needed to address all of the questions. That is often a better choice than trying to expand the scope and/or detail of a model so that it can address all questions, and in so doing, creating a model that is too difficult and complex for people to understand and use.
Once the questions the model is intended to address have been articulated, two principal choices need to be made: the scope of the model and the level of detail captured in the model. By scope, I mean the choice of the physical and logical components of the business which we are going to include within the model and those which we are going to exclude. By level of detail, I mean at what level of granularity the physical, logical, and temporal components of the business will be represented in the model. The physical components of the model are those objects that represent physical infrastructure, such as manufacturing locations, distribution centers, or retail outlets. The logical components of the model include operating constraints, decision rules, and behavioral logic. An example of an operating constraint is that a given product must be sourced from a given manufacturing site. An example of a decision rule is: When a piece of equipment in a factory becomes free, what product does it process next? An example of behavioral logic is: When a piece of equipment fails, how long does it fail for, what other equipment does it impact, and what happens when the equipment is repaired? The temporal components of the model include the time granularity represented in the model (e.g. days, weeks, months, or years), the time duration of the model, and the determination of which components of the model will vary with time and which will remain constant.
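One lightweight way to keep these design choices explicit is to record them in a small structure that travels with the model. The sketch below is purely hypothetical; the field names and example values are mine, not a prescribed scheme.

from dataclasses import dataclass, field

@dataclass
class ModelDesign:
    """Hypothetical record of scope and level-of-detail decisions."""
    physical_components: list = field(default_factory=list)   # e.g. plants, DCs, retail outlets
    logical_components: list = field(default_factory=list)    # constraints, decision rules, behaviors
    time_granularity: str = "week"                             # days, weeks, months, or years
    horizon_periods: int = 52                                   # duration of the model
    time_varying: set = field(default_factory=set)             # components allowed to change over time

design = ModelDesign(
    physical_components=["plant A", "plant B", "regional DC"],
    logical_components=["single-sourcing constraint", "dispatch rule", "failure/repair behavior"],
    time_varying={"demand", "capacity"},
)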
When describing the scope of a model, it is important to describe not only what will be included in and excluded from the model, but also the boundary conditions at the interface between what is modeled and what is not. In particular, it is necessary to make clear the events and triggers that occur at the boundaries and serve as sources and sinks for the model. For example, if one were constructing a model of port operations, there are two obvious boundaries to consider: tractor trailers that arrive at the port to discharge or load containers, and ships that arrive at berths to be loaded or unloaded. Performance of the port will depend significantly on the assumptions made about how ships and tractor trailers arrive.
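To show what a boundary assumption can look like in code, the sketch below generates arrivals at a source boundary. Treating truck and ship arrivals as Poisson processes with the rates shown is purely an illustrative assumption, not a claim about how real ports behave.

import random

def poisson_arrival_times(rate_per_hour, horizon_hours, seed=1):
    """Arrival times for a source boundary, assuming a Poisson arrival process
    (independent, exponentially distributed gaps between successive arrivals)."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_hour)   # gap until the next arrival
        if t > horizon_hours:
            return times
        times.append(t)

truck_arrivals = poisson_arrival_times(rate_per_hour=12.0, horizon_hours=24.0)
ship_arrivals = poisson_arrival_times(rate_per_hour=0.1, horizon_hours=24.0, seed=2)
print(len(truck_arrivals), "trucks and", len(ship_arrivals), "ships arrive in 24 hours")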
In reaching a decision on model scope and level of detail, the modeler must grapple with two fundamental tradeoffs that exist in constructing a useful decision model. First, the more detail that is incorporated into a model, the greater its fidelity, but also the more data the model requires as input and the more complicated its physical and logical descriptions become. Second, the greater the scope of the model, the more questions it can be used to address and the larger the potential number of users, but the more complicated the model becomes to design. In general, the price of greater model accuracy and scope is greater time and cost to develop, use, and maintain. There is a point at which a model becomes so complex and difficult to understand that it loses its usefulness. There is also a point, usually reached much earlier, at which adding detail to a model yields diminishing returns in terms of increased model accuracy. Experienced modelers accept this almost as a truism, but there is little beyond anecdote to demonstrate the phenomenon.
The other important point to emphasize is that model design should take data availability into account. Too often, I see models developed where the data required to drive the model is just an afterthought, as though data collection were subordinate to the model development process. In this view, the development of the model can occur before, or in isolation from, the data collection activities. It is precisely such thinking that leads to so many models never being put into use. Without considering the availability of the data necessary to populate a model, a model developer risks building a model that no one can use because the data needed to run it does not exist or requires too much effort or cost to collect. In all of the modeling work I have done, the effort to construct the model has never been the major bottleneck; collecting data almost always is.
The design, development, and testing of the model itself in software, and the collection, cleaning, analysis, and processing of the input data needed to validate and run the model, are separate but closely intertwined activities and must be performed in concert. Model development and data collection should ideally follow a process of triangulation, in which the model developer is made aware at the outset of data that is easily accessible, data that is more difficult but possible to gather, and data that will be impossible to collect given time and budget constraints. This information should be used in designing the model. The model developer may realize that certain data critical to the model does not exist or would be difficult or impossible to gather, and these constraints should be addressed early in the development process. Usually some creative thinking will lead to an approach that circumvents them. At the same time as the model is being designed and developed, efforts should be underway to collect, clean, analyze, and process the data that will be used to populate the model. By examining this data closely and understanding what is readily available and what is not, the model developer can incorporate that knowledge into the design of the model.
Once these basic design choices are made, the modeler can then begin to think about some of the more tactical questions of model building: what the best technology to construct the model is, what specific software should be used, what the user interface should look like, what data will need to be input into the model, and what the principal model outputs should be. There are trade-offs in these decisions as well, and the answers will depend on a host of factors, not least the model builder's familiarity with different modeling techniques and software.
Building the Model
A computer model is software and, as such, its development should follow standard software development practices. My intention is not to rehash these practices, which the reader can find in many books devoted to the subject.1 Having said that, I will add that computer models are often developed on a shoestring budget, with a small development group, and on a short timeline, so rigorously following software development practices becomes difficult or impossible. Shortcuts are inevitable. For example, defining software requirements and creating extensive test cases are activities that are often abbreviated or skipped altogether.2 I am not advocating that these steps be skipped, only acknowledging that constraints often force us to take shortcuts. The ideal model developer is one who is cognizant of best practices in software development but is not a stickler for following them and can judge which shortcuts are necessary or appropriate in a given situation.
The Necessity and Impossibility of Model Verification and Validation
Verification and validation are two critical steps in the development of a computer model. Verification is the process of confirming that the model as implemented in software does what the model designer intends. Validation is the process of confirming that the model is a reasonable representation of the real-world system being modeled. Verification and validation are always necessary but also impossible to perform thoroughly for any complex system.
Model verification is nothing more or less than debugging a piece of software. This starts with the straightforward process of correcting syntactical errors and simple logical errors. But as anyone involved in software quality assurance knows, the process becomes much more challenging after this stage. The problem is that, as with any complex piece of software, the number of possible combinations of inputs to the computer model is enormous. Beyond the combinatorial challenge of enumerating and testing all corner cases, there is the more vexing problem of being able to say with certainty whether the model output, for any given set of inputs, is correct or not. Making sure that the model works correctly in all situations becomes virtually impossible in any finite period of time with finite resources.
Even more troubling is that for a complex model, there may be no easy way of confirming whether the model results are correct or not. Many years ago, I worked for a short period as a product manager for a small software company. The software the company sold was designed to calculate optimal inventory targets for complex manufacturing and distribution networks. Given a description of a supply chain as input, the software would calculate supposedly optimal inventory targets for each stocking location in the network. I say supposedly because the problem the software was solving was a non-linear stochastic optimization problem for which no general solution exists for arbitrary supply chains. To be sure, there are special cases for which optimal solutions can be found, but for general networks there is no closed-form analytical solution. We developed heuristics to solve the problem and implemented these heuristics in software. But verification became a particularly thorny problem.
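One tactic for a situation like this is to test the heuristic against those special cases whose optimal solutions are known analytically. The sketch below is a hypothetical illustration of that idea, using the standard single-location base-stock formula under normally distributed demand; it is not the company's heuristic or its actual test suite.

from statistics import NormalDist

def base_stock_single_location(mean_demand, sd_demand, service_level):
    """Closed-form special case: one stocking location, normal demand.
    The base stock is mean demand plus a safety factor times its standard deviation."""
    z = NormalDist().inv_cdf(service_level)
    return mean_demand + z * sd_demand

def verify_against_special_case(heuristic_target, rel_tolerance=0.01):
    """Compare the heuristic's answer on a one-location network with the analytical one."""
    analytical = base_stock_single_location(mean_demand=100.0, sd_demand=20.0, service_level=0.95)
    return abs(heuristic_target - analytical) <= rel_tolerance * analytical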
For a given test case, if the solution to the problem was not known, how could we verify that the software was outputting a correct solution? Performing the calculation by hand was not an option. Writing separate code to independently confirm the calculations is time-consuming and also leads to a potential infinite regress: if the results of the two pieces of code don't agree, then what? Write a third piece of code? At the end of the day, the criterion we adopted for the many test cases in which the optimal solution was not known analytically was this: Is the solution not obviously wrong? In other words, in examining the solution, could we find something that was obviously wrong with it? If not, the test case passed. (By the way, if the test case did not pass, we were still left with the question of figuring out whether it was the algorithm that was at fault or its implementation in software, another vexing challenge.)
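Checks in the spirit of "is the solution not obviously wrong?" can at least be automated. The sketch below is hypothetical; the field names and thresholds are invented for illustration and are not the checks we actually ran.

def not_obviously_wrong(inventory_targets, weekly_demand):
    """Return a list of human-readable problems; an empty list means the test case passes."""
    problems = []
    for location, target in inventory_targets.items():
        if target < 0:
            problems.append(f"{location}: negative inventory target ({target})")
        demand = weekly_demand.get(location, 0.0)
        # A target wildly out of line with demand is suspicious even if not provably wrong.
        if demand > 0 and target > 50 * demand:
            problems.append(f"{location}: target {target} is more than 50x weekly demand {demand}")
    return problems

print(not_obviously_wrong({"DC east": 120.0, "DC west": -5.0},
                          {"DC east": 40.0, "DC west": 30.0}))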
This criterion was itself obviously flawed (who is to judge what is obviously wrong?), but in the absence of an alternative, it is what we were left with. Needless to say, this did not sit well with me. Consider a customer using the software to make multi-million-dollar inventory investment decisions that will affect customer service levels and profitability. The software tells him the optimal way to allocate inventory in his supply chain. Our criterion for judging that the solution was optimal was that it was not obviously wrong. Caveat em...