1 Definition of chemometrics
The subject matter of this book is chemometrics, a term coined in 1972, which can be defined as the chemical discipline that uses mathematical, statistical, and other methods employing formal logic (a) to design or select optimal measurement procedures and experiments, and (b) to provide maximum relevant chemical information by analyzing chemical data.
Chemometrics has found widespread application in analytical chemistry, and that is essentially what this book is about. At the same time, it is also a book about the essentials of analytical chemistry. If one leaves out the words “mathematical, statistical, and other methods employing formal logic” from the definition, one observes that chemometrics is really about what all analytical chemists try to do, namely to design optimal analytical procedures and to obtain as much information as possible from the results. Since chemometrics does this with the help of mathematical methods, it has evolved into the theoretical cornerstone of what we will call the analytical process, i.e. the reasoning followed by the analyst to select and optimize procedures, to carry them out efficiently, and to interpret the results correctly, with a maximum of relevant information as the end product.
2 The intelligent laboratory concept
Figure 1 gives a general picture of the analytical process. Further detail is given in Fig. 2.
Fig. 1 The main steps in the analytical process. Chemometrics is concerned only with steps 1 and 3.
Fig. 2 A more detailed view of the analytical process. The numbers in the boxes relate to Fig. 1.
The analytical process starts with a problem that has to be solved, and solving it requires chemical information. It is the task of the analytical chemist to provide this. His first step will be to select a method. Let us suppose he has to assess the quality of a certain foodstuff and, to do this, needs to determine its trace element content. He then has to decide whether he will use atomic absorption spectrometry (AAS), neutron activation analysis, inductively coupled plasma emission, or some other method. If he chooses AAS, his next decision will be between a flame and a flameless method and, if he selects the latter, which ashing temperature or temperature gradient to use. He will also have to decide on the pretreatment of the sample (wet ashing or low-temperature ashing, extraction or not, what kind of extraction, etc.). All this will lead him to an initial procedure. Very probably, this procedure will not be the final one. The analytical chemist then tries to improve the initial procedure by experimental optimization. For instance, he changes the pH of the buffer to obtain a more complete extraction, and the drying temperature in the oven to obtain more reproducible results.
The procedure is now available and the analysis can begin. The analysis is usually divided into two parts, the pretreatment and the actual determination. The pretreatment consists of operations such as weighing, extraction, drying, and centrifugation. This step is often the most difficult and time-consuming one, and it determines the quality and efficiency of the method.
The result of the determination is a (usually electrical) signal, which nowadays is retrieved from the instrument by a computer. Very often, this signal is first treated to make it more useful, for instance by reducing noise, and it is then translated into chemical information, i.e. a list of chemical identities and concentrations. To achieve this translation, models describing the relationship between signal and concentration or identity, such as calibration models, are required. Some analysts stop here, but one should remember that the analysis was carried out to solve a problem. This means that the chemical information should be translated into user or diagnostic information. Is the foodstuff acceptable for consumption? Does the analysis of an air sample indicate that a certain industry is responsible for air pollution at the collection point? Do the results of a patient’s blood tests indicate a certain disease? While the answer may be simple in some cases (for example, the foodstuff contains chemical X in excess of dose Y and therefore violates legal rules), in other cases it may be much more complex and require the application of more elaborate mathematical techniques.
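The chain from signal to chemical information to user information can be illustrated with a minimal sketch, assuming the simplest possible calibration model: a straight line fitted by ordinary least squares to a few standards, then inverted to turn a measured signal into a concentration, which is finally compared with a limit. The standard concentrations, signals, the limit value, and the function names `fit_line`, `predict_conc` and `exceeds_limit` are all hypothetical and serve only as an example; real calibration work also has to consider uncertainty, weighting and linearity checks.

```python
from statistics import mean

# Hypothetical calibration standards (concentration in arbitrary units)
# and their measured instrument signals.
STD_CONC = [0.0, 1.0, 2.0, 3.0, 4.0]
STD_SIGNAL = [0.02, 0.21, 0.39, 0.61, 0.80]


def fit_line(conc, signal):
    """Ordinary least-squares fit of the model signal = b0 + b1 * conc."""
    xbar, ybar = mean(conc), mean(signal)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(conc, signal))
    sxx = sum((x - xbar) ** 2 for x in conc)
    b1 = sxy / sxx          # slope (sensitivity)
    b0 = ybar - b1 * xbar   # intercept (blank signal)
    return b0, b1


def predict_conc(b0, b1, signal):
    """Invert the calibration line: chemical information from a signal."""
    return (signal - b0) / b1


def exceeds_limit(conc, limit):
    """User information: does the concentration exceed a (hypothetical) limit?"""
    return conc > limit


b0, b1 = fit_line(STD_CONC, STD_SIGNAL)
sample_conc = predict_conc(b0, b1, 0.504)
print(f"slope={b1:.3f}, intercept={b0:.3f}, sample={sample_conc:.2f}")
print("exceeds limit:", exceeds_limit(sample_conc, 2.0))
```

With these made-up standards the fitted line has slope 0.196 and intercept 0.014, so a sample signal of 0.504 corresponds to a concentration of 2.5 units, which exceeds the assumed limit of 2.0.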
The analytical process described in this way can be considered as a system regulated by two feedback loops (Fig. 3). The first, internal to the laboratory, is the quality evaluation loop. Its purpose is to verify whether the performance of the method is good enough to achieve the analytical purpose for which it was developed and carried out. This loop requires the definition and evaluation of performance criteria (is the method “good” enough?) and the development of quality control schemes (does the method remain good enough when it is carried out repeatedly or continuously over a period of time?).
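A quality control scheme of the kind mentioned above can take many forms; one of the simplest is a Shewhart-type control chart. The minimal sketch below assumes that limits are set at the mean plus or minus three standard deviations of a series of reference measurements (for example, repeated analyses of a control sample), and that any later result outside those limits is flagged. The reference values, results and function names are hypothetical, and practical schemes usually add further decision rules.

```python
from statistics import mean, stdev


def control_limits(reference, k=3.0):
    """Return (lower limit, centre line, upper limit) as mean -/+ k * s."""
    centre = mean(reference)
    s = stdev(reference)  # sample standard deviation of the reference series
    return centre - k * s, centre, centre + k * s


def out_of_control(results, limits):
    """Indices of results falling outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(results) if x < lcl or x > ucl]


# Hypothetical repeated analyses of a control sample used to set the limits.
REFERENCE = [10.0, 10.2, 9.8, 10.1, 9.9]
# Hypothetical control-sample results obtained later during routine use.
RESULTS = [10.05, 9.7, 10.6, 9.4]

limits = control_limits(REFERENCE)
print("limits:", [round(v, 3) for v in limits])
print("out of control at positions:", out_of_control(RESULTS, limits))
```

Here the limits come out at about 9.526 and 10.474, so the third and fourth results (positions 2 and 3) would signal that the method may no longer be performing well enough.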
Fig. 3 The analytical process and its environment. The numbers and letters in the boxes relate to Fig. 2.
The second loop (the decision loop in Fig. 3) is the interaction with the outside world. The analytical results (hopefully) serve to solve a problem for, or to make a decision by, the person or organisation that asked for the results in the first place. This usually leads to new questions or, when the results did not bring the expected solution to the problem, to a better formulation of the question. In many instances, the analytical results also serve to control some process and the characteristics of the process determine the required characteristics of the analytical method.
Chemometrics is involved in steps 1a, 1b, 3a, 3b and 3c, and in both control loops. Practical chemometrics is a matter of carrying out computations, which means that a computer is involved in each of these steps. The same is true of steps 2a and 2b: more and more instruments are now connected to a computer (2a), and robots, which are really computers with a hand, are often used to carry out the pretreatment step (2b). We therefore conclude that all the steps of Fig. 2 are computer-compatible. It is our belief that the separate functions of Figs. 2 and 3 will gradually be integrated and controlled by a central laboratory information system. When this integration has been achieved, an intelligent laboratory will have been developed. It will be able to select and optimize a procedure by itself, carry it out, extract the relevant information, check that it is functioning correctly, and help in making decisions.
The integrated intelligent laboratory described in this way will rely heavily on software, and the purpose of this book is to give the formal and mathematical background of the algorithms and techniques it uses. In fact, an alternative definition of chemometrics in analytical chemistry could be that it is the chemical discipline that studies mathematical, statistical and other methods employing formal logic to achieve the development of an integrated intelligent laboratory as described in Fig. 3. Technical software problems are not discussed. For instance, although we consider that robotics may become an important part of the intelligent laboratory, its development is mainly a question of hardware and software technology, and therefore there will be no chapter on robotics in this book. There is also much interest nowadays in expert systems. These will certainly be useful in those steps where the analytical chemist applies expertise, such as the development of the analytical procedure or the interpretation of spectroscopic data (structural analysis). However, again, this is mainly a problem of software and knowledge engineering, which we believe to be beyond the scope of a textbook on chemometrics today.