
- 309 pages
- English
- ePUB (mobile friendly)
- Available on iOS & Android
eBook - ePub
Joe Celko's Data, Measurements and Standards in SQL
About this book
Joe Celko has looked deep into the code of SQL programmers and found a consistent and troubling pattern - a frightening lack of consistency between their individual encoding schemes and those of the industries in which they operate. This translates into a series of incompatible databases, each one an island unto itself that is unable to share information with others in an age of internationalization and business interdependence. Such incompatibility severely hinders information flow and the quality of company data.Data, Measurements and Standards in SQL reveals the shift these programmers need to make to overcome this deadlock. By collecting and detailing the diverse standards of myriad industries, and then giving a declaration for the units that can be used in an SQL schema, Celko enables readers to write and implement portable data that can interface to any number of external application systems! This book doesn't limit itself to one subject, but serves as a detailed synopsis of measurement scales and data standards for all industries, thereby giving RDBMS programmers and designers the knowledge and know-how they need to communicate effectively across business boundaries.
- Collects and details the diverse data standards of myriad industries under one cover, thereby creating a definitive, one-stop-shopping opportunity for database programmers
- Enables readers to write and implement portable data that can interface to any number external application systems, allowing readers to cross business boundaries and move up the career ladder
- Expert advice from one of the most-read SQL authors in the world who is well known for his ten years of service on the ANSI SQL standards committee and Readers Choice Award winning column in Intelligent Enterprise
Frequently asked questions
Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn more here.
Perlego offers two plans: Essential and Complete
- Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
- Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, weâve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere â even offline. Perfect for commutes or when youâre on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access Joe Celko's Data, Measurements and Standards in SQL by Joe Celko in PDF and/or ePUB format, as well as other popular books in Computer Science & Databases. We have over one million books available in our catalogue for you to explore.
Information
Part 1. History, Standards, and Designing Data
The first of this book discusses the principles of designing data encoding schemes and some of the history of standardization. This second part will give examples of actual standards used in a variety of industries.
In the first six chapters, I look at the foundations from the view point of a database designer who needs some understanding of the how and why.
I find it odd that database designers are very physical about their data and do not work with many abstractions. They were never taught the theory of scales and measurements. They have only a minimal knowledge of validation, verification, and risk of error as a part of the data.
Check digit algorithms are taught as single âprogramming tricksâ in undergraduate computer science classes rather than a mathematical discipline.
To the best of my knowledge, I am the only person who teaches Data Encoding Schemes in an orderly fashion.
The failure of cowboy coders to use standards leads to problems. The homegrown encoding schemes have to be maintained internally. A standard is maintained for you. It is usually maintained by an organization devoted to that standard and with subject area experts who you could not hire.
The days of isolated databases are long gone. You can exchange data with other organizations or buy it from companies when it is standardized. Would you rather buy census data on magnetic tapes or conduct the census yourself?
You can read data and understand it because you know the units of measure.
Chapter 1. Scales and Measurements
âIn physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.â
PLA, Vol. 1, Electrical Units of Measurement, 1883-05-03
Before you can put data into a database, you actually need to think about how it will be represented and manipulated. Most programmers have never heard of measurement theory or thought about the best way to represent their data. They either use whatever was there before or invent their own schemes on the fly. Most of the time, the data is put into the database in the units in which it was collected without regard to even a quick validation. It is assumed the input is in an appropriate unit, with appropriate scale and precision. In short, application programmers and users are perfect. This tendency to believe the computer, no matter how absurd the data, is called the âGarbage In, Gospel outâ principle in IT folklore.
This unwillingness to do validation and verification is probably the major reason for the lack of data quality.
1.1. Measurement Theory
âMeasure all that is measurable and attempt to make measurable that which is not yet so.â
âGalileo (1564â1642)
Measurement theory is a branch of applied mathematics that is useful in data analysis and database design. Measurements are not the same as the attribute being measured. Measurement is not just assigning numbers to things or their attributes so much as it is finding a property in things that can be expressed in numbers or other computable symbols. This structure is the scale used to take the measurement; the numbers or symbols represent units of measure.
Strange as it might seem, measurement theory came from psychology, not mathematics, statistics, or computer science. S. S. Stevens originated the idea of levels of measurement and classification of scales in 1946 for psychology testing. This is more recent than you would have thought. Scales are classified into types by the properties they do or do not have. The properties with which we are concerned are the following.
1. There is a natural origin point on the scale. This is sometimes called a zero, but it does not literally have to be a numeric zero. For example, if the measurement is the distance between objects, the natural zero is zero metersâyou cannot get any closer than that. If the measurement is the temperature of objects, the natural zero is absolute zeroânothing can get any colder. However, consider time; it goes from an eternal past into an eternal future, so you cannot find a natural origin for it.
2. Meaningful operations can be performed on the units. It makes sense to add weights together to get a new weight. Adding temperatures has to consider mass. Dates can be subtracted to give a duration in days. However, adding names or shoe sizes together is absurd.
3. There is a natural ordering to the units. It makes sense to speak about events occurring before or after one another in time or a physical object being heavier, longer, or hotter than another object.
But the alphabetical order imposed on a list of names is arbitrary, not naturalâa foreign language, with different names for the same objects, would impose another alphabetical ordering. And that assumes the other language even had an alphabet for an ordering; Chinese, for example, does not.
4. There is a natural metric function on the units. A metric function has nothing to do with the âmetric systemâ of measurements, which is more properly called SI, for âSystemĂ© International dâunitsâ in French. Metric functions have the following three properties:
a. The metric between an object and itself is the natural origin of the scale. We can write this in a notation as M(a, a) = 0.
b. The order of the objects in the metric function does not matter. Again in the semimathematical notation, M(a, b) = M(b, a).
c. There is a natural additive function that obeys the rule that M(a, b) + M(b, c) > = M(a, c), which is also known as the triangular inequality.
This notation is meant to be more general than just arithmetic. The âzeroâ in the first property is the origin of the scale, not just a numeric zero. The third property, defined with a âplusâ and a âgreater than or equalâ sign, is a symbolic way of expressing general ordering relationships. The âgreater than or equalâ sign refers to a natural ordering on the attribute being measured. The âplusâ sign refers to a meaningful operation in regard to that ordering, not just arithmetic addition.
The special case of the third property, where the âgreater than or equal toâ is always âgreater than,â is very desirable to people because it means that they can use numbers for units and do simple arithmetic with the scales. This is called a strong metric property. For example, human perceptions of sound and light intensity follow a cube root lawâthat is, if you double the intensity of light, the perception of the intensity increases by only 20% (Stevens 1957). The actual formula is âPhysical intensity to the 0.3 power equals perceived intensityâ in English. Knowing this, designers of stereo equipment use controls that work on a logarithmic scale internally but that show evenly spaced marks on the control panel of the amplifier.
It is possible to have a scale that has any combination of the metric properties. For example, instead of measuring the distance between two places in meters, you can measure it in units of effort. This is the old Chinese system, which had uphill and downhill units of distance, so you could estimate the time required to make a journey on foot.
Does this system of distances have the property that M(a, a) = 0? Yes; it takes no effort to get to where you are already located. Does it have the property that M(a, b) = M(b, a)? No; it takes less effort to go downhill than to go uphill. Does it have the property that M(a, b) + M(b, c) >= M(a, c)? Yes with the direction considered; the amount of effort needed to go directly to a place will always be less than the effort of making another stop along the way.
As you can see, these properties can be more intuitive than mathematical. Obviously, we like the more mathematical side of this model because it fits into a database, but you have to be aware of the intuitive side.
1.1.1. Range, Granularity, and Your Instruments
âThe only man who behaves sensibly is my tailor; he takes my measurements anew every time he sees me, while all the rest go on with their old measurements and expect me to fit them.â
âGeorge Bernard Shaw
Range and granularity are properties of the way the measurements are made. Since we have to store data in a database within certain limits, they are very important to a database designer. The type of scales is unrelated to whether you use discrete or continuous variables. While measurements in a database are always discrete due to finite precision, attributes can be conceptually either discrete or continuous regardless of measurement level. Temperature is usually regarded as a continuous attribute, so temperature measurement to the nearest degree Celsius is a ratio-level measurement of a continuous attribute.
![]() |
| â Figure 1-1:. |
| Micrometer (http://www.design-technology.org/micrometer.webp). |
However, quantum mechanics holds that the universe is fundamentally discrete, so temperature may actually be a discrete attribute. In ordinal scales for continuous attributes, ties are impossible (or have probability zero). In ordinal scales for discrete attributes, ties are possible. Nominal scales usually apply to discrete attributes. Nominal scales for continuous attributes can be modeled but are rarely used.
Aside from these philosophical considerations, there is the practical aspect of the instrument used for the measurement. A radio telescope, surveyorâs transit, meter stick, and a micrometer are tools that measure distance. Nobody would claim that they are interchangeable. I can use a measuring tape to fit furniture in my house but not to make a mechanical wristwatch or to measure the distance to the moon.
From a purely scientific viewpoint, measurements should be reduced to the least precise instrumentâs readings. This means that you can be certain that the final results of calculations can be justified.
From a practical viewpoint, measurements are often adjusted by statistical considerations. This means that final results of calculations will be closer to realityâassuming that the adjustments were valid. This is particularly true for missing data, which we will discuss later.
But for now consider the simple example of a database showing that Joe Celko bought 500 bananas this week. Unless I just started a gorilla ranch, this is absurd and probably ought to be adjusted to five bananas or less. On the other hand, if the Dairy Queen Company orders five bananas this week, this is absurd. They are a corporation that had about 6000 restaurants in the United States, Canada, and 20 foreign countries in 2007, all of which make a lot of banana splits every day.
1.1.2. Range
A scale also has other properties that are of interest to someone building a database. First, scales have a rangeâwhat are the highest and lowest values that can appear on the scale? It is possible to have a finite or an infinite limit on either the lower or the upper bound. Overflow ...
Table of contents
- Cover image
- Table of Contents
- Copyright
- Introduction
- Part 1. History, Standards, and Designing Data
- Chapter 1. Scales and Measurements
- Chapter 2. Validation
- Chapter 3. Data Encoding Schemes
- Chapter 4. Scales
- Chapter 5. Data with Ignorance
- Chapter 6. Keys
- Part 2. A Sampling of Standards
- Chapter 7. Dates
- Chapter 8. Sex Codes
- Chapter 9. Ethnicity and Race Codes
- Chapter 10. ISO-3166 and Other Country Codes
- Chapter 11. Language Codes
- Chapter 12. Currency Codes
- Chapter 13. National Identification Numbers
- Chapter 14. Occupations
- Chapter 15. Colors
- Chapter 16. Telephone Numbers
- Chapter 17. E-Mail Addresses
- Chapter 18. Universal Postal Union
- Chapter 19. Hierarchical Triangular Mesh
- Chapter 20. Shoe Sizes
- Chapter 21. International Clothing Sizes
- Chapter 22. ICD Codes
- Chapter 23. Vehicle Identification Number (VIN)
- Chapter 24. Freight Containers
- Chapter 25. Credit Card Numbers
- Chapter 26. SWIFT and Related Banking Standards
- Chapter 27. Data Universal Numbering System
- Chapter 28. Global Trade Item Number
- Chapter 29. Digital Object Identifier (DOI)
- Chapter 30. Audiovisual Media
- Chapter 31. ISIN and Related Securities Identifiers
- Chapter 32. Temperature Scales
- Chapter 33. National Animal Identification System (NAIS)
- Chapter 34. ISO 216 Paper Sizes (âA,â âB,â and âCâ Series)
- Chapter 35. Compass Points
- Chapter 36. Unicode
- Chapter 37. Driverâs Licenses
- Chapter 38. Currency Units and Near Money
- Chapter 39. Recipes and Food Preparation
- Chapter 40. Portable Document Format (PDF)
- Chapter 41. Temporal Data
- Chapter 42. Additive Congruential Generators
- Chapter 43. Traditional and Metric Typographic Units
- Index
