eBook - ePub

Joe Celko's SQL for Smarties

Name: Joe Celko's SQL for Smarties
Author: Joe Celko

Advanced SQL Programming

Joe Celko

Share book

840 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Joe Celko's SQL for Smarties

Advanced SQL Programming

Joe Celko

Book details

Book preview

Table of contents

Citations

About This Book

SQL for Smarties was hailed as the first book devoted explicitly to the advanced techniques needed to transform an experienced SQL programmer into an expert. Now, 10 years later and in the third edition, this classic still reigns supreme as the book written by an SQL master that teaches future SQL masters. These are not just tips and techniques; Joe also offers the best solutions to old and new challenges and conveys the way you need to think in order to get the most out of SQL programming efforts for both correctness and performance. In the third edition, Joe features new examples and updates to SQL-99, expanded sections of Query techniques, and a new section on schema design, with the same war-story teaching style that made the first and second editions of this book classics.

Expert advice from a noted SQL authority and award-winning columnist, who has given ten years of service to the ANSI SQL standards committee and many more years of dependable help to readers of online forums.
Teaches scores of advanced techniques that can be used with any product, in any SQL environment, whether it is an SQL-92 or SQL-99 environment.
Offers tips for working around system deficiencies.
Continues to use war stories--updated!--that give insights into real-world SQL programming challenges.

Frequently asked questions

How do I cancel my subscription?

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

Can/how do I download books?

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

What is the difference between the pricing plans?

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

What is Perlego?

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Do you support text-to-speech?

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Is Joe Celko's SQL for Smarties an online PDF/ePUB?

Yes, you can access Joe Celko's SQL for Smarties by Joe Celko in PDF and/or ePUB format, as well as other popular books in Computer Science & Databases. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Morgan Kaufmann

Year

2010

ISBN

9780080460048

Edition

Topic

Computer Science

Subtopic

Databases

Index

Computer Science

CHAPTER 1

Database Design

THIS CHAPTER DISCUSSES THE DDL (Data Definition Language), which is used to create a database schema. It is related to the next chapter on the theory of database normalization. Most bad queries start with a bad schema. To get data out of the bad schema, you have to write convoluted code, and you are never sure if it did what it was meant to do.

One of the major advantages of databases, relational and otherwise, was that the data could be shared among programs so that an enterprise could use one trusted source for information. Once the data was separated from the programs, we could build tools to maintain, back up, and validate the data in one place, without worrying about hundreds or even thousands of application programs possibly working against each other.

SQL has spawned a whole branch of data modeling tools devoted to designing its schemas and tables. Most of these tools use a graphic or text description of the rules and the constraints on the data to produce a schema declaration statement that can be used directly in a particular SQL product. It is often assumed that a CASE tool will automatically prevent you from creating a bad design. This is simply not true.

Bad schema design leads to weird queries that are trying to work around the flaws. These flaws can include picking the wrong data types, denormalization, and missing or incorrect constraints. As Elbert Hubbard (American author, 1856–1915) put it: “Genius may have its limitations, but stupidity is not thus handicapped.”

1.1 Schema and Table Creation

The major problem in learning SQL is that programmers are used to thinking in terms of files rather than tables.

Programming languages are usually based on some underlying model; if you understand the model, the language makes much more sense. For example, FORTRAN is based on algebra. This does not mean that FORTRAN is exactly like algebra. But if you know algebra, FORTRAN does not look all that strange to you. You can write an expression in an assignment statement or make a good guess as to the names of library functions you have never seen before.

Programmers are used to working with files in almost every other programming language. The design of files was derived from paper forms; they are very physical and very dependent on the host programming language. A COBOL file could not easily be read by a FORTRAN program, and vice versa. In fact, it was hard to share files even among programs written in the same programming language!

The most primitive form of a file is a sequence of records, ordered within the file and referenced by physical position. You open a file, then read a first record, followed by a series of next records until you come to the last record to raise the end-of-file condition. You navigate among these records and perform actions one record at a time. The actions you take on one file have no effect on other files that are not in the same program. Only programs can change files.

The model for SQL is data kept in sets, not in physical files. The “unit of work” in SQL is the whole schema, not individual tables.

Sets are those mathematical abstractions you studied in school. Sets are not ordered, and the members of a set are all of the same type. When you perform an operation on a set, the action happens “all at once” to the entire membership of the set. That is, if I ask for the subset of odd numbers from the set of positive integers, I get all of them back as a single set. I do not build the set of odd numbers by sequentially inspecting one element at a time. I define odd numbers with a rule “If the remainder is 1 when you divide the number by 2, it is odd” that could test any integer and classify it. Parallel processing is one of many, many advantages of having a set-oriented model.

SQL is not a perfect set language any more than FORTRAN is a perfect algebraic language, as we will see. But if you are in doubt about something in SQL, ask yourself how you would specify it in terms of sets, and you will probably get the right answer.

1.1.1 CREATE SCHEMA Statement

A CREATE SCHEMA statement, defined in the SQL Standard, brings an entire schema into existence all at once. In practice, each product has very different utility programs to allocate physical storage and define a schema. Much of the proprietary syntax is concerned with physical storage allocations.

A schema must have a name and a default character set, usually ASCII or a simple Latin alphabet as defined in the ISO Standards. There is an optional AUTHORIZATION clause that holds a <schema authorization identifier> for security. After that the schema is a list of schema elements:

A schema is the skeleton of an SQL database; it defines the structures of the schema objects and the rules under which they operate. The data is the meat on that skeleton.

The only data structure in SQL is the table. Tables can be persistent (base tables), used for working storage (temporary tables), or virtual (VIEWs, common table expressions, and derived tables). The differences among these types are in implementation, not performance. One advantage of having only one data structure is that the results of all operations are also tables, you never have to convert structures, write special operators, or deal with any irregularity in the language.

The <grant statement> has to do with limiting user access to certain schema elements. The <assertion definition> is not widely implemented yet, but it works as a constraint that applies to the schema as awhole. Finally, the <character set definition>, <collation definition>, and <translation definition> deal with the display of data. We are not really concerned with any of these schema objects; they are usually set in place by the DBA (database administrator) for the users, and we mere programmers do not get to change them.

Conceptually, a table is a set of zero or more rows, and a row is a set of one or more columns. Each column has a specific data type and constraints that make up an implementation of an abstract domain. The way a table is physically implemented does not matter, because you only access it with SQL. The database engine handles all the details for you and you never worry about the internals, as you would with a physical file.

In fact, almost no two SQL products use the same internal structures. SQL Server uses physically contiguous storage accessed by two kinds of indexes; Teradata uses hashing; Nucleus (SAND Technology) uses compressed bit vector; Informix and CA-Ingres use more than a dozen different kinds of indexes.

There are two common conceptual errors made by programmers who are accustomed to file systems or PCs. The first is thinking that a table is a file; the second is thinking that a table is a spreadsheet. Tables do not behave like either, and you will get surprises if you do not understand the basic concepts.

It is easy to imagine that a table is a file, a row is a record, and a column is a field. This concept is familiar, and when data moves from SQL to the host language, it must be converted into host language data types and data structures to be displayed and used.

The big differences between working with a file system and working with SQL are in the way SQL fits into a host program. If you are using a file system, your programs must open and close files individually. In SQL, the whole schema is connected to or disconnected from the program as a single unit. The host program might not be authorized to see or manipulate all of the tables and other schema objects, but that is established as part of the connection.

Th...