
GROUP BY SQL

GROUP BY is a SQL clause that groups rows sharing the same values in one or more columns. It is typically used with aggregate functions such as SUM, COUNT, and AVG to perform calculations on each group. The output contains one row per group, showing the grouping values alongside the calculated values.

Written by Perlego with AI-assistance

7 Key excerpts on "GROUP BY SQL"

eBook - PDF
Beginning SQL
- Paul Wilton, John Colby(Authors)
- 2005(Publication Date)
- Wrox
  (Publisher)
The GROUP BY clause is at its most powerful when used with SQL’s summarizing and aggregating functions, which are covered in the next section. The GROUP BY clause is also very useful with subqueries, a concept examined in Chapter 7. The aim of this section is to get a handle on how GROUP BY works; the next section shows you how to use it more effectively.
Begin by looking at how GROUP BY can answer the question, “Which states do members of the film club live in?” The answer doesn’t require a list of every member and the state they live in; you simply want a list of the specific different states. Use the GROUP BY clause to answer this question, even though strictly speaking SELECT DISTINCT would work just as well:
SELECT State FROM MemberDetails GROUP BY State;
The GROUP BY clause must go after any FROM or WHERE clauses in the SELECT statement. All the columns you want to be grouped must be listed in the GROUP BY column list. For example, the preceding code groups by the State column. If you want to include more than one column in the GROUP BY clause, then separate the columns with commas, in the same way that you would separate columns in a SELECT statement’s column list.
The preceding SQL produces the results shown in the following table. One of the values in the table is NULL, so you end up with one group that is NULL:
State
NULL
Golden State
Mega State
New State
When dealing with a GROUP BY clause, the database system first creates a temporary results set based on the FROM and WHERE clauses. So, in the preceding example, the results set is created from the following SELECT statement:
SELECT State FROM MemberDetails;
The DBMS then uses these results and looks for groups of identical records based on the column or columns specified in the GROUP BY clause. In this case, State is the only column, so it’s just a matter of grouping all the identical states into one row.
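The equivalence between GROUP BY and SELECT DISTINCT described in this excerpt can be checked with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the MemberId column, the sample rows, and the ORDER BY clause (added only to make the row order predictable) are assumptions for illustration and do not come from the book.

```python
import sqlite3

# In-memory database; the schema and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MemberDetails (MemberId INTEGER PRIMARY KEY, State TEXT)")
conn.executemany(
    "INSERT INTO MemberDetails (State) VALUES (?)",
    [("Mega State",), ("Golden State",), (None,), ("New State",),
     ("Mega State",), (None,), ("Golden State",)],
)

# One row per distinct State value; all NULLs collapse into a single group.
grouped = conn.execute(
    "SELECT State FROM MemberDetails GROUP BY State ORDER BY State"
).fetchall()

# SELECT DISTINCT yields the same list for this question.
distinct = conn.execute(
    "SELECT DISTINCT State FROM MemberDetails ORDER BY State"
).fetchall()

print(grouped)
print(grouped == distinct)
```

In SQLite, NULL sorts first in ascending order, so the NULL group appears at the top here; other database systems may place it last.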
eBook - ePub
SQL for Data Analytics
Harness the power of SQL to extract insights from data, 3rd Edition
- Jun Shan, Matt Goldwasser, Upom Malik, Benjamin Johnston(Authors)
- 2022(Publication Date)
- Packt Publishing
  (Publisher)
GROUP BY clause.
Note
To access the source code for this specific section, please refer to https://packt.link/OU9zr .

Aggregate Functions with the GROUP BY Clause

So far, you have used aggregate functions to calculate statistics for an entire column. However, most times you are interested in not only the aggregate values for a whole table but also the values for smaller groups in the table. To illustrate this, refer back to the customers table. You know that the total number of customers is 50,000. However, you might want to know how many customers there are in each state. But how can you calculate this?
You could determine how many states there are with the following query:
SELECT DISTINCT state FROM customers;
You will see 50 distinct states, Washington D.C., and NULL returned as a result of the preceding query, totaling 52 rows. Once you have the list of states, you could then run the following query for each state:
SELECT COUNT(*) FROM customers WHERE state='{state}'
Although you can do this, it is incredibly tedious and can take a long time if there are many states. The GROUP BY clause provides a much more efficient solution.
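To make the contrast concrete, here is a minimal sketch that runs both approaches and compares them. It assumes Python's sqlite3 module and a tiny made-up customers table (the customer_id column and the seven rows are hypothetical, not the book's 50,000-row dataset).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO customers (state) VALUES (?)",
    [("CA",), ("NY",), ("CA",), (None,), ("TX",), ("CA",), ("NY",)],
)

# Tedious approach: list the states, then issue one COUNT query per state.
states = [row[0] for row in conn.execute("SELECT DISTINCT state FROM customers")]
per_state_loop = {}
for state in states:
    if state is None:
        # NULL never matches "=", so it needs IS NULL.
        sql, params = "SELECT COUNT(*) FROM customers WHERE state IS NULL", ()
    else:
        sql, params = "SELECT COUNT(*) FROM customers WHERE state = ?", (state,)
    per_state_loop[state] = conn.execute(sql, params).fetchone()[0]

# GROUP BY approach: a single query returns one (state, count) row per group.
per_state_grouped = dict(
    conn.execute("SELECT state, COUNT(*) FROM customers GROUP BY state")
)

print(per_state_grouped)
```

Both dictionaries hold the same counts, but the grouped version needs one query instead of one per state, and it handles the NULL group without special casing.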

The GROUP BY Clause

GROUP BY is a clause that divides the rows of a dataset into multiple groups based on some sort of key that is specified in the clause. An aggregate function is then applied to all the rows within a single group to produce a single number for that group. The GROUP BY key and the aggregate value for the group are then displayed in the SQL output. The following diagram illustrates this general process:

Figure 4.11: General GROUP BY computational model
In the preceding diagram, you can see that the dataset has multiple groups (Group 1, Group 2, …, Group N). Here, the aggregate function is applied to all the rows in Group 1 and generates the result Aggregate 1. Then, the aggregate function is applied to all the rows in Group 2 and generates the result Aggregate 2
eBook - PDF
Oracle SQL
Jumpstart with Examples
- Gavin JT Powell, Carol McCullough-Dieter(Authors)
- 2004(Publication Date)
- Digital Press
  (Publisher)
HAVING. Filter to remove selected groups from the result, much like the WHERE clause is used to filter rows retrieved by the SELECT statement.
ROLLUP AND CUBE. Further group the summary rows created by the GROUP BY clause to produce groups of groups or super aggregates.
GROUPING SETS. Add filtering and the capability for multiple super aggregates using the ROLLUP and CUBE clauses.
SPREADSHEET. The SPREADSHEET clause allows representation and manipulation of data into a spreadsheet-type format. The SPREADSHEET clause literally allows the construction of a spreadsheet from within SQL. The SPREADSHEET clause will be explained later on in this chapter.
Figure 11.1 The Syntax of the GROUP BY Clause.

11.2 Types of Group Functions

Group functions are different from single-row functions in that group functions work on data in sets, or groups of rows, rather than on data in a single row. For example, you can use a group function to add up all payments made in one month. You can combine single-row and group functions to further refine the results of the GROUP BY clause.
There are many group functions available to use with the GROUP BY clause. Functions operating on groups of rows fall into the following categories:
Aggregate Functions. Functions that summarize data into a single value, such as the MAX function, returning the highest value among the group of rows.
Statistical Functions. These functions are essentially aggregation functions in that they perform explicit calculations on specified groups of rows. However, statistical functions are appropriate to both aggregation and analytics.
Analytic Functions. Functions that summarize data into multiple values based on a sliding window of rows using an analytic clause. These structures are used most frequently in data warehousing to analyze historical trends in data.
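The payments example mentioned above can be sketched as follows. This is an illustration under stated assumptions, not Oracle code: it runs on SQLite through Python's sqlite3 module, the payments table and its rows are invented, and strftime is SQLite's single-row date function (in Oracle one would typically reach for TO_CHAR instead). It combines a single-row function with the group functions SUM and MAX, then uses HAVING to drop groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (paid_on TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("2004-01-05", 100), ("2004-01-20", 50), ("2004-02-03", 30),
     ("2004-03-11", 200), ("2004-03-28", 25)],
)

# strftime works row by row; SUM and MAX work on each month's group of rows.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', paid_on) AS month,
           SUM(amount) AS total,
           MAX(amount) AS largest
    FROM payments
    GROUP BY month
    ORDER BY month
    """
).fetchall()

# HAVING filters whole groups after aggregation, as WHERE filters rows before it.
big_months = conn.execute(
    """
    SELECT strftime('%Y-%m', paid_on) AS month, SUM(amount) AS total
    FROM payments
    GROUP BY month
    HAVING SUM(amount) >= 100
    ORDER BY month
    """
).fetchall()

print(monthly)
print(big_months)
```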
eBook - PDF
Oracle Data Warehouse Tuning for 10g
- Gavin JT Powell(Author)
- 2011(Publication Date)
- Digital Press
  (Publisher)
GROUP BY clause extensions are SQL tuning tools for data warehousing simply because they exist. These extension clauses allow for better performance because they will execute faster than building highly complex SQL statements to cope with this type of functionality. Obviously, simplification of coding saves time and places the burden of performance on the optimizer, not the programmer. Additionally, where complex SQL code might be executed across a network between client and server machines as a highly complex GROUP BY clause, GROUP BY extensions place the processing burden squarely on the shoulders of the server. And, obviously, creating special objects, such as materialized views, provides for reduced I/O activity and lower concurrency requirements.

8.2 GROUP BY Clause Extensions

The GROUP BY clause in its most basic form consists of the GROUP BY clause, any columns or expressions not subjected to aggregation in the query, plus an optional HAVING clause. The HAVING clause allows the filtering out of rows from the resulting aggregated row set. In other words, the HAVING clause allows retention of specific summary rows and exclusion of others.
GROUP BY clause extensions allow operations on aggregations produced by a query with a GROUP BY clause. Let’s begin with the easiest extension clauses, the ROLLUP and CUBE clauses.

8.2.1 The ROLLUP and CUBE Clauses

The ROLLUP clause builds two-dimensional structures, and the CUBE clause builds multiple-dimensional structures.

The ROLLUP Clause

The ROLLUP clause is best suited to build two-dimensional summaries over multiple hierarchical layers. In simple terms, the ROLLUP clause lets you create multiple layers of subtotals within subtotals for all rows in a row set. The result also includes a grand total for all subtotal layers.

ROLLUP Clause Syntax

Figure 8.1 highlights the syntax of the ROLLUP clause.

How the ROLLUP Clause Helps Performance

The ROLLUP clause is very simple.
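To show what "subtotals within subtotals plus a grand total" looks like, the sketch below builds the same rows by hand with UNION ALL. This is deliberately an emulation rather than the ROLLUP clause itself: it runs on SQLite via Python's sqlite3 module, and SQLite does not offer ROLLUP. The sales table and its data are hypothetical. In a database that supports the extension, such as Oracle, the three SELECT statements would collapse into a single GROUP BY ROLLUP (region, product).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Books", 10), ("East", "Books", 5), ("East", "Music", 20),
     ("West", "Books", 7), ("West", "Music", 3)],
)

# Hand-written equivalent of the rows a rollup over (region, product) yields:
# detail rows, one subtotal per region, and one grand total.
rollup_rows = conn.execute(
    """
    SELECT region, product, SUM(amount) AS total
    FROM sales GROUP BY region, product
    UNION ALL
    SELECT region, NULL, SUM(amount)
    FROM sales GROUP BY region
    UNION ALL
    SELECT NULL, NULL, SUM(amount)
    FROM sales
    """
).fetchall()

for row in rollup_rows:
    print(row)
```

The hand-written version scans the table three times and grows with every extra grouping level, which illustrates the excerpt's point that the extension simplifies the code and leaves the work to the optimizer.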
eBook - ePub
Mastering PostgreSQL 13
Build, administer, and maintain database applications efficiently with PostgreSQL 13, 4th Edition
- Hans-Jürgen Schönig(Author)
- 2020(Publication Date)
- Packt Publishing
  (Publisher)
Handling Advanced SQL

In Chapter 3, Making Use of Indexes, you learned about indexing, as well as about PostgreSQL's ability to run custom indexing code to speed up queries. In this chapter, you will learn about advanced SQL. Most of the people who read this book will have some experience of using SQL. However, experience has shown that the advanced features outlined in this book are not widely known, and therefore it makes sense to cover them in this context to help people to achieve their goals faster and more efficiently. There has been a long discussion about whether the database is just a simple data store or whether the business logic should be in the database. Maybe this chapter will shed some light and show how capable a modern relational database really is. SQL is not what it used to be back when SQL-92 was around. Over the years, the language has grown and become more and more powerful.
This chapter is about modern SQL and its features. A variety of different and sophisticated SQL features are covered and presented in detail. We will cover the following topics in this chapter:

Introducing grouping sets

Using ordered sets

Understanding hypothetical aggregates

Utilizing windowing functions and analytics

Writing your own aggregates

By the end of this chapter, you will understand and be able to use advanced SQL.

Introducing grouping sets

Every advanced user of SQL should be familiar with the GROUP BY and HAVING clauses. But are they also aware of CUBE, ROLLUP, and
eBook - PDF
Understanding Databases
Concepts and Practice
- Suzanne W. Dietrich(Author)
- 2021(Publication Date)
- Wiley
  (Publisher)
There can be multiple grouping attributes as illustrated by the query sql_CourseOfferingCount that computes the count of the number of employees who took a course on a date. The count computes the number of tuples in the grouping that includes both cID and tDate. As a generalization with grouping, the select clause contains the grouping attributes specified in the group by clause and the columns representing the desired aggregation for that group.
sql_CourseOfferingCount:
select T.cID, T.tDate, count(*) as emptookoffering
from takes T
group by T.cID, T.tDate
This introductory text follows the semantics for grouping in the SQL standard, requiring that the non-aggregate columns in the select clause must appear in the group by clause as the only grouping attributes. This requirement makes sense because the query is asking for an aggregate result for the same values of those grouping attributes.
A having clause specifies a filtering condition on the result of the grouping. SQL also supports the ability to place a selection condition on the results of grouping using the having clause. Essentially, a having clause specifies a filter, similar to a where clause, for the group. What if you only wanted the titles with the count of the number of employees who took a course such that there are at least four such employees with that title? This query can be answered by appending the following having clause to sql_CountByTitleTookCourses:
having emptookcoursescount >= 4
Note that some database products do not recognize the renaming of the attribute within the having clause. In that case, the aggregate operator must be used again:
having count(distinct T.eID) >= 4
Note that the having clause is just a shortcut, allowing the use of one query instead of using a second query to filter the result of the grouping. Also, the query can be rewritten with the aggregation query specified as an inline view using a where to filter the result.
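The grouping query and the two filtering styles discussed in this excerpt can be tried with the following sketch. It assumes Python's sqlite3 module and a made-up takes table with only the three columns the excerpt names (eID, cID, tDate); the rows and the threshold of 2 are chosen for illustration, and the having condition is applied to sql_CourseOfferingCount rather than to sql_CountByTitleTookCourses, whose full text is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE takes (eID TEXT, cID TEXT, tDate TEXT)")
conn.executemany(
    "INSERT INTO takes VALUES (?, ?, ?)",
    [("e1", "c1", "2021-01-10"), ("e2", "c1", "2021-01-10"),
     ("e3", "c1", "2021-02-15"), ("e1", "c2", "2021-03-01"),
     ("e2", "c2", "2021-03-01"), ("e3", "c2", "2021-03-01")],
)

# Two grouping attributes: one result row per (cID, tDate) combination.
offering_counts = conn.execute(
    """
    select T.cID, T.tDate, count(*) as emptookoffering
    from takes T
    group by T.cID, T.tDate
    order by T.cID, T.tDate
    """
).fetchall()

# Filter the groups with having, repeating the aggregate for portability.
with_having = conn.execute(
    """
    select T.cID, T.tDate, count(*) as emptookoffering
    from takes T
    group by T.cID, T.tDate
    having count(*) >= 2
    order by T.cID, T.tDate
    """
).fetchall()

# Same result with the aggregation as an inline view filtered by where.
with_inline_view = conn.execute(
    """
    select G.cID, G.tDate, G.emptookoffering
    from (select T.cID, T.tDate, count(*) as emptookoffering
          from takes T
          group by T.cID, T.tDate) G
    where G.emptookoffering >= 2
    order by G.cID, G.tDate
    """
).fetchall()

print(offering_counts)
print(with_having == with_inline_view)
```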
SQL Pocket Primer
- Oswald Campesato(Author)
- 2022(Publication Date)
- Mercury Learning and Information
  (Publisher)
Listing 3.5 is that you can generalize the result by adding an arbitrary number of departments, or by changing the number of values that you want from each classroom, or both. Try replacing the number 3 by 1, 2, 4, 5, or any other positive integer and verify that the output of the modified SQL statement is correct.

SQL AND HISTOGRAMS

A histogram in SQL refers to a SQL statement that displays the distribution (i.e., frequency) of items in a database table. For example, we can display the contents of the item_desc table as follows:

select * from item_desc;
+---------+-------------+------------+
| item_id | item_desc   | item_price |
+---------+-------------+------------+
|     100 | hammer      |      20.00 |
|     200 | screwdriver |       8.00 |
|     300 | wrench      |      10.00 |
+---------+-------------+------------+
3 rows in set (0.001 sec)

We display only the values of the item_price attribute in the item_desc table as follows:

select item_price from item_desc;
+------------+
| item_price |
+------------+
|      20.00 |
|       8.00 |
|      10.00 |
+------------+
3 rows in set (0.000 sec)

The next portion of this chapter contains examples of SQL statements that specify each of the clauses ORDER BY, GROUP BY, and HAVING, followed by examples that use a combination of these SQL clauses. For simplicity, the SQL queries in the upcoming sections are based on a single table; however, you can generate more sophisticated reports that contain JOIN clauses that involve multiple tables.

WHAT ARE GROUP BY, ORDER BY, AND HAVING CLAUSES?

The GROUP BY clause enables you to count items that are “grouped together” based on the same attribute value. For example, the following SQL statement counts the number of occurrences of the same city value in the weather table:

SELECT city, COUNT(city)
FROM weather GROUP BY city;
+------+-------------+
| city | count(city) |
+------+-------------+
| sf   |           7 |
| se   |           1 |
|      |           2 |
| chi  |           1 |
+------+-------------+
4 rows in set (0.003 sec)
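One detail worth checking in a query like this is how blank and missing city values are counted. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module, a one-column weather table, and invented rows that include both an empty string and a NULL city. COUNT(city) skips NULL values, while COUNT(*) counts every row in the group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?)",
    [("sf",), ("sf",), ("sf",), ("se",), ("",), ("",), (None,)],
)

# COUNT(city) ignores NULLs; COUNT(*) counts all rows in each group.
city_counts = conn.execute(
    """
    SELECT city, COUNT(city), COUNT(*)
    FROM weather
    GROUP BY city
    ORDER BY city
    """
).fetchall()

for row in city_counts:
    print(row)
```

The empty-string group reports the same number from both counts, whereas the NULL group reports 0 from COUNT(city) and 1 from COUNT(*). A blank city with a nonzero COUNT(city), as in the output above, is therefore consistent with empty strings rather than NULLs.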

The ORDER BY clause enables you to specify the order in which items are displayed. For example, the following SQL statement counts the number of occurrences of the same city name in the weather table and also orders the output alphabetically by city

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
