Sometimes we need to operate on the results of several queries as if they were sets. We do not have to use a programming language to do these operations on the data, but we can use constructs from the SQL language. The operators available allow us to both join and exclude the results two sets, as well as define the intersection. With some examples we will understand how to use these constructs appropriately.

Share

Reading time: 4 minutes

Thanks to the knowledge gained in the previous articles, SELECT: simple query structure, SELECT: query with JOIN and GROUP BY, SELECT: IN operator and SELECT: NOT IN operator and tuple constructor, we are able to write quite complex queries. In fact, we can define groups, calculate statistics, include and/or exclude records based on the result of other queries. We really have a lot of tools! The SQL language, however, provides us with even more features and constructs to increase our productivity and the possibilities of extracting data of interest with very particular constraints. In this article we will address the use of set operators that allow us to join the results of two queries using set theory.

As in previous articles, we will use the now well-known pattern related to movie showings in theaters. We quote it below for those who have not read the other articles.

FILM (CodF, Title, Release_Date, Genre, DurationMinutes)
CINEMA (CodC, Name, Address, City, Website*)
ROOM (CodC, RoomNumber, Capacity)
PROJECTION (CodC, RoomNumber, Date, StartTime, EndTime, CodF)

As a reminder, the notation used to describe the schema uses underlining to indicate the fields that make up the primary key and an asterisk to indicate the possibility of assigning NULL values to the field. Finally, for simplicity, foreign keys have the same name as the primary keys to which they refer. Let us then proceed to look at the various operators.

Set operators

The set operators are as follows:

  • UNION
  • INTERSECT
  • EXCEPT

They have the function of putting together the results that result from two queries. However, they have a special feature: they require the result patterns to be compatible with each other. Therefore, unlike the IN and NOT IN operators that put two queries “in communication,” these operate similarly if and only if the selected attributes are the same. Let us analyze each operator in more detail to understand their peculiarities.

UNION

This operator performs the union of the two relational expressions A and B, that is, the results of two SELECT statements. The syntax is as follows:

				
					A UNION [ALL] B
				
			

where the ALL parameter is optional. In case it is omitted duplicates of selected rows are removed. Otherwise all results are returned to the user. Let us better understand by means of an example how it works.

Suppose we want to find the names of movie theaters in Turin or that have at least one theater with a capacity greater than 200. We again divide the problem into two subproblems. The first will be related to finding all cinemas in Turin and is solved by a very simple query.

				
					SELECT Name
FROM CINEMA
WHERE City=’Turin’;
				
			

The second problem, on the other hand, concerns the identification of cinemas with at least one hall with a capacity greater than 200. We must first filter the rows of the ROOM table by the value of the capacity attribute. After that, since we need the name of the cinema it will be sufficient to perform a join with the CINEMA table. The final query will be as follows:

				
					SELECT Name
FROM CINEMA C, ROOM R
WHERE C.CodC=R.CodC
AND Capacity > 200;
				
			

Now all we have to do is put the two results together. Since we have selected only the Name attribute in both queries, the patterns are by definition compatible. We can, therefore, apply the UNION operator so as to obtain the desired result. Obviously if there were a cinema that is located in Turin and has a hall with a capacity greater than 200 it would appear in both results of the two queries just written. To prevent it from being repeated simply omit the ALL parameter. The final query will be:

				
					SELECT Name
FROM CINEMA
WHERE City=’Turin’

UNION

SELECT Name
FROM CINEMA C, ROOM R
WHERE C.CodC=R.CodC
AND Capacity > 200

				
			

INTERSECT

As the name implies, this operator performs an intersection of the results of two SELECT statements. Given, therefore, two queries A and B, its syntax is

				
					A INTERSECT B

				
			

Let us try to apply this to a concrete example. Suppose we want to find the dates on which movies were released and also some movies were shown in the theaters we have in our database. Therefore, we write two queries that select the release dates of the movies and the screening dates. The intersection between these two sets will be the ones we want to find. The first query will return the release dates from the FILM table, while the second will return the screening dates. To get the final result we will use the INTERSECT operator.

				
					SELECT Release_Date
FROM FILM

INTERSECT 

SELECT Date
FROM PROJECTION;
				
			

As you can see, the field names are different. This, in many databases, does not affect schema compatibility since the selected attributes are of the same type (i.e., DATE). Of course, some databases may require the name to be the same, and in that case it will be sufficient to rename using the AS operator the selected fields.

The intersection operation can be replaced by join or the IN operator. In the case of the join, it is sufficient to enter in the FROM clause the tables affected by the intersection, and in the WHERE clause the join conditions between the attributes in the SELECT clauses of the two queries. Using these rules the previous query would result:

				
					SELECT F.Release_Date
FROM FILM F, PROJECTION P
WHERE F.Release_date=P.Date;
				
			

The join condition, then, is not the “classic” one on primary/external keys, which obviously should not be inserted so as not to affect the desired result.

If we want to use the IN operator instead, we will have to use one of the two queries as a nested query, while the attributes in the outer SELECT clause, joined by a tuple constructor, form the left side of the IN operator. In practice, the query from before we can write it like this:

				
					SELECT Release_Date
FROM FILM
WHERE Release_date IN (SELECT Date
                      FROM PROJECTION);

				
			

EXCEPT

Finally, the EXCEPT operator performs a difference between the results of two queries. In addition to schema compatibility, this operator does not enjoy the commutative property. This means that the order in which the queries are written, and consequently executed, is relevant. Given two queries A and B, the syntax is:

				
					A EXCEPT B
				
			

Let’s take the example about dates again. Suppose we want to find the release dates of movies where no movies were shown. The queries are as seen previously but we will need to use the EXCEPT operator.

				
					SELECT Release_Date
FROM FILM

EXCEPT

SELECT Date
FROM PROJECTION;
				
			

Being a difference, the concept on which it operates is the same as the exclusion of the NOT IN operator that can be used in its place. In order to use it, the second query will become nested within the NOT IN operator, while the attributes in the SELECT clause of the first query, joined by a tuple constructor, will form the left side of the NOT IN operator. The end result is as follows:

				
					SELECT Release_Date
FROM FILM
WHERE Release_date NOT IN (SELECT Date
                          FROM PROJECTION);

				
			

More To Explore

DBMS

SQL: correlation

In the SQL language, it is possible to condition the execution of a query on the values of the tuples being parsed. This behavior is defined by correlation, that is, conditions in the WHERE clause of a nested query that relate to attributes of the tables in the outer query. Through some examples we will discover which operators use this construct, how to appropriately define correlations, and its alternatives.

DBMS

SQL: Common Table Expression

Expressing complex queries in SQL language can be really difficult. Dividing the problem into mini-queries is the best strategy to get the desired results quickly without running into errors. Common Table Expressions, also called CTEs, allow the queries of our interest to be defined once to be reused multiple times when needed. Their clean syntax increases the readability of SQL scripts, allows us to debug written code and to create very complex logic to filter and extract data. Through some examples we will discover their potential.

Leave a Reply

Your email address will not be published. Required fields are marked *

Design with MongoDB

Design with MongoDB!!!

Buy the new book that will help you to use MongoDB correctly for your applications. Available now on Amazon!